Please go ahead. If you can share your Jira ID, we can add you to the
contributors list so that the ticket can be assigned to you.
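
As a starting point for the ComplexKeyGenerator fix, here is a rough sketch
of the behavior discussed in the thread below: tolerate individual null
components, and fail with an explicit message only when every key field is
null or empty. This is not Hudi code. The class name, the "__null__"
placeholder, the "field:value" key layout, and the use of plain Maps in place
of Avro GenericRecord are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not Hudi code. Nested Maps stand in for Avro GenericRecord. */
final class RecordKeySketch {

    /** Assumed placeholder for a null or empty key component. */
    static final String NULL_PLACEHOLDER = "__null__";

    /**
     * Walks a dot-separated path such as "user.id".
     * Returns null instead of throwing when any part is missing or null.
     */
    static String nestedFieldAsString(Map<String, Object> record, String path) {
        Object current = record;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // intermediate value is null or not a record
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current == null ? null : current.toString();
    }

    /**
     * Builds a composite key. Null or empty components become a placeholder;
     * the call fails only if every component is null or empty.
     */
    static String buildRecordKey(Map<String, Object> record, List<String> keyFields) {
        List<String> parts = new ArrayList<>();
        boolean anyValue = false;
        for (String field : keyFields) {
            String value = nestedFieldAsString(record, field);
            if (value == null || value.isEmpty()) {
                parts.add(field + ":" + NULL_PLACEHOLDER);
            } else {
                parts.add(field + ":" + value);
                anyValue = true;
            }
        }
        if (!anyValue) {
            throw new IllegalArgumentException(
                "Record key is empty: all key fields " + keyFields + " are null or empty");
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("id", 42);
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("user", user);
        record.put("region", null); // one null component is tolerated
        System.out.println(buildRecordKey(record, List.of("user.id", "region")));
    }
}
```

The error names the key fields involved, so users should not need trace
debugging to find the cause. Whether a placeholder or a hard failure is the
right behavior for a partially null key is still open for discussion.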

On Mon, Nov 11, 2019 at 1:14 PM Scheller, Brandon
<bsche...@amazon.com.invalid> wrote:

> Yep, you are correct that it is throwing the exception because of the
> DataSourceUtils.getNestedFieldValAsString.
> I can take up the work to fix this behavior if it is not intended. I'd
> also like to add extra error messaging and validation because currently it
> is not clear to users what the error is when a record_key is empty. It just
> throws the following and requires trace debugging to root cause:
> 19/11/11 21:09:36 ERROR HoodieSparkSqlWriter$: insert failed with 1 errors :
>
> -Brandon
>
> On 11/10/19, 11:24 PM, "Jaimin Shah" <shahjaimin0...@gmail.com> wrote:
>
>     Hi Brandon,
>       I contributed to ComplexKeyGenerator some time back. I don't think
>     it is intended behavior. If you are getting an exception, is it because of
>     DataSourceUtils.getNestedFieldValAsString(record, recordKeyField)? I can't
>     think of any other reason why it should throw an exception.
>      I think when val is null, DataSourceUtils.getNestedFieldValAsString
>     throws an error due to
>     if (!(val instanceof GenericRecord)) {
>       throw new HoodieException("Cannot find a record at part value :" + part);
>     }
>     Maybe Balaji can confirm it.
>
>     Thanks,
>     Jaimin
>
>     On Sun, 10 Nov 2019 at 03:40, Scheller, Brandon <bsche...@amazon.com.invalid> wrote:
>
>     > Thanks for the quick response Balaji!
>     >
>     > I think there is a lot here to continue with:
>     > 1. I did see that recent pull request for the delete API. I think
>     > collaborating to support another delete API with just record key would
>     > be a great next step. I'll begin looking into it. Additionally, the
>     > scenario of using HBase as the global index is definitely something
>     > which we'd be interested in understanding further.
>     > 2. Actually, I was speaking to the case of ComplexKeyGenerator.
>     > Currently, if any single component of it is null, it will throw an
>     > exception. If this is not intended behavior, I'd be happy to fix this
>     > bug, as it looks to solve our use case.
>     > 3. Thanks for the update on this. The Spark upgrade is definitely a
>     > large undertaking that I'd be happy to help with.
>     >
>     > Thanks again,
>     > Brandon
>     >
>     > On 11/8/19, 3:52 PM, "Balaji Varadarajan" <varadar...@gmail.com> wrote:
>     >
>     >     Brandon,
>     >
>     >     Great initiative and thoughts. Thanks for writing a detailed
>     >     description of what you are looking to achieve.
>     >
>     >     Here are some of my comments/thoughts:
>     >
>     >        1. HUDI-326: There is some work happening in this direction,
>     >        but we should be able to collaborate on this. Siva has opened
>     >        a PR (https://github.com/apache/incubator-hudi/pull/1004) to
>     >        support delete using only HoodieKey (partitionPath, recordKey).
>     >        Technically, we can support an interface for delete with only
>     >        recordKeys if the index is of type global (the current
>     >        implementation supports HoodieGlobalBloomIndex). Within Uber,
>     >        we use HBase as the global Hudi index to support
>     >        partition-agnostic record-key lookups. In other words, we can
>     >        have 2 flavors of delete APIs: one with input RDD<HoodieKeys>
>     >        (works for all index types) and another with input
>     >        RDD<RecordKey> that works with a global index. Our vision is
>     >        to support an external clustered index (global) as the
>     >        de facto index that resides in DFS along with the dataset.
>     >        2. HUDI-327: IIUC, just like ComplexKeyGenerator, the new key
>     >        generator would need composite keys (in this case primary and
>     >        secondary, for breaking the "null" tie). Are you concerned
>     >        about the record-key footprint for each key when using the key
>     >        generated by ComplexKeyGenerator? In that case, it makes sense
>     >        to me. Otherwise, ComplexKeyGenerator should be able to handle
>     >        cases when some component of it is null, right?
>     >        3. As for HUDI-83, at least on the write side, we have tied
>     >        this to the Spark 2.4 upgrade. There is ongoing work in this
>     >        regard. I will ask the folks who are working on this to
>     >        provide a status. Last I knew, we were running into some test
>     >        failures when doing this upgrade. But yes, as this is a massive
>     >        upgrade, we would need your help in reviewing, debugging, and
>     >        testing this change :)
>     >
>     >     Others, thoughts?
>     >
>     >     Thanks,
>     >     Balaji.V
>     >
>     >     On Fri, Nov 8, 2019 at 2:49 PM Scheller, Brandon
>     >     <bsche...@amazon.com.invalid> wrote:
>     >
>     >     > Hi Hudi community,
>     >     >
>     >     > We at AWS EMR are interested in starting work on a few different
>     >     > usability improvements for Hudi, and we’re interested to hear your
>     >     > feedback.
>     >     >
>     >     > Here are some of our ideas:
>     >     > https://issues.apache.org/jira/browse/HUDI-326
>     >     > https://issues.apache.org/jira/browse/HUDI-327
>     >     >
>     >     > Additionally, we were hoping to help drive:
>     >     > https://issues.apache.org/jira/browse/HUDI-83 and its associated
>     >     > Hive Jira: https://issues.apache.org/jira/browse/HIVE-22224
>     >     >
>     >     > I am looking forward to improving Hudi with you all. And feel free
>     >     > to let us know if there is anything specific you’d like us to look
>     >     > at.
>     >     >
>     >     > Thanks,
>     >     > Brandon
>     >     >
>     >
>     >
>     >
>
>
>
