Hi Brandon,
  I contributed to ComplexKeyGenerator some time back, and I don't think
this is intended behavior. If you are getting an exception, is it coming
from DataSourceUtils.getNestedFieldValAsString(record, recordKeyField)? I
can't think of any other reason why it would throw.
  I think that when val is null, DataSourceUtils.getNestedFieldValAsString
throws an error because of this check:
if (!(val instanceof GenericRecord)) {
  throw new HoodieException("Cannot find a record at part value :" + part);
}
Maybe Balaji can confirm it.
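
To make the suspected failure mode concrete, here is a minimal sketch. It is
not the Hudi implementation: nested Maps stand in for Avro GenericRecord, and
the names NestedFieldLookupSketch, lookupStrict, lookupLenient, compositeKey
and the "__null__" placeholder are all hypothetical. It only shows how an
instanceof check rejects a null intermediate value, and one possible way a
composite key could tolerate null components instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in: nested Maps play the role of Avro GenericRecord.
class NestedFieldLookupSketch {

    // Hypothetical placeholder written into the key when a component is null.
    static final String NULL_PLACEHOLDER = "__null__";

    // Strict lookup: "null instanceof Map" is false, so a null intermediate
    // value fails the check and throws, as does a null leaf value.
    static String lookupStrict(Map<String, Object> record, String fieldName) {
        Object val = record;
        for (String part : fieldName.split("\\.")) {
            if (!(val instanceof Map)) {
                throw new IllegalStateException(
                    "Cannot find a record at part value :" + part);
            }
            val = ((Map<?, ?>) val).get(part);
        }
        if (val == null) {
            throw new IllegalStateException(fieldName + " field not found in record");
        }
        return val.toString();
    }

    // Lenient lookup: a null anywhere along the path yields the placeholder;
    // a non-null value that is not a record is still treated as an error.
    static String lookupLenient(Map<String, Object> record, String fieldName) {
        Object val = record;
        for (String part : fieldName.split("\\.")) {
            if (val == null) {
                return NULL_PLACEHOLDER;
            }
            if (!(val instanceof Map)) {
                throw new IllegalStateException(
                    "Cannot find a record at part value :" + part);
            }
            val = ((Map<?, ?>) val).get(part);
        }
        return val == null ? NULL_PLACEHOLDER : val.toString();
    }

    // Builds a "field:value,field:value" key using the lenient lookup.
    static String compositeKey(Map<String, Object> record, List<String> fields) {
        List<String> pieces = new ArrayList<>();
        for (String field : fields) {
            pieces.add(field + ":" + lookupLenient(record, field));
        }
        return String.join(",", pieces);
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("id", 42);
        record.put("address", null); // nested record is missing entirely

        System.out.println(compositeKey(record, Arrays.asList("id", "address.city")));
        try {
            lookupStrict(record, "address.city");
        } catch (IllegalStateException e) {
            System.out.println("strict lookup failed: " + e.getMessage());
        }
    }
}
```

Whether a placeholder is the right choice for Hudi is a separate question,
since it changes which records compare as equal; the sketch is only meant to
show where the exception would come from.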

Thanks,
Jaimin

On Sun, 10 Nov 2019 at 03:40, Scheller, Brandon <bsche...@amazon.com.invalid>
wrote:

> Thanks for the quick response Balaji!
>
> I think there is a lot here to continue with:
> 1. I did see that recent pull request for the delete API. I think
> collaborating to support another delete API with just record key would be a
> great next step. I'll begin looking into it. Additionally, the scenario of
> using Hbase as the global index is definitely something which we'd be
> interested in understanding further.
> 2. Actually I was speaking to the case of ComplexKeyGenerator. Currently
> if any single component of it is null, it will throw an exception. If this
> is not intended behavior, I'd be happy to fix this bug as it looks to solve
> our use case.
> 3. Thanks for the update on this. The spark upgrade is definitely a large
> undertaking that I'd be happy to help with.
>
> Thanks again,
> Brandon
>
> On 11/8/19, 3:52 PM, "Balaji Varadarajan" <varadar...@gmail.com> wrote:
>
>     Brandon,
>
>     Great initiative and thoughts. Thanks for writing a detailed
>     description of what you are looking to achieve.
>
>     Here are some of my comments/thoughts:
>
>        1. HUDI-326: There is some work happening in this direction, but
>        we should be able to collaborate on it. Siva has opened a PR
>        (https://github.com/apache/incubator-hudi/pull/1004) to support
>        delete using only HoodieKey (partitionPath, recordKey).
>        Technically, we can support an interface for delete with only
>        recordKeys if the index is of type global (the current
>        implementation supports HoodieGlobalBloomIndex). Within Uber, we
>        use Hbase as the global Hudi index to support partition-agnostic
>        record-key lookups. In other words, we can have 2 flavors of
>        delete APIs: one with input RDD<HoodieKeys> (works for all index
>        types) and another with input RDD<RecordKey> that works with a
>        global index. Our vision is to support an external clustered
>        index (global) as the de-facto index that resides in DFS along
>        with the dataset.
>        2. HUDI-327: IIUC, just like ComplexKeyGenerator, the new key
>        generator would need composite keys (in this case primary and
>        secondary, for breaking the "null" tie). Are you concerned about
>        the record-key footprint for each key when using the key
>        generated by ComplexKeyGenerator? In that case, it makes sense to
>        me. Otherwise, ComplexKeyGenerator should be able to handle cases
>        where some component of it is null, right?
>        3. As for HUDI-83, at least on the write side, we have tied this
>        to the spark-2.4 upgrade. There is ongoing work in this regard. I
>        will ask the folks who are working on it to provide a status.
>        Last I knew, we were running into some test failures when doing
>        this upgrade. But yes, as this is a massive upgrade, we would
>        need your help in reviewing, debugging and testing this change :)
>
>     Others, Thoughts ?
>
>     Thanks,
>     Balaji.V
>
>     On Fri, Nov 8, 2019 at 2:49 PM Scheller, Brandon
>     <bsche...@amazon.com.invalid> wrote:
>
>     > Hi Hudi community,
>     >
>     > We at AWS EMR are interested in starting work on a few different
>     > usability improvements for Hudi and we’re interested to hear your
>     > feedback.
>     >
>     > Here are some of our ideas:
>     > https://issues.apache.org/jira/browse/HUDI-326
>     > https://issues.apache.org/jira/browse/HUDI-327
>     >
>     > Additionally, we were hoping to help drive:
>     > https://issues.apache.org/jira/browse/HUDI-83 and its associated
>     > Hive Jira: https://issues.apache.org/jira/browse/HIVE-22224
>     >
>     > I am looking forward to improving Hudi with you all. And feel free
>     > to let us know if there is anything specific you’d like us to look
>     > at.
>     >
>     > Thanks,
>     > Brandon
>     >
>
>
>
