Hi Brandon,
I contributed to ComplexKeyGenerator some time back, and I don't think this
is intended behavior. If you are getting an exception, is it coming from
DataSourceUtils.getNestedFieldValAsString(record, recordKeyField)? I can't
think of any other reason why it would throw.
I think that when val is null, DataSourceUtils.getNestedFieldValAsString
throws an error because of this check:
    if (!(val instanceof GenericRecord)) {
        throw new HoodieException("Cannot find a record at part value :" + part);
    }
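To make that suspicion concrete, here is a minimal sketch. It is not the
actual Hudi code: plain Maps stand in for Avro GenericRecord,
IllegalStateException stands in for HoodieException, and the class name
NestedFieldSketch and the handling of a null leaf value are my own
assumptions. The only point it illustrates is that `null instanceof X` is
always false in Java, so a null intermediate value would take the same
branch as a value of the wrong type.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for the nested-field lookup discussed above.
 * Maps play the role of GenericRecord; this is NOT the Hudi implementation.
 */
final class NestedFieldSketch {

    static String getNestedFieldValAsString(Map<String, Object> record, String fieldName) {
        String[] parts = fieldName.split("\\.");
        Map<String, Object> node = record;
        for (int i = 0; i < parts.length; i++) {
            Object val = node.get(parts[i]);
            if (i == parts.length - 1) {
                // Last path segment (assumed behavior): a null leaf is "not found".
                if (val == null) {
                    throw new IllegalStateException(fieldName + " field not found in record");
                }
                return val.toString();
            }
            // `null instanceof Map` is false, so a null intermediate value
            // fails this check exactly like a non-record value would.
            if (!(val instanceof Map)) {
                throw new IllegalStateException("Cannot find a record at part value :" + parts[i]);
            }
            @SuppressWarnings("unchecked")
            Map<String, Object> next = (Map<String, Object>) val;
            node = next;
        }
        // Unreachable in practice (split always yields at least one part).
        throw new IllegalStateException(fieldName + " field not found in record");
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new HashMap<>();
        inner.put("id", 42);
        Map<String, Object> record = new HashMap<>();
        record.put("key", inner);
        record.put("missing", null); // null intermediate component

        System.out.println(getNestedFieldValAsString(record, "key.id"));
        try {
            getNestedFieldValAsString(record, "missing.id");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the first lookup succeeds, while the lookup through the null
"missing" component fails with the "Cannot find a record" message.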
Maybe Balaji can confirm it.
Thanks,
Jaimin
On Sun, 10 Nov 2019 at 03:40, Scheller, Brandon <[email protected]>
wrote:
> Thanks for the quick response Balaji!
>
> I think there is a lot here to continue with:
> 1. I did see that recent pull request for the delete API. I think
> collaborating to support another delete API with just record key would be a
> great next step. I'll begin looking into it. Additionally, the scenario of
> using Hbase as the global index is definitely something which we'd be
> interested in understanding further.
> 2. Actually I was speaking to the case of ComplexKeyGenerator. Currently
> if any single component of it is null, it will throw an exception. If this
> is not intended behavior, I'd be happy to fix this bug as it looks to solve
> our use case.
> 3. Thanks for the update on this. The spark upgrade is definitely a large
> undertaking that I'd be happy to help with.
>
> Thanks again,
> Brandon
>
> On 11/8/19, 3:52 PM, "Balaji Varadarajan" <[email protected]> wrote:
>
> Brandon,
>
> Great initiative and thoughts. Thanks for writing a detailed description
> of what you are looking to achieve.
>
> Here are some of my comments/thoughts:
>
> 1. HUDI-326: There is some work happening in this direction, but we
> should be able to collaborate on it. Siva has opened a PR
> (https://github.com/apache/incubator-hudi/pull/1004) to support delete
> using only HoodieKey (partitionPath, recordKey). Technically, we can
> support an interface for delete with only recordKeys if the index is of
> type global (the current implementation supports HoodieGlobalBloomIndex).
> Within Uber, we use Hbase as the global Hudi index to support
> partition-agnostic record-key lookups. In other words, we can have 2
> flavors of delete APIs: one with input RDD<HoodieKey> (works for all
> index types) and another with input RDD<RecordKey> that works with a
> global index. Our vision is to support an external clustered index
> (global) as the de facto index that resides in DFS along with the dataset.
> 2. HUDI-327: IIUC, just like ComplexKeyGenerator, the new key generator
> would need composite keys (in this case primary and secondary, for
> breaking the "null" tie). Are you concerned about the record-key
> footprint for each key when using the key generated by
> ComplexKeyGenerator? In that case, it makes sense to me. Otherwise,
> ComplexKeyGenerator should be able to handle cases where some component
> of it is null, right?
> 3. As for HUDI-83, at least on the write side, we have tied this to the
> spark-2.4 upgrade. There is ongoing work in this regard. I will ask the
> folks who are working on this to provide a status update. Last I knew, we
> were running into some test failures when doing this upgrade. But yes, as
> this is a massive upgrade, we would need your help in reviewing,
> debugging and testing this change :)
>
> Others, thoughts?
>
> Thanks,
> Balaji.V
>
> On Fri, Nov 8, 2019 at 2:49 PM Scheller, Brandon
> <[email protected]> wrote:
>
> > Hi Hudi community,
> >
> > We at AWS EMR are interested in starting work on a few different
> > usability improvements for Hudi and we’re interested to hear your
> > feedback.
> >
> > Here are some of our ideas:
> > https://issues.apache.org/jira/browse/HUDI-326
> > https://issues.apache.org/jira/browse/HUDI-327
> >
> > Additionally, we were hoping to help drive:
> > https://issues.apache.org/jira/browse/HUDI-83 and its associated Hive
> > Jira: https://issues.apache.org/jira/browse/HIVE-22224
> >
> > I am looking forward to improving Hudi with you all. And feel free to
> > let us know if there is anything specific you’d like us to look at.
> >
> > Thanks,
> > Brandon
> >
>
>
>