Yep, you are correct that it is throwing the exception because of
DataSourceUtils.getNestedFieldValAsString. I can take up the work to fix this
behavior if it is not intended. I'd also like to add extra error messaging and
validation, because currently it is not clear to users what the error is when
a record_key is empty. It just throws the following and requires trace
debugging to root cause:

    19/11/11 21:09:36 ERROR HoodieSparkSqlWriter$: insert failed with 1 errors :
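
To make the intent concrete, the validation I have in mind would look roughly
like the sketch below. This is only an illustration against the Avro
GenericRecord API - the class and method names are made up, it skips the
nested-field resolution that getNestedFieldValAsString does, and it is not the
actual Hudi code:

    import org.apache.avro.generic.GenericRecord;
    import org.apache.hudi.exception.HoodieException;

    // Rough sketch of the validation I'd like to add (illustrative only):
    // fail fast with a message that names the offending record key field
    // instead of surfacing a generic write error.
    public class RecordKeyValidationSketch {

      // Hypothetical helper: look up a record key field and complain loudly
      // if it is missing or empty. For simplicity this reads the field with
      // GenericRecord.get() and does not handle nested fields.
      static String requireNonEmptyKeyField(GenericRecord record, String fieldName) {
        Object val = record.get(fieldName);
        if (val == null || val.toString().trim().isEmpty()) {
          throw new HoodieException(
              "Record key field '" + fieldName + "' is null or empty for record: " + record);
        }
        return val.toString();
      }
    }

The idea would be to call something like this for each configured record key
field (and partition path field) before building the HoodieKey, so the failure
points at the exact field and record rather than at the write path.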
-Brandon

On 11/10/19, 11:24 PM, "Jaimin Shah" <shahjaimin0...@gmail.com> wrote:

Hi Brandon,

I contributed to ComplexKeyGenerator some time back. I don't think this is
intended behavior. If you are getting an exception, is it because of
DataSourceUtils.getNestedFieldValAsString(record, recordKeyField)? I can't
think of any other reason why it should throw an exception. I think that when
val is null, DataSourceUtils.getNestedFieldValAsString throws an error due to:

    if (!(val instanceof GenericRecord)) {
      throw new HoodieException("Cannot find a record at part value :" + part);
    }

Maybe Balaji can confirm it.

Thanks,
Jaimin

On Sun, 10 Nov 2019 at 03:40, Scheller, Brandon <bsche...@amazon.com.invalid> wrote:

> Thanks for the quick response Balaji!
>
> I think there is a lot here to continue with:
>
> 1. I did see that recent pull request for the delete API. I think
> collaborating to support another delete API with just the record key would
> be a great next step. I'll begin looking into it. Additionally, the
> scenario of using HBase as the global index is definitely something we'd
> be interested in understanding further.
> 2. Actually, I was speaking to the case of ComplexKeyGenerator. Currently,
> if any single component of it is null, it will throw an exception. If this
> is not intended behavior, I'd be happy to fix this bug as it looks to
> solve our use case.
> 3. Thanks for the update on this. The Spark upgrade is definitely a large
> undertaking that I'd be happy to help with.
>
> Thanks again,
> Brandon
>
> On 11/8/19, 3:52 PM, "Balaji Varadarajan" <varadar...@gmail.com> wrote:
>
> Brandon,
>
> Great initiative and thoughts. Thanks for writing a detailed description
> of what you are looking to achieve.
>
> Here are some of my comments/thoughts:
>
> 1. HUDI-326: There is some work happening in this direction, but we should
> be able to collaborate on this. Siva has opened a PR
> (https://github.com/apache/incubator-hudi/pull/1004) to support delete
> using only HoodieKey (partitionPath, recordKey). Technically, we can
> support an interface for delete with only recordKeys if the index is of
> type global (the current implementation supports HoodieGlobalBloomIndex).
> Within Uber, we use HBase as the global Hudi index to support
> partition-agnostic record-key lookups. In other words, we can have 2
> flavors of delete APIs - one with input being RDD<HoodieKeys> (works for
> all index types) and another with input RDD<RecordKey> that works with a
> global index. Our vision is to support an external clustered index
> (global) as the de-facto index that resides in DFS along with the dataset.
> 2. HUDI-327: IIUC, just like ComplexKeyGenerator, the new key generator
> would need composite keys (in this case primary and secondary for breaking
> the "null" tie). Are you concerned about the record-key footprint for each
> key when using the key generated by ComplexKeyGenerator? In that case, it
> makes sense to me. Otherwise, ComplexKeyGenerator should be able to handle
> cases when some component of it is null, right?
> 3. As for HUDI-83, at least on the write side, we have tied this to the
> Spark 2.4 upgrade. There is ongoing work happening in this regard. I will
> request the folks who are working on this to provide a status. Last I
> know, we were running into some test failures when doing this upgrade. But
> yes, as this is a massive upgrade, we would need your help in reviewing,
> debugging and testing this change :)
>
> Others, thoughts?
>
> Thanks,
> Balaji.V
>
> On Fri, Nov 8, 2019 at 2:49 PM Scheller, Brandon <bsche...@amazon.com.invalid> wrote:
>
> > Hi Hudi community,
> >
> > We at AWS EMR are interested in starting work on a few different
> > usability improvements for Hudi, and we're interested to hear your
> > feedback.
> >
> > Here are some of our ideas:
> > https://issues.apache.org/jira/browse/HUDI-326
> > https://issues.apache.org/jira/browse/HUDI-327
> >
> > Additionally, we were hoping to help drive
> > https://issues.apache.org/jira/browse/HUDI-83 and its associated Hive
> > Jira: https://issues.apache.org/jira/browse/HIVE-22224
> >
> > I am looking forward to improving Hudi with you all. And feel free to
> > let us know if there is anything specific you'd like us to look at.
> >
> > Thanks,
> > Brandon
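
P.S. On the HUDI-326 discussion quoted above: to make sure I understand the
two delete flavors Balaji described, I'd picture the client-facing surface
looking roughly like the sketch below. The interface, method names and result
type are made up for illustration only - they are not existing Hudi APIs:

    import org.apache.hudi.common.model.HoodieKey;
    import org.apache.spark.api.java.JavaRDD;

    // Illustration only - these methods do not exist in Hudi today.
    public interface DeleteApiSketch {

      // Flavor 1: the caller supplies full HoodieKeys (recordKey + partitionPath).
      // Works with every index type, since no partition lookup is required.
      JavaRDD<WriteResult> deleteByKeys(JavaRDD<HoodieKey> keys, String instantTime);

      // Flavor 2: the caller supplies only record keys. Requires a global index
      // (e.g. HoodieGlobalBloomIndex or an HBase-backed index) so that the
      // partition path can be resolved before the delete is issued.
      JavaRDD<WriteResult> deleteByRecordKeys(JavaRDD<String> recordKeys, String instantTime);

      // Placeholder result type for the sketch (stand-in for Hudi's WriteStatus).
      class WriteResult {}
    }

If that matches the intent, I'll use it as a starting point when I look into
collaborating on the delete API work.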