Please go ahead. If you can share your Jira ID, we can add you as a contributor so the ticket can be assigned to you.

On Mon, Nov 11, 2019 at 1:14 PM Scheller, Brandon <bsche...@amazon.com.invalid> wrote:

> Yep, you are correct that it is throwing the exception because of
> DataSourceUtils.getNestedFieldValAsString.
>
> I can take up the work to fix this behavior if it is not intended. I'd
> also like to add extra error messaging and validation, because currently
> it is not clear to users what the error is when a record_key is empty.
> It just throws the following and requires trace debugging to root cause:
>
> 19/11/11 21:09:36 ERROR HoodieSparkSqlWriter$: insert failed with 1 errors :
>
> -Brandon
>
> On 11/10/19, 11:24 PM, "Jaimin Shah" <shahjaimin0...@gmail.com> wrote:
>
>     Hi Brandon,
>
>     I contributed to ComplexKeyGenerator some time back. I don't think
>     it is intended behavior. If you are getting an exception, is it
>     because of
>     DataSourceUtils.getNestedFieldValAsString(record, recordKeyField)?
>     I can't think of any other reason why it should throw an exception.
>     I think when val is null,
>     DataSourceUtils.getNestedFieldValAsString throws an error due to
>
>     if (!(val instanceof GenericRecord)) {
>       throw new HoodieException("Cannot find a record at part value :" + part);
>     }
>
>     Maybe Balaji can confirm it.
>
>     Thanks,
>     Jaimin
>
>     On Sun, 10 Nov 2019 at 03:40, Scheller, Brandon
>     <bsche...@amazon.com.invalid> wrote:
>
> > Thanks for the quick response Balaji!
> >
> > I think there is a lot here to continue with:
> > 1. I did see that recent pull request for the delete API. I think
> >    collaborating to support another delete API with just the record
> >    key would be a great next step. I'll begin looking into it.
> >    Additionally, the scenario of using HBase as the global index is
> >    definitely something we'd be interested in understanding further.
> > 2. Actually, I was speaking to the case of ComplexKeyGenerator.
> >    Currently, if any single component of it is null, it will throw an
> >    exception. If this is not intended behavior, I'd be happy to fix
> >    this bug, as it looks to solve our use case.
> > 3. Thanks for the update on this. The Spark upgrade is definitely a
> >    large undertaking that I'd be happy to help with.
> >
> > Thanks again,
> > Brandon
> >
> > On 11/8/19, 3:52 PM, "Balaji Varadarajan" <varadar...@gmail.com> wrote:
> >
> >     Brandon,
> >
> >     Great initiative and thoughts. Thanks for writing a detailed
> >     description of what you are looking to achieve.
> >
> >     Here are some of my comments/thoughts:
> >
> >     1. HUDI-326: There is some work happening in this direction, but
> >        we should be able to collaborate on this. Siva has opened a PR
> >        (https://github.com/apache/incubator-hudi/pull/1004) to support
> >        delete using only HoodieKey (partitionPath, recordKey).
> >        Technically, we can support an interface for delete with only
> >        recordKeys if the index is of type global (the current
> >        implementation supports HoodieGlobalBloomIndex). Within Uber,
> >        we use HBase as the global Hudi index to support
> >        partition-agnostic record-key lookups. In other words, we can
> >        have 2 flavors of delete APIs: one with input being
> >        RDD<HoodieKeys> (works for all index types) and another with
> >        input RDD<RecordKey> that works with a global index. Our vision
> >        is to support an external clustered index (global) as the
> >        de facto index that resides in DFS along with the dataset.
> >     2. HUDI-327: IIUC, just like ComplexKeyGenerator, the new key
> >        generator would need composite keys (in this case primary and
> >        secondary, for breaking the "null" tie). Are you concerned
> >        about the record-key footprint for each key when using the key
> >        generated by ComplexKeyGenerator? In that case, it makes sense
> >        to me. Otherwise, ComplexKeyGenerator should be able to handle
> >        cases when some component of it is null, right?
> >     3. As for HUDI-83, at least on the write side, we have tied this
> >        to the spark-2.4 upgrade. There is ongoing work happening in
> >        this regard. I will request the folks who are working on this
> >        to provide status. Last I know, we were running into some test
> >        failures when doing this upgrade. But yes, as this is a massive
> >        upgrade, we would need your help in reviewing, debugging and
> >        testing this change :)
> >
> >     Others, thoughts?
> >
> >     Thanks,
> >     Balaji.V
> >
> >     On Fri, Nov 8, 2019 at 2:49 PM Scheller, Brandon
> >     <bsche...@amazon.com.invalid> wrote:
> >
> > > Hi Hudi community,
> > >
> > > We at AWS EMR are interested in starting work on a few different
> > > usability improvements for Hudi and we’re interested to hear your
> > > feedback.
> > >
> > > Here are some of our ideas:
> > > https://issues.apache.org/jira/browse/HUDI-326
> > > https://issues.apache.org/jira/browse/HUDI-327
> > >
> > > Additionally, we were hoping to help drive:
> > > https://issues.apache.org/jira/browse/HUDI-83 and its associated
> > > Hive Jira: https://issues.apache.org/jira/browse/HIVE-22224
> > >
> > > I am looking forward to improving Hudi with you all. And feel free
> > > to let us know if there is anything specific you’d like us to look
> > > at.
> > >
> > > Thanks,
> > > Brandon
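
For reference, the clearer error messaging discussed in the thread could look roughly like the sketch below. It is only an illustration under stated assumptions: plain nested `Map`s stand in for Avro `GenericRecord`s, the class `NestedFieldLookupSketch` and its method are hypothetical and are not Hudi's `DataSourceUtils`, and whether a null key component should fail or be tolerated is still the open question above. The sketch only shows failing with a message that names the field.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not Hudi's DataSourceUtils implementation.
public class NestedFieldLookupSketch {

    /**
     * Walks a dot-separated field path through nested maps (a stand-in for
     * nested Avro records) and returns the leaf value as a String.
     * Throws an exception that names the full field path and the failing
     * part when a value is null, missing, or not a nested record.
     */
    public static String getNestedFieldValAsString(Map<String, Object> record, String fieldName) {
        String[] parts = fieldName.split("\\.");
        Object current = record;
        for (String part : parts) {
            if (!(current instanceof Map)) {
                throw new IllegalArgumentException(
                    "Field '" + fieldName + "': cannot descend into part '" + part
                        + "' because its parent is not a nested record");
            }
            Object val = ((Map<?, ?>) current).get(part);
            if (val == null) {
                throw new IllegalArgumentException(
                    "Field '" + fieldName + "': value is null or missing at part '" + part + "'");
            }
            current = val;
        }
        return current.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> user = new HashMap<>();
        user.put("id", 42);
        Map<String, Object> record = new HashMap<>();
        record.put("user", user);

        System.out.println(getNestedFieldValAsString(record, "user.id"));
        try {
            getNestedFieldValAsString(record, "user.email");
        } catch (IllegalArgumentException e) {
            // The message identifies which key field was empty.
            System.out.println(e.getMessage());
        }
    }
}
```

With a message of this shape, a user hitting an empty record key would see the offending field name directly instead of having to trace through the writer logs.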