Brandon,

Great initiative and thoughts. Thanks for writing a detailed description of what you are looking to achieve.

Here are some of my comments/thoughts:

1. HUDI-326: There is already some work happening in this direction, but we should be able to collaborate on it. Siva has opened a PR (https://github.com/apache/incubator-hudi/pull/1004) to support deletes using only the HoodieKey (partitionPath, recordKey). Technically, we can also support a delete interface that takes only record keys if the index is global (the current implementation supports HoodieGlobalBloomIndex). Within Uber, we use HBase as the global Hudi index to support partition-agnostic record-key lookups. In other words, we can have two flavors of delete API: one that takes RDD<HoodieKey> (works for all index types) and another that takes RDD<RecordKey> and works only with a global index. Our vision is to support an external clustered (global) index, residing in DFS along with the dataset, as the de facto index.

2. HUDI-327: IIUC, just like ComplexKeyGenerator, the new key generator would need composite keys (in this case, a primary and a secondary key to break the "null" tie). Are you concerned about the record-key footprint of each key generated by ComplexKeyGenerator? In that case, it makes sense to me. Otherwise, ComplexKeyGenerator should be able to handle cases where some component of the key is null, right?

3. HUDI-83: At least on the write side, we have tied this to the Spark 2.4 upgrade. There is ongoing work in this regard. I will ask the folks working on it to provide a status update. Last I knew, we were running into some test failures during the upgrade. But yes, as this is a massive upgrade, we would need your help in reviewing, debugging and testing this change :)

Others, thoughts?

Thanks,
Balaji.V

On Fri, Nov 8, 2019 at 2:49 PM Scheller, Brandon <bsche...@amazon.com.invalid> wrote:

> Hi Hudi community,
>
> We at AWS EMR are interested in starting work on a few different usability
> improvements for Hudi and we’re interested to hear your feedback.
>
> Here are some of our ideas:
> https://issues.apache.org/jira/browse/HUDI-326
> https://issues.apache.org/jira/browse/HUDI-327
>
> Additionally, we were hoping to help drive:
> https://issues.apache.org/jira/browse/HUDI-83 and its associated Hive
> Jira: https://issues.apache.org/jira/browse/HIVE-22224
>
> I am looking forward to improving Hudi with you all. And feel free to let
> us know if there is anything specific, you’d like us to look at.
>
> Thanks,
> Brandon
>
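
To make the two delete flavors from point 1 a bit more concrete, here is a rough, self-contained toy model in Python. It is only a sketch under stated assumptions: `Key`, `ToyTable`, `delete_by_keys` and `delete_by_record_keys` are hypothetical names made up for illustration, not Hudi classes or APIs, and a plain dict stands in for the global index. The point it shows is that a delete with full keys works with any index, while a delete with only record keys first needs a record-key to partition lookup, which only a global index can provide.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Key:
    """Hypothetical stand-in for a (partitionPath, recordKey) pair; not Hudi's HoodieKey."""
    partition_path: str
    record_key: str


class ToyTable:
    """In-memory toy table: partition path -> record key -> row (illustration only)."""

    def __init__(self, global_index: bool) -> None:
        self.rows: Dict[str, Dict[str, dict]] = {}
        # Assumption: a global index is modelled as record key -> partition path.
        self.index: Optional[Dict[str, str]] = {} if global_index else None

    def upsert(self, key: Key, row: dict) -> None:
        self.rows.setdefault(key.partition_path, {})[key.record_key] = row
        if self.index is not None:
            self.index[key.record_key] = key.partition_path

    def delete_by_keys(self, keys: Iterable[Key]) -> int:
        """Flavor 1: full keys are given, so this works with or without a global index."""
        deleted = 0
        for key in keys:
            partition = self.rows.get(key.partition_path, {})
            if key.record_key in partition:
                del partition[key.record_key]
                if self.index is not None:
                    self.index.pop(key.record_key, None)
                deleted += 1
        return deleted

    def delete_by_record_keys(self, record_keys: Iterable[str]) -> int:
        """Flavor 2: only record keys are given, so a global index must resolve partitions."""
        if self.index is None:
            raise ValueError("record-key-only delete needs a global index")
        # Resolve each known record key to its partition, then reuse flavor 1.
        keys = [Key(self.index[rk], rk) for rk in record_keys if rk in self.index]
        return self.delete_by_keys(keys)
```

In this toy, a table created with `ToyTable(global_index=False)` still accepts `delete_by_keys`, but rejects `delete_by_record_keys`, which mirrors the split between "works for all index types" and "works with global index" described above.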