Thanks for the quick response, Balaji!

I think there is a lot here to follow up on:
1. I did see the recent pull request for the delete API. I think collaborating
to support another delete API that takes just the record key would be a great
next step, and I'll begin looking into it. Additionally, we'd definitely like
to learn more about the scenario of using HBase as the global index.
2. Actually, I was referring to the case of ComplexKeyGenerator. Currently, if
any single component of the key is null, it throws an exception. If this is
not the intended behavior, I'd be happy to fix this bug, as that looks like it
would solve our use case.
3. Thanks for the update on this. The Spark upgrade is definitely a large
undertaking, and I'd be happy to help with it.
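To make the ComplexKeyGenerator point concrete, here is a rough sketch of a composite key builder that substitutes a placeholder for a null or empty component instead of throwing. It assumes a "field:value" pairs, comma-separated key layout; the class name NullTolerantCompositeKey, the buildKey method, and the __null__/__empty__ placeholder strings are hypothetical and are not taken from Hudi's actual ComplexKeyGenerator code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class NullTolerantCompositeKey {
    // Hypothetical placeholder values; not Hudi constants.
    static final String NULL_PLACEHOLDER = "__null__";
    static final String EMPTY_PLACEHOLDER = "__empty__";

    /**
     * Builds a composite key of the form "f1:v1,f2:v2" from the given record fields.
     * Null or empty components are encoded as placeholders instead of throwing;
     * only a key whose components are ALL null or empty is rejected.
     */
    static String buildKey(Map<String, Object> record, List<String> keyFields) {
        StringJoiner joiner = new StringJoiner(",");
        boolean allMissing = true;
        for (String field : keyFields) {
            Object value = record.get(field);
            String encoded;
            if (value == null) {
                encoded = NULL_PLACEHOLDER;
            } else if (value.toString().isEmpty()) {
                encoded = EMPTY_PLACEHOLDER;
            } else {
                encoded = value.toString();
                allMissing = false;
            }
            joiner.add(field + ":" + encoded);
        }
        if (allMissing) {
            // A key made only of placeholders cannot identify a record.
            throw new IllegalArgumentException("All key fields " + keyFields + " are null or empty");
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 42);
        record.put("region", null);
        System.out.println(buildKey(record, List.of("id", "region")));
        // prints id:42,region:__null__
    }
}
```

Rejecting only the all-null case is one possible design choice: a partially null key still distinguishes records, while a key built entirely from placeholders would collapse unrelated records together.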
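On the record-key-only delete API: it needs a global index because the partition path has to be recovered from the record key alone before a (partitionPath, recordKey) based delete can run. A minimal sketch of that resolution step, assuming the global index can be modeled as a plain map from record key to partition path and using plain Java lists in place of RDDs; RecordKeyDeleteSketch, Key, and resolveForDelete are hypothetical names, not part of Hudi's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class RecordKeyDeleteSketch {
    /** Hypothetical stand-in for a (recordKey, partitionPath) pair such as HoodieKey. */
    static final class Key {
        final String recordKey;
        final String partitionPath;

        Key(String recordKey, String partitionPath) {
            this.recordKey = recordKey;
            this.partitionPath = partitionPath;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return recordKey.equals(k.recordKey) && partitionPath.equals(k.partitionPath);
        }

        @Override
        public int hashCode() {
            return Objects.hash(recordKey, partitionPath);
        }

        @Override
        public String toString() {
            return "(" + recordKey + ", " + partitionPath + ")";
        }
    }

    /**
     * Resolves bare record keys into full keys using a global index, modeled here
     * as a map from record key to partition path (an assumption for illustration).
     * Keys absent from the index are skipped, since there is nothing to delete.
     */
    static List<Key> resolveForDelete(List<String> recordKeys, Map<String, String> globalIndex) {
        List<Key> resolved = new ArrayList<>();
        for (String recordKey : recordKeys) {
            String partitionPath = globalIndex.get(recordKey);
            if (partitionPath != null) {
                resolved.add(new Key(recordKey, partitionPath));
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> globalIndex = Map.of("r1", "2019/11/08", "r2", "2019/11/07");
        System.out.println(resolveForDelete(List.of("r1", "r3"), globalIndex));
        // prints [(r1, 2019/11/08)]
    }
}
```

With a partition-scoped index this lookup is not possible, which is why the HoodieKey flavor works with every index type while the record-key flavor is limited to global indexes.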

Thanks again,
Brandon

On 11/8/19, 3:52 PM, "Balaji Varadarajan" <varadar...@gmail.com> wrote:

    Brandon,
    
    Great initiative and thoughts. Thanks for writing a detailed description of
    what you are looking to achieve.
    
    Here are some of my comments/thoughts:
    
       1. HUDI-326: There is some work happening in this direction, but we
       should be able to collaborate on it. Siva has opened a PR
       (https://github.com/apache/incubator-hudi/pull/1004) to support delete
       using only HoodieKey (partitionPath, recordKey). Technically, we can
       support an interface for delete with only recordKeys if the index is
       global (the current implementation supports HoodieGlobalBloomIndex).
       Within Uber, we use HBase as the global Hudi index to support
       partition-agnostic record-key lookups. In other words, we can have two
       flavors of delete APIs: one whose input is RDD<HoodieKey> (works for all
       index types) and another whose input is RDD<RecordKey>, which works with
       a global index. Our vision is to support an external clustered (global)
       index as the de-facto index that resides in DFS along with the dataset.
       2. HUDI-327: IIUC, just like ComplexKeyGenerator, the new key generator
       would need composite keys (in this case, primary and secondary, to break
       the "null" tie). Are you concerned about the record-key footprint for
       each key when using the key generated by ComplexKeyGenerator? If so,
       that makes sense to me. Otherwise, ComplexKeyGenerator should be able to
       handle cases where some component of it is null, right?
       3. As for HUDI-83, at least on the write side, we have tied this to the
       Spark 2.4 upgrade. There is ongoing work in this regard. I will ask the
       folks who are working on this to provide a status update. Last I knew,
       we were running into some test failures when doing this upgrade. But
       yes, as this is a massive upgrade, we would need your help reviewing,
       debugging, and testing this change :)
    
    Others, thoughts?
    
    Thanks,
    Balaji.V
    
    On Fri, Nov 8, 2019 at 2:49 PM Scheller, Brandon
    <bsche...@amazon.com.invalid> wrote:
    
    > Hi Hudi community,
    >
    > We at AWS EMR are interested in starting work on a few different usability
    > improvements for Hudi, and we’d like to hear your feedback.
    >
    > Here are some of our ideas:
    > https://issues.apache.org/jira/browse/HUDI-326
    > https://issues.apache.org/jira/browse/HUDI-327
    >
    > Additionally, we were hoping to help drive:
    > https://issues.apache.org/jira/browse/HUDI-83 and its associated Hive
    > Jira: https://issues.apache.org/jira/browse/HIVE-22224
    >
    > I am looking forward to improving Hudi with you all. Feel free to let
    > us know if there is anything specific you’d like us to look at.
    >
    > Thanks,
    > Brandon
    >
    
