Brandon,

Great initiative and thoughts. Thanks for writing a detailed description of
what you are looking to achieve.

Here are some of my comments/thoughts:

   1. HUDI-326: There is already some work happening in this direction,
   and we should be able to collaborate on it. Siva has opened a PR (
   https://github.com/apache/incubator-hudi/pull/1004) to support delete
   using only HoodieKey (partitionPath, recordKey). Technically, we can
   also support a delete interface that takes only recordKeys if the index
   is global (the current implementation supports HoodieGlobalBloomIndex).
   Within Uber, we use HBase as the global Hudi index to support
   partition-agnostic record-key lookups. In other words, we can have 2
   flavors of delete APIs: one whose input is RDD<HoodieKey> (works for all
   index types) and another whose input is RDD<RecordKey>, which requires a
   global index. Our vision is to support an external clustered (global)
   index as the de-facto index, residing in DFS along with the dataset.
   2. HUDI-327: IIUC, just like ComplexKeyGenerator, the new key
   generator would need composite keys (in this case primary and secondary,
   to break the "null" tie). Are you concerned about the record-key
   footprint of each key generated by ComplexKeyGenerator? If so, that makes
   sense to me. Otherwise, ComplexKeyGenerator should be able to handle
   cases where some component of the key is null, right?
   3. As for HUDI-83, at least on the write side, we have tied this to the
   spark-2.4 upgrade. There is ongoing work in this regard, and I will ask
   the folks working on it to provide a status update. Last I knew, we were
   running into some test failures with this upgrade. But yes, as this is a
   massive upgrade, we would need your help in reviewing, debugging and
   testing this change :)
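
To make the two delete flavors in point 1 concrete, here is a minimal toy
model in Python. It is not Hudi's API: HoodieKey below only mirrors the
(recordKey, partitionPath) pair, and PartitionedIndex, GlobalIndex and the
plan_delete_* functions are hypothetical names for illustration. It shows
why a recordKey-only delete needs a global index: a non-global index can
only be probed with the partition path, while a global index can recover
the partition path from the record key alone.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class HoodieKey:
    """Illustrative stand-in for the (recordKey, partitionPath) pair."""
    record_key: str
    partition_path: str


class PartitionedIndex:
    """Hypothetical non-global index: a record key is unique only within its partition."""

    def __init__(self, entries: Dict[Tuple[str, str], str]) -> None:
        self._entries = entries  # (partition_path, record_key) -> file_id

    def locate(self, key: HoodieKey) -> Optional[str]:
        # The partition path is part of the lookup key, so callers must supply it.
        return self._entries.get((key.partition_path, key.record_key))


class GlobalIndex(PartitionedIndex):
    """Hypothetical global index: a record key is assumed unique across all partitions."""

    def __init__(self, entries: Dict[Tuple[str, str], str]) -> None:
        super().__init__(entries)
        # Extra record_key -> partition_path mapping enables partition-agnostic lookups.
        self._partition_of = {rk: pp for (pp, rk) in entries}

    def resolve(self, record_key: str) -> Optional[HoodieKey]:
        partition = self._partition_of.get(record_key)
        return HoodieKey(record_key, partition) if partition is not None else None


def plan_delete_by_hoodie_keys(index: PartitionedIndex,
                               keys: Iterable[HoodieKey]) -> List[HoodieKey]:
    """Flavor 1: works with any index type; returns the existing keys a delete would remove."""
    return [k for k in keys if index.locate(k) is not None]


def plan_delete_by_record_keys(index: GlobalIndex,
                               record_keys: Iterable[str]) -> List[HoodieKey]:
    """Flavor 2: needs a global index to recover each key's partition path first."""
    resolved = (index.resolve(rk) for rk in record_keys)
    return [k for k in resolved if k is not None and index.locate(k) is not None]


entries = {("2019/11/08", "r1"): "f1", ("2019/11/09", "r2"): "f2"}
by_hoodie_key = plan_delete_by_hoodie_keys(
    PartitionedIndex(entries),
    [HoodieKey("r2", "2019/11/09"), HoodieKey("r2", "2019/11/08")],
)
by_record_key = plan_delete_by_record_keys(GlobalIndex(entries), ["r1", "missing"])
print(by_hoodie_key)
print(by_record_key)
```

The type signatures carry the design point: plan_delete_by_hoodie_keys
accepts any index, while plan_delete_by_record_keys only accepts a
GlobalIndex, since a PartitionedIndex has no way to map a bare record key
to its partition without scanning every partition.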

Others, thoughts?

Thanks,
Balaji.V

On Fri, Nov 8, 2019 at 2:49 PM Scheller, Brandon
<bsche...@amazon.com.invalid> wrote:

> Hi Hudi community,
>
> We at AWS EMR are interested in starting work on a few different usability
> improvements for Hudi and we’re interested to hear your feedback.
>
> Here are some of our ideas:
> https://issues.apache.org/jira/browse/HUDI-326
> https://issues.apache.org/jira/browse/HUDI-327
>
> Additionally, we were hoping to help drive:
> https://issues.apache.org/jira/browse/HUDI-83 and its associated Hive
> Jira: https://issues.apache.org/jira/browse/HIVE-22224
>
> I am looking forward to improving Hudi with you all. Feel free to let
> us know if there is anything specific you’d like us to look at.
>
> Thanks,
> Brandon
>
