More for my own edification, how does the recently introduced timeline service play into the delta writer components?
On Fri, Aug 2, 2019 at 7:53 AM vino yang <yanghua1...@gmail.com> wrote:

> Hi Suneel,
>
> Thank you for your suggestion; let me clarify.
>
> *The context of this email is that we are evaluating how to implement a
> Stream Delta writer based on Flink.*
>
> The discussion between me, Taher, and Vinay covered only some trivial
> details in the preparation of the document, and that discussion also
> happened over mail.
>
> While we don't yet have a first draft, discussing the details on the
> mailing list may confuse others and easily drift off topic. Our plan is
> to open the discussion and review to the community once a draft of the
> document is available.
>
> Best,
> Vino
>
> Suneel Marthi <smar...@apache.org> wrote on Fri, Aug 2, 2019 at 10:37 PM:
>
> > Please keep all discussions on the mailing lists here - no offline
> > discussions please.
> >
> > On Fri, Aug 2, 2019 at 10:22 AM vino yang <yanghua1...@gmail.com> wrote:
> >
> > > Hi guys,
> > >
> > > Currently, Taher, Vinay, and I are working on issue HUDI-184. [1]
> > >
> > > As a first step, we are discussing the design doc.
> > >
> > > After diving into the code, we listed some relevant classes for the
> > > Spark delta writer:
> > >
> > > - module: hoodie-utilities
> > >
> > > com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer
> > > com.uber.hoodie.utilities.deltastreamer.DeltaSyncService
> > > com.uber.hoodie.utilities.deltastreamer.SourceFormatAdapter
> > > com.uber.hoodie.utilities.schema.SchemaProvider
> > > com.uber.hoodie.utilities.transform.Transformer
> > >
> > > - module: hoodie-client
> > >
> > > com.uber.hoodie.HoodieWriteClient (to commit compaction)
> > >
> > > The key fact is that *hoodie-utilities* depends on *hoodie-client*;
> > > however, *hoodie-client* is not a pure Hudi component either, since it
> > > also depends on the Spark libraries.
> > >
> > > So I propose that Hudi provide a pure hoodie-client, decoupled from
> > > Spark, on which the Flink and Spark modules would then depend.
> > >
> > > Moreover, based on an earlier discussion [2], we all agree that Spark
> > > is not the only choice for Hudi; it could also be Flink or Beam.
> > >
> > > IMO, we should decouple Hudi from Spark at the project level,
> > > including but not limited to module splitting and renaming.
> > >
> > > Not sure if this requires a HIP to drive it.
> > >
> > > We should first listen to the opinions of the community. Any ideas and
> > > suggestions are welcome and appreciated.
> > >
> > > Best,
> > > Vino
> > >
> > > [1]: https://issues.apache.org/jira/browse/HUDI-184?filter=-1
> > > [2]:
> > > https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E
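[Editor's illustration] The decoupling Vino proposes could be sketched as an engine-neutral client that depends only on an abstraction, with Spark and Flink modules each supplying an implementation. All names below (`EngineContext`, `LocalEngineContext`, `PureWriteClient`) are hypothetical and are not part of Hudi's actual API; this is a minimal sketch of the dependency-inversion idea, not Hudi's implementation.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical engine-neutral abstraction: the pure client sees only this
// interface, never Spark's JavaRDD or Flink's DataStream directly.
interface EngineContext {
    <T, R> List<R> map(List<T> data, Function<T, R> fn);
}

// A pure-Java implementation. A hoodie-spark module would instead delegate
// to JavaSparkContext.parallelize(...), a hoodie-flink module to the
// DataStream API; the client code below would not change.
class LocalEngineContext implements EngineContext {
    @Override
    public <T, R> List<R> map(List<T> data, Function<T, R> fn) {
        return data.stream().map(fn).collect(Collectors.toList());
    }
}

// The "pure hoodie-client": no Spark imports, only the abstraction.
class PureWriteClient {
    private final EngineContext ctx;

    PureWriteClient(EngineContext ctx) {
        this.ctx = ctx;
    }

    // Placeholder for real upsert logic; here we just tag each record key
    // to show that the work runs through the engine abstraction.
    List<String> upsertKeys(List<String> keys) {
        return ctx.map(keys, k -> "upserted:" + k);
    }
}

public class DecoupleSketch {
    public static void main(String[] args) {
        PureWriteClient client = new PureWriteClient(new LocalEngineContext());
        System.out.println(client.upsertKeys(List.of("k1", "k2")));
    }
}
```

With this shape, hoodie-utilities (and a future Flink delta writer) would depend on the pure client module, and only the thin engine modules would pull in Spark or Flink as dependencies.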