Hi Vinoth,

OK, I find that our main difference lies in our understanding of
"integration".

In general, my understanding is that an integration module is a separate
module. But it seems that "hudi-hadoop-mr" just contains some basic class
files, and the module will be used as a public library by other modules. On
the other hand, Hudi itself exists as a library, so there seems to be no
problem with understanding it this way. The "integration" I had in mind is
still stuck in the model where the main framework can function as a
standalone system (for example Flink, Presto and so on).

In short, I now have no objection: +1 to rename "hudi-hadoop-mr" to
"hudi-hive".

Best,
Vino

Vinoth Chandar <[email protected]> wrote on Sun, Jan 19, 2020 at 1:41 AM:

> > I suggest that we use the "hudi-{another bigdata framework}" naming
> > pattern more carefully
>
> Fully understand your concern. But that's exactly what hudi-hadoop-mr is
> doing, is it not? :) InputFormats are how you integrate with Hive.
>
> hudi-spark, hudi-presto etc. have their own integrations, but they both can
> fall back to the Hive integration.
>
> On Thu, Jan 16, 2020 at 8:36 PM vino yang <[email protected]> wrote:
>
> > Hi Vinoth, Bhavani,
> >
> > +1 for renaming hudi-hive -> hudi-hive-sync
> >
> > About "hudi-hadoop-mr -> hudi-hive", I suggest that we use the
> > "hudi-{another bigdata framework}" naming pattern more carefully. On a
> > superficial level of understanding, it is very easy for users to
> > misunderstand that the module is doing ecosystem integration, especially
> > those who have seen the source code of mainstream projects, such as
> > Presto [1].
> >
> > When we check out hudi-hadoop-mr, it actually just contains some
> > InputFormats.
> >
> > If we do want to mention other frameworks without letting users
> > misunderstand that we are doing ecosystem integration, then we need to
> > add additional information, for example:
> > "hudi-{another bigdata framework}-xxx" or
> > "hudi-xxx-{another bigdata framework}".
> >
> > [1]: https://github.com/prestodb/presto
> >
> > Best,
> > Vino
> >
> > Bhavani Sudha <[email protected]> wrote on Fri, Jan 17, 2020 at 5:42 AM:
> >
> > > Thanks @vinoth for giving an overall picture. I think I can relate
> > > better with the name changes you proposed.
> > >
> > > +1 for renaming hudi-hive -> hudi-hive-sync and hudi-hadoop-mr ->
> > > hudi-hive
> > >
> > > On Thu, Jan 16, 2020 at 1:33 PM Vinoth Chandar <[email protected]>
> > > wrote:
> > >
> > > > First let me share the context for the existing name. We saw how
> > > > Parquet hands out the InputFormat and named it similar to parquet-mr.
> > > > InputFormat is indeed a MapReduce class. I know we live in the age of
> > > > Flink and Spark, but it's true :)
> > > >
> > > > I think this is the crux of the "understandability" issue.
> > > >
> > > > Here are my thoughts:
> > > >
> > > > - +0 (neutral) on the rename to hudi-query-common (whatever we
> > > >   decide, we need to rename the bundle accordingly)
> > > > - On hudi-query-bundle being confusing with hive/spark/presto
> > > >   bundles, I don't feel it's more confusing than it is today
> > > >
> > > > The real issue IMO is hudi-hive, which is really about syncing to
> > > > Hive, not querying Hive.
> > > > Then, maybe we can rename
> > > > - hudi-hadoop-mr to hudi-hive (more understandable, Hive does use
> > > >   InputFormat as the abstraction)
> > > > - current hudi-hive to hudi-hive-sync
> > > > (bundles renamed accordingly)
> > > >
> > > > I know this hijacks the conversation. Apologies :). But I thought
> > > > I'd present a broader take.
> > > >
> > > > On Thu, Jan 16, 2020 at 11:26 AM Bhavani Sudha Saktheeswaran
> > > > <[email protected]> wrote:
> > > >
> > > > > +1 to generally renaming the packages.
> > > > > Since this is about renaming for the purpose of making it
> > > > > user-friendly, I am concerned that if we make this
> > > > > hudi-query-bundle, users might confuse it with other modules like
> > > > > hudi-hive and hudi-spark. And inside the packaging module, we
> > > > > further have bundles specific to Spark, Hive and Presto.
> > > > >
> > > > > Any suggestions on how to rename broadly to avoid this confusion?
> > > > > Let me also think and get back.
> > > > >
> > > > > Thanks,
> > > > > Sudha
> > > > >
> > > > > On Wed, Jan 15, 2020 at 9:56 PM vino yang <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi guys,
> > > > > >
> > > > > > I want to start a proposal about refactoring the naming of the
> > > > > > "hudi-hadoop-mr" module.
> > > > > >
> > > > > > IMHO, this module name is not user-friendly. It may confuse
> > > > > > users, because it looks like it is about integrating with
> > > > > > MapReduce (although I know it referenced the parquet-mr [1]
> > > > > > project).
> > > > > >
> > > > > > Based on the purpose of this module (it contains InputFormat
> > > > > > implementations for the ReadOptimized, Incremental, and Realtime
> > > > > > views), I suggest that we rename it to "*hudi-query-common*".
> > > > > > Then we can also rename "hudi-hadoop-mr-bundle" to
> > > > > > "*hudi-query-bundle*".
> > > > > >
> > > > > > What do you think?
> > > > > >
> > > > > > Any thoughts and suggestions are welcome and appreciated.
> > > > > >
> > > > > > Best,
> > > > > > Vino
> > > > > >
> > > > > > [1]: https://github.com/apache/parquet-mr
