I added you to the contributor role on JIRA.

On Fri, Jul 19, 2019 at 3:39 PM FIXED-TERM Cheng Yuanbin (CR/PJ-AI-S1) <
fixed-term.yuanbin.ch...@us.bosch.com> wrote:
> Hi Tim,
>
> Thanks so much for the information.
> My Jira user name is Yuanbin.
>
> Looking forward to making some contributions.
>
> Best regards
>
> Yuanbin Cheng
> CR/PJ-AI-S1
>
> -----Original Message-----
> From: Tim Armstrong <tarmstr...@cloudera.com>
> Sent: Friday, July 19, 2019 3:23 PM
> To: dev@impala <dev@impala.apache.org>
> Subject: Re: Support Apache Hudi
>
> Please feel free to create a JIRA. We can add you as a contributor on
> Apache JIRA if you give us your username; then you can assign it to
> yourself.
>
> You should be able to use our Jenkins instance to run tests on a draft
> Gerrit patch:
> https://cwiki.apache.org/confluence/display/IMPALA/Using+Gerrit+to+submit+and+review+patches#UsingGerrittosubmitandreviewpatches-Verifyingapatch(opentoallImpalacontributors)
>
> Unfortunately we don't have a way to accelerate the initial local build.
> We have a few tips for making incremental builds significantly faster here:
> https://cloudera.atlassian.net/wiki/spaces/ENG/pages/100832437/Tips+for+Faster+Impala+Builds
> It is a lot quicker to iterate on code changes if you follow some of the
> tips there, e.g. use ccache and only rebuild the components of Impala that
> you modified.
>
> - Tim
>
> On Fri, Jul 19, 2019 at 2:04 PM FIXED-TERM Cheng Yuanbin (CR/PJ-AI-S1) <
> fixed-term.yuanbin.ch...@us.bosch.com> wrote:
>
> > Hi Tim,
> >
> > The guys from Hudi said that the Hudi partitioning is compatible with
> > Hive partitioning.
> > I think I got some ideas from the implementation of the Hive ACID
> > support tickets, and I am trying to implement the Hudi support now.
> >
> > Could I create a Jira ticket for this task and use your Jenkins server
> > for builds? It takes me so much time waiting for the build process.
> >
> > Thanks so much!
> >
> > Best regards
> >
> > Yuanbin Cheng
> > CR/PJ-AI-S1
> >
> > -----Original Message-----
> > From: Tim Armstrong <tarmstr...@cloudera.com>
> > Sent: Tuesday, July 16, 2019 3:24 PM
> > To: dev@impala <dev@impala.apache.org>
> > Subject: Re: Support Apache Hudi
> >
> > Sorry, I meant to refer to
> > ./fe/src/main/java/org/apache/impala/catalog/local/LocalHbaseTable.java;
> > FeHBaseTable is an interface shared by those two classes.
> >
> > There's a default catalog implementation that is based on all Impala
> > daemons holding a cached snapshot of metadata, and a re-implementation
> > where Impala daemons fetch metadata on demand from a catalog service.
> > The design doc for the reimplementation is here, although I suspect
> > some details have changed:
> > https://docs.google.com/document/d/1WcUQ7nC3fzLFtZLofzO6kvWdGHFaaqh97fC_PvqVGCk/edit
> >
> > It may be helpful to look at some recent commits that added Hive ACID
> > support, just to get an idea of how that was implemented:
> > https://gerrit.cloudera.org/#/q/acid
> >
> > I guess one detail that may not work so well with HdfsTable is the
> > partitioning - it's unclear to me how compatible the Hudi partitioning
> > is with Hive's partitioning scheme.
> >
> > - Tim
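(For illustration only: the split described above - one frontend interface with a separate table implementation per catalog, as with FeHBaseTable, HBaseTable and LocalHbaseTable - could carry over to Hudi roughly as in the sketch below. Every Hudi-specific name here is hypothetical; this is not existing Impala code.)

import java.util.List;

// Planner/frontend code would be written against an interface, so it does
// not care which catalog implementation produced the table object
// (compare FeHBaseTable).
interface FeHudiTable {
  String getTableName();
  // For Copy-on-Write tables the base files are plain Parquet files that
  // the existing file-based scan path could read.
  List<String> getBaseFilePaths();
}

// Implementation for the default catalog, where every impalad caches a
// full snapshot of table metadata (compare HBaseTable).
class CatalogdHudiTable implements FeHudiTable {
  private final String name_;
  private final List<String> baseFiles_;

  CatalogdHudiTable(String name, List<String> baseFiles) {
    name_ = name;
    baseFiles_ = baseFiles;
  }

  @Override
  public String getTableName() { return name_; }

  @Override
  public List<String> getBaseFilePaths() { return baseFiles_; }
}

// Implementation for the on-demand ("local") catalog, where metadata is
// fetched lazily from the catalog service (compare LocalHbaseTable).
class LocalHudiTable implements FeHudiTable {
  private final String name_;

  LocalHudiTable(String name) { name_ = name; }

  @Override
  public String getTableName() { return name_; }

  @Override
  public List<String> getBaseFilePaths() {
    // A real implementation would fetch file metadata from catalogd here.
    throw new UnsupportedOperationException("not sketched");
  }
}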
> > On Wed, Jul 17, 2019 at 6:54 AM FIXED-TERM Cheng Yuanbin (CR/PJ-AI-S1) <
> > fixed-term.yuanbin.ch...@us.bosch.com> wrote:
> >
> > > Hi Tim,
> > >
> > > Thanks so much for the suggestion.
> > > I also think that implementing a Hudi table as a variant of HdfsTable
> > > would be the cleaner way.
> > > I will focus on understanding HdfsTable now; it is really a big file.
> > >
> > > Currently, our team only uses the Copy-on-Write mode, so I will
> > > try to implement Copy-on-Write first.
> > >
> > > Can you explain more about the two catalog implementations?
> > > My understanding is that one is more about the metadata of the table
> > > and one is the frontend interface of the table; however, for
> > > HdfsTable, I only found HdfsTable, no FeHdfsTable.
> > >
> > > Thanks so much!
> > >
> > > Best regards
> > >
> > > Yuanbin Cheng
> > > CR/PJ-AI-S1
> > >
> > > -----Original Message-----
> > > From: Tim Armstrong <tarmstr...@cloudera.com>
> > > Sent: Tuesday, July 16, 2019 12:28 PM
> > > To: dev@impala <dev@impala.apache.org>
> > > Subject: Re: Support Apache Hudi
> > >
> > > Hi Cheng,
> > > I think that is one way you could approach it. I'm not really
> > > familiar enough with Hudi to know if that's the right way. I took a
> > > quick look at https://hudi.incubator.apache.org/concepts.html and
> > > I'm wondering if it would actually be cleaner to implement it as a
> > > variant of HdfsTable. HdfsTable is used for any Hive
> > > filesystem-based table, not just HDFS - e.g. S3 or whatever. Hudi
> > > seems similar to Hive ACID in a lot of ways, which we're
> > > currently adding support for in that way.
> > >
> > > Which Hudi features are you planning to implement? Copy-on-Write
> > > seems like it would be simpler to implement - it might only require
> > > changes in the frontend (i.e. Java code). Merge-on-Read probably
> > > requires backend support for merging the delta files with the base
> > > files. Write support also seems more complex than read support.
> > >
> > > Also another note - currently there are actually two catalog
> > > implementations that require their own table implementation, e.g.
> > > see fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java and
> > > fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
> > >
> > > On Tue, Jul 16, 2019 at 9:55 AM FIXED-TERM Cheng Yuanbin (CR/PJ-AI-S1) <
> > > fixed-term.yuanbin.ch...@us.bosch.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Our team is now using Apache Hudi to migrate our data pipeline
> > > > from batch to incremental processing.
> > > > However, we find that Apache Impala cannot pull the Hudi
> > > > metadata from Hive.
> > > > Here is the issue:
> > > > https://github.com/apache/incubator-hudi/issues/179
> > > > Now I am trying to fix this issue.
> > > >
> > > > After reading some code related to Impala's table objects, my
> > > > current thought is to implement a new HudiTable class and add it
> > > > to the fromMetastoreTable method in the Table class.
> > > > Maybe adding some support methods to the current Table type could
> > > > also solve this issue? I am not very familiar with the Impala
> > > > source code.
> > > > Here is the Jira ticket for this issue:
> > > > https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146
> > > >
> > > > Do you have any idea how to solve this issue?
> > > >
> > > > I appreciate any help!
> > > >
> > > > Best regards
> > > >
> > > > Yuanbin Cheng
> > > > CR/PJ-AI-S1
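(For illustration only: a minimal sketch of the dispatch proposed in the thread for Table.fromMetastoreTable, assuming Hudi tables can be recognized from the input format recorded in the Hive metastore. The factory signature is simplified, and isHudiTable plus the *Sketch classes are hypothetical stand-ins, not Impala's real Table/HdfsTable hierarchy.)

import org.apache.hadoop.hive.metastore.api.Table;

public class HudiDispatchSketch {

  // Assumption: Hudi registers a Hudi-specific input format on the tables
  // it syncs into the Hive metastore, so the input format name is one
  // plausible way to recognize a Hudi table.
  private static boolean isHudiTable(Table msTbl) {
    if (msTbl.getSd() == null) return false;
    String inputFormat = msTbl.getSd().getInputFormat();
    if (inputFormat == null) return false;
    String lower = inputFormat.toLowerCase();
    return lower.contains("hudi") || lower.contains("hoodie");
  }

  static FeTableSketch fromMetastoreTable(Table msTbl) {
    if (isHudiTable(msTbl)) {
      // Copy-on-Write: base files are ordinary Parquet, so a thin variant
      // of the filesystem-backed table may be enough on the read path.
      return new HudiHdfsTableSketch(msTbl);
    }
    return new FsTableSketch(msTbl);
  }

  // Minimal stand-ins so the sketch is self-contained; they do not model
  // Impala's real Table/HdfsTable hierarchy.
  interface FeTableSketch { String fullName(); }

  static class FsTableSketch implements FeTableSketch {
    final Table msTbl_;
    FsTableSketch(Table msTbl) { msTbl_ = msTbl; }
    @Override
    public String fullName() {
      return msTbl_.getDbName() + "." + msTbl_.getTableName();
    }
  }

  static class HudiHdfsTableSketch extends FsTableSketch {
    HudiHdfsTableSketch(Table msTbl) { super(msTbl); }
  }
}

Whether detection should key off the input format, a table property, or something else is exactly the kind of question the thread leaves open, along with how Hudi partitioning maps onto Hive's partitioning scheme.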