Sorry, I meant to refer to ./fe/src/main/java/org/apache/impala/catalog/local/LocalHbaseTable.java; FeHBaseTable is an interface shared by those two classes.
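The same split would apply to a new Hudi table type. A minimal skeleton of how that might look - purely hypothetical, since none of these Hudi classes exist yet; the names just mirror the HBase trio:

```java
// Hypothetical sketch mirroring FeHBaseTable / HBaseTable / LocalHbaseTable.

// Frontend-facing interface shared by both catalog implementations.
public interface FeHudiTable extends FeFsTable {
  // Accessors for Hudi-specific state (e.g. the commit timeline) go here.
}

// Implementation for the default catalog: catalogd loads the metadata and
// every impalad caches a snapshot of it.
public class HudiTable extends HdfsTable implements FeHudiTable {
  // load() would additionally read the Hudi timeline to select file slices.
}

// Implementation for the on-demand ("local") catalog: each impalad fetches
// metadata from the catalog service as queries need it.
public class LocalHudiTable extends LocalFsTable implements FeHudiTable {
  // Same interface, but metadata is fetched lazily per query.
}
```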
There's a default catalog implementation that is based on all Impala daemons holding a cached snapshot of metadata, and a reimplementation where Impala daemons fetch metadata on demand from a catalog service. The design doc for the reimplementation is here, although I suspect some details have changed: https://docs.google.com/document/d/1WcUQ7nC3fzLFtZLofzO6kvWdGHFaaqh97fC_PvqVGCk/edit

It may be helpful to look at some recent commits that added Hive ACID support, just to get an idea of how that was implemented: https://gerrit.cloudera.org/#/q/acid

I guess one detail that may not work so well with HdfsTable is the partitioning - it's unclear to me how compatible the Hudi partitioning is with Hive's partitioning scheme.

- Tim

On Wed, Jul 17, 2019 at 6:54 AM FIXED-TERM Cheng Yuanbin (CR/PJ-AI-S1) <fixed-term.yuanbin.ch...@us.bosch.com> wrote:

> Hi Tim,
>
> Thanks so much for the suggestion.
> I also think that implementing the Hudi table as a variant of HdfsTable would be a cleaner way.
> I will focus on understanding HdfsTable now; it is a really big file.
>
> Currently, our team only uses the Copy-on-Write mode, so I will try to implement Copy-on-Write first.
>
> Can you explain more about the two catalog implementations?
> My understanding is that one is more for the metadata of the table and one is for the frontend interface of the table; however, for HdfsTable, I only found HdfsTable, no FeHdfsTable.
>
> Thanks so much!
>
> Best regards
>
> Yuanbin Cheng
> CR/PJ-AI-S1
>
> -----Original Message-----
> From: Tim Armstrong <tarmstr...@cloudera.com>
> Sent: Tuesday, July 16, 2019 12:28 PM
> To: dev@impala <dev@impala.apache.org>
> Subject: Re: Support Apache Hudi
>
> Hi Cheng,
> I think that is one way you could approach it. I'm not really familiar enough with Hudi to know if that's the right way. I took a quick look at https://hudi.incubator.apache.org/concepts.html and I'm wondering if it would actually be cleaner to implement it as a variant of HdfsTable. HdfsTable is used for any Hive filesystem-based table, not just HDFS - e.g. S3 or whatever. Hudi seems similar to Hive ACID in a lot of ways, which we're currently adding support for in that way.
>
> Which Hudi features are you planning to implement? Copy-on-Write seems like it would be simpler to implement - it might only require changes in the frontend (i.e. Java code). Merge-on-Read probably requires backend support for merging the delta files with the base files. Write support also seems more complex than read support.
>
> Also another note - currently there are actually two catalog implementations that require their own table implementation, e.g. see fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java and fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
>
> On Tue, Jul 16, 2019 at 9:55 AM FIXED-TERM Cheng Yuanbin (CR/PJ-AI-S1) <fixed-term.yuanbin.ch...@us.bosch.com> wrote:
>
> > Hi,
> >
> > Our team is now using Apache Hudi to migrate our data pipeline from batch to incremental processing.
> > However, we find that Apache Impala cannot pull the Hudi metadata from Hive.
> > Here is the issue: https://github.com/apache/incubator-hudi/issues/179
> > Now I am trying to fix this issue.
> >
> > After reading some code related to the table objects in Impala, my current thought is to implement a new HudiTable class and add it to the fromMetastoreTable method in the Table class.
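A rough sketch of that fromMetastoreTable dispatch - illustration only: the HudiTable class is hypothetical, the exact input-format class name depends on the Hudi release in use, and the real method signature in Impala may differ:

```java
// Sketch only - HudiTable does not exist yet. Hudi tables registered in the
// Hive metastore are typically identified by their input format; older
// incubator releases used com.uber.hoodie.hadoop.HoodieInputFormat instead.
public static Table fromMetastoreTable(Db db,
    org.apache.hadoop.hive.metastore.api.Table msTbl) {
  String inputFormat = msTbl.getSd().getInputFormat();
  if ("org.apache.hudi.hadoop.HoodieParquetInputFormat".equals(inputFormat)) {
    return new HudiTable(msTbl, db, msTbl.getTableName(), msTbl.getOwner());
  }
  // ... existing dispatch to HBaseTable, KuduTable, HdfsTable, etc. ...
  return new HdfsTable(msTbl, db, msTbl.getTableName(), msTbl.getOwner());
}
```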
> > Maybe adding some support methods to the current Table type could also solve this issue? I am not very familiar with the Impala source code.
> > Here is the Jira ticket for this issue:
> > https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146
> >
> > Do you have any idea about how to solve this issue?
> >
> > I appreciate any help!
> >
> > Best regards
> >
> > Yuanbin Cheng
> > CR/PJ-AI-S1
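On the Copy-on-Write read path discussed above, the frontend-only change essentially amounts to selecting the newest file slice per file group when listing a table's files. A self-contained sketch of that selection - the class and method names are made up for illustration, it assumes Hudi's <fileId>_<writeToken>_<instantTime>.parquet base-file naming convention, and a real implementation should use Hudi's timeline metadata rather than parsing file names:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: for each Hudi file group, keep the base file written by
// the most recent commit - the only file a Copy-on-Write read should scan.
public class CowFileSliceFilter {

  /** Returns a map from file-group id to the newest base file name. */
  public static Map<String, String> latestBaseFiles(List<String> fileNames) {
    Map<String, String> latest = new HashMap<>();
    for (String name : fileNames) {
      if (!name.endsWith(".parquet")) continue;
      String[] parts = stripExtension(name).split("_");
      if (parts.length != 3) continue;  // not a Hudi-style base file name
      String fileId = parts[0];
      String instantTime = parts[2];
      String best = latest.get(fileId);
      // Instant times are yyyyMMddHHmmss strings, so lexicographic comparison
      // matches chronological order.
      if (best == null || instantTime.compareTo(instantTimeOf(best)) > 0) {
        latest.put(fileId, name);
      }
    }
    return latest;
  }

  private static String instantTimeOf(String name) {
    return stripExtension(name).split("_")[2];
  }

  private static String stripExtension(String name) {
    return name.substring(0, name.length() - ".parquet".length());
  }
}
```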