Hi Tim,

Thanks so much for the suggestion. I also think that implementing the Hudi table as a variant of HdfsTable would be the cleaner way. I will focus on understanding HdfsTable now; it is really a big file.
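Concretely, I am imagining something like the sketch below. This is only a sketch: the HudiUtil helper, the dispatch point, and the exact input-format class name are my guesses, not existing Impala or Hudi APIs.

    import org.apache.hadoop.hive.metastore.api.Table;

    // Hypothetical helper (not in Impala today): recognize a Hudi table
    // from its Hive metastore entry via the registered input format.
    // The exact input format class name may differ across Hudi versions.
    public final class HudiUtil {
      private static final String HUDI_INPUT_FORMAT =
          "org.apache.hudi.hadoop.HoodieParquetInputFormat";

      private HudiUtil() {}

      public static boolean isHudiTable(Table msTbl) {
        return msTbl.getSd() != null
            && HUDI_INPUT_FORMAT.equals(msTbl.getSd().getInputFormat());
      }
    }

    // fromMetastoreTable could then dispatch with something like:
    //   if (HudiUtil.isHudiTable(msTbl)) return new HudiTable(...);
    // where HudiTable would be a thin subclass of HdfsTable.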
Currently, our team only uses the Copy-on-Write mode, so I will try to implement Copy-on-Write first.

Can you explain more about the two catalog implementations? My understanding is that one holds the metadata of the table and one is the frontend interface to the table; however, I only found HdfsTable, no FeHdfsTable.

Thanks so much!

Best regards

Yuanbin Cheng
CR/PJ-AI-S1

-----Original Message-----
From: Tim Armstrong <tarmstr...@cloudera.com>
Sent: Tuesday, July 16, 2019 12:28 PM
To: dev@impala <dev@impala.apache.org>
Subject: Re: Support Apache Hudi

Hi Cheng,

I think that is one way you could approach it. I'm not really familiar enough with Hudi to know if that's the right way. I took a quick look at https://hudi.incubator.apache.org/concepts.html and I'm wondering if it would actually be cleaner to implement as a variant of HdfsTable. HdfsTable is used for any Hive filesystem-based table, not just HDFS - e.g. S3 or whatever. Hudi seems similar to Hive ACID in a lot of ways, and we're currently adding Hive ACID support in that way.

Which Hudi features are you planning to implement? Copy-on-Write seems like it would be simpler to implement - it might only require changes in the frontend (i.e. Java code). Merge-on-Read probably requires backend support for merging the delta files with the base files. Write support also seems more complex than read support.

One more note - currently there are actually two catalog implementations that require their own table implementation, e.g. see fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java and fe/src/main/java/org/apache/impala/catalog/HBaseTable.java

On Tue, Jul 16, 2019 at 9:55 AM FIXED-TERM Cheng Yuanbin (CR/PJ-AI-S1) <fixed-term.yuanbin.ch...@us.bosch.com> wrote:

> Hi,
>
> Our team is now using Apache Hudi to migrate our data pipeline from
> batch to incremental processing.
> However, we find that Apache Impala cannot pull the Hudi metadata
> from Hive.
> Here is the issue: https://github.com/apache/incubator-hudi/issues/179
> Now I am trying to fix this issue.
>
> After reading some code related to Impala's table objects, my current
> thought is to implement a new HudiTable class and add it to the
> fromMetastoreTable method in the Table class.
> Maybe adding some support methods to the current Table type could also
> solve this issue? I am not very familiar with the Impala source code.
> Here is the Jira ticket for this issue:
> https://issues.apache.org/jira/projects/HUDI/issues/HUDI-146
>
> Do you have any idea about how to solve this issue?
>
> I appreciate any help!
>
> Best regards
>
> Yuanbin Cheng
> CR/PJ-AI-S1
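PS: to check my understanding of the FeHBaseTable / HBaseTable split, is the pattern roughly the shape below? This is a schematic sketch only; the names are modeled on the HBase classes, and everything else is invented for illustration.

    // Frontend-facing interface shared by both catalog implementations
    // (invented name, following the FeHBaseTable convention).
    interface FeHudiTable {
      String hudiTableType();  // e.g. "COPY_ON_WRITE"
    }

    // Implementation backed by the catalogd-managed catalog.
    class HudiTable implements FeHudiTable {
      @Override public String hudiTableType() { return "COPY_ON_WRITE"; }
    }

    // Implementation for the fetch-on-demand ("local") catalog.
    class LocalHudiTable implements FeHudiTable {
      @Override public String hudiTableType() { return "COPY_ON_WRITE"; }
    }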