yihua commented on pull request #4695:
URL: https://github.com/apache/hudi/pull/4695#issuecomment-1026487129


   cc @vinothchandar 
   
   My approach is to pull the HFile-related classes from the HBase repo 
(release 2.4.9) into the `hudi-io` module of the hudi repo, under the renamed 
package `org.apache.hudi.hbase` instead of `org.apache.hadoop.hbase`.  I 
trimmed some classes to limit the number of deps pulled in.  All the 
backward-compatibility logic for KeyValue.KVComparator (hbase1) vs 
CellComparator (hbase2) is pulled in as well, so we can control it.  This way, 
any hudi logic that uses the HFile format goes through the internal 
`org.apache.hudi.hbase` classes, while SparkHoodieHBaseIndex still uses the 
hbase lib with `org.apache.hadoop.hbase` classes (the two are independent).
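   
   To make the comparator point concrete, here is a rough sketch (not code 
from this PR) of the kind of name resolution a relocated reader needs.  It 
assumes the comparator class name recorded in an existing file still uses the 
`org.apache.hadoop.hbase` package.  `ComparatorNameResolver`, its mapping 
table, and the choice of `CellComparatorImpl` as the target are all 
hypothetical and only illustrate the idea:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper, for illustration only: maps a comparator class name
 * read from a file onto a class name under the relocated package.
 */
public final class ComparatorNameResolver {
  private static final String ORIGINAL_PREFIX = "org.apache.hadoop.hbase.";
  private static final String RELOCATED_PREFIX = "org.apache.hudi.hbase.";

  // Assumed mapping: older comparator names collapse onto one relocated class.
  private static final Map<String, String> LEGACY_TO_RELOCATED = new HashMap<>();

  static {
    String cellComparator = RELOCATED_PREFIX + "CellComparatorImpl";
    LEGACY_TO_RELOCATED.put(ORIGINAL_PREFIX + "KeyValue$KVComparator", cellComparator);
    LEGACY_TO_RELOCATED.put(ORIGINAL_PREFIX + "KeyValue$KeyComparator", cellComparator);
    LEGACY_TO_RELOCATED.put(ORIGINAL_PREFIX + "CellComparator", cellComparator);
  }

  private ComparatorNameResolver() {
  }

  /** Returns the class name the relocated reader should load. */
  public static String resolve(String nameInFile) {
    if (nameInFile == null || nameInFile.isEmpty()) {
      throw new IllegalArgumentException("comparator class name is missing");
    }
    // Known legacy names are looked up explicitly first.
    String mapped = LEGACY_TO_RELOCATED.get(nameInFile);
    if (mapped != null) {
      return mapped;
    }
    // Any other original-package class keeps its name under the new package.
    if (nameInFile.startsWith(ORIGINAL_PREFIX)) {
      return RELOCATED_PREFIX + nameInFile.substring(ORIGINAL_PREFIX.length());
    }
    // Names outside the original package are returned unchanged.
    return nameInFile;
  }

  public static void main(String[] args) {
    System.out.println(resolve("org.apache.hadoop.hbase.KeyValue$KVComparator"));
    System.out.println(resolve("org.apache.hadoop.hbase.CellComparatorImpl"));
  }
}
```

   Keeping this mapping in `hudi-io` is what lets us control the hbase1 vs 
hbase2 behavior ourselves, without touching the hbase lib that 
SparkHoodieHBaseIndex depends on.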
   
   A few things to finalize:
   - I'm questioning whether we should flip the hbase version in the hudi repo 
at all: if the first WIP PR unlocks the HFile format for the metadata table, 
Presto, and Trino, there is no real need to upgrade hbase to 2.x, which could 
introduce compatibility issues for SparkHoodieHBaseIndex.  Am I missing 
anything here?  wdyt?
   - Right now, protobuf is used to generate the proto classes, and I pulled 
in the .proto files and protobuf libs (hudi-io-proto module).  Should I just 
check the generated java classes into the repo and get rid of the 
proto-related files altogether?  Alternatively, I can keep the hudi-io-proto 
module and have hudi-io include the generated code without depending on 
hudi-io-proto, so we can still evolve the protos in the future.
   - Regarding the new dependencies pulled in, I can trim the list further if 
some of them can cause conflicts, e.g., `commons-lang3`, `protobuf`:
   ```
       <dependency>
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-client</artifactId>
         <scope>provided</scope>
       </dependency>

       <dependency>
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-hdfs</artifactId>
         <scope>provided</scope>
       </dependency>

       <dependency>
         <groupId>org.apache.hbase.thirdparty</groupId>
         <artifactId>hbase-shaded-protobuf</artifactId>
         <version>4.0.1</version>
       </dependency>

       <dependency>
         <groupId>org.apache.hbase.thirdparty</groupId>
         <artifactId>hbase-shaded-miscellaneous</artifactId>
         <version>4.0.1</version>
       </dependency>

       <dependency>
         <groupId>org.apache.hbase.thirdparty</groupId>
         <artifactId>hbase-shaded-gson</artifactId>
         <version>4.0.1</version>
       </dependency>

       <dependency>
         <groupId>org.apache.hbase.thirdparty</groupId>
         <artifactId>hbase-shaded-netty</artifactId>
         <version>4.0.1</version>
       </dependency>

       <dependency>
         <groupId>org.apache.htrace</groupId>
         <artifactId>htrace-core4</artifactId>
         <version>4.2.0-incubating</version>
       </dependency>

       <dependency>
         <groupId>org.apache.commons</groupId>
         <artifactId>commons-lang3</artifactId>
         <version>3.12.0</version>
         <scope>compile</scope>
       </dependency>

       <dependency>
         <groupId>org.apache.yetus</groupId>
         <artifactId>audience-annotations</artifactId>
         <version>0.13.0</version>
       </dependency>

       <dependency>
         <groupId>com.esotericsoftware</groupId>
         <artifactId>kryo-shaded</artifactId>
         <version>4.0.2</version>
       </dependency>
   ```  

