codope opened a new issue, #7583:
URL: https://github.com/apache/hudi/issues/7583

   **Describe the problem you faced**
   Original issue: https://github.com/trinodb/trino/issues/15368
   
   > Our team is testing the same on COPY ON WRITE Hudi (0.10.1) tables with metadata enabled, using Trino 400, and we are facing the error while reading from partitioned tables.
   > `Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex`
   
   The issue was resolved by placing some dependencies on the classpath. Interestingly, those dependencies are [already included in the trino-hudi-bundle](https://github.com/apache/hudi/blob/release-0.12.1/packaging/hudi-trino-bundle/pom.xml#L69-L98). This particular issue tracks any gap in the packaging.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   1. Write a Hudi COW table with the properties below and metadata enabled.
   2. Query the same table using the trino-hudi connector (properties below) with `hudi.metadata-enabled=true`.
   
   **Trino Hudi Connector Properties:**
   ```
   connector.name=hudi
   hive.metastore.uri={METASTORE_URI}
   hive.s3.iam-role={S3_IAM_ROLE}
   hive.metastore-refresh-interval=2m
   hive.metastore-timeout=3m
   hudi.max-outstanding-splits=1800
   hive.s3.max-error-retries=50
   hive.s3.connect-timeout=1m
   hive.s3.socket-timeout=2m
   hudi.parquet.use-column-names=true
   hudi.metadata-enabled=true
   ```
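   
   To make step 2 concrete, here is a minimal, hedged sketch of issuing the failing read through the Trino JDBC driver against the catalog configured above. The coordinator host, catalog name (`hudi`), schema (`default`), table name (`my_hudi_table`), and partition value are illustrative placeholders, not names from the original report.
   
   ```scala
   // Hypothetical reproduction of step 2: read the partitioned COW table through
   // the catalog configured above. Host, catalog ("hudi"), schema ("default"),
   // table name and partition value are placeholders for illustration only.
   // Requires the io.trino:trino-jdbc driver on the classpath.
   import java.sql.DriverManager
   
   object QueryHudiViaTrino {
     def main(args: Array[String]): Unit = {
       val conn = DriverManager.getConnection(
         "jdbc:trino://trino-coordinator:8080/hudi/default", "trino-user", null)
       try {
         val rs = conn.createStatement().executeQuery(
           "SELECT id, insert_ds_ist FROM my_hudi_table WHERE insert_ds_ist = '2022-12-20' LIMIT 10")
         // With hudi.metadata-enabled=true this read fails with the
         // NoClassDefFoundError above unless the missing jars are added manually.
         while (rs.next()) println(s"${rs.getString(1)}\t${rs.getString(2)}")
       } finally conn.close()
     }
   }
   ```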
   
   **Hudi Properties set while writing:**
   ```
   hoodie.datasource.write.partitionpath.field = "insert_ds_ist",
   hoodie.datasource.write.recordkey.field = "id",
   hoodie.datasource.write.precombine.field = "_hoodie_incremental_key",   # self-generated column
   hoodie.datasource.write.hive_style_partitioning = "true",
   hoodie.datasource.hive_sync.auto_create_database = "true",
   hoodie.parquet.compression.codec = "gzip",
   hoodie.table.name = "<table_name>",
   hoodie.datasource.write.keygenerator.class = "org.apache.hudi.keygen.SimpleKeyGenerator",
   hoodie.datasource.write.table.type = "COPY_ON_WRITE",
   hoodie.metadata.enable = "true",
   hoodie.datasource.hive_sync.enable = "true",
   hoodie.datasource.hive_sync.partition_fields = "insert_ds_ist",
   hoodie.datasource.hive_sync.partition_extractor_class = "org.apache.hudi.hive.MultiPartKeysValueExtractor"
   ```
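   
   For reference, a minimal sketch of how these options might be passed on the write side through the Spark datasource (assuming Spark 2.4 with a matching hudi-spark-bundle on the classpath). The DataFrame, base path, and table name are placeholders; the option keys and values mirror the list above.
   
   ```scala
   // Write-side sketch under the assumptions stated above. df is assumed to
   // contain the columns referenced in the options ("id", "insert_ds_ist",
   // "_hoodie_incremental_key"); basePath is the S3/HDFS location of the table.
   import org.apache.spark.sql.{DataFrame, SaveMode}
   
   def writeHudiCow(df: DataFrame, basePath: String): Unit = {
     df.write
       .format("hudi")
       .option("hoodie.table.name", "my_hudi_table")                  // placeholder
       .option("hoodie.datasource.write.table.type", "COPY_ON_WRITE")
       .option("hoodie.datasource.write.recordkey.field", "id")
       .option("hoodie.datasource.write.partitionpath.field", "insert_ds_ist")
       .option("hoodie.datasource.write.precombine.field", "_hoodie_incremental_key")
       .option("hoodie.datasource.write.keygenerator.class",
               "org.apache.hudi.keygen.SimpleKeyGenerator")
       .option("hoodie.datasource.write.hive_style_partitioning", "true")
       .option("hoodie.metadata.enable", "true")
       .option("hoodie.parquet.compression.codec", "gzip")
       .option("hoodie.datasource.hive_sync.enable", "true")
       .option("hoodie.datasource.hive_sync.auto_create_database", "true")
       .option("hoodie.datasource.hive_sync.partition_fields", "insert_ds_ist")
       .option("hoodie.datasource.hive_sync.partition_extractor_class",
               "org.apache.hudi.hive.MultiPartKeysValueExtractor")
       .mode(SaveMode.Append)
       .save(basePath)
   }
   ```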
   
   **General information of table:**
   Total rows = 1,213,959,199
   Total Partitions = 2400+
   Total file objects = 120,000
   Total Size on S3 = 12~13 GB
   The table was upgraded from 0.9.0 to 0.10.1
   
   **Expected behavior**
   
   The query should work out of the box without having to place jars on the classpath manually.
   
   **Environment Description**
   
   * Hudi version : 0.10.1
   
   * Spark version : 2.4
   
   * Trino version : [400](https://github.com/trinodb/trino/tree/400)
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   
   **Stacktrace**
   
   The full stacktrace is in [Partitioned_COW_Hudi_Coordinator_logs.log](https://github.com/apache/hudi/files/10323254/Partitioned_COW_Hudi_Coordinator_logs.log).
   
   
   

