[ 
https://issues.apache.org/jira/browse/HUDI-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7005.
----------------------------
    Resolution: Fixed

Fixed via master branch: b14f9e48d3d81cb765e5b2fb355eb2c1e24ee582

> Flink SQL Queries on Hudi Table fail when using the hudi-aws-bundle jar
> -----------------------------------------------------------------------
>
>                 Key: HUDI-7005
>                 URL: https://issues.apache.org/jira/browse/HUDI-7005
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink-sql
>    Affects Versions: 0.14.0
>            Reporter: Prabhu Joseph
>            Priority: Major
>              Labels: pull-request-available
>
> Flink SQL queries on a Hudi table fail when using the hudi-aws-bundle jar, 
> which is needed for metastore sync into AWS Glue. Three distinct issues 
> were seen:
> *Issue 1:*
> {code}
> 2023-10-07 14:47:03,463 ERROR 
> org.apache.hudi.sink.StreamWriteOperatorCoordinator          [] - Executor 
> executes action [sync hive metadata for instant 20231007144701183] error
> java.lang.NoClassDefFoundError: 
> software/amazon/awssdk/services/glue/model/EntityNotFoundException
>     at 
> org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool.initSyncClient(AwsGlueCatalogSyncTool.java:52)
>  ~[hudi-flink-bundle.jar:0.13.1-amzn-1]
>     at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:114) 
> ~[hudi-flink-bundle.jar:0.13.1-amzn-1]
> {code}
> This happens because the AwsGlueCatalogSyncTool (from the hudi-aws module) 
> packaged in hudi-flink-bundle does not relocate the AWS SDK, whereas the 
> copy in hudi-aws-bundle does. The fix is to stop including hudi-aws in 
> hudi-flink-bundle; hudi-flink-bundle need not carry the hudi-aws classes, 
> as the hudi-aws-bundle jar can supply them at runtime.
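> Whether a given bundle relocates the SDK can be verified by listing the jar. 
> A minimal sketch (the jar is fabricated with `zip` so the example is 
> self-contained; in practice point JAR at the real hudi-flink-bundle jar):
> {code:bash}
> # An unrelocated AWS SDK shows up in the jar listing under
> # software/amazon/awssdk/...; a relocated one under org/apache/hudi/...
> mkdir -p demo/software/amazon/awssdk/services/glue/model
> touch demo/software/amazon/awssdk/services/glue/model/EntityNotFoundException.class
> (cd demo && zip -qr ../demo-bundle.jar .)
> 
> JAR=demo-bundle.jar
> if unzip -l "$JAR" | grep -q 'software/amazon/awssdk/services/glue/model/EntityNotFoundException'; then
>   echo "AWS SDK is NOT relocated in $JAR"
> fi
> {code}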
> *Issue 2:*
> {code}
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(Z)Lorg/apache/hudi/org/apache/avro/Schema;
>         at 
> org.apache.hudi.util.StreamerUtil.getTableAvroSchema(StreamerUtil.java:431)
>         at 
> org.apache.hudi.util.StreamerUtil.getLatestTableSchema(StreamerUtil.java:441)
>         at 
> org.apache.hudi.table.catalog.HoodieHiveCatalog.getTable(HoodieHiveCatalog.java:420)
> {code}
> This happens because the TableSchemaResolver (from the hudi-common module) 
> packaged in hudi-aws-bundle does not relocate the avro classes, whereas the 
> copy in hudi-flink-bundle does. The fix is to stop including hudi-common in 
> hudi-aws-bundle; hudi-aws-bundle need not carry the hudi-common classes, as 
> it is used in conjunction with a service bundle 
> (hudi-spark-bundle/hudi-flink-bundle) that already provides them.
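> The relocated class name in the error signature 
> (Lorg/apache/hudi/org/apache/avro/Schema;) comes from a maven-shade-plugin 
> relocation in the flink bundle. A sketch of the kind of rule involved 
> (illustrative only, not the exact bundle pom):
> {code:xml}
> <relocation>
>   <pattern>org.apache.avro.</pattern>
>   <shadedPattern>org.apache.hudi.org.apache.avro.</shadedPattern>
> </relocation>
> {code}
> A copy of hudi-common built without this relocation links against the 
> unshaded org.apache.avro.Schema, so mixing relocated and unrelocated copies 
> on the same classpath yields the NoSuchMethodError above.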
> *Issue 3:*
> {code}
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/hudi/org/apache/avro/Schema;)Lorg/apache/parquet/schema/MessageType;
>         at 
> org.apache.hudi.common.table.TableSchemaResolver.convertAvroSchemaToParquet(TableSchemaResolver.java:288)
>         at 
> org.apache.hudi.table.catalog.TableOptionProperties.translateFlinkTableProperties2Spark(TableOptionProperties.java:181)
>         at 
> org.apache.hudi.table.catalog.HoodieHiveCatalog.instantiateHiveTable(HoodieHiveCatalog.java:603)
>         at 
> org.apache.hudi.table.catalog.HoodieHiveCatalog.createTable(HoodieHiveCatalog.java:468)
> {code}
> This happens because the AvroSchemaConverter (from parquet-avro) packaged 
> in hudi-aws-bundle does not relocate the avro classes, whereas the copy in 
> hudi-flink-bundle does. The fix is to stop including parquet-avro in 
> hudi-aws-bundle; hudi-aws-bundle need not carry the parquet-avro classes, 
> as it is used in conjunction with a service bundle 
> (hudi-spark-bundle/hudi-flink-bundle) that already provides them.
>  
>  
> *Repro*
> {code:java}
> cd /usr/lib/flink/lib
> wget 
> https://repo1.maven.org/maven2/org/apache/hudi/hudi-flink1.17-bundle/0.14.0/hudi-flink1.17-bundle-0.14.0.jar
> wget 
> https://repo1.maven.org/maven2/org/apache/hudi/hudi-aws-bundle/0.14.0/hudi-aws-bundle-0.14.0.jar
> flink-yarn-session -d
> /usr/lib/flink/bin/sql-client.sh embedded
> CREATE CATALOG glue_catalog_for_hudi WITH (
>   'type' = 'hudi',
>   'mode' = 'hms',
>   'table.external' = 'true',
>   'default-database' = 'default',
>   'hive.conf.dir' = '/etc/hive/conf.dist',
>   'catalog.path' = 's3://prabhuflinks3/HUDICDC/warehouse/'
> );
> USE CATALOG glue_catalog_for_hudi;
> CREATE DATABASE IF NOT EXISTS flink_glue_hudi_db;
> USE flink_glue_hudi_db;
> CREATE TABLE `glue_catalog_for_hudi`.`flink_glue_hudi_db`.`Persons_src` (
>   ID INT NOT NULL,
>   FirstName STRING,
>   Age STRING,
>   PRIMARY KEY (`ID`) NOT ENFORCED
> )
> WITH (
>   'connector' = 'hudi',
>   'write.tasks' = '2',
>   'path' = 's3://prabhuflinks3/HUDICDC/warehouse/Persons_src',
>   'table.type' = 'COPY_ON_WRITE',
>   'read.streaming.enabled' = 'true',
>   'read.streaming.check-interval' = '1',
>   'hoodie.embed.timeline.server' = 'false',
>   'hive_sync.mode' = 'glue'
> );
> {code}
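> A write then exercises the Glue sync path and surfaces Issue 1 (the values 
> below are arbitrary; any committed instant triggers the metastore sync):
> {code:sql}
> INSERT INTO `glue_catalog_for_hudi`.`flink_glue_hudi_db`.`Persons_src`
> VALUES (1, 'John', '30');
> {code}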
>  
> cc [~uditme]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
