[ https://issues.apache.org/jira/browse/HUDI-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Chen closed HUDI-7005.
----------------------------
    Resolution: Fixed

Fixed via master branch: b14f9e48d3d81cb765e5b2fb355eb2c1e24ee582

Flink SQL Queries on Hudi Table fail when using the hudi-aws-bundle jar
-----------------------------------------------------------------------

                 Key: HUDI-7005
                 URL: https://issues.apache.org/jira/browse/HUDI-7005
             Project: Apache Hudi
          Issue Type: Bug
          Components: flink-sql
    Affects Versions: 0.14.0
            Reporter: Prabhu Joseph
            Priority: Major
              Labels: pull-request-available

Flink SQL queries on a Hudi table fail when using the hudi-aws-bundle jar. The hudi-aws-bundle jar is needed for metastore sync into AWS Glue. Below are the different issues seen:

*Issue 1:*
{code}
2023-10-07 14:47:03,463 ERROR org.apache.hudi.sink.StreamWriteOperatorCoordinator [] - Executor executes action [sync hive metadata for instant 20231007144701183] error
java.lang.NoClassDefFoundError: software/amazon/awssdk/services/glue/model/EntityNotFoundException
	at org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool.initSyncClient(AwsGlueCatalogSyncTool.java:52) ~[hudi-flink-bundle.jar:0.13.1-amzn-1]
	at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:114) ~[hudi-flink-bundle.jar:0.13.1-amzn-1]
{code}
This issue happens because AwsGlueCatalogSyncTool (hudi-aws module), as packaged in hudi-flink-bundle, has not relocated the AWS SDK, whereas the copy in hudi-aws-bundle has relocated it. The fix is to stop including hudi-aws in hudi-flink-bundle; hudi-flink-bundle need not bring hudi-aws classes, as the hudi-aws-bundle jar can be used instead at runtime.
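The fix described above is a shading change. As an illustrative sketch only (the include list is abbreviated and the surrounding pom structure is assumed, not copied from the actual hudi-flink-bundle pom), dropping hudi-aws from the bundle's maven-shade-plugin artifact set would look like:

```xml
<!-- Sketch of a hudi-flink-bundle shade configuration with hudi-aws removed.
     Illustrative only: the include list is abbreviated and the surrounding
     pom structure is assumed. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <includes>
        <include>org.apache.hudi:hudi-common</include>
        <include>org.apache.hudi:hudi-flink</include>
        <!-- org.apache.hudi:hudi-aws is no longer included here; its classes
             come from the separately deployed hudi-aws-bundle at runtime -->
      </includes>
    </artifactSet>
  </configuration>
</plugin>
```

The same pattern applies in the other direction for hudi-aws-bundle: each class should be shipped by exactly one bundle, with one consistent relocation policy.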
*Issue 2:*
{code}
Caused by: java.lang.NoSuchMethodError: org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(Z)Lorg/apache/hudi/org/apache/avro/Schema;
	at org.apache.hudi.util.StreamerUtil.getTableAvroSchema(StreamerUtil.java:431)
	at org.apache.hudi.util.StreamerUtil.getLatestTableSchema(StreamerUtil.java:441)
	at org.apache.hudi.table.catalog.HoodieHiveCatalog.getTable(HoodieHiveCatalog.java:420)
{code}
This issue happens because TableSchemaResolver (hudi-common module), as packaged in hudi-aws-bundle, has not relocated the Avro classes, whereas the copy in hudi-flink-bundle has relocated them. The fix is to stop including hudi-common in hudi-aws-bundle; hudi-aws-bundle need not bring hudi-common classes, as it is used in conjunction with a service bundle (hudi-spark-bundle/hudi-flink-bundle) which already has the hudi-common classes.

*Issue 3:*
{code}
Caused by: java.lang.NoSuchMethodError: org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/hudi/org/apache/avro/Schema;)Lorg/apache/parquet/schema/MessageType;
	at org.apache.hudi.common.table.TableSchemaResolver.convertAvroSchemaToParquet(TableSchemaResolver.java:288)
	at org.apache.hudi.table.catalog.TableOptionProperties.translateFlinkTableProperties2Spark(TableOptionProperties.java:181)
	at org.apache.hudi.table.catalog.HoodieHiveCatalog.instantiateHiveTable(HoodieHiveCatalog.java:603)
	at org.apache.hudi.table.catalog.HoodieHiveCatalog.createTable(HoodieHiveCatalog.java:468)
{code}
This issue happens because AvroSchemaConverter (parquet-avro), as packaged in hudi-aws-bundle, has not relocated the Avro classes, whereas the copy in hudi-flink-bundle has relocated them. The fix is to stop including parquet-avro in hudi-aws-bundle; hudi-aws-bundle need not bring parquet-avro classes, as it is used in conjunction with a service bundle (hudi-spark-bundle/hudi-flink-bundle) which already has the parquet-avro classes.
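The `Lorg/apache/hudi/org/apache/avro/Schema;` in the stack traces above is the tell-tale of a relocation mismatch: one jar was compiled against shade-relocated Avro while another ships the unrelocated classes, so the JVM cannot link the call. A quick way to see which copies a bundle actually ships is to list its entries. A minimal sketch in Python (the jar paths in the usage note are the ones from the repro below; `check_bundle` is a made-up helper name, not a Hudi tool):

```python
import zipfile


def entries_with_prefix(jar_path, prefix):
    """Return the jar entries that live under a given package path."""
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist() if name.startswith(prefix)]


def check_bundle(jar_path):
    """Report whether a bundle ships relocated and/or unrelocated Avro.

    Shade-relocated Avro classes live under org/apache/hudi/org/apache/avro/,
    which is why the failing method signatures mention
    org.apache.hudi.org.apache.avro.Schema. If one bundle on the classpath
    was compiled against the relocated Schema and another supplies only the
    unrelocated one, linking fails with NoSuchMethodError.
    """
    relocated = bool(entries_with_prefix(jar_path, "org/apache/hudi/org/apache/avro/"))
    unrelocated = bool(entries_with_prefix(jar_path, "org/apache/avro/"))
    return {"relocated_avro": relocated, "unrelocated_avro": unrelocated}
```

Running `check_bundle` against both `hudi-flink1.17-bundle-0.14.0.jar` and `hudi-aws-bundle-0.14.0.jar` and getting different answers is exactly the split described in Issues 2 and 3.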
*Repro*
{code:java}
cd /usr/lib/flink/lib
wget https://repo1.maven.org/maven2/org/apache/hudi/hudi-flink1.17-bundle/0.14.0/hudi-flink1.17-bundle-0.14.0.jar
wget https://repo1.maven.org/maven2/org/apache/hudi/hudi-aws-bundle/0.14.0/hudi-aws-bundle-0.14.0.jar
flink-yarn-session -d
/usr/lib/flink/bin/sql-client.sh embedded

CREATE CATALOG glue_catalog_for_hudi WITH (
  'type' = 'hudi',
  'mode' = 'hms',
  'table.external' = 'true',
  'default-database' = 'default',
  'hive.conf.dir' = '/etc/hive/conf.dist',
  'catalog.path' = 's3://prabhuflinks3/HUDICDC/warehouse/'
);
USE CATALOG glue_catalog_for_hudi;
CREATE DATABASE IF NOT EXISTS flink_glue_hudi_db;
USE flink_glue_hudi_db;
CREATE TABLE `glue_catalog_for_hudi`.`flink_glue_hudi_db`.`Persons_src` (
  ID INT NOT NULL,
  FirstName STRING,
  Age STRING,
  PRIMARY KEY (`ID`) NOT ENFORCED
)
WITH (
  'connector' = 'hudi',
  'write.tasks' = '2',
  'path' = 's3://prabhuflinks3/HUDICDC/warehouse/Persons_src',
  'table.type' = 'COPY_ON_WRITE',
  'read.streaming.enabled' = 'true',
  'read.streaming.check-interval' = '1',
  'hoodie.embed.timeline.server' = 'false',
  'hive_sync.mode' = 'glue'
);
{code}

cc [~uditme]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)