[ https://issues.apache.org/jira/browse/HUDI-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-4885:
-----------------------------
    Fix Version/s: 0.12.1

> docker demo fails w/ ClassNotFound w/ LogicalType in latest master
> ------------------------------------------------------------------
>
>                 Key: HUDI-4885
>                 URL: https://issues.apache.org/jira/browse/HUDI-4885
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: dev-experience
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>             Fix For: 0.12.1
>
> The docker demo fails during hive-sync with the latest master.
> Also, some of the env variables are not applied; for example,
> HUDI_UTILITIES_BUNDLE was not set.
>
> Command used:
> {code:java}
> /var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh --jdbc-url jdbc:hive2://hiveserver:10000 --user hive --pass hive --partitioned-by dt --base-path /user/hive/warehouse/stock_ticks_cow --database default --table stock_ticks_cow {code}
>
> Output:
> {code:java}
> 2022-09-20 14:24:39,122 INFO [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(179)) - Trying to sync hoodie table stock_ticks_cow with base path /user/hive/warehouse/stock_ticks_cow of type COPY_ON_WRITE
> 2022-09-20 14:24:39,758 INFO [main] table.TableSchemaResolver (TableSchemaResolver.java:readSchemaFromParquetBaseFile(439)) - Reading schema from /user/hive/warehouse/stock_ticks_cow/2018/08/31/b4a7076c-30e6-4320-bb04-be47246b6646-0_0-29-29_20220920142351042.parquet
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> 2022-09-20 14:24:40,432 INFO [main] hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to metastore, current connections: 0
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
>     at org.apache.hudi.common.table.TableSchemaResolver.convertParquetSchemaToAvro(TableSchemaResolver.java:288)
>     at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:121)
>     at org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:566)
>     at org.apache.hudi.util.Lazy.get(Lazy.java:53)
>     at org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromLatestCommitMetadata(TableSchemaResolver.java:225)
>     at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaInternal(TableSchemaResolver.java:193)
>     at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:142)
>     at org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchema(TableSchemaResolver.java:173)
>     at org.apache.hudi.sync.common.HoodieSyncClient.getStorageSchema(HoodieSyncClient.java:103)
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:206)
>     at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:153)
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:141)
>     at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:358)
> Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 13 more {code}
>
> Tried the spark-submit command directly and it succeeded.
> {code:java}
> spark-submit --class org.apache.hudi.hive.HiveSyncTool /var/hoodie/ws/packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-0.13.0-SNAPSHOT.jar --database default --table stock_ticks_cow --base-path /user/hive/warehouse/stock_ticks_cow --base-file-format PARQUET --user hive --pass hive --jdbc-url jdbc:hive2://hiveserver:10000 --partition-value-extractor org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor --partitioned-by dt {code}
>
> Output:
> {code:java}
> spark-submit --class org.apache.hudi.hive.HiveSyncTool /var/hoodie/ws/packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-0.13.0-SNAPSHOT.jar --database default --table stock_ticks_cow --base-path /user/hive/warehouse/stock_ticks_cow --base-file-format PARQUET --user hive --pass hive --jdbc-url jdbc:hive2://hiveserver:10000 --partition-value-extractor org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor --partitioned-by dt
> 22/09/20 15:23:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 22/09/20 15:23:12 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_cow
> 22/09/20 15:23:12 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/stock_ticks_cow/.hoodie/hoodie.properties
> 22/09/20 15:23:12 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/stock_ticks_cow
> 22/09/20 15:23:12 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/stock_ticks_cow
> 22/09/20 15:23:12 INFO timeline.HoodieActiveTimeline: Loaded instants upto : Option{val=[20220920142351042__commit__COMPLETED]}
> 22/09/20 15:23:12 INFO jdbc.Utils: Supplied authorities: hiveserver:10000
> 22/09/20 15:23:12 INFO jdbc.Utils: Resolved authority: hiveserver:10000
> 22/09/20 15:23:12 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://hiveserver:10000
> 22/09/20 15:23:13 INFO ddl.QueryBasedDDLExecutor: Successfully established Hive connection to jdbc:hive2://hiveserver:10000
> 22/09/20 15:23:13 INFO hive.metastore: Trying to connect to metastore with URI thrift://hivemetastore:9083
> 22/09/20 15:23:13 INFO hive.metastore: Connected to metastore.
> 22/09/20 15:23:13 INFO hive.HiveSyncTool: Syncing target hoodie table with hive table(default.stock_ticks_cow). Hive metastore URL :null, basePath :/user/hive/warehouse/stock_ticks_cow
> 22/09/20 15:23:13 INFO hive.HiveSyncTool: Trying to sync hoodie table stock_ticks_cow with base path /user/hive/warehouse/stock_ticks_cow of type COPY_ON_WRITE
> 22/09/20 15:23:14 INFO table.TableSchemaResolver: Reading schema from /user/hive/warehouse/stock_ticks_cow/2018/08/31/b4a7076c-30e6-4320-bb04-be47246b6646-0_0-29-29_20220920142351042.parquet
> 22/09/20 15:23:14 INFO hive.HiveSyncTool: Hive table stock_ticks_cow is not found. Creating it
> 22/09/20 15:23:15 INFO ddl.QueryBasedDDLExecutor: Creating table with CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`stock_ticks_cow`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `volume` bigint, `ts` string, `symbol` string, `year` int, `month` string, `high` double, `low` double, `key` string, `date` string, `close` double, `open` double, `day` string) PARTITIONED BY (`dt` String) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES ('hoodie.query.as.ro.table'='false','path'='/user/hive/warehouse/stock_ticks_cow') STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/user/hive/warehouse/stock_ticks_cow' TBLPROPERTIES('spark.sql.sources.schema.partCol.0'='dt','spark.sql.sources.schema.numParts'='1','spark.sql.sources.schema.numPartCols'='1','spark.sql.sources.provider'='hudi','spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"volume","type":"long","nullable":false,"metadata":{}},{"name":"ts","type":"string","nullable":false,"metadata":{}},{"name":"symbol","type":"string","nullable":false,"metadata":{}},{"name":"year","type":"integer","nullable":false,"metadata":{}},{"name":"month","type":"string","nullable":false,"metadata":{}},{"name":"high","type":"double","nullable":false,"metadata":{}},{"name":"low","type":"double","nullable":false,"metadata":{}},{"name":"key","type":"string","nullable":false,"metadata":{}},{"name":"date","type":"string","nullable":false,"metadata":{}},{"name":"close","type":"double","nullable":false,"metadata":{}},{"name":"open","type":"double","nullable":false,"metadata":{}},{"name":"day","type":"string","nullable":false,"metadata":{}},{"name":"dt","type":"string","nullable":false,"metadata":{}}]}')
> 22/09/20 15:23:15 INFO ddl.QueryBasedDDLExecutor: Executing SQL CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`stock_ticks_cow`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `volume` bigint, `ts` string, `symbol` string, `year` int, `month` string, `high` double, `low` double, `key` string, `date` string, `close` double, `open` double, `day` string) PARTITIONED BY (`dt` String) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES ('hoodie.query.as.ro.table'='false','path'='/user/hive/warehouse/stock_ticks_cow') STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/user/hive/warehouse/stock_ticks_cow' TBLPROPERTIES('spark.sql.sources.schema.partCol.0'='dt','spark.sql.sources.schema.numParts'='1','spark.sql.sources.schema.numPartCols'='1','spark.sql.sources.provider'='hudi','spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"volume","type":"long","nullable":false,"metadata":{}},{"name":"ts","type":"string","nullable":false,"metadata":{}},{"name":"symbol","type":"string","nullable":false,"metadata":{}},{"name":"year","type":"integer","nullable":false,"metadata":{}},{"name":"month","type":"string","nullable":false,"metadata":{}},{"name":"high","type":"double","nullable":false,"metadata":{}},{"name":"low","type":"double","nullable":false,"metadata":{}},{"name":"key","type":"string","nullable":false,"metadata":{}},{"name":"date","type":"string","nullable":false,"metadata":{}},{"name":"close","type":"double","nullable":false,"metadata":{}},{"name":"open","type":"double","nullable":false,"metadata":{}},{"name":"day","type":"string","nullable":false,"metadata":{}},{"name":"dt","type":"string","nullable":false,"metadata":{}}]}')
> 22/09/20 15:23:17 INFO hive.HiveSyncTool: Schema sync complete. Syncing partitions for stock_ticks_cow
> 22/09/20 15:23:17 INFO hive.HiveSyncTool: Last commit time synced was found to be null
> 22/09/20 15:23:17 INFO common.HoodieSyncClient: Last commit time synced is not known, listing all partitions in /user/hive/warehouse/stock_ticks_cow,FS :DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1898513899_1, ugi=root (auth:SIMPLE)]]
> 22/09/20 15:23:17 INFO hive.HiveSyncTool: Storage partitions scan complete. Found 1
> 22/09/20 15:23:17 INFO hive.HiveSyncTool: New Partitions [2018/08/31]
> 22/09/20 15:23:17 INFO ddl.QueryBasedDDLExecutor: Adding partitions 1 to table stock_ticks_cow
> 22/09/20 15:23:17 INFO ddl.QueryBasedDDLExecutor: Executing SQL ALTER TABLE `default`.`stock_ticks_cow` ADD IF NOT EXISTS PARTITION (`dt`='2018-08-31') LOCATION '/user/hive/warehouse/stock_ticks_cow/2018/08/31'
> 22/09/20 15:23:18 INFO hive.HiveSyncTool: Sync complete for stock_ticks_cow
> 22/09/20 15:23:18 INFO util.ShutdownHookManager: Shutdown hook called
> 22/09/20 15:23:18 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-59275d30-b1bb-4f7d-af85-bce24962ca1e {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
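Editor's note: a quick way to confirm that a NoClassDefFoundError like the one above is a classpath/packaging problem rather than a code bug is to check whether the jar being launched actually ships the missing class. A minimal sketch (not from the original report; the jar path is the one used in the successful spark-submit run and may differ in your build):

```shell
# Jar from the spark-submit invocation in the ticket; adjust the path/version to your build.
JAR=/var/hoodie/ws/packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-0.13.0-SNAPSHOT.jar
# The class the JVM reported as unloadable, in jar-entry form.
CLASS='org/apache/avro/LogicalType.class'

# A jar is a zip archive, so listing its entries shows whether the class is bundled.
if unzip -l "$JAR" 2>/dev/null | grep -q "$CLASS"; then
  echo "found $CLASS in $JAR"
else
  echo "$CLASS is not in $JAR (or the jar is missing)"
fi
```

If the bundle does contain the class, the failure would point at run_sync_tool.sh assembling a classpath without an Avro jar, which is consistent with the same bundle working under spark-submit (where Spark supplies Avro).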