[ https://issues.apache.org/jira/browse/HUDI-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-4885:
--------------------------------------
    Description: 
The docker demo fails during hive-sync with the latest master.

Also, some of the env variables are not applied; for example, HUDI_UTILITIES_BUNDLE 
was not set. After setting it explicitly, I was able to get the first ingest 
working, but hive sync then failed.
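
A minimal workaround for the env variable part, sketched under the assumption that the utilities bundle was built in place under the demo workspace (the exact jar name depends on the Scala profile and version built, so adjust to what is actually under target/):
{code:java}
# hypothetical: export the bundle path by hand inside the adhoc container
# before running the demo steps; the jar name below is an assumption
export HUDI_UTILITIES_BUNDLE=/var/hoodie/ws/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.0-SNAPSHOT.jar
{code}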

 

command used:
{code:java}
/var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh \
  --jdbc-url jdbc:hive2://hiveserver:10000 \
  --user hive \
  --pass hive \
  --partitioned-by dt \
  --base-path /user/hive/warehouse/stock_ticks_cow \
  --database default \
  --table stock_ticks_cow {code}
 

output:
{code:java}
2022-09-20 14:24:39,122 INFO  [main] hive.HiveSyncTool 
(HiveSyncTool.java:syncHoodieTable(179)) - Trying to sync hoodie table 
stock_ticks_cow with base path /user/hive/warehouse/stock_ticks_cow of type 
COPY_ON_WRITE
2022-09-20 14:24:39,758 INFO  [main] table.TableSchemaResolver 
(TableSchemaResolver.java:readSchemaFromParquetBaseFile(439)) - Reading schema 
from 
/user/hive/warehouse/stock_ticks_cow/2018/08/31/b4a7076c-30e6-4320-bb04-be47246b6646-0_0-29-29_20220920142351042.parquet
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
2022-09-20 14:24:40,432 INFO  [main] hive.metastore 
(HiveMetaStoreClient.java:close(564)) - Closed a connection to metastore, 
current connections: 0
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/avro/LogicalType
        at 
org.apache.hudi.common.table.TableSchemaResolver.convertParquetSchemaToAvro(TableSchemaResolver.java:288)
        at 
org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:121)
        at 
org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:566)
        at org.apache.hudi.util.Lazy.get(Lazy.java:53)
        at 
org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromLatestCommitMetadata(TableSchemaResolver.java:225)
        at 
org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaInternal(TableSchemaResolver.java:193)
        at 
org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:142)
        at 
org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchema(TableSchemaResolver.java:173)
        at 
org.apache.hudi.sync.common.HoodieSyncClient.getStorageSchema(HoodieSyncClient.java:103)
        at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:206)
        at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:153)
        at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:141)
        at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:358)
Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 13 more {code}
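
The NoClassDefFoundError suggests that no jar on the classpath run_sync_tool.sh assembles (the script builds it from the hive-sync jar plus jars under HIVE_HOME and HADOOP_HOME) provides org.apache.avro.LogicalType; that class only exists in Avro 1.8.0 and later, so an older avro jar on the path would fail the same way. A quick, hedged way to see which avro jars the script can pick up:
{code:java}
# hypothetical diagnostic: list avro jars under HIVE_HOME/HADOOP_HOME and
# check each one for the LogicalType class the sync tool needs
for jar in $(find "$HIVE_HOME" "$HADOOP_HOME" -name 'avro-*.jar' 2>/dev/null); do
  if unzip -l "$jar" | grep -q 'org/apache/avro/LogicalType.class'; then
    echo "OK   $jar"
  else
    echo "MISS $jar"
  fi
done
{code}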
 

 

Tried the spark-submit command directly, and it succeeded:
{code:java}
spark-submit \
  --class org.apache.hudi.hive.HiveSyncTool \
  /var/hoodie/ws/packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-0.13.0-SNAPSHOT.jar \
  --database default \
  --table stock_ticks_cow \
  --base-path /user/hive/warehouse/stock_ticks_cow \
  --base-file-format PARQUET \
  --user hive \
  --pass hive \
  --jdbc-url jdbc:hive2://hiveserver:10000 \
  --partition-value-extractor org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor \
  --partitioned-by dt {code}
 

output:
{code:java}
spark-submit   --class org.apache.hudi.hive.HiveSyncTool 
/var/hoodie/ws/packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-0.13.0-SNAPSHOT.jar
    --database default   --table stock_ticks_cow   --base-path 
/user/hive/warehouse/stock_ticks_cow    --base-file-format PARQUET   --user 
hive --pass hive   --jdbc-url jdbc:hive2://hiveserver:10000 
--partition-value-extractor 
org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor   --partitioned-by 
dt
22/09/20 15:23:09 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
22/09/20 15:23:12 INFO table.HoodieTableMetaClient: Loading 
HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_cow
22/09/20 15:23:12 INFO table.HoodieTableConfig: Loading table properties from 
/user/hive/warehouse/stock_ticks_cow/.hoodie/hoodie.properties
22/09/20 15:23:12 INFO table.HoodieTableMetaClient: Finished Loading Table of 
type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from 
/user/hive/warehouse/stock_ticks_cow
22/09/20 15:23:12 INFO table.HoodieTableMetaClient: Loading Active commit 
timeline for /user/hive/warehouse/stock_ticks_cow
22/09/20 15:23:12 INFO timeline.HoodieActiveTimeline: Loaded instants upto : 
Option{val=[20220920142351042__commit__COMPLETED]}
22/09/20 15:23:12 INFO jdbc.Utils: Supplied authorities: hiveserver:10000
22/09/20 15:23:12 INFO jdbc.Utils: Resolved authority: hiveserver:10000
22/09/20 15:23:12 INFO jdbc.HiveConnection: Will try to open client transport 
with JDBC Uri: jdbc:hive2://hiveserver:10000
22/09/20 15:23:13 INFO ddl.QueryBasedDDLExecutor: Successfully established Hive 
connection to  jdbc:hive2://hiveserver:10000
22/09/20 15:23:13 INFO hive.metastore: Trying to connect to metastore with URI 
thrift://hivemetastore:9083
22/09/20 15:23:13 INFO hive.metastore: Connected to metastore.
22/09/20 15:23:13 INFO hive.HiveSyncTool: Syncing target hoodie table with hive 
table(default.stock_ticks_cow). Hive metastore URL :null, basePath 
:/user/hive/warehouse/stock_ticks_cow
22/09/20 15:23:13 INFO hive.HiveSyncTool: Trying to sync hoodie table 
stock_ticks_cow with base path /user/hive/warehouse/stock_ticks_cow of type 
COPY_ON_WRITE
22/09/20 15:23:14 INFO table.TableSchemaResolver: Reading schema from 
/user/hive/warehouse/stock_ticks_cow/2018/08/31/b4a7076c-30e6-4320-bb04-be47246b6646-0_0-29-29_20220920142351042.parquet
22/09/20 15:23:14 INFO hive.HiveSyncTool: Hive table stock_ticks_cow is not 
found. Creating it
22/09/20 15:23:15 INFO ddl.QueryBasedDDLExecutor: Creating table with CREATE 
EXTERNAL TABLE IF NOT EXISTS `default`.`stock_ticks_cow`( `_hoodie_commit_time` 
string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, 
`_hoodie_partition_path` string, `_hoodie_file_name` string, `volume` bigint, 
`ts` string, `symbol` string, `year` int, `month` string, `high` double, `low` 
double, `key` string, `date` string, `close` double, `open` double, `day` 
string) PARTITIONED BY (`dt` String) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH 
SERDEPROPERTIES 
('hoodie.query.as.ro.table'='false','path'='/user/hive/warehouse/stock_ticks_cow')
 STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
LOCATION '/user/hive/warehouse/stock_ticks_cow' 
TBLPROPERTIES('spark.sql.sources.schema.partCol.0'='dt','spark.sql.sources.schema.numParts'='1','spark.sql.sources.schema.numPartCols'='1','spark.sql.sources.provider'='hudi','spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"volume","type":"long","nullable":false,"metadata":{}},{"name":"ts","type":"string","nullable":false,"metadata":{}},{"name":"symbol","type":"string","nullable":false,"metadata":{}},{"name":"year","type":"integer","nullable":false,"metadata":{}},{"name":"month","type":"string","nullable":false,"metadata":{}},{"name":"high","type":"double","nullable":false,"metadata":{}},{"name":"low","type":"double","nullable":false,"metadata":{}},{"name":"key","type":"string","nullable":false,"metadata":{}},{"name":"date","type":"string","nullable":false,"metadata":{}},{"name":"close","type":"double","nullable":false,"metadata":{}},{"name":"open","type":"double","nullable":false,"metadata":{}},{"name":"day","type":"string","nullable":false,"metadata":{}},{"name":"dt","type":"string","nullable":false,"metadata":{}}]}')
22/09/20 15:23:15 INFO ddl.QueryBasedDDLExecutor: Executing SQL CREATE EXTERNAL 
TABLE IF NOT EXISTS `default`.`stock_ticks_cow`( `_hoodie_commit_time` string, 
`_hoodie_commit_seqno` string, `_hoodie_record_key` string, 
`_hoodie_partition_path` string, `_hoodie_file_name` string, `volume` bigint, 
`ts` string, `symbol` string, `year` int, `month` string, `high` double, `low` 
double, `key` string, `date` string, `close` double, `open` double, `day` 
string) PARTITIONED BY (`dt` String) ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH 
SERDEPROPERTIES 
('hoodie.query.as.ro.table'='false','path'='/user/hive/warehouse/stock_ticks_cow')
 STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
LOCATION '/user/hive/warehouse/stock_ticks_cow' 
TBLPROPERTIES('spark.sql.sources.schema.partCol.0'='dt','spark.sql.sources.schema.numParts'='1','spark.sql.sources.schema.numPartCols'='1','spark.sql.sources.provider'='hudi','spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"volume","type":"long","nullable":false,"metadata":{}},{"name":"ts","type":"string","nullable":false,"metadata":{}},{"name":"symbol","type":"string","nullable":false,"metadata":{}},{"name":"year","type":"integer","nullable":false,"metadata":{}},{"name":"month","type":"string","nullable":false,"metadata":{}},{"name":"high","type":"double","nullable":false,"metadata":{}},{"name":"low","type":"double","nullable":false,"metadata":{}},{"name":"key","type":"string","nullable":false,"metadata":{}},{"name":"date","type":"string","nullable":false,"metadata":{}},{"name":"close","type":"double","nullable":false,"metadata":{}},{"name":"open","type":"double","nullable":false,"metadata":{}},{"name":"day","type":"string","nullable":false,"metadata":{}},{"name":"dt","type":"string","nullable":false,"metadata":{}}]}')
22/09/20 15:23:17 INFO hive.HiveSyncTool: Schema sync complete. Syncing 
partitions for stock_ticks_cow
22/09/20 15:23:17 INFO hive.HiveSyncTool: Last commit time synced was found to 
be null
22/09/20 15:23:17 INFO common.HoodieSyncClient: Last commit time synced is not 
known, listing all partitions in /user/hive/warehouse/stock_ticks_cow,FS 
:DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1898513899_1, ugi=root 
(auth:SIMPLE)]]
22/09/20 15:23:17 INFO hive.HiveSyncTool: Storage partitions scan complete. 
Found 1
22/09/20 15:23:17 INFO hive.HiveSyncTool: New Partitions [2018/08/31]
22/09/20 15:23:17 INFO ddl.QueryBasedDDLExecutor: Adding partitions 1 to table 
stock_ticks_cow
22/09/20 15:23:17 INFO ddl.QueryBasedDDLExecutor: Executing SQL ALTER TABLE 
`default`.`stock_ticks_cow` ADD IF NOT EXISTS   PARTITION (`dt`='2018-08-31') 
LOCATION '/user/hive/warehouse/stock_ticks_cow/2018/08/31' 
22/09/20 15:23:18 INFO hive.HiveSyncTool: Sync complete for stock_ticks_cow
22/09/20 15:23:18 INFO util.ShutdownHookManager: Shutdown hook called
22/09/20 15:23:18 INFO util.ShutdownHookManager: Deleting directory 
/tmp/spark-59275d30-b1bb-4f7d-af85-bce24962ca1e {code}
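
That spark-submit succeeds while run_sync_tool.sh fails is consistent with the classpath theory: the Spark distribution ships its own Avro jars, so LogicalType resolves there even when the script's hand-built classpath lacks it. A hedged way to reproduce the fix outside Spark, assuming an Avro 1.8+ jar is available somewhere in the container (the /path/to/avro jar below is a placeholder):
{code:java}
# hypothetical sanity check: run HiveSyncTool via plain java with an Avro 1.8+
# jar prepended, and the hadoop classpath appended for the HDFS/conf classes
java -cp /path/to/avro-1.10.2.jar:/var/hoodie/ws/packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-0.13.0-SNAPSHOT.jar:$(hadoop classpath) \
  org.apache.hudi.hive.HiveSyncTool \
  --jdbc-url jdbc:hive2://hiveserver:10000 --user hive --pass hive \
  --partitioned-by dt --base-path /user/hive/warehouse/stock_ticks_cow \
  --database default --table stock_ticks_cow
{code}
If that runs clean, the fix likely belongs in how run_sync_tool.sh builds its classpath rather than in the bundle itself.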
 


> docker demo fails w/ ClassNotFound w/ LogicalType in latest master
> ------------------------------------------------------------------
>
>                 Key: HUDI-4885
>                 URL: https://issues.apache.org/jira/browse/HUDI-4885
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: dev-experience
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>             Fix For: 0.12.1
>
>
> docker demo fails during hive-sync w/ latest master. 
> also, some of the env variables are not applied. for eg, 
> HUDI_UTILITIES_BUNDLE was not set. after setting it explicitly, I was able to 
> get the 1st ingest working. and then hive sync failed.
>  
> command used:
> {code:java}
> /var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh   --jdbc-url 
> jdbc:hive2://hiveserver:10000   --user hive   --pass hive   --partitioned-by 
> dt   --base-path /user/hive/warehouse/stock_ticks_cow   --database default   
> --table stock_ticks_cow {code}
>  
> output:
> {code:java}
> 2022-09-20 14:24:39,122 INFO  [main] hive.HiveSyncTool 
> (HiveSyncTool.java:syncHoodieTable(179)) - Trying to sync hoodie table 
> stock_ticks_cow with base path /user/hive/warehouse/stock_ticks_cow of type 
> COPY_ON_WRITE
> 2022-09-20 14:24:39,758 INFO  [main] table.TableSchemaResolver 
> (TableSchemaResolver.java:readSchemaFromParquetBaseFile(439)) - Reading 
> schema from 
> /user/hive/warehouse/stock_ticks_cow/2018/08/31/b4a7076c-30e6-4320-bb04-be47246b6646-0_0-29-29_20220920142351042.parquet
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> 2022-09-20 14:24:40,432 INFO  [main] hive.metastore 
> (HiveMetaStoreClient.java:close(564)) - Closed a connection to metastore, 
> current connections: 0
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/avro/LogicalType
>       at 
> org.apache.hudi.common.table.TableSchemaResolver.convertParquetSchemaToAvro(TableSchemaResolver.java:288)
>       at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:121)
>       at 
> org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:566)
>       at org.apache.hudi.util.Lazy.get(Lazy.java:53)
>       at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromLatestCommitMetadata(TableSchemaResolver.java:225)
>       at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaInternal(TableSchemaResolver.java:193)
>       at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:142)
>       at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchema(TableSchemaResolver.java:173)
>       at 
> org.apache.hudi.sync.common.HoodieSyncClient.getStorageSchema(HoodieSyncClient.java:103)
>       at 
> org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:206)
>       at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:153)
>       at 
> org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:141)
>       at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:358)
> Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
>       at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>       ... 13 more {code}
>  
>  
> tried the spark-submit command directly and it succeeded. 
> {code:java}
> spark-submit   --class org.apache.hudi.hive.HiveSyncTool 
> /var/hoodie/ws/packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-0.13.0-SNAPSHOT.jar
>     --database default   --table stock_ticks_cow   --base-path 
> /user/hive/warehouse/stock_ticks_cow    --base-file-format PARQUET   --user 
> hive --pass hive   --jdbc-url jdbc:hive2://hiveserver:10000 
> --partition-value-extractor 
> org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor   
> --partitioned-by dt {code}
>  
> output:
> {code:java}
> spark-submit   --class org.apache.hudi.hive.HiveSyncTool 
> /var/hoodie/ws/packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-0.13.0-SNAPSHOT.jar
>     --database default   --table stock_ticks_cow   --base-path 
> /user/hive/warehouse/stock_ticks_cow    --base-file-format PARQUET   --user 
> hive --pass hive   --jdbc-url jdbc:hive2://hiveserver:10000 
> --partition-value-extractor 
> org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor   
> --partitioned-by dt
> 22/09/20 15:23:09 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 22/09/20 15:23:12 INFO table.HoodieTableMetaClient: Loading 
> HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_cow
> 22/09/20 15:23:12 INFO table.HoodieTableConfig: Loading table properties from 
> /user/hive/warehouse/stock_ticks_cow/.hoodie/hoodie.properties
> 22/09/20 15:23:12 INFO table.HoodieTableMetaClient: Finished Loading Table of 
> type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from 
> /user/hive/warehouse/stock_ticks_cow
> 22/09/20 15:23:12 INFO table.HoodieTableMetaClient: Loading Active commit 
> timeline for /user/hive/warehouse/stock_ticks_cow
> 22/09/20 15:23:12 INFO timeline.HoodieActiveTimeline: Loaded instants upto : 
> Option{val=[20220920142351042__commit__COMPLETED]}
> 22/09/20 15:23:12 INFO jdbc.Utils: Supplied authorities: hiveserver:10000
> 22/09/20 15:23:12 INFO jdbc.Utils: Resolved authority: hiveserver:10000
> 22/09/20 15:23:12 INFO jdbc.HiveConnection: Will try to open client transport 
> with JDBC Uri: jdbc:hive2://hiveserver:10000
> 22/09/20 15:23:13 INFO ddl.QueryBasedDDLExecutor: Successfully established 
> Hive connection to  jdbc:hive2://hiveserver:10000
> 22/09/20 15:23:13 INFO hive.metastore: Trying to connect to metastore with 
> URI thrift://hivemetastore:9083
> 22/09/20 15:23:13 INFO hive.metastore: Connected to metastore.
> 22/09/20 15:23:13 INFO hive.HiveSyncTool: Syncing target hoodie table with 
> hive table(default.stock_ticks_cow). Hive metastore URL :null, basePath 
> :/user/hive/warehouse/stock_ticks_cow
> 22/09/20 15:23:13 INFO hive.HiveSyncTool: Trying to sync hoodie table 
> stock_ticks_cow with base path /user/hive/warehouse/stock_ticks_cow of type 
> COPY_ON_WRITE
> 22/09/20 15:23:14 INFO table.TableSchemaResolver: Reading schema from 
> /user/hive/warehouse/stock_ticks_cow/2018/08/31/b4a7076c-30e6-4320-bb04-be47246b6646-0_0-29-29_20220920142351042.parquet
> 22/09/20 15:23:14 INFO hive.HiveSyncTool: Hive table stock_ticks_cow is not 
> found. Creating it
> 22/09/20 15:23:15 INFO ddl.QueryBasedDDLExecutor: Creating table with CREATE 
> EXTERNAL TABLE IF NOT EXISTS `default`.`stock_ticks_cow`( 
> `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, 
> `_hoodie_record_key` string, `_hoodie_partition_path` string, 
> `_hoodie_file_name` string, `volume` bigint, `ts` string, `symbol` string, 
> `year` int, `month` string, `high` double, `low` double, `key` string, `date` 
> string, `close` double, `open` double, `day` string) PARTITIONED BY (`dt` 
> String) ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH 
> SERDEPROPERTIES 
> ('hoodie.query.as.ro.table'='false','path'='/user/hive/warehouse/stock_ticks_cow')
>  STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
> LOCATION '/user/hive/warehouse/stock_ticks_cow' 
> TBLPROPERTIES('spark.sql.sources.schema.partCol.0'='dt','spark.sql.sources.schema.numParts'='1','spark.sql.sources.schema.numPartCols'='1','spark.sql.sources.provider'='hudi','spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"volume","type":"long","nullable":false,"metadata":{}},{"name":"ts","type":"string","nullable":false,"metadata":{}},{"name":"symbol","type":"string","nullable":false,"metadata":{}},{"name":"year","type":"integer","nullable":false,"metadata":{}},{"name":"month","type":"string","nullable":false,"metadata":{}},{"name":"high","type":"double","nullable":false,"metadata":{}},{"name":"low","type":"double","nullable":false,"metadata":{}},{"name":"key","type":"string","nullable":false,"metadata":{}},{"name":"date","type":"string","nullable":false,"metadata":{}},{"name":"close","type":"double","nullable":false,"metadata":{}},{"name":"open","type":"double","nullable":false,"metadata":{}},{"name":"day","type":"string","nullable":false,"metadata":{}},{"name":"dt","type":"string","nullable":false,"metadata":{}}]}')
> 22/09/20 15:23:15 INFO ddl.QueryBasedDDLExecutor: Executing SQL CREATE 
> EXTERNAL TABLE IF NOT EXISTS `default`.`stock_ticks_cow`( 
> `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, 
> `_hoodie_record_key` string, `_hoodie_partition_path` string, 
> `_hoodie_file_name` string, `volume` bigint, `ts` string, `symbol` string, 
> `year` int, `month` string, `high` double, `low` double, `key` string, `date` 
> string, `close` double, `open` double, `day` string) PARTITIONED BY (`dt` 
> String) ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH 
> SERDEPROPERTIES 
> ('hoodie.query.as.ro.table'='false','path'='/user/hive/warehouse/stock_ticks_cow')
>  STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
> LOCATION '/user/hive/warehouse/stock_ticks_cow' 
> TBLPROPERTIES('spark.sql.sources.schema.partCol.0'='dt','spark.sql.sources.schema.numParts'='1','spark.sql.sources.schema.numPartCols'='1','spark.sql.sources.provider'='hudi','spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"volume","type":"long","nullable":false,"metadata":{}},{"name":"ts","type":"string","nullable":false,"metadata":{}},{"name":"symbol","type":"string","nullable":false,"metadata":{}},{"name":"year","type":"integer","nullable":false,"metadata":{}},{"name":"month","type":"string","nullable":false,"metadata":{}},{"name":"high","type":"double","nullable":false,"metadata":{}},{"name":"low","type":"double","nullable":false,"metadata":{}},{"name":"key","type":"string","nullable":false,"metadata":{}},{"name":"date","type":"string","nullable":false,"metadata":{}},{"name":"close","type":"double","nullable":false,"metadata":{}},{"name":"open","type":"double","nullable":false,"metadata":{}},{"name":"day","type":"string","nullable":false,"metadata":{}},{"name":"dt","type":"string","nullable":false,"metadata":{}}]}')
> 22/09/20 15:23:17 INFO hive.HiveSyncTool: Schema sync complete. Syncing 
> partitions for stock_ticks_cow
> 22/09/20 15:23:17 INFO hive.HiveSyncTool: Last commit time synced was found 
> to be null
> 22/09/20 15:23:17 INFO common.HoodieSyncClient: Last commit time synced is 
> not known, listing all partitions in /user/hive/warehouse/stock_ticks_cow,FS 
> :DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1898513899_1, ugi=root 
> (auth:SIMPLE)]]
> 22/09/20 15:23:17 INFO hive.HiveSyncTool: Storage partitions scan complete. 
> Found 1
> 22/09/20 15:23:17 INFO hive.HiveSyncTool: New Partitions [2018/08/31]
> 22/09/20 15:23:17 INFO ddl.QueryBasedDDLExecutor: Adding partitions 1 to 
> table stock_ticks_cow
> 22/09/20 15:23:17 INFO ddl.QueryBasedDDLExecutor: Executing SQL ALTER TABLE 
> `default`.`stock_ticks_cow` ADD IF NOT EXISTS   PARTITION (`dt`='2018-08-31') 
> LOCATION '/user/hive/warehouse/stock_ticks_cow/2018/08/31' 
> 22/09/20 15:23:18 INFO hive.HiveSyncTool: Sync complete for stock_ticks_cow
> 22/09/20 15:23:18 INFO util.ShutdownHookManager: Shutdown hook called
> 22/09/20 15:23:18 INFO util.ShutdownHookManager: Deleting directory 
> /tmp/spark-59275d30-b1bb-4f7d-af85-bce24962ca1e {code}
>  


