soumilshah1995 opened a new issue, #10231: URL: https://github.com/apache/hudi/issues/10231
Hello everyone, I'm encountering a small issue that seems to be related to settings, and I would appreciate any guidance in identifying the problem. This pertains to my upcoming videos, where I'm covering the Hudi Hive Sync tool in detail.

I've started the Spark Thrift Server using the following command:

```
spark-submit \
  --master 'local[*]' \
  --conf spark.executor.extraJavaOptions=-Duser.timezone=Etc/UTC \
  --conf spark.eventLog.enabled=false \
  --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 \
  --name "Thrift JDBC/ODBC Server" \
  --executor-memory 512m \
  --packages org.apache.spark:spark-hive_2.12:3.4.0
```

Additionally, I have Beeline installed and connected to the default database:

```
beeline -u jdbc:hive2://localhost:10000/default
```

While the Delta Streamer job itself runs fine, I'm running into issues when it syncs to the Hive Metastore. Here's my spark-submit command for the Hudi Delta Streamer:

```
spark-submit \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0,org.apache.hadoop:hadoop-aws:3.3.2' \
  --repositories 'https://repo.maven.apache.org/maven2' \
  --properties-file /Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E5/spark-config.properties \
  --master 'local[*]' \
  --executor-memory 1g \
  /Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E5/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
  --table-type COPY_ON_WRITE \
  --op UPSERT \
  --enable-hive-sync \
  --source-ordering-field ts \
  --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
  --target-base-path file:///Users/soumilshah/Downloads/hudidb/ \
  --target-table orders \
  --props hudi_tbl.props
```

Hudi config (`hudi_tbl.props`):

```
hoodie.datasource.write.recordkey.field=order_id
hoodie.datasource.write.partitionpath.field=order_date
hoodie.streamer.source.dfs.root=file:////Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E5/sampledata/orders
hoodie.datasource.write.precombine.field=ts
hoodie.deltastreamer.csv.header=true
hoodie.deltastreamer.csv.sep=\t
hoodie.datasource.hive_sync.enable=true
hoodie.datasource.hive_sync.mode=jdbc
hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://localhost:10000
hoodie.datasource.hive_sync.database=default
hoodie.datasource.hive_sync.table=orders
hoodie.datasource.hive_sync.partition_fields=order_date
```

Spark config (`spark-config.properties`):

```
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.sql.hive.convertMetastoreParquet=false
```

The error I'm encountering is:

```
Required table missing : "VERSION" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "VERSION" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
	at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3385)
```

Any assistance in identifying what might be missing or misconfigured would be highly appreciated. Thank you!
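
For what it's worth, the stack trace itself hints at one likely cause: the metastore's backing database (embedded Derby, unless configured otherwise) has never had its schema initialized, so DataNucleus cannot find its `VERSION` table. One hedged possibility, assuming the Thrift Server picks up a `hive-site.xml` from its conf directory, is to let DataNucleus create the schema tables automatically, exactly as the error message suggests:

```xml
<!-- conf/hive-site.xml — illustrative sketch only; the property name is
     taken verbatim from the error message, its placement here is an
     assumption about this particular setup -->
<configuration>
  <property>
    <name>datanucleus.schema.autoCreateTables</name>
    <value>true</value>
  </property>
</configuration>
```

Alternatively, Hive's `schematool` (e.g. `$HIVE_HOME/bin/schematool -dbType derby -initSchema`, assuming an embedded Derby metastore) can initialize the schema explicitly before the Thrift Server is started.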