Hi,

Thanks for your response.

I tried to convert an existing dataset to a Hudi-managed dataset using the hdfsparquetimport command in hudi-cli, and I ran into the following error:

19/10/08 09:50:59 INFO DAGScheduler: Job 1 failed: countByKey at HoodieBloomIndex.java:148, took 2.913761 s
19/10/08 09:50:59 ERROR HDFSParquetImporter: Error occurred.
org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20191008095056
Caused by: org.apache.hudi.exception.HoodieIOException: partition key is missing. :session_date

My hudi-cli command was:

hdfsparquetimport --upsert false --srcPath /path/to/source --targetPath /path/to/target --tableName xxx --tableType COPY_ON_WRITE --rowKeyField _row_key --partitionPathField session_date --parallelism 1500 --schemaFilePath /path/to/avro/schema --format parquet --sparkMemory 6g --retry 2

Could you please tell me how to solve this problem?

Thanks.

Best,
Qian

On Oct 6, 2019, 9:15 AM -0700, Qian Wang <[email protected]>, wrote:
> Hi,
>
> I have some questions as I try to use Hudi in my company's prod env:
>
> 1. When I migrate a history table in HDFS, I tried using the hudi-cli and
> HDFSParquetImporter tool. How can I specify Spark parameters in this tool,
> such as the YARN queue, etc.?
> 2. Hudi needs to write metadata to Hive, and it uses HiveMetastoreClient and
> HiveJDBC. What should I do if Hive has Kerberos authentication enabled?
>
> Thanks.
>
> Best,
> Qian
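Since the error complains that the partition key session_date is missing, one quick sanity check is to confirm that the field declared with --partitionPathField actually appears in the Avro schema passed via --schemaFilePath. A minimal stdlib-only sketch follows; the inline schema here is a hypothetical stand-in for the real schema file, and has_field is an illustrative helper, not part of Hudi:

```python
import json

def has_field(avro_schema: dict, field_name: str) -> bool:
    """Return True if a top-level field with the given name exists
    in an Avro record schema."""
    return any(f.get("name") == field_name
               for f in avro_schema.get("fields", []))

# Hypothetical schema resembling the table in the command above.
# The real schema would instead be loaded from the --schemaFilePath file:
#   with open("/path/to/avro/schema") as fh:
#       schema = json.load(fh)
schema = {
    "type": "record",
    "name": "Session",
    "fields": [
        {"name": "_row_key", "type": "string"},
        {"name": "session_date", "type": "string"},
    ],
}

print(has_field(schema, "session_date"))  # True: partition field is declared
print(has_field(schema, "some_other_col"))  # False: field absent from schema
```

Even when the field is declared in the schema, the same error can appear if individual records carry a null value for it, so checking the source data for nulls in session_date is also worthwhile.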
