Hi,

Thanks for your response.

I tried to convert an existing dataset to a Hudi-managed dataset using the hdfsparquetimport command in hudi-cli, and I ran into the following error:

19/10/08 09:50:59 INFO DAGScheduler: Job 1 failed: countByKey at HoodieBloomIndex.java:148, took 2.913761 s
19/10/08 09:50:59 ERROR HDFSParquetImporter: Error occurred.
org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20191008095056
Caused by: org.apache.hudi.exception.HoodieIOException: partition key is missing. :session_date

My hudi-cli command was:

hdfsparquetimport --upsert false --srcPath /path/to/source --targetPath /path/to/target --tableName xxx --tableType COPY_ON_WRITE --rowKeyField _row_key --partitionPathField session_date --parallelism 1500 --schemaFilePath /path/to/avro/schema --format parquet --sparkMemory 6g --retry 2

Could you please tell me how to solve this problem?

Thanks.

Best,
Qian

On Oct 6, 2019, 9:15 AM -0700, Qian Wang <[email protected]>, wrote:
> Hi,
>
> I have some questions as I try to use Hudi in my company's prod env:
>
> 1. When I migrate a history table in HDFS, I tried using the hudi-cli and
> HDFSParquetImporter tool. How can I specify Spark parameters in this tool,
> such as the YARN queue, etc.?
> 2. Hudi needs to write metadata to Hive, and it uses HiveMetastoreClient and
> HiveJDBC. What should I do if Hive has Kerberos authentication enabled?
>
> Thanks.
>
> Best,
> Qian
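Since the error complains that the partition key session_date is missing, one quick sanity check is to confirm that the field declared with --partitionPathField actually appears in the Avro schema passed via --schemaFilePath. A minimal stdlib-only sketch follows; the inline schema here is a hypothetical stand-in for the real schema file, and has_field is an illustrative helper, not part of Hudi:

```python
import json

def has_field(avro_schema: dict, field_name: str) -> bool:
    """Return True if a top-level field with the given name exists
    in an Avro record schema."""
    return any(f.get("name") == field_name
               for f in avro_schema.get("fields", []))

# Hypothetical schema resembling the table in the command above.
# The real schema would instead be loaded from the --schemaFilePath file:
#   with open("/path/to/avro/schema") as fh:
#       schema = json.load(fh)
schema = {
    "type": "record",
    "name": "Session",
    "fields": [
        {"name": "_row_key", "type": "string"},
        {"name": "session_date", "type": "string"},
    ],
}

print(has_field(schema, "session_date"))  # True: partition field is declared
print(has_field(schema, "some_other_col"))  # False: field absent from schema
```

Even when the field is declared in the schema, the same error can appear if individual records carry a null value for it, so checking the source data for nulls in session_date is also worthwhile.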
