Qian,
It looks like the partitionPathField that you specified (session_date) is
missing or the code is unable to grab it from your payload. Is this field a
top-level field or a nested field in your schema?
(Currently, the HDFSParquetImporter tool looks for your partitionPathField
only at the top level, for example genericRecord.get("session_date").)
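
To illustrate, here is a simplified sketch of that top-level-only lookup,
assuming your payload is an Avro GenericRecord (this is not the actual
importer code, just the shape of the behavior):

    import org.apache.avro.generic.GenericRecord;
    import org.apache.hudi.exception.HoodieIOException;

    class PartitionPathLookup {
      static String getPartitionPath(GenericRecord record) {
        // Top-level lookup only; a nested field such as
        // session.session_date comes back null here and surfaces as the
        // "partition key is missing" error you saw.
        Object val = record.get("session_date");
        if (val == null) {
          throw new HoodieIOException("partition key is missing. :session_date");
        }
        return val.toString();
      }
    }

If session_date is nested, flattening it into a top-level column in your
source data before running the import should work around this.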
Thanks,
Nishith
On Tue, Oct 8, 2019 at 10:12 AM Qian Wang <[email protected]> wrote:
> Hi,
>
> Thanks for your response.
>
> Now I tried to convert an existing dataset to a Hudi-managed dataset, using
> the hdfsparquetimport command in hudi-cli. I encountered the following error:
>
> 19/10/08 09:50:59 INFO DAGScheduler: Job 1 failed: countByKey at
> HoodieBloomIndex.java:148, took 2.913761 s
> 19/10/08 09:50:59 ERROR HDFSParquetImporter: Error occurred.
> org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for
> commit time 20191008095056
>
> Caused by: org.apache.hudi.exception.HoodieIOException: partition key is
> missing. :session_date
>
> My command in hudi-cli is as follows:
> hdfsparquetimport --upsert false --srcPath /path/to/source --targetPath
> /path/to/target --tableName xxx --tableType COPY_ON_WRITE --rowKeyField
> _row_key --partitionPathField session_date --parallelism 1500
> --schemaFilePath /path/to/avro/schema --format parquet --sparkMemory 6g
> --retry 2
>
> Could you please tell me how to solve this problem? Thanks.
>
> Best,
> Qian
> On Oct 6, 2019, 9:15 AM -0700, Qian Wang <[email protected]>, wrote:
> > Hi,
> >
> > I have some questions when I try to use Hudi in my company’s prod env:
> >
> > 1. When I migrate history tables in HDFS, I tried to use hudi-cli and the
> > HDFSParquetImporter tool. How can I specify Spark parameters for this
> > tool, such as the YARN queue, etc.?
> > 2. Hudi needs to write metadata to Hive, and it uses HiveMetastoreClient
> > and Hive JDBC. What should I do if Hive has Kerberos authentication
> > enabled?
> >
> > Thanks.
> >
> > Best,
> > Qian
>