Hi Nishith,

I have checked the data; there are no nulls in that field. Is there any other possible cause of this error?
Thanks,
Qian

On Oct 8, 2019, 10:55 AM -0700, Qian Wang <[email protected]>, wrote:
> Hi Nishith,
>
> Thanks for your response.
> The session_date is one field in my original dataset. I have some questions
> about the schema parameter:
>
> 1. Do I need to create the target table?
> 2. My source data is in Parquet format; why does the tool need a schema file as a
> parameter?
> 3. Can I use a schema file in Avro format?
>
> The schema looks like:
>
> {"type":"record","name":"PathExtractData","doc":"Path event extract fact data","fields":[
> {"name":"SESSION_DATE","type":"string"},
> {"name":"SITE_ID","type":"int"},
> {"name":"GUID","type":"string"},
> {"name":"SESSION_KEY","type":"long"},
> {"name":"USER_ID","type":"string"},
> {"name":"STEP","type":"int"},
> {"name":"PAGE_ID","type":"int"}
> ]}
>
> Thanks.
>
> Best,
> Qian
> On Oct 8, 2019, 10:47 AM -0700, nishith agarwal <[email protected]>, wrote:
> > Qian,
> >
> > It looks like the partitionPathField that you specified (session_date) is
> > missing, or the code is unable to grab it from your payload. Is this field a
> > top-level field or a nested field in your schema?
> > (Currently, the HDFSParquetImporter tool looks for your partitionPathField only at
> > the top level, for example genericRecord.get("session_date").)
> >
> > Thanks,
> > Nishith
> >
> >
> > On Tue, Oct 8, 2019 at 10:12 AM Qian Wang <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > Thanks for your response.
> > >
> > > Now I am trying to convert an existing dataset to a Hudi-managed dataset,
> > > using hdfsparquetimport in hudi-cli. I encountered the following error:
> > >
> > > 19/10/08 09:50:59 INFO DAGScheduler: Job 1 failed: countByKey at
> > > HoodieBloomIndex.java:148, took 2.913761 s
> > > 19/10/08 09:50:59 ERROR HDFSParquetImporter: Error occurred.
> > > org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for
> > > commit time 20191008095056
> > >
> > > Caused by: org.apache.hudi.exception.HoodieIOException: partition key is
> > > missing. :session_date
> > >
> > > My command in hudi-cli is as follows:
> > >
> > > hdfsparquetimport --upsert false --srcPath /path/to/source --targetPath
> > > /path/to/target --tableName xxx --tableType COPY_ON_WRITE --rowKeyField
> > > _row_key --partitionPathField session_date --parallelism 1500
> > > --schemaFilePath /path/to/avro/schema --format parquet --sparkMemory 6g
> > > --retry 2
> > >
> > > Could you please tell me how to solve this problem? Thanks.
> > >
> > > Best,
> > > Qian
> > > On Oct 6, 2019, 9:15 AM -0700, Qian Wang <[email protected]>, wrote:
> > > > Hi,
> > > >
> > > > I have some questions about using Hudi in my company's prod env:
> > > >
> > > > 1. When I migrate history tables in HDFS, I tried hudi-cli and the
> > > > HDFSParquetImporter tool. How can I specify Spark parameters for this tool,
> > > > such as the YARN queue?
> > > > 2. Hudi needs to write metadata to Hive, and it uses HiveMetastoreClient
> > > > and Hive JDBC. What should I do if Hive has Kerberos authentication enabled?
> > > >
> > > > Thanks.
> > > >
> > > > Best,
> > > > Qian
> > >
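[Editor's note] The lookup Nishith describes, genericRecord.get("session_date"), is top-level and case-sensitive, so it can come back empty even when the data has no nulls: note that the schema in the thread declares SESSION_DATE in upper case while the command passes session_date in lower case. A minimal Python sketch of that failure mode, using a plain dict as a stand-in for an Avro GenericRecord (the records and helper below are hypothetical, for illustration only):

```python
def extract_partition_path(record, field):
    """Mimic a top-level, case-sensitive genericRecord.get(field) lookup."""
    value = record.get(field)
    if value is None:
        # Mirrors the error message seen in the thread.
        raise ValueError("partition key is missing. :" + field)
    return str(value)

# Record shaped like the posted schema: the field exists, but in upper case.
flat = {"SESSION_DATE": "2019-10-08", "SITE_ID": 0}
# Record where the field is nested one level down.
nested = {"session": {"session_date": "2019-10-08"}}

for record in (flat, nested):
    try:
        extract_partition_path(record, "session_date")
    except ValueError as exc:
        print(exc)  # partition key is missing. :session_date
```

Both lookups fail for the same reason the importer reports: a top-level `get` never sees a differently-cased or nested field, so checking for nulls alone will not surface the problem.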
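[Editor's note] Separately, the schema as pasted in the thread contains word-processor "smart quotes" around several field names, which would make it invalid both to `json.loads` and to Avro's schema parser. A quick stdlib check that the cleaned-up schema text parses and lists the expected field names:

```python
import json

# The Avro schema from the thread, with plain ASCII quotes throughout.
schema_text = '''{"type":"record","name":"PathExtractData","doc":"Path event extract fact data","fields":[
{"name":"SESSION_DATE","type":"string"},
{"name":"SITE_ID","type":"int"},
{"name":"GUID","type":"string"},
{"name":"SESSION_KEY","type":"long"},
{"name":"USER_ID","type":"string"},
{"name":"STEP","type":"int"},
{"name":"PAGE_ID","type":"int"}
]}'''

schema = json.loads(schema_text)
field_names = [f["name"] for f in schema["fields"]]
print(field_names[0])  # SESSION_DATE
```

Note that "session_date" does not appear in `field_names`; only "SESSION_DATE" does, which is consistent with the case-sensitivity issue discussed above.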
