tooptoop4 opened a new issue #1954:
URL: https://github.com/apache/hudi/issues/1954


   I'm loading data from DMS and I don't want any partitions (I did not specify hoodie.datasource.hive_sync.partition_fields, since the website says it can be left at the default, which is empty).
   
   ```
   /home/ec2-user/spark_home/bin/spark-submit \
     --conf "spark.hadoop.fs.s3a.proxy.host=redact" \
     --conf "spark.hadoop.fs.s3a.proxy.port=redact" \
     --conf "spark.driver.extraClassPath=/home/ec2-user/json-20090211.jar" \
     --conf "spark.executor.extraClassPath=/home/ec2-user/json-20090211.jar" \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     --jars "/home/ec2-user/spark-avro_2.11-2.4.6.jar" \
     --master spark://redact:7077 \
     --deploy-mode client \
     /home/ec2-user/hudi-utilities-bundle_2.11-0.5.3-1.jar \
     --table-type COPY_ON_WRITE \
     --source-ordering-field TimeCreated \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --enable-hive-sync \
     --hoodie-conf hoodie.datasource.hive_sync.database=redact \
     --hoodie-conf hoodie.datasource.hive_sync.table=dmstest_multpk4 \
     --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor \
     --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false \
     --target-base-path s3a://redact/my2/multpk4 \
     --target-table dmstest_multpk4 \
     --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
     --hoodie-conf hoodie.datasource.write.recordkey.field=version_no,group_company \
     --hoodie-conf hoodie.datasource.write.partitionpath.field=sys_user \
     --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tblhere \
     > multpk4.log
   ```
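
   For context on what a non-partitioned run would look like: my understanding (untested; the class names below are my assumption from the docs rather than something I have verified on 0.5.3) is that a truly non-partitioned table would drop the partitionpath field and use the non-partitioned key generator and extractor instead of ComplexKeyGenerator + MultiPartKeysValueExtractor, roughly:

   ```
   # Sketch only (untested assumption, not what I ran): configs I believe would
   # produce a non-partitioned table instead of partitioning by sys_user.
   --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
   --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor
   # and drop: --hoodie-conf hoodie.datasource.write.partitionpath.field=sys_user
   ```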
   
   ```
   2020-08-12 11:31:11,186 [main] INFO  
org.apache.hudi.client.AbstractHoodieWriteClient - Committed 20200812112840
   2020-08-12 11:31:11,189 [main] INFO  
org.apache.hudi.utilities.deltastreamer.DeltaSync - Commit 20200812112840 
successful!
   2020-08-12 11:31:11,194 [main] INFO  
org.apache.hudi.utilities.deltastreamer.DeltaSync - Syncing target hoodie table 
with hive table(dmstest_multpk4). Hive metastore URL 
:jdbc:hive2://localhost:10000, basePath :s3a://redact/my2/multpk4
   2020-08-12 11:31:11,960 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants 
[[20200812112840__commit__COMPLETED]]
   2020-08-12 11:31:14,264 [main] INFO  org.apache.hudi.hive.HiveSyncTool - 
Trying to sync hoodie table dmstest_multpk4 with base path 
s3a://redact/my2/multpk4 of type COPY_ON_WRITE
   2020-08-12 11:31:14,707 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - 
Reading schema from 
s3a://redact/my2/multpk4/mpark2/7ed7627c-6110-4d42-9df2-f3a6afe877df-0_187-25-15737_20200812112840.parquet
   2020-08-12 11:31:15,330 [main] INFO  org.apache.hudi.hive.HiveSyncTool - 
Hive table dmstest_multpk4 is not found. Creating it
   2020-08-12 11:31:15,337 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - 
Creating table with CREATE EXTERNAL TABLE  IF NOT EXISTS 
`redact`.`dmstest_multpk4`( `_hoodie_commit_time` string, 
`_hoodie_commit_seqno` string, `_hoodie_record_key` string, 
`_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` 
int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` 
string, `org_mnem` string, `org_parent` int, `percent_holding` double, 
`group_company` string, `grp_ord_for_cln` string, `mkt_only` string, 
`pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, 
`alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` 
string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` 
string, `create_date` bigint, `cntry_of_dom` string, `client` string, 
`alert_acronym` string, `oneoff_client` string, `booking_domicile` string, 
`booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 
's3a://redact/my2/multpk4'
   2020-08-12 11:31:15,411 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - 
Time taken to start SessionState and create Driver: 74 ms
   2020-08-12 11:31:15,444 [main] INFO  hive.ql.parse.ParseDriver - Parsing 
command: CREATE EXTERNAL TABLE  IF NOT EXISTS `redact`.`dmstest_multpk4`( 
`_hoodie_commit_time` string, `_hoodie_commit_seqno` string, 
`_hoodie_record_key` string, `_hoodie_partition_path` string, 
`_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, 
`org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, 
`org_parent` int, `percent_holding` double, `group_company` string, 
`grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, 
`show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, 
`swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, 
`version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, 
`cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` 
string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` 
bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT
'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 
's3a://redact/my2/multpk4'
   2020-08-12 11:31:16,131 [main] INFO  hive.ql.parse.ParseDriver - Parse 
Completed
   2020-08-12 11:31:16,568 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - 
Time taken to execute [CREATE EXTERNAL TABLE  IF NOT EXISTS 
`redact`.`dmstest_multpk4`( `_hoodie_commit_time` string, 
`_hoodie_commit_seqno` string, `_hoodie_record_key` string, 
`_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` 
int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` 
string, `org_mnem` string, `org_parent` int, `percent_holding` double, 
`group_company` string, `grp_ord_for_cln` string, `mkt_only` string, 
`pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, 
`alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` 
string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` 
string, `create_date` bigint, `cntry_of_dom` string, `client` string, 
`alert_acronym` string, `oneoff_client` string, `booking_domicile` string, 
`booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED
AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 
's3a://redact/my2/multpk4']: 1157 ms
   2020-08-12 11:31:16,574 [main] INFO  org.apache.hudi.hive.HiveSyncTool - 
Schema sync complete. Syncing partitions for dmstest_multpk4
   2020-08-12 11:31:16,574 [main] INFO  org.apache.hudi.hive.HiveSyncTool - 
Last commit time synced was found to be null
   2020-08-12 11:31:16,575 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - 
Last commit time synced is not known, listing all partitions in 
s3a://redact/my2/multpk4,FS :S3AFileSystem{uri=s3a://redact, 
workingDir=s3a://redact/user/ec2-user, inputPolicy=normal, partSize=104857600, 
enableMultiObjectsDelete=true, maxKeys=5000, readAhead=65536, 
blockSize=33554432, multiPartThreshold=2147483647, 
serverSideEncryptionAlgorithm='AES256', 
blockFactory=org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory@62765aec, 
boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=2405,
 available=2405, waiting=0}, activeCount=0}, 
unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@6f5bd362[Running, 
pool size = 6, active threads = 0, queued tasks = 0, completed tasks = 6], 
statistics {761530 bytes read, 320081 bytes written, 712 read ops, 0 large read 
ops, 31 write ops}, metrics {{Context=S3AFileSystem} 
{FileSystemId=db54a51b-e05e-4b3c-9140-240762a0c03d-redact} {fsURI=s3a://redact/redact/sparkevents} {files_created=5} {files_copied=0}
{files_copied_bytes=0} {files_deleted=271} {fake_directories_deleted=0} 
{directories_created=6} {directories_deleted=0} {ignored_errors=4} 
{op_copy_from_local_file=0} {op_exists=53} {op_get_file_status=415} 
{op_glob_status=0} {op_is_directory=38} {op_is_file=0} {op_list_files=271} 
{op_list_located_status=0} {op_list_status=19} {op_mkdirs=5} {op_rename=0} 
{object_copy_requests=0} {object_delete_requests=5} {object_list_requests=680} 
{object_continue_list_requests=0} {object_metadata_requests=805} 
{object_multipart_aborted=0} {object_put_bytes=320081} {object_put_requests=10} 
{object_put_requests_completed=10} {stream_write_failures=0} 
{stream_write_block_uploads=0} {stream_write_block_uploads_committed=0} 
{stream_write_block_uploads_aborted=0} {stream_write_total_time=0} 
{stream_write_total_data=320081} {object_put_requests_active=0} 
{object_put_bytes_pending=0} {stream_write_block_uploads_active=0} {stream_write_block_uploads_pending=4} {stream_write_block_uploads_data_pending=0}
{stream_read_fully_operations=0} {stream_opened=22} 
{stream_bytes_skipped_on_seek=0} {stream_closed=22} 
{stream_bytes_backwards_on_seek=437965} {stream_bytes_read=761530} 
{stream_read_operations_incomplete=107} {stream_bytes_discarded_in_abort=0} 
{stream_close_operations=22} {stream_read_operations=3020} {stream_aborted=0} 
{stream_forward_seek_operations=0} {stream_backward_seek_operations=1} 
{stream_seek_operations=1} {stream_bytes_read_in_close=8} 
{stream_read_exceptions=0} }}
   2020-08-12 11:31:34,438 [main] INFO  org.apache.hudi.hive.HiveSyncTool - 
Storage partitions scan complete. Found 271
   2020-08-12 11:31:34,476 [main] INFO  org.apache.hudi.hive.HiveSyncTool - New 
Partitions [AAB, redactlist]
   2020-08-12 11:31:34,476 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - 
Adding partitions 271 to table dmstest_multpk4
   2020-08-12 11:31:34,477 [main] ERROR org.apache.hudi.hive.HiveSyncTool - Got 
runtime exception when hive syncing
   org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for 
table dmstest_multpk4
           at 
org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:187)
           at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:126)
           at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:87)
           at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:460)
           at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:402)
           at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:235)
           at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:123)
           at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:380)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
           at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
           at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: java.lang.IllegalArgumentException: Partition key parts [] does 
not match with partition values [AAB]. Check partition strategy.
           at 
org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
           at 
org.apache.hudi.hive.HoodieHiveClient.getPartitionClause(HoodieHiveClient.java:182)
           at 
org.apache.hudi.hive.HoodieHiveClient.constructAddPartitions(HoodieHiveClient.java:166)
           at 
org.apache.hudi.hive.HoodieHiveClient.addPartitionsToTable(HoodieHiveClient.java:141)
           at 
org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:182)
           ... 19 more
   2020-08-12 11:31:34,513 [main] INFO  
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Shut down 
deltastreamer
   2020-08-12 11:31:34,535 [main] INFO  
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Shutting down 
all executors
   ```
   ```
   aws s3 ls s3://redact/my2/multpk4/
                              PRE .hoodie/
                              PRE AAB/
                              PRE CC/
                              PRE DD/
                              ...etc
   ```
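
   If I'm reading the error right, the failure is a mismatch between the zero partition fields declared to Hive sync (hoodie.datasource.hive_sync.partition_fields left empty) and the one-level partition paths the writer created from hoodie.datasource.write.partitionpath.field=sys_user (the AAB/, CC/, DD/ prefixes above), so building the ADD PARTITION clause throws "Partition key parts [] does not match with partition values [AAB]". The other variant, assuming I keep the sys_user partitioning, would presumably be to declare the matching field to Hive sync (again untested, my assumption):

   ```
   # Sketch only (untested assumption): keep the sys_user partitioning and
   # declare the matching partition field to Hive sync.
   --hoodie-conf hoodie.datasource.write.partitionpath.field=sys_user
   --hoodie-conf hoodie.datasource.hive_sync.partition_fields=sys_user
   --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
   ```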

