tooptoop4 commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-672843556


   Got a bit further with the command below: the Hudi/Spark job now succeeds, but the Hive DDL points at the wrong S3 location, so SELECTs from Hive/Presto fail. When I manually alter the S3 location in the table DDL via HiveServer2 (i.e. change LOCATION 's3a://redact/my2/multpk7' to LOCATION 's3a://redact/my2/multpk7/default'), the queries work, so I think a code change is needed so that the table gets created at the proper S3 location.
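
   For reference, the manual workaround I applied via HiveServer2 looks roughly like this (a sketch; exact syntax may vary by Hive version, and `redact` is the hive_sync database from the config):

   ```
   -- Point the table at the directory that actually holds the parquet files.
   -- Hive sync registered the table at the base path, but the data was written
   -- one level deeper, under the "default" partition directory.
   ALTER TABLE `redact`.`dmstest_multpk7` SET LOCATION 's3a://redact/my2/multpk7/default';
   ```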
   
   ```
   /home/ec2-user/spark_home/bin/spark-submit \
     --conf "spark.hadoop.fs.s3a.proxy.host=redact" \
     --conf "spark.hadoop.fs.s3a.proxy.port=redact" \
     --conf "spark.driver.extraClassPath=/home/ec2-user/json-20090211.jar" \
     --conf "spark.executor.extraClassPath=/home/ec2-user/json-20090211.jar" \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     --jars "/home/ec2-user/spark-avro_2.11-2.4.6.jar" \
     --master spark://redact:7077 \
     --deploy-mode client \
     /home/ec2-user/hudi-utilities-bundle_2.11-0.5.3-1.jar \
     --table-type COPY_ON_WRITE \
     --source-ordering-field TimeCreated \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --enable-hive-sync \
     --hoodie-conf hoodie.datasource.hive_sync.database=redact \
     --hoodie-conf hoodie.datasource.hive_sync.table=dmstest_multpk7 \
     --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor \
     --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false \
     --target-base-path s3a://redact/my2/multpk7 \
     --target-table dmstest_multpk7 \
     --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
     --hoodie-conf hoodie.datasource.write.recordkey.field=version_no,group_company \
     --hoodie-conf "hoodie.datasource.write.partitionpath.field=" \
     --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl > multpk7.log
   OK
   ```
   
   cat multpk7.log
   ```
   2020-08-12 12:18:15,375 [main] WARN  
org.apache.hudi.utilities.deltastreamer.SchedulerConfGenerator - Job Scheduling 
Configs will not be in effect as spark.scheduler.mode is not set to FAIR at 
instantiation time. Continuing without scheduling configs
   2020-08-12 12:18:16,386 [dispatcher-event-loop-3] INFO  
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Connected to 
Spark cluster with app ID app-20200812121816-0086
   2020-08-12 12:18:17,199 [main] INFO  com.amazonaws.http.AmazonHttpClient - 
Configuring Proxy. redact
   2020-08-12 12:18:18,154 [main] INFO  
org.apache.spark.scheduler.EventLoggingListener - Logging events to 
s3a://redact/sparkevents/app-20200812121816-0086
   2020-08-12 12:18:18,171 [dispatcher-event-loop-2] INFO  
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Granted 
executor ID app-20200812121816-0086/0 on hostPort redact:19629 with 4 core(s), 
7.9 GB RAM
   2020-08-12 12:18:18,195 [main] INFO  
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - 
SchedulerBackend is ready for scheduling beginning after reached 
minRegisteredResourcesRatio: 0.0
   2020-08-12 12:18:18,427 [main] WARN  org.apache.spark.SparkContext - Using 
an existing SparkContext; some configuration may not take effect.
   2020-08-12 12:18:18,526 [main] ERROR 
org.apache.hudi.common.util.DFSPropertiesConfiguration - Error reading in 
properies from dfs
   java.io.FileNotFoundException: File 
file:/home/ec2-user/http_listener/logs/src/test/resources/delta-streamer-config/dfs-source.properties
 does not exist
           at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
           at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
           at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
           at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
           at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
           at 
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
           at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
           at 
org.apache.hudi.common.util.DFSPropertiesConfiguration.visitFile(DFSPropertiesConfiguration.java:87)
           at 
org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:60)
           at 
org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:64)
           at 
org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:118)
           at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:451)
           at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:97)
           at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:91)
           at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:380)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
           at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
           at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   2020-08-12 12:18:18,528 [main] WARN  org.apache.hudi.utilities.UtilHelpers - 
Unexpected error read props file at 
:file:/home/ec2-user/http_listener/logs/src/test/resources/delta-streamer-config/dfs-source.properties
   java.lang.IllegalArgumentException: Cannot read properties from dfs
           at 
org.apache.hudi.common.util.DFSPropertiesConfiguration.visitFile(DFSPropertiesConfiguration.java:91)
           at 
org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:60)
           at 
org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:64)
           at 
org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:118)
           at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:451)
           at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:97)
           at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:91)
           at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:380)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
           at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
           at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: java.io.FileNotFoundException: File 
file:/home/ec2-user/http_listener/logs/src/test/resources/delta-streamer-config/dfs-source.properties
 does not exist
           at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
           at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
           at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
           at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
           at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
           at 
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
           at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
           at 
org.apache.hudi.common.util.DFSPropertiesConfiguration.visitFile(DFSPropertiesConfiguration.java:87)
           ... 19 more
   2020-08-12 12:18:18,528 [main] INFO  org.apache.hudi.utilities.UtilHelpers - 
Adding overridden properties to file properties.
   2020-08-12 12:18:18,529 [main] INFO  
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Creating delta 
streamer with configs : {hoodie.datasource.hive_sync.use_jdbc=false, 
hoodie.datasource.write.recordkey.field=version_no,group_company, 
hoodie.datasource.write.partitionpath.field=, 
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator,
 
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor,
 hoodie.datasource.hive_sync.table=dmstest_multpk7, 
hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl, 
hoodie.datasource.hive_sync.database=redact}
   2020-08-12 12:18:18,533 [main] INFO  
org.apache.hudi.utilities.deltastreamer.DeltaSync - Creating delta streamer 
with configs : {hoodie.datasource.hive_sync.use_jdbc=false, 
hoodie.datasource.write.recordkey.field=version_no,group_company, 
hoodie.datasource.write.partitionpath.field=, 
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator,
 
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor,
 hoodie.datasource.hive_sync.table=dmstest_multpk7, 
hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl, 
hoodie.datasource.hive_sync.database=redact}
   2020-08-12 12:18:19,798 [main] INFO  
org.apache.hudi.utilities.deltastreamer.DeltaSync - Setting up Hoodie Write 
Client
   2020-08-12 12:18:19,799 [main] INFO  
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Delta Streamer 
running only single round
   2020-08-12 12:18:20,218 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
   2020-08-12 12:18:20,222 [main] INFO  
org.apache.hudi.utilities.deltastreamer.DeltaSync - Checkpoint to resume from : 
Option{val=null}
   2020-08-12 12:18:42,136 [main] INFO  
org.apache.hudi.utilities.deltastreamer.DeltaSync - Setting up Hoodie Write 
Client
   2020-08-12 12:18:42,156 [main] INFO  
org.apache.hudi.utilities.deltastreamer.DeltaSync - Registering Schema 
:[{"type":"record","name":"hoodie_source","namespace":"hoodie.source","fields":[{"name":"Op","type":["string","null"]},{"name":"Id","type":["int","null"]},{"name":"AuditProcessHistoryId","type":["int","null"]},{"name":"org_id","type":["int","null"]},{"name":"org_name","type":["string","null"]},{"name":"org_sname","type":["string","null"]},{"name":"org_mnem","type":["string","null"]},{"name":"org_parent","type":["int","null"]},{"name":"percent_holding","type":["double","null"]},{"name":"group_company","type":["string","null"]},{"name":"grp_ord_for_cln","type":["string","null"]},{"name":"mkt_only","type":["string","null"]},{"name":"pro_rate_ind","type":["string","null"]},{"name":"show_shapes","type":["string","null"]},{"name":"sec_code_pref","type":["string","null"]},{"name":"alert_org_ref","type":["string","null"]},{"name":"swift_bic","type":["string","null"]},{"name":"exec_b
 
reakdown","type":["string","null"]},{"name":"notes","type":["string","null"]},{"name":"active","type":["string","null"]},{"name":"version_no","type":["int","null"]},{"name":"sys_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"sys_user","type":["string","null"]},{"name":"create_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"cntry_of_dom","type":["string","null"]},{"name":"client","type":["string","null"]},{"name":"alert_acronym","type":["string","null"]},{"name":"oneoff_client","type":["string","null"]},{"name":"booking_domicile","type":["string","null"]},{"name":"booking_dom_list","type":["string","null"]},{"name":"TimeCreated","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"UserCreated","type":["string","null"]}]},
 
{"type":"record","name":"hoodie_source","namespace":"hoodie.source","fields":[{"name":"Op","type":["string","null"]},{"name":"Id","type":["int","null"]},{"name":"AuditProcessHis
 
toryId","type":["int","null"]},{"name":"org_id","type":["int","null"]},{"name":"org_name","type":["string","null"]},{"name":"org_sname","type":["string","null"]},{"name":"org_mnem","type":["string","null"]},{"name":"org_parent","type":["int","null"]},{"name":"percent_holding","type":["double","null"]},{"name":"group_company","type":["string","null"]},{"name":"grp_ord_for_cln","type":["string","null"]},{"name":"mkt_only","type":["string","null"]},{"name":"pro_rate_ind","type":["string","null"]},{"name":"show_shapes","type":["string","null"]},{"name":"sec_code_pref","type":["string","null"]},{"name":"alert_org_ref","type":["string","null"]},{"name":"swift_bic","type":["string","null"]},{"name":"exec_breakdown","type":["string","null"]},{"name":"notes","type":["string","null"]},{"name":"active","type":["string","null"]},{"name":"version_no","type":["int","null"]},{"name":"sys_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"sys_user","type":["string","nu
 
ll"]},{"name":"create_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"cntry_of_dom","type":["string","null"]},{"name":"client","type":["string","null"]},{"name":"alert_acronym","type":["string","null"]},{"name":"oneoff_client","type":["string","null"]},{"name":"booking_domicile","type":["string","null"]},{"name":"booking_dom_list","type":["string","null"]},{"name":"TimeCreated","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"UserCreated","type":["string","null"]}]}]
   2020-08-12 12:18:50,361 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
   2020-08-12 12:18:50,934 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
   2020-08-12 12:18:50,937 [main] INFO  
org.apache.hudi.client.HoodieWriteClient - Generate a new instant time 
20200812121850
   2020-08-12 12:18:51,226 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
   2020-08-12 12:18:51,234 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Creating a new 
instant [==>20200812121850__commit__REQUESTED]
   2020-08-12 12:18:51,415 [main] INFO  
org.apache.hudi.utilities.deltastreamer.DeltaSync - Starting commit  : 
20200812121850
   2020-08-12 12:18:51,699 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants 
[[==>20200812121850__commit__REQUESTED]]
   2020-08-12 12:18:51,982 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants 
[[==>20200812121850__commit__REQUESTED]]
   2020-08-12 12:19:21,501 [main] INFO  
org.apache.hudi.index.bloom.HoodieBloomIndex - InputParallelism: ${1500}, 
IndexParallelism: ${0}
   2020-08-12 12:19:32,817 [main] INFO  
org.apache.hudi.client.HoodieWriteClient - Workload profile :WorkloadProfile 
{globalStat=WorkloadStat {numInserts=103, numUpdates=0}, 
partitionStat={default=WorkloadStat {numInserts=103, numUpdates=0}}}
   2020-08-12 12:19:32,841 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Checking for file 
exists ?s3a://redact/my2/multpk7/.hoodie/20200812121850.commit.requested
   2020-08-12 12:19:33,081 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Create new file 
for toInstant ?s3a://redact/my2/multpk7/.hoodie/20200812121850.inflight
   2020-08-12 12:19:33,082 [main] INFO  
org.apache.hudi.table.HoodieCopyOnWriteTable - AvgRecordSize => 1024
   2020-08-12 12:19:33,184 [main] INFO  
org.apache.hudi.table.HoodieCopyOnWriteTable - For partitionPath : default 
Small Files => []
   2020-08-12 12:19:33,184 [main] INFO  
org.apache.hudi.table.HoodieCopyOnWriteTable - After small file assignment: 
unassignedInserts => 103, totalInsertBuckets => 1, recordsPerBucket => 122880
   2020-08-12 12:19:33,185 [main] INFO  
org.apache.hudi.table.HoodieCopyOnWriteTable - Total insert buckets for 
partition path default => [WorkloadStat {bucketNumber=0, weight=1.0}]
   2020-08-12 12:19:33,186 [main] INFO  
org.apache.hudi.table.HoodieCopyOnWriteTable - Total Buckets :1, buckets info 
=> {0=BucketInfo {bucketType=INSERT, 
fileIdPrefix=a9ab6f7a-4def-490a-aac0-49e15ee9d742}},
   Partition to insert buckets => {default=[WorkloadStat {bucketNumber=0, 
weight=1.0}]},
   UpdateLocations mapped to buckets =>{}
   2020-08-12 12:19:33,206 [main] INFO  
org.apache.hudi.client.AbstractHoodieWriteClient - Auto commit disabled for 
20200812121850
   2020-08-12 12:19:41,179 [main] INFO  
org.apache.hudi.client.AbstractHoodieWriteClient - Commiting 20200812121850
   2020-08-12 12:19:41,502 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants 
[[==>20200812121850__commit__INFLIGHT]]
   2020-08-12 12:19:41,777 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants 
[[==>20200812121850__commit__INFLIGHT]]
   2020-08-12 12:19:42,140 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants 
[[==>20200812121850__commit__INFLIGHT]]
   2020-08-12 12:19:42,479 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants 
[[==>20200812121850__commit__INFLIGHT]]
   2020-08-12 12:19:42,706 [main] INFO  org.apache.hudi.table.HoodieTable - 
Removing marker directory=s3a://redact/my2/multpk7/.hoodie/.temp/20200812121850
   2020-08-12 12:19:43,027 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Marking instant 
complete [==>20200812121850__commit__INFLIGHT]
   2020-08-12 12:19:43,027 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Checking for file 
exists ?s3a://redact/my2/multpk7/.hoodie/20200812121850.inflight
   2020-08-12 12:19:43,356 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Create new file 
for toInstant ?s3a://redact/my2/multpk7/.hoodie/20200812121850.commit
   2020-08-12 12:19:43,357 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Completed 
[==>20200812121850__commit__INFLIGHT]
   2020-08-12 12:19:43,745 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants 
[[20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:44,010 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants 
[[20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:44,084 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants 
[[==>20200812121850__commit__REQUESTED], [==>20200812121850__commit__INFLIGHT], 
[20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:44,085 [main] INFO  
org.apache.hudi.table.HoodieCommitArchiveLog - No Instants to archive
   2020-08-12 12:19:44,086 [main] INFO  
org.apache.hudi.client.HoodieWriteClient - Auto cleaning is enabled. Running 
cleaner now
   2020-08-12 12:19:44,356 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants 
[[20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:44,629 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants 
[[20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:44,912 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants 
[[20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:45,321 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants 
[[20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:45,337 [main] INFO  org.apache.hudi.table.CleanHelper - No 
earliest commit to retain. No need to scan partitions !!
   2020-08-12 12:19:45,337 [main] INFO  
org.apache.hudi.table.HoodieCopyOnWriteTable - Nothing to clean here. It is 
already clean
   2020-08-12 12:19:45,374 [main] INFO  
org.apache.hudi.client.AbstractHoodieWriteClient - Committed 20200812121850
   2020-08-12 12:19:45,374 [main] INFO  
org.apache.hudi.utilities.deltastreamer.DeltaSync - Commit 20200812121850 
successful!
   2020-08-12 12:19:45,375 [main] INFO  
org.apache.hudi.utilities.deltastreamer.DeltaSync - Syncing target hoodie table 
with hive table(dmstest_multpk7). Hive metastore URL 
:jdbc:hive2://localhost:10000, basePath :s3a://redact/my2/multpk7
   2020-08-12 12:19:45,636 [main] INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants 
[[20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:46,806 [main] INFO  org.apache.hudi.hive.HiveSyncTool - 
Trying to sync hoodie table dmstest_multpk7 with base path 
s3a://redact/my2/multpk7 of type COPY_ON_WRITE
   2020-08-12 12:19:46,864 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - 
Reading schema from 
s3a://redact/my2/multpk7/default/a9ab6f7a-4def-490a-aac0-49e15ee9d742-0_0-25-15010_20200812121850.parquet
   2020-08-12 12:19:47,064 [main] INFO  org.apache.hudi.hive.HiveSyncTool - 
Hive table dmstest_multpk7 is not found. Creating it
   2020-08-12 12:19:47,070 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - 
Creating table with CREATE EXTERNAL TABLE  IF NOT EXISTS 
`redact`.`dmstest_multpk7`( `_hoodie_commit_time` string, 
`_hoodie_commit_seqno` string, `_hoodie_record_key` string, 
`_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` 
int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` 
string, `org_mnem` string, `org_parent` int, `percent_holding` double, 
`group_company` string, `grp_ord_for_cln` string, `mkt_only` string, 
`pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, 
`alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` 
string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` 
string, `create_date` bigint, `cntry_of_dom` string, `client` string, 
`alert_acronym` string, `oneoff_client` string, `booking_domicile` string, 
`booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW 
FORMAT
  SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS 
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 
's3a://redact/my2/multpk7'
   2020-08-12 12:19:47,151 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - 
Time taken to start SessionState and create Driver: 81 ms
   2020-08-12 12:19:47,186 [main] INFO  hive.ql.parse.ParseDriver - Parsing 
command: CREATE EXTERNAL TABLE  IF NOT EXISTS `redact`.`dmstest_multpk7`( 
`_hoodie_commit_time` string, `_hoodie_commit_seqno` string, 
`_hoodie_record_key` string, `_hoodie_partition_path` string, 
`_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, 
`org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, 
`org_parent` int, `percent_holding` double, `group_company` string, 
`grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, 
`show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, 
`swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, 
`version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, 
`cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` 
string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` 
bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 
'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 
's3a://redact/my2/multpk7'
   2020-08-12 12:19:47,874 [main] INFO  hive.ql.parse.ParseDriver - Parse 
Completed
   2020-08-12 12:19:48,323 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - 
Time taken to execute [CREATE EXTERNAL TABLE  IF NOT EXISTS 
`redact`.`dmstest_multpk7`( `_hoodie_commit_time` string, 
`_hoodie_commit_seqno` string, `_hoodie_record_key` string, 
`_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` 
int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` 
string, `org_mnem` string, `org_parent` int, `percent_holding` double, 
`group_company` string, `grp_ord_for_cln` string, `mkt_only` string, 
`pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, 
`alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` 
string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` 
string, `create_date` bigint, `cntry_of_dom` string, `client` string, 
`alert_acronym` string, `oneoff_client` string, `booking_domicile` string, 
`booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED 
AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 
's3a://redact/my2/multpk7']: 1171 ms
   2020-08-12 12:19:48,329 [main] INFO  org.apache.hudi.hive.HiveSyncTool - 
Schema sync complete. Syncing partitions for dmstest_multpk7
   2020-08-12 12:19:48,329 [main] INFO  org.apache.hudi.hive.HiveSyncTool - 
Last commit time synced was found to be null
   2020-08-12 12:19:48,330 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - 
Last commit time synced is not known, listing all partitions in 
s3a://redact/my2/multpk7,FS :S3AFileSystem{uri=s3a://redact, 
workingDir=s3a://redact/user/ec2-user, inputPolicy=normal, partSize=104857600, 
enableMultiObjectsDelete=true, maxKeys=5000, readAhead=65536, 
blockSize=33554432, multiPartThreshold=2147483647, 
serverSideEncryptionAlgorithm='AES256', 
blockFactory=org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory@62765aec, 
boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=2405,
 available=2405, waiting=0}, activeCount=0}, 
unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@6f5bd362[Running, 
pool size = 6, active threads = 0, queued tasks = 0, completed tasks = 6], 
statistics {445890 bytes read, 4324 bytes written, 172 read ops, 0 large read 
ops, 31 write ops}, metrics {{Context=S3AFileSystem} 
{FileSystemId=aad8f6ce-2b40-4ddb-9b9b-4e82033cb193-redact} 
 {fsURI=s3a://redact/sparkevents} {files_created=5} {files_copied=0} 
{files_copied_bytes=0} {files_deleted=1} {fake_directories_deleted=0} 
{directories_created=6} {directories_deleted=0} {ignored_errors=4} 
{op_copy_from_local_file=0} {op_exists=53} {op_get_file_status=145} 
{op_glob_status=0} {op_is_directory=38} {op_is_file=0} {op_list_files=1} 
{op_list_located_status=0} {op_list_status=19} {op_mkdirs=5} {op_rename=0} 
{object_copy_requests=0} {object_delete_requests=5} {object_list_requests=140} 
{object_continue_list_requests=0} {object_metadata_requests=265} 
{object_multipart_aborted=0} {object_put_bytes=4324} {object_put_requests=10} 
{object_put_requests_completed=10} {stream_write_failures=0} 
{stream_write_block_uploads=0} {stream_write_block_uploads_committed=0} 
{stream_write_block_uploads_aborted=0} {stream_write_total_time=0} 
{stream_write_total_data=4324} {object_put_requests_active=0} 
{object_put_bytes_pending=0} {stream_write_block_uploads_active=0} 
{stream_write_block_uploads_pending=4} {stream_write_block_uploads_data_pending=0} 
{stream_read_fully_operations=0} {stream_opened=22} 
{stream_bytes_skipped_on_seek=0} {stream_closed=22} 
{stream_bytes_backwards_on_seek=438082} {stream_bytes_read=445890} 
{stream_read_operations_incomplete=71} {stream_bytes_discarded_in_abort=0} 
{stream_close_operations=22} {stream_read_operations=2764} {stream_aborted=0} 
{stream_forward_seek_operations=0} {stream_backward_seek_operations=1} 
{stream_seek_operations=1} {stream_bytes_read_in_close=8} 
{stream_read_exceptions=0} }}
   2020-08-12 12:19:48,584 [main] INFO  org.apache.hudi.hive.HiveSyncTool - 
Storage partitions scan complete. Found 1
   2020-08-12 12:19:48,613 [main] INFO  org.apache.hudi.hive.HiveSyncTool - New 
Partitions []
   2020-08-12 12:19:48,614 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - 
No partitions to add for dmstest_multpk7
   2020-08-12 12:19:48,614 [main] INFO  org.apache.hudi.hive.HiveSyncTool - 
Changed Partitions []
   2020-08-12 12:19:48,614 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - 
No partitions to change for dmstest_multpk7
   2020-08-12 12:19:49,002 [main] INFO  org.apache.hudi.hive.HiveSyncTool - 
Sync complete for dmstest_multpk7
   2020-08-12 12:19:49,031 [main] INFO  
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Shut down 
deltastreamer
   2020-08-12 12:19:49,044 [main] INFO  
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Shutting down 
all executors
   ```
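   The DDL in the log shows the mismatch: hive sync reads the schema from the parquet file under s3a://redact/my2/multpk7/default/ but creates the table with LOCATION 's3a://redact/my2/multpk7'. A quick way to confirm what got registered (sketch; run from any HiveServer2 client):

   ```
   -- Show the location hive sync registered for the table; it reports the
   -- base path rather than the default/ directory that holds the data.
   SHOW CREATE TABLE `redact`.`dmstest_multpk7`;
   -- DESCRIBE FORMATTED `redact`.`dmstest_multpk7`; shows the Location field too
   ```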
   
   ```
   aws s3 ls s3://redact/my2/multpk7/
                              PRE .hoodie/
                              PRE default/
                              
   aws s3 ls s3://redact/my2/multpk7/default/
   2020-08-12 12:19:39         93 .hoodie_partition_metadata
   2020-08-12 12:19:41     452644 
a9ab6f7a-4def-490a-aac0-49e15ee9d742-0_0-25-15010_20200812121850.parquet
   ```
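
   So the only data file sits under the default/ prefix. Once the table LOCATION points there (per the ALTER above), a plain query works; as a sketch, it should return the 103 records the commit reported:

   ```
   -- Sanity check after fixing LOCATION; the commit above inserted 103 records.
   SELECT count(*) FROM `redact`.`dmstest_multpk7`;
   ```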
   

