tooptoop4 commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-672843556
Got a bit further with the command below. The Hudi/Spark job now succeeds, but the Hive DDL points at the wrong S3 location, so selecting from Hive/Presto gives an error. When I manually alter the S3 location in the table DDL via hiveserver2 it works (i.e. change `LOCATION 's3a://redact/my2/multpk7'` to `LOCATION 's3a://redact/my2/multpk7/default'`), so I think a code change is needed so the table gets created at the proper S3 location.
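For reference, the manual fix was an `ALTER TABLE ... SET LOCATION` run against hiveserver2; roughly this, with the beeline JDBC URL as a placeholder:
```
beeline -u "jdbc:hive2://<hiveserver2-host>:10000" \
  -e "ALTER TABLE redact.dmstest_multpk7 SET LOCATION 's3a://redact/my2/multpk7/default';"
```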
```
/home/ec2-user/spark_home/bin/spark-submit \
  --conf "spark.hadoop.fs.s3a.proxy.host=redact" \
  --conf "spark.hadoop.fs.s3a.proxy.port=redact" \
  --conf "spark.driver.extraClassPath=/home/ec2-user/json-20090211.jar" \
  --conf "spark.executor.extraClassPath=/home/ec2-user/json-20090211.jar" \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --jars "/home/ec2-user/spark-avro_2.11-2.4.6.jar" \
  --master spark://redact:7077 \
  --deploy-mode client \
  /home/ec2-user/hudi-utilities-bundle_2.11-0.5.3-1.jar \
  --table-type COPY_ON_WRITE \
  --source-ordering-field TimeCreated \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --enable-hive-sync \
  --hoodie-conf hoodie.datasource.hive_sync.database=redact \
  --hoodie-conf hoodie.datasource.hive_sync.table=dmstest_multpk7 \
  --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor \
  --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false \
  --target-base-path s3a://redact/my2/multpk7 \
  --target-table dmstest_multpk7 \
  --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
  --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
  --hoodie-conf hoodie.datasource.write.recordkey.field=version_no,group_company \
  --hoodie-conf "hoodie.datasource.write.partitionpath.field=" \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl > multpk7.log
OK
```
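My guess at the root cause: with `hoodie.datasource.write.partitionpath.field=` set to empty, the key generator seems to fall back to a literal `default` partition path, so the data files land under `s3a://redact/my2/multpk7/default/` while hive sync registers the table at the base path itself. A quick sketch to confirm where the files actually sit (assumes the hadoop CLI is configured with the same s3a proxy settings as the job):
```
hadoop fs -ls s3a://redact/my2/multpk7/
hadoop fs -ls s3a://redact/my2/multpk7/default/
```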
cat multpk7.log
```
2020-08-12 12:18:15,375 [main] WARN org.apache.hudi.utilities.deltastreamer.SchedulerConfGenerator - Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
2020-08-12 12:18:16,386 [dispatcher-event-loop-3] INFO org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Connected to Spark cluster with app ID app-20200812121816-0086
2020-08-12 12:18:17,199 [main] INFO com.amazonaws.http.AmazonHttpClient - Configuring Proxy. redact
2020-08-12 12:18:18,154 [main] INFO org.apache.spark.scheduler.EventLoggingListener - Logging events to s3a://redact/sparkevents/app-20200812121816-0086
2020-08-12 12:18:18,171 [dispatcher-event-loop-2] INFO org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Granted executor ID app-20200812121816-0086/0 on hostPort redact:19629 with 4 core(s), 7.9 GB RAM
2020-08-12 12:18:18,195 [main] INFO org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
2020-08-12 12:18:18,427 [main] WARN org.apache.spark.SparkContext - Using an existing SparkContext; some configuration may not take effect.
2020-08-12 12:18:18,526 [main] ERROR org.apache.hudi.common.util.DFSPropertiesConfiguration - Error reading in properies from dfs
java.io.FileNotFoundException: File file:/home/ec2-user/http_listener/logs/src/test/resources/delta-streamer-config/dfs-source.properties does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
    at org.apache.hudi.common.util.DFSPropertiesConfiguration.visitFile(DFSPropertiesConfiguration.java:87)
    at org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:60)
    at org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:64)
    at org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:118)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:451)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.j