[GitHub] [hudi] tooptoop4 commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

2020-08-24 Thread GitBox


tooptoop4 commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-679353671


   @bvaradar in each comment I am trying a brand new table with a different spark-submit, so I am not changing an existing table.
   
   To reproduce, try:
   
   ```
   /home/ec2-user/spark_home/bin/spark-submit \
     --conf "spark.hadoop.fs.s3a.proxy.host=redact" \
     --conf "spark.hadoop.fs.s3a.proxy.port=redact" \
     --conf "spark.driver.extraClassPath=/home/ec2-user/json-20090211.jar" \
     --conf "spark.executor.extraClassPath=/home/ec2-user/json-20090211.jar" \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     --jars "/home/ec2-user/spark-avro_2.11-2.4.6.jar" \
     --master spark://redact:7077 \
     --deploy-mode client \
     /home/ec2-user/hudi-utilities-bundle_2.11-0.5.3-1.jar \
     --table-type COPY_ON_WRITE \
     --source-ordering-field TimeCreated \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --enable-hive-sync \
     --hoodie-conf hoodie.datasource.hive_sync.database=redact \
     --hoodie-conf hoodie.datasource.hive_sync.table=dmstest_multpk7 \
     --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor \
     --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false \
     --target-base-path s3a://redact/my2/multpk7 \
     --target-table dmstest_multpk7 \
     --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
     --hoodie-conf hoodie.datasource.write.recordkey.field=version_no,group_company \
     --hoodie-conf "hoodie.datasource.write.partitionpath.field=" \
     --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl
   ```
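   The same --hoodie-conf settings can also be collected into a properties file and passed with DeltaStreamer's --props option. A minimal sketch — the file path is made up, the keys are copied verbatim from the command above:
   ```
   # hypothetical /home/ec2-user/multpk7.properties, passed via --props
   hoodie.datasource.hive_sync.database=redact
   hoodie.datasource.hive_sync.table=dmstest_multpk7
   hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor
   hoodie.datasource.hive_sync.use_jdbc=false
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
   hoodie.datasource.write.recordkey.field=version_no,group_company
   hoodie.datasource.write.partitionpath.field=
   hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl
   ```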
   







[GitHub] [hudi] tooptoop4 commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

2020-08-12 Thread GitBox


tooptoop4 commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-672843556


   Got a bit further with the below: the Hudi/Spark job now succeeds, but the Hive DDL points at the wrong S3 location, so a select from Hive/Presto gives an error. When I manually alter the S3 location in the table DDL via hiveserver2 it works (i.e. change LOCATION 's3a://redact/my2/multpk7' to LOCATION 's3a://redact/my2/multpk7/default'), so I think a code change is needed so the table gets created at the proper S3 location.
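   For reference, the manual fix described above as a single hiveserver2 statement (the database name is assumed from the hive_sync config; adjust if the table was synced elsewhere):
   ```
   -- repoint the synced table at the 'default' folder the writer actually used
   ALTER TABLE redact.dmstest_multpk7 SET LOCATION 's3a://redact/my2/multpk7/default';
   ```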
   
   ```
   /home/ec2-user/spark_home/bin/spark-submit \
     --conf "spark.hadoop.fs.s3a.proxy.host=redact" \
     --conf "spark.hadoop.fs.s3a.proxy.port=redact" \
     --conf "spark.driver.extraClassPath=/home/ec2-user/json-20090211.jar" \
     --conf "spark.executor.extraClassPath=/home/ec2-user/json-20090211.jar" \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     --jars "/home/ec2-user/spark-avro_2.11-2.4.6.jar" \
     --master spark://redact:7077 \
     --deploy-mode client \
     /home/ec2-user/hudi-utilities-bundle_2.11-0.5.3-1.jar \
     --table-type COPY_ON_WRITE \
     --source-ordering-field TimeCreated \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --enable-hive-sync \
     --hoodie-conf hoodie.datasource.hive_sync.database=redact \
     --hoodie-conf hoodie.datasource.hive_sync.table=dmstest_multpk7 \
     --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor \
     --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false \
     --target-base-path s3a://redact/my2/multpk7 \
     --target-table dmstest_multpk7 \
     --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
     --hoodie-conf hoodie.datasource.write.recordkey.field=version_no,group_company \
     --hoodie-conf "hoodie.datasource.write.partitionpath.field=" \
     --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl > multpk7.log
   OK
   ```
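   One way to confirm the location mismatch before touching the DDL (aws CLI and Hive shell assumed; bucket/table names as in this thread):
   ```
   aws s3 ls s3://redact/my2/multpk7/                    # data sits under a default/ subfolder
   hive -e "SHOW CREATE TABLE redact.dmstest_multpk7;"   # but LOCATION shows only the base path
   ```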
   
   cat multpk7.log
   ```
   2020-08-12 12:18:15,375 [main] WARN  org.apache.hudi.utilities.deltastreamer.SchedulerConfGenerator - Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
   2020-08-12 12:18:16,386 [dispatcher-event-loop-3] INFO  org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Connected to Spark cluster with app ID app-20200812121816-0086
   2020-08-12 12:18:17,199 [main] INFO  com.amazonaws.http.AmazonHttpClient - Configuring Proxy. redact
   2020-08-12 12:18:18,154 [main] INFO  org.apache.spark.scheduler.EventLoggingListener - Logging events to s3a://redact/sparkevents/app-20200812121816-0086
   2020-08-12 12:18:18,171 [dispatcher-event-loop-2] INFO  org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Granted executor ID app-20200812121816-0086/0 on hostPort redact:19629 with 4 core(s), 7.9 GB RAM
   2020-08-12 12:18:18,195 [main] INFO  org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
   2020-08-12 12:18:18,427 [main] WARN  org.apache.spark.SparkContext - Using an existing SparkContext; some configuration may not take effect.
   2020-08-12 12:18:18,526 [main] ERROR org.apache.hudi.common.util.DFSPropertiesConfiguration - Error reading in properies from dfs
   java.io.FileNotFoundException: File file:/home/ec2-user/http_listener/logs/src/test/resources/delta-streamer-config/dfs-source.properties does not exist
     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
     at org.apache.hudi.common.util.DFSPropertiesConfiguration.visitFile(DFSPropertiesConfiguration.java:87)
     at org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:60)
     at org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:64)
     at org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:118)
     at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:451)
     at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.j
   ```
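   Side note on the FileNotFoundException in the log: when no --props is given, DeltaStreamer falls back to a default dfs-source.properties path that does not exist here. The run still completed, so it appears non-fatal, but pointing --props at an existing file (e.g. the hypothetical properties file sketched above) should silence it:
   ```
   # appended to the spark-submit above; file path assumed, any readable properties file works
   --props file:///home/ec2-user/multpk7.properties
   ```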

[GitHub] [hudi] tooptoop4 commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

2020-08-12 Thread GitBox


tooptoop4 commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-672831272


   Even with NonPartitionedExtractor I am getting the same issue.


