giaosudau removed a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-732523090


   I tried to run DeltaStreamer with the SQL transformer.
   
   Hi everyone,
   I am running Spark 3 (https://github.com/apache/hudi/pull/2208)
   with DeltaStreamer and SqlQueryBasedTransformer on Debezium data:
   ``` 
   spark-submit \
   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
   --driver-memory 2g \
   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
   --conf spark.sql.hive.convertMetastoreParquet=false \
   --packages org.apache.spark:spark-avro_2.12:3.0.1 \
   ~/workspace/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.6.1-SNAPSHOT.jar \
   --table-type MERGE_ON_READ \
   --source-ordering-field ts_ms \
   --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
   --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
   --target-base-path /Users/users/Downloads/roi/debezium/by_test/ \
   --target-table users \
   --props ./hudi_base.properties \
   --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer

   # contents of ./hudi_base.properties
   hoodie.upsert.shuffle.parallelism=2
   hoodie.insert.shuffle.parallelism=2
   hoodie.bulkinsert.shuffle.parallelism=2
   # Key fields, for kafka example
   hoodie.datasource.write.storage.type=MERGE_ON_READ
   hoodie.datasource.write.recordkey.field=id
   hoodie.datasource.write.partitionpath.field=ts_ms
   hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
   hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy-MM-dd
   # schema provider configs
   hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/dbz1.by_test.users-value/versions/latest
   # Kafka props
   hoodie.deltastreamer.source.kafka.topic=dbz1.by_test.users
   metadata.broker.list=localhost:9092
   bootstrap.servers=localhost:9092
   auto.offset.reset=earliest
   schema.registry.url=http://localhost:8081
   hoodie.deltastreamer.transformer.sql=SELECT ts_ms, op, after.* FROM <SRC> WHERE op IN ('u', 'c')
   ```
   
   ```
   #
   # A fatal error has been detected by the Java Runtime Environment:
   #
   #  SIGSEGV (0xb) at pc=0x000000010f4cbad0, pid=33960, tid=0x0000000000013e03
   #
   # JRE version: OpenJDK Runtime Environment (8.0_265-b01) (build 1.8.0_265-b01)
   # Java VM: OpenJDK 64-Bit Server VM (25.265-b01 mixed mode bsd-amd64 compressed oops)
   # Problematic frame:
   # V  [libjvm.dylib+0xcbad0]
   ```
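For reference, the `hoodie.deltastreamer.transformer.sql` query in the properties above is expected to keep only Debezium creates and updates and flatten the `after` struct next to `ts_ms` and `op`. The sketch below mimics those semantics on plain Python dicts, assuming Debezium's usual envelope (`op`, `ts_ms`, `after`). It is not Hudi code: `substitute_src` and `flatten_debezium` are hypothetical helpers. As I understand it, `SqlQueryBasedTransformer` registers the incoming batch as a temporary view and replaces `<SRC>` with that view's name before running the query; `substitute_src` only imitates that string replacement.

```python
from typing import Any, Dict, Iterable, List, Optional

# Placeholder the transformer SQL uses for the incoming batch.
SRC_PLACEHOLDER = "<SRC>"

QUERY = "SELECT ts_ms, op, after.* FROM <SRC> WHERE op IN ('u', 'c')"


def substitute_src(sql: str, temp_view: str) -> str:
    """Hypothetical helper: replace every <SRC> with a temp view name."""
    return sql.replace(SRC_PLACEHOLDER, temp_view)


def flatten_debezium(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper mimicking QUERY on plain dicts.

    Keeps only creates ('c') and updates ('u'); deletes ('d'),
    snapshot reads ('r') and records without an op are dropped.
    Assumes `after` has no fields named ts_ms or op: Spark would keep
    duplicate columns, whereas this dict-based sketch overwrites them.
    """
    rows: List[Dict[str, Any]] = []
    for rec in records:
        if rec.get("op") not in ("u", "c"):
            continue  # same effect as the WHERE op IN ('u', 'c') filter
        row: Dict[str, Any] = {"ts_ms": rec["ts_ms"], "op": rec["op"]}
        after: Optional[Dict[str, Any]] = rec.get("after")
        row.update(after or {})  # approximates after.* (Debezium fills after for 'c' and 'u')
        rows.append(row)
    return rows


if __name__ == "__main__":
    events = [
        {"op": "c", "ts_ms": 1, "after": {"id": 1, "name": "a"}},
        {"op": "d", "ts_ms": 2, "after": None},
        {"op": "u", "ts_ms": 3, "after": {"id": 1, "name": "b"}},
    ]
    print(substitute_src(QUERY, "tmp_view"))
    print(flatten_debezium(events))
```

With the sample events, the delete is dropped and the create and update come back as flat rows containing `ts_ms`, `op`, `id` and `name`.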

