giaosudau removed a comment on pull request #2208: URL: https://github.com/apache/hudi/pull/2208#issuecomment-732523090
I tried to run DeltaStreamer with the SQL transformer.

Hi everyone, I am running Spark 3 (https://github.com/apache/hudi/pull/2208) with DeltaStreamer and the SQL transformer on Debezium data.

```
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --driver-memory 2g \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.hive.convertMetastoreParquet=false \
  --packages org.apache.spark:spark-avro_2.12:3.0.1 \
  ~/workspace/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.6.1-SNAPSHOT.jar \
  --table-type MERGE_ON_READ \
  --source-ordering-field ts_ms \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
  --target-base-path /Users/users/Downloads/roi/debezium/by_test/ \
  --target-table users \
  --props ./hudi_base.properties \
  --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer

hoodie.upsert.shuffle.parallelism=2
hoodie.insert.shuffle.parallelism=2
hoodie.bulkinsert.shuffle.parallelism=2
# Key fields, for kafka example
hoodie.datasource.write.storage.type=MERGE_ON_READ
hoodie.datasource.write.recordkey.field=id
hoodie.datasource.write.partitionpath.field=ts_ms
hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy-MM-dd
# schema provider configs
hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/dbz1.by_test.users-value/versions/latest
#Kafka props
hoodie.deltastreamer.source.kafka.topic=dbz1.by_test.users
metadata.broker.list=localhost:9092
bootstrap.servers=localhost:9092
auto.offset.reset=earliest
schema.registry.url=http://localhost:8081
hoodie.deltastreamer.transformer.sql=SELECT ts_ms, op, after.* FROM <SRC> WHERE op IN ('u', 'c')
```

```
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000000010f4cbad0, pid=33960, tid=0x0000000000013e03
#
# JRE version: OpenJDK Runtime Environment (8.0_265-b01) (build 1.8.0_265-b01)
# Java VM: OpenJDK 64-Bit Server VM (25.265-b01 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# V [libjvm.dylib+0xcbad0]
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org