gauravg1977 opened a new issue, #11598:
URL: https://github.com/apache/hudi/issues/11598

   I am exploring using Apache Hudi HoodieStreamer to ingest protobuf messages 
from Kafka into Hudi. Despite a lot of attempts I have hit a roadblock I get an 
exception while the HoodieStreamer tries make use of the schema from my locally 
hosted confluent schema registry
   
   
   **Environment Description**
   
   * Hudi version : 0.15
   * Spark version : 3.4
   * Storage (HDFS/S3/GCS..) : Local File System
   * Running on Docker? (yes/no) : No
   
   I start the HoodieStreamer as follows:
   
   ```
   spark-submit \
   --packages 
org.apache.hudi:hudi-utilities-bundle_2.12:0.15.0,org.apache.hudi:hudi-spark3.4-bundle_2.12:0.15.0
 \
   --jars 
/home/gaurav/ws/learn/hoodie-delta-streamer/kafka-protobuf-provider-7.6.1.jar \
   --driver-memory 8g --executor-memory 8g \
   --class org.apache.hudi.utilities.streamer.HoodieStreamer 
/home/gaurav/ws/learn/hoodie-delta-streamer/hudi-utilities-bundle_2.12-0.15.0.jar
 \
   --props 
/home/gaurav/ws/learn/hoodie-delta-streamer/kafka/try1/props/kafka-source.properties
 \
   --schemaprovider-class 
org.apache.hudi.utilities.schema.SchemaRegistryProvider \
   --source-class org.apache.hudi.utilities.sources.ProtoKafkaSource \
   --table-type COPY_ON_WRITE \
   --target-base-path 
file:\/\/\/home/gaurav/ws/learn/hoodie-delta-streamer/db/try1 \
   --target-table vols1 \
   --op UPSERT \
   --continuous \
   --source-limit 4000000 \
   --min-sync-interval-seconds 60
   ```
   
   The kafka-source.properties passed above is as follows:
   
   ```
   hoodie.upsert.shuffle.parallelism=2
   hoodie.insert.shuffle.parallelism=2
   hoodie.delete.shuffle.parallelism=2
   hoodie.bulkinsert.shuffle.parallelism=2
   hoodie.datasource.write.recordkey.field=name
   hoodie.datasource.write.partitionpath.field=name
   
   
hoodie.streamer.schemaprovider.registry.url=http://localhost:8081/subjects/vols1-value/versions/latest
   
hoodie.streamer.schemaprovider.registry.schemaconverter=org.apache.hudi.utilities.schema.converter.ProtoSchemaToAvroSchemaConverter
   
hoodie.streamer.source.kafka.proto.value.deserializer.class=io.confluent.kafka.serializers.protobuf.KafkaProtobufDeserializer
   
   hoodie.streamer.source.kafka.topic=vols1
   
   bootstrap.servers=localhost:9092
   auto.offset.reset=earliest
   schema.registry.url=http://localhost:8081
   
   ```
   
   HoodieStreamer fetches the schema from the schema registry but fails to 
parse due to a NoSuchMethodException on the 
ProtoSchemaToAvroSchemaConverter.<init>() method. Here is the relevant 
exception stack trace:
   
   ```
   Caused by: org.apache.hudi.utilities.exception.HoodieSchemaFetchException: 
Error reading source schema from registry. Please check 
hoodie.streamer.schemaprovider.registry.url is configured correctly. Truncated 
URL: http://loc...ons/latest
   
   Caused by: org.apache.hudi.internal.schema.HoodieSchemaException: Failed to 
parse schema from registry: syntax = "proto3";
   package com.gaurav.data.vol;
   
   Caused by: org.apache.hudi.internal.schema.HoodieSchemaException: Failed to 
parse schema from registry: syntax = "proto3";
   package com.gaurav.data.vol;
   import "google/protobuf/timestamp.proto";
   
   message VolSurface {
    optional string name = 1;
    repeated .google.protobuf.Timestamp expiry = 2;
    repeated double atmVol = 3;
    repeated double skew = 4;
   }
   
   Caused by: org.apache.hudi.exception.HoodieException: Could not load class 
org.apache.hudi.utilities.schema.converter.ProtoSchemaToAvroSchemaConverter
    at 
org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:64)
    at 
org.apache.hudi.utilities.schema.SchemaRegistryProvider.parseSchemaFromRegistry(SchemaRegistryProvider.java:107)
    ... 11 more
   Caused by: java.lang.InstantiationException: 
org.apache.hudi.utilities.schema.converter.ProtoSchemaToAvroSchemaConverter
    at java.base/java.lang.Class.newInstance(Class.java:571)
    at 
org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:62)
    ... 12 more
   Caused by: java.lang.NoSuchMethodException: 
org.apache.hudi.utilities.schema.converter.ProtoSchemaToAvroSchemaConverter.<init>()
    at java.base/java.lang.Class.getConstructor0(Class.java:3349)
    at java.base/java.lang.Class.newInstance(Class.java:556)
    ... 13 more
   ```
   
   I hope that the use the below line in the properties file is correct . My 
understanding is that the converter is needed because Hudi needs to convert 
protobuf message into Avro
   
   
`hoodie.streamer.schemaprovider.registry.schemaconverter=org.apache.hudi.utilities.schema.converter.ProtoSchemaToAvroSchemaConverter
   `
   The NoSuchMethodException on the ProtoSchemaToAvroSchemaConverter.<init>() 
method (as opposed to a NoClassDefFoundError) seems to indicate that it is 
failing to find a no-arg constructor and looking at the code downloaded from 
github that is true, ie there isnt any no-arg constructor for the class 
ProtoSchemaToAvroSchemaConverter . Unless this exposing a bug, I think that 
there is something basic I am missing and I will be really grateful if you can 
point me in the right direction.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to