Re: Structured Streaming on GCP Dataproc - java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer

2022-02-02 Thread Mich Talebzadeh
Well, you are now using a package instead of a jar. There is a difference between the two in spark-submit: --jars adds only the jar you name, while --packages adds the jar and all of its dependencies listed in Maven. Packages resolve those dependencies, and they do so via Ivy.
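
To make the --jars versus --packages distinction concrete, here is a minimal sketch that builds the Maven coordinate for the Spark Kafka source and the matching spark-submit arguments. The helper names (`kafka_package`, `submit_args`) and the script name `streaming_job.py` are hypothetical and used only for illustration. The Spark version 3.1.2 and the Scala suffix 2.12 are taken from elsewhere in this thread and should be replaced with whatever the cluster actually runs.

```python
from typing import List


def kafka_package(spark_version: str, scala_version: str = "2.12") -> str:
    """Return the Maven coordinate (group:artifact:version) of the
    Spark Structured Streaming Kafka source.

    Passing this to --packages lets Ivy pull in transitive dependencies
    such as kafka-clients, which provides ByteArraySerializer.
    """
    return (
        "org.apache.spark:"
        f"spark-sql-kafka-0-10_{scala_version}:{spark_version}"
    )


def submit_args(script: str, spark_version: str) -> List[str]:
    """Build a spark-submit argument list that uses --packages.

    `script` is a placeholder path to the PySpark job.
    """
    return ["spark-submit", "--packages", kafka_package(spark_version), script]


if __name__ == "__main__":
    # Assumed values: Spark 3.1.2 (from the thread), hypothetical script name.
    print(" ".join(submit_args("streaming_job.py", "3.1.2")))
```

With --jars you would instead have to list the connector jar and each of its dependencies yourself. Missing one of them is a common cause of a NoClassDefFoundError like the one in the subject line.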

Re: Structured Streaming on GCP Dataproc - java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer

2022-02-02 Thread karan alang
Hi Mich, All - thanks, I was able to resolve this using the command below: --- gcloud dataproc jobs submit pyspark /Users/karanalang/Documents/Technology/gcp/DataProc/StructuredStreaming_Kafka_GCP-Batch-feb2.py --cluster dataproc-ss-poc --properties

Re: Structured Streaming on GCP Dataproc - java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer

2022-02-02 Thread Mich Talebzadeh
The current Spark version on GCP is 3.1.2. Try using this jar file instead: spark-sql-kafka-0-10_2.12-3.0.1.jar HTH

Structured Streaming on GCP Dataproc - java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer

2022-02-01 Thread karan alang
Hello All, I'm running a simple Structured Streaming job on GCP, which reads data from Kafka and prints it to the console. Command: gcloud dataproc jobs submit pyspark /Users/karanalang/Documents/Technology/gcp/DataProc/StructuredStreaming_Kafka_GCP-Batch-feb1.py --cluster dataproc-ss-poc