Re: GCP Dataproc - Failed to construct kafka consumer, Failed to load SSL keystore dataproc-versa-sase-p12-1.jks of type JKS

2022-02-02 Thread karan alang
Re-checking to see if there are any suggestions on this issue. On Wed, Feb 2, 2022 at 3:36 PM karan alang wrote: > Hello All, > I'm trying to run a Structured Streaming program on GCP Dataproc, which accesses the data from Kafka and prints it. > Access to Kafka is using SSL, and the

Re: Structured Streaming on GCP Dataproc - java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer

2022-02-02 Thread Mich Talebzadeh
Well, you are now using a package instead of the jar. There is a difference between using a jar and using a package in spark-submit: --jars adds only that jar, while --packages adds the jar and all of its dependencies listed in Maven. Packages do resolve the dependencies. They do so via Ivy
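To make the --jars / --packages difference concrete, here is a minimal Python sketch that assembles both spark-submit variants as argument lists. The script name, the local jar path, and the 3.1.2 version are hypothetical placeholders; the Maven coordinate follows the `groupId:artifactId:version` form that --packages expects.

```python
import shlex

# Hypothetical values for illustration only; substitute your own.
SCRIPT = "stream_job.py"
LOCAL_JAR = "/path/to/spark-sql-kafka-0-10_2.12-3.1.2.jar"
KAFKA_COORDINATE = "org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2"


def submit_with_jars(script: str, jar_paths: list[str]) -> list[str]:
    """--jars ships exactly the listed files. Transitive dependencies such
    as kafka-clients are NOT added, so a class that lives there (e.g.
    ByteArraySerializer) can surface as NoClassDefFoundError at runtime."""
    return ["spark-submit", "--jars", ",".join(jar_paths), script]


def submit_with_packages(script: str, coordinates: list[str]) -> list[str]:
    """--packages takes Maven coordinates; spark-submit resolves each one
    together with its transitive dependencies before launching."""
    return ["spark-submit", "--packages", ",".join(coordinates), script]


if __name__ == "__main__":
    print(shlex.join(submit_with_jars(SCRIPT, [LOCAL_JAR])))
    print(shlex.join(submit_with_packages(SCRIPT, [KAFKA_COORDINATE])))
```

With --jars you would have to list kafka-clients (and any other dependencies) yourself; with --packages a single coordinate is enough.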

GCP Dataproc - Failed to construct kafka consumer, Failed to load SSL keystore dataproc-versa-sase-p12-1.jks of type JKS

2022-02-02 Thread karan alang
Hello All, I'm trying to run a Structured Streaming program on GCP Dataproc that reads data from Kafka and prints it. Access to Kafka uses SSL, and the truststore and keystore files are stored in GCS buckets. I'm using the Google Storage API to access the bucket and store the file in the
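A minimal sketch of the Kafka reader options such a job typically needs, assuming the truststore and keystore have already been copied from the bucket to a local path that exists on every node running a Kafka consumer (driver and executors alike). A file fetched only on the driver is one plausible cause of a "Failed to load SSL keystore" error, though the thread does not confirm that. Spark's Kafka source forwards options prefixed with `kafka.` to the underlying consumer, so the keys below are the standard Kafka `ssl.*` client settings. The broker, topic, directory, file names, and passwords are hypothetical. If the keystore is really PKCS12 (the `p12` in its file name hints at this, but that is an assumption), `ssl.keystore.type` should say so rather than fall back to JKS.

```python
# Hypothetical placeholders; none of these values come from the thread.
BOOTSTRAP = "broker-1.example.com:9093"
TOPIC = "events"
LOCAL_DIR = "/tmp/kafka-certs"


def kafka_ssl_options(bootstrap: str, topic: str, local_dir: str,
                      truststore_pw: str, keystore_pw: str,
                      keystore_type: str = "JKS") -> dict[str, str]:
    """Build options for spark.readStream.format("kafka").options(**opts).

    Keys prefixed with "kafka." are passed through to the Kafka consumer.
    The *.location paths are opened by each consumer, so the files must
    exist at these local paths on every node, not only on the driver.
    """
    return {
        "kafka.bootstrap.servers": bootstrap,
        "subscribe": topic,
        "kafka.security.protocol": "SSL",
        # Hypothetical file names inside local_dir.
        "kafka.ssl.truststore.location": f"{local_dir}/truststore.jks",
        "kafka.ssl.truststore.password": truststore_pw,
        "kafka.ssl.keystore.location": f"{local_dir}/keystore.p12",
        "kafka.ssl.keystore.password": keystore_pw,
        # Must match the real file format, e.g. "PKCS12" for a .p12 file.
        "kafka.ssl.keystore.type": keystore_type,
    }


if __name__ == "__main__":
    opts = kafka_ssl_options(BOOTSTRAP, TOPIC, LOCAL_DIR,
                             "changeit", "changeit", keystore_type="PKCS12")
    for key, value in sorted(opts.items()):
        # Mask passwords when printing.
        print(f"{key}={'***' if 'password' in key else value}")
```

In the PySpark job the dictionary would be applied as `spark.readStream.format("kafka").options(**opts).load()`.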

Re: Structured Streaming on GCP Dataproc - java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer

2022-02-02 Thread karan alang
Hi Mich, All, thanks. I was able to resolve this using the command below: gcloud dataproc jobs submit pyspark /Users/karanalang/Documents/Technology/gcp/DataProc/StructuredStreaming_Kafka_GCP-Batch-feb2.py --cluster dataproc-ss-poc --properties
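The message is cut off right after `--properties`, so the values actually used are not shown. As a hypothetical illustration only: one way to get --packages-style dependency resolution on Dataproc is the Spark property `spark.jars.packages`, passed through `gcloud dataproc jobs submit pyspark --properties` as comma-separated `key=value` pairs. Because a package list is itself comma-separated, the sketch switches to gcloud's alternate-delimiter syntax (`^#^`, described under `gcloud topic escaping`) when a value contains a comma. The connector coordinate is an assumption, not taken from the thread.

```python
import shlex

# Script and cluster come from the message; PROPS is a hypothetical example
# because the original command is truncated after --properties.
SCRIPT = "StructuredStreaming_Kafka_GCP-Batch-feb2.py"
CLUSTER = "dataproc-ss-poc"
PROPS = {"spark.jars.packages": "org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2"}


def format_properties(props: dict[str, str]) -> str:
    """Join properties as key=value pairs for gcloud's --properties flag.

    gcloud splits on commas by default; if any value contains a comma
    (e.g. several packages), use '#' as the delimiter via the ^#^ prefix.
    """
    needs_escape = any("," in v for v in props.values())
    sep = "#" if needs_escape else ","
    body = sep.join(f"{k}={v}" for k, v in props.items())
    return f"^#^{body}" if needs_escape else body


def dataproc_submit(script: str, cluster: str, props: dict[str, str]) -> list[str]:
    return ["gcloud", "dataproc", "jobs", "submit", "pyspark", script,
            "--cluster", cluster, "--properties", format_properties(props)]


if __name__ == "__main__":
    print(shlex.join(dataproc_submit(SCRIPT, CLUSTER, PROPS)))
```

Depending on your gcloud configuration, `--region` may also be required.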

[ANNOUNCE] .NET for Apache Spark™ 2.1 released

2022-02-02 Thread Terry Kim
Hi, We are happy to announce that .NET for Apache Spark™ v2.1 has been released! The release notes include the full list of features/improvements of this

Re: [Spark K8s] Seeking Advice on Scaling Spark Cluster in Kubernetes

2022-02-02 Thread Mich Talebzadeh
I do not know about AWS, but in Google Kubernetes Engine (GKE) you can autoscale as follows:

    gcloud container clusters create example-cluster \
      --num-nodes 2 \
      --zone us-central1-a \
      --node-locations us-central1-a,us-central1-b,us-central1-f \
      --enable-autoscaling --min-nodes 1
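The GKE command in this message is truncated after `--min-nodes 1`. A minimal sketch that assembles the same `gcloud container clusters create` invocation; `--max-nodes` is the companion flag to `--min-nodes`, but the value 4 used here is a hypothetical example, not from the message. GKE documents `--num-nodes` as a count per zone, so with three `--node-locations` zones the cluster starts with 2 x 3 = 6 nodes. The bounds check is a sanity check assumed by this sketch, not something gcloud is claimed to enforce.

```python
import shlex

# Values from the message, except max_nodes, which is hypothetical.
ZONE = "us-central1-a"
NODE_LOCATIONS = ["us-central1-a", "us-central1-b", "us-central1-f"]


def gke_autoscaling_cmd(name: str, zone: str, locations: list[str],
                        num_nodes: int, min_nodes: int, max_nodes: int) -> list[str]:
    # Sanity check assumed by this sketch: the initial size lies within bounds.
    if not 0 <= min_nodes <= num_nodes <= max_nodes:
        raise ValueError("expected 0 <= min_nodes <= num_nodes <= max_nodes")
    return ["gcloud", "container", "clusters", "create", name,
            "--num-nodes", str(num_nodes),
            "--zone", zone,
            "--node-locations", ",".join(locations),
            "--enable-autoscaling",
            "--min-nodes", str(min_nodes),
            "--max-nodes", str(max_nodes)]


def total_nodes(per_zone: int, locations: list[str]) -> int:
    """--num-nodes applies to each zone listed in --node-locations."""
    return per_zone * len(locations)


if __name__ == "__main__":
    cmd = gke_autoscaling_cmd("example-cluster", ZONE, NODE_LOCATIONS,
                              num_nodes=2, min_nodes=1, max_nodes=4)
    print(shlex.join(cmd))
    print("initial nodes:", total_nodes(2, NODE_LOCATIONS))  # 2 per zone x 3 zones = 6
```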

Re: [Spark K8s] Seeking Advice on Scaling Spark Cluster in Kubernetes

2022-02-02 Thread Han Lin
For Kubernetes, yes: we use AWS EKS. We manage the Spark cluster ourselves, and it is deployed via a Helm chart. On Wed, Feb 2, 2022 at 8:32 AM Mich Talebzadeh wrote: > Is this part of cloud service offerings like Google GKE?

Re: [Spark K8s] Seeking Advice on Scaling Spark Cluster in Kubernetes

2022-02-02 Thread Mich Talebzadeh
Is this part of cloud service offerings like Google GKE?

[Spark K8s] Seeking Advice on Scaling Spark Cluster in Kubernetes

2022-02-02 Thread Han Lin
Hello, Before I ask my questions, let me explain what I am trying to do and briefly describe the setup I have so far. I am building an API service that serves an ML model built with Spark ML. I have Spark deployed in Kubernetes in standalone mode (Spark's built-in cluster manager) with 2 worker

Re: Structured Streaming on GCP Dataproc - java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer

2022-02-02 Thread Mich Talebzadeh
The current Spark version on GCP Dataproc is 3.1.2. Try using this jar file instead: spark-sql-kafka-0-10_2.12-3.0.1.jar. HTH
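For comparison, the Structured Streaming + Kafka integration guide lists the connector as `org.apache.spark:spark-sql-kafka-0-10_<scala>:<version>` with the same version as the Spark release it documents, so a connector matching the cluster's Spark version (here 3.1.2) is usually the safest first choice. The helpers below are hypothetical conveniences, not part of any library: they build that coordinate and check whether a jar file name's version matches the cluster. The strict equality check is a simplifying assumption of this sketch.

```python
def kafka_connector(spark_version: str, scala_version: str = "2.12") -> str:
    """Maven coordinate for Spark's Structured Streaming Kafka connector."""
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_version}:{spark_version}"


def jar_version(jar_name: str) -> str:
    """Extract '3.0.1' from e.g. 'spark-sql-kafka-0-10_2.12-3.0.1.jar'."""
    stem = jar_name.removesuffix(".jar")
    # The version is the text after the last '-'.
    return stem.rsplit("-", 1)[1]


def matches_cluster(jar_name: str, cluster_spark: str) -> bool:
    # Strict equality: a simplifying assumption, not a documented rule.
    return jar_version(jar_name) == cluster_spark


if __name__ == "__main__":
    print(kafka_connector("3.1.2"))  # org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2
    print(matches_cluster("spark-sql-kafka-0-10_2.12-3.0.1.jar", "3.1.2"))  # False
```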