GCP Dataproc - error in importing KafkaProducer

2022-02-17 Thread karan alang
Hello All, I have a GCP Dataproc cluster on which I'm running a Spark Structured Streaming job. I'm trying to use KafkaProducer to push aggregated data into a Kafka topic; however, when I import KafkaProducer (from kafka import KafkaProducer), it gives an error ``` Traceback (most recent call

Re: Cast int to string not possible?

2022-02-17 Thread Rico Bergmann
I found the reason why it did not work: when returning the Spark data type, I was calling new StringType(). After changing it to DataTypes.StringType, it worked. Greets, Rico. > On 17.02.2022 at 14:13, Gourav Sengupta wrote: > > Hi, > > can you please post a screenshot of the exact
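A possible explanation, stated here as an assumption rather than something the thread confirms: Spark tends to compare data types against shared singleton instances such as DataTypes.StringType, so a freshly constructed StringType may not be recognized as the same type. The plain-Python sketch below is only an analogy for that distinction between the shared instance and a new one; StringType, STRING_TYPE, and is_string_type are hypothetical names, and none of this is Spark code.

```python
class StringType:
    """Toy stand-in for a data type that a framework treats as a singleton."""

    def __repr__(self) -> str:
        return "StringType"


# The one shared instance that the (hypothetical) framework compares against.
STRING_TYPE = StringType()


def is_string_type(data_type: object) -> bool:
    # Identity check: only the shared instance matches, mimicking a match
    # against a singleton object rather than against the class.
    return data_type is STRING_TYPE


shared_matches = is_string_type(STRING_TYPE)    # shared instance
fresh_matches = is_string_type(StringType())    # freshly constructed instance
```

Under this assumption, the fix reported above amounts to passing the shared instance instead of constructing a new one.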

Encoders.STRING() causing performance problems in Java application

2022-02-17 Thread martin
Hello, I am working on optimising the performance of a Java ML/NLP application based on Spark / SparkNLP. For prediction, I am applying a trained model on a Spark dataset which consists of one column with only one row. The dataset is created like this: List textList =

Position for 'cf.content' not found in row

2022-02-17 Thread 潘明文
Hi, could you help me with the issue below? Thanks! This is my source code: SparkConf sparkConf = new SparkConf(true); sparkConf.setAppName(ESTest.class.getName()); SparkSession spark = null; sparkConf.setMaster("local[*]"); sparkConf.set("spark.cleaner.ttl", "3600"); sparkConf.set("es.nodes",

writing a Dataframe (with one of the columns as struct) into Kafka

2022-02-17 Thread karan alang
Hello All, I have a PySpark DataFrame which I need to write to a Kafka topic. The structure of the DF is: root |-- window: struct (nullable = true) | |-- start: timestamp (nullable = false) | |-- end: timestamp (nullable = false) |-- processedAlarmCnt: integer (nullable = false) |--
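Spark's Kafka sink expects the payload in a single string or binary column named value, so a nested row usually has to be serialized first. The plain-Python sketch below only illustrates the shape such a serialized value might take for the fields visible in the schema above (window.start, window.end, processedAlarmCnt); the remaining columns are elided in the message and are not guessed here. The helper to_kafka_value and the sample timestamps are hypothetical. In PySpark itself, the analogous step is commonly written with to_json(struct(...)) aliased as value.

```python
import json
from datetime import datetime


def to_kafka_value(row: dict) -> str:
    """Serialize a nested row into one JSON string (hypothetical helper)."""

    def encode(obj):
        # json cannot serialize datetime objects natively; emit ISO-8601 text.
        if isinstance(obj, datetime):
            return obj.isoformat()
        raise TypeError(f"Cannot serialize {type(obj).__name__}")

    return json.dumps(row, default=encode, sort_keys=True)


# Sample row with made-up values, using only the fields shown in the schema.
sample_row = {
    "window": {
        "start": datetime(2022, 2, 17, 10, 0, 0),
        "end": datetime(2022, 2, 17, 10, 10, 0),
    },
    "processedAlarmCnt": 3,
}

value = to_kafka_value(sample_row)
```

Flattening the struct into one JSON string keeps the window boundaries and the count together in a single Kafka message, which a consumer can parse back with any JSON library.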

Re: Spark 3.2.1 in Google Kubernetes Version 1.19 or 1.21 - SparkSubmit Error

2022-02-17 Thread Mich Talebzadeh
Just create a directory as below on a GCP storage bucket: CODE_DIRECTORY_CLOUD="gs://spark-on-k8s/codes/" Put your jar file there: gsutil cp /opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar $CODE_DIRECTORY_CLOUD --conf spark.kubernetes.file.upload.path=file:///tmp \

Re: Using Avro file format with SparkSQL

2022-02-17 Thread Artemis User
Please try these two corrections: 1. The --packages isn't the right command line argument for spark-submit.  Please use --conf spark.jars.packages=your-package to specify Maven packages or define your configuration parameters in the spark-defaults.conf file 2. Please check the version

Re: Spark 3.2.1 in Google Kubernetes Version 1.19 or 1.21 - SparkSubmit Error

2022-02-17 Thread Gnana Kumar
Though I have created the Kubernetes RBAC as per the Spark site in my GKE cluster, I'm getting a POD NAME null error. kubectl create serviceaccount spark kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default On Thu, Feb 17, 2022 at 11:31 PM

Re: Spark 3.2.1 in Google Kubernetes Version 1.19 or 1.21 - SparkSubmit Error

2022-02-17 Thread Gnana Kumar
Hi Mich, this is the latest error I'm stuck with. Please help me resolve this issue. Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for kind: [Pod] with name: [null] in namespace: [default] failed.

Re: StructuredStreaming - foreach/foreachBatch

2022-02-17 Thread Gourav Sengupta
Hi, the following excellent documentation may help as well: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#using-foreach-and-foreachbatch The book from Dr. Zaharia on Spark does a fantastic job of explaining the fundamental thinking behind these concepts.

Re: Spark 3.2.1 in Google Kubernetes Version 1.19 or 1.21 - SparkSubmit Error

2022-02-17 Thread Mich Talebzadeh
Hi Gnana, that JAR file, /home/gnana_kumar123/spark/spark-3.2.1-bin-hadoop3.2/examples/jars/spark-examples_2.12-3.2.1.jar, is not visible to the GKE cluster in a way that lets all nodes read it. I suggest that you put it in a gs:// bucket in GCP and access it from there. HTH

Re: Cast int to string not possible?

2022-02-17 Thread Gourav Sengupta
Hi, can you please post a screen shot of the exact CAST statement that you are using? Did you use the SQL method mentioned by me earlier? Regards, Gourav Sengupta On Thu, Feb 17, 2022 at 12:17 PM Rico Bergmann wrote: > hi! > > Casting another int column that is not a partition column fails

Fwd: Spark 3.2.1 in Google Kubernetes Version 1.19 or 1.21 - SparkSubmit Error

2022-02-17 Thread Gnana Kumar
Hi there, I'm getting the error below even though I pass the --class and --jars values while submitting a Spark job through spark-submit. Please help. Exception in thread "main" org.apache.spark.SparkException: Failed to get main class in JAR with error 'File file:/home/gnana_kumar123/spark/ does not

Re: Cast int to string not possible?

2022-02-17 Thread Rico Bergmann
Hi! Casting another int column that is not a partition column fails with the same error. The schema before the cast (column names are anonymized): root |-- valueObject: struct (nullable = true) | |-- value1: string (nullable = true) | |-- value2: string (nullable = true) | |--

Re: SparkStructured Streaming using withWatermark - TypeError: 'module' object is not callable

2022-02-17 Thread Mich Talebzadeh
OK, that sounds reasonable. In the code below #Aggregation code in Alarm call, which uses withWatermark def computeCount(df_processedAlarm, df_totalAlarm): processedAlarmCnt = None if df_processedAlarm.count() > 0: processedAlarmCnt =

Re: Cast int to string not possible?

2022-02-17 Thread ayan guha
Can you try to cast any other Int field which is NOT a partition column? On Thu, 17 Feb 2022 at 7:34 pm, Gourav Sengupta wrote: > Hi, > > This appears interesting, casting INT to STRING has never been an issue > for me. > > Can you just help us with the output of : df.printSchema() ? > > I

Re: Cast int to string not possible?

2022-02-17 Thread Gourav Sengupta
Hi, This appears interesting, casting INT to STRING has never been an issue for me. Can you just help us with the output of : df.printSchema() ? I prefer to use SQL, and the method I use for casting is: CAST(<> AS STRING) <>. Regards, Gourav On Thu, Feb 17, 2022 at 6:02 AM Rico Bergmann