date:20220201

Structured Streaming on GCP Dataproc - java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer

2022-02-01 Thread karan alang

Hello All, I'm running a simple Structured Streaming job on GCP, which reads data from Kafka and prints it to the console. Command: gcloud dataproc jobs submit pyspark /Users/karanalang/Documents/Technology/gcp/DataProc/StructuredStreaming_Kafka_GCP-Batch-feb1.py --cluster dataproc-ss-poc
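A NoClassDefFoundError for a Kafka class often means the Kafka connector and its kafka-clients dependency are not on the job's classpath, though the snippet above does not confirm that this is the cause here. As a minimal sketch under that assumption, the helper below assembles a submit command that asks Spark to resolve the connector through the spark.jars.packages property. The function name, the region argument, and the package coordinate are illustrative and not taken from the thread; the coordinate has to match the Spark and Scala versions of the cluster.

```python
import shlex


def build_submit_command(script_path, cluster, region, kafka_package):
    """Return the argument list for a Dataproc PySpark job submission.

    kafka_package is a Maven coordinate for the Spark Kafka connector,
    chosen by the caller (assumption: it matches the cluster's Spark
    and Scala versions).
    """
    return [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", script_path,
        f"--cluster={cluster}",
        f"--region={region}",
        # Ask Spark to download the connector and its transitive
        # dependencies (including kafka-clients) at job start.
        f"--properties=spark.jars.packages={kafka_package}",
    ]


def as_shell_line(args):
    """Render the argument list as a single, safely quoted shell line."""
    return shlex.join(args)
```

Calling as_shell_line(build_submit_command(...)) yields a line that can be pasted into a terminal; passing the list itself to subprocess.run avoids shell quoting altogether.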

Re: Structured Streaming - not showing records on console

2022-02-01 Thread karan alang

Hi Mich, thanks. It seems 'complete' mode is supported only if there are streaming aggregations. I get this error on changing the output mode: pyspark.sql.utils.AnalysisException: Complete output mode not supported when there are no streaming aggregations on streaming DataFrames/Datasets; Project
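The restriction in that error message can be sketched as follows: a query with no streaming aggregation writes to the console in 'append' mode, and 'complete' is reserved for aggregated results. Both helper names are hypothetical, and stream_df is assumed to be a streaming DataFrame the caller has already created (for example with spark.readStream).

```python
def choose_output_mode(has_streaming_aggregation):
    """Pick an output mode that Structured Streaming accepts.

    'complete' requires a streaming aggregation; plain projections and
    filters work with 'append'.
    """
    return "complete" if has_streaming_aggregation else "append"


def start_console_query(stream_df, has_streaming_aggregation=False, num_rows=1000):
    """Start a console-sink query on an existing streaming DataFrame."""
    return (
        stream_df.writeStream
        .outputMode(choose_output_mode(has_streaming_aggregation))
        .format("console")
        .option("numRows", num_rows)   # rows printed per micro-batch
        .option("truncate", False)     # show full column values
        .start()
    )
```

The returned StreamingQuery has to be kept alive, typically with awaitTermination(), or the driver may exit before any batch is printed.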

Re: Structured Streaming - not showing records on console

2022-02-01 Thread Mich Talebzadeh

Hm. I am trying to recall if I am correct, so you should try outputMode('complete') with format('console'): result = resultMF. \ writeStream. \ outputMode('complete'). \ option("numRows", 1000). \

Structured Streaming - not showing records on console

2022-02-01 Thread karan alang

Hello Spark Experts, I have a simple Structured Streaming program, which reads data from Kafka and writes to the console. This is working in batch mode (i.e. spark.read or df.write), but not working in streaming mode. Details are in the stackoverflow

Re: Code fails when AQE enabled in Spark 3.1

2022-02-01 Thread Sean Owen

At a glance, it doesn't seem so. That is a corner case in two ways: very old dates and using RDDs, at least it seems. I also suspect that individual change is tied to a lot of other date-related changes in 3.2, so it may not be very back-portable. You should pursue updating to 3.2 for many reasons,

Re: A Persisted Spark DataFrame is computed twice

2022-02-01 Thread Gourav Sengupta

Hi, can you please try to use Spark SQL instead of DataFrames and see the difference? You will get a lot of theoretical arguments, and that is fine, but they are just largely and essentially theories. Also try to apply the function to the result of the filters as a sub-query by caching in the

6 matches
