Hello All,
I'm running a simple Structured Streaming on GCP, which reads data from
Kafka and prints onto console.
Command :
gcloud dataproc jobs submit pyspark
/Users/karanalang/Documents/Technology/gcp/DataProc/StructuredStreaming_Kafka_GCP-Batch-feb1.py
--cluster dataproc-ss-poc
Hi Mich,
thanks. It seems 'complete' mode is supported only when there are streaming
aggregations.
I get this error after changing the output mode:
pyspark.sql.utils.AnalysisException: Complete output mode not supported
when there are no streaming aggregations on streaming DataFrames/Datasets;
Project
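That error matches Spark's documented rule: 'complete' output mode needs a streaming aggregation in the query, while a plain pass-through query uses 'append'. A minimal sketch of both cases (not runnable as-is here: it assumes PySpark is installed and a Kafka broker at localhost:9092 with a topic named test-topic, all hypothetical):

```python
# Sketch: why 'complete' fails without an aggregation, and when it works.
# Assumes pyspark is installed and a Kafka broker at localhost:9092 (hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-console-demo").getOrCreate()

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "test-topic")
      .load())

# Pass-through query: no aggregation, so only 'append' (or 'update') is legal.
# Using outputMode("complete") here raises the AnalysisException quoted above.
passthrough = (df.selectExpr("CAST(value AS STRING)")
               .writeStream
               .outputMode("append")
               .format("console")
               .start())

# With a streaming aggregation in the plan, 'complete' becomes legal:
counts = (df.groupBy(col("topic")).count()
          .writeStream
          .outputMode("complete")
          .format("console")
          .start())
```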
Hm.
If I recall correctly, you could try outputMode('complete') with
format('console'), something like this (completing the chain with
format('console') and start(), which the snippet was missing):
result = resultMF. \
writeStream. \
outputMode('complete'). \
option("numRows", 1000). \
format('console'). \
start()
Hello Spark Experts,
I've a simple Structured Streaming program, which reads data from Kafka,
and writes to the console. This works in batch mode (i.e. spark.read or
df.write), but not in streaming mode.
Details are in the Stack Overflow post.
At a glance, it doesn't seem so. It appears to be a corner case in two ways:
very old dates, and the use of RDDs.
I also suspect that individual change is tied to a lot of other date-related
changes in 3.2, so it may not be very back-portable.
You should pursue updating to 3.2 for many reasons.
Hi,
Can you please try using Spark SQL instead of DataFrames and see the
difference?
You will get a lot of theoretical arguments, and that is fine, but they are
largely just theories.
Also, try applying the function to the result of the filters as a sub-query,
caching in the
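The Spark SQL suggestion above could be sketched roughly like this (the table, view, and column names are hypothetical, chosen only for illustration; it assumes PySpark is installed):

```python
# Sketch of the Spark SQL approach: register the DataFrame as a temp view,
# cache the filtered sub-query, then apply the aggregation in SQL.
# Assumes pyspark is installed; all names here are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-vs-df").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "a")], ["id", "grp"])
df.createOrReplaceTempView("events")

# Filter first, cache the intermediate result, then register it as a view.
filtered = spark.sql("SELECT * FROM events WHERE id > 1")
filtered.cache()
filtered.createOrReplaceTempView("events_filtered")

# Apply the function (here a simple COUNT) to the cached sub-query.
result = spark.sql("SELECT grp, COUNT(*) AS n FROM events_filtered GROUP BY grp")
result.show()
```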