Hello list,
Thanks to the Spark project and the community, I have completed my first data
statistics project with Spark.
The URL: https://github.com/bitfoxtop/EmailRankings
Admittedly this is not really big data. I could even write a Python script to
finish the job more quickly.
But since the job was done
Hi developers,
I am using Structured Streaming + Kafka.
Last week in the production (prd) environment, when one node of the Kafka cluster crashed, my
application's consumption became slow,
but when I used the Kafka console, it could consume messages and the speed was OK.
In the FAT environment, I killed the Kafka process
Hi spark developers,
I asked a question on the issue board: SPARK-37720 (Error reading delta
file, hdfs://BMT163/state/0/0/2879.delta does not exist).
The answer: a version mismatch between Spark core and Python.
I am confused: if it were caused by a version mismatch, it would presumably happen
every time, but now it occurred
Hi
I am having trouble debugging my driver. It runs correctly on smaller data sets
but fails on large ones. It is very hard to figure out what the bug is. I
suspect it may have something to do with the way Spark is installed and
configured. I am using Google Cloud Platform Dataproc with PySpark.
The
Hi
Below is the typical pseudocode I find myself writing over and over again. There
is only a single action at the very end of the program. The early narrow
transformations potentially hold on to a lot of needless data. I have a for
loop over joins (i.e., wide transformations), followed by a bunch