date:20211226

my first data science project with spark

2021-12-26 Thread bitfox

Hello list, Thanks to the Spark project and the community, I have made my first data statistics project with Spark. The URL: https://github.com/bitfoxtop/EmailRankings Surely this is not really big data... I could even write a Python script to finish the job more quickly. But since the job was done

some questions when using structure streaming

2021-12-26 Thread fangmin

Hi developers, I am using Structured Streaming + Kafka. Last week in the prd environment, when one node (of the Kafka cluster) crashed, my application's consumption became slow, but when I used the Kafka console it could consume messages and the speed was OK. In the fat environment, I kill the Kafka process

some errors occur when using structured streaming

2021-12-26 Thread fangmin

Hi Spark developers, I asked a question on the issue board: SPARK-37720 (Error reading delta file, hdfs://BMT163/state/0/0/2879.delta does not exist). The answer: a version mismatch between Spark core and Python. I am confused: if it were caused by a version mismatch, it would happen every time, but now it occurred

Pyspark debugging best practices

2021-12-26 Thread Andrew Davidson

Hi, I am having trouble debugging my driver. It runs correctly on smaller data sets but fails on large ones. It is very hard to figure out what the bug is. I suspect it may have something to do with the way Spark is installed and configured. I am using Google Cloud Platform Dataproc PySpark. The

Pyspark garbage collection and cache management best practices

2021-12-26 Thread Andrew Davidson

Hi, below is typical pseudocode I find myself writing over and over again. There is only a single action at the very end of the program. The early narrow transformations potentially hold on to a lot of needless data. I have a for loop over a join (i.e. a wide transformation), followed by a bunch

5 matches
