The problem is that you are reducing a list of tuples but producing an int.
The resulting int can't be combined with the remaining tuples by your
function: the function you pass to reduce() must return the same type as its
arguments, because its result is fed back in as an argument on the next step.
rdd.map(lambda x: x[1]).reduce(lambda x, y: x + y)
... would work.
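The same invariant can be seen with plain Python's functools.reduce, no Spark needed (a minimal local sketch):

```python
from functools import reduce

pairs = [("a", 1), ("b", 2), ("c", 3)]

# Broken: the first call returns an int (1 + 2), so the next step
# tries int[1] and raises TypeError.
# reduce(lambda x, y: x[1] + y[1], pairs)

# Working: extract the values first, then reduce over ints only,
# so the accumulator type matches the element type at every step.
total = reduce(lambda x, y: x + y, [v for _, v in pairs])
print(total)  # 6
```

In PySpark the map-then-reduce above plays the same role as the list comprehension here.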
On Tue, Jan 18,
Hello
Please help take a look at why this simple reduce doesn't work?
rdd = sc.parallelize([("a",1),("b",2),("c",3)])
rdd.reduce(lambda x,y: x[1]+y[1])
Traceback (most recent call last):
File "", line 1, in
File "/opt/spark/python/pyspark/rdd.py", line 1001, in reduce
return
How large is the file? In my experience, reading an Excel file from the
data lake and loading it as a DataFrame works great.
Thanks
On 2022-01-18 22:16, Heta Desai wrote:
Hello,
I have zip files on an SFTP location. I want to download/copy those files and
put them into Azure Data Lake. Once the zip files are stored in Azure Data
Lake, I want to unzip them and read them using DataFrames.
The file format inside the zip is Excel. So, once the files are unzipped, I want to
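The unzip step on its own can be sketched with Python's standard zipfile module. This is a minimal illustration only: the in-memory archive and the member name `report.xlsx` are hypothetical stand-ins for a file pulled from SFTP, and the SFTP download and the Excel-to-DataFrame read (e.g. via a Spark Excel connector or pandas) are out of scope here.

```python
import io
import zipfile

# Hypothetical in-memory zip standing in for a file copied from SFTP
# into the data lake.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("report.xlsx", b"placeholder bytes")

# Unzip step: list the members and extract their bytes before
# handing them to whatever reads them as DataFrames.
buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    data = zf.read("report.xlsx")

print(names)  # ['report.xlsx']
```

Against a real lake you would open the downloaded blob the same way and write the extracted members back before reading them.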
Hi team,
We are testing the performance and capability of Spark for a linear
regression application, to replace at least sklearn's linear regression.
We first generated data for model fitting via
sklearn.datasets.make_regression. See the generation code
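For a self-contained baseline, the same kind of synthetic data and fit can be sketched with NumPy alone. This is a hedged stand-in, not the poster's actual generation code: the sizes and true weights below are made up, and np.linalg.lstsq solves the same ordinary-least-squares objective that sklearn's LinearRegression minimizes (without an intercept here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for sklearn.datasets.make_regression:
# X is (n_samples, n_features), y = X @ w_true + small noise.
n, d = 200, 3
X = rng.standard_normal((n, d))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.standard_normal(n)

# Ordinary least squares fit; w_hat should recover w_true closely.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w_hat, 2))
```

Comparing Spark MLlib's LinearRegression coefficients against such a closed-form fit is one way to sanity-check the migration.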
Does the property spark.kubernetes.executor.deleteOnTermination check
whether the executor being deleted has shuffle data or not?
On Tue, 18 Jan 2022, 11:20 Pralabh Kumar, wrote:
> Hi spark team
>
> We have the cluster-wide property spark.kubernetes.executor.deleteOnTermination
> set to true.
>
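For reference, the property is normally set at submit time. This is a sketch only, with the property name as documented for Spark on Kubernetes; the master URL and application jar below are placeholders, not values from this thread:

```shell
# Placeholder master URL and jar path; only the --conf line is the point here.
spark-submit \
  --master k8s://https://example-apiserver:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.executor.deleteOnTermination=true \
  local:///opt/spark/examples/jars/spark-examples.jar
```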
Hi,
We're using Spark 3.2.0 and we have enabled the Spark decommission
feature. As part of validating this feature, we wanted to check whether the
RDD blocks and shuffle blocks from the decommissioned executors are migrated
to other executors.
However, we could not see this happening. Below is
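For block migration to happen, the storage-decommission settings have to be enabled alongside the feature itself. A sketch of the relevant properties from the Spark 3.2 configuration docs (the application script name is a placeholder):

```shell
# Storage decommission must be switched on explicitly for RDD and
# shuffle blocks to be migrated off decommissioning executors.
spark-submit \
  --conf spark.decommission.enabled=true \
  --conf spark.storage.decommission.enabled=true \
  --conf spark.storage.decommission.rddBlocks.enabled=true \
  --conf spark.storage.decommission.shuffleBlocks.enabled=true \
  my-app.py
```

If only spark.decommission.enabled is set, executors can decommission without their blocks being moved, which would match the behavior described above.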