Re: newbie question for reduce

2022-01-18 Thread Sean Owen
The problem is that you are reducing a list of tuples, but your function produces an int. That int can't then be combined with the remaining tuples by the same function: reduce() has to produce the same type as its arguments. Something like rdd.map(lambda x: x[1]).reduce(lambda x,y: x+y) would work.
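To make the type rule concrete, here is a minimal local sketch that uses Python's `functools.reduce` on a plain list instead of an RDD, so it runs without a cluster. It shows the lambda from the question failing, the map-then-reduce fix, and an alternative that keeps returning a tuple so input and output types match. The helper names and the `"total"` key are arbitrary placeholders, not part of any API.

```python
from functools import reduce

pairs = [("a", 1), ("b", 2), ("c", 3)]

def broken_sum(items):
    # The lambda from the question: two tuples in, an int out.
    # On the next step x is that int, so x[1] raises TypeError.
    return reduce(lambda x, y: x[1] + y[1], items)

def mapped_sum(items):
    # Local analogue of rdd.map(lambda x: x[1]).reduce(lambda x, y: x + y):
    # project to ints first, then reduce ints to an int.
    return reduce(lambda x, y: x + y, map(lambda x: x[1], items))

def tuple_sum(items):
    # Alternative: keep returning a tuple so every call sees the same type.
    # "total" is an arbitrary placeholder key.
    return reduce(lambda x, y: ("total", x[1] + y[1]), items)

try:
    broken_sum(pairs)
except TypeError as err:
    print("broken_sum failed:", err)

print(mapped_sum(pairs))  # 6
print(tuple_sum(pairs))   # ('total', 6)
```

On the pair RDD itself, `rdd.values().sum()` should also give 6, since `values()` and `sum()` are standard RDD methods.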

newbie question for reduce

2022-01-18 Thread capitnfrakass
Hello, could someone please take a look at why this simple reduce doesn't work?
rdd = sc.parallelize([("a",1),("b",2),("c",3)])
rdd.reduce(lambda x,y: x[1]+y[1])
Traceback (most recent call last):
  File "", line 1, in
  File "/opt/spark/python/pyspark/rdd.py", line 1001, in reduce
    return
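As for why the traceback ends inside `rdd.py`'s `reduce`: PySpark reduces each partition separately and then reduces the per-partition results on the driver with the same function. The sketch below is a simplified local model of that two-stage behaviour, not PySpark's actual code; `simulate_rdd_reduce` is a hypothetical helper, and the two-partition split is an assumed layout, since the real split depends on `sc.parallelize` and the default parallelism.

```python
from functools import reduce

def simulate_rdd_reduce(partitions, f):
    """Simplified local model of PySpark's RDD.reduce.

    Stage 1: reduce each non-empty partition on its own (executor side).
    Stage 2: reduce the per-partition results with the same f (driver side).
    """
    partials = [reduce(f, part) for part in partitions if part]
    if not partials:
        raise ValueError("cannot reduce an empty collection")
    return reduce(f, partials)

question_fn = lambda x, y: x[1] + y[1]  # the lambda from the question
sum_fn = lambda x, y: x + y             # a same-type (int, int) -> int function

# Assumed split of [("a", 1), ("b", 2), ("c", 3)] into two partitions.
tuple_parts = [[("a", 1)], [("b", 2), ("c", 3)]]

# Stage 1 yields [("a", 1), 5]; stage 2 then calls question_fn(("a", 1), 5),
# and 5[1] raises TypeError, so the failure surfaces in the driver-side step.
try:
    simulate_rdd_reduce(tuple_parts, question_fn)
except TypeError as err:
    print("failed:", err)

# After projecting to ints (as rdd.map(lambda x: x[1]) would), the same split works.
int_parts = [[v for _, v in part] for part in tuple_parts]
print(simulate_rdd_reduce(int_parts, sum_fn))  # 6
```

Note that a single-element partition never calls the function at all, which is why a bad function can appear to work on some inputs and fail on others.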

Re: [Pyspark] How to download Zip file from SFTP location and put it into Azure Data Lake and unzip it

2022-01-18 Thread Wes Peng
How large is the file? In my experience, reading the Excel file from the data lake and loading it as a DataFrame works well.

Thanks

On 2022-01-18 22:16, Heta Desai wrote:
> Hello, I have zip files on SFTP location. I want to download/copy those files and put into Azure Data Lake. Once the zip

[Pyspark] How to download Zip file from SFTP location and put it into Azure Data Lake and unzip it

2022-01-18 Thread Heta Desai
Hello, I have zip files at an SFTP location. I want to download/copy those files and put them into Azure Data Lake. Once the zip files are stored in Azure Data Lake, I want to unzip them and read them using DataFrames. The files inside the zips are in Excel format. So, once the files are unzipped, I want to
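For the unzip step only, a minimal sketch using Python's standard `zipfile` module. It assumes the zip has already been copied from SFTP to a path the driver can read (for example a locally mounted data lake path); the SFTP download and the Azure upload are not shown, and `extract_excel_members` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path

def extract_excel_members(zip_path, out_dir):
    """Extract only the Excel workbooks (.xlsx/.xls) from zip_path into out_dir.

    Returns the list of extracted file paths. out_dir is assumed to be a
    local or mounted path; transfer to/from SFTP or the data lake is not shown.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # Skip directory entries and anything that is not an Excel file.
            if info.is_dir():
                continue
            if not info.filename.lower().endswith((".xlsx", ".xls")):
                continue
            # extract() recreates the member's folder structure under out_dir
            # and strips absolute paths and ".." components from member names.
            extracted.append(Path(zf.extract(info, out)))
    return extracted
```

The extracted workbooks could then be read with `pandas.read_excel` (which needs an Excel engine such as openpyxl installed) and converted with `spark.createDataFrame(pdf)`; for many or very large files, a different approach that reads on the executors may be preferable.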

[ML Intermediate]: Slow fitting of Linear regression vs Sklearn

2022-01-18 Thread Hu You
Hi team, We are testing the performance and capability of Spark for a linear regression application, with the aim of replacing at least sklearn's linear regression. First, we generated data for model fitting via sklearn.datasets.make_regression. See the generation code

Re: Spark on k8s: spark 3.0.1 spark.kubernetes.executor.deleteOnTermination issue

2022-01-18 Thread Pralabh Kumar
Does the property spark.kubernetes.executor.deleteOnTermination check whether the executor being deleted has shuffle data or not?

On Tue, 18 Jan 2022, 11:20 Pralabh Kumar wrote:
> Hi spark team
>
> Have cluster wide property spark.kubernetes.executor.deleteOnTermination
> to true.
>

Regarding spark-3.2.0 decommission features.

2022-01-18 Thread Patidar, Mohanlal (Nokia - IN/Bangalore)
Hi, We're using Spark 3.2.0 and have enabled the Spark decommission feature. As part of validating this feature, we wanted to check whether the RDD blocks and shuffle blocks from decommissioned executors are migrated to other executors. However, we could not see this happening. Below is
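For reference, a sketch of the settings that the Spark 3.1+ configuration docs describe for migrating blocks off a decommissioning executor. As far as we understand the docs, `spark.decommission.enabled` alone only turns on graceful decommissioning; block migration additionally requires `spark.storage.decommission.enabled`, with the RDD and shuffle sub-flags selecting which block types move. Please check names and defaults against the 3.2.0 configuration page. The helpers `decommission_conf` and `as_submit_args` are hypothetical and only render these settings as `spark-submit --conf` arguments.

```python
def decommission_conf(migrate_rdd_blocks=True, migrate_shuffle_blocks=True):
    """Return decommission/migration settings as a dict (hypothetical helper)."""
    return {
        # Graceful executor decommissioning.
        "spark.decommission.enabled": "true",
        # Per the docs, blocks are only migrated when this is also true.
        "spark.storage.decommission.enabled": "true",
        # Which block types to migrate.
        "spark.storage.decommission.rddBlocks.enabled": str(migrate_rdd_blocks).lower(),
        "spark.storage.decommission.shuffleBlocks.enabled": str(migrate_shuffle_blocks).lower(),
    }

def as_submit_args(conf):
    """Render a conf dict as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args

print(" ".join(as_submit_args(decommission_conf())))
```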