Re: [Pyspark 2.4] not able to partition the data frame by dates

2019-07-31 Thread Rishi Shah
Thanks for your prompt reply, Gourav. I am using Spark 2.4.0 (Cloudera distribution). The job consistently threw this error, so I narrowed down the dataset by adding a date filter (date range: 2018-01-01 to 2018-06-30). However, it's still throwing the same error! *command*: spark2-submit --master
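(A date filter of that kind might look like the sketch below; the column name "date" is an assumption, since the schema isn't shown in the thread.)

    from pyspark.sql.functions import col

    df = spark.read.parquet(INPUT_PATH)
    # "date" is an assumed column name; restrict to the first half of 2018.
    df = df.filter((col("date") >= "2018-01-01") & (col("date") <= "2018-06-30"))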

Re: [Pyspark 2.4] not able to partition the data frame by dates

2019-07-31 Thread Gourav Sengupta
Hi Rishi, there is no such version as 2.4 :), can you please specify the exact SPARK version you are using? How are you starting the SPARK session? And what is the environment? I know this issue occurs intermittently over large writes to S3 and has to do with S3's eventual-consistency issues. Just
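(One way to confirm the exact patch version from inside the session, as a minimal sketch; the app name here is arbitrary.)

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("version-check").getOrCreate()
    # Prints the full version string, e.g. "2.4.0".
    print(spark.version)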

[Pyspark 2.4] not able to partition the data frame by dates

2019-07-31 Thread Rishi Shah
Hi All, I have a dataframe of size 2.7T (parquet) which I need to partition by date, however the Spark program below doesn't help - it keeps failing due to a *file already exists exception*... df = spark.read.parquet(INPUT_PATH)
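(The failing write step is cut off in the archive; below is a minimal sketch of a date-partitioned write, assuming a "date" column and an OUTPUT_PATH variable, neither of which appears in the thread. mode("overwrite") replaces existing output instead of raising the file-already-exists error on reruns.)

    df = spark.read.parquet(INPUT_PATH)
    # "date" column and OUTPUT_PATH are assumptions not shown in the thread.
    df.write.mode("overwrite").partitionBy("date").parquet(OUTPUT_PATH)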

Announcing .NET for Apache Spark 0.4.0

2019-07-31 Thread Terry Kim
We are thrilled to announce that .NET for Apache Spark 0.4.0 has just been released! Some of the highlights of this release include: - Apache Arrow-backed UDFs (Vector UDF, Grouped Map UDF) - Robust UDF-related assembly loading -

Re: Spark Image resizing

2019-07-31 Thread Patrick McCarthy
It won't be very efficient, but you could write a Python UDF using PythonMagick - https://wiki.python.org/moin/ImageMagick If you have PyArrow > 0.10, then you might be able to get a boost by saving images in a column as BinaryType and writing a Pandas UDF. On Wed, Jul 31, 2019 at 6:22 AM Nick
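(A minimal sketch of such a Pandas UDF in Spark 2.4-era syntax; Pillow stands in for PythonMagick here for illustration, and the column name "image_bytes" and the target size are assumptions.)

    import io
    from PIL import Image  # Pillow used in place of PythonMagick for illustration
    from pyspark.sql.functions import pandas_udf, PandasUDFType
    from pyspark.sql.types import BinaryType

    @pandas_udf(BinaryType(), PandasUDFType.SCALAR)
    def resize_images(data):
        # data is a pandas Series of raw image bytes (BinaryType column).
        def _resize(raw):
            img = Image.open(io.BytesIO(raw))
            img = img.resize((224, 224))  # assumed target size
            buf = io.BytesIO()
            img.save(buf, format="PNG")
            return buf.getvalue()
        return data.map(_resize)

    # df = df.withColumn("resized", resize_images(df["image_bytes"]))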

Re: Core allocation is scattered

2019-07-31 Thread Muthu Jayakumar
> I am running a spark job with 20 cores, but I did not understand why my application gets 1-2 cores on a couple of machines - why does it not just run on two nodes, like node1=16 cores and node2=4 cores? Instead, cores are allocated like node1=2, node2=1, ... node14=1. I believe that's the intended
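(Hedged note: if this is a standalone cluster - an assumption, since the cluster manager isn't stated in the thread - this scattering is the scheduler's default; the master-side setting spark.deploy.spreadOut=false consolidates executors onto fewer nodes instead.)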

Re: Kafka Integration libraries put in the fat jar

2019-07-31 Thread Spico Florin
Hi! Thanks to Jacek Laskowski, I found the answer here: https://stackoverflow.com/questions/51792203/how-to-get-spark-kafka-org-apache-sparkspark-sql-kafka-0-10-2-112-1-0-dependen Just add the Maven shade plugin:
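(The plugin configuration itself is cut off in the archive; below is a minimal pom.xml sketch based on the linked answer, with versions omitted. The ServicesResourceTransformer is the key piece, as it merges the META-INF/services entries that register the Kafka data source in the fat jar.)

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
          <configuration>
            <transformers>
              <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>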

Re: Spark Image resizing

2019-07-31 Thread Nick Dawes
Any other way of resizing the image before creating the DataFrame in Spark? I know OpenCV does it, but I don't have OpenCV on my cluster. I have Anaconda Python packages installed on my cluster. Any ideas will be appreciated. Thank you! On Tue, Jul 30, 2019, 4:17 PM Nick Dawes wrote: > Hi > >