Re: unsubscribe

2020-05-16 Thread Hichame El Khalfi
From: Basavaraj Sent: Friday, May 15, 2020 9:12:01 PM To: spark users Subject: unsubscribe

Re: Start point to read source codes

2019-09-05 Thread Hichame El Khalfi
Hey David, You can find the source code on GitHub: https://github.com/apache/spark Hope this helps, Hichame From: zhou10...@gmail.com Sent: September 5, 2019 4:11 PM To: user@spark.apache.org Subject: Start point to read source codes Hi, I want to read the source codes. Is there any doc, wiki or

Re: JDK11 Support in Apache Spark

2019-08-25 Thread Hichame El Khalfi
That's Awesome !!! Thanks to everyone that made this possible :cheers: Hichame From: cloud0...@gmail.com Sent: August 25, 2019 10:43 PM To: lix...@databricks.com Cc: felixcheun...@hotmail.com; ravishankar.n...@gmail.com; dongjoon.h...@gmail.com; d...@spark.apache.org; user@spark.apache.org

Re: testing frameworks

2019-02-03 Thread Hichame El Khalfi
Hi, You can use pysparkling => https://github.com/svenkreiss/pysparkling This library is useful if you are working with RDDs. Hope this helps, Hichame From: mmistr...@gmail.com Sent: February 3, 2019 4:42 PM To: radams...@gmail.com Cc: la...@mapflat.com; bpru...@opentext.com; user@spark.apache.org Subject:

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-19 Thread Hichame El Khalfi
You can do this in 2 passes (not one): A) Save your dataset into HDFS with what you have. B) Calculate the number of partitions, n = (size of your dataset) / (HDFS block size). Then run a simple Spark job to read the data and repartition it based on 'n'. Hichame From: felixcheun...@hotmail.com Sent: January 19, 2019 2:06 PM
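A rough sketch of the two-pass idea above, in case it helps. The helper names (`partitions_for_block_size`, `rewrite_with_partitions`), the Parquet format, and the 128 MiB default block size are assumptions for illustration and are not taken from the thread; check `dfs.blocksize` on your cluster and measure the size of the data written in pass A yourself.

```python
def partitions_for_block_size(dataset_bytes: int,
                              block_bytes: int = 128 * 1024 * 1024) -> int:
    """Return n = ceil(dataset size / HDFS block size), with a minimum of 1.

    The 128 MiB default is an assumption; use your cluster's dfs.blocksize.
    """
    if dataset_bytes < 0:
        raise ValueError("dataset_bytes must be non-negative")
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    # Integer ceiling division, avoids float rounding on very large sizes.
    n = -(-dataset_bytes // block_bytes)
    return max(1, n)


def rewrite_with_partitions(spark, src_path: str, dst_path: str, n: int) -> None:
    """Pass B: read what pass A wrote and write it back with n partitions.

    `spark` is an existing SparkSession; Parquet is assumed here only
    for illustration, any supported format works the same way.
    """
    df = spark.read.parquet(src_path)
    df.repartition(n).write.mode("overwrite").parquet(dst_path)


# Example: a 1 GiB dataset with 128 MiB blocks gives 8 partitions.
n = partitions_for_block_size(1024 ** 3)
```

Note that output files will only be close to one block each if the rows are spread fairly evenly across partitions, so treat 'n' as a starting point rather than a guarantee.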

Kryoserializer with pyspark

2018-08-09 Thread Hichame El Khalfi
Hello there !!! Is there any benefit from tuning `spark.kryoserializer.buffer` and `spark.kryoserializer.buffer.max` if we just use pyspark with no Java or Scala classes ? Thanks for your help, Hichame

Re: Use Arrow instead of Pickle without pandas_udf

2018-07-31 Thread Hichame El Khalfi
pandas_udf https://issues.apache.org/jira/browse/SPARK-24579 On Wed, Jul 25, 2018 at 3:36 PM, Hichame El Khalfi <hich...@elkhalfi.com> wrote: Hey Holden, Thanks for your reply, We are currently using a python function that produces a Row(TS=LongType(), bin=BinaryType()). We use this fu

Re: Use Arrow instead of Pickle without pandas_udf

2018-07-25 Thread Hichame El Khalfi
PM To: hich...@elkhalfi.com Cc: user@spark.apache.org Subject: Re: Use Arrow instead of Pickle without pandas_udf Not currently. What's the problem with pandas_udf for your use case? On Wed, Jul 25, 2018 at 1:27 PM, Hichame El Khalfi <hich...@elkhalfi.com> wrot

Use Arrow instead of Pickle without pandas_udf

2018-07-25 Thread Hichame El Khalfi
Hi there, Is there a way to use the Arrow format instead of Pickle, but without using pandas_udf ? Thanks for your help, Hichame