From: Basavaraj
Sent: Friday, May 15, 2020 9:12:01 PM
To: spark users
Subject: unsubscribe
Hey David,
You can find the source code on GitHub:
https://github.com/apache/spark
Hope this helps,
Hichame
From: zhou10...@gmail.com
Sent: September 5, 2019 4:11 PM
To: user@spark.apache.org
Subject: Start point to read source codes
Hi,
I want to read the source codes. Is there any doc, wiki or
That's Awesome !!!
Thanks to everyone that made this possible :cheers:
Hichame
From: cloud0...@gmail.com
Sent: August 25, 2019 10:43 PM
To: lix...@databricks.com
Cc: felixcheun...@hotmail.com; ravishankar.n...@gmail.com;
dongjoon.h...@gmail.com; d...@spark.apache.org; user@spark.apache.org
Hi,
You can use pysparkling => https://github.com/svenkreiss/pysparkling
This library is useful if you are working with RDDs.
Hope this helps,
Hichame
From: mmistr...@gmail.com
Sent: February 3, 2019 4:42 PM
To: radams...@gmail.com
Cc: la...@mapflat.com; bpru...@opentext.com; user@spark.apache.org
Subject:
You can do this in two passes (not one):
A) Save your dataset to HDFS as it is.
B) Calculate the number of partitions: n = (size of your dataset) / (HDFS block size).
Then run a simple Spark job to read the data and repartition it based on 'n'.
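
As a rough sketch of step B, the partition count could be computed as below. The function name, the 128 MiB default block size, and rounding up to a whole number are assumptions for illustration only; check the block size actually configured on your cluster.

```python
# Assumption: 128 MiB is a common HDFS default block size; your cluster may differ.
HDFS_BLOCK_SIZE_BYTES = 128 * 1024 * 1024


def estimate_num_partitions(dataset_size_bytes, block_size_bytes=HDFS_BLOCK_SIZE_BYTES):
    """Return n = (dataset size) / (HDFS block size), rounded up, at least 1."""
    if block_size_bytes <= 0:
        raise ValueError("block_size_bytes must be positive")
    if dataset_size_bytes < 0:
        raise ValueError("dataset_size_bytes must not be negative")
    # Integer ceiling division avoids floating-point error on very large sizes.
    n = (dataset_size_bytes + block_size_bytes - 1) // block_size_bytes
    return max(1, n)


# Hypothetical example: the output of pass A occupies 10 GiB on HDFS.
n = estimate_num_partitions(10 * 1024**3)
```

In the second pass, 'n' would then be handed to something like DataFrame.repartition(n) (or RDD.repartition(n)) before writing the data back out.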
Hichame
From: felixcheun...@hotmail.com
Sent: January 19, 2019 2:06 PM
Hello there !!!
Is there any benefit from tuning `spark.kryoserializer.buffer` and
`spark.kryoserializer.buffer.max` if we just use PySpark with no Java or Scala
classes ?
Thanks for your help,
Hichame
pandas_udf
https://issues.apache.org/jira/browse/SPARK-24579
On Wed, Jul 25, 2018 at 3:36 PM, Hichame El Khalfi
<hich...@elkhalfi.com> wrote:
Hey Holden,
Thanks for your reply,
We are currently using a Python function that produces a Row(TS=LongType(),
bin=BinaryType()).
We use this fu
PM
To: hich...@elkhalfi.com
Cc: user@spark.apache.org
Subject: Re: Use Arrow instead of Pickle without pandas_udf
Not currently. What's the problem with pandas_udf for your use case?
On Wed, Jul 25, 2018 at 1:27 PM, Hichame El Khalfi
<hich...@elkhalfi.com> wrote:
Hi There,
Is there a way to use Arrow format instead of Pickle but without using
pandas_udf ?
Thanks for your help,
Hichame