Re: [EXT] handling skewness issues

2019-04-29 Thread Michael Mansour
There were recently some fantastic talks about this at the SparkSummit conference in San Francisco. I suggest you check out the SparkSummit YouTube channel after May 9th for a deep dive into this topic. From: rajat kumar Date: Monday, April 29, 2019 at 9:34 AM To: "user@spark.apache.org"

Re: [EXT] [Spark 2.x Core] .collect() size limit

2018-04-30 Thread Michael Mansour
expand on what you're trying to achieve here. -- Michael Mansour Data Scientist Symantec CASB On 4/28/18, 8:41 AM, "klrmowse" <klrmo...@gmail.com> wrote: i am currently trying to find a workaround for the Spark application i am working on so that it does not have

Re: [EXT] Debugging a local spark executor in pycharm

2018-03-13 Thread Michael Mansour
, and pass it into the function. This alleviates the need to write debugging code etc. I find this model useful and a bit faster, but it does not offer the step-through capability. Best of luck! M -- Michael Mansour Data Scientist Symantec CASB From: Vitaliy Pisarev <vitaliy.pisa...@biocatch.

Re: [EXT] How do I extract a value in foreachRDD operation

2018-01-22 Thread Michael Mansour
Toy, I suggest you partition your data according to date, and use the foreachPartition function, using the partition as the bucket location. This would require you to define a custom hash partitioner function, but that is not too difficult. -- Michael Mansour Data Scientist Symantec From: Toy
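
A minimal sketch of the date-based partitioning idea, in plain Python so it runs without a cluster. The names `date_partition`, `bucket_by_date`, and `write_bucket`, as well as the (date, value) record layout, are assumptions made for illustration and do not come from the original thread. The local grouping only mimics what Spark would do when the same function is handed to `RDD.partitionBy`.

```python
from collections import defaultdict
from datetime import date


def date_partition(key: date, num_partitions: int) -> int:
    """Map a date key to a partition index (hypothetical helper).

    date.toordinal() is a stable integer, so the same date always
    maps to the same partition, unlike a salted string hash.
    """
    return key.toordinal() % num_partitions


def bucket_by_date(records, num_partitions):
    """Group (date, value) pairs locally the way the partitioner would."""
    buckets = defaultdict(list)
    for day, value in records:
        buckets[date_partition(day, num_partitions)].append((day, value))
    return dict(buckets)


# Assumed record layout: (date, payload) pairs.
records = [
    (date(2018, 1, 22), "a"),
    (date(2018, 1, 23), "b"),
    (date(2018, 1, 22), "c"),
]
buckets = bucket_by_date(records, num_partitions=7)

# With PySpark, the equivalent wiring would look roughly like this
# (write_bucket is a hypothetical function that receives an iterator
# over one partition's records):
#
#   pairs = sc.parallelize(records)
#   pairs.partitionBy(7, lambda d: d.toordinal()).foreachPartition(write_bucket)
```

Note that with a modulo scheme two different dates can share a partition once there are more distinct dates than partitions, so the per-partition writer may still need to check the date on each record.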

[PySpark] - Broadcast Variable Pickle Registry Usage?

2017-05-24 Thread Michael Mansour (CS)
Hi all, I’m poking around the Pyspark.Broadcast module, and I notice that one can pass in a `pickle_registry` and a `path`. The documentation does not outline the pickle registry use and I’m curious about how to use it, and if there are any advantages to it. Thanks, Michael Mansour

Re: [EXT] Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Michael Mansour (CS)
expression” tool, and pass them through the function in the expression evaluator. Hope this helps -- Michael Mansour Data Scientist Symantec Cloud Security From: Pavel Klemenkov <pklemen...@gmail.com> Date: Wednesday, May 10, 2017 at 10:43 AM To: "user@spark.apache