Hi, could you check whether the issue also occurs in v1.6.1 and v2.0?
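For reference, a minimal sketch of the setup described in the report below (`MyObj`, the input RDDs, and the app name are placeholders I assumed, not the reporter's actual code):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-in for the reporter's value type.
case class MyObj(count: Long)

// Configuration options mentioned in the report.
val conf = new SparkConf()
  .setAppName("reduceByKey-repro") // assumed name
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.unsafe.offHeap", "true")
  .set("spark.reducer.maxSizeInFlight", "128m")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)

// Join + map yields an RDD with no partitioner, as in the report.
val left   = sc.parallelize(Seq(("a", 1L), ("b", 2L)))
val right  = sc.parallelize(Seq(("a", 10L), ("b", 20L)))
val joined = left.join(right).map { case (k, (x, y)) => (k, MyObj(x + y)) }

// reduceByKey without an explicit partitioner or partition count.
val reduced = joined.reduceByKey((a, b) => MyObj(a.count + b.count))
reduced.collect().foreach(println)
```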
// maropu

On Wed, Jun 22, 2016 at 2:42 PM, Nirav Patel <npa...@xactlycorp.com> wrote:
> I have an RDD[String, MyObj] which is the result of a join + map operation.
> It has no partitioner info. I run reduceByKey without passing any
> partitioner or partition count. I have observed that the aggregation
> result for a given key is sometimes incorrect, roughly 1 out of 5 runs.
> It looks like the reduce operation is combining values from two different
> keys. There is no configuration change between runs. I am scratching my
> head over this. I verified the results by printing out the RDD before and
> after the reduce operation, collecting a subset at the driver.
>
> Besides the shuffle and storage memory fractions, I use the following
> options:
>
> sparkConf.set("spark.driver.userClassPathFirst","true")
> sparkConf.set("spark.unsafe.offHeap","true")
> sparkConf.set("spark.reducer.maxSizeInFlight","128m")
> sparkConf.set("spark.serializer",
> "org.apache.spark.serializer.KryoSerializer")

--
---
Takeshi Yamamuro