Hi, could you check whether the issue also occurs in v1.6.1 and v2.0?
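For reference, a minimal sketch of the setup described in the report below (`MyObj`, the input RDDs, and the app name are placeholders I assumed, not the reporter's actual code):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-in for the reporter's value type.
case class MyObj(count: Long)

// Configuration options mentioned in the report.
val conf = new SparkConf()
  .setAppName("reduceByKey-repro") // assumed name
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.unsafe.offHeap", "true")
  .set("spark.reducer.maxSizeInFlight", "128m")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)

// Join + map yields an RDD with no partitioner, as in the report.
val left   = sc.parallelize(Seq(("a", 1L), ("b", 2L)))
val right  = sc.parallelize(Seq(("a", 10L), ("b", 20L)))
val joined = left.join(right).map { case (k, (x, y)) => (k, MyObj(x + y)) }

// reduceByKey without an explicit partitioner or partition count.
val reduced = joined.reduceByKey((a, b) => MyObj(a.count + b.count))
reduced.collect().foreach(println)
```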
// maropu

On Wed, Jun 22, 2016 at 2:42 PM, Nirav Patel <npa...@xactlycorp.com> wrote:
> I have an RDD[String, MyObj] which is the result of a join + map operation.
> It has no partitioner info. I run reduceByKey without passing any
> partitioner or partition count. I have observed that the aggregation
> result for a given key is sometimes incorrect, roughly 1 out of 5 runs.
> It looks like the reduce operation is combining values from two different
> keys. There is no configuration change between runs. I am scratching my
> head over this. I verified the results by printing out the RDD before and
> after the reduce operation, collecting a subset at the driver.
>
> Besides the shuffle and storage memory fractions, I use the following
> options:
>
> sparkConf.set("spark.driver.userClassPathFirst","true")
> sparkConf.set("spark.unsafe.offHeap","true")
> sparkConf.set("spark.reducer.maxSizeInFlight","128m")
> sparkConf.set("spark.serializer",
> "org.apache.spark.serializer.KryoSerializer")

--
---
Takeshi Yamamuro