Re: Spark 1.5.2 - Different results from reduceByKey over multiple iterations

2016-06-23 Thread Nirav Patel
So it was indeed my merge function. I created a new result object for every
merge, and it's working now.
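For reference, a minimal sketch of the fix described above. `MyObj` here is a hypothetical stand-in (the real class is not shown in the thread); the point is that the merge function builds a brand-new result object instead of mutating and returning either input:

```scala
// Hypothetical stand-in for MyObj; the real class is not shown in the thread.
case class MyObj(count: Long, sum: Double)

// Safe merge for reduceByKey: allocate a fresh result on every call,
// so neither input object is mutated or aliased into the output.
def merge(a: MyObj, b: MyObj): MyObj =
  MyObj(a.count + b.count, a.sum + b.sum)

// Usage: rdd.reduceByKey(merge)
```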

Thanks

On Wed, Jun 22, 2016 at 3:46 PM, Nirav Patel  wrote:

> PS: In my reduceByKey operation I have two mutable objects. What I do is
> merge mutable2 into mutable1 and return mutable1. I read that this works for
> aggregateByKey, so I thought it would work for reduceByKey as well. I might be
> wrong here. Can someone verify whether this will work or be unpredictable?
>
> On Wed, Jun 22, 2016 at 11:52 AM, Nirav Patel 
> wrote:
>
>> Hi,
>>
>> I do not see any indication of errors or executors getting killed in the
>> Spark UI (jobs, stages, event timelines). No task failures. I also don't see
>> any errors in the executor logs.
>>
>> Thanks
>>
>> On Wed, Jun 22, 2016 at 2:32 AM, Ted Yu  wrote:
>>
>>> For the run which returned an incorrect result, did you observe any errors
>>> (on the workers)?
>>>
>>> Cheers
>>>
>>> On Tue, Jun 21, 2016 at 10:42 PM, Nirav Patel 
>>> wrote:
>>>
I have an RDD[(String, MyObj)] which is the result of a join + map operation.
It has no partitioner info. I run reduceByKey without passing any partitioner
or partition count. I observed that the aggregated result for a given key is
sometimes incorrect, maybe 1 run out of 5. It looks like the reduce operation
is combining values from two different keys. There is no configuration change
between runs. I am scratching my head over this. I verified the results by
printing out the RDD before and after the reduce operation, collecting a
subset at the driver.

Besides the shuffle and storage memory fractions, I use the following options:

sparkConf.set("spark.driver.userClassPathFirst","true")
sparkConf.set("spark.unsafe.offHeap","true")
sparkConf.set("spark.reducer.maxSizeInFlight","128m")
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")



>>>
>>>
>>>
>>
>



Re: Spark 1.5.2 - Different results from reduceByKey over multiple iterations

2016-06-22 Thread Nirav Patel
PS: In my reduceByKey operation I have two mutable objects. What I do is
merge mutable2 into mutable1 and return mutable1. I read that this works for
aggregateByKey, so I thought it would work for reduceByKey as well. I might be
wrong here. Can someone verify whether this will work or be unpredictable?
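To make the risk concrete, here is a small Spark-free sketch (a hypothetical `Acc` type and plain Scala collections, none of which come from the thread) of why an in-place merge can give different answers across iterations: the first pass mutates one of the input objects, so a second pass over the same data sees corrupted values. This does not settle what reduceByKey's contract allows; it only demonstrates the failure mode:

```scala
// Hypothetical mutable value type standing in for MyObj.
final class Acc(var total: Long)

// In-place merge: mutates and returns its first argument.
def mergeInPlace(a: Acc, b: Acc): Acc = { a.total += b.total; a }
// Pure merge: allocates a fresh result, leaving both inputs untouched.
def mergePure(a: Acc, b: Acc): Acc = new Acc(a.total + b.total)

// Reduce the same three objects twice and report both results.
def runTwice(f: (Acc, Acc) => Acc): (Long, Long) = {
  val values = Seq(new Acc(1), new Acc(2), new Acc(4))
  (values.reduce(f).total, values.reduce(f).total)
}

println(runTwice(mergePure))     // (7,7): stable across passes
println(runTwice(mergeInPlace))  // (7,13): pass 1 corrupted values.head
```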

On Wed, Jun 22, 2016 at 11:52 AM, Nirav Patel  wrote:

> Hi,
>
> I do not see any indication of errors or executors getting killed in the
> Spark UI (jobs, stages, event timelines). No task failures. I also don't see
> any errors in the executor logs.
>
> Thanks
>
> On Wed, Jun 22, 2016 at 2:32 AM, Ted Yu  wrote:
>
>> For the run which returned an incorrect result, did you observe any errors
>> (on the workers)?
>>
>> Cheers
>>
>> On Tue, Jun 21, 2016 at 10:42 PM, Nirav Patel 
>> wrote:
>>
>>> I have an RDD[(String, MyObj)] which is the result of a join + map operation.
>>> It has no partitioner info. I run reduceByKey without passing any partitioner
>>> or partition count. I observed that the aggregated result for a given key is
>>> sometimes incorrect, maybe 1 run out of 5. It looks like the reduce operation
>>> is combining values from two different keys. There is no configuration change
>>> between runs. I am scratching my head over this. I verified the results by
>>> printing out the RDD before and after the reduce operation, collecting a
>>> subset at the driver.
>>>
>>> Besides the shuffle and storage memory fractions, I use the following options:
>>>
>>> sparkConf.set("spark.driver.userClassPathFirst","true")
>>> sparkConf.set("spark.unsafe.offHeap","true")
>>> sparkConf.set("spark.reducer.maxSizeInFlight","128m")
>>> sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>>>
>>>
>>>
>>
>>
>>
>




Re: Spark 1.5.2 - Different results from reduceByKey over multiple iterations

2016-06-22 Thread Nirav Patel
Hi,

I do not see any indication of errors or executors getting killed in the
Spark UI (jobs, stages, event timelines). No task failures. I also don't see
any errors in the executor logs.

Thanks

On Wed, Jun 22, 2016 at 2:32 AM, Ted Yu  wrote:

> For the run which returned an incorrect result, did you observe any errors
> (on the workers)?
>
> Cheers
>
> On Tue, Jun 21, 2016 at 10:42 PM, Nirav Patel 
> wrote:
>
>> I have an RDD[(String, MyObj)] which is the result of a join + map operation.
>> It has no partitioner info. I run reduceByKey without passing any partitioner
>> or partition count. I observed that the aggregated result for a given key is
>> sometimes incorrect, maybe 1 run out of 5. It looks like the reduce operation
>> is combining values from two different keys. There is no configuration change
>> between runs. I am scratching my head over this. I verified the results by
>> printing out the RDD before and after the reduce operation, collecting a
>> subset at the driver.
>>
>> Besides the shuffle and storage memory fractions, I use the following options:
>>
>> sparkConf.set("spark.driver.userClassPathFirst","true")
>> sparkConf.set("spark.unsafe.offHeap","true")
>> sparkConf.set("spark.reducer.maxSizeInFlight","128m")
>> sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>>
>>
>>
>
>
>




Re: Spark 1.5.2 - Different results from reduceByKey over multiple iterations

2016-06-22 Thread Ted Yu
For the run which returned an incorrect result, did you observe any errors (on
the workers)?

Cheers

On Tue, Jun 21, 2016 at 10:42 PM, Nirav Patel  wrote:

> I have an RDD[(String, MyObj)] which is the result of a join + map operation.
> It has no partitioner info. I run reduceByKey without passing any partitioner
> or partition count. I observed that the aggregated result for a given key is
> sometimes incorrect, maybe 1 run out of 5. It looks like the reduce operation
> is combining values from two different keys. There is no configuration change
> between runs. I am scratching my head over this. I verified the results by
> printing out the RDD before and after the reduce operation, collecting a
> subset at the driver.
>
> Besides the shuffle and storage memory fractions, I use the following options:
>
> sparkConf.set("spark.driver.userClassPathFirst","true")
> sparkConf.set("spark.unsafe.offHeap","true")
> sparkConf.set("spark.reducer.maxSizeInFlight","128m")
> sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>
>
>


Re: Spark 1.5.2 - Different results from reduceByKey over multiple iterations

2016-06-22 Thread Takeshi Yamamuro
Hi,

Could you check whether the issue also occurs in v1.6.1 and v2.0?

// maropu

On Wed, Jun 22, 2016 at 2:42 PM, Nirav Patel  wrote:

> I have an RDD[(String, MyObj)] which is the result of a join + map operation.
> It has no partitioner info. I run reduceByKey without passing any partitioner
> or partition count. I observed that the aggregated result for a given key is
> sometimes incorrect, maybe 1 run out of 5. It looks like the reduce operation
> is combining values from two different keys. There is no configuration change
> between runs. I am scratching my head over this. I verified the results by
> printing out the RDD before and after the reduce operation, collecting a
> subset at the driver.
>
> Besides the shuffle and storage memory fractions, I use the following options:
>
> sparkConf.set("spark.driver.userClassPathFirst","true")
> sparkConf.set("spark.unsafe.offHeap","true")
> sparkConf.set("spark.reducer.maxSizeInFlight","128m")
> sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>
>
>




-- 
---
Takeshi Yamamuro


Spark 1.5.2 - Different results from reduceByKey over multiple iterations

2016-06-21 Thread Nirav Patel
I have an RDD[(String, MyObj)] which is the result of a join + map operation.
It has no partitioner info. I run reduceByKey without passing any partitioner
or partition count. I observed that the aggregated result for a given key is
sometimes incorrect, maybe 1 run out of 5. It looks like the reduce operation
is combining values from two different keys. There is no configuration change
between runs. I am scratching my head over this. I verified the results by
printing out the RDD before and after the reduce operation, collecting a
subset at the driver.

Besides the shuffle and storage memory fractions, I use the following options:

sparkConf.set("spark.driver.userClassPathFirst","true")
sparkConf.set("spark.unsafe.offHeap","true")
sparkConf.set("spark.reducer.maxSizeInFlight","128m")
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
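In the spirit of the driver-side verification described above, one way to pin down which side is wrong is to recompute the per-key aggregation with plain Scala collections and compare it against reduceByKey's collected output. This is a sketch with a generic helper (not from the thread):

```scala
// Driver-side cross-check: compute the expected per-key reduction with
// ordinary collections, to compare against rdd.reduceByKey(f).collect().toMap.
def expected[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
  pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).reduce(f) }

// Example on a tiny keyed dataset with integer addition as the reducer:
println(expected(Seq(("a", 1), ("b", 2), ("a", 4)))(_ + _))
```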
