RE: Cannot read reducer values into a list

2008-08-20 Thread Deepika Khera
Thanks...this works beautifully :) !

Deepika

-----Original Message-----
From: Owen O'Malley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 20, 2008 7:52 AM
To: core-user@hadoop.apache.org
Subject: Re: Cannot read reducer values into a list


On Aug 19, 2008, at 4:57 PM, Deepika Khera wrote:

> Thanks for the clarification on this.
>
> So, it seems like cloning the object before adding to the list is the
> only solution for this problem. Is that right?

Yes. You can use WritableUtils.clone to do the job.

-- Owen


Re: Cannot read reducer values into a list

2008-08-20 Thread Owen O'Malley


On Aug 19, 2008, at 4:57 PM, Deepika Khera wrote:

> Thanks for the clarification on this.
>
> So, it seems like cloning the object before adding to the list is the
> only solution for this problem. Is that right?

Yes. You can use WritableUtils.clone to do the job.

-- Owen


RE: Cannot read reducer values into a list

2008-08-19 Thread Deepika Khera
Thanks for the clarification on this.

So, it seems like cloning the object before adding to the list is the
only solution for this problem. Is that right?

Deepika

-----Original Message-----
From: Arun C Murthy [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 19, 2008 4:49 PM
To: core-user@hadoop.apache.org
Subject: Re: Cannot read reducer values into a list


On Aug 19, 2008, at 3:58 PM, Deepika Khera wrote:

> Hi,
>
> Are we sure that this issue was fixed in 0.17.0 (or do we need to
> patch)? I am using this version and I still see the issue.
>

Sorry if it wasn't clear: HADOOP-2399 was an 'enhancement' and hence  
applications henceforth need to be aware that the framework will reuse  
the key/value objects.

So, your application will have to be aware of this from 0.17 onwards  
and work around this...

Arun

>
> Thanks,
> Deepika
>
> -----Original Message-----
> From: Arun C Murthy [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 19, 2008 1:04 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Cannot read reducer values into a list
>
>
> On Aug 19, 2008, at 12:17 PM, Stuart Sierra wrote:
>
>> Hello list,
>> Thought I would share this tidbit that frustrated me for a couple of
>> hours.  Beware!  Hadoop reuses the Writable objects given to the
>> reducer.  For example:
>>
>
> Yes.
> http://issues.apache.org/jira/browse/HADOOP-2399 - fixed in 0.17.0.
>
> Arun
>
>>   public void reduce(K key, Iterator values,
>>                      OutputCollector output,
>>                      Reporter reporter)
>>       throws IOException {
>>
>>     List valueList = new ArrayList();
>>     while (values.hasNext()) {
>>       valueList.add(values.next());
>>     }
>>
>>     // Say there were 10 values.  valueList now contains 10
>>     // pointers to the same object.
>>   }
>>
>> I assume this is done for efficiency, but a warning in the Reducer
>> documentation would be nice.
>>
>> -Stuart
>



Re: Cannot read reducer values into a list

2008-08-19 Thread Arun C Murthy


On Aug 19, 2008, at 3:58 PM, Deepika Khera wrote:


> Hi,
>
> Are we sure that this issue was fixed in 0.17.0 (or do we need to
> patch)? I am using this version and I still see the issue.



Sorry if it wasn't clear: HADOOP-2399 was an 'enhancement' and hence  
applications henceforth need to be aware that the framework will reuse  
the key/value objects.


So, your application will have to be aware of this from 0.17 onwards  
and work around this...


Arun
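
Being "aware" of the reuse matters only for code that holds on to a key or value after advancing the iterator. Code that reads what it needs from each value before the next call to next() is not affected. A minimal sketch of that case, written without Hadoop: Counter and reusingIterator are hypothetical stand-ins for a mutable value type and for the framework's iterator, not Hadoop classes.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a mutable Hadoop value type.
class Counter {
    long count;
}

class StreamingSum {
    // Imitates the framework's reuse: one Counter is refilled on each next().
    static Iterator<Counter> reusingIterator(final long[] data) {
        return new Iterator<Counter>() {
            private final Counter shared = new Counter();
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < data.length;
            }

            @Override
            public Counter next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.count = data[pos++];
                return shared;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    // Reads the payload out of each value before advancing; no reference
    // to the shared object is kept, so reuse is harmless here.
    static long sum(Iterator<Counter> values) {
        long total = 0;
        while (values.hasNext()) {
            total += values.next().count;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(reusingIterator(new long[] {1, 2, 3, 4}))); // prints 10
    }
}
```

Only when the values themselves have to outlive the loop, as in the valueList example quoted below, does a copy become necessary.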



> Thanks,
> Deepika
>
> -----Original Message-----
> From: Arun C Murthy [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 19, 2008 1:04 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Cannot read reducer values into a list
>
>
> On Aug 19, 2008, at 12:17 PM, Stuart Sierra wrote:
>
>> Hello list,
>> Thought I would share this tidbit that frustrated me for a couple of
>> hours.  Beware!  Hadoop reuses the Writable objects given to the
>> reducer.  For example:
>>
>
> Yes.
> http://issues.apache.org/jira/browse/HADOOP-2399 - fixed in 0.17.0.
>
> Arun
>
>>   public void reduce(K key, Iterator values,
>>                      OutputCollector output,
>>                      Reporter reporter)
>>       throws IOException {
>>
>>     List valueList = new ArrayList();
>>     while (values.hasNext()) {
>>       valueList.add(values.next());
>>     }
>>
>>     // Say there were 10 values.  valueList now contains 10
>>     // pointers to the same object.
>>   }
>>
>> I assume this is done for efficiency, but a warning in the Reducer
>> documentation would be nice.
>>
>> -Stuart






RE: Cannot read reducer values into a list

2008-08-19 Thread Deepika Khera
Hi,

Are we sure that this issue was fixed in 0.17.0 (or do we need to patch)?
I am using this version and I still see the issue.


Thanks,
Deepika 

-----Original Message-----
From: Arun C Murthy [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 19, 2008 1:04 PM
To: core-user@hadoop.apache.org
Subject: Re: Cannot read reducer values into a list


On Aug 19, 2008, at 12:17 PM, Stuart Sierra wrote:

> Hello list,
> Thought I would share this tidbit that frustrated me for a couple of
> hours.  Beware!  Hadoop reuses the Writable objects given to the
> reducer.  For example:
>

Yes.
http://issues.apache.org/jira/browse/HADOOP-2399 - fixed in 0.17.0.

Arun

>   public void reduce(K key, Iterator values,
>                      OutputCollector output,
>                      Reporter reporter)
>       throws IOException {
>
>     List valueList = new ArrayList();
>     while (values.hasNext()) {
>       valueList.add(values.next());
>     }
>
>     // Say there were 10 values.  valueList now contains 10
>     // pointers to the same object.
>   }
>
> I assume this is done for efficiency, but a warning in the Reducer
> documentation would be nice.
>
> -Stuart



Re: Cannot read reducer values into a list

2008-08-19 Thread Owen O'Malley


On Aug 19, 2008, at 12:27 PM, Albert Chern wrote:

> Thanks for the heads up.  Does anyone know when this change was
> introduced? I am certain that old versions of Hadoop create a new
> instance of the key and value each time a data pair is read.


It was HADOOP-2399. Previously the inner loop had to allocate a new
key and value for each call to reduce. They are now reused, which does
mean that if you need to keep them around, they must be cloned via
WritableUtils.clone. Note that inputs to the map are also reused...


-- Owen
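
The reuse described above can be reproduced without Hadoop. What follows is a minimal sketch, not Hadoop code: IntBox is a hypothetical stand-in for a Writable, reusingIterator imitates a framework that refills one instance on every call to next(), and cloneBox copies through a serialization round trip. That last choice is an assumption about how such a copy can be made, not a description of what WritableUtils.clone does internally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a Writable: a mutable box that can write its
// state to a stream and overwrite its state from a stream.
class IntBox {
    int value;

    void write(DataOutputStream out) throws IOException {
        out.writeInt(value);
    }

    void readFields(DataInputStream in) throws IOException {
        value = in.readInt();
    }
}

class ReuseDemo {
    // Imitates the framework: the same IntBox is refilled on every next().
    static Iterator<IntBox> reusingIterator(final int[] data) {
        return new Iterator<IntBox>() {
            private final IntBox shared = new IntBox();
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < data.length;
            }

            @Override
            public IntBox next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.value = data[pos++];
                return shared;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    // Deep copy: serialize the original, then read the bytes into a fresh instance.
    static IntBox cloneBox(IntBox orig) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            orig.write(new DataOutputStream(bytes));
            IntBox copy = new IntBox();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new IllegalStateException("in-memory copy failed", e);
        }
    }

    // Collects the values into a list, optionally cloning each one first,
    // and reports what the list holds once iteration has finished.
    static List<Integer> collect(int[] data, boolean cloneFirst) {
        List<IntBox> kept = new ArrayList<IntBox>();
        Iterator<IntBox> values = reusingIterator(data);
        while (values.hasNext()) {
            IntBox v = values.next();
            kept.add(cloneFirst ? cloneBox(v) : v);
        }
        List<Integer> seen = new ArrayList<Integer>();
        for (IntBox box : kept) {
            seen.add(box.value);
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(collect(data, false)); // prints [30, 30, 30]
        System.out.println(collect(data, true));  // prints [10, 20, 30]
    }
}
```

In an actual reducer the copy would presumably be made with WritableUtils.clone(value, conf), where conf is the job configuration, for example one saved in configure().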


Re: Cannot read reducer values into a list

2008-08-19 Thread Arun C Murthy


On Aug 19, 2008, at 12:17 PM, Stuart Sierra wrote:


> Hello list,
> Thought I would share this tidbit that frustrated me for a couple of
> hours.  Beware!  Hadoop reuses the Writable objects given to the
> reducer.  For example:



Yes.
http://issues.apache.org/jira/browse/HADOOP-2399 - fixed in 0.17.0.

Arun


>   public void reduce(K key, Iterator values,
>                      OutputCollector output,
>                      Reporter reporter)
>       throws IOException {
>
>     List valueList = new ArrayList();
>     while (values.hasNext()) {
>       valueList.add(values.next());
>     }
>
>     // Say there were 10 values.  valueList now contains 10
>     // pointers to the same object.
>   }
>
> I assume this is done for efficiency, but a warning in the Reducer
> documentation would be nice.
>
> -Stuart




Re: Cannot read reducer values into a list

2008-08-19 Thread Albert Chern
Thanks for the heads up.  Does anyone know when this change was introduced?
I am certain that old versions of Hadoop create a new instance of the key
and value each time a data pair is read.

On Tue, Aug 19, 2008 at 12:17 PM, Stuart Sierra <[EMAIL PROTECTED]> wrote:

> Hello list,
> Thought I would share this tidbit that frustrated me for a couple of
> hours.  Beware!  Hadoop reuses the Writable objects given to the
> reducer.  For example:
>
>   public void reduce(K key, Iterator values,
>                      OutputCollector output,
>                      Reporter reporter)
>       throws IOException {
>
>     List valueList = new ArrayList();
>     while (values.hasNext()) {
>       valueList.add(values.next());
>     }
>
>     // Say there were 10 values.  valueList now contains 10
>     // pointers to the same object.
>   }
>
> I assume this is done for efficiency, but a warning in the Reducer
> documentation would be nice.
>
> -Stuart
>