Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-19 Thread Grant Ingersoll
Never mind on this, I read some emails out of context and now realize  
this has been addressed.


On Mar 19, 2009, at 6:57 AM, Grant Ingersoll (JIRA) wrote:



   [ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683426#action_12683426 ]


Grant Ingersoll commented on MAHOUT-99:
---

For the record, I ran Canopy independently, and that worked just fine.






Improving speed of KMeans
-

   Key: MAHOUT-99
   URL: https://issues.apache.org/jira/browse/MAHOUT-99
   Project: Mahout
Issue Type: Improvement
Components: Clustering
  Reporter: Pallavi Palleti
  Assignee: Grant Ingersoll
   Fix For: 0.1

   Attachments: MAHOUT-99-1.patch, Mahout-99.patch,  
MAHOUT-99.patch



Improved the speed of KMeans by passing only the cluster ID from mapper
to reducer. Previously, the whole cluster info was sent as a formatted
string.
Also removed the implicit assumption that the Combiner runs exactly once;
the code is modified accordingly so that it does not create a bug when
the combiner runs zero times or more than once.
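
One way to read the combiner remark in the description: if each value carries a partial (count, coordinate sum) pair and both combiner and reducer merge such pairs by plain addition, the final centroid does not depend on how often the combiner ran. The following is a minimal plain-Java sketch of that idea under those assumptions. It is not the code from the patch, it uses no Hadoop API, and the names `PartialSum`, `ofPoint`, `merge` and `centroid` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; not taken from the MAHOUT-99 patch. */
public class PartialSum {
    final long count;    // number of points folded into this partial
    final double[] sum;  // coordinate-wise sum of those points

    PartialSum(long count, double[] sum) {
        this.count = count;
        this.sum = sum;
    }

    /** What a mapper could emit for a single point. */
    static PartialSum ofPoint(double[] point) {
        return new PartialSum(1, point.clone());
    }

    /** Associative, commutative merge: safe to apply zero, one or many times. */
    static PartialSum merge(List<PartialSum> parts) {
        long n = 0;
        double[] total = new double[parts.get(0).sum.length];
        for (PartialSum p : parts) {
            n += p.count;
            for (int i = 0; i < total.length; i++) {
                total[i] += p.sum[i];
            }
        }
        return new PartialSum(n, total);
    }

    /** Reducer-side step: divide the merged sum by the merged count. */
    double[] centroid() {
        double[] c = new double[sum.length];
        for (int i = 0; i < sum.length; i++) {
            c[i] = sum[i] / count;
        }
        return c;
    }

    public static void main(String[] args) {
        PartialSum a = ofPoint(new double[] {1.0, 2.0});
        PartialSum b = ofPoint(new double[] {3.0, 4.0});
        PartialSum c = ofPoint(new double[] {5.0, 6.0});

        // Combiner never ran: the reducer merges the three raw partials.
        double[] direct = merge(Arrays.asList(a, b, c)).centroid();

        // Combiner ran twice over part of the data before the reducer.
        PartialSum once = merge(Arrays.asList(a, b));
        PartialSum twice = merge(Arrays.asList(once));
        double[] combined = merge(Arrays.asList(twice, c)).centroid();

        System.out.println(Arrays.toString(direct));
        System.out.println(Arrays.toString(combined));
    }
}
```

Both printed centroids are the same. A design that averaged inside the combiner instead of carrying the count along would give different results depending on how many times the combiner ran.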


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





RE: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Palleti, Pallavi
It depends on the kind of output. If we are outputting only numeric values,
a SequenceFile is preferable because the data is written as binary. Otherwise,
plain text is preferable: a text file is human-readable, whereas a binary
file is not.

Since we treat the data as text in the reducers of both Canopy and KMeans, I
don't see any performance improvement from using SequenceFile. So I used
TextInputFormat, which is easier to read.
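
To make the binary-versus-text trade-off concrete, here is a small plain-Java sketch that compares how many bytes the same numeric values take as fixed-width binary doubles and as a line of text. It only illustrates the raw encoding: a real SequenceFile adds its own header, record framing and sync markers, which this sketch ignores. The class and method names (`EncodingSize`, `binarySize`, `textSize`) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of raw encoding sizes; no Hadoop API involved. */
public class EncodingSize {

    /** Bytes used when each value is written as an 8-byte binary double. */
    static int binarySize(double[] values) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            for (double v : values) {
                out.writeDouble(v);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return buf.size();
    }

    /** Bytes used when the values are written as one comma-separated text line. */
    static int textSize(double[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values[i]);
        }
        sb.append('\n');
        return sb.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        double[] centre = {0.123456789012, 31.415926535897, 2.718281828459};
        System.out.println("binary bytes: " + binarySize(centre));
        System.out.println("text bytes:   " + textSize(centre));
    }
}
```

Binary size is always 8 bytes per value, while text size depends on how many digits each value prints with, so short values such as `1.0` can be smaller as text and long fractions are larger.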
 
Thanks
Pallavi

-Original Message-
From: Jeff Eastman [mailto:j...@windwardsolutions.com] 
Sent: Thursday, March 19, 2009 10:19 AM
To: mahout-dev@lucene.apache.org
Subject: Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

Also why not consider just converting canopy? Which reader is better?


Jeff Eastman wrote:
> * PGP Signed: 03/18/09 at 21:37:36
>
> Sure, why don't you go ahead and post a patch?
>
>
> Pallavi Palleti (JIRA) wrote:
>> [ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683312#action_12683312 ]
>> Pallavi Palleti commented on MAHOUT-99:
>> ---
>>
>> I used KeyValueLineRecordReader internally for my code and forgot to
>> revert to SequenceFileReader. Would it be sufficient to add another
>> patch on the latest code that modifies only KMeansDriver to use
>> SequenceFileReader? Kindly let me know.
>>
>> Thanks
>> Pallavi
>>
>>  
>>
>>   
>
>
> * Jeff Eastman 
> * 0x6BFF1277
>
> .
>



RE: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Palleti, Pallavi
There is a test case in TestKMeansClustering.java that uses the output of
Canopy as input, and it succeeded without any issue. However, it uses the
local file system rather than HDFS, which might be why it succeeded.

Thanks
Pallavi



-Original Message-
From: Jeff Eastman [mailto:j...@windwardsolutions.com] 
Sent: Thursday, March 19, 2009 10:14 AM
To: mahout-dev@lucene.apache.org
Subject: Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

The unit tests don't care which format is used as long as it is consistent. The
compiler helps enforce that. kMeans will run and its tests will pass. So will
Canopy. When somebody runs the kMeans example, it encounters the file format
differences. Are all the examples run by the install? I'd be surprised.

Jeff


Palleti, Pallavi wrote:
> Yeah. But, I am wondering how the testcases succeeded? I ran them using "mvn 
> clean install" command.
>
> Thanks
> Pallavi
>
> -Original Message-
> From: Jeff Eastman [mailto:j...@windwardsolutions.com]
> Sent: Thursday, March 19, 2009 9:56 AM
> To: mahout-dev@lucene.apache.org
> Subject: Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans
>
> The Synthetic Control kMeans job calls the Canopy job to build its initial 
> clusters as is commonly done. If the kMeans record format was changed and the 
> Canopy not changed accordingly, then everything would still compile but there 
> would be a mismatch when the kMeans mapper tried to read in the clusters.
>
> Jeff
>
>
> Richard Tomsett (JIRA) wrote:
>   
>> [ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683252#action_12683252 ]
>>
>> Richard Tomsett commented on MAHOUT-99:
>> ---
>>
>> Yup, I just downloaded the latest trunk, ran it with Hadoop 0.19.1, and I get
>> the same error on the Synthetic Control example. It seems to be because the
>> new KMeans code uses a KeyValueLineRecordReader object to read the input
>> cluster centres from the canopy clustering output, but the canopy clustering
>> job outputs a SequenceFile (and the old KMeans code read in a SequenceFile
>> for the cluster centres). I think that's the problem at least; I'll have a
>> quick play.
>>
>>   
>> 
>>   
>> 
>
>   



Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman

Also why not consider just converting canopy? Which reader is better?


Jeff Eastman wrote:

* PGP Signed: 03/18/09 at 21:37:36

Sure, why don't you go ahead and post a patch?


Pallavi Palleti (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683312#action_12683312 ]

Pallavi Palleti commented on MAHOUT-99:
---

I used KeyValueLineRecordReader internally for my code and forgot to
revert to SequenceFileReader. Would it be sufficient to add another
patch on the latest code that modifies only KMeansDriver to use
SequenceFileReader? Kindly let me know.


Thanks
Pallavi

 




  



* Jeff Eastman 
* 0x6BFF1277

.







Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman
The unit tests don't care which format is used as long as it is
consistent. The compiler helps enforce that. kMeans will run and its
tests will pass. So will Canopy. When somebody runs the kMeans example,
it encounters the file format differences. Are all the examples run by
the install? I'd be surprised.


Jeff


Palleti, Pallavi wrote:

Yeah. But, I am wondering how the testcases succeeded? I ran them using "mvn clean 
install" command.

Thanks
Pallavi

-Original Message-
From: Jeff Eastman [mailto:j...@windwardsolutions.com] 
Sent: Thursday, March 19, 2009 9:56 AM

To: mahout-dev@lucene.apache.org
Subject: Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

The Synthetic Control kMeans job calls the Canopy job to build its initial 
clusters as is commonly done. If the kMeans record format was changed and the 
Canopy not changed accordingly, then everything would still compile but there 
would be a mismatch when the kMeans mapper tried to read in the clusters.

Jeff


Richard Tomsett (JIRA) wrote:
  
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683252#action_12683252 ]

Richard Tomsett commented on MAHOUT-99:
---

Yup, I just downloaded the latest trunk, ran it with Hadoop 0.19.1, and I get
the same error on the Synthetic Control example. It seems to be because the new
KMeans code uses a KeyValueLineRecordReader object to read the input cluster
centres from the canopy clustering output, but the canopy clustering job
outputs a SequenceFile (and the old KMeans code read in a SequenceFile for the
cluster centres). I think that's the problem at least; I'll have a quick play.

  



  
  



  






Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman

Sure, why don't you go ahead and post a patch?


Pallavi Palleti (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683312#action_12683312 ] 


Pallavi Palleti commented on MAHOUT-99:
---

I used KeyValueLineRecordReader internally for my code and forgot to
revert to SequenceFileReader. Would it be sufficient to add another patch
on the latest code that modifies only KMeansDriver to use SequenceFileReader?
Kindly let me know.

Thanks
Pallavi

  




  






RE: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Palleti, Pallavi
Yeah. But, I am wondering how the testcases succeeded? I ran them using "mvn 
clean install" command.

Thanks
Pallavi

-Original Message-
From: Jeff Eastman [mailto:j...@windwardsolutions.com] 
Sent: Thursday, March 19, 2009 9:56 AM
To: mahout-dev@lucene.apache.org
Subject: Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

The Synthetic Control kMeans job calls the Canopy job to build its initial 
clusters as is commonly done. If the kMeans record format was changed and the 
Canopy not changed accordingly, then everything would still compile but there 
would be a mismatch when the kMeans mapper tried to read in the clusters.

Jeff


Richard Tomsett (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683252#action_12683252 ]
>
> Richard Tomsett commented on MAHOUT-99:
> ---
>
> Yup, I just downloaded the latest trunk, ran it with Hadoop 0.19.1, and I get
> the same error on the Synthetic Control example. It seems to be because the
> new KMeans code uses a KeyValueLineRecordReader object to read the input
> cluster centres from the canopy clustering output, but the canopy clustering
> job outputs a SequenceFile (and the old KMeans code read in a SequenceFile
> for the cluster centres). I think that's the problem at least; I'll have a
> quick play.
>
>   
>> 
>
>   



Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman

Are the examples run automatically in the build?

Pallavi Palleti (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683297#action_12683297 ] 


Pallavi Palleti commented on MAHOUT-99:
---

Yup. That must be the issue. But I am wondering how the test case succeeded?

  




  






Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman
The Synthetic Control kMeans job calls the Canopy job to build its 
initial clusters as is commonly done. If the kMeans record format was 
changed and the Canopy not changed accordingly, then everything would 
still compile but there would be a mismatch when the kMeans mapper tried 
to read in the clusters.
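
A quick way to see which side of such a mismatch a given file is on: Hadoop SequenceFiles start with the three bytes `SEQ` followed by a version byte, while a file meant for a key/value line reader is just text. The following plain-Java sketch checks the leading bytes of a file's content. It is a hypothetical diagnostic helper, not part of Mahout or Hadoop, and the name `FormatSniffer` and the sample cluster line are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical diagnostic; not part of Mahout or Hadoop. */
public class FormatSniffer {

    /** True if the given leading bytes carry the SequenceFile magic "SEQ". */
    static boolean looksLikeSequenceFile(byte[] head) {
        return head.length >= 3
                && head[0] == 'S'
                && head[1] == 'E'
                && head[2] == 'Q';
    }

    public static void main(String[] args) {
        // Made-up text record: key, tab, value, as a line reader would expect.
        byte[] textLine = "C0\t[1.0, 2.0]\n".getBytes(StandardCharsets.UTF_8);

        // Made-up start of a binary file: magic bytes plus a version byte
        // (the version value here is illustrative only).
        byte[] binaryHead = {'S', 'E', 'Q', 6, 0, 0};

        System.out.println(looksLikeSequenceFile(textLine));
        System.out.println(looksLikeSequenceFile(binaryHead));
    }
}
```

In practice the first few bytes would be read from the clusters file that the canopy job wrote, before handing it to the kMeans job.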


Jeff


Richard Tomsett (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683252#action_12683252 ] 


Richard Tomsett commented on MAHOUT-99:
---

Yup, I just downloaded the latest trunk, ran it with Hadoop 0.19.1, and I get
the same error on the Synthetic Control example. It seems to be because the new
KMeans code uses a KeyValueLineRecordReader object to read the input cluster
centres from the canopy clustering output, but the canopy clustering job
outputs a SequenceFile (and the old KMeans code read in a SequenceFile for the
cluster centres). I think that's the problem at least; I'll have a quick play.

  




  






Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Grant Ingersoll

On my Mac, I have:
$ echo $JAVA_HOME
/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home

-Grant

On Mar 18, 2009, at 2:10 PM, Jeff Eastman wrote:

I'm running the example in Eclipse using the stand-alone mode in the
hadoop-0.19.1 jar file. It works fine, as does the hadoop compile in
Eclipse. I cannot, however, get any hadoop stuff to work from the
command line. Even though my JAVA_HOME environment is set to
/Library/Java/Home and java -version yields:


Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed mode)

... the hadoop build script and the start-all.sh commands all  
complain about class version errors. Can any other Mac users help me  
out?


Jeff


Grant Ingersoll (JIRA) wrote:
   [ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683077#action_12683077 ]

Grant Ingersoll commented on MAHOUT-99:
---

Yeah, what version of Hadoop are you running?  I got it w/ 0.19.1,  
but maybe I didn't set something up right.


{code}
bin/hadoop jar \
  ~/projects/lucene/mahout/mahout-clean/examples/target/mahout-examples-0.2-SNAPSHOT.job \
  org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
{code}














Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman
I'm running the example in Eclipse using the stand-alone mode in the
hadoop-0.19.1 jar file. It works fine, as does the hadoop compile in
Eclipse. I cannot, however, get any hadoop stuff to work from the
command line. Even though my JAVA_HOME environment is set to
/Library/Java/Home and java -version yields:


Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed mode)

... the hadoop build script and the start-all.sh commands all complain 
about class version errors. Can any other Mac users help me out?
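
Class version errors usually mean that some class files were compiled for a newer Java release than the JVM that is actually loading them. One way to narrow this down is to compare the major version stored in a class file with what the running JVM reports. The following plain-Java sketch reads the major version from the first eight bytes of a class file (magic `0xCAFEBABE`, then minor and major version; major 50 corresponds to Java SE 6, 49 to Java 5). The class name `ClassVersion` is hypothetical and the header bytes in `main` are hand-written for illustration, not taken from a Hadoop jar.

```java
import java.nio.ByteBuffer;

/** Hypothetical diagnostic for class file versions. */
public class ClassVersion {

    /** Returns the major version from the leading bytes of a class file. */
    static int majorVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.remaining() < 8 || buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                // minor version, unused here
        return buf.getShort() & 0xFFFF; // major version as unsigned 16-bit
    }

    public static void main(String[] args) {
        // Hand-written header: magic, minor 0, major 50.
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 50
        };
        System.out.println("class file major version: " + majorVersion(header));

        // What the JVM running this program reports about itself.
        System.out.println("java.version:       " + System.getProperty("java.version"));
        System.out.println("java.class.version: " + System.getProperty("java.class.version"));
    }
}
```

Running this with the same `java` binary that the hadoop scripts pick up shows whether that JVM is the one `java -version` reported in the interactive shell.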


Jeff


Grant Ingersoll (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683077#action_12683077 ] 


Grant Ingersoll commented on MAHOUT-99:
---

Yeah, what version of Hadoop are you running?  I got it w/ 0.19.1, but maybe I 
didn't set something up right.

{code}
bin/hadoop jar \
  ~/projects/lucene/mahout/mahout-clean/examples/target/mahout-examples-0.2-SNAPSHOT.job \
  org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
{code}

  




  






RE: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2008-12-07 Thread Uppuluri, Rohini
Hi Grant, 

I am Rohini, and I work on the same team as Pallavi. Pallavi is out of the
office until the end of this month, so I will be taking care of this issue
now.

I will look into the issue you have pointed out and get back to you. 

Thanks, 
-Rohini


-Original Message-
From: Grant Ingersoll (JIRA) [mailto:[EMAIL PROTECTED] 
Sent: Sunday, December 07, 2008 7:32 AM
To: mahout-dev@lucene.apache.org
Subject: [jira] Commented: (MAHOUT-99) Improving speed of KMeans


[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654168#action_12654168 ]

Grant Ingersoll commented on MAHOUT-99:
---

Hi Pallavi,

The core code works, but the change to the KMeansDriver causes a compile
error in the KMeans demo code in examples because it now asks for the number
of map tasks and the number of centroids.  Could you document these new
parameters, put in reasonable defaults, and update the patch?

One thing I'm not certain of, though, is why we need to pass in the
number of map tasks. Isn't that already a config setting when you set up
Hadoop?

> Improving speed of KMeans
> -
>
> Key: MAHOUT-99
> URL: https://issues.apache.org/jira/browse/MAHOUT-99
> Project: Mahout
>  Issue Type: Improvement
>  Components: Clustering
>Reporter: Pallavi Palleti
>Assignee: Grant Ingersoll
> Attachments: MAHOUT-99.patch
>
>
> Improved the speed of KMeans by passing only the cluster ID from mapper to
> reducer. Previously, the whole cluster info was sent as a formatted string.
> Also removed the implicit assumption that the Combiner runs exactly once;
> the code is modified accordingly so that it does not create a bug when the
> combiner runs zero times or more than once.
