from:"Pallavi Palleti \(JIRA\)"

[jira] Updated: (MAHOUT-153) Implement kmeans++ for initial cluster selection in kmeans

2010-03-19 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-153:
---

Attachment: Mahout-153.patch

Reverted the change that made the lengthSquared instance variable transient. 
Cluster centroids are now compared using AbstractVector.equivalent. Kindly review.

 Implement kmeans++ for initial cluster selection in kmeans
 --

 Key: MAHOUT-153
 URL: https://issues.apache.org/jira/browse/MAHOUT-153
 Project: Mahout
  Issue Type: New Feature
  Components: Clustering
Affects Versions: 0.2
 Environment: OS Independent
Reporter: Panagiotis Papadimitriou
Assignee: Ted Dunning
 Fix For: 0.4

 Attachments: Mahout-153.patch, Mahout-153.patch, Mahout-153.patch, 
 MAHOUT-153_RandomFarthest.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 The current implementation of k-means includes the following algorithms for 
 initial cluster selection (seed selection): 1) random selection of k points, 
 2) use of canopy clusters.
 I plan to implement k-means++. The details of the algorithm are available 
 here: http://www.stanford.edu/~darthur/kMeansPlusPlus.pdf.
 Design Outline: I will create an abstract class SeedGenerator and a subclass 
 KMeansPlusPlusSeedGenerator. The existing class RandomSeedGenerator will 
 become a subclass of SeedGenerator.
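
For readers unfamiliar with the seeding step named above, here is a minimal 
single-machine sketch of k-means++ (D^2-weighted sampling). It is written in 
Python for brevity and is not the code from any attached patch. The names 
kmeans_plus_plus_seeds and squared_distance are hypothetical, and squared 
Euclidean distance is an assumption.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_plus_plus_seeds(points, k, rng=None):
    """Pick k initial centers: the first uniformly at random, each later one
    with probability proportional to its squared distance to the nearest
    center chosen so far."""
    rng = rng or random.Random()
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [points[rng.randrange(len(points))]]
    # d2[i] is the squared distance from points[i] to its nearest center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        total = sum(d2)
        if total == 0.0:
            # Every point coincides with a chosen center: fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            threshold = rng.random() * total
            cumulative = 0.0
            idx = None
            for i, w in enumerate(d2):
                if w > 0.0:
                    idx = i  # last positive-weight point doubles as a rounding fallback
                    cumulative += w
                    if cumulative > threshold:
                        break
        centers.append(points[idx])
        d2 = [min(old, squared_distance(p, points[idx]))
              for old, p in zip(d2, points)]
    return centers
```

Points that are already centers have weight zero, so they are not picked 
again while any other point remains at a positive distance.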

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (MAHOUT-153) Implement kmeans++ for initial cluster selection in kmeans

2010-03-18 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-153:
---

Attachment: Mahout-153.patch

Kindly find the updated patch, which includes test cases. Also, the input and 
output formats are modified to be compatible with the other clustering 
algorithms (kmeans, fuzzy kmeans). The distance measure is now given as an 
input parameter, and the floating-point comparison suggested by Shashi has 
been taken care of. Kindly review.

 Implement kmeans++ for initial cluster selection in kmeans
 --

 Key: MAHOUT-153
 URL: https://issues.apache.org/jira/browse/MAHOUT-153
 Project: Mahout
  Issue Type: New Feature
  Components: Clustering
Affects Versions: 0.2
 Environment: OS Independent
Reporter: Panagiotis Papadimitriou
Assignee: Ted Dunning
 Fix For: 0.4

 Attachments: Mahout-153.patch, Mahout-153.patch, 
 MAHOUT-153_RandomFarthest.patch

   Original Estimate: 336h
  Remaining Estimate: 336h


[jira] Commented: (MAHOUT-153) Implement kmeans++ for initial cluster selection in kmeans

2010-03-18 Thread Pallavi Palleti (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846785#action_12846785
 ] 

Pallavi Palleti commented on MAHOUT-153:


Forgot to mention: in this patch, I made the lengthSquared instance variable in 
AbstractVector transient.

 Implement kmeans++ for initial cluster selection in kmeans
 --

 Key: MAHOUT-153
 URL: https://issues.apache.org/jira/browse/MAHOUT-153
 Project: Mahout
  Issue Type: New Feature
  Components: Clustering
Affects Versions: 0.2
 Environment: OS Independent
Reporter: Panagiotis Papadimitriou
Assignee: Ted Dunning
 Fix For: 0.4

 Attachments: Mahout-153.patch, Mahout-153.patch, 
 MAHOUT-153_RandomFarthest.patch

   Original Estimate: 336h
  Remaining Estimate: 336h


[jira] Updated: (MAHOUT-153) Implement kmeans++ for initial cluster selection in kmeans

2010-02-09 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-153:
---

Attachment: Mahout-153.patch

Here is the patch for selecting initial clusters for a clustering algorithm. 
The idea is taken from the paper "Farthest Point Heuristic Based Initialization 
Methods for K-Modes Clustering" (http://arxiv.org/pdf/cs/0610043).

The attached patch follows the steps below:

The farthest-point heuristic starts with an arbitrary point s1. Pick a point s2 
that is as far from s1 as possible. Pick si to maximize the distance to the 
nearest of all centroids picked so far. That is, maximize 
min {dist(si, s1), dist(si, s2), ...}. After all k representatives are chosen, 
we can define the partition of D: cluster Cj consists of all points closer to 
sj than to any other representative.
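
The steps above can be sketched as follows. This is a small single-machine 
illustration in Python, not the code in the attached patch. The function names 
(farthest_point_seeds, assign) are hypothetical, and Euclidean distance and the 
choice of the first point are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_seeds(points, k, dist=euclidean, first_index=0):
    """Start from an arbitrary point, then repeatedly add the point whose
    distance to its nearest already-chosen seed is largest."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    seeds = [points[first_index]]
    # nearest[i] = min over chosen seeds s of dist(points[i], s)
    nearest = [dist(p, seeds[0]) for p in points]
    while len(seeds) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        seeds.append(points[idx])
        nearest = [min(old, dist(p, points[idx]))
                   for old, p in zip(nearest, points)]
    return seeds


def assign(points, seeds, dist=euclidean):
    """Partition: each point goes to the index of its closest seed."""
    return [min(range(len(seeds)), key=lambda j: dist(p, seeds[j]))
            for p in points]
```

Keeping the running minimum in nearest means each new seed costs one pass over 
the data instead of a comparison against every seed chosen so far.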
 

 Implement kmeans++ for initial cluster selection in kmeans
 --

 Key: MAHOUT-153
 URL: https://issues.apache.org/jira/browse/MAHOUT-153
 Project: Mahout
  Issue Type: New Feature
  Components: Clustering
Affects Versions: 0.2
 Environment: OS Independent
Reporter: Panagiotis Papadimitriou
Assignee: Ted Dunning
 Fix For: 0.4

 Attachments: Mahout-153.patch

   Original Estimate: 336h
  Remaining Estimate: 336h


[jira] Commented: (MAHOUT-153) Implement kmeans++ for initial cluster selection in kmeans

2010-02-09 Thread Pallavi Palleti (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831391#action_12831391
 ] 

Pallavi Palleti commented on MAHOUT-153:


Forgot to mention. The above patch doesn't include test cases. Kindly review.

 Implement kmeans++ for initial cluster selection in kmeans
 --

 Key: MAHOUT-153
 URL: https://issues.apache.org/jira/browse/MAHOUT-153
 Project: Mahout
  Issue Type: New Feature
  Components: Clustering
Affects Versions: 0.2
 Environment: OS Independent
Reporter: Panagiotis Papadimitriou
Assignee: Ted Dunning
 Fix For: 0.4

 Attachments: Mahout-153.patch

   Original Estimate: 336h
  Remaining Estimate: 336h


[jira] Created: (MAHOUT-284) In Fuzzy Kmeans, when the distance between centroid and the given point is zero, then it should belong to that cluster with probability 1 and rest with probability zero

2010-02-09 Thread Pallavi Palleti (JIRA)

In Fuzzy Kmeans, when the distance between centroid and the given point is 
zero, then it should belong to that cluster with probability 1 and rest with 
probability zero


 Key: MAHOUT-284
 URL: https://issues.apache.org/jira/browse/MAHOUT-284
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Reporter: Pallavi Palleti
Priority: Minor


In Fuzzy K-Means, when the distance between a centroid and the given point is 
zero, the point should belong to that cluster with probability 1 and to all 
other clusters with probability 0. However, right now, we are not doing that.
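
To illustrate the case described above, here is a minimal Python sketch of the 
standard fuzzy c-means membership formula with an explicit zero-distance 
branch. It is not the attached patch. The function name fuzzy_memberships is 
hypothetical, and splitting the membership equally when several centroids 
coincide with the point is an assumption the issue does not address.

```python
def fuzzy_memberships(distances, m=2.0):
    """Membership of one point in each cluster, given its distance to each
    centroid. m > 1 is the fuzziness parameter."""
    if m <= 1.0:
        raise ValueError("m must be greater than 1")
    zero = [i for i, d in enumerate(distances) if d == 0.0]
    if zero:
        # The point sits exactly on a centroid: the general formula would
        # divide by zero, so assign full membership there and 0 elsewhere.
        share = 1.0 / len(zero)
        return [share if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    # u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1))
    return [1.0 / sum((dj / dk) ** exponent for dk in distances)
            for dj in distances]
```

Without the zero-distance branch, the ratio d_j / d_k is undefined whenever 
some d_k is 0, which is why the special case needs explicit handling.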


[jira] Updated: (MAHOUT-284) In Fuzzy Kmeans, when the distance between centroid and the given point is zero, then it should belong to that cluster with probability 1 and rest with probability zero

2010-02-09 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-284:
---

Attachment: Mahout-284.patch

This patch fixes the issue.

 In Fuzzy Kmeans, when the distance between centroid and the given point is 
 zero, then it should belong to that cluster with probability 1 and rest with 
 probability zero
 

 Key: MAHOUT-284
 URL: https://issues.apache.org/jira/browse/MAHOUT-284
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Reporter: Pallavi Palleti
Priority: Minor
 Attachments: Mahout-284.patch



[jira] Commented: (MAHOUT-153) Implement kmeans++ for initial cluster selection in kmeans

2010-01-18 Thread Pallavi Palleti (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801716#action_12801716
 ] 

Pallavi Palleti commented on MAHOUT-153:


Hi all,

I am ready with my patch. However, I was trying to see whether any further 
optimizations can be made. I will share the patch and seek optimization 
suggestions from the group. Should I open a separate JIRA issue, given that 
David might be working on this one, or submit my patch here? Kindly suggest.


 Implement kmeans++ for initial cluster selection in kmeans
 --

 Key: MAHOUT-153
 URL: https://issues.apache.org/jira/browse/MAHOUT-153
 Project: Mahout
  Issue Type: New Feature
  Components: Clustering
Affects Versions: 0.2
 Environment: OS Independent
Reporter: Panagiotis Papadimitriou
 Fix For: 0.3

   Original Estimate: 336h
  Remaining Estimate: 336h


[jira] Commented: (MAHOUT-66) EuclideanDistanceMeasure and ManhattanDistanceMeasure classes are not optimized for Sparse Vectors

2009-05-13 Thread Pallavi Palleti (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709260#action_12709260
 ] 

Pallavi Palleti commented on MAHOUT-66:
---

Could you please elaborate on this a little, as I didn't follow it? 
Essentially, I modified these distance measure classes to use vector 
operations, thereby reusing code. Depending on the vector type in use, the 
respective class's methods get called, so the optimizations are taken care of 
at the vector class level.
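
The idea in the comment above can be sketched like this: the distance function 
only calls vector operations, and each vector class decides how to carry them 
out. This is a Python illustration, not Mahout's Java classes. DenseVector, 
SparseVector, minus, and length_squared here are simplified stand-ins written 
for this example.

```python
import math


class DenseVector:
    def __init__(self, values):
        self.values = list(values)

    def get(self, i):
        return self.values[i]

    def minus(self, other):
        # Dense subtraction visits every index.
        return DenseVector(v - other.get(i) for i, v in enumerate(self.values))

    def length_squared(self):
        return sum(v * v for v in self.values)


class SparseVector:
    def __init__(self, size, entries):
        self.size = size
        self.entries = {i: v for i, v in entries.items() if v != 0}

    def get(self, i):
        return self.entries.get(i, 0.0)

    def minus(self, other):
        # Sparse minus sparse only visits indices that are non-zero somewhere.
        if isinstance(other, SparseVector):
            indices = self.entries.keys() | other.entries.keys()
        else:
            indices = range(self.size)
        return SparseVector(self.size,
                            {i: self.get(i) - other.get(i) for i in indices})

    def length_squared(self):
        return sum(v * v for v in self.entries.values())


def euclidean_distance(a, b):
    """Written once; the vector classes supply the type-specific work."""
    return math.sqrt(a.minus(b).length_squared())
```

The distance function contains no sparse-specific branch, so any speed-up 
added to a vector class benefits every measure built on these operations.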

 EuclideanDistanceMeasure and ManhattanDistanceMeasure classes are not 
 optimized for Sparse Vectors
 --

 Key: MAHOUT-66
 URL: https://issues.apache.org/jira/browse/MAHOUT-66
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Priority: Minor
 Attachments: MAHOUT-66.patch, MAHOUT-66.patch, MAHOUT-66.patch, 
 MAHOUT-66.patch





[jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-19 Thread Pallavi Palleti (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683335#action_12683335
 ] 

Pallavi Palleti commented on MAHOUT-99:
---

If we need to modify Canopy, we also need to modify all dependent classes 
wherever Canopy is being used.

 Improving speed of KMeans
 -

 Key: MAHOUT-99
 URL: https://issues.apache.org/jira/browse/MAHOUT-99
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Assignee: Grant Ingersoll
 Fix For: 0.1

 Attachments: MAHOUT-99-1.patch, Mahout-99.patch, MAHOUT-99.patch


 Improved the speed of KMeans by passing only the cluster ID from mapper to 
 reducer. Previously, the whole cluster info was being sent as a formatted 
 string. Also removed the implicit assumption that the combiner runs only 
 once; the code is modified accordingly so that no bug arises when the 
 combiner runs zero times or more than once.
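
A minimal Python sketch of why this matters, assuming the usual approach of 
carrying partial (count, sum) pairs: if the merge step is associative, the 
result is the same whether the combiner runs zero, one, or many times. This is 
not the Hadoop API or the attached patch. map_point, merge, and reduce_cluster 
are hypothetical names.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_point(point, centroids):
    """Emit (cluster_id, (count, vector_sum)). Only the id travels as key."""
    cid = min(centroids, key=lambda c: squared_distance(point, centroids[c]))
    return cid, (1, tuple(point))


def merge(partials):
    """Add up partial (count, sum) pairs. Safe to apply any number of times,
    so the same function can serve as combiner and inside the reducer."""
    count, total = 0, None
    for n, s in partials:
        count += n
        total = s if total is None else tuple(a + b for a, b in zip(total, s))
    return count, total


def reduce_cluster(partials):
    """Final step: merge whatever arrived, then divide once."""
    count, total = merge(partials)
    return tuple(x / count for x in total)
```

The division happens only in the reducer. A combiner that emitted averages 
instead of sums would give wrong centroids as soon as it ran more than once or 
not at all.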


[jira] Updated: (MAHOUT-99) Improving speed of KMeans

2009-03-19 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-99:
--

Attachment: MAHOUT-99.patch

I have fixed the SequenceFile issue and modified the code to use SequenceFile 
wherever possible. Also, with the new KMeansClusterMapper, we don't need the 
outputMapper code in Job.java in SyntheticControl, so I commented it out.

Thanks
Pallavi

 Improving speed of KMeans
 -

 Key: MAHOUT-99
 URL: https://issues.apache.org/jira/browse/MAHOUT-99
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Assignee: Grant Ingersoll
 Fix For: 0.1

 Attachments: MAHOUT-99-1.patch, MAHOUT-99.patch, Mahout-99.patch, 
 MAHOUT-99.patch



[jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Pallavi Palleti (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683297#action_12683297
 ] 

Pallavi Palleti commented on MAHOUT-99:
---

Yup. That must be the issue. But I am wondering how the test case succeeded?

 Improving speed of KMeans
 -

 Key: MAHOUT-99
 URL: https://issues.apache.org/jira/browse/MAHOUT-99
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Assignee: Grant Ingersoll
 Fix For: 0.1

 Attachments: MAHOUT-99-1.patch, Mahout-99.patch, MAHOUT-99.patch



[jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Pallavi Palleti (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683312#action_12683312
 ] 

Pallavi Palleti commented on MAHOUT-99:
---

I used KeyValueLineRecordReader internally for my code and forgot to revert 
to SequenceFileReader. Would it be sufficient to add another patch on the 
latest code that modifies only KMeansDriver to use SequenceFileReader? 
Kindly let me know.

Thanks
Pallavi

 Improving speed of KMeans
 -

 Key: MAHOUT-99
 URL: https://issues.apache.org/jira/browse/MAHOUT-99
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Assignee: Grant Ingersoll
 Fix For: 0.1

 Attachments: MAHOUT-99-1.patch, Mahout-99.patch, MAHOUT-99.patch



[jira] Updated: (MAHOUT-99) Improving speed of KMeans

2008-11-28 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-99:
--

Attachment: MAHOUT-99.patch

This patch takes care of the speed issues. The issues that arise when the 
combiner runs zero times or more than once have also been taken care of.

 Improving speed of KMeans
 -

 Key: MAHOUT-99
 URL: https://issues.apache.org/jira/browse/MAHOUT-99
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
 Attachments: MAHOUT-99.patch



[jira] Updated: (MAHOUT-79) Improving the speed of Fuzzy K-Means by optimizing data transfer between map and reduce tasks

2008-11-03 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-79:
--

Attachment: FUZZY-79.patch

I have made the code compatible with recent updates. Please review.

Thanks
Pallavi

 Improving the speed of Fuzzy K-Means by optimizing data transfer between map 
 and reduce tasks
 -

 Key: MAHOUT-79
 URL: https://issues.apache.org/jira/browse/MAHOUT-79
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Assignee: Grant Ingersoll
 Fix For: 0.1

 Attachments: FUZZY-79.patch, FUZZY-79.patch, FUZZY-79.patch, 
 FUZZY-79.patch, FUZZY-79.patch, FUZZY.patch


 Improve the speed of fuzzy k-Means by passing only the cluster-id info as key 
 output of mapper task and reading the cluster information in reducer task 
 where this info is needed. 


[jira] Updated: (MAHOUT-79) Improving the speed of Fuzzy K-Means by optimizing data transfer between map and reduce tasks

2008-10-17 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-79:
--

Attachment: FUZZY-79.patch

Addressed srowen's concern about using try{} catch{} for control flow in 
FuzzyKMeansReducer. Made the parameter order of addPoint() and addPoints() the 
same in SoftCluster.


 Improving the speed of Fuzzy K-Means by optimizing data transfer between map 
 and reduce tasks
 -

 Key: MAHOUT-79
 URL: https://issues.apache.org/jira/browse/MAHOUT-79
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Assignee: Grant Ingersoll
 Fix For: 0.1

 Attachments: FUZZY-79.patch, FUZZY-79.patch, FUZZY.patch



[jira] Updated: (MAHOUT-79) Improving the speed of Fuzzy K-Means by optimizing data transfer between map and reduce tasks

2008-10-16 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-79:
--

Attachment: FUZZY-79.patch

I have added a combiner to Fuzzy. This time, however, I am making sure the 
code accounts for the fact that a combiner can run zero or more times, so the 
respective conditions are added in both the combiner and the reducer.

 Improving the speed of Fuzzy K-Means by optimizing data transfer between map 
 and reduce tasks
 -

 Key: MAHOUT-79
 URL: https://issues.apache.org/jira/browse/MAHOUT-79
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Assignee: Grant Ingersoll
 Fix For: 0.1

 Attachments: FUZZY-79.patch, FUZZY.patch



[jira] Commented: (MAHOUT-79) Improving the speed of Fuzzy K-Means by optimizing data transfer between map and reduce tasks

2008-10-16 Thread Pallavi Palleti (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640436#action_12640436
 ] 

Pallavi Palleti commented on MAHOUT-79:
---

Hi Grant, latest patch takes care of Ted's concerns.

 Improving the speed of Fuzzy K-Means by optimizing data transfer between map 
 and reduce tasks
 -

 Key: MAHOUT-79
 URL: https://issues.apache.org/jira/browse/MAHOUT-79
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Assignee: Grant Ingersoll
 Fix For: 0.1

 Attachments: FUZZY-79.patch, FUZZY.patch



[jira] Commented: (MAHOUT-79) Improving the speed of Fuzzy K-Means by optimizing data transfer between map and reduce tasks

2008-10-03 Thread Pallavi Palleti (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636615#action_12636615
 ] 

Pallavi Palleti commented on MAHOUT-79:
---

Please review the code and let me know if any changes need to be done.

Thanks
Pallavi

 Improving the speed of Fuzzy K-Means by optimizing data transfer between map 
 and reduce tasks
 -

 Key: MAHOUT-79
 URL: https://issues.apache.org/jira/browse/MAHOUT-79
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
 Attachments: FUZZY.patch



[jira] Commented: (MAHOUT-79) Improving the speed of Fuzzy K-Means by optimizing data transfer between map and reduce tasks

2008-10-03 Thread Pallavi Palleti (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636634#action_12636634
 ] 

Pallavi Palleti commented on MAHOUT-79:
---

If possible, it would be good to consider this for 0.1.



 Improving the speed of Fuzzy K-Means by optimizing data transfer between map 
 and reduce tasks
 -

 Key: MAHOUT-79
 URL: https://issues.apache.org/jira/browse/MAHOUT-79
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
 Attachments: FUZZY.patch



[jira] Updated: (MAHOUT-79) Improving the speed of Fuzzy K-Means by optimizing data transfer between map and reduce tasks

2008-09-30 Thread Pallavi Palleti (JIRA)

[
https://issues.apache.org/jira/browse/MAHOUT-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Pallavi Palleti updated MAHOUT-79:
--

Attachment: FUZZY.patch

There are three major changes in this implementation.

One is related to improving speed:
1. The existing implementation passed the centroid information as the key to
the next tasks (combiner and reducer).
When the dimensionality is high, passing this much information as a key
throws an out-of-memory error, as it is difficult to hold all the data in
memory. So, the approach I have taken in this implementation is to send only
the cluster-id as the key in mapper tasks.
In the reducer phase, we read the cluster information in the configure method
and access it by maintaining a map from id to SoftCluster object.
As the cluster values do not change until a single iteration ends, we
can optimize the code in this way, thereby improving speed.
I have personally seen the run time improve from hours to minutes.
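
The id-to-cluster lookup described above can be sketched as follows. This is a 
Python illustration only, not the Hadoop API or the attached patch. The class 
FuzzyReducerSketch, its method signatures, and the convergence threshold are 
assumptions made for the example.

```python
import math


class FuzzyReducerSketch:
    """Hypothetical reducer that receives only cluster ids as keys."""

    def configure(self, clusters):
        # Called once per task: load the current clusters and index them by
        # id, so that the mapper never has to ship centroids as keys.
        self.centroid_by_id = {cid: tuple(c) for cid, c in clusters}

    def reduce(self, cluster_id, weighted_points, threshold=0.5):
        """weighted_points: list of (weight, point) pairs for this cluster.
        Returns the new centroid and whether it moved at most `threshold`."""
        old = self.centroid_by_id[cluster_id]
        total_weight = sum(w for w, _ in weighted_points)
        new = tuple(
            sum(w * p[i] for w, p in weighted_points) / total_weight
            for i in range(len(old))
        )
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(old, new)))
        return new, shift <= threshold
```

The key stays a small integer regardless of dimensionality, and the old 
centroid is still available in the reducer for the convergence check.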

Two are related to bugs:
1. The combiner is removed, as there is no guarantee about how many times a
combiner runs on a dataset. It may run zero to many times. If it runs more
than once, it causes a serious logical bug. So, the combiner is removed in
the new implementation.
2. There was a logical bug where I used multiplication in place of power in
the previous implementation. I fixed it in this implementation.
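
The message does not say where the power and multiplication were mixed up. As 
one plausible illustration, the standard fuzzy c-means centroid update weights 
each point by its membership raised to the fuzziness parameter m. The Python 
sketch below assumes that is the relevant spot. The function name 
weighted_centroid is hypothetical.

```python
def weighted_centroid(points, memberships, m=2.0):
    """Fuzzy c-means centroid: sum(u^m * x) / sum(u^m)."""
    # The weight is u ** m (power). Writing u * m instead would rescale every
    # weight by the same factor and lose the intended down-weighting of
    # low-membership points.
    weights = [u ** m for u in memberships]
    total = sum(weights)
    dims = len(points[0])
    return tuple(
        sum(w * p[i] for w, p in zip(weights, points)) / total
        for i in range(dims)
    )
```

With memberships 0.5 and 1.0 and m = 2, the weights are 0.25 and 1.0. With the 
multiplication mistake they would be 1.0 and 2.0, which pulls the centroid 
toward the weakly associated point.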

NOTE: The above points (combiner, improving speed) are applicable to K-Means
too, because:
1. K-Means modifies the data points in the combiner and, per the Hadoop
specification, there is no guarantee that the combiner runs only once over a
data point. So, it may create a bug.
2. By passing only the cluster-id, we can improve the speed, as it reduces the
amount of data transferred between map and reduce tasks.

We can apply this idea of passing the cluster-id rather than the whole cluster
wherever it is applicable in other Mahout implementations.

Improving the speed of Fuzzy K-Means by optimizing data transfer between map
and reduce tasks
-

Key: MAHOUT-79
URL: https://issues.apache.org/jira/browse/MAHOUT-79
Project: Mahout
Issue Type: Improvement
Components: Clustering
Reporter: Pallavi Palleti
Attachments: FUZZY.patch


[jira] Commented: (MAHOUT-77) DistanceMeasure calculation slow for SparseVector

2008-09-19 Thread Pallavi Palleti (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632914#action_12632914
 ] 

Pallavi Palleti commented on MAHOUT-77:
---

Hi Allen, it was suggested to use vector operations in addPoint and 
computeCentroid to make the code simpler to understand. Also, in the distance 
measure classes, we can replace the code with Vector operations such as the 
plus, minus, and dot methods. A detailed discussion is present in 
https://issues.apache.org/jira/browse/MAHOUT-66

Also, I have added plus and divide methods specific to sparse vectors. The 
patch that contains this is: https://issues.apache.org/jira/browse/MAHOUT-67

Thanks
Pallavi

 DistanceMeasure calculation slow for SparseVector
 -

 Key: MAHOUT-77
 URL: https://issues.apache.org/jira/browse/MAHOUT-77
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Reporter: Allen Day
Priority: Minor
 Fix For: 0.2

 Attachments: sparse.patch, sparse.patch


 ManhattanDistanceMeasure and TanimotoDistanceMeasure assume all vector 
 indices up to cardinality() must be compared.  We can speed this up for 
 SparseVectors (and others) because Vector implements Iterable, so we can 
 consider only non-zero indices.
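
A minimal Python sketch of the speed-up described above, assuming sparse 
vectors are stored as index-to-value mappings that hold only non-zero entries. 
The function names are hypothetical and this is not the code in sparse.patch.

```python
def manhattan_dense(a, b):
    """Baseline: compares every index up to the cardinality."""
    return sum(abs(x - y) for x, y in zip(a, b))


def manhattan_sparse(a, b):
    """a and b map index -> non-zero value. Only indices that are non-zero
    in at least one vector are visited; all others contribute 0."""
    total = 0.0
    for i, v in a.items():
        total += abs(v - b.get(i, 0.0))
    for i, v in b.items():
        if i not in a:
            total += abs(v)
    return total
```

The work is proportional to the number of non-zero entries rather than to the 
cardinality, which matters when vectors have thousands of dimensions but only 
a few non-zero values.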


[jira] Commented: (MAHOUT-74) Fuzzy K-Means clustering

2008-08-17 Thread Pallavi Palleti (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623252#action_12623252
 ] 

Pallavi Palleti commented on MAHOUT-74:
---

Hi Grant,
  urlCount is an unnecessary variable. It got added by mistake. 
  SoftCluster.m should be configurable. I am sorry, I forgot to modify it.




 Fuzzy K-Means clustering
 

 Key: MAHOUT-74
 URL: https://issues.apache.org/jira/browse/MAHOUT-74
 Project: Mahout
  Issue Type: New Feature
  Components: Clustering
Reporter: Pallavi Palleti
Assignee: Grant Ingersoll
 Attachments: mahout-74.patch, mahout-74.patch


 The Fuzzy K-Means clustering algorithm is an extension of the traditional 
 K-Means clustering algorithm and performs soft clustering.
 More details about fuzzy k-means can be found here: 
 http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
 I have implemented fuzzy K-Means prototype and tests in 
 org.apache.mahout.clustering.fuzzykmeans


[jira] Updated: (MAHOUT-74) Fuzzy K-Means clustering

2008-08-11 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-74?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-74:
--

Attachment: mahout-74.patch

I have implemented a Fuzzy K-Means prototype and tests. Please review the code.

 Fuzzy K-Means clustering
 

 Key: MAHOUT-74
 URL: https://issues.apache.org/jira/browse/MAHOUT-74
 Project: Mahout
  Issue Type: New Feature
Reporter: Pallavi Palleti
 Attachments: mahout-74.patch


 The fuzzy k-means clustering algorithm is an extension of the traditional 
 k-means clustering algorithm and performs soft clustering.
 More details about fuzzy k-means can be found here:
 http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
 I have implemented a fuzzy k-means prototype and tests in 
 org.apache.mahout.clustering.fuzzykmeans


[jira] Created: (MAHOUT-74) Fuzzy K-Means clustering

2008-08-11 Thread Pallavi Palleti (JIRA)

Fuzzy K-Means clustering


 Key: MAHOUT-74
 URL: https://issues.apache.org/jira/browse/MAHOUT-74
 Project: Mahout
  Issue Type: New Feature
Reporter: Pallavi Palleti
 Attachments: mahout-74.patch

The fuzzy k-means clustering algorithm is an extension of the traditional 
k-means clustering algorithm and performs soft clustering.

More details about fuzzy k-means can be found here:
http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering

I have implemented a fuzzy k-means prototype and tests in 
org.apache.mahout.clustering.fuzzykmeans


[jira] Updated: (MAHOUT-67) plus method and divide method in AbstractVector doesn't work for SparseVectors

2008-08-06 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-67:
--

Attachment: MAHOUT-67.patch

I have added unit tests to show that the current implementation breaks for the 
sparse vector plus, dot and divide operations, and that my version works.

 plus method and divide method in AbstractVector doesn't work for SparseVectors
 --

 Key: MAHOUT-67
 URL: https://issues.apache.org/jira/browse/MAHOUT-67
 Project: Mahout
  Issue Type: Bug
  Components: Matrix
Reporter: Pallavi Palleti
Priority: Minor
 Attachments: MAHOUT-67.patch, MAHOUT-67.patch, MAHOUT-67.patch





[jira] Updated: (MAHOUT-67) plus method and divide method in AbstractVector can be optimized for SparseVectors

2008-08-06 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-67:
--

Summary: plus method and divide method in AbstractVector can be optimized 
for SparseVectors  (was: plus method and divide method in AbstractVector 
doesn't work for SparseVectors)

I misunderstood the sparse vector representation. I agree that a cardinality 
exception should be thrown if the two sparse vectors' cardinalities differ. 
However, my implementation still holds and optimizes the computation of the 
divide and plus operations over sparse vectors.
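
To illustrate the point, here is a rough sketch (not the code in 
MAHOUT-67.patch) of plus and divide that keep the cardinality check but only 
touch stored entries. The class SparseVec is a hypothetical stand-in for 
SparseVector, and IllegalArgumentException stands in for Mahout's cardinality 
exception; both are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sparse vector: fixed cardinality, only non-zero entries stored. */
final class SparseVec {
  final int cardinality;
  final Map<Integer, Double> values = new HashMap<>();

  SparseVec(int cardinality) {
    this.cardinality = cardinality;
  }

  void set(int index, double value) {
    if (index < 0 || index >= cardinality) {
      throw new IndexOutOfBoundsException("index " + index);
    }
    if (value == 0.0) {
      values.remove(index); // never store explicit zeros
    } else {
      values.put(index, value);
    }
  }

  double get(int index) {
    return values.getOrDefault(index, 0.0);
  }

  /** Element-wise sum; work is proportional to the stored entries, not the cardinality. */
  SparseVec plus(SparseVec other) {
    if (cardinality != other.cardinality) {
      // Stand-in for the cardinality exception discussed in the issue.
      throw new IllegalArgumentException("cardinality mismatch");
    }
    SparseVec result = new SparseVec(cardinality);
    result.values.putAll(values);
    for (Map.Entry<Integer, Double> e : other.values.entrySet()) {
      result.set(e.getKey(), result.get(e.getKey()) + e.getValue());
    }
    return result;
  }

  /** Divides every stored entry by x; zero entries stay zero and are skipped. */
  SparseVec divide(double x) {
    SparseVec result = new SparseVec(cardinality);
    for (Map.Entry<Integer, Double> e : values.entrySet()) {
      result.set(e.getKey(), e.getValue() / x);
    }
    return result;
  }
}
```

With a cardinality of one million and three stored entries, plus visits three 
entries instead of one million positions.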


 plus method and divide method in AbstractVector can be optimized for 
 SparseVectors
 --

 Key: MAHOUT-67
 URL: https://issues.apache.org/jira/browse/MAHOUT-67
 Project: Mahout
  Issue Type: Bug
  Components: Matrix
Reporter: Pallavi Palleti
Priority: Minor
 Attachments: MAHOUT-67.patch, MAHOUT-67.patch, MAHOUT-67.patch





[jira] Updated: (MAHOUT-67) plus method and divide method in AbstractVector can be optimized for SparseVectors

2008-08-06 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-67:
--

Attachment: MAHOUT-67.patch

I have restored the cardinality check in the dot method, added a cardinality 
check to the plus method, and updated the unit tests accordingly. As this code is 
an optimization of the existing code, the previous unit tests still hold.

 plus method and divide method in AbstractVector can be optimized for 
 SparseVectors
 --

 Key: MAHOUT-67
 URL: https://issues.apache.org/jira/browse/MAHOUT-67
 Project: Mahout
  Issue Type: Bug
  Components: Matrix
Reporter: Pallavi Palleti
Priority: Minor
 Attachments: MAHOUT-67.patch, MAHOUT-67.patch, MAHOUT-67.patch, 
 MAHOUT-67.patch





[jira] Updated: (MAHOUT-66) EuclideanDistanceMeasure and ManhattanDistanceMeasure classes are not optimized for Sparse Vectors

2008-08-06 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-66?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-66:
--

Attachment: MAHOUT-66.patch

As this is not a bug but an improvement, the existing unit tests hold here. I have 
added unit tests for the Manhattan and Euclidean distance measures. Please review 
the code.

Thanks
Pallavi

 EuclideanDistanceMeasure and ManhattanDistanceMeasure classes are not 
 optimized for Sparse Vectors
 --

 Key: MAHOUT-66
 URL: https://issues.apache.org/jira/browse/MAHOUT-66
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Priority: Minor
 Attachments: MAHOUT-66.patch, MAHOUT-66.patch, MAHOUT-66.patch, 
 MAHOUT-66.patch





[jira] Updated: (MAHOUT-68) addPoint, computeCentroid can be represented with Vector operations

2008-08-06 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-68?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-68:
--

Issue Type: Improvement  (was: Bug)
   Summary: addPoint, computeCentroid can be represented with Vector 
operations  (was: addPoint, computeCentroid does not work for SparseVectors)

In this way, we can hide the implementation and optimize the code at the vector 
level.

 addPoint, computeCentroid can be represented with Vector operations
 ---

 Key: MAHOUT-68
 URL: https://issues.apache.org/jira/browse/MAHOUT-68
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Priority: Minor
 Attachments: MAHOUT-68.patch





[jira] Commented: (MAHOUT-68) addPoint, computeCentroid can use vector operators to do the respective task

2008-08-06 Thread Pallavi Palleti (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620211#action_12620211
 ] 

Pallavi Palleti commented on MAHOUT-68:
---

All of this arose from my confusion about cardinality in SparseVector. 
So this is not a bug, and the existing unit tests hold even after the 
changes.

 addPoint, computeCentroid can use vector operators to do the respective task
 

 Key: MAHOUT-68
 URL: https://issues.apache.org/jira/browse/MAHOUT-68
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Priority: Minor
 Attachments: MAHOUT-68.patch





[jira] Created: (MAHOUT-73) This is still an improvement over existing implementations.

2008-08-06 Thread Pallavi Palleti (JIRA)

This is still an improvement over existing implementations.
---

 Key: MAHOUT-73
 URL: https://issues.apache.org/jira/browse/MAHOUT-73
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Priority: Minor





[jira] Updated: (MAHOUT-73) addPoint, computeCentroid can be optimized by using vector operators

2008-08-06 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-73:
--

Summary: addPoint, computeCentroid can be optimized by using vector 
operators  (was: This is still an improvement over existing implementations.)

 addPoint, computeCentroid can be optimized by using vector operators
 

 Key: MAHOUT-73
 URL: https://issues.apache.org/jira/browse/MAHOUT-73
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Priority: Minor




[jira] Updated: (MAHOUT-73) addPoint, computeCentroid can be optimized by using vector operators

2008-08-06 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-73:
--

Attachment: MAHOUT-73.patch

We can use Vector operators in the addPoint and computeCentroid methods of the 
Canopy class. Thereby, these methods can be optimized for sparse vectors. Also, 
by using vector operators, we hide the implementation details and reuse the 
existing code in the SparseVector and DenseVector classes.
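
A rough sketch of what this could look like, not the contents of 
MAHOUT-73.patch. The Vec interface, DenseVec, CanopySketch and the pointTotal 
field are hypothetical names for illustration and are not Mahout's Vector API; 
only the method names addPoint and computeCentroid come from the issue.

```java
/** Hypothetical minimal vector interface; only the operators the sketch needs. */
interface Vec {
  Vec plus(Vec other);

  Vec divide(double x);

  double[] toArray();
}

/** Hypothetical dense implementation; a sparse one could implement Vec the same way. */
final class DenseVec implements Vec {
  private final double[] data;

  DenseVec(double... data) {
    this.data = data.clone();
  }

  public Vec plus(Vec other) {
    double[] o = other.toArray();
    if (o.length != data.length) {
      throw new IllegalArgumentException("cardinality mismatch");
    }
    double[] r = new double[data.length];
    for (int i = 0; i < r.length; i++) {
      r[i] = data[i] + o[i];
    }
    return new DenseVec(r);
  }

  public Vec divide(double x) {
    double[] r = new double[data.length];
    for (int i = 0; i < r.length; i++) {
      r[i] = data[i] / x;
    }
    return new DenseVec(r);
  }

  public double[] toArray() {
    return data.clone();
  }
}

/** Canopy-like accumulator that never loops over vector indices itself. */
final class CanopySketch {
  private Vec pointTotal; // running sum of all added points
  private int numPoints;

  void addPoint(Vec point) {
    pointTotal = (pointTotal == null) ? point : pointTotal.plus(point);
    numPoints++;
  }

  Vec computeCentroid() {
    if (numPoints == 0) {
      throw new IllegalStateException("no points added");
    }
    return pointTotal.divide(numPoints);
  }
}
```

Because the accumulator only calls plus and divide, any sparse-specific 
optimization of those operators benefits it without further changes.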

 addPoint, computeCentroid can be optimized by using vector operators
 

 Key: MAHOUT-73
 URL: https://issues.apache.org/jira/browse/MAHOUT-73
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Priority: Minor
 Attachments: MAHOUT-73.patch





[jira] Commented: (MAHOUT-67) plus method in AbstractVector doesn't work for SparseVectors

2008-07-09 Thread Pallavi Palleti (JIRA)


[ 
https://issues.apache.org/jira/browse/MAHOUT-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611906#action_12611906
 ] 

Pallavi Palleti commented on MAHOUT-67:
---

Isabel: Sure, I will make those changes.

Karl: I didn't follow your point. Could you please elaborate?

 plus method in AbstractVector doesn't work for SparseVectors
 

 Key: MAHOUT-67
 URL: https://issues.apache.org/jira/browse/MAHOUT-67
 Project: Mahout
  Issue Type: Bug
  Components: Matrix
Reporter: Pallavi Palleti
Priority: Minor
 Attachments: MAHOUT-67.patch





[jira] Updated: (MAHOUT-67) plus method and divide method in AbstractVector doesn't work for SparseVectors

2008-07-09 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-67:
--

Attachment: MAHOUT-67.patch

Changed to Java 5 notation style.
Also added a divide method for sparse vectors.

 plus method and divide method in AbstractVector doesn't work for SparseVectors
 --

 Key: MAHOUT-67
 URL: https://issues.apache.org/jira/browse/MAHOUT-67
 Project: Mahout
  Issue Type: Bug
  Components: Matrix
Reporter: Pallavi Palleti
Priority: Minor
 Attachments: MAHOUT-67.patch, MAHOUT-67.patch





[jira] Updated: (MAHOUT-66) EuclideanDistanceMeasure and ManhattanDistanceMeasure classes does not compute distance for Sparse Vectors

2008-07-08 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-66?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-66:
--

Attachment: MAHOUT-66.patch

I have refactored the code as per Isabel's instructions. I have overridden the 
minus method in SparseVector, and there is now only one 
distance(Vector v1, Vector v2) method in both EuclideanDistanceMeasure and 
ManhattanDistanceMeasure.
Please review the code.

Thanks
Pallavi

 EuclideanDistanceMeasure and ManhattanDistanceMeasure classes does not 
 compute distance for Sparse Vectors
 --

 Key: MAHOUT-66
 URL: https://issues.apache.org/jira/browse/MAHOUT-66
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Reporter: Pallavi Palleti
Priority: Minor
 Attachments: MAHOUT-66.patch, MAHOUT-66.patch





[jira] Created: (MAHOUT-67) plus method in AbstractVector doesn't work for SparseVectors

2008-07-08 Thread Pallavi Palleti (JIRA)

plus method in AbstractVector doesn't work for SparseVectors


 Key: MAHOUT-67
 URL: https://issues.apache.org/jira/browse/MAHOUT-67
 Project: Mahout
  Issue Type: Bug
  Components: Matrix
Reporter: Pallavi Palleti
Priority: Minor





[jira] Updated: (MAHOUT-67) plus method in AbstractVector doesn't work for SparseVectors

2008-07-08 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-67:
--

Attachment: MAHOUT-67.patch

I have overridden the plus() method in SparseVector. In order to do this, I 
needed to add a method called addCardinality().
Also, I found that the dot() method shouldn't throw a cardinality exception, 
so I removed that condition. Please review the code.

Thanks
Pallavi

 plus method in AbstractVector doesn't work for SparseVectors
 

 Key: MAHOUT-67
 URL: https://issues.apache.org/jira/browse/MAHOUT-67
 Project: Mahout
  Issue Type: Bug
  Components: Matrix
Reporter: Pallavi Palleti
Priority: Minor
 Attachments: MAHOUT-67.patch





[jira] Created: (MAHOUT-66) EuclideanDistanceMeasure and ManhattanDistanceMeasure classes does not compute distance for Sparse Vectors

2008-07-07 Thread Pallavi Palleti (JIRA)

EuclideanDistanceMeasure and ManhattanDistanceMeasure classes does not compute 
distance for Sparse Vectors
--

 Key: MAHOUT-66
 URL: https://issues.apache.org/jira/browse/MAHOUT-66
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Reporter: Pallavi Palleti
Priority: Minor





[jira] Updated: (MAHOUT-66) EuclideanDistanceMeasure and ManhattanDistanceMeasure classes does not compute distance for Sparse Vectors

2008-07-07 Thread Pallavi Palleti (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAHOUT-66?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-66:
--

Attachment: MAHOUT-66.patch

I added a condition to the actual distance method so that, if the vectors are 
sparse, a different distance method specific to SparseVector gets called.

 EuclideanDistanceMeasure and ManhattanDistanceMeasure classes does not 
 compute distance for Sparse Vectors
 --

 Key: MAHOUT-66
 URL: https://issues.apache.org/jira/browse/MAHOUT-66
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Reporter: Pallavi Palleti
Priority: Minor
 Attachments: MAHOUT-66.patch




