[ 
https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122944#comment-13122944
 ] 

Paritosh Ranjan commented on MAHOUT-825:
----------------------------------------

I experimented on the Canopy generation phase, as Ted suggested, and the 
results are positive. The behaviour can be switched on or off using a flag, 
and the default is off.

The radius is being calculated; I think you missed it:
{code}
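private double computeCanopyNeighbourhoodDistance(Canopy canopy) {
  // Refresh the canopy's parameters (including its radius) before reading them.
  canopy.computeParameters();
  // Note: this is the squared length of the radius vector, not a plain distance.
  double radius = canopy.getRadius().getLengthSquared();
  return radius * clusterStrictness;
}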
{code}
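
To show how such a neighbourhood distance might be used, here is a minimal, 
self-contained sketch. It is not the code from the patch: the class name, the 
filter method, the eliminateOutliers flag and the choice to compare against 
the squared Euclidean distance to the center are all assumptions made for 
illustration. Only the "squared radius length times clusterStrictness" rule 
comes from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; plain arrays stand in for Mahout's Vector and Canopy types.
final class OutlierFilterSketch {

    // Squared Euclidean length of a vector, playing the role of getLengthSquared().
    static double lengthSquared(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return sum;
    }

    // Squared Euclidean distance between two points of equal dimension.
    static double distanceSquared(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Mirrors the snippet above: squared radius length scaled by the strictness factor.
    static double neighbourhoodDistance(double[] radius, double clusterStrictness) {
        return lengthSquared(radius) * clusterStrictness;
    }

    // Assumption: with the flag off, every point is kept. With the flag on, a point
    // is kept only if its squared distance to the center is within the limit.
    static List<double[]> filter(List<double[]> points, double[] center, double[] radius,
                                 double clusterStrictness, boolean eliminateOutliers) {
        if (!eliminateOutliers) {
            return new ArrayList<>(points);
        }
        double limit = neighbourhoodDistance(radius, clusterStrictness);
        List<double[]> kept = new ArrayList<>();
        for (double[] p : points) {
            if (distanceSquared(p, center) <= limit) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

Because the flag is checked first, leaving it off returns the points unchanged, 
which matches the "off by default, no effect unless enabled" behaviour described 
in this comment.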

I don't see any harm in having an outlier elimination mechanism in Canopy even 
if we don't have it in other clustering mechanisms, since there is a flag to 
control it and it is off by default.

It has been evolving for some time, and I think all the doubts mentioned 
earlier have been addressed: it is controlled by a flag, and it no longer 
depends on t1.
                
> Canopies grouping records outside t1
> ------------------------------------
>
>                 Key: MAHOUT-825
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-825
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.6
>         Environment: windows, linux
>            Reporter: Paritosh Ranjan
>              Labels: features, newbie, patch
>             Fix For: 0.6
>
>         Attachments: Clustering Remote Points - Two Big, Useless 
> Clusters.txt, Not Clustering Remote Points - Two Meaningful Clusters.txt, 
> canopy-clusterFilter-t1, canopy-outlier-elimination, 
> canopy-outside-t1-points-patch-1, canopy-strict-clustering-flag
>
>
> While finding the closest canopy, there is no check to ensure that it 
> returns only canopies which are within distance t1 of the point. This 
> produces incorrect results, i.e. points outside t1 are grouped into canopies.
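
The missing check described in the issue summary can be sketched as follows. 
This is an illustration only, not the Mahout implementation: the class and 
method names are hypothetical, Euclidean distance is assumed, and treating 
"within t1" as strictly less than t1 is also an assumption.

```java
// Hypothetical sketch; plain arrays stand in for canopy centers and points.
final class ClosestCanopySketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the index of the nearest center that lies within t1 of the point,
    // or -1 when no center qualifies, so that remote points are not forced
    // into the nearest canopy.
    static int findClosestWithinT1(double[][] centers, double[] point, double t1) {
        int best = -1;
        double bestDistance = Double.MAX_VALUE;
        for (int i = 0; i < centers.length; i++) {
            double d = distance(centers[i], point);
            if (d < t1 && d < bestDistance) {
                best = i;
                bestDistance = d;
            }
        }
        return best;
    }
}
```

Without the `d < t1` condition the method would always return some canopy, 
which is the behaviour the issue reports.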


        
