[jira] [Comment Edited] (SOLR-10786) Add DBSCAN clustering Streaming Evaluator

2018-01-29 Thread Joel Bernstein (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344035#comment-16344035 ]

Joel Bernstein edited comment on SOLR-10786 at 1/29/18 9:13 PM:


Solr 7.3 has kmeans++ and fuzzyKmeans clustering. But DBSCAN clustering was
just too slow, I thought, to be useful for Solr users. I will try it again
sometime to see if it was just how I put things together, but it was painfully
slow compared to kmeans.

Eigen and singular value decompositions are planned for Solr 7.4, so other
clustering techniques such as PCA and LSA are on the way.

 


was (Author: joel.bernstein):
Solr 7.3 has kmeans++, fuzzyKmeans clustering. But DBSCAN clustering was just too
slow, I thought, to be useful for Solr users. I will try it again sometime to
see if it was just how I put things together, but it was painfully slow 
compared to kmeans.

> Add DBSCAN clustering Streaming Evaluator
> ------------------------------------------
>
>                 Key: SOLR-10786
>                 URL: https://issues.apache.org/jira/browse/SOLR-10786
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Major
>             Fix For: master (8.0), 7.3
>
>         Attachments: SOLR-10786.patch, SOLR-10786.patch, SOLR-10786.patch
>
>
> The DBSCAN clustering Stream Evaluator will cluster numeric vectors using the 
> DBSCAN clustering algorithm.
> Clustering implementation will be provided by Apache Commons Math.
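
For readers unfamiliar with the algorithm, here is a minimal, dependency-free Java sketch of DBSCAN. It is not the Apache Commons Math implementation the evaluator delegates to, and the names (DbscanSketch, cluster, regionQuery) are hypothetical. A point is a core point when at least minPts points lie within eps of it (this sketch counts the point itself; implementations differ on that convention). Clusters grow outward from core points, and points reachable from no core point are labeled noise (-1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, minimal DBSCAN sketch (not the Commons Math implementation).
final class DbscanSketch {

    // Euclidean distance between two vectors of equal length.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Indices of all points within eps of point p, including p itself.
    static List<Integer> regionQuery(double[][] points, int p, double eps) {
        List<Integer> neighbors = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            if (euclidean(points[p], points[i]) <= eps) {
                neighbors.add(i);
            }
        }
        return neighbors;
    }

    // Returns one label per point: 0..k-1 for clusters, -1 for noise.
    static int[] cluster(double[][] points, double eps, int minPts) {
        final int UNVISITED = -2, NOISE = -1;
        int[] labels = new int[points.length];
        Arrays.fill(labels, UNVISITED);
        int clusterId = 0;

        for (int p = 0; p < points.length; p++) {
            if (labels[p] != UNVISITED) continue;
            List<Integer> neighbors = regionQuery(points, p, eps);
            if (neighbors.size() < minPts) {
                labels[p] = NOISE; // may later be claimed as a border point
                continue;
            }
            // p is a core point: start a new cluster and expand it.
            labels[p] = clusterId;
            List<Integer> seeds = new ArrayList<>(neighbors);
            for (int i = 0; i < seeds.size(); i++) {
                int q = seeds.get(i);
                if (labels[q] == NOISE) labels[q] = clusterId; // border point
                if (labels[q] != UNVISITED) continue;          // already assigned
                labels[q] = clusterId;
                List<Integer> qNeighbors = regionQuery(points, q, eps);
                if (qNeighbors.size() >= minPts) {
                    seeds.addAll(qNeighbors); // q is also core: keep expanding
                }
            }
            clusterId++;
        }
        return labels;
    }
}
```

Calling DbscanSketch.cluster(points, 0.5, 3) on two tight groups of three 2-D points plus one distant point returns [0, 0, 0, 1, 1, 1, -1]. Each regionQuery here is a linear scan over all points, so this naive version does O(n^2) distance computations overall.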



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-10786) Add DBSCAN clustering Streaming Evaluator

2018-01-29 Thread Joel Bernstein (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344035#comment-16344035 ]

Joel Bernstein edited comment on SOLR-10786 at 1/29/18 9:14 PM:


Solr 7.3 has kmeans++ and fuzzyKmeans clustering. But DBSCAN clustering was 
just too slow, I thought, to be useful for Solr users. I will try it again 
sometime to see if it was just how I put things together, but it was painfully 
slow compared to kmeans++.

Eigen and singular value decompositions are planned for Solr 7.4, so other 
clustering techniques such as PCA and LSA are on the way.

 


was (Author: joel.bernstein):
Solr 7.3 has kmeans++ and fuzzyKmeans clustering. But DBSCAN clustering was 
just too slow I thought to be useful for Solr users. I will try it again 
sometime to see if it was just how I put things together, but it was painfully 
slow compared to kmeans++.

eigen and singular value decompositions are planned for Solr 7.4, so other 
clustering techniques such as PCA and LSA are on the way.

 

[jira] [Comment Edited] (SOLR-10786) Add DBSCAN clustering Streaming Evaluator

2018-01-29 Thread Joel Bernstein (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344301#comment-16344301 ]

Joel Bernstein edited comment on SOLR-10786 at 1/30/18 12:49 AM:
-

[~dsmiley], do you think you could put together a lat/long distance measure
that we could use to cluster lat/long coordinates?

We would need to follow this interface to plug into Apache Commons Math DBSCAN 
clustering:

[https://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/ml/distance/DistanceMeasure.html]
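
A lat/long distance measure along these lines could be a haversine (great-circle) distance. The sketch below is a hypothetical HaversineDistance class whose compute(double[] a, double[] b) method has the same shape as the DistanceMeasure interface linked above. It is written without the Commons Math dependency so it compiles on its own; the {lat, lon} input order in degrees and the Earth radius of 6371.0088 km (a commonly used mean radius) are assumptions.

```java
// Hypothetical haversine measure. Its compute() signature mirrors
// org.apache.commons.math3.ml.distance.DistanceMeasure; to plug it into
// Commons Math you would declare "implements DistanceMeasure" instead.
final class HaversineDistance implements java.io.Serializable {
    private static final long serialVersionUID = 1L;

    // Assumed mean Earth radius, in kilometers.
    private static final double EARTH_RADIUS_KM = 6371.0088;

    // a and b are {latitude, longitude} in degrees; result is in kilometers.
    public double compute(double[] a, double[] b) {
        if (a.length != 2 || b.length != 2) {
            throw new IllegalArgumentException("expected {lat, lon} pairs");
        }
        double lat1 = Math.toRadians(a[0]);
        double lat2 = Math.toRadians(b[0]);
        double dLat = lat2 - lat1;
        double dLon = Math.toRadians(b[1] - a[1]);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        double h = sinLat * sinLat + Math.cos(lat1) * Math.cos(lat2) * sinLon * sinLon;
        // Clamp h to 1.0 so rounding error cannot push asin out of its domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(Math.min(1.0, h)));
    }
}
```

With Commons Math, an instance declared as a DistanceMeasure could be passed to the DBSCANClusterer constructor that takes (eps, minPts, measure), with eps then expressed in kilometers. Two points one degree of longitude apart on the equator come out at roughly 111.2 km.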


was (Author: joel.bernstein):
[~dsmiley], do you think you could put together a lat/long distance measure
that we could use to cluster lat/long coordinates?

 

We would need to follow this interface to plug into Apache Commons Math DBSCAN 
clustering:

https://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/ml/distance/DistanceMeasure.html



[jira] [Comment Edited] (SOLR-10786) Add DBSCAN clustering Streaming Evaluator

2018-01-30 Thread Joel Bernstein (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346164#comment-16346164 ]

Joel Bernstein edited comment on SOLR-10786 at 1/31/18 2:45 AM:


I did some benchmarking, and DBSCAN is very fast with two-dimensional vectors.
So lat/long DBSCAN clustering is looking like a very promising use case. I'll
also add haversinMeter and haversinKilometer distance measures to the distance()
function to support the creation of distance matrices. This will open the door
to other machine learning algorithms, such as spectral clustering.
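
A distance matrix of this kind is the n x n table of pairwise distances between points. The sketch below is a hypothetical Java helper, not Solr's distance() function: it takes the metric as a ToDoubleBiFunction, so a haversine function in meters or kilometers could be passed in as easily as a Euclidean one. It computes each pair once and mirrors the value, since distances are symmetric.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper: builds a symmetric n x n matrix of pairwise distances.
final class DistanceMatrix {
    static double[][] build(double[][] points, ToDoubleBiFunction<double[], double[]> metric) {
        int n = points.length;
        double[][] m = new double[n][n]; // diagonal stays 0.0
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double d = metric.applyAsDouble(points[i], points[j]);
                m[i][j] = d; // fill both halves: d(i, j) == d(j, i)
                m[j][i] = d;
            }
        }
        return m;
    }
}
```

For example, building the matrix for the points {0, 0}, {3, 4} and {6, 8} with the Euclidean metric (a, b) -> Math.hypot(a[0] - b[0], a[1] - b[1]) gives the rows [0, 5, 10], [5, 0, 5] and [10, 5, 0]. Spectral clustering would typically turn such a matrix into a similarity (affinity) matrix before further processing.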


was (Author: joel.bernstein):
I did some benchmarking, and DBSCAN is very fast with two-dimensional vectors.
So lat/long DBSCAN clustering is looking like a very promising use case. I'll
also add haversinMeter and haversinKilometer distance measures to the distance()
function to support the creation of distance matrices. This will open the door
to other machine learning algorithms on lat/long data.
