[jira] [Updated] (SPARK-5405) Spark clusterer should support high dimensional data

2015-02-23 Thread Xiangrui Meng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-5405:
-
Labels: clustering  (was: )

 Spark clusterer should support high dimensional data

 Key: SPARK-5405
 URL: https://issues.apache.org/jira/browse/SPARK-5405
 Project: Spark
 Issue Type: New Feature
 Components: MLlib
 Affects Versions: 1.2.0
 Reporter: Derrick Burns
 Labels: clustering
 Original Estimate: 504h
 Remaining Estimate: 504h

 The MLlib clusterer works well for low dimensional data (around 200 
 dimensions or fewer). However, its running time grows linearly with the 
 number of dimensions, so in practice it is not very useful for high 
 dimensional data.
 Depending on the data type, one can embed high dimensional data into a 
 lower dimensional space in a distance-preserving way. The Spark clusterer 
 should support such embeddings.
 An example implementation that supports high dimensional data is here:
 https://github.com/derrickburns/generalized-kmeans-clustering
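
 To illustrate the kind of embedding described above, here is a minimal
 sketch of one well-known technique for Euclidean data: a Gaussian random
 projection, which approximately preserves pairwise distances. This is an
 assumption chosen for illustration; it is not taken from MLlib or from the
 linked implementation, and the function names (gaussian_projection_matrix,
 project, euclidean) are hypothetical.

```python
import math
import random


def gaussian_projection_matrix(source_dim, target_dim, seed=0):
    """Return a target_dim x source_dim matrix with N(0, 1/target_dim) entries.

    Illustrative helper (hypothetical name), not part of any Spark API.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(target_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
            for _ in range(target_dim)]


def project(matrix, point):
    """Map one high dimensional point into the lower dimensional space."""
    return [sum(w * x for w, x in zip(row, point)) for row in matrix]


def euclidean(a, b):
    """Plain Euclidean distance; its cost is linear in the dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Small demonstration: compare a distance before and after the embedding.
_rng = random.Random(1)
_source_dim, _target_dim = 1000, 100
_p = [_rng.random() for _ in range(_source_dim)]
_q = [_rng.random() for _ in range(_source_dim)]
_m = gaussian_projection_matrix(_source_dim, _target_dim)
print("original distance: ", euclidean(_p, _q))
print("projected distance:", euclidean(project(_m, _p), project(_m, _q)))
```

 After projecting, each distance computation inside the clusterer touches
 100 coordinates instead of 1000. The two printed distances should be close
 but not equal, since the embedding is only approximately
 distance-preserving.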



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-5405) Spark clusterer should support high dimensional data

2015-01-25 Thread Derrick Burns (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Derrick Burns updated SPARK-5405:
-
Description:
The MLlib clusterer works well for low dimensional data (around 200
dimensions or fewer). However, its running time grows linearly with the
number of dimensions, so in practice it is not very useful for high
dimensional data.

Depending on the data type, one can embed high dimensional data into a lower
dimensional space in a distance-preserving way. The Spark clusterer should
support such embeddings.

An example implementation that supports high dimensional data is here:
https://github.com/derrickburns/generalized-kmeans-clustering

was:
The MLlib clusterer works well for low dimensional data (around 200
dimensions or fewer). However, its running time grows linearly with the
number of dimensions, so in practice it is not very useful for high
dimensional data.

Depending on the data type, one can embed high dimensional data into a lower
dimensional space in a distance-preserving way. The Spark clusterer should
support such embeddings.

Spark clusterer should support high dimensional data

Key: SPARK-5405
URL: https://issues.apache.org/jira/browse/SPARK-5405
Project: Spark
Issue Type: New Feature
Components: MLlib
Affects Versions: 1.2.0
Reporter: Derrick Burns
Original Estimate: 504h
Remaining Estimate: 504h

The MLlib clusterer works well for low dimensional data (around 200
dimensions or fewer). However, its running time grows linearly with the
number of dimensions, so in practice it is not very useful for high
dimensional data.
Depending on the data type, one can embed high dimensional data into a
lower dimensional space in a distance-preserving way. The Spark clusterer
should support such embeddings.
An example implementation that supports high dimensional data is here:
https://github.com/derrickburns/generalized-kmeans-clustering

