[jira] [Commented] (SPARK-7334) Implement RandomProjection for Dimensionality Reduction

2016-10-24 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603030#comment-15603030
 ] 

Joseph K. Bradley commented on SPARK-7334:
--

[~sebalf] I'm sorry we weren't able to get your PR in.  I do appreciate your 
work on this!  Looking back, I believe the functionality in this JIRA should be 
a subset of what is in the PR for [SPARK-5992], so I'll go ahead and close this 
JIRA issue.  If you have time, feedback on the current PR for [SPARK-5992] 
would be very valuable.  Thanks very much.

> Implement RandomProjection for Dimensionality Reduction
> ---
>
> Key: SPARK-7334
> URL: https://issues.apache.org/jira/browse/SPARK-7334
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Sebastian Alfers
> Priority: Minor
>
> Implement RandomProjection (RP) for dimensionality reduction
> RP is a popular approach to reduce the amount of data while preserving a
> reasonable amount of information (pairwise distance) of your data [1][2].
> - [1] http://www.yaroslavvb.com/papers/achlioptas-database.pdf
> - [2] http://people.inf.elte.hu/fekete/algoritmusok_msc/dimenzio_csokkentes/randon_projection_kdd.pdf
> I compared different implementations of that algorithm:
> - https://github.com/sebastian-alfers/random-projection-python
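
To make the distance-preservation claim in the description concrete, here is a minimal sketch of a sparse random projection in the style of the Achlioptas paper cited as [1]. It is not the code from the PR or from the linked repository; the function names and the pure-Python list representation are assumptions chosen for illustration. Each matrix entry is +sqrt(3/k) with probability 1/6, -sqrt(3/k) with probability 1/6, and 0 otherwise, so squared Euclidean distances are preserved in expectation.

```python
import math
import random


def achlioptas_matrix(d, k, seed=0):
    """Build a d x k sparse projection matrix (hypothetical helper).

    Entries are +s, -s, or 0 with probabilities 1/6, 1/6, 2/3,
    where s = sqrt(3 / k), so each entry has variance 1 / k.
    """
    rng = random.Random(seed)
    s = math.sqrt(3.0 / k)

    def entry():
        u = rng.random()
        if u < 1.0 / 6.0:
            return s
        if u < 2.0 / 6.0:
            return -s
        return 0.0

    return [[entry() for _ in range(k)] for _ in range(d)]


def project(x, matrix):
    """Map a length-d vector to length k by computing x * matrix."""
    k = len(matrix[0])
    out = [0.0] * k
    for x_i, row in zip(x, matrix):
        if x_i == 0.0:
            continue  # skip zero features
        for j, r in enumerate(row):
            if r != 0.0:  # about two thirds of the entries are zero
                out[j] += x_i * r
    return out


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
```

Projecting two 500-dimensional vectors down to k = 200 with the same matrix and comparing `euclidean` before and after should give a ratio close to 1; how close depends on k, not on the original dimension.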



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7334) Implement RandomProjection for Dimensionality Reduction

2015-11-17 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009543#comment-15009543
 ] 

Joseph K. Bradley commented on SPARK-7334:
--

I'd still like to get random projections and maybe other types of LSH into 
Spark, but it's true that having a package takes some of the pressure off of 
this goal.  We're trying to figure out the roadmap for the next release 
currently, and that should give me a better idea of what can be prioritized.  
Thanks for your patience!




[jira] [Commented] (SPARK-7334) Implement RandomProjection for Dimensionality Reduction

2015-11-09 Thread Sebastian Alfers (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996470#comment-14996470
 ] 

Sebastian Alfers commented on SPARK-7334:
-

Is this still relevant? [~josephkb]

I saw a discussion about LSH here: 
https://issues.apache.org/jira/browse/SPARK-5992




[jira] [Commented] (SPARK-7334) Implement RandomProjection for Dimensionality Reduction

2015-06-30 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608900#comment-14608900
 ] 

Joseph K. Bradley commented on SPARK-7334:
--

Sorry for the slow review; there is a big backlog of PRs we're trying to work 
through.  I hope to look at this later this week.  [~yuu.ishik...@gmail.com] I 
saw you are taking a look at the PR; I appreciate it!




[jira] [Commented] (SPARK-7334) Implement RandomProjection for Dimensionality Reduction

2015-06-30 Thread Sebastian Alfers (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14607994#comment-14607994
 ] 

Sebastian Alfers commented on SPARK-7334:
-

[~josephkb] any progress on this one?




[jira] [Commented] (SPARK-7334) Implement RandomProjection for Dimensionality Reduction

2015-06-17 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590711#comment-14590711
 ] 

Yu Ishikawa commented on SPARK-7334:


[~sebalf], I'm very sorry for my delayed response, and thank you for your PR.
I'm afraid I haven't looked at it yet, but I will do so in the next few days.

In my opinion, if your idea fits the design I have in mind, your PR should be
merged before we create the LSH abstraction. We can then change or deprecate
your implementation at some point, if necessary. Anyway, I will check it and
get back to you by next Monday or Tuesday. If there seems to be no problem,
I'll ask a committer to review your PR.




[jira] [Commented] (SPARK-7334) Implement RandomProjection for Dimensionality Reduction

2015-06-17 Thread Sebastian Alfers (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589388#comment-14589388
 ] 

Sebastian Alfers commented on SPARK-7334:
-

I implemented RP as a transformer to be able to serialize the model and re-use 
it later.
Also, the actual implementation of RP is separated and (theoretically) can be 
used in LSH.

I implemented RP as a "stand alone" method as a replacement / comparison to PCA.
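
As a rough illustration of the "transformer with a serializable model" idea described above, the sketch below separates an estimator from a fitted model that can be written out and reloaded. It is not the API from the PR: the class names `RandomProjection` and `RandomProjectionModel`, the JSON format, and the dense Gaussian matrix are all assumptions made for this example.

```python
import json
import math
import random


class RandomProjectionModel:
    """Hypothetical fitted model holding a d x k projection matrix."""

    def __init__(self, matrix):
        self.matrix = matrix

    def transform(self, x):
        """Project one length-d vector to length k."""
        if len(x) != len(self.matrix):
            raise ValueError("input has the wrong dimension")
        k = len(self.matrix[0])
        return [
            sum(x[i] * self.matrix[i][j] for i in range(len(x)))
            for j in range(k)
        ]

    def to_json(self):
        """Serialize the matrix so the same projection can be reused."""
        return json.dumps({"matrix": self.matrix})

    @classmethod
    def from_json(cls, text):
        """Rebuild a model from the output of to_json."""
        return cls(json.loads(text)["matrix"])


class RandomProjection:
    """Hypothetical estimator: fit() only needs the input dimension."""

    def __init__(self, k, seed=0):
        self.k = k
        self.seed = seed

    def fit(self, rows):
        d = len(rows[0])
        rng = random.Random(self.seed)
        std = 1.0 / math.sqrt(self.k)  # entries drawn from N(0, 1/k)
        matrix = [
            [rng.gauss(0.0, std) for _ in range(self.k)] for _ in range(d)
        ]
        return RandomProjectionModel(matrix)
```

Persisting the matrix itself, rather than only the seed, keeps a reloaded model independent of the random number generator that produced it, at the cost of a larger file.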




[jira] [Commented] (SPARK-7334) Implement RandomProjection for Dimensionality Reduction

2015-06-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588950#comment-14588950
 ] 

Joseph K. Bradley commented on SPARK-7334:
--

I linked this issue to the LSH issue since LSH can use random projections.  
Thinking more about it, random projection should probably be its own feature 
transformer since it's often not thought of as an LSH method.

So backpedaling partway...the main thing to check w.r.t. [SPARK-5992] is 
whether an LSH method based on random projections can use your code.  
[~yuu.ishik...@gmail.com] If there are design choices affecting your LSH plans, 
can you please comment here?

Thanks!

People are catching up still after all of the release QA...but reviews should 
resume in full force before long.
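
For readers wondering how LSH can build on random projections, one well-known scheme is sign random projection: each hash bit records which side of a random hyperplane a vector falls on, so vectors separated by a small angle agree on most bits. The sketch below is only an illustration of that scheme, not the design discussed in [SPARK-5992]; all function names are assumptions.

```python
import random


def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian normal vectors of length dim."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)
    ]


def signature(x, hyperplanes):
    """One bit per hyperplane: 1 if the dot product is >= 0, else 0."""
    return tuple(
        1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) >= 0.0 else 0
        for w in hyperplanes
    )


def hamming(a, b):
    """Number of positions where two signatures differ."""
    return sum(p != q for p, q in zip(a, b))
```

The signature depends only on direction, so scaling a vector by a positive constant leaves it unchanged, while negating the vector flips every bit. The Hamming distance between two signatures therefore acts as an estimate of the angle between the original vectors.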




[jira] [Commented] (SPARK-7334) Implement RandomProjection for Dimensionality Reduction

2015-06-16 Thread Sebastian Alfers (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587691#comment-14587691
 ] 

Sebastian Alfers commented on SPARK-7334:
-

I tried to contact [~yuu.ishik...@gmail.com] but got no reply - how can we 
continue on this? What needs to be done? 

Maybe we can finish my PR and update the API if necessary?




[jira] [Commented] (SPARK-7334) Implement RandomProjection for Dimensionality Reduction

2015-06-05 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575083#comment-14575083
 ] 

Joseph K. Bradley commented on SPARK-7334:
--

I won't be able to look at the PR right away, but I will try to before long.  
(It looks very well-documented and tested!)  In the meantime, can you please 
view [~yuu.ishik...@gmail.com]'s design doc on [SPARK-5992] and discuss a good 
common interface?




[jira] [Commented] (SPARK-7334) Implement RandomProjection for Dimensionality Reduction

2015-06-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570884#comment-14570884
 ] 

Apache Spark commented on SPARK-7334:
-

User 'sebastian-alfers' has created a pull request for this issue:
https://github.com/apache/spark/pull/6613
