[jira] [Created] (SPARK-17785) Find a more robust way to detect the existing of the initialModel

2016-10-05 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-17785: - Summary: Find a more robust way to detect the existing of the initialModel Key: SPARK-17785 URL: https://issues.apache.org/jira/browse/SPARK-17785 Project: Spark

[jira] [Created] (SPARK-17784) Add fromCenters method for KMeans

2016-10-05 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-17784: - Summary: Add fromCenters method for KMeans Key: SPARK-17784 URL: https://issues.apache.org/jira/browse/SPARK-17784 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-16581) Making JVM backend calling functions public

2016-08-23 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434030#comment-15434030 ] Xusen Yin commented on SPARK-16581: --- Sure, no problem. > Making JVM backend calling functions public >

[jira] [Commented] (SPARK-14381) Review spark.ml parity for feature transformers

2016-08-19 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428557#comment-15428557 ] Xusen Yin commented on SPARK-14381: --- I believe we can resolve this. > Review spark.ml parity for

[jira] [Commented] (SPARK-16581) Making JVM backend calling functions public

2016-08-17 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425792#comment-15425792 ] Xusen Yin commented on SPARK-16581: --- I'll find related JIRAs and link them if possible. > Making JVM

[jira] [Commented] (SPARK-16581) Making JVM backend calling functions public

2016-08-17 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425775#comment-15425775 ] Xusen Yin commented on SPARK-16581: --- [~shivaram] [~sunrui] Still working on it? I can help work on this if

[jira] [Commented] (SPARK-16857) CrossValidator and KMeans throws IllegalArgumentException

2016-08-02 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405069#comment-15405069 ] Xusen Yin commented on SPARK-16857: --- I agree the cluster assignments could be arbitrary. Yes, under this

[jira] [Commented] (SPARK-16857) CrossValidator and KMeans throws IllegalArgumentException

2016-08-02 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405050#comment-15405050 ] Xusen Yin commented on SPARK-16857: --- Using CrossValidator with KMeans should be supported. As a kind of

[jira] [Comment Edited] (SPARK-3728) RandomForest: Learn models too large to store in memory

2016-07-17 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381583#comment-15381583 ] Xusen Yin edited comment on SPARK-3728 at 7/17/16 11:46 PM: Not now. Because I

[jira] [Commented] (SPARK-3728) RandomForest: Learn models too large to store in memory

2016-07-17 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381583#comment-15381583 ] Xusen Yin commented on SPARK-3728: -- Not now. Because I thought the BFS style could reach the best

[jira] [Created] (SPARK-16558) examples/mllib/LDAExample should use MLVector instead of MLlib Vector

2016-07-14 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-16558: - Summary: examples/mllib/LDAExample should use MLVector instead of MLlib Vector Key: SPARK-16558 URL: https://issues.apache.org/jira/browse/SPARK-16558 Project: Spark

[jira] [Commented] (SPARK-16447) LDA wrapper in SparkR

2016-07-08 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368149#comment-15368149 ] Xusen Yin commented on SPARK-16447: --- [~mengxr] I'd like to work on this. > LDA wrapper in SparkR >

[jira] [Updated] (SPARK-16372) Retag RDD to tallSkinnyQR of RowMatrix

2016-07-04 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-16372: -- Summary: Retag RDD to tallSkinnyQR of RowMatrix (was: RowMatrix constructor should use retag for Java

[jira] [Created] (SPARK-16372) RowMatrix constructor should use retag for Java compatibility

2016-07-04 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-16372: - Summary: RowMatrix constructor should use retag for Java compatibility Key: SPARK-16372 URL: https://issues.apache.org/jira/browse/SPARK-16372 Project: Spark

[jira] [Commented] (SPARK-16372) RowMatrix constructor should use retag for Java compatibility

2016-07-04 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15361822#comment-15361822 ] Xusen Yin commented on SPARK-16372: --- SPARK-11497 fixed this for PySpark. > RowMatrix constructor

[jira] [Created] (SPARK-16369) tallSkinnyQR of RowMatrix should aware of empty partition

2016-07-04 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-16369: - Summary: tallSkinnyQR of RowMatrix should aware of empty partition Key: SPARK-16369 URL: https://issues.apache.org/jira/browse/SPARK-16369 Project: Spark Issue

[jira] [Commented] (SPARK-16144) Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict

2016-06-27 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351942#comment-15351942 ] Xusen Yin commented on SPARK-16144: --- I'd like to work on this. > Add a separate Rd for ML generic

[jira] [Commented] (SPARK-15574) Python meta-algorithms in Scala

2016-06-15 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332475#comment-15332475 ] Xusen Yin commented on SPARK-15574: --- I just finished the prototype of PythonTransformer in Scala as the

[jira] [Commented] (SPARK-11106) Should ML Models contains single models or Pipelines?

2016-06-07 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319685#comment-15319685 ] Xusen Yin commented on SPARK-11106: --- RFormula is easy to use, but it may not always do right things.

[jira] [Commented] (SPARK-15574) Python meta-algorithms in Scala

2016-06-06 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317503#comment-15317503 ] Xusen Yin commented on SPARK-15574: --- [~josephkb] Can I work on this one? > Python meta-algorithms in

[jira] [Commented] (SPARK-14381) Review spark.ml parity for feature transformers

2016-06-06 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317459#comment-15317459 ] Xusen Yin commented on SPARK-14381: --- Comparing mllib.feature with ml.feature, there are only two APIs

[jira] [Created] (SPARK-15793) Word2vec in ML package should have maxSentenceLength method

2016-06-06 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-15793: - Summary: Word2vec in ML package should have maxSentenceLength method Key: SPARK-15793 URL: https://issues.apache.org/jira/browse/SPARK-15793 Project: Spark Issue

[jira] [Commented] (SPARK-14381) Review spark.ml parity for feature transformers

2016-06-03 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315060#comment-15315060 ] Xusen Yin commented on SPARK-14381: --- I can work on this one. > Review spark.ml parity for feature

[jira] [Commented] (SPARK-3728) RandomForest: Learn models too large to store in memory

2016-06-03 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314795#comment-15314795 ] Xusen Yin commented on SPARK-3728: -- Hi [~josephkb], as I [surveyed on

[jira] [Comment Edited] (SPARK-13868) Random forest accuracy exploration

2016-06-02 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313400#comment-15313400 ] Xusen Yin edited comment on SPARK-13868 at 6/3/16 12:40 AM: [~josephkb]

[jira] [Commented] (SPARK-13868) Random forest accuracy exploration

2016-06-02 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313400#comment-15313400 ] Xusen Yin commented on SPARK-13868: --- [~josephkb] [~tanwanirahul] Here is what I found: 1. Dataset

[jira] [Updated] (SPARK-14973) The CrossValidator and TrainValidationSplit miss the seed when saving and loading

2016-05-02 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-14973: -- Description: The CrossValidator and TrainValidationSplit miss the seed when saving and loading. Need

[jira] [Resolved] (SPARK-14302) Python examples code merge and clean up

2016-05-01 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin resolved SPARK-14302. --- Resolution: Won't Fix > Python examples code merge and clean up >

[jira] [Commented] (SPARK-14302) Python examples code merge and clean up

2016-05-01 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266093#comment-15266093 ] Xusen Yin commented on SPARK-14302: --- I'll close it, anything else I'll let you know. Thanks! > Python

[jira] [Commented] (SPARK-14302) Python examples code merge and clean up

2016-05-01 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266006#comment-15266006 ] Xusen Yin commented on SPARK-14302: --- [~kanjilal] Thanks for working on this. However, I checked the

[jira] [Commented] (SPARK-14302) Python examples code merge and clean up

2016-04-28 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262828#comment-15262828 ] Xusen Yin commented on SPARK-14302: --- Thanks! And sorry for the late response, I forgot it. > Python

[jira] [Commented] (SPARK-14302) Python examples code merge and clean up

2016-04-28 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262810#comment-15262810 ] Xusen Yin commented on SPARK-14302: --- We should leave them unmerged e.g. ml.bisecting_k_means_example

[jira] [Commented] (SPARK-14302) Python examples code merge and clean up

2016-04-28 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262794#comment-15262794 ] Xusen Yin commented on SPARK-14302: --- Hi Saikat, any updates? > Python examples code merge and clean up

[jira] [Commented] (SPARK-14973) The CrossValidator and TrainValidationSplit miss the seed when saving and loading

2016-04-28 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261608#comment-15261608 ] Xusen Yin commented on SPARK-14973: --- Will fix it with SPARK-14706 > The CrossValidator and

[jira] [Created] (SPARK-14973) The CrossValidator and TrainValidationSplit miss the seed when saving and loading

2016-04-28 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-14973: - Summary: The CrossValidator and TrainValidationSplit miss the seed when saving and loading Key: SPARK-14973 URL: https://issues.apache.org/jira/browse/SPARK-14973 Project:

[jira] [Created] (SPARK-14931) Mismatched default values between pipelines in Spark and PySpark

2016-04-26 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-14931: - Summary: Mismatched default values between pipelines in Spark and PySpark Key: SPARK-14931 URL: https://issues.apache.org/jira/browse/SPARK-14931 Project: Spark

[jira] [Created] (SPARK-14924) OneVsRest with classifier in estimatorParamMaps of tuning fail to persistence

2016-04-26 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-14924: - Summary: OneVsRest with classifier in estimatorParamMaps of tuning fail to persistence Key: SPARK-14924 URL: https://issues.apache.org/jira/browse/SPARK-14924 Project:

[jira] [Commented] (SPARK-11337) Make example code in user guide testable

2016-04-25 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256751#comment-15256751 ] Xusen Yin commented on SPARK-11337: --- [~mengxr] We can close this now. > Make example code in user

[jira] [Closed] (SPARK-11399) Include_example should support labels to cut out different parts in one example code

2016-04-25 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin closed SPARK-11399. - Resolution: Won't Fix > Include_example should support labels to cut out different parts in one >

[jira] [Commented] (SPARK-14706) Python ML persistence integration test

2016-04-21 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252435#comment-15252435 ] Xusen Yin commented on SPARK-14706: --- Sure. I'll take care of it. There are more issues with

[jira] [Commented] (SPARK-14706) Python ML persistence integration test

2016-04-18 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246950#comment-15246950 ] Xusen Yin commented on SPARK-14706: --- I am starting to write it. > Python ML persistence integration test

[jira] [Updated] (SPARK-14440) Remove PySpark ml.pipeline's specific Reader and Writer

2016-04-14 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-14440: -- Description: Since the PipelineMLWriter/PipelineMLReader/PipelineModelMLWriter/PipelineModelMLReader

[jira] [Updated] (SPARK-14440) Remove PySpark ml.pipeline's specific Reader and Writer

2016-04-14 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-14440: -- Description: Remove * PipelineMLWriter * PipelineMLReader * PipelineModelMLWriter *

[jira] [Commented] (SPARK-14440) Remove PySpark ml.pipeline's specific Reader and Writer

2016-04-14 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242147#comment-15242147 ] Xusen Yin commented on SPARK-14440: --- Sorry for the late response, I'll update it soon. > Remove

[jira] [Commented] (SPARK-14306) PySpark ml.classification OneVsRest support export/import

2016-04-13 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239678#comment-15239678 ] Xusen Yin commented on SPARK-14306: --- Yes, but blocked by this

[jira] [Created] (SPARK-14440) Remove PySpark ml.pipeline's specific Reader and Writer

2016-04-06 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-14440: - Summary: Remove PySpark ml.pipeline's specific Reader and Writer Key: SPARK-14440 URL: https://issues.apache.org/jira/browse/SPARK-14440 Project: Spark Issue

[jira] [Commented] (SPARK-14301) Java examples code merge and clean up

2016-04-06 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228693#comment-15228693 ] Xusen Yin commented on SPARK-14301: --- Thanks, we'll make sure of that. :) > Java examples code merge and

[jira] [Updated] (SPARK-14299) Scala ML examples code merge and clean up

2016-04-01 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-14299: -- Description: Duplicated code that I found in scala/examples/ml: * scala/ml **

[jira] [Commented] (SPARK-14306) PySpark ml.classification OneVsRest support export/import

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220519#comment-15220519 ] Xusen Yin commented on SPARK-14306: --- Starting work on it now. > PySpark ml.classification OneVsRest

[jira] [Commented] (SPARK-14302) Python examples code merge and clean up

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220337#comment-15220337 ] Xusen Yin commented on SPARK-14302: --- This JIRA only focuses on Python examples. I.e.

[jira] [Commented] (SPARK-14302) Python examples code merge and clean up

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220324#comment-15220324 ] Xusen Yin commented on SPARK-14302: --- And this JIRA is to delete or merge some example codes, not to

[jira] [Commented] (SPARK-14302) Python examples code merge and clean up

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220321#comment-15220321 ] Xusen Yin commented on SPARK-14302: --- Java code is in this JIRA:

[jira] [Commented] (SPARK-14302) Python examples code merge and clean up

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220304#comment-15220304 ] Xusen Yin commented on SPARK-14302: --- Sure, thanks > Python examples code merge and clean up >

[jira] [Commented] (SPARK-14301) Java examples code merge and clean up

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220242#comment-15220242 ] Xusen Yin commented on SPARK-14301: --- Go ahead. Thanks! > Java examples code merge and clean up >

[jira] [Closed] (SPARK-13462) Vector serialization error in example code of ModelSelectionViaTrainValidationSplitExample and JavaModelSelectionViaTrainValidationSplitExample

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin closed SPARK-13462. - Resolution: Won't Fix > Vector serialization error in example code of >

[jira] [Commented] (SPARK-13462) Vector serialization error in example code of ModelSelectionViaTrainValidationSplitExample and JavaModelSelectionViaTrainValidationSplitExample

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220239#comment-15220239 ] Xusen Yin commented on SPARK-13462: --- Well, this is a false alarm. They can run with current github

[jira] [Commented] (SPARK-14300) Scala MLlib examples code merge and clean up

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220182#comment-15220182 ] Xusen Yin commented on SPARK-14300: --- Thanks! Be sure to check every code example. > Scala MLlib

[jira] [Updated] (SPARK-14299) Scala ML examples code merge and clean up

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-14299: -- Description: Duplicated code that I found in scala/examples/ml: * scala/ml **

[jira] [Commented] (SPARK-14041) Locate possible duplicates and group them into subtasks

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220083#comment-15220083 ] Xusen Yin commented on SPARK-14041: --- I've split them into 4 JIRAs. > Locate possible duplicates and

[jira] [Updated] (SPARK-14302) Python examples code merge and clean up

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-14302: -- Description: Duplicated code that I found in python/examples/mllib and python/examples/ml: *

[jira] [Updated] (SPARK-14301) Java examples code merge and clean up

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-14301: -- Description: Duplicated code that I found in java/examples/mllib and java/examples/ml: * java/ml **

[jira] [Updated] (SPARK-14300) Scala MLlib examples code merge and clean up

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-14300: -- Description: Duplicated code that I found in scala/examples/mllib: * scala/mllib **

[jira] [Created] (SPARK-14299) Scala examples code merge and clean up

2016-03-31 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-14299: - Summary: Scala examples code merge and clean up Key: SPARK-14299 URL: https://issues.apache.org/jira/browse/SPARK-14299 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-14300) Scala MLlib examples code merge and clean up

2016-03-31 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-14300: - Summary: Scala MLlib examples code merge and clean up Key: SPARK-14300 URL: https://issues.apache.org/jira/browse/SPARK-14300 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-14041) Locate possible duplicates and group them into subtasks

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-14041: -- Description: To find out all examples of ml/mllib that don't contain "example on": {code}grep -L

[jira] [Created] (SPARK-14302) Python examples code merge and clean up

2016-03-31 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-14302: - Summary: Python examples code merge and clean up Key: SPARK-14302 URL: https://issues.apache.org/jira/browse/SPARK-14302 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-14301) Java examples code merge and clean up

2016-03-31 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-14301: - Summary: Java examples code merge and clean up Key: SPARK-14301 URL: https://issues.apache.org/jira/browse/SPARK-14301 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-14299) Scala ML examples code merge and clean up

2016-03-31 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-14299: -- Summary: Scala ML examples code merge and clean up (was: Scala examples code merge and clean up) >

[jira] [Created] (SPARK-14181) TrainValidationSplit should have HasSeed

2016-03-27 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-14181: - Summary: TrainValidationSplit should have HasSeed Key: SPARK-14181 URL: https://issues.apache.org/jira/browse/SPARK-14181 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-13786) Pyspark ml.tuning support export/import

2016-03-27 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213357#comment-15213357 ] Xusen Yin commented on SPARK-13786: --- I have finished the CrossValidator, but need to wait until

[jira] [Commented] (SPARK-13786) Pyspark ml.tuning support export/import

2016-03-25 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212364#comment-15212364 ] Xusen Yin commented on SPARK-13786: --- I'll work on it. > Pyspark ml.tuning support export/import >

[jira] [Updated] (SPARK-14041) Locate possible duplicates and group them into subtasks

2016-03-25 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-14041: -- Description: To find out all examples of ml/mllib that don't contain "example on": {code}grep -L

[jira] [Updated] (SPARK-14041) Locate possible duplicates and group them into subtasks

2016-03-23 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-14041: -- Description: Please go through the current example code and list possible duplicates. To find out all

[jira] [Commented] (SPARK-14041) Locate possible duplicates and group them into subtasks

2016-03-22 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207417#comment-15207417 ] Xusen Yin commented on SPARK-14041: --- [~mengxr] Maybe no need to divide them into several JIRAs, since

[jira] [Updated] (SPARK-14041) Locate possible duplicates and group them into subtasks

2016-03-22 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-14041: -- Description: Please go through the current example code and list possible duplicates. Duplicates need

[jira] [Commented] (SPARK-13461) Duplicated example code merge and cleanup

2016-03-20 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203569#comment-15203569 ] Xusen Yin commented on SPARK-13461: --- I deleted it. It's from another JIRA. > Duplicated example code

[jira] [Updated] (SPARK-13461) Duplicated example code merge and cleanup

2016-03-20 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-13461: -- Description: Merge duplicated code after we finish the example code substitution. Duplications

[jira] [Commented] (SPARK-13461) Duplicated example code merge and cleanup

2016-03-19 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203076#comment-15203076 ] Xusen Yin commented on SPARK-13461: --- Yes we'll delete it. > Duplicated example code merge and cleanup

[jira] [Created] (SPARK-13993) PySpark ml.feature.RFormula/RFormulaModel support export/import

2016-03-19 Thread Xusen Yin (JIRA)
Xusen Yin created SPARK-13993: - Summary: PySpark ml.feature.RFormula/RFormulaModel support export/import Key: SPARK-13993 URL: https://issues.apache.org/jira/browse/SPARK-13993 Project: Spark

[jira] [Commented] (SPARK-13951) PySpark ml.pipeline support export/import - nested Piplines

2016-03-18 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198642#comment-15198642 ] Xusen Yin commented on SPARK-13951: --- I'm starting work on it now. > PySpark ml.pipeline support

[jira] [Commented] (SPARK-13641) getModelFeatures of ml.api.r.SparkRWrapper cannot (always) reveal the original column names

2016-03-15 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196765#comment-15196765 ] Xusen Yin commented on SPARK-13641: --- [~muralidh] I'm going to close this JIRA since I found that it is

[jira] [Comment Edited] (SPARK-13641) getModelFeatures of ml.api.r.SparkRWrapper cannot (always) reveal the original column names

2016-03-15 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196765#comment-15196765 ] Xusen Yin edited comment on SPARK-13641 at 3/16/16 5:00 AM: I'm going to close

[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator

2016-03-14 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193991#comment-15193991 ] Xusen Yin commented on SPARK-11136: --- I agree. Will add it in the new commit. Thanks! > Warm-start

[jira] [Commented] (SPARK-13868) Random forest accuracy exploration

2016-03-14 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193899#comment-15193899 ] Xusen Yin commented on SPARK-13868: --- I'd love to explore this. > Random forest accuracy exploration >

[jira] [Commented] (SPARK-11136) Warm-start support for ML estimator

2016-03-14 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193886#comment-15193886 ] Xusen Yin commented on SPARK-11136: --- This is a good point. Actually in our settings now, the new KMeans

[jira] [Updated] (SPARK-13461) Duplicated example code merge and cleanup

2016-03-07 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-13461: -- Description: Merge duplicated code after we finish the example code substitution. Duplications

[jira] [Commented] (SPARK-13641) getModelFeatures of ml.api.r.SparkRWrapper cannot (always) reveal the original column names

2016-03-05 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181872#comment-15181872 ] Xusen Yin commented on SPARK-13641: --- You can check out code from
