[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)

2015-10-23 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970870#comment-14970870 ] Nick Pentreath commented on SPARK-7008: --- Is this now going in 1.6 (as per SPARK-10324)? If so is

[jira] [Resolved] (SPARK-12296) Feature parity for pyspark.mllib StandardScalerModel

2015-12-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-12296. Resolution: Fixed Fix Version/s: 2.0.0 > Feature parity for pyspark.mllib

[jira] [Commented] (SPARK-12296) Feature parity for pyspark.mllib StandardScalerModel

2015-12-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067725#comment-15067725 ] Nick Pentreath commented on SPARK-12296: Issue resolved by pull request 10298
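
Background on the model in SPARK-12296: a StandardScalerModel holds the per-feature mean and standard deviation fitted on the training vectors and uses them to rescale new vectors. The plain-Python sketch below illustrates that computation. It is not PySpark code, and `fit_standard_scaler` and `transform` are hypothetical names. It assumes the sample standard deviation (n - 1 denominator) and, also as an assumption, maps zero-variance features to 0.0. The `with_mean`/`with_std` flags are meant to mirror the `withMean`/`withStd` options of Spark's StandardScaler.

```python
import math
from typing import List, Tuple


def fit_standard_scaler(rows: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return per-feature (mean, sample std) for a list of equal-length rows."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two rows for a sample standard deviation")
    dims = len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dims)]
    std = [
        math.sqrt(sum((r[j] - mean[j]) ** 2 for r in rows) / (n - 1))
        for j in range(dims)
    ]
    return mean, std


def transform(row: List[float], mean: List[float], std: List[float],
              with_mean: bool = False, with_std: bool = True) -> List[float]:
    """Optionally center each feature, then optionally divide by its std."""
    out = []
    for x, m, s in zip(row, mean, std):
        v = x - m if with_mean else x
        if with_std:
            # Assumption: a zero-variance feature maps to 0.0 instead of dividing by zero.
            v = v / s if s != 0.0 else 0.0
        out.append(v)
    return out
```

For rows `[[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]` the fitted mean is `[3.0, 10.0]` and the std is `[2.0, 0.0]`, so centering and scaling `[5.0, 10.0]` gives `[1.0, 0.0]`.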

[jira] [Updated] (SPARK-11922) Python API for ml.feature.QuantileDiscretizer

2015-12-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-11922: --- Assignee: holdenk > Python API for ml.feature.QuantileDiscretizer >
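
Background on SPARK-11922: QuantileDiscretizer turns a continuous column into a fixed number of bins whose boundaries are taken from quantiles of the data. The sketch below is a plain-Python illustration of that idea only. It sorts the data and picks exact order statistics, whereas the Spark transformer is not guaranteed to compute exact quantiles. `quantile_splits` and `bucketize` are hypothetical names, and the nearest-rank rule for choosing split points is an assumption.

```python
import bisect
from typing import List


def quantile_splits(values: List[float], num_buckets: int) -> List[float]:
    """Interior split points at the k/num_buckets quantiles (nearest rank on sorted data)."""
    if num_buckets < 2:
        raise ValueError("num_buckets must be at least 2")
    ordered = sorted(values)
    n = len(ordered)
    splits = []
    for k in range(1, num_buckets):
        idx = min(n - 1, (k * n) // num_buckets)
        splits.append(ordered[idx])
    # Collapse duplicate split points (heavily repeated values) so no bucket has zero width.
    return sorted(set(splits))


def bucketize(x: float, splits: List[float]) -> int:
    """Bucket index: values below splits[0] go to 0, values >= splits[-1] go to len(splits)."""
    return bisect.bisect_right(splits, x)
```

With the values 1.0 through 8.0 and four buckets, the splits are `[3.0, 5.0, 7.0]` and each bucket receives two values.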

[jira] [Updated] (SPARK-12182) Distributed binning for trees in spark.ml

2015-12-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12182: --- Assignee: Seth Hendrickson > Distributed binning for trees in spark.ml >

[jira] [Updated] (SPARK-12296) Feature parity for pyspark.mllib StandardScalerModel

2015-12-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12296: --- Assignee: holdenk > Feature parity for pyspark.mllib StandardScalerModel >

[jira] [Resolved] (SPARK-15668) ml.feature: update check schema to avoid confusion when user use MLlib.vector as input type

2016-06-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-15668. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13411

[jira] [Comment Edited] (SPARK-14811) ML, Graph 2.0 QA: API: New Scala APIs, docs

2016-06-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313208#comment-15313208 ] Nick Pentreath edited comment on SPARK-14811 at 6/2/16 10:31 PM: -

[jira] [Commented] (SPARK-14811) ML, Graph 2.0 QA: API: New Scala APIs, docs

2016-06-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313208#comment-15313208 ] Nick Pentreath commented on SPARK-14811: Question on this - we seem to be inconsistent with the

[jira] [Resolved] (SPARK-15139) PySpark TreeEnsemble missing methods

2016-06-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-15139. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12919

[jira] [Updated] (SPARK-15139) PySpark TreeEnsemble missing methods

2016-06-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-15139: --- Assignee: holdenk > PySpark TreeEnsemble missing methods >

[jira] [Created] (SPARK-15746) SchemaUtils.checkColumnType with VectorUDT prints instance details

2016-06-02 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-15746: -- Summary: SchemaUtils.checkColumnType with VectorUDT prints instance details Key: SPARK-15746 URL: https://issues.apache.org/jira/browse/SPARK-15746 Project:

[jira] [Updated] (SPARK-15746) SchemaUtils.checkColumnType with VectorUDT prints instance details in error message

2016-06-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-15746: --- Summary: SchemaUtils.checkColumnType with VectorUDT prints instance details in error message

[jira] [Resolved] (SPARK-15092) toDebugString missing from ML DecisionTreeClassifier

2016-06-02 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-15092. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12919

[jira] [Commented] (SPARK-15746) SchemaUtils.checkColumnType with VectorUDT prints instance details in error message

2016-06-03 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314627#comment-15314627 ] Nick Pentreath commented on SPARK-15746: I'd say hold off on working on it until we decide which

[jira] [Comment Edited] (SPARK-15447) Performance test for ALS in Spark 2.0

2016-06-03 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314441#comment-15314441 ] Nick Pentreath edited comment on SPARK-15447 at 6/3/16 5:22 PM: Added a

[jira] [Commented] (SPARK-14811) ML, Graph 2.0 QA: API: New Scala APIs, docs

2016-06-03 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314489#comment-15314489 ] Nick Pentreath commented on SPARK-14811: Yes, that does make sense. I will take a pass through

[jira] [Commented] (SPARK-15447) Performance test for ALS in Spark 2.0

2016-06-03 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314441#comment-15314441 ] Nick Pentreath commented on SPARK-15447: Added a second tab to the sheet for testing DF-based API

[jira] [Created] (SPARK-15788) PySpark IDFModel missing "idf" property

2016-06-06 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-15788: -- Summary: PySpark IDFModel missing "idf" property Key: SPARK-15788 URL: https://issues.apache.org/jira/browse/SPARK-15788 Project: Spark Issue Type:
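
Background on SPARK-15788: the `idf` property refers to the vector of inverse document frequencies that an IDFModel learns, one weight per feature index. The Spark ML documentation describes the smoothed formula IDF(t, D) = log((|D| + 1) / (DF(t, D) + 1)), where |D| is the number of documents and DF(t, D) is the number of documents containing term t. The plain-Python sketch below computes that vector from documents given as lists of term indices. `compute_idf` is a hypothetical name, and `min_doc_freq` is meant to mirror Spark's `minDocFreq` option (terms in fewer documents get weight 0.0). Treat the details as assumptions rather than Spark's exact code.

```python
import math
from typing import List, Sequence


def compute_idf(docs: Sequence[Sequence[int]], num_features: int,
                min_doc_freq: int = 0) -> List[float]:
    """Smoothed IDF per feature index: log((m + 1) / (df + 1)), m = number of documents."""
    m = len(docs)
    df = [0] * num_features
    for doc in docs:
        for j in set(doc):  # count each term at most once per document
            df[j] += 1
    return [
        math.log((m + 1.0) / (d + 1.0)) if d >= min_doc_freq else 0.0
        for d in df
    ]
```

A term that appears in every document gets weight log(1) = 0.0, so it contributes nothing after TF-IDF weighting.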

[jira] [Created] (SPARK-15790) Audit @Since annotations in ML

2016-06-06 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-15790: -- Summary: Audit @Since annotations in ML Key: SPARK-15790 URL: https://issues.apache.org/jira/browse/SPARK-15790 Project: Spark Issue Type: Documentation

[jira] [Resolved] (SPARK-15788) PySpark IDFModel missing "idf" property

2016-06-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-15788. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13540

[jira] [Updated] (SPARK-15168) Add missing params to Python's MultilayerPerceptronClassifier

2016-06-03 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-15168: --- Assignee: holdenk > Add missing params to Python's MultilayerPerceptronClassifier >

[jira] [Resolved] (SPARK-15168) Add missing params to Python's MultilayerPerceptronClassifier

2016-06-03 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-15168. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12943

[jira] [Updated] (SPARK-15761) pyspark shell should load if PYSPARK_DRIVER_PYTHON is ipython an Python3

2016-06-03 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-15761: --- Assignee: Manoj Kumar > pyspark shell should load if PYSPARK_DRIVER_PYTHON is ipython an

[jira] [Resolved] (SPARK-15500) Remove defaults in storage level param doc in ALS

2016-05-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-15500. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13277

[jira] [Commented] (SPARK-15790) Audit @Since annotations in ML

2016-06-13 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327028#comment-15327028 ] Nick Pentreath commented on SPARK-15790: Ah thanks - missed that umbrella. It's actually really

[jira] [Commented] (SPARK-15746) SchemaUtils.checkColumnType with VectorUDT prints instance details in error message

2016-06-13 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327237#comment-15327237 ] Nick Pentreath commented on SPARK-15746: I think you can go ahead now - I also vote for the

[jira] [Commented] (SPARK-15790) Audit @Since annotations in ML

2016-06-13 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327193#comment-15327193 ] Nick Pentreath commented on SPARK-15790: Yes, I've just looked at things in the concrete classes

[jira] [Commented] (SPARK-15904) High Memory Pressure using MLlib K-means

2016-06-13 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327220#comment-15327220 ] Nick Pentreath commented on SPARK-15904: Could you explain why you're using K>3000 when your

[jira] [Commented] (SPARK-15447) Performance test for ALS in Spark 2.0

2016-05-31 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308797#comment-15308797 ] Nick Pentreath commented on SPARK-15447: Created a Google sheet with initial results:

[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?

2016-05-27 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304546#comment-15304546 ] Nick Pentreath commented on SPARK-15575: What specifically are the "performance issues" with

[jira] [Resolved] (SPARK-15492) Binarization scala example copy & paste to spark-shell error

2016-05-26 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-15492. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13266

[jira] [Updated] (SPARK-15587) ML 2.0 QA: Scala APIs audit for feature

2016-06-01 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-15587: --- Assignee: Yanbo Liang > ML 2.0 QA: Scala APIs audit for feature >

[jira] [Resolved] (SPARK-15587) ML 2.0 QA: Scala APIs audit for feature

2016-06-01 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-15587. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 13410

[jira] [Updated] (SPARK-15162) Update PySpark LogisticRegression threshold PyDoc to be as complete as Scaladoc

2016-06-01 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-15162: --- Assignee: holdenk > Update PySpark LogisticRegression threshold PyDoc to be as complete as

[jira] [Comment Edited] (SPARK-14810) ML, Graph 2.0 QA: API: Binary incompatible changes

2016-06-01 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293316#comment-15293316 ] Nick Pentreath edited comment on SPARK-14810 at 6/1/16 5:56 PM: List of

[jira] [Updated] (SPARK-15668) ml.feature: update check schema to avoid confusion when user use MLlib.vector as input type

2016-06-01 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-15668: --- Assignee: yuhao yang > ml.feature: update check schema to avoid confusion when user use

[jira] [Updated] (SPARK-15164) Mark classification algorithms as experimental where marked so in scala

2016-06-01 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-15164: --- Assignee: holdenk > Mark classification algorithms as experimental where marked so in scala

[jira] [Updated] (SPARK-16063) Add storageLevel to Dataset

2016-06-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-16063: --- Description: SPARK-11905 added {{cache}}/{{persist}} to {{Dataset}}. We should add

[jira] [Updated] (SPARK-16063) Add storageLevel to Dataset

2016-06-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-16063: --- Summary: Add storageLevel to Dataset (was: Add getStorageLevel to Dataset) > Add

[jira] [Commented] (SPARK-16075) Make VectorUDT/MatrixUDT singleton under spark.ml package

2016-06-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341506#comment-15341506 ] Nick Pentreath commented on SPARK-16075: [~wangmiao1981] SPARK-15746 will probably be superseded

[jira] [Updated] (SPARK-16127) Audit @Since annotations related to ml.linalg

2016-06-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-16127: --- Description: SPARK-14615 converted {{spark.ml}} to use the new {{Vector}}/{{Matrix}} classes

[jira] [Created] (SPARK-16127) Audit @Since annotations related to ml.linalg

2016-06-22 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-16127: -- Summary: Audit @Since annotations related to ml.linalg Key: SPARK-16127 URL: https://issues.apache.org/jira/browse/SPARK-16127 Project: Spark Issue

[jira] [Assigned] (SPARK-16127) Audit @Since annotations related to ml.linalg

2016-06-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-16127: -- Assignee: Nick Pentreath > Audit @Since annotations related to ml.linalg >

[jira] [Comment Edited] (SPARK-14810) ML, Graph 2.0 QA: API: Binary incompatible changes

2016-06-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293316#comment-15293316 ] Nick Pentreath edited comment on SPARK-14810 at 6/22/16 9:17 AM: - List of

[jira] [Resolved] (SPARK-15164) Mark classification algorithms as experimental where marked so in scala

2016-06-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-15164. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12938

[jira] [Resolved] (SPARK-15162) Update PySpark LogisticRegression threshold PyDoc to be as complete as Scaladoc

2016-06-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-15162. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12938

[jira] [Commented] (SPARK-15447) Performance test for ALS in Spark 2.0

2016-06-16 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1530#comment-1530 ] Nick Pentreath commented on SPARK-15447: Almost there - I'll be able to close this off by Friday

[jira] [Updated] (SPARK-15997) Audit ml.feature Update documentation for ml feature transformers

2016-06-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-15997: --- Assignee: Gayathri Murali > Audit ml.feature Update documentation for ml feature

[jira] [Updated] (SPARK-15447) Performance test for ALS in Spark 2.0

2016-06-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-15447: --- Description: We made several changes to ALS in 2.0. It is necessary to run some tests to

[jira] [Commented] (SPARK-15501) ML 2.0 QA: Scala APIs audit for recommendation

2016-06-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335961#comment-15335961 ] Nick Pentreath commented on SPARK-15501: It's done - resolved it. > ML 2.0 QA: Scala APIs audit

[jira] [Resolved] (SPARK-15447) Performance test for ALS in Spark 2.0

2016-06-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-15447. Resolution: Fixed Fix Version/s: 2.0.0 > Performance test for ALS in Spark 2.0 >

[jira] [Commented] (SPARK-15447) Performance test for ALS in Spark 2.0

2016-06-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335956#comment-15335956 ] Nick Pentreath commented on SPARK-15447: Finalized results in the linked Google sheet. Also

[jira] [Resolved] (SPARK-15501) ML 2.0 QA: Scala APIs audit for recommendation

2016-06-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-15501. Resolution: Fixed Fix Version/s: 2.0.0 > ML 2.0 QA: Scala APIs audit for

[jira] [Updated] (SPARK-16008) ML Logistic Regression aggregator serializes unnecessary data

2016-06-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-16008: --- Assignee: Seth Hendrickson > ML Logistic Regression aggregator serializes unnecessary data >

[jira] [Commented] (SPARK-15995) Gradient Boosted Trees - handling of Categorical Inputs

2016-06-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335801#comment-15335801 ] Nick Pentreath commented on SPARK-15995: cc [~sethah] > Gradient Boosted Trees - handling of

[jira] [Assigned] (SPARK-10258) Add @Since annotation to ml.feature

2016-06-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath reassigned SPARK-10258: -- Assignee: Nick Pentreath (was: Martin Brown) > Add @Since annotation to ml.feature >

[jira] [Updated] (SPARK-10258) Add @Since annotation to ml.feature

2016-06-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-10258: --- Assignee: Martin Brown (was: Nick Pentreath) > Add @Since annotation to ml.feature >

[jira] [Created] (SPARK-16063) Add getStorageLevel to Dataset

2016-06-20 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-16063: -- Summary: Add getStorageLevel to Dataset Key: SPARK-16063 URL: https://issues.apache.org/jira/browse/SPARK-16063 Project: Spark Issue Type: Improvement

[jira] [Resolved] (SPARK-15997) Audit ml.feature Update documentation for ml feature transformers

2016-06-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-15997. Resolution: Fixed Fix Version/s: 2.0.1 Issue resolved by pull request 13745

[jira] [Commented] (SPARK-16149) API consistency discussion: CountVectorizer.{minDF -> minDocFreq, minTF -> minTermFreq}

2016-06-24 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348173#comment-15348173 ] Nick Pentreath commented on SPARK-16149: I'd generally vote for: * if it's a new param / model,

[jira] [Commented] (SPARK-13289) Word2Vec generate infinite distances when numIterations>5

2016-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156785#comment-15156785 ] Nick Pentreath commented on SPARK-13289: [~daiqi5477] could you try your experiments again

[jira] [Commented] (SPARK-13026) Umbrella: Allow user to specify initial model when training

2016-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156803#comment-15156803 ] Nick Pentreath commented on SPARK-13026: [~holdenk] is this JIRA necessary, as it duplicates

[jira] [Resolved] (SPARK-12632) Make Parameter Descriptions Consistent for PySpark MLlib FPM and Recommendation

2016-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-12632. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11186

[jira] [Updated] (SPARK-13334) ML KMeansModel/BisectingKMeansModel should be set parent

2016-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13334: --- Assignee: Yanbo Liang > ML KMeansModel/BisectingKMeansModel should be set parent >

[jira] [Resolved] (SPARK-13334) ML KMeansModel/BisectingKMeansModel should be set parent

2016-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-13334. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11214

[jira] [Updated] (SPARK-12379) Copy GBT implementation to spark.ml

2016-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12379: --- Assignee: Seth Hendrickson > Copy GBT implementation to spark.ml >

[jira] [Commented] (SPARK-13505) Python API for MaxAbsScaler

2016-02-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168529#comment-15168529 ] Nick Pentreath commented on SPARK-13505: [~holdenk] [~bryanc] [~sethah] any interest in adding
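
Background on SPARK-13505: MaxAbsScaler rescales each feature by dividing it by that feature's maximum absolute value, so outputs fall in [-1, 1]. Because it does not shift the data, zero entries stay zero and sparsity is preserved. The plain-Python sketch below illustrates the transformation. It is not the PySpark API; `fit_max_abs` and `max_abs_transform` are hypothetical names, and leaving an all-zero feature unchanged is an assumption.

```python
from typing import List


def fit_max_abs(rows: List[List[float]]) -> List[float]:
    """Per-feature maximum absolute value over all rows."""
    dims = len(rows[0])
    return [max(abs(r[j]) for r in rows) for j in range(dims)]


def max_abs_transform(row: List[float], max_abs: List[float]) -> List[float]:
    # Assumption: an all-zero feature (max_abs == 0) is left unchanged.
    return [x / m if m != 0.0 else x for x, m in zip(row, max_abs)]
```

For rows `[[-4.0, 0.0, 1.0], [2.0, 0.0, -0.5]]` the fitted maxima are `[4.0, 0.0, 1.0]`, and the first row becomes `[-1.0, 0.0, 1.0]`.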

[jira] [Commented] (SPARK-13289) Word2Vec generate infinite distances when numIterations>5

2016-02-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168619#comment-15168619 ] Nick Pentreath commented on SPARK-13289: Master branch should be building now. Can you try again?

[jira] [Commented] (SPARK-13489) GSoC 2016 project ideas for MLlib

2016-02-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167120#comment-15167120 ] Nick Pentreath commented on SPARK-13489: Do we want to focus on work within core, or also

[jira] [Updated] (SPARK-13340) [ML] PolynomialExpansion and Normalizer should validate input type

2016-02-25 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13340: --- Assignee: Grzegorz Chilkiewicz > [ML] PolynomialExpansion and Normalizer should validate

[jira] [Resolved] (SPARK-12348) PySpark _inferSchema crashes with incorrect exception on an empty RDD

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-12348. Resolution: Not A Bug > PySpark _inferSchema crashes with incorrect exception on an empty

[jira] [Updated] (SPARK-13568) Create feature transformer to impute missing values

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13568: --- Description: It is quite common to encounter missing values in data sets. It would be useful
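
Background on SPARK-13568: a transformer that imputes missing values typically replaces each missing entry in a column with a statistic computed from the observed entries, such as the mean or median. The plain-Python sketch below shows that idea for a single column. It is not the transformer proposed in the issue; `impute_column` and the `strategy` values are hypothetical, and treating NaN as the missing-value marker is an assumption.

```python
import math
import statistics
from typing import List


def impute_column(values: List[float], strategy: str = "mean") -> List[float]:
    """Replace NaN entries with the mean or median of the non-missing entries."""
    observed = [v for v in values if not math.isnan(v)]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    if strategy == "mean":
        fill = sum(observed) / len(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if math.isnan(v) else v for v in values]
```

For `[1.0, nan, 3.0, 8.0]` the mean strategy fills 4.0 and the median strategy fills 3.0. The median is less sensitive to outliers such as the 8.0 here, which is one reason to offer both.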

[jira] [Commented] (SPARK-13517) Expose regression summary classes in Pyspark

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171905#comment-15171905 ] Nick Pentreath commented on SPARK-13517: Is this not a duplicate of SPARK-13430? > Expose

[jira] [Resolved] (SPARK-12633) Make Parameter Descriptions Consistent for PySpark MLlib Regression

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-12633. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11404

[jira] [Updated] (SPARK-13568) Create feature transformer to impute missing values

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13568: --- Priority: Minor (was: Major) > Create feature transformer to impute missing values >

[jira] [Commented] (SPARK-12684) Matrix.toString should take a format for how each cell should be printed

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171957#comment-15171957 ] Nick Pentreath commented on SPARK-12684: [~srowen] should this be resolved as *Won't Fix*? >

[jira] [Created] (SPARK-13568) Create feature transformer to impute missing values

2016-02-29 Thread Nick Pentreath (JIRA)
Nick Pentreath created SPARK-13568: -- Summary: Create feature transformer to impute missing values Key: SPARK-13568 URL: https://issues.apache.org/jira/browse/SPARK-13568 Project: Spark

[jira] [Commented] (SPARK-12348) PySpark _inferSchema crashes with incorrect exception on an empty RDD

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171964#comment-15171964 ] Nick Pentreath commented on SPARK-12348: I'm not sure this is a bug or even a big deal. The cause

[jira] [Updated] (SPARK-12806) Support SQL expressions extracting values from VectorUDT

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12806: --- Description: Use cases exist where a specific index within a {{VectorUDT}} column of a

[jira] [Comment Edited] (SPARK-12348) PySpark _inferSchema crashes with incorrect exception on an empty RDD

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171964#comment-15171964 ] Nick Pentreath edited comment on SPARK-12348 at 2/29/16 3:10 PM: - I'm not

[jira] [Commented] (SPARK-13568) Create feature transformer to impute missing values

2016-02-29 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172328#comment-15172328 ] Nick Pentreath commented on SPARK-13568: Sure, go ahead. However, taking a quick look at your

[jira] [Commented] (SPARK-13289) Word2Vec generate infinite distances when numIterations>5

2016-02-22 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158446#comment-15158446 ] Nick Pentreath commented on SPARK-13289: Yes the master build is currently failing as detailed in

[jira] [Updated] (SPARK-12247) Documentation for spark.ml's ALS and collaborative filtering in general

2016-01-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12247: --- Assignee: Benjamin Fradet > Documentation for spark.ml's ALS and collaborative filtering in

[jira] [Updated] (SPARK-12247) Documentation for spark.ml's ALS and collaborative filtering in general

2016-01-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12247: --- Affects Version/s: (was: 1.5.2) 2.0.0 > Documentation for

[jira] [Updated] (SPARK-12632) Make Parameter Descriptions Consistent for PySpark MLlib FPM and Recommendation

2016-02-18 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-12632: --- Assignee: Bryan Cutler (was: somil deshmukh) > Make Parameter Descriptions Consistent for

[jira] [Updated] (SPARK-13430) Expose ml summary function in PySpark for classification and regression models

2016-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13430: --- Assignee: Bryan Cutler > Expose ml summary function in PySpark for classification and

[jira] [Commented] (SPARK-12626) MLlib 2.0 Roadmap

2016-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188774#comment-15188774 ] Nick Pentreath commented on SPARK-12626: [~dbtsai] ok thanks - would like to take a look when

[jira] [Resolved] (SPARK-13706) Python Example for Train Validation Split Missing

2016-03-09 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-13706. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11547

[jira] [Updated] (SPARK-13672) Add python examples of BisectingKMeans in ML and MLLIB

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13672: --- Shepherd: Nick Pentreath Assignee: zhengruifeng > Add python examples of BisectingKMeans

[jira] [Updated] (SPARK-11108) OneHotEncoder should support other numeric input types

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-11108: --- Shepherd: Nick Pentreath > OneHotEncoder should support other numeric input types >

[jira] [Updated] (SPARK-13629) Add binary toggle Param to CountVectorizer

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13629: --- Shepherd: Nick Pentreath > Add binary toggle Param to CountVectorizer >

[jira] [Updated] (SPARK-13629) Add binary toggle Param to CountVectorizer

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13629: --- Assignee: yuhao yang > Add binary toggle Param to CountVectorizer >

[jira] [Updated] (SPARK-13600) Use approxQuantile from DataFrame stats in QuantileDiscretizer

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-13600: --- Shepherd: Nick Pentreath > Use approxQuantile from DataFrame stats in QuantileDiscretizer >

[jira] [Updated] (SPARK-11108) OneHotEncoder should support other numeric input types

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-11108: --- Assignee: Seth Hendrickson > OneHotEncoder should support other numeric input types >

[jira] [Resolved] (SPARK-11108) OneHotEncoder should support other numeric input types

2016-03-10 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath resolved SPARK-11108. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 9777
