[jira] [Reopened] (SPARK-15702) Update document programming-guide accumulator section

2016-08-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reopened SPARK-15702: -- I'm reopening this because I think the current programming guide accumulator section is

[jira] [Updated] (SPARK-16260) ML Example Improvements and Cleanup

2016-08-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16260: - Description: This parent task is to track a few possible improvements and cleanup for PySpark

[jira] [Updated] (SPARK-16260) ML Example Improvements and Cleanup

2016-08-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16260: - Summary: ML Example Improvements and Cleanup (was: PySpark ML Example Improvements and Cleanup)

[jira] [Commented] (SPARK-16832) CrossValidator and TrainValidationSplit are not random without seed

2016-08-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408190#comment-15408190 ] Bryan Cutler commented on SPARK-16832: -- Yeah, I'm not sure of the reason myself, but I agree with

[jira] [Commented] (SPARK-16832) CrossValidator and TrainValidationSplit are not random without seed

2016-08-01 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403067#comment-15403067 ] Bryan Cutler commented on SPARK-16832: -- The default seed value is a constant, this is the trait

[jira] [Updated] (SPARK-16800) Fix Java Examples that throw exception

2016-07-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16800: - Description: Some Java examples fail to run due to an exception thrown when using

[jira] [Updated] (SPARK-16800) Fix Java Examples that throw exception

2016-07-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16800: - Description: Some Java examples fail to run due to an exception thrown when using

[jira] [Created] (SPARK-16800) Fix Java Examples that throw exception

2016-07-29 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-16800: Summary: Fix Java Examples that throw exception Key: SPARK-16800 URL: https://issues.apache.org/jira/browse/SPARK-16800 Project: Spark Issue Type: Sub-task

[jira] [Commented] (SPARK-16765) Add Pipeline API example for KMeans

2016-07-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397994#comment-15397994 ] Bryan Cutler commented on SPARK-16765: -- Was there some specific use of Pipelines with KMeans that

[jira] [Updated] (SPARK-16197) Cleanup PySpark status api and example

2016-07-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16197: - Description: Cleanup of Status API example to use SparkSession and be more consistent with

[jira] [Commented] (SPARK-16421) Improve output from ML examples

2016-07-18 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382579#comment-15382579 ] Bryan Cutler commented on SPARK-16421: -- Yeah, I'm working on it now > Improve output from ML

[jira] [Resolved] (SPARK-14087) PySpark ML JavaModel does not properly own params after being fit

2016-07-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-14087. -- Resolution: Resolved Fix Version/s: 2.0.0 This is no longer an issue as the PySpark

[jira] [Updated] (SPARK-16403) Example cleanup and fix minor issues

2016-07-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16403: - Description: General cleanup of examples, focused on PySpark ML, to remove unused imports, sync

[jira] [Commented] (SPARK-15623) 2.0 python coverage ml.feature

2016-07-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371229#comment-15371229 ] Bryan Cutler commented on SPARK-15623: -- Hey [~holdenk], think I can close this off now or would you

[jira] [Updated] (SPARK-16421) Improve output from ML examples

2016-07-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16421: - Issue Type: Sub-task (was: Improvement) Parent: SPARK-16260 > Improve output from ML

[jira] [Commented] (SPARK-16421) Improve output from ML examples

2016-07-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366461#comment-15366461 ] Bryan Cutler commented on SPARK-16421: -- I'll be working on this once the blocking issue is resolved,

[jira] [Created] (SPARK-16421) Improve output from ML examples

2016-07-07 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-16421: Summary: Improve output from ML examples Key: SPARK-16421 URL: https://issues.apache.org/jira/browse/SPARK-16421 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-16403) Example cleanup and fix minor issues

2016-07-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16403: - Description: General cleanup of examples, focused on PySpark ML, to remove unused imports, sync

[jira] [Commented] (SPARK-16403) Example cleanup and fix minor issues

2016-07-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365174#comment-15365174 ] Bryan Cutler commented on SPARK-16403: -- I'm working on this > Example cleanup and fix minor issues

[jira] [Updated] (SPARK-16403) Example cleanup and fix minor issues

2016-07-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16403: - Priority: Trivial (was: Major) > Example cleanup and fix minor issues >

[jira] [Created] (SPARK-16403) Example cleanup and fix minor issues

2016-07-06 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-16403: Summary: Example cleanup and fix minor issues Key: SPARK-16403 URL: https://issues.apache.org/jira/browse/SPARK-16403 Project: Spark Issue Type: Sub-task

[jira] [Commented] (SPARK-16260) PySpark ML Example Improvements and Cleanup

2016-07-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363135#comment-15363135 ] Bryan Cutler commented on SPARK-16260: -- I have a couple tasks I still plan to add here, I will close

[jira] [Commented] (SPARK-15009) PySpark CountVectorizerModel should be able to construct from vocabulary list

2016-07-03 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360782#comment-15360782 ] Bryan Cutler commented on SPARK-15009: -- At the time I reported this, it was blocked by SPARK-14087

[jira] [Commented] (SPARK-16247) Using pyspark dataframe with pipeline and cross validator

2016-06-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357548#comment-15357548 ] Bryan Cutler commented on SPARK-16247: -- Great, glad that solved the problem! A cross-validation

[jira] [Comment Edited] (SPARK-16247) Using pyspark dataframe with pipeline and cross validator

2016-06-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355456#comment-15355456 ] Bryan Cutler edited comment on SPARK-16247 at 6/29/16 3:53 PM: --- I think you

[jira] [Commented] (SPARK-16247) Using pyspark dataframe with pipeline and cross validator

2016-06-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355456#comment-15355456 ] Bryan Cutler commented on SPARK-16247: -- I think you need to specify the {labelCol} in

[jira] [Commented] (SPARK-16247) Using pyspark dataframe with pipeline and cross validator

2016-06-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353555#comment-15353555 ] Bryan Cutler commented on SPARK-16247: -- I'm not sure if this is the issue, but the first parameter

[jira] [Commented] (SPARK-12428) Write a script to run all PySpark MLlib examples for testing

2016-06-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353488#comment-15353488 ] Bryan Cutler commented on SPARK-12428: -- Hey Holden, I was thinking about doing this anyway and found

[jira] [Created] (SPARK-16261) Fix Incorrect appNames in PySpark ML Examples

2016-06-28 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-16261: Summary: Fix Incorrect appNames in PySpark ML Examples Key: SPARK-16261 URL: https://issues.apache.org/jira/browse/SPARK-16261 Project: Spark Issue Type:

[jira] [Created] (SPARK-16260) PySpark ML Example Improvements and Cleanup

2016-06-28 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-16260: Summary: PySpark ML Example Improvements and Cleanup Key: SPARK-16260 URL: https://issues.apache.org/jira/browse/SPARK-16260 Project: Spark Issue Type:

[jira] [Created] (SPARK-16231) PySpark ML DataFrame example fails on Vector conversion

2016-06-27 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-16231: Summary: PySpark ML DataFrame example fails on Vector conversion Key: SPARK-16231 URL: https://issues.apache.org/jira/browse/SPARK-16231 Project: Spark

[jira] [Created] (SPARK-16197) Cleanup PySpark status api and example

2016-06-24 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-16197: Summary: Cleanup PySpark status api and example Key: SPARK-16197 URL: https://issues.apache.org/jira/browse/SPARK-16197 Project: Spark Issue Type:

[jira] [Created] (SPARK-16079) PySpark ML classification missing import of DecisionTreeRegressionModel for GBTClassificationModel

2016-06-20 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-16079: Summary: PySpark ML classification missing import of DecisionTreeRegressionModel for GBTClassificationModel Key: SPARK-16079 URL:

[jira] [Commented] (SPARK-15861) pyspark mapPartitions with none generator functions / functors

2016-06-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331975#comment-15331975 ] Bryan Cutler commented on SPARK-15861: -- If you change your function to this {noformat} def

[jira] [Updated] (SPARK-15741) PySpark Cleanup of _setDefault with seed=None

2016-06-14 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-15741: - Description: Several places in PySpark ML have Params._setDefault with a seed param equal to

[jira] [Reopened] (SPARK-15741) PySpark Cleanup of _setDefault with seed=None

2016-06-14 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reopened SPARK-15741: -- Reopened as I feel this still should be cleaned up. > PySpark Cleanup of _setDefault with

[jira] [Commented] (SPARK-15861) pyspark mapPartitions with none generator functions / functors

2016-06-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328417#comment-15328417 ] Bryan Cutler commented on SPARK-15861: -- {{mapPartitions}} will expect the function to return a

[jira] [Comment Edited] (SPARK-15861) pyspark mapPartitions with none generator functions / functors

2016-06-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328245#comment-15328245 ] Bryan Cutler edited comment on SPARK-15861 at 6/13/16 9:05 PM: ---

[jira] [Commented] (SPARK-15861) pyspark mapPartitions with none generator functions / functors

2016-06-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328245#comment-15328245 ] Bryan Cutler commented on SPARK-15861: -- [~gbow...@fastmail.co.uk] {{mapPartitions}} expects a

[jira] [Commented] (SPARK-15741) PySpark Cleanup of _setDefault with seed=None

2016-06-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313376#comment-15313376 ] Bryan Cutler commented on SPARK-15741: -- From what I gathered, explicitly setting a seed to {{None}}

[jira] [Closed] (SPARK-15741) PySpark Cleanup of _setDefault with seed=None

2016-06-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler closed SPARK-15741. Resolution: Invalid Looks like I jumped the gun here, None values are not ignored and seems like

[jira] [Updated] (SPARK-15741) PySpark Cleanup of _setDefault with seed=None

2016-06-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-15741: - Issue Type: Improvement (was: Sub-task) Parent: (was: SPARK-14771) > PySpark

[jira] [Created] (SPARK-15741) PySpark Cleanup of _setDefault with seed=None

2016-06-02 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-15741: Summary: PySpark Cleanup of _setDefault with seed=None Key: SPARK-15741 URL: https://issues.apache.org/jira/browse/SPARK-15741 Project: Spark Issue Type:

[jira] [Commented] (SPARK-15623) 2.0 python coverage ml.feature

2016-06-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313226#comment-15313226 ] Bryan Cutler commented on SPARK-15623: -- I took another spin through this and linked a couple of

[jira] [Created] (SPARK-15738) PySpark ml.feature RFormula missing string representation displaying formula

2016-06-02 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-15738: Summary: PySpark ml.feature RFormula missing string representation displaying formula Key: SPARK-15738 URL: https://issues.apache.org/jira/browse/SPARK-15738

[jira] [Commented] (SPARK-15009) PySpark CountVectorizerModel should be able to construct from vocabulary list

2016-06-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313030#comment-15313030 ] Bryan Cutler commented on SPARK-15009: -- note - also similar constructor for StringIndexerModel >

[jira] [Commented] (SPARK-12666) spark-shell --packages cannot load artifacts which are publishLocal'd by SBT

2016-05-31 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308950#comment-15308950 ] Bryan Cutler commented on SPARK-12666: -- This seems like it is more of an issue with SBT

[jira] [Comment Edited] (SPARK-15623) 2.0 python coverage ml.feature

2016-05-27 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304898#comment-15304898 ] Bryan Cutler edited comment on SPARK-15623 at 5/27/16 10:11 PM: I was

[jira] [Commented] (SPARK-15623) 2.0 python coverage ml.feature

2016-05-27 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304898#comment-15304898 ] Bryan Cutler commented on SPARK-15623: -- I was only able to quickly go through the user guide and api

[jira] [Updated] (SPARK-15623) 2.0 python coverage ml.feature

2016-05-27 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-15623: - Summary: 2.0 python coverage ml.feature (was: 2.0 python converage ml.feature) > 2.0 python

[jira] [Resolved] (SPARK-15497) DecisionTreeClassificationModel can't be saved within in Pipeline caused by not implement Writable

2016-05-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-15497. -- Resolution: Duplicate Fix Version/s: 2.0.0 > DecisionTreeClassificationModel can't be

[jira] [Commented] (SPARK-15497) DecisionTreeClassificationModel can't be saved within in Pipeline caused by not implement Writable

2016-05-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298837#comment-15298837 ] Bryan Cutler commented on SPARK-15497: -- This was added in SPARK-11888 and will be in Spark 2.0. >

[jira] [Created] (SPARK-15456) PySpark Shell fails to create SparkContext if HiveConf not found

2016-05-20 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-15456: Summary: PySpark Shell fails to create SparkContext if HiveConf not found Key: SPARK-15456 URL: https://issues.apache.org/jira/browse/SPARK-15456 Project: Spark

[jira] [Commented] (SPARK-15456) PySpark Shell fails to create SparkContext if HiveConf not found

2016-05-20 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294446#comment-15294446 ] Bryan Cutler commented on SPARK-15456: -- I can submit a fix for this > PySpark Shell fails to create

[jira] [Commented] (SPARK-15448) Flaky test:pyspark.ml.tests.DefaultValuesTests.test_java_params

2016-05-20 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293920#comment-15293920 ] Bryan Cutler commented on SPARK-15448: -- I believe this was recently fixed in SPARK-15444 > Flaky

[jira] [Commented] (SPARK-15391) Spark executor OOM during TimSort

2016-05-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291582#comment-15291582 ] Bryan Cutler commented on SPARK-15391: -- this looks to be a duplicate of SPARK-15332 > Spark

[jira] [Commented] (SPARK-15100) Audit: ml.feature

2016-05-18 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289402#comment-15289402 ] Bryan Cutler commented on SPARK-15100: -- sure, I hadn't started on those yet > Audit: ml.feature >

[jira] [Commented] (SPARK-15100) Audit: ml.feature

2016-05-18 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289350#comment-15289350 ] Bryan Cutler commented on SPARK-15100: -- I did a quick pass through Scala and Python APIs, just found

[jira] [Commented] (SPARK-15100) Audit: ml.feature

2016-05-16 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285366#comment-15285366 ] Bryan Cutler commented on SPARK-15100: -- I can do a PR to update CountVectorizer and HashingTF >

[jira] [Commented] (SPARK-15018) PySpark ML Pipeline fails when no stages set

2016-04-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264858#comment-15264858 ] Bryan Cutler commented on SPARK-15018: -- I have a fix for this > PySpark ML Pipeline fails when no

[jira] [Created] (SPARK-15018) PySpark ML Pipeline fails when no stages set

2016-04-29 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-15018: Summary: PySpark ML Pipeline fails when no stages set Key: SPARK-15018 URL: https://issues.apache.org/jira/browse/SPARK-15018 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-15009) PySpark CountVectorizerModel should be able to construct from vocabulary list

2016-04-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264515#comment-15264515 ] Bryan Cutler commented on SPARK-15009: -- I'm working on this > PySpark CountVectorizerModel should

[jira] [Created] (SPARK-15009) PySpark CountVectorizerModel should be able to construct from vocabulary list

2016-04-29 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-15009: Summary: PySpark CountVectorizerModel should be able to construct from vocabulary list Key: SPARK-15009 URL: https://issues.apache.org/jira/browse/SPARK-15009

[jira] [Created] (SPARK-14779) Incorrect log message in Worker while handling KillExecutor message

2016-04-20 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-14779: Summary: Incorrect log message in Worker while handling KillExecutor message Key: SPARK-14779 URL: https://issues.apache.org/jira/browse/SPARK-14779 Project: Spark

[jira] [Commented] (SPARK-10086) Flaky StreamingKMeans test in PySpark

2016-04-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232888#comment-15232888 ] Bryan Cutler commented on SPARK-10086: -- The changes to the test I proposed earlier are still valid,

[jira] [Commented] (SPARK-14472) Cleanup PySpark-ML Java wrapper classes so that JavaWrapper will inherit from JavaCallable

2016-04-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231320#comment-15231320 ] Bryan Cutler commented on SPARK-14472: -- I'm working on it :D > Cleanup PySpark-ML Java wrapper

[jira] [Created] (SPARK-14472) Cleanup PySpark-ML Java wrapper classes so that JavaWrapper will inherit from JavaCallable

2016-04-07 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-14472: Summary: Cleanup PySpark-ML Java wrapper classes so that JavaWrapper will inherit from JavaCallable Key: SPARK-14472 URL: https://issues.apache.org/jira/browse/SPARK-14472

[jira] [Commented] (SPARK-14087) PySpark ML JavaModel does not properly own params after being fit

2016-04-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225326#comment-15225326 ] Bryan Cutler commented on SPARK-14087: -- I don't think this would completely solve it, please see my

[jira] [Commented] (SPARK-14087) PySpark ML JavaModel does not properly own params after being fit

2016-03-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207555#comment-15207555 ] Bryan Cutler commented on SPARK-14087: -- I can post a PR for this > PySpark ML JavaModel does not

[jira] [Updated] (SPARK-14087) PySpark ML JavaModel does not properly own params after being fit

2016-03-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-14087: - Attachment: feature.py > PySpark ML JavaModel does not properly own params after being fit >

[jira] [Created] (SPARK-14087) PySpark ML JavaModel does not properly own params after being fit

2016-03-22 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-14087: Summary: PySpark ML JavaModel does not properly own params after being fit Key: SPARK-14087 URL: https://issues.apache.org/jira/browse/SPARK-14087 Project: Spark

[jira] [Commented] (SPARK-13937) PySpark ML JavaWrapper, variable _java_obj should not be static

2016-03-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197805#comment-15197805 ] Bryan Cutler commented on SPARK-13937: -- I'll submit a PR for this > PySpark ML JavaWrapper,

[jira] [Commented] (SPARK-13691) Scala and Python generate inconsistent results

2016-03-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197775#comment-15197775 ] Bryan Cutler commented on SPARK-13691: -- Since the problem comes from the structure of the code in

[jira] [Commented] (SPARK-13963) Add binary toggle Param to ml.HashingTF

2016-03-18 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1520#comment-1520 ] Bryan Cutler commented on SPARK-13963: -- Hi [~mlnick], mind if I work on this? > Add binary toggle

[jira] [Created] (SPARK-13937) PySpark ML JavaWrapper, variable _java_obj should not be static

2016-03-18 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-13937: Summary: PySpark ML JavaWrapper, variable _java_obj should not be static Key: SPARK-13937 URL: https://issues.apache.org/jira/browse/SPARK-13937 Project: Spark

[jira] [Commented] (SPARK-13967) Add binary toggle Param to PySpark CountVectorizer

2016-03-18 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201803#comment-15201803 ] Bryan Cutler commented on SPARK-13967: -- Sure, I'd like to do this - thanks! > Add binary toggle

[jira] [Commented] (SPARK-13691) Scala and Python generate inconsistent results

2016-03-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183438#comment-15183438 ] Bryan Cutler commented on SPARK-13691: -- The reason for this is that PySpark serializes the closure

[jira] [Commented] (SPARK-13602) o.a.s.deploy.worker.DriverRunner may leak the driver processes

2016-03-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177169#comment-15177169 ] Bryan Cutler commented on SPARK-13602: -- Great! Thanks :D > o.a.s.deploy.worker.DriverRunner may

[jira] [Commented] (SPARK-13602) o.a.s.deploy.worker.DriverRunner may leak the driver processes

2016-03-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176787#comment-15176787 ] Bryan Cutler commented on SPARK-13602: -- Hi [~zsxwing], mind if I work on this one? >

[jira] [Updated] (SPARK-13625) PySpark-ML method to get list of params for an obj should not check property attr

2016-03-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-13625: - Description: In PySpark params.__init__.py, the method {{Param.params()}} returns a list of

[jira] [Updated] (SPARK-13625) PySpark-ML method to get list of params for an obj should not check property attr

2016-03-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-13625: - Description: In PySpark params.__init__.py, the method {{Param.params()}} returns a list of

[jira] [Updated] (SPARK-13625) PySpark-ML method to get list of params for an obj should not check property attr

2016-03-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-13625: - Description: In PySpark params.__init__.py, the method {{Param.params()}} returns a list of

[jira] [Commented] (SPARK-13625) PySpark-ML method to get list of params for an obj should not check property attr

2016-03-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176655#comment-15176655 ] Bryan Cutler commented on SPARK-13625: -- I have a fix for this, will post PR soon > PySpark-ML

[jira] [Created] (SPARK-13625) PySpark-ML method to get list of params for an obj should not check property attr

2016-03-02 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-13625: Summary: PySpark-ML method to get list of params for an obj should not check property attr Key: SPARK-13625 URL: https://issues.apache.org/jira/browse/SPARK-13625

[jira] [Commented] (SPARK-13430) Expose ml summary function in PySpark for classification and regression models

2016-02-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172775#comment-15172775 ] Bryan Cutler commented on SPARK-13430: -- I can work on adding this > Expose ml summary function in

[jira] [Resolved] (SPARK-11219) Make Parameter Description Format Consistent in PySpark.MLlib

2016-02-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-11219. -- Resolution: Done Fix Version/s: 2.0.0 > Make Parameter Description Format Consistent in

[jira] [Resolved] (SPARK-13500) Add an example for LDA in PySpark

2016-02-26 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-13500. -- Resolution: Duplicate this example and others are being added as part of this > Add an

[jira] [Commented] (SPARK-13500) Add an example for LDA in PySpark

2016-02-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168021#comment-15168021 ] Bryan Cutler commented on SPARK-13500: -- I'm working on it :D > Add an example for LDA in PySpark >

[jira] [Created] (SPARK-13500) Add an example for LDA in PySpark

2016-02-25 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-13500: Summary: Add an example for LDA in PySpark Key: SPARK-13500 URL: https://issues.apache.org/jira/browse/SPARK-13500 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-9844) File appender race condition during SparkWorker shutdown

2016-02-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150913#comment-15150913 ] Bryan Cutler commented on SPARK-9844: - This error is benign for the most part, once it gets here, the

[jira] [Updated] (SPARK-10086) Flaky StreamingKMeans test in PySpark

2016-02-12 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-10086: - Attachment: flakyRepro.py Simple script with similar operations to this StreamingKMeans test,

[jira] [Comment Edited] (SPARK-10086) Flaky StreamingKMeans test in PySpark

2016-02-12 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145628#comment-15145628 ] Bryan Cutler edited comment on SPARK-10086 at 2/13/16 12:44 AM: Simple

[jira] [Commented] (SPARK-10086) Flaky StreamingKMeans test in PySpark

2016-02-12 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145625#comment-15145625 ] Bryan Cutler commented on SPARK-10086: -- I was able to track down the cause of these failures, so

[jira] [Commented] (SPARK-12731) PySpark docstring cleanup

2016-02-01 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126886#comment-15126886 ] Bryan Cutler commented on SPARK-12731: -- Just to add my 2 cents since I've been working on a similar

[jira] [Commented] (SPARK-11219) Make Parameter Description Format Consistent in PySpark.MLlib

2016-01-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115756#comment-15115756 ] Bryan Cutler commented on SPARK-11219: -- Regarding overall style in PySpark, I generally see single

[jira] [Commented] (SPARK-12986) Fix pydoc warnings in mllib/regression.py

2016-01-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116153#comment-15116153 ] Bryan Cutler commented on SPARK-12986: -- It looks like this is caused by an indented line not being

[jira] [Commented] (SPARK-12299) Remove history serving functionality from standalone Master

2016-01-18 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105845#comment-15105845 ] Bryan Cutler commented on SPARK-12299: -- I'd be happy to work on this since I recently made some

[jira] [Commented] (SPARK-11219) Make Parameter Description Format Consistent in PySpark.MLlib

2016-01-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090292#comment-15090292 ] Bryan Cutler commented on SPARK-11219: -- That's my fault [~josephkb], I was instructing to go up to

[jira] [Commented] (SPARK-12701) Logging FileAppender should use join to ensure thread is finished

2016-01-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088429#comment-15088429 ] Bryan Cutler commented on SPARK-12701: -- I can submit a PR for this. > Logging FileAppender should

[jira] [Created] (SPARK-12701) Logging FileAppender should use join to ensure thread is finished

2016-01-07 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-12701: Summary: Logging FileAppender should use join to ensure thread is finished Key: SPARK-12701 URL: https://issues.apache.org/jira/browse/SPARK-12701 Project: Spark
