[jira] [Updated] (SPARK-23615) Add maxDF Parameter to Python CountVectorizer

2018-03-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23615: - Component/s: ML > Add maxDF Parameter to Python CountVectorizer >

[jira] [Created] (SPARK-23615) Add maxDF Parameter to Python CountVectorizer

2018-03-06 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-23615: Summary: Add maxDF Parameter to Python CountVectorizer Key: SPARK-23615 URL: https://issues.apache.org/jira/browse/SPARK-23615 Project: Spark Issue Type:

[jira] [Commented] (SPARK-21812) PySpark ML Models should not depend transfering params from Java

2018-03-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387068#comment-16387068 ] Bryan Cutler commented on SPARK-21812: -- Adding SPARK-15009 as an example of how to restructure the

[jira] [Commented] (SPARK-23555) Add BinaryType support for Arrow in PySpark

2018-03-01 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382715#comment-16382715 ] Bryan Cutler commented on SPARK-23555: -- I'm working on it > Add BinaryType support for Arrow in

[jira] [Created] (SPARK-23555) Add BinaryType support for Arrow in PySpark

2018-03-01 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-23555: Summary: Add BinaryType support for Arrow in PySpark Key: SPARK-23555 URL: https://issues.apache.org/jira/browse/SPARK-23555 Project: Spark Issue Type:

[jira] [Updated] (SPARK-23159) Update Cloudpickle to match version 0.4.3

2018-02-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23159: - Description: Update PySpark's version of Cloudpickle to match version 0.4.3.  The reasons for

[jira] [Updated] (SPARK-23159) Update Cloudpickle to match version 0.4.3

2018-02-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23159: - Summary: Update Cloudpickle to match version 0.4.3 (was: Update Cloudpickle to match version

[jira] [Updated] (SPARK-23360) SparkSession.createDataFrame timestamps can be incorrect with non-Arrow codepath

2018-02-09 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23360: - Summary: SparkSession.createDataFrame timestamps can be incorrect with non-Arrow codepath (was:

[jira] [Comment Edited] (SPARK-23244) Incorrect handling of default values when deserializing python wrappers of scala transformers

2018-02-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357504#comment-16357504 ] Bryan Cutler edited comment on SPARK-23244 at 2/8/18 8:08 PM: -- This is the

[jira] [Commented] (SPARK-23244) Incorrect handling of default values when deserializing python wrappers of scala transformers

2018-02-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357504#comment-16357504 ] Bryan Cutler commented on SPARK-23244: -- This is same issue as SPARK-21685 caused by pyspark not

[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2018-02-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21187: - Description: This is to track adding the remaining type support in Arrow Converters. Currently,

[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2018-02-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21187: - Description: This is to track adding the remaining type support in Arrow Converters. Currently,

[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2018-02-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21187: - Description: This is to track adding the remaining type support in Arrow Converters. Currently,

[jira] [Created] (SPARK-23258) Should not split Arrow record batches based on row count

2018-01-29 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-23258: Summary: Should not split Arrow record batches based on row count Key: SPARK-23258 URL: https://issues.apache.org/jira/browse/SPARK-23258 Project: Spark

[jira] [Comment Edited] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332698#comment-16332698 ] Bryan Cutler edited comment on SPARK-23109 at 1/29/18 5:26 PM: --- I did the

[jira] [Comment Edited] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332698#comment-16332698 ] Bryan Cutler edited comment on SPARK-23109 at 1/29/18 5:25 PM: --- I did the

[jira] [Commented] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343665#comment-16343665 ] Bryan Cutler commented on SPARK-23109: -- Thanks [~mlnick], yes this is done. > ML 2.3 QA: API:

[jira] [Resolved] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-23109. -- Resolution: Done > ML 2.3 QA: API: Python API coverage > --- >

[jira] [Commented] (SPARK-22711) _pickle.PicklingError: args[0] from __newobj__ args has the wrong class from cloudpickle.py

2018-01-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338116#comment-16338116 ] Bryan Cutler commented on SPARK-22711: -- Yes, normally you would not need to import inside the

[jira] [Commented] (SPARK-22711) _pickle.PicklingError: args[0] from __newobj__ args has the wrong class from cloudpickle.py

2018-01-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338070#comment-16338070 ] Bryan Cutler commented on SPARK-22711: -- Hi [~PrateekRM], here is your code trimmed down to where the

[jira] [Comment Edited] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332698#comment-16332698 ] Bryan Cutler edited comment on SPARK-23109 at 1/20/18 3:17 AM: --- I did the

[jira] [Commented] (SPARK-23163) Sync Python ML API docs with Scala

2018-01-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333154#comment-16333154 ] Bryan Cutler commented on SPARK-23163: -- I'll do this, just a few minor things > Sync Python ML API

[jira] [Created] (SPARK-23163) Sync Python ML API docs with Scala

2018-01-19 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-23163: Summary: Sync Python ML API docs with Scala Key: SPARK-23163 URL: https://issues.apache.org/jira/browse/SPARK-23163 Project: Spark Issue Type: Documentation

[jira] [Comment Edited] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332698#comment-16332698 ] Bryan Cutler edited comment on SPARK-23109 at 1/20/18 3:06 AM: --- I did the

[jira] [Comment Edited] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332698#comment-16332698 ] Bryan Cutler edited comment on SPARK-23109 at 1/20/18 3:05 AM: --- I did the

[jira] [Updated] (SPARK-23161) Add missing APIs to Python GBTClassifier

2018-01-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23161: - Labels: starter (was: ) > Add missing APIs to Python GBTClassifier >

[jira] [Created] (SPARK-23162) PySpark ML LinearRegressionSummary missing r2adj

2018-01-19 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-23162: Summary: PySpark ML LinearRegressionSummary missing r2adj Key: SPARK-23162 URL: https://issues.apache.org/jira/browse/SPARK-23162 Project: Spark Issue Type:

[jira] [Updated] (SPARK-23161) Add missing APIs to Python GBTClassifier

2018-01-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23161: - Priority: Minor (was: Major) > Add missing APIs to Python GBTClassifier >

[jira] [Comment Edited] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332698#comment-16332698 ] Bryan Cutler edited comment on SPARK-23109 at 1/20/18 3:00 AM: --- I did the

[jira] [Created] (SPARK-23161) Add missing APIs to Python GBTClassifier

2018-01-19 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-23161: Summary: Add missing APIs to Python GBTClassifier Key: SPARK-23161 URL: https://issues.apache.org/jira/browse/SPARK-23161 Project: Spark Issue Type:

[jira] [Commented] (SPARK-23159) Update Cloudpickle to match version 0.4.2

2018-01-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332752#comment-16332752 ] Bryan Cutler commented on SPARK-23159: -- I can work on this > Update Cloudpickle to match version

[jira] [Created] (SPARK-23159) Update Cloudpickle to match version 0.4.2

2018-01-19 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-23159: Summary: Update Cloudpickle to match version 0.4.2 Key: SPARK-23159 URL: https://issues.apache.org/jira/browse/SPARK-23159 Project: Spark Issue Type:

[jira] [Commented] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332703#comment-16332703 ] Bryan Cutler commented on SPARK-23109: -- [~josephkb] the image module is missing many of the get*

[jira] [Commented] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332698#comment-16332698 ] Bryan Cutler commented on SPARK-23109: -- I did the following: generated HTML doc and checked for

[jira] [Commented] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329481#comment-16329481 ] Bryan Cutler commented on SPARK-23109: -- [~josephkb] I can take this, thanks! > ML 2.3 QA: API:

[jira] [Commented] (SPARK-12717) pyspark broadcast fails when using multiple threads

2018-01-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326459#comment-16326459 ] Bryan Cutler commented on SPARK-12717: -- Hi [~codlife], you can use Spark 2.2.1 which was released in

[jira] [Commented] (SPARK-23030) Decrease memory consumption with toPandas() collection using Arrow

2018-01-10 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320942#comment-16320942 ] Bryan Cutler commented on SPARK-23030: -- I'm looking into this, will submit a WIP PR if I see an

[jira] [Created] (SPARK-23030) Decrease memory consumption with toPandas() collection using Arrow

2018-01-10 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-23030: Summary: Decrease memory consumption with toPandas() collection using Arrow Key: SPARK-23030 URL: https://issues.apache.org/jira/browse/SPARK-23030 Project: Spark

[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2018-01-10 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21187: - Description: This is to track adding the remaining type support in Arrow Converters.

[jira] [Commented] (SPARK-23018) PySpark creatDataFrame causes Pandas warning of assignment to a copy of a reference

2018-01-09 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319417#comment-16319417 ] Bryan Cutler commented on SPARK-23018: -- I can submit a PR > PySpark creatDataFrame causes Pandas

[jira] [Created] (SPARK-23018) PySpark creatDataFrame causes Pandas warning of assignment to a copy of a reference

2018-01-09 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-23018: Summary: PySpark creatDataFrame causes Pandas warning of assignment to a copy of a reference Key: SPARK-23018 URL: https://issues.apache.org/jira/browse/SPARK-23018

[jira] [Updated] (SPARK-23009) PySpark should not assume Pandas cols are a basestring type

2018-01-09 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23009: - Description: When calling {{SparkSession.createDataFrame}} using a Pandas DataFrame as input,

[jira] [Commented] (SPARK-23009) PySpark should not assume Pandas cols are a basestring type

2018-01-09 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318990#comment-16318990 ] Bryan Cutler commented on SPARK-23009: -- I can put in a fix for this > PySpark should not assume

[jira] [Updated] (SPARK-23009) PySpark should not assume Pandas cols are a basestring type

2018-01-09 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23009: - Description: When calling {{SparkSession.createDataFrame}} using a Pandas DataFrame as input,

[jira] [Created] (SPARK-23009) PySpark should not assume Pandas cols are a basestring type

2018-01-09 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-23009: Summary: PySpark should not assume Pandas cols are a basestring type Key: SPARK-23009 URL: https://issues.apache.org/jira/browse/SPARK-23009 Project: Spark

[jira] [Commented] (SPARK-22126) Fix model-specific optimization support for ML tuning

2018-01-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313833#comment-16313833 ] Bryan Cutler commented on SPARK-22126: -- Hi [~bago.amirbekian], I was looking into similar pipeline

[jira] [Commented] (SPARK-22126) Fix model-specific optimization support for ML tuning

2018-01-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308868#comment-16308868 ] Bryan Cutler commented on SPARK-22126: -- Thanks for taking a look [~josephkb]! I believe it's

[jira] [Commented] (SPARK-22126) Fix model-specific optimization support for ML tuning

2017-12-31 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307145#comment-16307145 ] Bryan Cutler commented on SPARK-22126: -- Hi All, I've been following the discussions here and the

[jira] [Commented] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2017-12-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277549#comment-16277549 ] Bryan Cutler commented on SPARK-21187: -- Hi [~icexelloss], StructType has been added on the Java

[jira] [Commented] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2017-11-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257742#comment-16257742 ] Bryan Cutler commented on SPARK-21187: -- [~icexelloss] It looks like there is a bug in older Arrow

[jira] [Updated] (SPARK-22484) PySpark DataFrame.write.csv(quote="") uses nullchar as quote

2017-11-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-22484: - Component/s: PySpark > PySpark DataFrame.write.csv(quote="") uses nullchar as quote >

[jira] [Comment Edited] (SPARK-22534) Add integration test case to explicitly verify optional validity buffer

2017-11-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254450#comment-16254450 ] Bryan Cutler edited comment on SPARK-22534 at 11/15/17 11:37 PM: - Opened

[jira] [Resolved] (SPARK-22534) Add integration test case to explicitly verify optional validity buffer

2017-11-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-22534. -- Resolution: Not A Problem Opened by mistake > Add integration test case to explicitly verify

[jira] [Closed] (SPARK-22534) Add integration test case to explicitly verify optional validity buffer

2017-11-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler closed SPARK-22534. > Add integration test case to explicitly verify optional validity buffer >

[jira] [Created] (SPARK-22534) Add integration test case to explicitly verify optional validity buffer

2017-11-15 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-22534: Summary: Add integration test case to explicitly verify optional validity buffer Key: SPARK-22534 URL: https://issues.apache.org/jira/browse/SPARK-22534 Project:

[jira] [Commented] (SPARK-22530) Add ArrayType Support for working with Pandas and Arrow

2017-11-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254074#comment-16254074 ] Bryan Cutler commented on SPARK-22530: -- working on it > Add ArrayType Support for working with

[jira] [Created] (SPARK-22530) Add ArrayType Support for working with Pandas and Arrow

2017-11-15 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-22530: Summary: Add ArrayType Support for working with Pandas and Arrow Key: SPARK-22530 URL: https://issues.apache.org/jira/browse/SPARK-22530 Project: Spark

[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2017-11-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21187: - Description: This is to track adding the remaining type support in Arrow Converters.

[jira] [Commented] (SPARK-22324) Upgrade Arrow to version 0.8.0

2017-11-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254054#comment-16254054 ] Bryan Cutler commented on SPARK-22324: -- I started working on this to test out latest changes in

[jira] [Resolved] (SPARK-22209) PySpark does not recognize imports from submodules

2017-11-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-22209. -- Resolution: Fixed Fix Version/s: 2.3.0 Resolving this as fixed upstream by SPARK-21753,

[jira] [Updated] (SPARK-22324) Upgrade Arrow to version 0.8.0

2017-11-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-22324: - Description: Arrow version 0.8.0 is slated for release in early November, but I'd like to start

[jira] [Commented] (SPARK-22147) BlockId.hashCode allocates a StringBuilder/String on each call

2017-11-03 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237775#comment-16237775 ] Bryan Cutler commented on SPARK-22147: -- Sorry, I linked the above PR to this JIRA accidentally >

[jira] [Created] (SPARK-22417) createDataFrame from a pandas.DataFrame reads datetime64 values as longs

2017-11-01 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-22417: Summary: createDataFrame from a pandas.DataFrame reads datetime64 values as longs Key: SPARK-22417 URL: https://issues.apache.org/jira/browse/SPARK-22417 Project:

[jira] [Commented] (SPARK-22209) PySpark does not recognize imports from submodules

2017-10-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219322#comment-16219322 ] Bryan Cutler commented on SPARK-22209: -- I tried the example with the latest master and did not get

[jira] [Updated] (SPARK-22324) Upgrade Arrow to version 0.8.0

2017-10-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-22324: - Description: Arrow version 0.8.0 is slated for release in early November, but I'd like to start

[jira] [Updated] (SPARK-22324) Upgrade Arrow to version 0.8.0

2017-10-23 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-22324: - Description: Arrow version 0.8.0 is slated for release in early November, but I'd like to start

[jira] [Comment Edited] (SPARK-22323) Design doc for different types of pandas_udf

2017-10-20 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213182#comment-16213182 ] Bryan Cutler edited comment on SPARK-22323 at 10/20/17 8:30 PM: Is this

[jira] [Commented] (SPARK-22323) Design doc for different types of pandas_udf

2017-10-20 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213182#comment-16213182 ] Bryan Cutler commented on SPARK-22323: -- Is this meant to be a user doc? > Design doc for different

[jira] [Commented] (SPARK-22323) Design doc for different types of pandas_udf

2017-10-20 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213151#comment-16213151 ] Bryan Cutler commented on SPARK-22323: -- Should I close SPARK-22221 since it looks like the docs will

[jira] [Commented] (SPARK-21750) Use arrow 0.6.0

2017-10-20 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213012#comment-16213012 ] Bryan Cutler commented on SPARK-21750: -- Thanks [~dongjoon], I opened SPARK-22324 under the Arrow

[jira] [Created] (SPARK-22324) Upgrade Arrow to version 0.8.0

2017-10-20 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-22324: Summary: Upgrade Arrow to version 0.8.0 Key: SPARK-22324 URL: https://issues.apache.org/jira/browse/SPARK-22324 Project: Spark Issue Type: Sub-task

[jira] [Comment Edited] (SPARK-22209) PySpark does not recognize imports from submodules

2017-10-20 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212890#comment-16212890 ] Bryan Cutler edited comment on SPARK-22209 at 10/20/17 5:01 PM: It does

[jira] [Commented] (SPARK-22209) PySpark does not recognize imports from submodules

2017-10-20 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212890#comment-16212890 ] Bryan Cutler commented on SPARK-22209: -- It does seem like a bug to me so it should be fixed, I

[jira] [Commented] (SPARK-22209) PySpark does not recognize imports from submodules

2017-10-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211772#comment-16211772 ] Bryan Cutler commented on SPARK-22209: -- As a workaround, you could probably do the following {code}

[jira] [Commented] (SPARK-22250) Be less restrictive on type checking

2017-10-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211750#comment-16211750 ] Bryan Cutler commented on SPARK-22250: -- [~ferdonline] maybe SPARK-20791 would help you out when

[jira] [Created] (SPARK-22221) Add User Documentation for Working with Arrow in Spark

2017-10-06 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-22221: Summary: Add User Documentation for Working with Arrow in Spark Key: SPARK-22221 URL: https://issues.apache.org/jira/browse/SPARK-22221 Project: Spark Issue

[jira] [Comment Edited] (SPARK-22034) CrossValidator's training and testing set with different set of labels, resulting in encoder transform error

2017-09-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179938#comment-16179938 ] Bryan Cutler edited comment on SPARK-22034 at 9/25/17 11:18 PM: You would

[jira] [Commented] (SPARK-22034) CrossValidator's training and testing set with different set of labels, resulting in encoder transform error

2017-09-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179938#comment-16179938 ] Bryan Cutler commented on SPARK-22034: -- You would normally fit the VectorIndexer on the entire

[jira] [Commented] (SPARK-12717) pyspark broadcast fails when using multiple threads

2017-09-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179440#comment-16179440 ] Bryan Cutler commented on SPARK-12717: -- Hi [~avloss], the fix will be in Spark 2.1.2 which will be

[jira] [Commented] (SPARK-19357) Parallel Model Evaluation for ML Tuning: Scala

2017-09-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176773#comment-16176773 ] Bryan Cutler commented on SPARK-19357: -- [~josephkb] I think trying to push down the parallelism to

[jira] [Commented] (SPARK-22106) Remove support for 0-parameter pandas_udfs

2017-09-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176714#comment-16176714 ] Bryan Cutler commented on SPARK-22106: -- I'll submit a PR soon for this > Remove support for

[jira] [Resolved] (SPARK-21404) Simple Vectorized Python UDFs using Arrow

2017-09-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-21404. -- Resolution: Fixed This has been merged as SPARK-21190 > Simple Vectorized Python UDFs using

[jira] [Closed] (SPARK-21404) Simple Vectorized Python UDFs using Arrow

2017-09-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler closed SPARK-21404. > Simple Vectorized Python UDFs using Arrow > - > >

[jira] [Created] (SPARK-22106) Remove support for 0-parameter pandas_udfs

2017-09-22 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-22106: Summary: Remove support for 0-parameter pandas_udfs Key: SPARK-22106 URL: https://issues.apache.org/jira/browse/SPARK-22106 Project: Spark Issue Type:

[jira] [Updated] (SPARK-22067) ArrowWriter StringWriter not using position of ByteBuffer holding data

2017-09-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-22067: - Description: When ArrowWriter is copying a StringType column to ArrowData, then StringWriter

[jira] [Commented] (SPARK-22067) ArrowWriter StringWriter not using position of ByteBuffer holding data

2017-09-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172399#comment-16172399 ] Bryan Cutler commented on SPARK-22067: -- I'll submit a PR for this > ArrowWriter StringWriter not

[jira] [Created] (SPARK-22067) ArrowWriter StringWriter not using position of ByteBuffer holding data

2017-09-19 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-22067: Summary: ArrowWriter StringWriter not using position of ByteBuffer holding data Key: SPARK-22067 URL: https://issues.apache.org/jira/browse/SPARK-22067 Project:

[jira] [Updated] (SPARK-19357) Parallel Model Evaluation for ML Tuning: Scala

2017-09-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-19357: - Attachment: parallelism-verification-test.pdf Adding a document to show verification testing for

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-09-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162205#comment-16162205 ] Bryan Cutler commented on SPARK-21190: -- Thanks [~icexelloss]. I definitely think collaboration

[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2017-09-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21187: - Description: This is to track adding the remaining type support in Arrow Converters.

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-09-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156082#comment-16156082 ] Bryan Cutler commented on SPARK-21190: -- I attached my PR because it had already been done and pretty

[jira] [Updated] (SPARK-21926) Some transformers in spark.ml.feature fail when trying to transform streaming dataframes

2017-09-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21926: - Summary: Some transformers in spark.ml.feature fail when trying to transform streaming

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-09-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16154361#comment-16154361 ] Bryan Cutler commented on SPARK-21190: -- Thanks [~ueshin], I think having an optional {{kwargs}} at

[jira] [Comment Edited] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-09-01 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150924#comment-16150924 ] Bryan Cutler edited comment on SPARK-21190 at 9/1/17 5:56 PM: -- I'm good with

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-09-01 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150924#comment-16150924 ] Bryan Cutler commented on SPARK-21190: -- I'm good with the API summary proposed by [~ueshin], but I'm

[jira] [Closed] (SPARK-21810) Add ML Examples for FeatureHasher

2017-08-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler closed SPARK-21810. > Add ML Examples for FeatureHasher > - > > Key:

[jira] [Commented] (SPARK-21469) Add doc and example for FeatureHasher

2017-08-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137189#comment-16137189 ] Bryan Cutler commented on SPARK-21469: -- I'm working on this > Add doc and example for FeatureHasher

[jira] [Commented] (SPARK-21810) Add ML Examples for FeatureHasher

2017-08-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137188#comment-16137188 ] Bryan Cutler commented on SPARK-21810: -- duplicate of SPARK-21469 > Add ML Examples for

[jira] [Resolved] (SPARK-21810) Add ML Examples for FeatureHasher

2017-08-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-21810. -- Resolution: Duplicate > Add ML Examples for FeatureHasher > -

[jira] [Commented] (SPARK-21810) Add ML Examples for FeatureHasher

2017-08-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137178#comment-16137178 ] Bryan Cutler commented on SPARK-21810: -- I can add these > Add ML Examples for FeatureHasher >
