[jira] [Commented] (SPARK-25272) Show some kind of test output to indicate pyarrow tests were run

2018-08-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596859#comment-16596859 ] Bryan Cutler commented on SPARK-25272: -- This is a followup that is possible now that Arrow

[jira] [Created] (SPARK-25274) Improve `toPandas` with Arrow by sending out-of-order record batches

2018-08-29 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-25274: Summary: Improve `toPandas` with Arrow by sending out-of-order record batches Key: SPARK-25274 URL: https://issues.apache.org/jira/browse/SPARK-25274 Project: Spark

[jira] [Commented] (SPARK-25272) Show some kind of test output to indicate pyarrow tests were run

2018-08-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596850#comment-16596850 ] Bryan Cutler commented on SPARK-25272: -- yeah that would be great to make it easier to view the

[jira] [Updated] (SPARK-25272) Show some kind of test output to indicate pyarrow tests were run

2018-08-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-25272: - Issue Type: Sub-task (was: Improvement) Parent: SPARK-22216 > Show some kind of test

[jira] [Created] (SPARK-25272) Show some kind of test output to indicate pyarrow tests were run

2018-08-29 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-25272: Summary: Show some kind of test output to indicate pyarrow tests were run Key: SPARK-25272 URL: https://issues.apache.org/jira/browse/SPARK-25272 Project: Spark

[jira] [Commented] (SPARK-23874) Upgrade apache/arrow to 0.10.0

2018-08-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595835#comment-16595835 ] Bryan Cutler commented on SPARK-23874: -- [~smilegator] I linked the most relevant pyarrow issues

[jira] [Commented] (SPARK-25147) GroupedData.apply pandas_udf crashing

2018-08-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589177#comment-16589177 ] Bryan Cutler commented on SPARK-25147: -- Works for me on linux with: Python 3.6.6 pyarrow 0.10.0

[jira] [Commented] (SPARK-23698) Spark code contains numerous undefined names in Python 3

2018-08-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589143#comment-16589143 ] Bryan Cutler commented on SPARK-23698: -- Followup resolved by pull request 20838

[jira] [Resolved] (SPARK-25105) Importing all of pyspark.sql.functions should bring PandasUDFType in as well

2018-08-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-25105. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 22100

[jira] [Assigned] (SPARK-25105) Importing all of pyspark.sql.functions should bring PandasUDFType in as well

2018-08-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-25105: Assignee: kevin yu > Importing all of pyspark.sql.functions should bring PandasUDFType

[jira] [Commented] (SPARK-25179) Document the features that require Pyarrow 0.10

2018-08-21 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587882#comment-16587882 ] Bryan Cutler commented on SPARK-25179: -- I can work on this, probably can't get to it right away tho

[jira] [Updated] (SPARK-25179) Document the features that require Pyarrow 0.10

2018-08-21 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-25179: - Description: binary type support requires pyarrow 0.10.0 > Document the features that require

[jira] [Commented] (SPARK-23874) Upgrade apache/arrow to 0.10.0

2018-08-21 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587770#comment-16587770 ] Bryan Cutler commented on SPARK-23874: -- Yes, I still would recommend users upgrade pyarrow if

[jira] [Commented] (SPARK-23874) Upgrade apache/arrow to 0.10.0

2018-08-20 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586612#comment-16586612 ] Bryan Cutler commented on SPARK-23874: -- For the Python fixes, yes the user would have to upgrade

[jira] [Commented] (SPARK-21375) Add date and timestamp support to ArrowConverters for toPandas() collection

2018-08-20 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586369#comment-16586369 ] Bryan Cutler commented on SPARK-21375: -- Hi [~ewohlstadter], the timestamp values should be in UTC.

[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2018-08-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21187: - Description: This is to track adding the remaining type support in Arrow Converters.

[jira] [Resolved] (SPARK-23555) Add BinaryType support for Arrow in PySpark

2018-08-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-23555. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20725

[jira] [Assigned] (SPARK-23555) Add BinaryType support for Arrow in PySpark

2018-08-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-23555: Assignee: Bryan Cutler > Add BinaryType support for Arrow in PySpark >

[jira] [Resolved] (SPARK-23874) Upgrade apache/arrow to 0.10.0

2018-08-14 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-23874. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21939

[jira] [Updated] (SPARK-23874) Upgrade apache/arrow to 0.10.0

2018-08-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23874: - Description: Version 0.10.0 will allow for the following improvements and bug fixes: * Allow

[jira] [Updated] (SPARK-23874) Upgrade apache/arrow to 0.10.0

2018-08-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23874: - Description: Version 0.10.0 will allow for the following improvements and bug fixes: * Allow

[jira] [Comment Edited] (SPARK-25060) PySpark UDF in case statement is always run

2018-08-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573866#comment-16573866 ] Bryan Cutler edited comment on SPARK-25060 at 8/8/18 9:03 PM: -- I believe

[jira] [Commented] (SPARK-25060) PySpark UDF in case statement is always run

2018-08-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573866#comment-16573866 ] Bryan Cutler commented on SPARK-25060: -- I believe this was brought up here

[jira] [Resolved] (SPARK-24976) Allow None for Decimal type conversion (specific to PyArrow 0.9.0)

2018-07-31 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-24976. -- Resolution: Fixed Fix Version/s: 2.3.2 2.4.0 Issue resolved by pull

[jira] [Assigned] (SPARK-24976) Allow None for Decimal type conversion (specific to PyArrow 0.9.0)

2018-07-31 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-24976: Assignee: Hyukjin Kwon > Allow None for Decimal type conversion (specific to PyArrow

[jira] [Commented] (SPARK-24915) Calling SparkSession.createDataFrame with schema can throw exception

2018-07-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556126#comment-16556126 ] Bryan Cutler commented on SPARK-24915: -- Hi [~stspencer], I've been trying to fix similar issues, but

[jira] [Commented] (SPARK-24644) Pyarrow exception while running pandas_udf on pyspark 2.3.1

2018-07-16 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545589#comment-16545589 ] Bryan Cutler commented on SPARK-24644: -- [~helkhalfi], the error in the stack trace is coming from

[jira] [Commented] (SPARK-23874) Upgrade apache/arrow to 0.10.0

2018-07-16 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545519#comment-16545519 ] Bryan Cutler commented on SPARK-23874: -- [~smilegator], we are aiming to have the Arrow 0.10.0

[jira] [Commented] (SPARK-24632) Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence

2018-07-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544703#comment-16544703 ] Bryan Cutler commented on SPARK-24632: -- Hi [~josephkb], would you mind clarifying why there needs

[jira] [Commented] (SPARK-24760) Pandas UDF does not handle NaN correctly

2018-07-12 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541967#comment-16541967 ] Bryan Cutler commented on SPARK-24760: -- Yeah, createDataFrame is inconsistent with pandas_udf here,

[jira] [Comment Edited] (SPARK-24760) Pandas UDF does not handle NaN correctly

2018-07-10 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538924#comment-16538924 ] Bryan Cutler edited comment on SPARK-24760 at 7/10/18 4:48 PM: --- Pandas

[jira] [Commented] (SPARK-24760) Pandas UDF does not handle NaN correctly

2018-07-10 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538924#comment-16538924 ] Bryan Cutler commented on SPARK-24760: -- Pandas uses NaNs as a special value that it interprets as a

[jira] [Resolved] (SPARK-24760) Pandas UDF does not handle NaN correctly

2018-07-09 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-24760. -- Resolution: Not A Problem > Pandas UDF does not handle NaN correctly >

[jira] [Commented] (SPARK-24760) Pandas UDF does not handle NaN correctly

2018-07-09 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537893#comment-16537893 ] Bryan Cutler commented on SPARK-24760: -- Pandas interprets NaN to be missing data for numeric values

[jira] [Created] (SPARK-24735) Improve exception when mixing pandas_udf types

2018-07-03 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-24735: Summary: Improve exception when mixing pandas_udf types Key: SPARK-24735 URL: https://issues.apache.org/jira/browse/SPARK-24735 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-24439) Add distanceMeasure to BisectingKMeans in PySpark

2018-06-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-24439. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21557

[jira] [Assigned] (SPARK-24439) Add distanceMeasure to BisectingKMeans in PySpark

2018-06-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-24439: Assignee: Huaxin Gao > Add distanceMeasure to BisectingKMeans in PySpark >

[jira] [Commented] (SPARK-23858) Need to apply pyarrow adjustments to complex types with DateType/TimestampType

2018-06-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522916#comment-16522916 ] Bryan Cutler commented on SPARK-23858: -- [~semanticbeeng] sorry, there aren't failing tests I can

[jira] [Commented] (SPARK-24579) SPIP: Standardize Optimized Data Exchange between Spark and DL/AI frameworks

2018-06-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522869#comment-16522869 ] Bryan Cutler commented on SPARK-24579: -- I left some comments on the shared doc, overall sounds

[jira] [Updated] (SPARK-23874) Upgrade apache/arrow to 0.10.0

2018-06-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23874: - Description: Version 0.10.0 will allow for the following improvements and bug fixes: * Allow

[jira] [Commented] (SPARK-24554) Add MapType Support for Arrow in PySpark

2018-06-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511691#comment-16511691 ] Bryan Cutler commented on SPARK-24554: -- There still is work to be done to add a Map logical type to

[jira] [Created] (SPARK-24554) Add MapType Support for Arrow in PySpark

2018-06-13 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-24554: Summary: Add MapType Support for Arrow in PySpark Key: SPARK-24554 URL: https://issues.apache.org/jira/browse/SPARK-24554 Project: Spark Issue Type:

[jira] [Updated] (SPARK-24444) Improve pandas_udf GROUPED_MAP docs to explain column assignment

2018-05-31 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-24444: - Target Version/s: 2.3.1, 2.4.0 (was: 2.3.1) > Improve pandas_udf GROUPED_MAP docs to explain

[jira] [Commented] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2018-05-31 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497244#comment-16497244 ] Bryan Cutler commented on SPARK-21187: -- Hi [~teddy.choi], MapType still needs some work to be done

[jira] [Created] (SPARK-24444) Improve pandas_udf GROUPED_MAP docs to explain column assignment

2018-05-31 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-24444: Summary: Improve pandas_udf GROUPED_MAP docs to explain column assignment Key: SPARK-24444 URL: https://issues.apache.org/jira/browse/SPARK-24444 Project: Spark

[jira] [Updated] (SPARK-23874) Upgrade apache/arrow to 0.10.0

2018-05-31 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23874: - Description: Version 0.10.0 will allow for the following improvements and bug fixes: * Allow

[jira] [Resolved] (SPARK-23161) Add missing APIs to Python GBTClassifier

2018-05-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-23161. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21413

[jira] [Assigned] (SPARK-23161) Add missing APIs to Python GBTClassifier

2018-05-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-23161: Assignee: Huaxin Gao > Add missing APIs to Python GBTClassifier >

[jira] [Updated] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-24392: - Fix Version/s: 2.4.0 > Mark pandas_udf as Experimental > --- > >

[jira] [Comment Edited] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491305#comment-16491305 ] Bryan Cutler edited comment on SPARK-24392 at 5/25/18 9:53 PM: --- Targeting

[jira] [Commented] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491305#comment-16491305 ] Bryan Cutler commented on SPARK-24392: -- Targeting 2.3.1 > Mark pandas_udf as Experimental >

[jira] [Updated] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-24392: - Fix Version/s: 2.3.1 > Mark pandas_udf as Experimental > --- > >

[jira] [Updated] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-24392: - Priority: Blocker (was: Critical) > Mark pandas_udf as Experimental >

[jira] [Created] (SPARK-24392) Mark pandas_udf as Experimental

2018-05-25 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-24392: Summary: Mark pandas_udf as Experimental Key: SPARK-24392 URL: https://issues.apache.org/jira/browse/SPARK-24392 Project: Spark Issue Type: Task

[jira] [Updated] (SPARK-24324) Pandas Grouped Map UserDefinedFunction mixes column labels

2018-05-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-24324: - Summary: Pandas Grouped Map UserDefinedFunction mixes column labels (was: UserDefinedFunction

[jira] [Commented] (SPARK-24324) UserDefinedFunction mixes column labels

2018-05-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489533#comment-16489533 ] Bryan Cutler commented on SPARK-24324: -- I was able to reproduce, the problem is that when pyspark

[jira] [Created] (SPARK-24319) run-example can not print usage

2018-05-18 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-24319: Summary: run-example can not print usage Key: SPARK-24319 URL: https://issues.apache.org/jira/browse/SPARK-24319 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-24303) Update cloudpickle to v0.4.4

2018-05-18 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-24303. -- Resolution: Fixed Fix Version/s: 2.4.0 > Update cloudpickle to v0.4.4 >

[jira] [Commented] (SPARK-24303) Update cloudpickle to v0.4.4

2018-05-18 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480912#comment-16480912 ] Bryan Cutler commented on SPARK-24303: -- Issue resolved by pull request 21350

[jira] [Assigned] (SPARK-24303) Update cloudpickle to v0.4.4

2018-05-18 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-24303: Assignee: Hyukjin Kwon > Update cloudpickle to v0.4.4 > > >

[jira] [Commented] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2018-05-14 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474664#comment-16474664 ] Bryan Cutler commented on SPARK-21187: -- Hi [~ewohlstadter], thanks for the interest!  The Map type

[jira] [Updated] (SPARK-23874) Upgrade apache/arrow to 0.10.0

2018-05-14 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23874: - Description: Version 0.10.0 will allow for the following improvements and bug fixes: * Allow

[jira] [Commented] (SPARK-22232) Row objects in pyspark created using the `Row(**kwargs)` syntax do not get serialized/deserialized properly

2018-05-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472714#comment-16472714 ] Bryan Cutler commented on SPARK-22232: -- I'm closing the PR for now, will reopen for Spark 3.0.0.

[jira] [Updated] (SPARK-23161) Add missing APIs to Python GBTClassifier

2018-05-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23161: - Description: GBTClassifier is missing {{featureSubsetStrategy}}. This should be moved to

[jira] [Resolved] (SPARK-24044) Explicitly print out skipped tests from unittest module

2018-04-26 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-24044. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21107

[jira] [Assigned] (SPARK-24044) Explicitly print out skipped tests from unittest module

2018-04-26 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-24044: Assignee: Hyukjin Kwon > Explicitly print out skipped tests from unittest module >

[jira] [Resolved] (SPARK-24057) put the real data type in the AssertionError message

2018-04-26 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-24057. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21159

[jira] [Assigned] (SPARK-24057) put the real data type in the AssertionError message

2018-04-26 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-24057: Assignee: Huaxin Gao > put the real data type in the AssertionError message >

[jira] [Commented] (SPARK-23874) Upgrade apache/arrow to 0.10.0

2018-04-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452666#comment-16452666 ] Bryan Cutler commented on SPARK-23874: -- [~smilegator] the Arrow community decided to put their

[jira] [Resolved] (SPARK-17508) Setting weightCol to None in ML library causes an error

2018-04-18 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-17508. -- Resolution: Won't Fix Resolving this for now unless there is more interest in the fix >

[jira] [Commented] (SPARK-23030) Decrease memory consumption with toPandas() collection using Arrow

2018-04-16 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439727#comment-16439727 ] Bryan Cutler commented on SPARK-23030: -- Hi [~icexelloss], I have something working, just need to

[jira] [Commented] (SPARK-23929) pandas_udf schema mapped by position and not by name

2018-04-09 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431014#comment-16431014 ] Bryan Cutler commented on SPARK-23929: -- cc [~icexelloss] > pandas_udf schema mapped by position and

[jira] [Commented] (SPARK-23883) Error with conversion to arrow while using pandas_udf

2018-04-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429123#comment-16429123 ] Bryan Cutler commented on SPARK-23883: -- I think the problem might be that since the {{pandas_udf}}

[jira] [Updated] (SPARK-23871) add python api for VectorAssembler handleInvalid

2018-04-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23871: - Component/s: PySpark > add python api for VectorAssembler handleInvalid >

[jira] [Assigned] (SPARK-23828) PySpark StringIndexerModel should have constructor from labels

2018-04-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-23828: Assignee: Huaxin Gao > PySpark StringIndexerModel should have constructor from labels >

[jira] [Resolved] (SPARK-23828) PySpark StringIndexerModel should have constructor from labels

2018-04-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-23828. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20968

[jira] [Commented] (SPARK-23874) Upgrade apache/arrow to 0.9.0

2018-04-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427712#comment-16427712 ] Bryan Cutler commented on SPARK-23874: -- I can work on this.  I wasn't able to recreate the linked

[jira] [Commented] (SPARK-23836) Support returning StructType to the level support in GroupedMap Arrow's "scalar" UDFS (or similar)

2018-04-03 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424567#comment-16424567 ] Bryan Cutler commented on SPARK-23836: -- So does this mean returning a pandas.DataFrame in a scalar

[jira] [Updated] (SPARK-23858) Need to apply pyarrow adjustments to complex types with DateType/TimestampType

2018-04-03 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23858: - Summary: Need to apply pyarrow adjustments to complex types with DateType/TimestampType (was:

[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2018-04-03 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21187: - Description: This is to track adding the remaining type support in Arrow Converters. Currently,

[jira] [Created] (SPARK-23858) Need to apply adjustments to complex types with DateType/TimestampType

2018-04-03 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-23858: Summary: Need to apply adjustments to complex types with DateType/TimestampType Key: SPARK-23858 URL: https://issues.apache.org/jira/browse/SPARK-23858 Project:

[jira] [Commented] (SPARK-23828) PySpark StringIndexerModel should have constructor from labels

2018-04-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422729#comment-16422729 ] Bryan Cutler commented on SPARK-23828: -- No I'm not working on it, please go ahead [~huaxingao],

[jira] [Created] (SPARK-23828) PySpark StringIndexerModel should have constructor from labels

2018-03-29 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-23828: Summary: PySpark StringIndexerModel should have constructor from labels Key: SPARK-23828 URL: https://issues.apache.org/jira/browse/SPARK-23828 Project: Spark

[jira] [Resolved] (SPARK-22711) _pickle.PicklingError: args[0] from __newobj__ args has the wrong class from cloudpickle.py

2018-03-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-22711. -- Resolution: Workaround Closing this because it seems wordnet is not serializable with

[jira] [Updated] (SPARK-23704) PySpark access of individual trees in random forest is slow

2018-03-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23704: - Component/s: PySpark > PySpark access of individual trees in random forest is slow >

[jira] [Resolved] (SPARK-23699) PySpark should raise same Error when Arrow fallback is disabled

2018-03-27 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-23699. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20839

[jira] [Assigned] (SPARK-23699) PySpark should raise same Error when Arrow fallback is disabled

2018-03-27 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-23699: Assignee: Bryan Cutler > PySpark should raise same Error when Arrow fallback is disabled

[jira] [Resolved] (SPARK-23162) PySpark ML LinearRegressionSummary missing r2adj

2018-03-26 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-23162. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20842

[jira] [Assigned] (SPARK-23162) PySpark ML LinearRegressionSummary missing r2adj

2018-03-26 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-23162: Assignee: kevin yu > PySpark ML LinearRegressionSummary missing r2adj >

[jira] [Resolved] (SPARK-23615) Add maxDF Parameter to Python CountVectorizer

2018-03-23 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-23615. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20777

[jira] [Assigned] (SPARK-23615) Add maxDF Parameter to Python CountVectorizer

2018-03-23 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-23615: Assignee: Huaxin Gao > Add maxDF Parameter to Python CountVectorizer >

[jira] [Resolved] (SPARK-23234) ML python test failure due to default outputCol

2018-03-21 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-23234. -- Resolution: Duplicate > ML python test failure due to default outputCol >

[jira] [Commented] (SPARK-23244) Incorrect handling of default values when deserializing python wrappers of scala transformers

2018-03-21 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407510#comment-16407510 ] Bryan Cutler commented on SPARK-23244: -- Just to clarify, the PySpark save/load is just a wrapper

[jira] [Resolved] (SPARK-23244) Incorrect handling of default values when deserializing python wrappers of scala transformers

2018-03-21 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-23244. -- Resolution: Duplicate > Incorrect handling of default values when deserializing python

[jira] [Commented] (SPARK-23244) Incorrect handling of default values when deserializing python wrappers of scala transformers

2018-03-21 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407507#comment-16407507 ] Bryan Cutler commented on SPARK-23244: -- I looked into this and it is a little bit different because

[jira] [Commented] (SPARK-23691) Use sql_conf util in PySpark tests where possible

2018-03-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405806#comment-16405806 ] Bryan Cutler commented on SPARK-23691: -- Thanks [~hyukjin.kwon]! > Use sql_conf util in PySpark

[jira] [Resolved] (SPARK-23691) Use sql_conf util in PySpark tests where possible

2018-03-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-23691. -- Resolution: Fixed Assignee: Hyukjin Kwon Fix Version/s: 2.4.0 Issue resolved

[jira] [Created] (SPARK-23700) Cleanup unused imports

2018-03-15 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-23700: Summary: Cleanup unused imports Key: SPARK-23700 URL: https://issues.apache.org/jira/browse/SPARK-23700 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-23699) PySpark should raise same Error when Arrow fallback is disabled

2018-03-15 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-23699: Summary: PySpark should raise same Error when Arrow fallback is disabled Key: SPARK-23699 URL: https://issues.apache.org/jira/browse/SPARK-23699 Project: Spark

[jira] [Commented] (SPARK-23615) Add maxDF Parameter to Python CountVectorizer

2018-03-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391577#comment-16391577 ] Bryan Cutler commented on SPARK-23615: -- Sure, go ahead > Add maxDF Parameter to Python
