[jira] [Updated] (SPARK-29376) Upgrade Apache Arrow to 0.15.1

2019-11-08 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-29376: - Summary: Upgrade Apache Arrow to 0.15.1 (was: Upgrade Apache Arrow to 0.15.0)

[jira] [Commented] (SPARK-29803) remove all instances of 'from __future__ import print_function'

2019-11-08 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970522#comment-16970522 ] Bryan Cutler commented on SPARK-29803: -- This should be done once Python 2 support is dropped

[jira] [Commented] (SPARK-28502) Error with struct conversion while using pandas_udf

2019-11-06 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968744#comment-16968744 ] Bryan Cutler commented on SPARK-28502: -- Ahh, so Arrow 0.15.0+ had a change in the IPC format that
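The comment is cut off here, but the IPC change in question is the encapsulated-message framing introduced in Arrow 0.15.0. Each message is now preceded by a 4-byte continuation marker (0xFFFFFFFF) ahead of the 32-bit little-endian metadata length, where earlier writers emitted only the length. A reader that predates the change cannot interpret the marker, which is why pyarrow 0.15+ paired with the older Arrow Java bundled in Spark 2.4 breaks. The stdlib-only sketch below compares the two prefixes; `frame_prefix` and `read_metadata_len` are hypothetical helpers, not pyarrow APIs.

```python
import struct

CONTINUATION = 0xFFFFFFFF  # marker that Arrow 0.15.0+ writers place before each message

def frame_prefix(metadata_len, legacy):
    """Bytes a writer emits ahead of a message's flatbuffer metadata (hypothetical helper)."""
    length = struct.pack("<i", metadata_len)  # int32, little-endian
    if legacy:
        return length  # pre-0.15 framing: length only
    return struct.pack("<I", CONTINUATION) + length

def read_metadata_len(buf):
    """Read the metadata length from either framing (a reader that accepts both)."""
    (first,) = struct.unpack_from("<i", buf, 0)
    if first == -1:  # 0xFFFFFFFF read as a signed int32 is the continuation marker
        (first,) = struct.unpack_from("<i", buf, 4)
    return first
```

For Spark 2.3.x and 2.4.x, the Spark documentation describes setting the environment variable `ARROW_PRE_0_15_IPC_FORMAT=1` (for example in `conf/spark-env.sh`) so that pyarrow 0.15+ keeps writing the legacy framing.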

[jira] [Commented] (SPARK-28502) Error with struct conversion while using pandas_udf

2019-11-05 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967973#comment-16967973 ] Bryan Cutler commented on SPARK-28502: -- That's strange, I added your example as a unit test in

[jira] [Commented] (SPARK-29691) Estimator fit method fails to copy params (in PySpark)

2019-11-05 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967748#comment-16967748 ] Bryan Cutler commented on SPARK-29691: -- [~JohnHBauer] I'm not sure we should extend the API to

[jira] [Updated] (SPARK-29748) Remove sorting of fields in PySpark SQL Row creation

2019-11-04 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-29748: - Description: Currently, when a PySpark Row is created with keyword arguments, the fields are
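Before Spark 3.0, a `Row` built from keyword arguments stored its fields sorted alphabetically by name rather than in call order (the same behavior behind SPARK-27939), so applying a schema whose field order differs could later fail. The sketch below imitates both orderings with a `namedtuple`; `make_row` and its `sort_fields` flag are hypothetical and not part of `pyspark.sql`.

```python
from collections import namedtuple

def make_row(sort_fields, **kwargs):
    """Hypothetical helper: build a Row-like tuple from keyword arguments.

    On Python 3.6+ keyword arguments keep their call order, so the only
    reason the fields end up alphabetical is an explicit sort.
    """
    names = sorted(kwargs) if sort_fields else list(kwargs)
    Row = namedtuple("Row", names)
    return Row(*(kwargs[n] for n in names))

# Spark 2.x-style (sorted) vs call-order field layout
old_style = make_row(True, name="Alice", age=11)   # fields: ('age', 'name')
new_style = make_row(False, name="Alice", age=11)  # fields: ('name', 'age')
```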

[jira] [Created] (SPARK-29748) Remove sorting of fields in PySpark SQL Row creation

2019-11-04 Thread Bryan Cutler (Jira)
Bryan Cutler created SPARK-29748: Summary: Remove sorting of fields in PySpark SQL Row creation Key: SPARK-29748 URL: https://issues.apache.org/jira/browse/SPARK-29748 Project: Spark Issue

[jira] [Resolved] (SPARK-29414) HasOutputCol param isSet() property is not preserved after persistence

2019-10-25 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-29414. -- Fix Version/s: 2.4.4 Resolution: Fixed Thanks [~borys.biletskyy], I'll mark this as

[jira] [Resolved] (SPARK-29464) PySpark ML should expose Params.clear() to unset a user supplied Param

2019-10-17 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-29464. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26130

[jira] [Assigned] (SPARK-29464) PySpark ML should expose Params.clear() to unset a user supplied Param

2019-10-17 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-29464: Assignee: Huaxin Gao

[jira] [Reopened] (SPARK-24554) Add MapType Support for Arrow in PySpark

2019-10-16 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reopened SPARK-24554: -- Reopening this to be completed in 2 steps, first Java after Arrow 0.15.0 and then pyspark when

[jira] [Created] (SPARK-29493) Add MapType support for Arrow Java

2019-10-16 Thread Bryan Cutler (Jira)
Bryan Cutler created SPARK-29493: Summary: Add MapType support for Arrow Java Key: SPARK-29493 URL: https://issues.apache.org/jira/browse/SPARK-29493 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-29464) PySpark ML should expose Params.clear() to unset a user supplied Param

2019-10-14 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-29464: - Description: PySpark ML currently has a private {{_clear()}} method that will unset a param.
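PySpark ML keeps user-supplied values (`_paramMap`) separate from defaults (`_defaultParamMap`), so unsetting a param means dropping only the user-supplied entry: afterwards `isSet()` is false while the default still applies. The hypothetical `SimpleParams` class below is a minimal sketch of that two-map design, not the real `pyspark.ml.param.Params`.

```python
class SimpleParams:
    """Hypothetical stand-in for the two-map design used by PySpark ML Params."""

    def __init__(self, **defaults):
        self._default_param_map = dict(defaults)  # defaults set by the class
        self._param_map = {}                      # values the user supplied

    def set(self, name, value):
        self._param_map[name] = value

    def is_set(self, name):
        return name in self._param_map

    def get_or_default(self, name):
        if name in self._param_map:
            return self._param_map[name]
        return self._default_param_map[name]

    def clear(self, name):
        # Remove only the user-supplied value; any default remains in effect.
        self._param_map.pop(name, None)


p = SimpleParams(maxIter=100)
p.set("maxIter", 5)
p.clear("maxIter")  # back to the default of 100
```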

[jira] [Created] (SPARK-29464) PySpark ML should expose Params.clear() to unset a user supplied Param

2019-10-14 Thread Bryan Cutler (Jira)
Bryan Cutler created SPARK-29464: Summary: PySpark ML should expose Params.clear() to unset a user supplied Param Key: SPARK-29464 URL: https://issues.apache.org/jira/browse/SPARK-29464 Project:

[jira] [Resolved] (SPARK-29428) Can't persist/set None-valued param

2019-10-14 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-29428. -- Resolution: Not A Problem

[jira] [Commented] (SPARK-29428) Can't persist/set None-valued param

2019-10-14 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951212#comment-16951212 ] Bryan Cutler commented on SPARK-29428: -- The usage of {{None}} in pyspark ml is a bit confusing in

[jira] [Assigned] (SPARK-29402) Add tests for grouped map pandas_udf using window

2019-10-11 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-29402: Assignee: Bryan Cutler

[jira] [Resolved] (SPARK-29402) Add tests for grouped map pandas_udf using window

2019-10-11 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-29402. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26063

[jira] [Commented] (SPARK-28502) Error with struct conversion while using pandas_udf

2019-10-10 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948981#comment-16948981 ] Bryan Cutler commented on SPARK-28502: -- Thanks for testing it out [~nasirali]! It's unlikely that

[jira] [Resolved] (SPARK-28502) Error with struct conversion while using pandas_udf

2019-10-08 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-28502. -- Fix Version/s: 3.0.0 Resolution: Fixed This was fixed once support for StructType was

[jira] [Commented] (SPARK-28502) Error with struct conversion while using pandas_udf

2019-10-08 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947272#comment-16947272 ] Bryan Cutler commented on SPARK-28502: -- I'm closing this since it is working in master and will

[jira] [Commented] (SPARK-29402) Add tests for grouped map pandas_udf using window

2019-10-08 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947266#comment-16947266 ] Bryan Cutler commented on SPARK-29402: -- This is related to SPARK-28502 that using grouped map

[jira] [Created] (SPARK-29402) Add tests for grouped map pandas_udf using window

2019-10-08 Thread Bryan Cutler (Jira)
Bryan Cutler created SPARK-29402: Summary: Add tests for grouped map pandas_udf using window Key: SPARK-29402 URL: https://issues.apache.org/jira/browse/SPARK-29402 Project: Spark Issue

[jira] [Created] (SPARK-29376) Upgrade Apache Arrow to 0.15.0

2019-10-07 Thread Bryan Cutler (Jira)
Bryan Cutler created SPARK-29376: Summary: Upgrade Apache Arrow to 0.15.0 Key: SPARK-29376 URL: https://issues.apache.org/jira/browse/SPARK-29376 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-29367) pandas udf not working with latest pyarrow release (0.15.0)

2019-10-07 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-29367: - Issue Type: Documentation (was: Bug)

[jira] [Commented] (SPARK-29367) pandas udf not working with latest pyarrow release (0.15.0)

2019-10-07 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946093#comment-16946093 ] Bryan Cutler commented on SPARK-29367: -- There was a change in the Arrow IPC format, but you can

[jira] [Assigned] (SPARK-29367) pandas udf not working with latest pyarrow release (0.15.0)

2019-10-07 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-29367: Assignee: Bryan Cutler

[jira] [Commented] (SPARK-28502) Error with struct conversion while using pandas_udf

2019-09-24 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937233#comment-16937233 ] Bryan Cutler commented on SPARK-28502: -- I was able to reproduce in Spark 2.4.3. The problem was

[jira] [Created] (SPARK-29126) Add usage guide for cogroup Pandas UDF

2019-09-17 Thread Bryan Cutler (Jira)
Bryan Cutler created SPARK-29126: Summary: Add usage guide for cogroup Pandas UDF Key: SPARK-29126 URL: https://issues.apache.org/jira/browse/SPARK-29126 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-27463) Support Dataframe Cogroup via Pandas UDFs

2019-09-17 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-27463. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 24981

[jira] [Assigned] (SPARK-27463) Support Dataframe Cogroup via Pandas UDFs

2019-09-17 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-27463: Assignee: Chris Martin

[jira] [Created] (SPARK-29040) Support pyspark.createDataFrame from a pyarrow.Table

2019-09-10 Thread Bryan Cutler (Jira)
Bryan Cutler created SPARK-29040: Summary: Support pyspark.createDataFrame from a pyarrow.Table Key: SPARK-29040 URL: https://issues.apache.org/jira/browse/SPARK-29040 Project: Spark Issue

[jira] [Resolved] (SPARK-28858) add tree-based transformation in the py side

2019-08-23 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-28858. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25566

[jira] [Assigned] (SPARK-28858) add tree-based transformation in the py side

2019-08-23 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-28858: Assignee: zhengruifeng

[jira] [Resolved] (SPARK-28482) Data incomplete when using pandas udf in Python 3

2019-08-23 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-28482. -- Resolution: Not A Problem No problem [~jiangyu1211] ! I will resolve this then. In general, I

[jira] [Commented] (SPARK-28482) Data incomplete when using pandas udf in Python 3

2019-08-22 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913611#comment-16913611 ] Bryan Cutler commented on SPARK-28482: -- I'm not really sure what you are doing above; are you

[jira] [Commented] (SPARK-28482) Data incomplete when using pandas udf in Python 3

2019-08-21 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912763#comment-16912763 ] Bryan Cutler commented on SPARK-28482: -- [~jiangyu1211] I was not able to reproduce. I tried Spark

[jira] [Commented] (SPARK-28502) Error with struct conversion while using pandas_udf

2019-07-26 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16894025#comment-16894025 ] Bryan Cutler commented on SPARK-28502: -- I'm not sure, but I don't think you can use the same syntax

[jira] [Commented] (SPARK-28264) Revisiting Python / pandas UDF

2019-07-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893209#comment-16893209 ] Bryan Cutler commented on SPARK-28264: -- It's great to be taking another look at this; I think some

[jira] [Commented] (SPARK-28269) Pandas Grouped Map UDF can get deadlocked

2019-07-16 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886400#comment-16886400 ] Bryan Cutler commented on SPARK-28269: -- cc [~icexelloss]

[jira] [Updated] (SPARK-28269) Pandas Grouped Map UDF can get deadlocked

2019-07-16 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-28269: - Summary: Pandas Grouped Map UDF can get deadlocked (was: ArrowStreamPandasSerializer get

[jira] [Resolved] (SPARK-27992) PySpark socket server should sync with JVM connection thread future

2019-06-26 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-27992. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24834

[jira] [Assigned] (SPARK-27992) PySpark socket server should sync with JVM connection thread future

2019-06-26 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-27992: Assignee: Bryan Cutler

[jira] [Resolved] (SPARK-28003) spark.createDataFrame with Arrow doesn't work with pandas.NaT

2019-06-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-28003. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24844

[jira] [Assigned] (SPARK-28003) spark.createDataFrame with Arrow doesn't work with pandas.NaT

2019-06-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-28003: Assignee: Li Jin

[jira] [Updated] (SPARK-27992) PySpark socket server should sync with JVM connection thread future

2019-06-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27992: - Description: Both SPARK-27805 and SPARK-27548 identified an issue that errors in a Spark job
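The archived description is cut off, but the title and the two linked issues (`toPandas` not propagating SparkExceptions, `toLocalIterator` not raising worker errors) point at one pattern: once the reader has drained the stream, it should wait on the serving thread's future so a failure surfaces as an exception instead of a silently truncated result. The pure-Python analogy below uses a `queue.Queue` in place of the socket; `serve`, `collect`, and the sentinel are hypothetical names, and this is not Spark's implementation.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of the stream

def serve(out_q, n_items, fail_at=None):
    """Hypothetical serving thread: streams items and may fail partway through."""
    try:
        for i in range(n_items):
            if i == fail_at:
                raise RuntimeError(f"job failed at item {i}")
            out_q.put(i)
    finally:
        out_q.put(_DONE)  # always unblock the reader, even after an error

def collect(n_items, fail_at=None):
    """Drain the stream, then sync with the serving thread's future."""
    out_q = queue.Queue()
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(serve, out_q, n_items, fail_at)
        items = []
        while (item := out_q.get()) is not _DONE:
            items.append(item)
        # Without this call a failure would look like a short but "complete" result.
        future.result()  # re-raises any exception raised in serve()
    return items
```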

[jira] [Resolved] (SPARK-28132) Update document type conversion for Pandas UDFs (pyarrow 0.13.0, pandas 0.24.2, Python 3.7)

2019-06-21 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-28132. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24930

[jira] [Assigned] (SPARK-28132) Update document type conversion for Pandas UDFs (pyarrow 0.13.0, pandas 0.24.2, Python 3.7)

2019-06-21 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-28132: Assignee: Hyukjin Kwon

[jira] [Assigned] (SPARK-28131) Update document type conversion between Python data and SQL types in normal UDFs (Python 3.7)

2019-06-21 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-28131: Assignee: Hyukjin Kwon

[jira] [Resolved] (SPARK-28131) Update document type conversion between Python data and SQL types in normal UDFs (Python 3.7)

2019-06-21 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-28131. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24929

[jira] [Created] (SPARK-28128) Pandas Grouped UDFs should skip over empty partitions

2019-06-20 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-28128: Summary: Pandas Grouped UDFs should skip over empty partitions Key: SPARK-28128 URL: https://issues.apache.org/jira/browse/SPARK-28128 Project: Spark Issue

[jira] [Commented] (SPARK-28041) Increase the minimum pandas version to 0.23.2

2019-06-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863570#comment-16863570 ] Bryan Cutler commented on SPARK-28041: -- Yes, definitely. I made a quick PR, but we should run it

[jira] [Created] (SPARK-28041) Increase the minimum pandas version to 0.23.2

2019-06-13 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-28041: Summary: Increase the minimum pandas version to 0.23.2 Key: SPARK-28041 URL: https://issues.apache.org/jira/browse/SPARK-28041 Project: Spark Issue Type:

[jira] [Updated] (SPARK-27992) PySpark socket server should sync with JVM connection thread future

2019-06-10 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27992: - Affects Version/s: (was: 2.4.3) 3.0.0

[jira] [Updated] (SPARK-27992) PySpark socket server should sync with JVM connection thread future

2019-06-10 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27992: - Description: Both SPARK-27805 and SPARK-27548 identified an issue that errors in a Spark job

[jira] [Updated] (SPARK-27992) PySpark socket server should sync with JVM connection thread future

2019-06-10 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27992: - Environment: (was: Both SPARK-27805 and SPARK-27548 identified an issue that errors in a

[jira] [Created] (SPARK-27992) PySpark socket server should sync with JVM connection thread future

2019-06-10 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-27992: Summary: PySpark socket server should sync with JVM connection thread future Key: SPARK-27992 URL: https://issues.apache.org/jira/browse/SPARK-27992 Project: Spark

[jira] [Resolved] (SPARK-27939) Defining a schema with VectorUDT

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-27939. -- Resolution: Not A Problem

[jira] [Comment Edited] (SPARK-27939) Defining a schema with VectorUDT

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855969#comment-16855969 ] Bryan Cutler edited comment on SPARK-27939 at 6/4/19 6:13 PM: -- Linked to a

[jira] [Commented] (SPARK-27939) Defining a schema with VectorUDT

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855969#comment-16855969 ] Bryan Cutler commented on SPARK-27939: -- Another problem with Python {{Row}} class

[jira] [Comment Edited] (SPARK-27939) Defining a schema with VectorUDT

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855966#comment-16855966 ] Bryan Cutler edited comment on SPARK-27939 at 6/4/19 6:11 PM: -- The problem

[jira] [Commented] (SPARK-27939) Defining a schema with VectorUDT

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855966#comment-16855966 ] Bryan Cutler commented on SPARK-27939: -- The problem is the {{Row}} class sorts the field names

[jira] [Updated] (SPARK-27805) toPandas does not propagate SparkExceptions with arrow enabled

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27805: - Affects Version/s: (was: 3.1.0) 2.4.3

[jira] [Resolved] (SPARK-27805) toPandas does not propagate SparkExceptions with arrow enabled

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-27805. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24677

[jira] [Assigned] (SPARK-27805) toPandas does not propagate SparkExceptions with arrow enabled

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-27805: Assignee: David Vogelbacher

[jira] [Resolved] (SPARK-27712) createDataFrame() reorders row

2019-05-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-27712. -- Resolution: Duplicate

[jira] [Commented] (SPARK-27463) SPIP: Support Dataframe Cogroup via Pandas UDFs

2019-05-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842335#comment-16842335 ] Bryan Cutler commented on SPARK-27463: -- [~d80tb7] I think you could remove the SPIP label from this

[jira] [Resolved] (SPARK-27660) Allow PySpark toLocalIterator to pre-fetch data

2019-05-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-27660. -- Resolution: Duplicate Somehow this issue got created twice

[jira] [Updated] (SPARK-27659) Allow PySpark toLocalIterator to prefetch data

2019-05-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27659: - Summary: Allow PySpark toLocalIterator to prefetch data (was: Allow PySpark toLocalIterator to
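Prefetching here means computing the next partition in the background while the caller is still iterating over the current one, trading extra driver memory (roughly two partitions held at once) for less waiting between partitions. Spark 3.0 exposes this through a `prefetchPartitions` argument to `toLocalIterator`. The pure-Python sketch below shows the idea with a single background worker; `local_iterator` and the `fetch_partition` callable are hypothetical stand-ins for Spark's job submission and socket transfer.

```python
from concurrent.futures import ThreadPoolExecutor

def local_iterator(fetch_partition, num_partitions, prefetch=True):
    """Yield rows partition by partition.

    `fetch_partition(i)` is a hypothetical callable that returns the rows of
    partition i (standing in for running a job and transferring its results).
    """
    if not prefetch:
        for i in range(num_partitions):
            yield from fetch_partition(i)
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_partition, 0) if num_partitions else None
        for i in range(num_partitions):
            rows = pending.result()
            # Start fetching the next partition before the caller consumes this one.
            if i + 1 < num_partitions:
                pending = pool.submit(fetch_partition, i + 1)
            yield from rows
```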

[jira] [Closed] (SPARK-27660) Allow PySpark toLocalIterator to pre-fetch data

2019-05-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler closed SPARK-27660.

[jira] [Created] (SPARK-27660) Allow PySpark toLocalIterator to pre-fetch data

2019-05-08 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-27660: Summary: Allow PySpark toLocalIterator to pre-fetch data Key: SPARK-27660 URL: https://issues.apache.org/jira/browse/SPARK-27660 Project: Spark Issue Type:

[jira] [Created] (SPARK-27659) Allow PySpark toLocalIterator to pre-fetch data

2019-05-08 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-27659: Summary: Allow PySpark toLocalIterator to pre-fetch data Key: SPARK-27659 URL: https://issues.apache.org/jira/browse/SPARK-27659 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-27548) PySpark toLocalIterator does not raise errors from worker

2019-05-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-27548. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24070

[jira] [Assigned] (SPARK-23961) pyspark toLocalIterator throws an exception

2019-05-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-23961: Assignee: Bryan Cutler

[jira] [Assigned] (SPARK-27548) PySpark toLocalIterator does not raise errors from worker

2019-05-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-27548: Assignee: Bryan Cutler

[jira] [Resolved] (SPARK-23961) pyspark toLocalIterator throws an exception

2019-05-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-23961. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24070

[jira] [Issue Comment Deleted] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

2019-05-03 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27612: - Comment: was deleted (was: Thanks for checking this out [~viirya] and [~hyukjin.kwon]. I agree

[jira] [Commented] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

2019-05-03 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832712#comment-16832712 ] Bryan Cutler commented on SPARK-27612: -- Thanks for checking this out [~viirya] and [~hyukjin.kwon].

[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-05-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832136#comment-16832136 ] Bryan Cutler commented on SPARK-27396: -- The revisions sound good to me; it has a little more focus

[jira] [Commented] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

2019-05-01 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831098#comment-16831098 ] Bryan Cutler commented on SPARK-27612: -- Also cc [~viirya] [~hyukjin.kwon], this is a little

[jira] [Updated] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

2019-05-01 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27612: - Description: This seems to only affect Python 3. When creating a DataFrame with type

[jira] [Commented] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

2019-05-01 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831092#comment-16831092 ] Bryan Cutler commented on SPARK-27612: -- Thanks [~mgaido], it seems like the problem does not happen

[jira] [Commented] (SPARK-27519) Pandas udf corrupting data

2019-04-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830755#comment-16830755 ] Bryan Cutler commented on SPARK-27519: -- I made SPARK-27612 for the problem with {{Row(value=[[None,

[jira] [Updated] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

2019-04-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27612: - Description: When creating a DataFrame with type {{ArrayType(IntegerType(), True)}} there ends

[jira] [Created] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

2019-04-30 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-27612: Summary: Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None Key: SPARK-27612 URL: https://issues.apache.org/jira/browse/SPARK-27612

[jira] [Resolved] (SPARK-27519) Pandas udf corrupting data

2019-04-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-27519. -- Resolution: Fixed Fix Version/s: 3.0.0 Problem does not happen when running the latest

[jira] [Comment Edited] (SPARK-27519) Pandas udf corrupting data

2019-04-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830743#comment-16830743 ] Bryan Cutler edited comment on SPARK-27519 at 4/30/19 10:49 PM: Problem

[jira] [Updated] (SPARK-27519) Pandas udf corrupting data

2019-04-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27519: - Affects Version/s: (was: 3.0.0)

[jira] [Commented] (SPARK-27519) Pandas udf corrupting data

2019-04-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830742#comment-16830742 ] Bryan Cutler commented on SPARK-27519: -- Thanks for the script [~f7faf8ba36], I was able to

[jira] [Commented] (SPARK-27463) SPIP: Support Dataframe Cogroup via Pandas UDFs

2019-04-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830535#comment-16830535 ] Bryan Cutler commented on SPARK-27463: -- I left some comments on the doc. Overall, I think it sounds

[jira] [Comment Edited] (SPARK-27548) PySpark toLocalIterator does not raise errors from worker

2019-04-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829861#comment-16829861 ] Bryan Cutler edited comment on SPARK-27548 at 4/30/19 12:46 AM: This is

[jira] [Commented] (SPARK-27548) PySpark toLocalIterator does not raise errors from worker

2019-04-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829861#comment-16829861 ] Bryan Cutler commented on SPARK-27548: -- This is not that easy to fix by itself. Since there is no

[jira] [Resolved] (SPARK-26970) Can't load PipelineModel that was created in Scala with Python due to missing Interaction transformer

2019-04-23 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-26970. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24426

[jira] [Created] (SPARK-27548) PySpark toLocalIterator does not raise errors from worker

2019-04-23 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-27548: Summary: PySpark toLocalIterator does not raise errors from worker Key: SPARK-27548 URL: https://issues.apache.org/jira/browse/SPARK-27548 Project: Spark

[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-04-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818649#comment-16818649 ] Bryan Cutler commented on SPARK-27396: -- Thanks for this [~revans2], overall I think the proposal

[jira] [Assigned] (SPARK-27387) Replace sqlutils assertPandasEqual with Pandas assert_frame_equal in tests

2019-04-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-27387: Assignee: Bryan Cutler > Replace sqlutils assertPandasEqual with Pandas

[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"

2019-04-09 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813896#comment-16813896 ] Bryan Cutler commented on SPARK-27389: -- Thanks [~shaneknapp] for the fix. I couldn't come up with

[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"

2019-04-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812877#comment-16812877 ] Bryan Cutler commented on SPARK-27389: -- [~shaneknapp], I had a couple of successful tests with

[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"

2019-04-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811355#comment-16811355 ] Bryan Cutler commented on SPARK-27389: -- From the stacktrace, it looks like it's getting this from
