[jira] [Commented] (SPARK-48220) Allow passing PyArrow Table to createDataFrame()

2024-05-17 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-48220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847448#comment-17847448 ] Ian Cook commented on SPARK-48220: -- [~gurwls223] the PR for this is ready for review:

[jira] [Updated] (SPARK-48302) Null values in map columns of PyArrow tables are replaced with empty lists

2024-05-16 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-48302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-48302: - Description: Because of a limitation in PyArrow, when PyArrow Tables are passed to

[jira] [Updated] (SPARK-48220) Allow passing PyArrow Table to createDataFrame()

2024-05-16 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-48220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-48220: - Fix Version/s: 4.0.0 > Allow passing PyArrow Table to createDataFrame() >

[jira] [Created] (SPARK-48302) Null values in map columns of PyArrow tables are replaced with empty lists

2024-05-16 Thread Ian Cook (Jira)
Ian Cook created SPARK-48302: Summary: Null values in map columns of PyArrow tables are replaced with empty lists Key: SPARK-48302 URL: https://issues.apache.org/jira/browse/SPARK-48302 Project: Spark

[jira] [Updated] (SPARK-48220) Allow passing PyArrow Table to createDataFrame()

2024-05-14 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-48220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-48220: - Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Allow passing PyArrow Table to

[jira] [Updated] (SPARK-47465) Remove experimental tag from toArrow() PySpark DataFrame method

2024-05-09 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47465: - Fix Version/s: 4.0.0 > Remove experimental tag from toArrow() PySpark DataFrame method >

[jira] [Updated] (SPARK-48220) Allow passing PyArrow Table to createDataFrame()

2024-05-09 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-48220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-48220: - Description: SPARK-47365 added support for returning a Spark DataFrame as a PyArrow Table. It would be

[jira] [Updated] (SPARK-47466) Add PySpark DataFrame method to return iterator of PyArrow RecordBatches

2024-05-09 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47466: - Description: As a follow-up to SPARK-47365: {{toArrow()}} is useful when the data is relatively small.

[jira] [Created] (SPARK-48220) Allow passing PyArrow Table to createDataFrame()

2024-05-09 Thread Ian Cook (Jira)
Ian Cook created SPARK-48220: Summary: Allow passing PyArrow Table to createDataFrame() Key: SPARK-48220 URL: https://issues.apache.org/jira/browse/SPARK-48220 Project: Spark Issue Type:

[jira] [Updated] (SPARK-47465) Remove experimental tag from toArrow() PySpark DataFrame method

2024-05-08 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47465: - Description: As a follow-up to SPARK-47365: What is needed to consider making the *toArrow()* PySpark

[jira] [Updated] (SPARK-47465) Remove experimental tag from toArrow() PySpark DataFrame method

2024-05-08 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47465: - Summary: Remove experimental tag from toArrow() PySpark DataFrame method (was: Remove experimental tag

[jira] [Updated] (SPARK-47466) Add PySpark DataFrame method to return iterator of PyArrow RecordBatches

2024-05-08 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47466: - Description: As a follow-up to SPARK-47365: *toArrow()* is useful when the data is relatively small.

[jira] [Updated] (SPARK-47365) Add toArrow() DataFrame method to PySpark

2024-05-08 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Summary: Add toArrow() DataFrame method to PySpark (was: Add toArrowTable() DataFrame method to

[jira] [Updated] (SPARK-47365) Add toArrowTable() DataFrame method to PySpark

2024-05-08 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Description: Over in the Apache Arrow community, we hear from a lot of users who want to return the

[jira] [Updated] (SPARK-47365) Add toArrowTable() DataFrame method to PySpark

2024-05-08 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Affects Version/s: 4.0.0 > Add toArrowTable() DataFrame method to PySpark >

[jira] [Resolved] (SPARK-47465) Remove experimental tag from toArrowTable() PySpark DataFrame method

2024-05-08 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook resolved SPARK-47465. -- Resolution: Duplicate This is now part of SPARK-47365. > Remove experimental tag from toArrowTable()

[jira] [Updated] (SPARK-47365) Add toArrowTable() DataFrame method to PySpark

2024-05-08 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Add toArrowTable() DataFrame

[jira] [Updated] (SPARK-47365) Add toArrowTable() DataFrame method to PySpark

2024-05-08 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Fix Version/s: 4.0.0 > Add toArrowTable() DataFrame method to PySpark >

[jira] [Updated] (SPARK-47365) Add toArrowTable() DataFrame method to PySpark

2024-05-07 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Description: Over in the Apache Arrow community, we hear from a lot of users who want to return the

[jira] [Updated] (SPARK-47365) Add toArrowTable() DataFrame method to PySpark

2024-05-07 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Summary: Add toArrowTable() DataFrame method to PySpark (was: Add _toArrowTable() DataFrame method to

[jira] [Updated] (SPARK-47465) Remove experimental tag from toArrowTable() PySpark DataFrame method

2024-05-07 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47465: - Description: As a follow-up to SPARK-47365: What is needed to consider making the *toArrowTable()*

[jira] [Updated] (SPARK-47466) Add PySpark DataFrame method to return iterator of PyArrow RecordBatches

2024-05-07 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47466: - Description: As a follow-up to SPARK-47365: *toArrowTable()* is useful when the data is relatively

[jira] [Updated] (SPARK-47465) Remove experimental tag from toArrowTable() PySpark DataFrame method

2024-05-07 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47465: - Summary: Remove experimental tag from toArrowTable() PySpark DataFrame method (was: Remove

[jira] [Updated] (SPARK-47365) Add _toArrowTable() DataFrame method to PySpark

2024-05-07 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Summary: Add _toArrowTable() DataFrame method to PySpark (was: Add _toArrow() DataFrame method to

[jira] [Updated] (SPARK-47365) Add _toArrowTable() DataFrame method to PySpark

2024-05-07 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Description: Over in the Apache Arrow community, we hear from a lot of users who want to return the

[jira] [Created] (SPARK-47466) Add PySpark DataFrame method to return iterator of PyArrow RecordBatches

2024-03-19 Thread Ian Cook (Jira)
Ian Cook created SPARK-47466: Summary: Add PySpark DataFrame method to return iterator of PyArrow RecordBatches Key: SPARK-47466 URL: https://issues.apache.org/jira/browse/SPARK-47466 Project: Spark

[jira] [Updated] (SPARK-47365) Add _toArrow() DataFrame method to PySpark

2024-03-19 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Component/s: Connect > Add _toArrow() DataFrame method to PySpark >

[jira] [Created] (SPARK-47465) Remove experimental tag from toArrow() PySpark DataFrame method

2024-03-19 Thread Ian Cook (Jira)
Ian Cook created SPARK-47465: Summary: Remove experimental tag from toArrow() PySpark DataFrame method Key: SPARK-47465 URL: https://issues.apache.org/jira/browse/SPARK-47465 Project: Spark

[jira] [Updated] (SPARK-47365) Add _toArrow() DataFrame method to PySpark

2024-03-19 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Description: Over in the Apache Arrow community, we hear from a lot of users who want to return the

[jira] [Updated] (SPARK-47365) Add _toArrow() DataFrame method to PySpark

2024-03-19 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Summary: Add _toArrow() DataFrame method to PySpark (was: Add toArrow() DataFrame method to PySpark)

[jira] [Updated] (SPARK-47365) Add toArrow() DataFrame method to PySpark

2024-03-12 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Component/s: SQL > Add toArrow() DataFrame method to PySpark >

[jira] [Updated] (SPARK-47365) Add toArrow() DataFrame method to PySpark

2024-03-12 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Description: Over in the Apache Arrow community, we hear from a lot of users who want to return the

[jira] [Updated] (SPARK-47365) Add toArrow() DataFrame method to PySpark

2024-03-12 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Summary: Add toArrow() DataFrame method to PySpark (was: Add toArrow() DataFrame method) > Add

[jira] [Comment Edited] (SPARK-47365) Add toArrow() DataFrame method

2024-03-12 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825769#comment-17825769 ] Ian Cook edited comment on SPARK-47365 at 3/12/24 8:04 PM: --- It looks like all

[jira] [Commented] (SPARK-47365) Add toArrow() DataFrame method

2024-03-12 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825769#comment-17825769 ] Ian Cook commented on SPARK-47365: -- It looks like all the pieces required to enable this already exist:

[jira] [Created] (SPARK-47365) Add toArrow() DataFrame method

2024-03-12 Thread Ian Cook (Jira)
Ian Cook created SPARK-47365: Summary: Add toArrow() DataFrame method Key: SPARK-47365 URL: https://issues.apache.org/jira/browse/SPARK-47365 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-27335) cannot collect() from Correlation.corr

2020-07-29 Thread Ian Cook (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167475#comment-17167475 ] Ian Cook commented on SPARK-27335: -- Regarding the workaround code that [~natalinobusa] posted above: In