[jira] [Resolved] (SPARK-36091) Support TimestampNTZ type in expression TimeWindow

2021-07-19 Thread Gengliang Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-36091. Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33341

[jira] [Assigned] (SPARK-36091) Support TimestampNTZ type in expression TimeWindow

2021-07-19 Thread Gengliang Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-36091: -- Assignee: jiaan.geng > Support TimestampNTZ type in expression TimeWindow >

[jira] [Assigned] (SPARK-35806) Mapping the `mode` argument to pandas in DataFrame.to_csv

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-35806: Assignee: Haejoon Lee > Mapping the `mode` argument to pandas in DataFrame.to_csv >

[jira] [Resolved] (SPARK-35806) Mapping the `mode` argument to pandas in DataFrame.to_csv

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35806. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33414

[jira] [Resolved] (SPARK-36205) Use set-env instead of set-output in GitHub Actions

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36205. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33412

[jira] [Assigned] (SPARK-36181) Update pyspark sql readwriter documentation to Scala level

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36181: Assignee: Dominik Gehl > Update pyspark sql readwriter documentation to Scala level >

[jira] [Resolved] (SPARK-36181) Update pyspark sql readwriter documentation to Scala level

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36181. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33394

[jira] [Resolved] (SPARK-36178) Document PySpark Catalog APIs in docs/source/reference/pyspark.sql.rst

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36178. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33392

[jira] [Assigned] (SPARK-36178) Document PySpark Catalog APIs in docs/source/reference/pyspark.sql.rst

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36178: Assignee: Dominik Gehl > Document PySpark Catalog APIs in

[jira] [Assigned] (SPARK-36205) Use set-env instead of set-output in GitHub Actions

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36205: Assignee: Hyukjin Kwon > Use set-env instead of set-output in GitHub Actions >

[jira] [Commented] (SPARK-36086) The case of the delta table is inconsistent with parquet

2021-07-19 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383195#comment-17383195 ] Wenchen Fan commented on SPARK-36086: - Seems we should improve the v2 describe table command to

[jira] [Assigned] (SPARK-34806) Helper class for batch Dataset.observe()

2021-07-19 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34806: --- Assignee: Enrico Minack > Helper class for batch Dataset.observe() >

[jira] [Resolved] (SPARK-34806) Helper class for batch Dataset.observe()

2021-07-19 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34806. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 31905

[jira] [Commented] (SPARK-24965) Spark SQL fails when reading a partitioned hive table with different formats per partition

2021-07-19 Thread tiejiang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383192#comment-17383192 ] tiejiang commented on SPARK-24965: -- I have a similar question, see the link, can anyone answer it,

[jira] [Assigned] (SPARK-36161) dropDuplicates does not type check argument

2021-07-19 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36161: Assignee: (was: Apache Spark) > dropDuplicates does not type check argument >

[jira] [Assigned] (SPARK-36161) dropDuplicates does not type check argument

2021-07-19 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36161: Assignee: Apache Spark > dropDuplicates does not type check argument >

[jira] [Commented] (SPARK-36161) dropDuplicates does not type check argument

2021-07-19 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383175#comment-17383175 ] Apache Spark commented on SPARK-36161: -- User 'sammyjmoseley' has created a pull request for this

[jira] [Assigned] (SPARK-35806) Mapping the `mode` argument to pandas in DataFrame.to_csv

2021-07-19 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35806: Assignee: Apache Spark > Mapping the `mode` argument to pandas in DataFrame.to_csv >

[jira] [Assigned] (SPARK-35806) Mapping the `mode` argument to pandas in DataFrame.to_csv

2021-07-19 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35806: Assignee: (was: Apache Spark) > Mapping the `mode` argument to pandas in

[jira] [Commented] (SPARK-35806) Mapping the `mode` argument to pandas in DataFrame.to_csv

2021-07-19 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383172#comment-17383172 ] Apache Spark commented on SPARK-35806: -- User 'itholic' has created a pull request for this issue:

[jira] [Assigned] (SPARK-36163) Propagate correct JDBC properties in JDBC connector provider and add "connectionProvider" option

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36163: Assignee: Ivan > Propagate correct JDBC properties in JDBC connector provider and add >

[jira] [Resolved] (SPARK-36163) Propagate correct JDBC properties in JDBC connector provider and add "connectionProvider" option

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36163. -- Fix Version/s: 3.3.0 Resolution: Fixed Fixed in

[jira] [Commented] (SPARK-36185) Implement functions in CategoricalAccessor/CategoricalIndex

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383146#comment-17383146 ] Hyukjin Kwon commented on SPARK-36185: -- I think it's for Spark 3.2. Most of fixes are being landed

[jira] [Commented] (SPARK-36187) Commit collision avoidance in dynamicPartitionOverwrite for non-Parquet formats

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383144#comment-17383144 ] Hyukjin Kwon commented on SPARK-36187: -- For question, let's interact it with Spark mailing list

[jira] [Resolved] (SPARK-36187) Commit collision avoidance in dynamicPartitionOverwrite for non-Parquet formats

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36187. -- Resolution: Incomplete > Commit collision avoidance in dynamicPartitionOverwrite for

[jira] [Updated] (SPARK-36192) Better error messages when comparing against list

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-36192: - Description: We shall throw TypeError messages rather than Spark exceptions. > Better error

[jira] [Resolved] (SPARK-36203) Spark SQL can't use "group by" on the column of map type.

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36203. -- Resolution: Incomplete > Spark SQL can't use "group by" on the column of map type. >

[jira] [Commented] (SPARK-36203) Spark SQL can't use "group by" on the column of map type.

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383143#comment-17383143 ] Hyukjin Kwon commented on SPARK-36203: -- Can you show the fully self-contained reproducer? BTW,

[jira] [Resolved] (SPARK-36134) jackson-databind RCE vulnerability

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36134. -- Resolution: Invalid > jackson-databind RCE vulnerability > --

[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383131#comment-17383131 ] Hyukjin Kwon commented on SPARK-36088: -- cc [~dongjoon] and [~holdenkarau] FYI > 'spark.archives'

[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383130#comment-17383130 ] Hyukjin Kwon commented on SPARK-36088: -- You might have to call

[jira] [Updated] (SPARK-35806) Mapping the `mode` argument to pandas in DataFrame.to_csv

2021-07-19 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-35806: Description: pandas and pandas-on-Spark both have an argument named `mode` in the

[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode

2021-07-19 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383128#comment-17383128 ] Hyukjin Kwon commented on SPARK-36088: -- does your driver run inside a pod or on a physical host? >

[jira] [Updated] (SPARK-35806) Mapping the `mode` argument to pandas in DataFrame.to_csv

2021-07-19 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-35806: Summary: Mapping the `mode` argument to pandas in DataFrame.to_csv (was: Mapping the `mode`

[jira] [Updated] (SPARK-35806) Mapping the `mode` argument to pandas

2021-07-19 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-35806: Description: pandas and pandas-on-Spark both have an argument named `mode` in the

[jira] [Commented] (SPARK-36201) Add check for inner field of schema

2021-07-19 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383125#comment-17383125 ] Apache Spark commented on SPARK-36201: -- User 'AngersZh' has created a pull request for this

[jira] [Assigned] (SPARK-36201) Add check for inner field of schema

2021-07-19 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36201: Assignee: Apache Spark > Add check for inner field of schema >

[jira] [Assigned] (SPARK-36201) Add check for inner field of schema

2021-07-19 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36201: Assignee: (was: Apache Spark) > Add check for inner field of schema >

[jira] [Resolved] (SPARK-36197) InputFormat of PartitionDesc is not respected

2021-07-19 Thread Kent Yao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-36197. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33406

[jira] [Assigned] (SPARK-36197) InputFormat of PartitionDesc is not respected

2021-07-19 Thread Kent Yao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-36197: Assignee: Kent Yao > InputFormat of PartitionDesc is not respected >

[jira] [Updated] (SPARK-36206) Diagnose shuffle data corruption by checksum

2021-07-19 Thread wuyi (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyi updated SPARK-36206: - Description: After adding checksums in SPARK-35276, we can leverage the checksums to do diagnosis for shuffle

[jira] [Created] (SPARK-36206) Diagnose shuffle data corruption by checksum

2021-07-19 Thread wuyi (Jira)
wuyi created SPARK-36206: Summary: Diagnose shuffle data corruption by checksum Key: SPARK-36206 URL: https://issues.apache.org/jira/browse/SPARK-36206 Project: Spark Issue Type: Sub-task

[jira] [Reopened] (SPARK-35806) Mapping the `mode` argument to pandas

2021-07-19 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee reopened SPARK-35806: - Reopen issue with revised title & description. We should map the arguments rather than just

[jira] [Commented] (SPARK-35806) Mapping the `mode` argument to pandas

2021-07-19 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383046#comment-17383046 ] Haejoon Lee commented on SPARK-35806: - I'm working on this > Mapping the `mode` argument to pandas

[jira] [Updated] (SPARK-35806) Mapping the `mode` argument to pandas

2021-07-19 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-35806: Summary: Mapping the `mode` argument to pandas (was: Rename the `mode` argument to avoid

[jira] [Updated] (SPARK-35806) Mapping the `mode` argument to pandas

2021-07-19 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-35806: Description: pandas on Spark has an argument named `mode` in the APIs below: *

[jira] [Resolved] (SPARK-36184) Use ValidateRequirements instead of EnsureRequirements to skip AQE rules that adds extra shuffles

2021-07-19 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-36184. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33396

[jira] [Assigned] (SPARK-36184) Use ValidateRequirements instead of EnsureRequirements to skip AQE rules that adds extra shuffles

2021-07-19 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-36184: --- Assignee: Wenchen Fan > Use ValidateRequirements instead of EnsureRequirements to skip AQE

[jira] [Commented] (SPARK-36175) Support TimestampNTZ in Avro data source

2021-07-19 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383025#comment-17383025 ] Apache Spark commented on SPARK-36175: -- User 'beliefer' has created a pull request for this issue:

[jira] [Assigned] (SPARK-36175) Support TimestampNTZ in Avro data source

2021-07-19 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36175: Assignee: (was: Apache Spark) > Support TimestampNTZ in Avro data source >

[jira] [Assigned] (SPARK-36175) Support TimestampNTZ in Avro data source

2021-07-19 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36175: Assignee: Apache Spark > Support TimestampNTZ in Avro data source >

[jira] [Updated] (SPARK-33844) InsertIntoDir failed since query column name contains ',' cause column type and column names size not equal

2021-07-19 Thread angerszhu (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-33844: -- Parent: SPARK-36200 Issue Type: Sub-task (was: Improvement) > InsertIntoDir failed since

[jira] [Updated] (SPARK-36184) Use ValidateRequirements instead of EnsureRequirements to skip AQE rules that adds extra shuffles

2021-07-19 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36184: -- Parent: SPARK-33828 Issue Type: Sub-task (was: Improvement) > Use
