[jira] [Assigned] (SPARK-37194) Avoid unnecessary sort in FileFormatWriter if it's not dynamic partition

2022-07-29 Thread Wenchen Fan (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-37194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-37194:
---

Assignee: XiDuo You

> Avoid unnecessary sort in FileFormatWriter if it's not dynamic partition
> 
>
>                 Key: SPARK-37194
>                 URL: https://issues.apache.org/jira/browse/SPARK-37194
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: XiDuo You
>            Assignee: XiDuo You
>            Priority: Major
>             Fix For: 3.4.0
>
>
> `FileFormatWriter.write` sorts by the partition and bucket columns before 
> writing. I think this code path assumes that the input `partitionColumns` are 
> all dynamic, but that is not always the case. It is currently used by three 
> code paths:
>  - `FileStreamSink`: the partitions should always be dynamic
>  - `SaveAsHiveFile`: it follows the assumption that `InsertIntoHiveTable` has 
> already removed the static partitions and that `InsertIntoHiveDirCommand` has 
> no partitions
>  - `InsertIntoHadoopFsRelationCommand`: it passes `partitionColumns` to 
> `FileFormatWriter.write` without removing the static partitions, because they 
> are needed to generate the partition path in `DynamicPartitionDataWriter`
> This shows that the unnecessary sort only affects 
> `InsertIntoHadoopFsRelationCommand`, and only when we write data with static 
> partitions.
>  
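The reasoning in the description can be illustrated with a small, self-contained Scala sketch. It is not Spark's code: the object `StaticPartitionSortSketch`, its methods, and the plain string column names are all hypothetical, and it assumes the writer only needs to sort by partition columns whose values are not fixed by the statement, plus any bucket columns.

```scala
// Hypothetical, simplified model of the idea in SPARK-37194.
// None of these names exist in Spark; columns are plain strings here.
object StaticPartitionSortSketch {

  /** Partition columns whose values are NOT fixed by the INSERT statement. */
  def dynamicPartitionColumns(
      partitionColumns: Seq[String],
      staticPartitions: Map[String, String]): Seq[String] =
    partitionColumns.filterNot(staticPartitions.contains)

  /**
   * Columns the writer would still have to sort by (assumption: dynamic
   * partition columns followed by bucket columns). An empty result means
   * the sort could be skipped.
   */
  def sortColumnsNeeded(
      partitionColumns: Seq[String],
      staticPartitions: Map[String, String],
      bucketColumns: Seq[String]): Seq[String] =
    dynamicPartitionColumns(partitionColumns, staticPartitions) ++ bucketColumns

  def main(args: Array[String]): Unit = {
    // Models: INSERT INTO t PARTITION (dt = '2021-11-02') SELECT ...
    // Every row lands in the same partition directory.
    val allStatic = sortColumnsNeeded(Seq("dt"), Map("dt" -> "2021-11-02"), Nil)
    println(s"all static partitions, sort by: $allStatic")

    // Models: INSERT INTO t PARTITION (dt = '2021-11-02', hour) SELECT ...
    // Only `hour` varies per row.
    val mixed = sortColumnsNeeded(Seq("dt", "hour"), Map("dt" -> "2021-11-02"), Nil)
    println(s"mixed partitions, sort by: $mixed")
  }
}
```

In this model, a write whose partition columns are all static and that has no bucket columns yields an empty sort requirement, which corresponds to the case the issue calls an unnecessary sort.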



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37194) Avoid unnecessary sort in FileFormatWriter if it's not dynamic partition

2021-11-02 Thread Apache Spark (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-37194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37194:


Assignee: (was: Apache Spark)

> Avoid unnecessary sort in FileFormatWriter if it's not dynamic partition
> 
>
>                 Key: SPARK-37194
>                 URL: https://issues.apache.org/jira/browse/SPARK-37194
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: XiDuo You
>            Priority: Major






[jira] [Assigned] (SPARK-37194) Avoid unnecessary sort in FileFormatWriter if it's not dynamic partition

2021-11-02 Thread Apache Spark (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-37194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37194:


Assignee: Apache Spark

> Avoid unnecessary sort in FileFormatWriter if it's not dynamic partition
> 
>
>                 Key: SPARK-37194
>                 URL: https://issues.apache.org/jira/browse/SPARK-37194
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: XiDuo You
>            Assignee: Apache Spark
>            Priority: Major


