[jira] [Assigned] (SPARK-32709) Write Hive ORC/Parquet bucketed table with hivehash (for Hive 1,2)

2021-09-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-32709:
---

Assignee: Cheng Su

> Write Hive ORC/Parquet bucketed table with hivehash (for Hive 1,2)
> --
>
> Key: SPARK-32709
> URL: https://issues.apache.org/jira/browse/SPARK-32709
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
> Attachments: 91275701_stage6_metrics.png
>
>
> Hive ORC/Parquet write code path is same as data source v1 code path 
> (FileFormatWriter). This JIRA is to add the support to write Hive ORC/Parquet 
> bucketed table with hivehash. The change is to custom `bucketIdExpression` to 
> use hivehash when the table is Hive bucketed table, and the Hive version is 
> 1.x.y or 2.x.y.
>  
> This will allow us write Hive/Presto-compatible bucketed table for Hive 1 and 
> 2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32709) Write Hive ORC/Parquet bucketed table with hivehash (for Hive 1,2)

2020-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32709:


Assignee: (was: Apache Spark)

> Write Hive ORC/Parquet bucketed table with hivehash (for Hive 1,2)
> --
>
> Key: SPARK-32709
> URL: https://issues.apache.org/jira/browse/SPARK-32709
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Priority: Minor
>
> Hive ORC/Parquet write code path is same as data source v1 code path 
> (FileFormatWriter). This JIRA is to add the support to write Hive ORC/Parquet 
> bucketed table with hivehash. The change is to custom `bucketIdExpression` to 
> use hivehash when the table is Hive bucketed table, and the Hive version is 
> 1.x.y or 2.x.y.
>  
> This will allow us write Hive/Presto-compatible bucketed table for Hive 1 and 
> 2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32709) Write Hive ORC/Parquet bucketed table with hivehash (for Hive 1,2)

2020-10-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32709:


Assignee: Apache Spark

> Write Hive ORC/Parquet bucketed table with hivehash (for Hive 1,2)
> --
>
> Key: SPARK-32709
> URL: https://issues.apache.org/jira/browse/SPARK-32709
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Assignee: Apache Spark
>Priority: Minor
>
> Hive ORC/Parquet write code path is same as data source v1 code path 
> (FileFormatWriter). This JIRA is to add the support to write Hive ORC/Parquet 
> bucketed table with hivehash. The change is to custom `bucketIdExpression` to 
> use hivehash when the table is Hive bucketed table, and the Hive version is 
> 1.x.y or 2.x.y.
>  
> This will allow us write Hive/Presto-compatible bucketed table for Hive 1 and 
> 2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org