[ 
https://issues.apache.org/jira/browse/SPARK-33594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252775#comment-17252775
 ] 

Ala Luszczak commented on SPARK-33594:
--------------------------------------

Big :+1: here. Having a binary column as a partition column is a terrible idea.
I've seen at least two really bad scenarios result from this.

(1) When reading the data with the vectorized reader, I've seen segmentation 
faults.
(2) When reading the same data with the non-vectorized (parquet-mr) reader, the 
segmentation faults disappear, but instead incorrect values are returned for 
the binary columns.

I would like to point out that just covering the CREATE TABLE statement might 
not be enough. I think we should bail in the read path as well. After all, the 
user can just do spark.read.parquet("my/path") without creating a table first.
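
To illustrate the idea of sharing one check between CREATE TABLE and the read
path, here is a rough sketch. It is plain Python, not Spark code: the names
check_partition_schema, UnsupportedPartitionColumnError and
UNSUPPORTED_PARTITION_TYPES are hypothetical, and the partition schema is
assumed to be a simple mapping from column name to type name.

```python
from typing import Mapping

# Hypothetical: type names that must not be used for partition columns.
UNSUPPORTED_PARTITION_TYPES = {"binary"}


class UnsupportedPartitionColumnError(ValueError):
    """Raised when a partition column has a type we refuse to handle."""


def check_partition_schema(partition_schema: Mapping[str, str]) -> None:
    """Fail fast if any partition column has an unsupported type.

    partition_schema maps column name -> type name (assumed representation).
    The same function would be called both when a table is created and when
    a partitioned directory is read without a table definition.
    """
    bad = sorted(
        name
        for name, type_name in partition_schema.items()
        if type_name.lower() in UNSUPPORTED_PARTITION_TYPES
    )
    if bad:
        raise UnsupportedPartitionColumnError(
            "Binary type is not allowed for partition columns: " + ", ".join(bad)
        )
```

Calling the same check from both entry points would mean a path-based read is
rejected with a clear error instead of crashing or returning wrong values.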

> Forbid binary type as partition column
> --------------------------------------
>
>                 Key: SPARK-33594
>                 URL: https://issues.apache.org/jira/browse/SPARK-33594
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: angerszhu
>            Priority: Major
>
> Forbid binary type as partition column



