[ https://issues.apache.org/jira/browse/SPARK-26149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-26149:
--------------------------------
    Description: 
How to reproduce:
{code:scala}
scala> spark.read.parquet("/Users/yumwang/SPARK-26149.snappy.parquet").selectExpr("s2 = s1").show
+---------+
|(s2 = s1)|
+---------+
|    false|
+---------+


scala> val first = spark.read.parquet("/Users/yumwang/SPARK-26149.snappy.parquet").collect().head
first: org.apache.spark.sql.Row = [a0750c1f13f0k5��F8j���b�Ro'4da96,a0750c1f13f0k5��F8j���b�Ro'4da96]

scala> println(first.getString(0).equals(first.getString(1)))
true
{code}

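Comparing the two columns as Spark {{UTF8String}} values (the {{s2 = s1}} expression) returns {{false}}, but comparing the collected Java {{String}} values returns {{true}}.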
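One plausible explanation is that the two columns hold different byte sequences that are both malformed UTF-8. The {{�}} (U+FFFD) characters in the collected Row point that way, but the report does not confirm it. When such bytes are decoded to a Java {{String}}, each malformed byte becomes U+FFFD, so distinct inputs can produce equal strings. The following minimal sketch does not use Spark and rests on some assumptions:
* It assumes {{UTF8String}} equality compares raw bytes, and models that with {{java.util.Arrays.equals}}.
* The object name {{InvalidUtf8Equality}} and the byte values are hypothetical. They are not taken from the attached file.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Arrays

object InvalidUtf8Equality {
  // Hypothetical values: both are "ab" + <one byte> + "c", but the middle byte
  // differs. 0xFF and 0xFE never appear in well-formed UTF-8.
  val s1Bytes: Array[Byte] = Array[Byte](0x61, 0x62, 0xFF.toByte, 0x63)
  val s2Bytes: Array[Byte] = Array[Byte](0x61, 0x62, 0xFE.toByte, 0x63)

  // Byte-wise equality, used here as a stand-in (assumption) for how
  // UTF8String values are compared.
  def bytesEqual(a: Array[Byte], b: Array[Byte]): Boolean = Arrays.equals(a, b)

  // new String(bytes, UTF_8) replaces each malformed byte with U+FFFD.
  def decode(bytes: Array[Byte]): String = new String(bytes, UTF_8)

  // Render bytes as upper-case hex pairs, e.g. "61 62 EF BF BD 63".
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xFF}%02X").mkString(" ")

  def main(args: Array[String]): Unit = {
    println(bytesEqual(s1Bytes, s2Bytes))       // false: the raw bytes differ
    println(decode(s1Bytes) == decode(s2Bytes)) // true: both decode to "ab\uFFFDc"
    // Re-encoding does not restore the original bytes: U+FFFD becomes EF BF BD.
    println(hex(decode(s1Bytes).getBytes(UTF_8))) // 61 62 EF BF BD 63
  }
}
```

If this is the cause, the {{collect()}} comparison is lossy. Once the bytes are decoded, the information that distinguished the two values is gone.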



  was:
How to reproduce:
{code:scala}
scala> spark.read.parquet("/Users/yumwang/SPARK-26149.snappy.parquet").selectExpr("s2 = s1").show
+---------+
|(s2 = s1)|
+---------+
|    false|
+---------+


scala> val first = spark.read.parquet("/Users/yumwang/SPARK-26149.snappy.parquet").collect().head
first: org.apache.spark.sql.Row = [a0750c1f13f0k5��F8j���b�Ro'4da96,a0750c1f13f0k5��F8j���b�Ro'4da96]

scala> println(first.getString(0).equals(first.getString(1)))
true
{code}


> Read UTF8String from Parquet/ORC may be incorrect
> -------------------------------------------------
>
>                 Key: SPARK-26149
>                 URL: https://issues.apache.org/jira/browse/SPARK-26149
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
>         Attachments: SPARK-26149.snappy.parquet
>
>
> How to reproduce:
> {code:scala}
> scala> spark.read.parquet("/Users/yumwang/SPARK-26149.snappy.parquet").selectExpr("s2 = s1").show
> +---------+
> |(s2 = s1)|
> +---------+
> |    false|
> +---------+
> scala> val first = spark.read.parquet("/Users/yumwang/SPARK-26149.snappy.parquet").collect().head
> first: org.apache.spark.sql.Row = [a0750c1f13f0k5��F8j���b�Ro'4da96,a0750c1f13f0k5��F8j���b�Ro'4da96]
> scala> println(first.getString(0).equals(first.getString(1)))
> true
> {code}
> Spark UTF8String returns {{false}} but String returns {{true}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)