[ https://issues.apache.org/jira/browse/SPARK-26149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuming Wang updated SPARK-26149:
--------------------------------
    Description:
How to reproduce:
{code:bash}
scala> spark.read.parquet("/Users/yumwang/SPARK-26149.snappy.parquet").selectExpr("s2 = s1").show
+---------+
|(s2 = s1)|
+---------+
|    false|
+---------+

scala> val first = spark.read.parquet("/Users/yumwang/SPARK-26149.snappy.parquet").collect().head
first: org.apache.spark.sql.Row = [a0750c1f13f0k5��F8j���b�Ro'4da96,a0750c1f13f0k5��F8j���b�Ro'4da96]

scala> println(first.getString(0).equals(first.getString(1)))
true
{code}
Spark UTF8String returns {{false}} but String returns {{true}}.

  was:
How to reproduce:
{code:bash}
scala> spark.read.parquet("/Users/yumwang/SPARK-26149.snappy.parquet").selectExpr("s2 = s1").show
+---------+
|(s2 = s1)|
+---------+
|    false|
+---------+

scala> val first = spark.read.parquet("/Users/yumwang/SPARK-26149.snappy.parquet").collect().head
first: org.apache.spark.sql.Row = [a0750c1f13f0k5��F8j���b�Ro'4da96,a0750c1f13f0k5��F8j���b�Ro'4da96]

scala> println(first.getString(0).equals(first.getString(1)))
true
{code}

> Read UTF8String from Parquet/ORC may be incorrect
> -------------------------------------------------
>
>                 Key: SPARK-26149
>                 URL: https://issues.apache.org/jira/browse/SPARK-26149
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
>         Attachments: SPARK-26149.snappy.parquet
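
The repro above shows the SQL comparison and the {{String}} comparison disagreeing on the same row. The following sketch shows one way such a disagreement can arise on the JVM in general. It has not been established as the cause of this issue. It assumes that the SQL {{=}} on strings compares the underlying bytes, while {{getString}} decodes those bytes as UTF-8 first. The byte values and the object name {{Utf8CompareSketch}} are hypothetical and are not taken from the attached file.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Arrays

object Utf8CompareSketch {
  // Hypothetical inputs: the same ASCII prefix followed by two different
  // bytes (0xFF and 0xFE), neither of which is valid anywhere in UTF-8.
  val a: Array[Byte] = "a0750c".getBytes(UTF_8) ++ Array(0xFF.toByte)
  val b: Array[Byte] = "a0750c".getBytes(UTF_8) ++ Array(0xFE.toByte)

  // Byte-wise comparison sees the differing last byte.
  val bytesEqual: Boolean = Arrays.equals(a, b)

  // Decoding replaces each malformed byte with U+FFFD, so both arrays
  // decode to the same java.lang.String.
  val stringsEqual: Boolean = new String(a, UTF_8) == new String(b, UTF_8)

  def main(args: Array[String]): Unit = {
    println(s"byte-wise equal: $bytesEqual")        // false
    println(s"decoded String equal: $stringsEqual") // true
  }
}
```

If this assumption applies, comparing the raw bytes of {{s1}} and {{s2}} instead of their decoded strings would show whether the two columns really hold identical data.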