[ https://issues.apache.org/jira/browse/SPARK-17806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu reassigned SPARK-17806:
----------------------------------

    Assignee: Davies Liu

> Incorrect result when work with data from parquet
> -------------------------------------------------
>
>                 Key: SPARK-17806
>                 URL: https://issues.apache.org/jira/browse/SPARK-17806
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.0.1
>            Reporter: Vitaly Gerasimov
>            Assignee: Davies Liu
>            Priority: Blocker
>              Labels: correctness
>
> {code}
> import org.apache.spark.SparkConf
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.types.{StructField, StructType}
> import org.apache.spark.sql.types.DataTypes._
>
> // Note: `sc` is a SparkSession here, not a SparkContext.
> val sc = SparkSession.builder().config(new SparkConf().setMaster("local")).getOrCreate()
>
> // Two rows that differ only in column `c`.
> val jsonRDD = sc.sparkContext.parallelize(Seq(
>   """{"a":1,"b":1,"c":1}""",
>   """{"a":1,"b":1,"c":2}"""
> ))
>
> // Write the rows to Parquet with `c` declared as LongType.
> sc.read.schema(StructType(Seq(
>   StructField("a", IntegerType),
>   StructField("b", IntegerType),
>   StructField("c", LongType)
> ))).json(jsonRDD).write.parquet("/tmp/test")
>
> // Read the data back and self-join on all three columns.
> val df = sc.read.load("/tmp/test")
> df.join(df, Seq("a", "b", "c"), "left_outer").show()
> {code}
> returns:
> {code}
> +---+---+---+
> |  a|  b|  c|
> +---+---+---+
> |  1|  1|  1|
> |  1|  1|  1|
> |  1|  1|  2|
> |  1|  1|  2|
> +---+---+---+
> {code}
> Expected result:
> {code}
> +---+---+---+
> |  a|  b|  c|
> +---+---+---+
> |  1|  1|  1|
> |  1|  1|  2|
> +---+---+---+
> {code}
> If I run the same join without saving to Parquet first, the result is correct.
> Changing the type of column `c` to `IntegerType` also gives the correct result.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
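
For reference, the expected result in the report follows from left outer join semantics alone. The sketch below models the self-join with plain Scala collections and no Spark. The names `ExpectedJoin`, `Row`, and `leftOuterSelfJoin` are hypothetical and exist only for this illustration. It does not reproduce the bug or say anything about its cause; it only shows why two distinct input rows should produce exactly two output rows.

```scala
object ExpectedJoin {
  // Hypothetical stand-in for one row of the Parquet data in the report.
  final case class Row(a: Int, b: Int, c: Long)

  // Model of df.join(df, Seq("a", "b", "c"), "left_outer"):
  // every left row is emitted once per right row that is equal on (a, b, c),
  // and a left row without any match would still be emitted once.
  def leftOuterSelfJoin(rows: Seq[Row]): Seq[Row] =
    rows.flatMap { left =>
      val matches = rows.filter(_ == left)
      if (matches.isEmpty) Seq(left) else matches.map(_ => left)
    }

  def main(args: Array[String]): Unit = {
    // The two rows written to /tmp/test in the report.
    val input = Seq(Row(1, 1, 1L), Row(1, 1, 2L))
    leftOuterSelfJoin(input).foreach(println)
  }
}
```

Because the two input rows differ in `c`, each row matches only itself, so the model returns the input unchanged. The four-row output in the report is what this model would give only if the input itself held four rows.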