Kevin Jung created SPARK-5737:
---------------------------------

             Summary: Scanning duplicate columns from parquet table
                 Key: SPARK-5737
                 URL: https://issues.apache.org/jira/browse/SPARK-5737
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.2.1
            Reporter: Kevin Jung

{quote}
import org.apache.spark.sql._
val sqlContext = new SQLContext(sc)
import sqlContext._
val rdd = sqlContext.parquetFile("temp.parquet")
rdd.select('d1,'d1,'d2,'d2).take(3).foreach(println)
{quote}

The code above returns null in the first column of each duplicated pair. For example,

{quote}
[null,-5.7,null,121.05]
[null,-61.17,null,108.91]
[null,50.60,null,72.15]
{quote}

This happens only in ParquetTableScan. PhysicalRDD works fine, and the rows contain the duplicated values as expected:

{quote}
[-5.7,-5.7,121.05,121.05]
[-61.17,-61.17,108.91,108.91]
[50.60,50.60,72.15,72.15]
{quote}
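
To try the reproduction without an existing temp.parquet, the file can be generated first. What follows is only a minimal sketch, assuming a Spark 1.2.x spark-shell where `sc` is already defined. The `Record` case class and the choice of Double for d1 and d2 are illustrative assumptions; the report does not state the actual schema of the file.

```scala
// Run inside spark-shell (Spark 1.2.x), where `sc` is predefined.
import org.apache.spark.sql._

val sqlContext = new SQLContext(sc)
import sqlContext._  // implicit RDD-to-SchemaRDD conversion and the 'symbol column syntax

// Hypothetical schema: the column types are an assumption, not taken from the report.
case class Record(d1: Double, d2: Double)

// Sample values copied from the rows shown in the report.
val records = sc.parallelize(Seq(
  Record(-5.7, 121.05),
  Record(-61.17, 108.91),
  Record(50.60, 72.15)))

// Write the Parquet file that the reported snippet reads back.
// This assumes temp.parquet does not exist yet.
records.saveAsParquetFile("temp.parquet")

// Same query as in the report: each column is selected twice.
val rdd = sqlContext.parquetFile("temp.parquet")
rdd.select('d1, 'd1, 'd2, 'd2).take(3).foreach(println)
```

Reading the file back through `parquetFile` is what sends the query through the Parquet scan path named in the report; whether the nulls appear may depend on the exact Spark build.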