[ https://issues.apache.org/jira/browse/SPARK-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen reopened SPARK-5737:
------------------------------

[~kallsu] since we can't point to a change that resolved this at this point, it should be Cannot Reproduce.

> Scanning duplicate columns from parquet table
> ---------------------------------------------
>
>                 Key: SPARK-5737
>                 URL: https://issues.apache.org/jira/browse/SPARK-5737
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.1
>            Reporter: Kevin Jung
>
> {quote}
> import org.apache.spark.sql._
> val sqlContext = new SQLContext(sc)
> import sqlContext._
> val rdd = sqlContext.parquetFile("temp.parquet")
> rdd.select('d1,'d1,'d2,'d2).take(3).foreach(println)
> {quote}
> The results of the above code have null values in the first column of each
> duplicated pair.
> For example,
> {quote}
> [null,-5.7,null,121.05]
> [null,-61.17,null,108.91]
> [null,50.60,null,72.15]
> {quote}
> This happens only in ParquetTableScan. PhysicalRDD works fine and the rows
> have duplicate values like...
> {quote}
> [-5.7,-5.7,121.05,121.05]
> [-61.17,-61.17,108.91,108.91]
> [50.60,50.60,72.15,72.15]
> {quote}