[ https://issues.apache.org/jira/browse/SPARK-13141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcelo Vanzin resolved SPARK-13141.
------------------------------------
    Resolution: Not A Problem

Hi, this was a bug in CDH 5.5.0/5.5.1; it was fixed in CDH 5.5.2. Sorry about the trouble.

> Dataframe created from Hive partitioned tables using HiveContext returns wrong results
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-13141
>                 URL: https://issues.apache.org/jira/browse/SPARK-13141
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: CDH 5.5.1
>            Reporter: Simone
>            Priority: Critical
>
> I get wrong DataFrame results using HiveContext with Spark 1.5.0 on CDH 5.5.1
> in yarn-client mode.
> The problem occurs with partitioned tables backed by delimited text data on
> HDFS, in both Scala and Python.
> This is an example:
> import org.apache.spark.sql.hive.HiveContext
> val hc = new HiveContext(sc)
> hc.table("my_db.partition_table").show()
> The result is that every value in every row is NULL, except for the first
> column (which contains the whole line of data) and the partitioning columns,
> which appear to be correct.
> With Hive and Impala I get correct results.
> Spark also returns correct results on the same data when the table is not
> partitioned.
> I think a similar problem also occurs with Avro data:
> https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Pyspark-Table-Dataframe-returning-empty-records-from-Partitioned/td-p/35836
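
For anyone trying to tell whether a table on an affected CDH release shows the symptom reported in this issue, the check can be sketched in plain Python. This is only an illustration, not part of the original report: the function name `matches_null_column_symptom` is hypothetical, and it assumes the partition columns are the last columns of each row and that rows behave like tuples (as the rows returned by PySpark's `collect()` or `take()` do).

```python
from typing import Optional, Sequence


def matches_null_column_symptom(
    rows: Sequence[Sequence[Optional[object]]],
    num_partition_cols: int,
) -> bool:
    """Return True if every row looks like the symptom in this issue.

    Symptom: the first column is populated (it holds the whole line),
    every data column after it is NULL (None), and only the trailing
    partition columns carry values.

    Assumption (not from the report): partition columns come last.
    """
    if not rows:
        return False
    for row in rows:
        # Data columns between the first column and the partition columns.
        end = len(row) - num_partition_cols
        middle = row[1:end]
        # With no columns in between, the symptom cannot be observed.
        if not middle:
            return False
        if row[0] is None or any(value is not None for value in middle):
            return False
    return True


# Hand-written rows for illustration only (not real output from the issue).
broken = [("a,b,c", None, None, "2016"), ("d,e,f", None, None, "2016")]
healthy = [("a", "b", "c", "2016"), ("d", "e", "f", "2016")]

print(matches_null_column_symptom(broken, num_partition_cols=1))
print(matches_null_column_symptom(healthy, num_partition_cols=1))
```

In a PySpark shell, the same function could be applied to a small sample such as `hc.table("my_db.partition_table").take(20)`, passing the table's actual number of partition columns. A `True` result is only a hint that the table matches the reported pattern; per the resolution above, the fix is to upgrade to CDH 5.5.2.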