[ https://issues.apache.org/jira/browse/SPARK-13141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin resolved SPARK-13141.
------------------------------------
    Resolution: Not A Problem

Hi, this was a bug in CDH 5.5.0/5.5.1; it was fixed in CDH 5.5.2. Sorry about 
the trouble.

> Dataframe created from Hive partitioned tables using HiveContext returns 
> wrong results
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-13141
>                 URL: https://issues.apache.org/jira/browse/SPARK-13141
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: CDH 5.5.1
>            Reporter: Simone
>            Priority: Critical
>
> I get wrong dataframe results using HiveContext with Spark 1.5.0 on CDH 5.5.1 
> in yarn-client mode.
> The problem occurs with partitioned tables over delimited text data on HDFS, 
> in both Scala and Python.
> Here is example code:
> import org.apache.spark.sql.hive.HiveContext
> val hc = new HiveContext(sc)
> hc.table("my_db.partition_table").show()
> The result is that all values in every row are NULL, except for the first 
> column (which contains the whole line of data) and the partitioning columns, 
> which appear to be correct.
> With Hive and Impala I get correct results.
> Spark also returns correct results on the same data with a non-partitioned 
> table.
> I think that similar problems also occur with Avro data:
> https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Pyspark-Table-Dataframe-returning-empty-records-from-Partitioned/td-p/35836
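
The symptom described in the report (the first column holding the whole raw line, every other data column NULL) can be checked for mechanically. The following is only a minimal sketch in plain Scala: `PartitionSymptomCheck` and `looksUnsplit` are hypothetical names, not part of Spark or CDH, and it assumes the non-partition column values have already been collected into a sequence with `None` standing in for SQL NULL.

```scala
// Hypothetical helper, not part of Spark: flags a row that matches the
// symptom described in the report above.
object PartitionSymptomCheck {
  // dataValues: the non-partition columns of one row, in table order,
  //             with None representing SQL NULL (an assumption of this sketch).
  // delimiter:  the field delimiter used by the underlying text files.
  def looksUnsplit(dataValues: Seq[Option[String]], delimiter: String): Boolean =
    dataValues match {
      case first +: rest if rest.nonEmpty =>
        // The first column still contains the delimiter (i.e. the whole raw
        // line), and every remaining data column is NULL.
        first.exists(_.contains(delimiter)) && rest.forall(_.isEmpty)
      case _ => false
    }
}
```

For example, `PartitionSymptomCheck.looksUnsplit(Seq(Some("1,foo,bar"), None, None), ",")` evaluates to `true`, while a correctly split row such as `Seq(Some("1"), Some("foo"), Some("bar"))` evaluates to `false`.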



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
