Virgil Palanciuc created SPARK-11657:
----------------------------------------

Summary: Bad data read using dataframes
Key: SPARK-11657
URL: https://issues.apache.org/jira/browse/SPARK-11657
Project: Spark
Issue Type: Bug
Components: Spark Core, SQL
Affects Versions: 1.5.1, 1.5.2
Environment: EMR (yarn)
Reporter: Virgil Palanciuc
Priority: Critical

I get strange behaviour when reading parquet data:

{code}
scala> val data = sqlContext.read.parquet("hdfs:///sample")
data: org.apache.spark.sql.DataFrame = [clusterSize: int, clusterName: string, clusterData: array<string>, dpid: int]

scala> data.take(1)    /// this returns garbage
res0: Array[org.apache.spark.sql.Row] = Array([1,56169A947F000101????????,WrappedArray(164594606101815510825479776971????????),813])

scala> data.collect()  /// this works
res1: Array[org.apache.spark.sql.Row] = Array([1,6A01CACD56169A947F000101,WrappedArray(77512098164594606101815510825479776971),813])
{code}

I've included the "hdfs:///sample" directory here: https://www.dropbox.com/s/su0flfn49rrc7jz/sample.tgz?dl=0

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
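Editor's note: the symptom above (take(1) returning corrupted strings while collect() returns correct data) is the classic signature of a reader reusing one mutable row buffer while the consumer keeps references instead of copies. The following plain-Scala sketch is purely a hypothetical illustration of that class of bug; it is not Spark's actual reader code and does not assert this is the root cause here.

```scala
// Hypothetical illustration of the buffer-reuse pitfall (not Spark code):
// every "row" below keeps a reference to the same shared byte array, so
// all rows end up observing whatever bytes were written last.
object BufferReuseDemo extends App {
  val inputs = Seq("6A01", "5616")

  // Buggy pattern: one shared mutable buffer, rows hold references to it.
  val shared = new Array[Byte](4)
  val byReference: Seq[Array[Byte]] = inputs.map { s =>
    s.getBytes("US-ASCII").copyToArray(shared) // overwrite the shared buffer
    shared                                     // BUG: reference, not a copy
  }

  // Correct pattern: each row owns an independent copy of its bytes.
  val byCopy: Seq[Array[Byte]] = inputs.map(s => s.getBytes("US-ASCII").clone())

  println(byReference.map(new String(_, "US-ASCII"))) // both rows show "5616"
  println(byCopy.map(new String(_, "US-ASCII")))      // "6A01", "5616" as expected
}
```

In the buggy variant, both "rows" print the last value written, which is the kind of cross-row corruption visible in the take(1) output above, whereas copying per row preserves each value.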