Liquan Pei created SPARK-3828:
---------------------------------

Summary: Spark returns inconsistent result when compiling with different HADOOP version
Key: SPARK-3828
URL: https://issues.apache.org/jira/browse/SPARK-3828
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.1.0
Environment: OSX 10.9, Spark master branch
Reporter: Liquan Pei

The text8 data set is available at http://mattmahoney.net/dc/text8.zip. To reproduce, unzip it first. Spark builds against different Hadoop versions return different results for the same file.

{code}
val data = sc.textFile("text8")
data.count()
{code}

This returns 1 when Spark is built with SPARK_HADOOP_VERSION=1.0.4 and 2 when it is built with SPARK_HADOOP_VERSION=2.4.0.

Looking through the RDD code, it seems that textFile calls hadoopFile, which creates a HadoopRDD. We should probably create a NewHadoopRDD instead when building Spark with SPARK_HADOOP_VERSION >= 2.0.0.
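
One way to check whether the choice of input API matters is to count the same file through both code paths on a single build. The following is only a sketch for spark-shell, not a proposed patch: compareCounts is a hypothetical helper name, and it assumes the unzipped text8 file sits in the working directory. It uses SparkContext.hadoopFile (old mapred API, the path textFile takes) and SparkContext.newAPIHadoopFile (new mapreduce API, which creates a NewHadoopRDD).

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{TextInputFormat => OldTextInputFormat}
import org.apache.hadoop.mapreduce.lib.input.{TextInputFormat => NewTextInputFormat}
import org.apache.spark.SparkContext

// Hypothetical helper (not part of Spark): counts the records of `path`
// through both Hadoop input APIs and returns (oldApiCount, newApiCount).
def compareCounts(sc: SparkContext, path: String): (Long, Long) = {
  // Old `mapred` API: the same call textFile makes; backed by a HadoopRDD.
  val oldApiCount = sc
    .hadoopFile(path, classOf[OldTextInputFormat], classOf[LongWritable], classOf[Text])
    .count()

  // New `mapreduce` API: backed by a NewHadoopRDD.
  val newApiCount = sc
    .newAPIHadoopFile(path, classOf[NewTextInputFormat], classOf[LongWritable], classOf[Text])
    .count()

  (oldApiCount, newApiCount)
}

// In spark-shell, where `sc` is already defined:
// val (oldCount, newCount) = compareCounts(sc, "text8")
```

If the two counts agree on a given build, the discrepancy more likely comes from the Hadoop version's record reader than from which RDD class Spark creates.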