Liquan Pei created SPARK-3828:
---------------------------------

             Summary: Spark returns inconsistent results when built with 
different Hadoop versions 
                 Key: SPARK-3828
                 URL: https://issues.apache.org/jira/browse/SPARK-3828
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.1.0
         Environment: OSX 10.9, Spark master branch
            Reporter: Liquan Pei


The test data is text8, available at http://mattmahoney.net/dc/text8.zip. To 
reproduce, unzip the archive first. 

Spark built with different Hadoop versions returns different results for the 
same code: 
{code}
val data = sc.textFile("text8")
data.count()
{code}
This returns 1 when Spark is built with SPARK_HADOOP_VERSION=1.0.4 and 2 when 
it is built with SPARK_HADOOP_VERSION=2.4.0. 
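
One way to narrow down where the extra record comes from is to count lines per 
partition under each build. This is only a diagnostic sketch: it assumes it is 
run in spark-shell (where sc is predefined) from the directory containing the 
unzipped text8 file, and the name perPartition is introduced here for 
illustration. 
{code}
// Run inside spark-shell; `sc` is the SparkContext the shell provides.
val data = sc.textFile("text8")

// Emit one (partitionIndex, lineCount) pair per partition so the
// partition that contributes the unexpected record can be identified.
val perPartition = data
  .mapPartitionsWithIndex { (idx, lines) => Iterator((idx, lines.size)) }
  .collect()

perPartition.foreach { case (idx, n) =>
  println(s"partition $idx: $n line(s)")
}
{code}
Comparing this output between the two builds should show whether the 
difference is tied to a particular input split. 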

Looking through the RDD code, it seems that textFile uses hadoopFile, which 
creates a HadoopRDD. We should probably create a NewHadoopRDD instead when 
building Spark with SPARK_HADOOP_VERSION >= 2.0.0. 
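
To check whether the two code paths really disagree, the same file can be 
read through both APIs in one session. The following is a minimal sketch, not 
a proposed patch: it assumes spark-shell (sc predefined) and the unzipped 
text8 file in the working directory, and the names oldApiCount, newApiCount 
and the NewTextInputFormat alias are chosen here for illustration. 
{code}
// Run inside spark-shell; `sc` is the SparkContext the shell provides.
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.{TextInputFormat => NewTextInputFormat}

// Old-API path: textFile goes through hadoopFile, which builds a HadoopRDD.
val oldApiCount = sc.textFile("text8").count()

// New-API path: newAPIHadoopFile builds a NewHadoopRDD instead.
val newApiCount = sc
  .newAPIHadoopFile[LongWritable, Text, NewTextInputFormat]("text8")
  .map { case (_, line) => line.toString } // copy out of the reused Text object
  .count()

println(s"old API: $oldApiCount, new API: $newApiCount")
{code}
If the two counts differ within a single build, the discrepancy lies in the 
input format implementations rather than in Spark's choice of RDD alone. 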




