Fengdong Yu created SPARK-10850:
-----------------------------------

             Summary: wholeTextFileRDD only affect the first line in each partition
                 Key: SPARK-10850
                 URL: https://issues.apache.org/jira/browse/SPARK-10850
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.4.1
            Reporter: Fengdong Yu

{code}
import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf()
val sc = new SparkContext(sparkConf)
// Read each file under /test/*/ as a (path, content) pair; 3 is the suggested minimum number of partitions
val text = sc.wholeTextFiles("/test/*/", 3)
text.map(x => x._1 + "^^^" + x._2).collect
{code}

output:
{code}
hdfs://xxxx/test/test1/1.data^^^hello1
hello2
hello3
hdfs://xxxx/test/test2/2.data^^^hello1
hello2
hello3
{code}

I have two datasets under '/test/':
/test/test1/1.data
/test/test2/2.data

Each dataset has three lines:
hello1
hello2
hello3
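
For comparison, `wholeTextFiles` is documented to return one (path, content) record per file, so the `map` above runs once per file and not once per line. If the goal is to prefix every line with its file path, a minimal sketch could split the content first. The `TagLines` object and its `tagLines` helper are hypothetical names introduced here for illustration, not part of Spark, and the record in `main` is a stand-in for what `wholeTextFiles` would produce for one of the files in this report.

```scala
// Hypothetical helper (not part of Spark): prefix every line of one file's
// content with the file path, using the same "^^^" separator as the report.
object TagLines {
  def tagLines(path: String, content: String): Seq[String] =
    // Split on LF or CRLF; String.split drops the trailing empty string
    // left by a final newline.
    content.split("\r?\n").toSeq.map(line => path + "^^^" + line)

  def main(args: Array[String]): Unit = {
    // Stand-in for one (path, content) record from wholeTextFiles.
    val record = ("hdfs://xxxx/test/test1/1.data", "hello1\nhello2\nhello3\n")
    tagLines(record._1, record._2).foreach(println)
  }
}
```

With Spark, and assuming `text` is the RDD from the snippet in this report, the helper could be applied as `text.flatMap { case (path, content) => TagLines.tagLines(path, content) }.collect`, which yields one prefixed string per line and not one per file.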