I know these workarounds, but wouldn't it be more convenient and straightforward to use SparkContext#textFiles?
On Thu, Nov 12, 2015 at 2:27 AM, Mark Hamstra <m...@clearstorydata.com> wrote:

> For more than a small number of files, you'd be better off using
> SparkContext#union instead of RDD#union. That will avoid building up a
> lengthy lineage.
>
> On Wed, Nov 11, 2015 at 10:21 AM, Jakob Odersky <joder...@gmail.com> wrote:
>
>> Hey Jeff,
>> Do you mean reading from multiple text files? In that case, as a
>> workaround, you can use the RDD#union() (or ++) method to concatenate
>> multiple RDDs. For example:
>>
>> val lines1 = sc.textFile("file1")
>> val lines2 = sc.textFile("file2")
>>
>> val rdd = lines1 union lines2
>>
>> regards,
>> --Jakob
>>
>> On 11 November 2015 at 01:20, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> Users can read multiple inputs with HDFS glob syntax, but sometimes
>>> that is not convenient. I'm not sure why there is no
>>> SparkContext#textFiles API; it should be easy to implement. I'd love
>>> to create a ticket and contribute a patch if there's no other
>>> consideration that I'm not aware of.
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang

--
Best Regards

Jeff Zhang
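For reference, a minimal sketch of Mark's suggestion: pass a Seq of RDDs to SparkContext#union in one call, rather than chaining RDD#union, so the lineage stays flat. The local master, app name, and file paths below are assumptions for illustration only:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object UnionTextFilesExample {
  def main(args: Array[String]): Unit = {
    // Local master for illustration; point this at your real cluster.
    val conf = new SparkConf().setAppName("union-text-files").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical input paths; any local/HDFS paths (or globs) work here.
    val paths = Seq("file1", "file2", "file3")

    // SparkContext#union takes a Seq of RDDs and builds a single UnionRDD,
    // avoiding the nested lineage you get from rdd1 union rdd2 union rdd3 ...
    val lines = sc.union(paths.map(p => sc.textFile(p)))

    println(lines.count())
    sc.stop()
  }
}
```

A SparkContext#textFiles(paths: Seq[String]) method could be implemented as essentially this one-liner, which is presumably why the thread suggests it should be easy to contribute.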