For more than a small number of files, you'd be better off using SparkContext#union instead of RDD#union. That will avoid building up a lengthy lineage.
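To illustrate, here is a minimal sketch of the SparkContext#union approach (the file paths are hypothetical, and this assumes a live SparkContext named `sc`):

```scala
// Chaining RDD#union pairwise over many files builds a deep lineage:
//   ((lines1 union lines2) union lines3) union ...
// SparkContext#union takes a whole Seq of RDDs and produces a single
// flat UnionRDD instead.

// Hypothetical list of input paths
val paths = Seq("file1", "file2", "file3")

// Read each file into its own RDD, then union them all in one step
val rdds = paths.map(path => sc.textFile(path))
val lines = sc.union(rdds)
```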
On Wed, Nov 11, 2015 at 10:21 AM, Jakob Odersky <joder...@gmail.com> wrote:
> Hey Jeff,
> Do you mean reading from multiple text files? In that case, as a
> workaround, you can use the RDD#union() (or ++) method to concatenate
> multiple RDDs. For example:
>
> val lines1 = sc.textFile("file1")
> val lines2 = sc.textFile("file2")
>
> val rdd = lines1 union lines2
>
> regards,
> --Jakob
>
> On 11 November 2015 at 01:20, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> Although users can use the HDFS glob syntax to specify multiple inputs,
>> sometimes that is not convenient. I'm not sure why there's no
>> SparkContext#textFiles API; it should be easy to implement. I'd love to
>> create a ticket and contribute it if there's no other consideration
>> that I'm unaware of.
>>
>> --
>> Best Regards
>>
>> Jeff Zhang