Re: Why there's no api for SparkContext#textFiles to support multiple inputs ?

Mark Hamstra Wed, 11 Nov 2015 10:28:24 -0800

For more than a small number of files, you'd be better off using
SparkContext#union instead of RDD#union.  That will avoid building up a
lengthy lineage.
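
A minimal sketch of what that could look like, assuming a local
SparkContext and placeholder input paths ("file1", "file2", "file3" are
hypothetical, as are the object name and app name):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object UnionManyFiles {
  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders for illustration only.
    val conf = new SparkConf().setAppName("union-many-files").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical input paths; substitute your own.
    val paths = Seq("file1", "file2", "file3")

    // One RDD per input file.
    val perFile: Seq[RDD[String]] = paths.map(p => sc.textFile(p))

    // SparkContext#union takes the whole sequence at once and produces a
    // single union over all inputs, instead of a chain of pairwise unions.
    val lines: RDD[String] = sc.union(perFile)

    // For comparison, the pairwise form nests one union per extra file:
    // val nested = perFile.reduce(_ union _)

    println(lines.count())
    sc.stop()
  }
}
```

With only two or three files the pairwise form is fine; the difference
should only start to matter as the list of paths grows.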


On Wed, Nov 11, 2015 at 10:21 AM, Jakob Odersky <joder...@gmail.com> wrote:

> Hey Jeff,
> Do you mean reading from multiple text files? In that case, as a
> workaround, you can use the RDD#union() (or ++) method to concatenate
> multiple RDDs. For example:
>
> val lines1 = sc.textFile("file1")
> val lines2 = sc.textFile("file2")
>
> val rdd = lines1 union lines2
>
> regards,
> --Jakob
>
> On 11 November 2015 at 01:20, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> Users can use the HDFS glob syntax to read multiple inputs, but that is
>> sometimes inconvenient. I'm not sure why there is no
>> SparkContext#textFiles API. It should be easy to implement. I'd love to
>> create a ticket and contribute it, if there's no other consideration
>> that I'm unaware of.
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>
