Re: Why there's no api for SparkContext#textFiles to support multiple inputs ?

Jeff Zhang Wed, 11 Nov 2015 16:30:48 -0800

I know these workarounds, but wouldn't it be more convenient and
straightforward to use SparkContext#textFiles?
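
For reference, here is a minimal sketch of the workarounds quoted below, plus
roughly what I have in mind. The textFiles helper is hypothetical (it is not
part of Spark), the paths are placeholders, and I'm assuming a local master
and a non-empty list of paths that contain no commas:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MultiFileRead {
  // Hypothetical helper, NOT an existing Spark API: roughly what a
  // SparkContext#textFiles could do. It builds one RDD per path and combines
  // them with SparkContext#union, which yields a single union over all inputs
  // instead of a chain of pairwise RDD#union calls.
  def textFiles(sc: SparkContext, paths: Seq[String]): RDD[String] =
    sc.union(paths.map(p => sc.textFile(p)))

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val conf = new SparkConf().setAppName("multi-file-read").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Placeholder paths; these files must exist for the job to run.
      val paths = Seq("file1", "file2", "file3")

      val viaUnion = textFiles(sc, paths)

      // SparkContext#textFile also accepts a comma-separated list of paths,
      // so this reads the same inputs without any union.
      val viaCommaList = sc.textFile(paths.mkString(","))

      println(s"union: ${viaUnion.count()} lines, comma list: ${viaCommaList.count()} lines")
    } finally {
      sc.stop()
    }
  }
}
```

Both forms work today, but each caller has to write the glue, which is why a
built-in method taking a sequence of paths seems more convenient to me.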


On Thu, Nov 12, 2015 at 2:27 AM, Mark Hamstra <m...@clearstorydata.com>
wrote:

> For more than a small number of files, you'd be better off using
> SparkContext#union instead of RDD#union.  That will avoid building up a
> lengthy lineage.
>
> On Wed, Nov 11, 2015 at 10:21 AM, Jakob Odersky <joder...@gmail.com>
> wrote:
>
>> Hey Jeff,
>> Do you mean reading from multiple text files? In that case, as a
>> workaround, you can use the RDD#union() (or ++) method to concatenate
>> multiple RDDs. For example:
>>
>> val lines1 = sc.textFile("file1")
>> val lines2 = sc.textFile("file2")
>>
>> val rdd = lines1 union lines2
>>
>> regards,
>> --Jakob
>>
>> On 11 November 2015 at 01:20, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> Users can use the HDFS glob syntax to read multiple inputs, but that is
>>> not always convenient. I'm not sure why there is no SparkContext#textFiles
>>> API. It should be easy to implement. I'd be happy to create a ticket and
>>> contribute it, unless there is some consideration I'm not aware of.
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>>
>


-- 
Best Regards

Jeff Zhang
