Yes, that's what I suggest. TextInputFormat supports multiple inputs, so on
the Spark side we just need to provide an API for that.
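To make the idea concrete, here is a minimal sketch of what such a helper
could look like. The `textFiles` name and the `joinPaths` helper are
hypothetical, not existing Spark API; the sketch only shows the path-joining
piece, since TextInputFormat already accepts a single comma-separated path
string, and the actual `sc.textFile` call is left as a comment.

```scala
// Hypothetical sketch of a SparkContext#textFiles helper (NOT existing
// Spark API). Because TextInputFormat accepts a comma-separated path
// string, the helper only needs to join the paths and delegate, e.g.:
//
//   def textFiles(paths: Seq[String]): RDD[String] =
//     sc.textFile(joinPaths(paths))
//
object TextFilesSketch {
  // Join explicit paths into the comma-separated form TextInputFormat expects.
  def joinPaths(paths: Seq[String]): String = paths.mkString(",")

  def main(args: Array[String]): Unit = {
    println(joinPaths(Seq("hdfs:///data/file1", "hdfs:///data/file2")))
  }
}
```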

On Thu, Nov 12, 2015 at 8:45 AM, Pradeep Gollakota <pradeep...@gmail.com>
wrote:

> IIRC, TextInputFormat supports an input path that is a comma-separated
> list. I haven't tried this, but I think you should just be able to do
> sc.textFile("file1,file2,...")
>
> On Wed, Nov 11, 2015 at 4:30 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> I know these workarounds, but wouldn't it be more convenient and
>> straightforward to use SparkContext#textFiles?
>>
>> On Thu, Nov 12, 2015 at 2:27 AM, Mark Hamstra <m...@clearstorydata.com>
>> wrote:
>>
>>> For more than a small number of files, you'd be better off using
>>> SparkContext#union instead of RDD#union.  That will avoid building up a
>>> lengthy lineage.
>>>
>>> On Wed, Nov 11, 2015 at 10:21 AM, Jakob Odersky <joder...@gmail.com>
>>> wrote:
>>>
>>>> Hey Jeff,
>>>> Do you mean reading from multiple text files? In that case, as a
>>>> workaround, you can use the RDD#union (or ++) method to concatenate
>>>> multiple RDDs. For example:
>>>>
>>>> val lines1 = sc.textFile("file1")
>>>> val lines2 = sc.textFile("file2")
>>>>
>>>> val rdd = lines1 union lines2
>>>>
>>>> regards,
>>>> --Jakob
>>>>
>>>> On 11 November 2015 at 01:20, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>>> Although users can use the HDFS glob syntax to specify multiple inputs,
>>>>> sometimes that is not convenient. I'm not sure why there is no
>>>>> SparkContext#textFiles API; it should be easy to implement. I'd love to
>>>>> create a ticket and contribute it if there's no other consideration
>>>>> that I'm not aware of.
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Jeff Zhang
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>


-- 
Best Regards

Jeff Zhang
