The first text file is not that large: it has 6 million records (lines).
For each line I need to read one of 80,000 files, which total around
1.5 TB. I didn't understand what you meant by "then again read text files
for each line and union all rdds."
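
Is the idea roughly the sketch below? (The paths and file names are just
placeholders, and I've left out the per-file processing.)

    import org.apache.spark.SparkContext

    // Rough sketch of the collect-then-union idea, assuming each line of the
    // small text file names one of the 80,000 HDFS files (paths are made up).
    def unionPerLine(sc: SparkContext): Unit = {
      // 1. The first text file is small enough to collect on the driver.
      val lines = sc.textFile("hdfs:///path/to/records.txt").collect()

      // 2. Read the HDFS file referenced by each line as its own RDD.
      val rdds = lines.map(line => sc.textFile(s"hdfs:///data/$line"))

      // 3. Union everything into one RDD for further processing.
      val combined = sc.union(rdds)
      combined.saveAsTextFile("hdfs:///path/to/output")
    }

If that is the idea, wouldn't it create one RDD per line (or at least one
per distinct file), which seems like a lot to union?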
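
Alternatively, going back to my original question: could the union be
skipped entirely by opening the HDFS files from inside mapPartitions with
the Hadoop FileSystem API, so that no SparkContext is needed inside map()?
Something like this (processFile and all paths are placeholders):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.SparkContext

    // Sketch: read each record's HDFS file directly on the executors
    // instead of turning every file into its own RDD.
    def processInsideMap(sc: SparkContext): Unit = {
      val records = sc.textFile("hdfs:///path/to/records.txt")

      val results = records.mapPartitions { iter =>
        // FileSystem is not serializable, so create one handle per
        // partition on the executor rather than on the driver.
        val fs = FileSystem.get(new Configuration())
        iter.map { line =>
          val in = fs.open(new Path(s"hdfs:///data/$line"))
          try processFile(in) finally in.close()
        }
      }
      results.saveAsTextFile("hdfs:///path/to/output")
    }

    // Stand-in for the actual per-file processing.
    def processFile(in: java.io.InputStream): String = ???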

On Tue, Sep 13, 2016 at 10:04 PM, Raghavendra Pandey <
raghavendra.pan...@gmail.com> wrote:

> How large is your first text file? The idea is that you read the first
> text file and, if it is not large, you can collect all the lines on the
> driver and then again read text files for each line and union all rdds.
>
> On 13 Sep 2016 11:39 p.m., "Saliya Ekanayake" <esal...@gmail.com> wrote:
>
>> Just wondering if this is possible with Spark?
>>
>> On Mon, Sep 12, 2016 at 12:14 AM, Saliya Ekanayake <esal...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I've got a text file where each line is a record. For each record, I
>>> need to process a file in HDFS.
>>>
>>> So if I represent these records as an RDD and invoke a map() operation
>>> on them, how can I access HDFS within that map()? Do I have to create a
>>> Spark context within map(), or is there a better solution?
>>>
>>> Thank you,
>>> Saliya
>>>
>>
>>


-- 
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
