Yeah..., but apparently mapPartitionsWithInputSplit is tagged as
DeveloperApi. Because of that, I'm not sure it's a good idea to use the
function.

For this problem, I had to create a subclass of HadoopRDD and use
mapPartitions instead.

Is there any reason why mapPartitionsWithInputSplit has the DeveloperApi
annotation? Is it possible to remove it?
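
For context, here is a minimal sketch of the call in question, assuming a
SparkContext named sc, a placeholder input path, and a file-based input
format (so the InputSplit can be cast to FileSplit):

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileSplit, InputSplit, TextInputFormat}
import org.apache.spark.rdd.HadoopRDD

// sc.hadoopFile is backed by a HadoopRDD; the cast exposes
// mapPartitionsWithInputSplit, which hands each partition its InputSplit.
val rdd = sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///some/folder")

val linesWithFileNames = rdd.asInstanceOf[HadoopRDD[LongWritable, Text]]
  .mapPartitionsWithInputSplit { (split: InputSplit, iter: Iterator[(LongWritable, Text)]) =>
    // Assumes a file-based format, so the split is a FileSplit.
    val fileName = split.asInstanceOf[FileSplit].getPath.getName
    iter.map { case (_, line) => (fileName, line.toString) }
  }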

Best regards,
Anwar Rizal.

On Sun, Dec 21, 2014 at 10:47 PM, Shuai Zheng <szheng.c...@gmail.com> wrote:

> I just found a possible answer:
>
>
> http://themodernlife.github.io/scala/spark/hadoop/hdfs/2014/09/28/spark-input-filename/
>
> I will give it a try. It is a bit troublesome, but if it works, it will
> give me what I want.
>
> Sorry for bothering everyone here.
>
> Regards,
>
> Shuai
>
> On Sun, Dec 21, 2014 at 4:43 PM, Shuai Zheng <szheng.c...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> When I try to load a folder into an RDD, is there any way for me to find
>> the input file name of a particular partition, so I can track which file
>> each partition came from?
>>
>> In Hadoop, I can find this information through the following code:
>>
>> // Inside a Mapper, the current InputSplit identifies the source file:
>> FileSplit fileSplit = (FileSplit) context.getInputSplit();
>> String strFilename = fileSplit.getPath().getName();
>>
>> But how can I do this in Spark?
>>
>> Regards,
>>
>> Shuai
>>
>
>
