Yeah... but apparently mapPartitionsWithInputSplit is tagged as DeveloperApi. Because of that, I'm not sure it's a good idea to use the function.
For this problem, I had to create a subclass of HadoopRDD and use mapPartitions instead. Is there any reason why mapPartitionsWithInputSplit has the DeveloperApi annotation? Is it possible to remove it?

Best regards,
Anwar Rizal.

On Sun, Dec 21, 2014 at 10:47 PM, Shuai Zheng <szheng.c...@gmail.com> wrote:
> I just found a possible answer:
>
> http://themodernlife.github.io/scala/spark/hadoop/hdfs/2014/09/28/spark-input-filename/
>
> I will give it a try. It is a bit troublesome, but if it works, it will give
> me what I want.
>
> Sorry for bothering everyone here.
>
> Regards,
>
> Shuai
>
> On Sun, Dec 21, 2014 at 4:43 PM, Shuai Zheng <szheng.c...@gmail.com> wrote:
>
>> Hi All,
>>
>> When I try to load a folder into RDDs, is there any way for me to find the
>> input file name of a particular partition, so I can track which file each
>> partition came from?
>>
>> In Hadoop, I can find this information through the code:
>>
>> FileSplit fileSplit = (FileSplit) context.getInputSplit();
>> String strFilename = fileSplit.getPath().getName();
>>
>> But how can I do this in Spark?
>>
>> Regards,
>>
>> Shuai
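For reference, here is a minimal sketch of the mapPartitionsWithInputSplit approach the linked post describes. It creates its own temporary input files so it is self-contained; the file names and contents are made up for illustration. Note that sc.hadoopFile returns a HadoopRDD, so the cast below is safe for that call, and that mapPartitionsWithInputSplit is the @DeveloperApi method under discussion:

```scala
import java.nio.file.Files
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileSplit, TextInputFormat}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.HadoopRDD

// Create a small input directory so the example is self-contained
// (hypothetical file names, for illustration only).
val dir = Files.createTempDirectory("spark-input")
Files.write(dir.resolve("a.txt"), "line-from-a".getBytes)
Files.write(dir.resolve("b.txt"), "line-from-b".getBytes)

val sc = new SparkContext(
  new SparkConf().setAppName("input-file-names").setMaster("local[1]"))

// sc.hadoopFile returns a HadoopRDD under the hood, so this cast is safe here.
val rdd = sc.hadoopFile[LongWritable, Text, TextInputFormat](dir.toString)
  .asInstanceOf[HadoopRDD[LongWritable, Text]]

// mapPartitionsWithInputSplit hands each partition its InputSplit, from which
// the source file name can be recovered, just like the Hadoop snippet above.
val withFileNames = rdd.mapPartitionsWithInputSplit(
  (split, iter: Iterator[(LongWritable, Text)]) =>
    iter.map { case (_, line) =>
      (split.asInstanceOf[FileSplit].getPath.getName, line.toString)
    },
  preservesPartitioning = true
)

val result = withFileNames.collect().toMap
println(result)
sc.stop()
```

If the files are small, sc.wholeTextFiles(path) is a simpler stable alternative: it returns (fileName, fileContent) pairs directly, at the cost of reading each file as a single record.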