Re: Reading one partition at a time

2015-01-13 Thread Imran Rashid
This looks reasonable to me. As you've done, the important thing is just to make isSplitable (Hadoop spells it with one "t") return false. This shares a bit in common with the sc.wholeTextFiles method. It sounds like you really want something much simpler than what that is doing, but you might be interested in looking at that
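
For readers following along, here is a minimal sketch of the idea of returning false from the split check, so that each part file becomes exactly one partition. It is not the code from the original post. It assumes the data was saved as plain text and uses the new Hadoop API (org.apache.hadoop.mapreduce); the class name NonSplittableTextInputFormat and the path are made up for illustration, and Spark and Hadoop are assumed to be on the classpath.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.JobContext
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Hypothetical class name. Returning false tells Hadoop never to split
// a file, so each input file maps to a single split (one RDD partition).
class NonSplittableTextInputFormat extends TextInputFormat {
  override protected def isSplitable(context: JobContext, file: Path): Boolean = false
}

// Usage sketch, assuming an existing SparkContext `sc` and a
// hypothetical input directory:
//
//   val rdd = sc.newAPIHadoopFile(
//     "hdfs:///path/to/saved/rdd",
//     classOf[NonSplittableTextInputFormat],
//     classOf[LongWritable],
//     classOf[Text])
//
//   // Each call to the function below sees the contents of one file.
//   val linesPerFile = rdd.mapPartitions(iter => Iterator(iter.size))
```

If the data was written in a different format (for example a SequenceFile), the same override would go on the matching input format class instead of TextInputFormat.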

Reading one partition at a time

2015-01-04 Thread Michael Albert
Greetings! I would like to know if the code below will read one-partition-at-a-time, and whether I am reinventing the wheel. If I may explain, upstream code has managed (I hope) to save an RDD such that each partition file (e.g., part-r-0, part-r-1) contains exactly the data subset