I also need help with this: how should a HAR file be handled when it is the input to a MapReduce job? How do we read the HAR so we can work on the individual logical files inside it? I suppose we need to write our own InputFormat and RecordReader, but I'm not sure how to proceed.
Julian Roshan James-3 wrote:
> When I run a map reduce task over a har file as the input, I see that the
> input splits refer to 64mb byte boundaries inside the part file.
>
> My mappers only know how to process the contents of each logical file
> inside the har file. Is there some way by which I can take the offset
> range specified by the input split and determine which logical files lie
> in that offset range? (How else would one do map reduce over a har file?)
>
> Roshan

--
View this message in context: http://www.nabble.com/Doing-MapReduce-over-Har-files-tp24171216p24217500.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
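One way to sidestep the split-boundary problem entirely is to not run the job over the raw part file at all, but to enumerate the logical files through Hadoop's `HarFileSystem` (the handler for `har://` URIs) and add each one as its own input path, so every mapper sees exactly one logical file. Below is a minimal sketch of that enumeration step; the namenode address and archive path (`logs.har`) are hypothetical placeholders, and this assumes the older `FileStatus.isDir()` API from the Hadoop versions current on this list.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: list the logical files inside a HAR archive so each can be
// registered as a separate job input, instead of letting the framework
// split the underlying part file at 64 MB byte boundaries.
public class HarInputLister {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical archive location; har:// URIs are served by
        // HarFileSystem, which reads the archive's index so the logical
        // files appear as ordinary paths.
        URI harUri = URI.create("har://hdfs-namenode:8020/user/me/logs.har");
        FileSystem harFs = FileSystem.get(harUri, conf);
        for (FileStatus status : harFs.listStatus(new Path(harUri))) {
            if (!status.isDir()) {
                // In a real job you would do something like:
                //   FileInputFormat.addInputPath(jobConf, status.getPath());
                System.out.println(status.getPath() + "\t" + status.getLen());
            }
        }
    }
}
```

With the logical files added individually, a stock `FileInputFormat` subclass that returns `false` from `isSplitable()` would keep each logical file whole in a single mapper, so no offset-to-file mapping is needed. This is only a sketch under those assumptions, not a tested job driver.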