I also need help with this. I need to know how to handle a HAR file when it
is the input to a MapReduce job: how do we read the HAR file so we can work
on the individual logical files inside it? I suppose we need to write our
own InputFormat and RecordReader classes, but I'm not sure how to proceed.
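One angle worth noting: a HAR archive ships a plain-text _index file that maps
each logical file to a (part file, start offset, length) triple, so a split's
byte range can be resolved back to logical files by scanning that index. The
sketch below is only an illustration of that idea, not Hadoop code: the class
name HarSplitResolver is made up, and the assumed index line layout
(<url-encoded path> <dir|file> <part-file> <start> <length> ...) should be
verified against the HarFileSystem source of your Hadoop version before
relying on it.

```java
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: given the text of a HAR _index file, find the logical
// files whose byte range in a given part file overlaps a split's byte range.
// The index column order is an assumption; check HarFileSystem for your version.
public class HarSplitResolver {

    public static List<String> overlappingFiles(String indexText,
                                                String partFile,
                                                long splitStart,
                                                long splitEnd) throws Exception {
        List<String> hits = new ArrayList<String>();
        for (String line : indexText.split("\n")) {
            String[] cols = line.trim().split(" ");
            if (cols.length < 5 || !"file".equals(cols[1])) {
                continue; // skip directory entries and malformed lines
            }
            if (!partFile.equals(cols[2])) {
                continue; // entry lives in a different part file
            }
            long start = Long.parseLong(cols[3]);
            long end = start + Long.parseLong(cols[4]);
            // keep entries whose [start, end) intersects [splitStart, splitEnd)
            if (start < splitEnd && end > splitStart) {
                hits.add(URLDecoder.decode(cols[0], "UTF-8"));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        // synthetic index content for illustration only
        String index =
              "%2Flogs dir none 0 0\n"
            + "%2Flogs%2Fa.log file part-0 0 100 0\n"
            + "%2Flogs%2Fb.log file part-0 100 200 0\n"
            + "%2Flogs%2Fc.log file part-1 0 50 0\n";
        // a split covering bytes [50, 150) of part-0 touches a.log and b.log
        System.out.println(overlappingFiles(index, "part-0", 50, 150));
    }
}
```

Alternatively, if the job is pointed at the archive through the har:// URI
scheme (so the FileSystem layer enumerates the logical files rather than the
part files), a FileInputFormat subclass whose isSplitable() returns false
should yield one split per logical file, which may be all a per-file mapper
needs.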

Julian 


Roshan James-3 wrote:
> 
> When I run map reduce task over a har file as the input, I see that the
> input splits refer to 64mb byte boundaries inside the part file.
> 
> My mappers only know how to process the contents of each logical file
> inside
> the har file. Is there some way by which I can take the offset range
> specified by the input split and determine which logical files lie in that
> offset range? (How else would one do map reduce over a har file?)
> 
> Roshan
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Doing-MapReduce-over-Har-files-tp24171216p24217500.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.