The current code guarantees that the lines will be received in order.  There
are some patches likely to go in soon that would allow the JVM itself to be
reused.  In those cases I believe that the mapper class would be recreated, so
the only thing you would have to worry about would be static values that are
updated while processing the data.
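
To illustrate the difference between per-mapper state and static state, here
is a minimal sketch in plain Java with no Hadoop dependency. The class
BufferingMapperSketch and its map()/cleanup() methods are hypothetical
stand-ins that only mimic the lifecycle of a new-API Mapper (one call per
line, then one call at the end of the split); they are not Hadoop's actual
signatures. The main() method simulates two tasks running in one reused JVM,
which is an assumption about how reuse would behave.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mapper; not part of the Hadoop API.
class BufferingMapperSketch {
    // Static state: shared by every mapper object created in the same JVM,
    // so it would carry over between tasks if the JVM is reused.
    static int linesSeenInThisJvm = 0;

    // Instance state: starts empty for every newly created mapper object.
    private final List<String> buffered = new ArrayList<>();

    // Mimics map(): called once per input line, in the order lines are read.
    void map(String line) {
        buffered.add(line);
        linesSeenInThisJvm++;
    }

    // Mimics cleanup(): called once after the last line; returns the aggregate.
    List<String> cleanup() {
        List<String> result = new ArrayList<>(buffered);
        buffered.clear();
        return result;
    }

    public static void main(String[] args) {
        // Simulate two tasks in one JVM, each with its own mapper object.
        for (int task = 1; task <= 2; task++) {
            BufferingMapperSketch mapper = new BufferingMapperSketch();
            mapper.map("line 1");
            mapper.map("line 2");
            System.out.println("task " + task
                    + ": buffered=" + mapper.cleanup().size()
                    + ", static=" + linesSeenInThisJvm);
        }
    }
}
```

In this sketch the per-mapper buffer holds 2 lines for each task, while the
static counter keeps growing across tasks (2, then 4). That is the kind of
carry-over to watch for if static values are updated while processing data.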

-- Bobby Evans

On 4/24/12 4:45 AM, "Dan Drew" <wirefr...@googlemail.com> wrote:

I have chosen to use Jay's suggestion as a quick workaround and am pleased
to report that it seems to work well on small test inputs.

My question now is, are the mappers guaranteed to receive the file's lines
in order?

Browsing the source suggests this is so, but I just want to make sure as my
understanding of Hadoop is insubstantial.

Thank you for your patience in answering my questions.

On 23 April 2012 14:28, Harsh J <ha...@cloudera.com> wrote:

> Jay,
>
> On Mon, Apr 23, 2012 at 6:43 PM, JAX <jayunit...@gmail.com> wrote:
> > Curious: seems like you could aggregate the results in the mapper as a
> > local variable or list of strings. Is there a way to know that your
> > mapper has just read the LAST line of an input split?
>
> True. Can be one way to do it (unless aggregation of 'records' needs
> to happen live, and you don't wish to store it all in memory).
>
> > Is there a "cleanup" or "finalize" method in mappers that is run at the
> > end of a whole stream read to support these sorts of chunked, in-memory
> > map/reduce operations?
>
> Yes there is. See:
>
> Old API:
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/Mapper.html
> (See Closeable's close())
>
> New API:
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/Mapper.html#cleanup(org.apache.hadoop.mapreduce.Mapper.Context)
>
>
> --
> Harsh J
>
