Thanks Kevin for the clarification. I ran a couple of tests as well and the
system behaved exactly as you described.
So now the question is, how can I achieve what I want to do - share an
object (a Lucene IndexWriter instance) between mappers running on the same
node? I thought of running the IndexWriter separately outside of Hadoop and
using RMI/sockets etc. to communicate with it, but I am hoping there is a
simpler way than this. Any thoughts?
Also, what if I modify the default behaviour of Hadoop so that all mappers
on a node run in one JVM? (Not sure if that is possible in the first place,
just a thought.)
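For what it's worth, here is a minimal sketch of the static-sharing idea, assuming tasks really do land in one JVM (e.g. via mapred.job.reuse.jvm.num.tasks=-1). FakeIndexWriter is a hypothetical stand-in for org.apache.lucene.index.IndexWriter so the snippet is self-contained; the point is only the lazily-initialized, JVM-wide singleton that concurrent map tasks would share:

```java
public class SharedWriterDemo {

    // Hypothetical stand-in for org.apache.lucene.index.IndexWriter.
    static class FakeIndexWriter {
        final String indexPath;
        FakeIndexWriter(String indexPath) { this.indexPath = indexPath; }
    }

    // JVM-wide singleton: every task running in this JVM sees one instance.
    private static FakeIndexWriter writer;

    // Synchronized lazy init, so concurrent tasks in the same JVM
    // cannot race and create two writers.
    static synchronized FakeIndexWriter getWriter() {
        if (writer == null) {
            writer = new FakeIndexWriter("/tmp/index"); // illustrative path
        }
        return writer;
    }

    public static void main(String[] args) {
        // Two "map tasks" in the same JVM obtain the same instance.
        FakeIndexWriter a = getWriter();
        FakeIndexWriter b = getWriter();
        System.out.println(a == b); // prints "true"
    }
}
```

Of course, as Kevin points out below, this only helps if the tasks actually share a JVM; across separate JVMs the static field buys nothing, which is why the RMI/socket daemon idea would still work where this fails.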
-Tarandeep
On Thu, Jun 4, 2009 at 12:49 AM, Kevin Peterson wrote:
> On Wed, Jun 3, 2009 at 10:59 AM, Tarandeep Singh wrote:
>
> > I want to share an object (a Lucene IndexWriter instance) between mappers
> > running on the same node within one job (not across multiple jobs).
> > Please correct me if I am wrong -
> >
> > If I set -1 for the property mapred.job.reuse.jvm.num.tasks, then all
> > mappers of one job will be executed in the same JVM, and in that case,
> > if I create a static Lucene IndexWriter instance in my mapper class, all
> > mappers running on the same node will be able to use it.
> >
>
> Not quite. JVM reuse controls whether the JVM is terminated after a single
> mapper runs and a new one is created for the next. It doesn't influence
> how many JVMs are created -- you will still get one JVM per concurrent
> mapper or reducer.
>
> I think there is, or was, or maybe a patch enables, what you are asking
> for, IIRC.
>