Jason, I didn't get that. The JVM should exit naturally even without calling System.exit. Where exactly did you insert the System.exit call? Please clarify. Thanks! A few rough sketches of what I think is being described are inline and below the quoted thread.
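
Just so we're talking about the same thing: do you mean something like the sketch below, in the child process's main after the task finishes? (This is purely my sketch of it; TaskChild and runTask are invented names, not actual Hadoop code.)

    // Hypothetical sketch: force the child JVM down after the task
    // finishes so stray non-daemon threads cannot keep it alive.
    public class TaskChild {
        public static void main(String[] args) {
            int exitCode = 0;
            try {
                runTask(args);  // placeholder for the real map/reduce work
            } catch (Throwable t) {
                t.printStackTrace();
                exitCode = 1;
            } finally {
                // A natural exit waits for every non-daemon thread to
                // finish; System.exit() tears the process down anyway.
                System.exit(exitCode);
            }
        }

        private static void runTask(String[] args) {
            // ... actual task execution would go here ...
        }
    }

The difference matters because a JVM only exits "naturally" once all non-daemon threads are gone; if a mapper leaves a JMX connector or a worker pool running, the process hangs around, which sounds like exactly what you are seeing.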
> -----Original Message-----
> From: Jason Venner [mailto:[EMAIL PROTECTED]
> Sent: Friday, April 18, 2008 6:48 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Reusing jobs
>
> We have terrible issues with threads in the JVMs holding down
> resources and causing the compute nodes to run out of memory and
> lock up. We in fact patch the JobTracker to cause the mapper/reducer
> JVM to call System.exit, to ensure that the resources are freed.
>
> This is particularly a problem for mappers/reducers that enable JMX
> or spin up many threads for internal processing.
>
> Our solution is to tune the input split size so that the minimum
> mapper time is > 1 minute.
>
> Karl Wettin wrote:
> > Ted Dunning wrote:
> >> Hadoop has enormous startup costs that are relatively inherent in
> >> the current design.
> >>
> >> Most notably, mappers and reducers are executed in a standalone
> >> JVM (ostensibly for safety reasons).
> >
> > Is it possible to hack in support for reusing JVMs? Keep one alive
> > until it times out and have it execute the jobs by opening a
> > socket and saying hello? What classes should I start looking in?
> > Could be a fun exercise.
> >
> > karl
> >
> >> On 4/17/08 6:00 PM, "Karl Wettin" <[EMAIL PROTECTED]> wrote:
> >>
> >>> Is it possible to execute a job more than once?
> >>>
> >>> I use MapReduce when adding a new instance to a hierarchical
> >>> cluster tree. It finds the least distant node and inserts the
> >>> new instance as a sibling to that node.
> >>>
> >>> As far as I know, it is in the very nature of this algorithm
> >>> that one inserts one instance at a time; this is how the second
> >>> dimension that makes it better than vector clustering is
> >>> created. It would be possible to map all permutations of
> >>> instances and skip the reduction, but that would result in many
> >>> more calculations than iteratively training the tree, as the
> >>> latter only requires one to test against the instances already
> >>> inserted into the tree.
> >>>
> >>> Iteratively training this tree using Hadoop means executing one
> >>> job per instance that measures the distance to all instances in
> >>> a file that I also append the new instance to once it has been
> >>> inserted in the tree.
> >>>
> >>> All of the above is very inefficient, especially with a young
> >>> tree that could be trained in nanoseconds locally. So I do that
> >>> until it takes 20 seconds to insert an instance.
> >>>
> >>> But really, this is all Hadoop framework overhead. I'm not quite
> >>> sure of all it does when I execute a job, but it seems like
> >>> quite a lot. And all I'm doing is executing a couple of
> >>> identical jobs over and over again using new data.
> >>>
> >>> It would be very nice if it just took a few milliseconds to do
> >>> that.
> >>>
> >>> karl
> >>
> >
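
On the split-size tuning Jason describes above: if I understand it, the knob is the minimum split size in the job configuration, something along these lines (property name quoted from memory, and SplitTuning/jobClass are placeholders of mine):

    import org.apache.hadoop.mapred.JobConf;

    public class SplitTuning {
        public static JobConf configure(Class<?> jobClass) {
            JobConf conf = new JobConf(jobClass);
            // Sketch: raise the minimum split size so each mapper runs
            // long enough (> 1 minute) to amortize the JVM startup cost.
            // 256 MB here is arbitrary; the right value depends on how
            // fast a single mapper chews through its input.
            conf.setLong("mapred.min.split.size", 256L * 1024 * 1024);
            return conf;
        }
    }

Fewer, longer-running mappers trade a little scheduling granularity for far fewer JVM launches, which is the whole point of the tuning.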
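Karl, on the reuse-a-JVM idea quoted above: the minimal shape of it is a long-lived worker process that accepts work over a socket and exits after an idle timeout, roughly like the toy below (nothing Hadoop-specific, all names invented). For where the real child JVMs get launched, the mapred TaskTracker/TaskRunner code is, I believe, the place to start reading.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.net.SocketTimeoutException;

    // Toy sketch of a reusable worker JVM: accept a connection, run
    // the requested task, and exit when no work arrives in time.
    public class WorkerDaemon {
        public static void main(String[] args) throws IOException {
            ServerSocket server = new ServerSocket(0);
            server.setSoTimeout(60000);  // idle timeout: one minute
            System.out.println("listening on port " + server.getLocalPort());
            while (true) {
                try (Socket client = server.accept()) {
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(client.getInputStream()));
                    String taskSpec = in.readLine();  // e.g. a class name
                    runTask(taskSpec);                // placeholder
                } catch (SocketTimeoutException idle) {
                    break;  // timed out with no work; let the JVM die
                }
            }
            server.close();
        }

        private static void runTask(String taskSpec) {
            // reflectively load and run the task class here
        }
    }

Of course this gives up the isolation that the per-task JVM buys, which is presumably why Hadoop works the way it does, and it reintroduces exactly the leaked-resource problem Jason describes.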
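And on the original cost comparison: the single-machine insertion step really is tiny, which is why the per-job overhead dominates for a young tree. If I read the description right, it is roughly the following (my own toy rendering, comparing against leaves only and assuming an invented Euclidean distance()):

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Toy sketch of the insertion step: find the least distant leaf
    // and splice the new instance in as its sibling. All names are
    // mine, not Karl's code.
    public class ClusterTree {
        static class Node {
            double[] vector;           // null for internal nodes
            Node parent, left, right;
            Node(double[] v) { vector = v; }
        }

        Node root;

        void insert(double[] instance) {
            Node leaf = new Node(instance);
            if (root == null) { root = leaf; return; }

            // Scan every existing leaf for the nearest one.
            Node nearest = null;
            double best = Double.POSITIVE_INFINITY;
            Deque<Node> stack = new ArrayDeque<Node>();
            stack.push(root);
            while (!stack.isEmpty()) {
                Node n = stack.pop();
                if (n.vector != null) {
                    double d = distance(n.vector, instance);
                    if (d < best) { best = d; nearest = n; }
                } else {
                    stack.push(n.left);
                    stack.push(n.right);
                }
            }

            // Splice in a new internal node so the new leaf becomes
            // the nearest node's sibling.
            Node parent = nearest.parent;
            Node inner = new Node(null);
            inner.left = nearest;
            inner.right = leaf;
            nearest.parent = inner;
            leaf.parent = inner;
            inner.parent = parent;
            if (parent == null) root = inner;
            else if (parent.left == nearest) parent.left = inner;
            else parent.right = inner;
        }

        static double distance(double[] a, double[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += d * d;
            }
            return Math.sqrt(sum);
        }
    }

Each insert only compares against instances already in the tree, so training locally until the tree gets big, and only then switching to MapReduce, sounds like the right call.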