Hi. Just to be clear: it is the jobtracker that needs the patched code, right? Or is it the tasktrackers?

Kindly

//Marcus

On Mon, Jun 29, 2009 at 12:08 AM, Mikhail Bautin <mbau...@gmail.com> wrote:

> Marcus,
>
> We currently use 0.20.0 but this patch just inserts 8 lines of code into
> TaskRunner.java, which could certainly be done with 0.18.3.
>
> Yes, this patch just appends additional jars to the child JVM classpath.
>
> I've never really used tmpjars myself, but if it involves uploading multiple
> jar files into HDFS every time a job is started, I see how it can be really
> slow. On our ~80-job workflow this would have really slowed things down.
>
> Thanks,
> Mikhail
>
> On Sun, Jun 28, 2009 at 5:40 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
>
> > Makes sense... I will try both rsync and NFS, but I think rsync will beat
> > NFS since NFS can be slow as hell sometimes. But what the heck, we already
> > have our maven2 repo on NFS, so why not :)
> >
> > Are you saying that this patch makes the client able to configure which
> > "extra" local jar files to add to the classpath when firing up the
> > TaskTrackerChild?
> >
> > To be explicit: do you confirm that using tmpjars like I do is a costly,
> > slow operation?
> >
> > To what branch do you apply the patch (we use 0.18.3)?
> >
> > Cheers
> >
> > //Marcus
> >
> > On Sun, Jun 28, 2009 at 11:26 PM, Mikhail Bautin <mbau...@gmail.com> wrote:
> >
> > > This is the way we deal with this problem, too. We put our jar files on
> > > NFS, and the attached patch makes it possible to add those jar files to
> > > the tasktracker classpath through a configuration property.
> > >
> > > Thanks,
> > > Mikhail
> > >
> > > On Sun, Jun 28, 2009 at 5:21 PM, Stuart White <stuart.whi...@gmail.com> wrote:
> > >
> > >> Although I've never done it, I believe you could manually copy your jar
> > >> files out to your cluster somewhere in hadoop's classpath, and that
> > >> would remove the need for you to copy them to your cluster at the start
> > >> of each job.
> > >>
> > >> On Sun, Jun 28, 2009 at 4:08 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
> > >>
> > >> > Hi.
> > >> >
> > >> > Running without a jobtracker makes the job start almost instantly.
> > >> > I think it is due to something with the classloader. I use a huge
> > >> > number of jar files, jobConf.set("tmpjars", "jar1.jar,jar2.jar")...,
> > >> > which need to be loaded every time, I guess.
> > >> >
> > >> > By issuing conf.setNumTasksToExecutePerJvm(-1); will the TaskTracker
> > >> > child live forever then?
> > >> >
> > >> > Cheers
> > >> >
> > >> > //Marcus
> > >> >
> > >> > On Sun, Jun 28, 2009 at 9:54 PM, tim robertson <timrobertson...@gmail.com> wrote:
> > >> >
> > >> > > How long does it take to start the code locally in a single thread?
> > >> > >
> > >> > > Can you reuse the JVM so it only starts once per node per job?
> > >> > > conf.setNumTasksToExecutePerJvm(-1)
> > >> > >
> > >> > > Cheers,
> > >> > > Tim
> > >> > >
> > >> > > On Sun, Jun 28, 2009 at 9:43 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
> > >> > > > Hi.
> > >> > > >
> > >> > > > Wonder how one should improve the startup times of a hadoop job.
> > >> > > > Some of my jobs which have a lot of dependencies in terms of many
> > >> > > > jar files take a long time to start in hadoop, up to 2 minutes
> > >> > > > sometimes.
> > >> > > > The data input amounts in these cases are negligible, so it seems
> > >> > > > that Hadoop has a really high setup cost, which I can live with,
> > >> > > > but this seems too much.
> > >> > > >
> > >> > > > Let's say a job takes 10 minutes to complete; then it is bad if it
> > >> > > > takes 2 mins to set it up... 20-30 sec max would be a lot more
> > >> > > > reasonable.
> > >> > > >
> > >> > > > Hints?
> > >> > > >
> > >> > > > //Marcus
> > >> > > >
> > >> > > > --
> > >> > > > Marcus Herou CTO and co-founder Tailsweep AB
> > >> > > > +46702561312
> > >> > > > marcus.he...@tailsweep.com
> > >> > > > http://www.tailsweep.com/
> > >> >
> > >> > --
> > >> > Marcus Herou CTO and co-founder Tailsweep AB
> > >> > +46702561312
> > >> > marcus.he...@tailsweep.com
> > >> > http://www.tailsweep.com/

--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
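
For anyone following the thread: the idea described above (appending extra, locally available jars to the child JVM classpath based on a configuration property) might look roughly like the sketch below. This is not the code from Mikhail's patch, which is not shown in this thread. The class name ChildClasspathSketch, the method appendExtraJars, and the example paths are all made up for illustration, and the sketch is plain Java with no Hadoop dependency.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; NOT the actual TaskRunner.java patch.
final class ChildClasspathSketch {

    // Appends each non-empty entry of a comma-separated jar list to an
    // existing classpath string, using the platform path separator.
    static String appendExtraJars(String baseClasspath, String extraJars) {
        List<String> entries = new ArrayList<>();
        if (baseClasspath != null && !baseClasspath.isEmpty()) {
            entries.add(baseClasspath);
        }
        if (extraJars != null) {
            for (String jar : extraJars.split(",")) {
                String trimmed = jar.trim();
                if (!trimmed.isEmpty()) {
                    entries.add(trimmed);
                }
            }
        }
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // Assumed example values: jars that already exist on every node
        // (for instance on an NFS mount), so nothing is uploaded per job.
        String base = "hadoop-core.jar";
        String extra = "/mnt/nfs/lib/a.jar, /mnt/nfs/lib/b.jar";
        System.out.println(appendExtraJars(base, extra));
    }
}
```

The point of this approach, as discussed above, is that the jars are read from a path already visible on each node instead of being shipped with every job submission.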