On Fri, Jan 7, 2011 at 2:11 PM, Nigel Daley <nda...@mac.com> wrote:

> Hrm, the MR precommit test I'm running has hung (been running for 14 hours
> so far). FWIW, 2 HDFS precommit tests are hung too. I suspect it could be
> the NFS mounts on the machines. I forced a thread dump which you can see in
> the console:
> https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/10/console

Strange, haven't seen a hang like that before in handleConnectionFailure. It should retry for 15 minutes max in that loop.

> Any other ideas why these might be hanging?

There is an HDFS bug right now that can cause hangs on some tests - HDFS-1529 - would appreciate if someone can take a look. But I don't think this is responsible for the MR hang above.

-Todd

> On Jan 5, 2011, at 5:42 PM, Todd Lipcon wrote:
>
> > On Wed, Jan 5, 2011 at 4:39 PM, Nigel Daley <nda...@mac.com> wrote:
> >
> >> Thanks for looking into it Todd. Let's first see if you think it can be
> >> fixed quickly. Let me know.
> >
> > No problem, it wasn't too bad after all. Patch up on HADOOP-7087 which fixes
> > this test timeout for me.
> >
> > -Todd
> >
> >> On Jan 5, 2011, at 4:33 PM, Todd Lipcon wrote:
> >>
> >>> On Wed, Jan 5, 2011 at 4:19 PM, Nigel Daley <nda...@mac.com> wrote:
> >>>
> >>>> Todd, would love to get
> >>>> https://issues.apache.org/jira/browse/MAPREDUCE-2121 fixed first since
> >>>> this is failing every night on trunk.
> >>>
> >>> What if we disable that test, move that issue to 0.22 blocker, and then
> >>> enable the test-patch? I'll also look into that one today, but if it's
> >>> something that will take a while to fix, I don't think we should hold off
> >>> the useful testing for all the other patches.
> >>>
> >>> -Todd
> >>>
> >>> On Jan 5, 2011, at 2:45 PM, Todd Lipcon wrote:
> >>>>
> >>>>> Hi Nigel,
> >>>>>
> >>>>> MAPREDUCE-2172 has been fixed for a while. Are there any other particular
> >>>>> JIRAs you think need to be fixed before the MR test-patch queue gets
> >>>>> enabled? I have a lot of outstanding patches and doing all the test-patch
> >>>>> turnaround manually on 3 different boxes is a real headache.
> >>>>>
> >>>>> Thanks
> >>>>> -Todd
> >>>>>
> >>>>> On Tue, Dec 21, 2010 at 1:33 PM, Nigel Daley <nda...@mac.com> wrote:
> >>>>>
> >>>>>> Ok, HDFS is now enabled. You'll see a stream of updates shortly on the ~30
> >>>>>> Patch Available HDFS issues.
> >>>>>>
> >>>>>> Nige
> >>>>>>
> >>>>>> On Dec 20, 2010, at 12:42 PM, Jakob Homan wrote:
> >>>>>>
> >>>>>>> I committed HDFS-1511 this morning. We should be good to go. I can
> >>>>>>> haz snooty robot butler?
> >>>>>>>
> >>>>>>> On Fri, Dec 17, 2010 at 8:31 PM, Konstantin Boudnik <c...@apache.org> wrote:
> >>>>>>>> Thanks Jacob. I am wasted already but I can do it on Sun, I think,
> >>>>>>>> unless it is done earlier.
> >>>>>>>> --
> >>>>>>>> Take care,
> >>>>>>>> Konstantin (Cos) Boudnik
> >>>>>>>>
> >>>>>>>> On Fri, Dec 17, 2010 at 19:41, Jakob Homan <jgho...@gmail.com> wrote:
> >>>>>>>>> Ok. I'll get a patch out for 1511 tomorrow, unless someone wants to
> >>>>>>>>> whip one up tonight.
> >>>>>>>>>
> >>>>>>>>> On Fri, Dec 17, 2010 at 7:22 PM, Nigel Daley <nda...@mac.com> wrote:
> >>>>>>>>>> I agree with Cos on fixing HDFS-1511 first. Once that is done I'll
> >>>>>>>>>> enable hdfs patch testing.
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>> Nige
> >>>>>>>>>>
> >>>>>>>>>> Sent from my iPhone4
> >>>>>>>>>>
> >>>>>>>>>> On Dec 17, 2010, at 7:01 PM, Konstantin Boudnik <c...@apache.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> One more issue needs to be addressed before test-patch is turned on
> >>>>>>>>>>> HDFS is
> >>>>>>>>>>> https://issues.apache.org/jira/browse/HDFS-1511
> >>>>>>>>>>> --
> >>>>>>>>>>> Take care,
> >>>>>>>>>>> Konstantin (Cos) Boudnik
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Dec 17, 2010 at 16:17, Konstantin Boudnik <c...@apache.org> wrote:
> >>>>>>>>>>>> Considering that because of these 4 faulty cases every patch will be
> >>>>>>>>>>>> -1'ed a patch author will still have to look at it and make a comment
> >>>>>>>>>>>> why this particular -1 isn't valid. Lesser work, perhaps, but messier
> >>>>>>>>>>>> IMO. I'm not blocking it - I just feel like there's a better way.
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Take care,
> >>>>>>>>>>>> Konstantin (Cos) Boudnik
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Dec 17, 2010 at 15:55, Jakob Homan <jgho...@gmail.com> wrote:
> >>>>>>>>>>>>>> If HDFS is added to the test-patch queue right now we get
> >>>>>>>>>>>>>> nothing but dozens of -1'ed patches.
> >>>>>>>>>>>>> There aren't dozens of patches being submitted currently. The -1
> >>>>>>>>>>>>> isn't the important thing, it's the grunt work of actually running
> >>>>>>>>>>>>> (and waiting) for the tests, test-patch, etc. that Hudson does so that
> >>>>>>>>>>>>> the developer doesn't have to.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur <dhr...@gmail.com> wrote:
> >>>>>>>>>>>>>> +1, thanks for doing this.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:19 PM, Jakob Homan <jgho...@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> So, with test-patch updated to show the failing tests, saving the
> >>>>>>>>>>>>>>> developers the need to go and verify that the failed tests are all
> >>>>>>>>>>>>>>> known, how do people feel about turning on test-patch again for HDFS
> >>>>>>>>>>>>>>> and mapred? I think it'll help prevent any more tests from entering
> >>>>>>>>>>>>>>> the "yeah, we know" category.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> jg
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, Nov 17, 2010 at 5:08 PM, Jakob Homan <jho...@yahoo-inc.com> wrote:
> >>>>>>>>>>>>>>>> True, each patch would get a -1 and the failing tests would need to be
> >>>>>>>>>>>>>>>> verified as those known bad (BTW, it would be great if Hudson could list
> >>>>>>>>>>>>>>>> which tests failed in the message it posts to JIRA).
> >>>>>>>>>>>>>>>> But that's still quite
> >>>>>>>>>>>>>>>> a bit less error-prone work than if the developer runs the tests and
> >>>>>>>>>>>>>>>> test-patch themselves. Also, with 22 being cut, there are a lot of patches
> >>>>>>>>>>>>>>>> up in the air and several developers are juggling multiple patches. The
> >>>>>>>>>>>>>>>> more automation we can have, even if it's not perfect, will decrease errors
> >>>>>>>>>>>>>>>> we may make.
> >>>>>>>>>>>>>>>> -jg
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Nigel Daley wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Nov 17, 2010, at 3:11 PM, Jakob Homan wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> It's also ready to run on MapReduce and HDFS but we won't turn it on
> >>>>>>>>>>>>>>>>>>> until these projects build and test cleanly. Looks like both these projects
> >>>>>>>>>>>>>>>>>>> currently have test failures.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Assuming the projects are compiling and building, is there a reason to
> >>>>>>>>>>>>>>>>>> not turn it on despite the test failures? Hudson is invaluable to developers
> >>>>>>>>>>>>>>>>>> who then don't have to run the tests and test-patch themselves. We didn't
> >>>>>>>>>>>>>>>>>> turn Hudson off when it was working previously and there were known
> >>>>>>>>>>>>>>>>>> failures. I think one of the reasons we have more failing tests now is the
> >>>>>>>>>>>>>>>>>> higher cost of doing Hudson's work (not a great excuse I know). This is
> >>>>>>>>>>>>>>>>>> particularly true now because several of the failing tests involve tests
> >>>>>>>>>>>>>>>>>> timing out, making the whole testing regime even longer.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Every single patch would get a -1 and need investigation. Currently, that
> >>>>>>>>>>>>>>>>> would be about 83 investigations between MR and HDFS issues that are in
> >>>>>>>>>>>>>>>>> patch available state. Shouldn't we focus on getting these tests fixed or
> >>>>>>>>>>>>>>>>> removed/? Also, I need to get MAPREDUCE-2172 fixed (applies to HDFS as
> >>>>>>>>>>>>>>>>> well) before I turn this on.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>> Nige
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> Connect to me at http://www.facebook.com/dhruba
> >>>>>
> >>>>> --
> >>>>> Todd Lipcon
> >>>>> Software Engineer, Cloudera
> >>>
> >>> --
> >>> Todd Lipcon
> >>> Software Engineer, Cloudera
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera