On Fri, Jan 7, 2011 at 2:11 PM, Nigel Daley <nda...@mac.com> wrote:

> Hrm, the MR precommit test I'm running has hung (been running for 14 hours
> so far).  FWIW, 2 HDFS precommit tests are hung too.  I suspect it could be
> the NFS mounts on the machines.  I forced a thread dump which you can see in
> the console:
> https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/10/console
>
>
Strange, haven't seen a hang like that before in handleConnectionFailure. It
should retry for 15 minutes max in that loop.


> Any other ideas why these might be hanging?
>
>
There is an HDFS bug right now that can cause hangs on some tests -
HDFS-1529 - would appreciate if someone can take a look. But I don't think
this is responsible for the MR hang above.

-Todd


> On Jan 5, 2011, at 5:42 PM, Todd Lipcon wrote:
>
> > On Wed, Jan 5, 2011 at 4:39 PM, Nigel Daley <nda...@mac.com> wrote:
> >
> >> Thanks for looking into it Todd.  Let's first see if you think it can be
> >> fixed quickly.  Let me know.
> >>
> >>
> > No problem, it wasn't too bad after all. Patch up on HADOOP-7087 which fixes
> > this test timeout for me.
> >
> > -Todd
> >
> >
> >> On Jan 5, 2011, at 4:33 PM, Todd Lipcon wrote:
> >>
> >>> On Wed, Jan 5, 2011 at 4:19 PM, Nigel Daley <nda...@mac.com> wrote:
> >>>
> >>>> Todd, would love to get
> >>>> https://issues.apache.org/jira/browse/MAPREDUCE-2121 fixed first since
> >>>> this is failing every night on trunk.
> >>>>
> >>>
> >>> What if we disable that test, move that issue to 0.22 blocker, and then
> >>> enable the test-patch? I'll also look into that one today, but if it's
> >>> something that will take a while to fix, I don't think we should hold off
> >>> the useful testing for all the other patches.
> >>>
> >>> -Todd
> >>>
> >>>> On Jan 5, 2011, at 2:45 PM, Todd Lipcon wrote:
> >>>>
> >>>>> Hi Nigel,
> >>>>>
> >>>>> MAPREDUCE-2172 has been fixed for a while. Are there any other particular
> >>>>> JIRAs you think need to be fixed before the MR test-patch queue gets
> >>>>> enabled? I have a lot of outstanding patches and doing all the test-patch
> >>>>> turnaround manually on 3 different boxes is a real headache.
> >>>>>
> >>>>> Thanks
> >>>>> -Todd
> >>>>>
> >>>>> On Tue, Dec 21, 2010 at 1:33 PM, Nigel Daley <nda...@mac.com> wrote:
> >>>>>
> >>>>>> Ok, HDFS is now enabled.  You'll see a stream of updates shortly on the ~30
> >>>>>> Patch Available HDFS issues.
> >>>>>>
> >>>>>> Nige
> >>>>>>
> >>>>>> On Dec 20, 2010, at 12:42 PM, Jakob Homan wrote:
> >>>>>>
> >>>>>>> I committed HDFS-1511 this morning.  We should be good to go.  I can
> >>>>>>> haz snooty robot butler?
> >>>>>>>
> >>>>>>> On Fri, Dec 17, 2010 at 8:31 PM, Konstantin Boudnik <c...@apache.org> wrote:
> >>>>>>>> Thanks Jacob. I am wasted already but I can do it on Sun, I think,
> >>>>>>>> unless it is done earlier.
> >>>>>>>> --
> >>>>>>>> Take care,
> >>>>>>>> Konstantin (Cos) Boudnik
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Dec 17, 2010 at 19:41, Jakob Homan <jgho...@gmail.com> wrote:
> >>>>>>>>> Ok.  I'll get a patch out for 1511 tomorrow, unless someone wants to
> >>>>>>>>> whip one up tonight.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Fri, Dec 17, 2010 at 7:22 PM, Nigel Daley <nda...@mac.com> wrote:
> >>>>>>>>>> I agree with Cos on fixing HDFS-1511 first. Once that is done I'll
> >>>>>>>>>> enable hdfs patch testing.
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>> Nige
> >>>>>>>>>>
> >>>>>>>>>> Sent from my iPhone4
> >>>>>>>>>>
> >>>>>>>>>> On Dec 17, 2010, at 7:01 PM, Konstantin Boudnik <c...@apache.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> One more issue that needs to be addressed before test-patch is
> >>>>>>>>>>> turned on for HDFS is
> >>>>>>>>>>> https://issues.apache.org/jira/browse/HDFS-1511
> >>>>>>>>>>> --
> >>>>>>>>>>> Take care,
> >>>>>>>>>>> Konstantin (Cos) Boudnik
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Dec 17, 2010 at 16:17, Konstantin Boudnik <c...@apache.org> wrote:
> >>>>>>>>>>>> Considering that because of these 4 faulty cases every patch will be
> >>>>>>>>>>>> -1'ed a patch author will still have to look at it and make a comment
> >>>>>>>>>>>> why this particular -1 isn't valid. Lesser work, perhaps, but messier
> >>>>>>>>>>>> IMO. I'm not blocking it - I just feel like there's a better way.
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Take care,
> >>>>>>>>>>>> Konstantin (Cos) Boudnik
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Dec 17, 2010 at 15:55, Jakob Homan <jgho...@gmail.com> wrote:
> >>>>>>>>>>>>>> If HDFS is added to the test-patch queue right now we get
> >>>>>>>>>>>>>> nothing but dozens of -1'ed patches.
> >>>>>>>>>>>>> There aren't dozens of patches being submitted currently.  The -1
> >>>>>>>>>>>>> isn't the important thing, it's the grunt work of actually running
> >>>>>>>>>>>>> (and waiting) for the tests, test-patch, etc. that Hudson does so that
> >>>>>>>>>>>>> the developer doesn't have to.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur <dhr...@gmail.com> wrote:
> >>>>>>>>>>>>>> +1, thanks for doing this.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:19 PM, Jakob Homan <jgho...@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> So, with test-patch updated to show the failing tests, saving the
> >>>>>>>>>>>>>>> developers the need to go and verify that the failed tests are all
> >>>>>>>>>>>>>>> known, how do people feel about turning on test-patch again for HDFS
> >>>>>>>>>>>>>>> and mapred?  I think it'll help prevent any more tests from entering
> >>>>>>>>>>>>>>> the "yeah, we know" category.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> jg
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, Nov 17, 2010 at 5:08 PM, Jakob Homan <jho...@yahoo-inc.com> wrote:
> >>>>>>>>>>>>>>>> True, each patch would get a -1 and the failing tests would need to be
> >>>>>>>>>>>>>>>> verified as those known bad (BTW, it would be great if Hudson could list
> >>>>>>>>>>>>>>>> which tests failed in the message it posts to JIRA).  But that's still quite
> >>>>>>>>>>>>>>>> a bit less error-prone work than if the developer runs the tests and
> >>>>>>>>>>>>>>>> test-patch themselves.  Also, with 22 being cut, there are a lot of patches
> >>>>>>>>>>>>>>>> up in the air and several developers are juggling multiple patches.  The
> >>>>>>>>>>>>>>>> more automation we can have, even if it's not perfect, will decrease errors
> >>>>>>>>>>>>>>>> we may make.
> >>>>>>>>>>>>>>>> -jg
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Nigel Daley wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Nov 17, 2010, at 3:11 PM, Jakob Homan wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> It's also ready to run on MapReduce and HDFS but we won't turn it on
> >>>>>>>>>>>>>>>>>>> until these projects build and test cleanly.  Looks like both these projects
> >>>>>>>>>>>>>>>>>>> currently have test failures.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Assuming the projects are compiling and building, is there a reason to
> >>>>>>>>>>>>>>>>>> not turn it on despite the test failures? Hudson is invaluable to developers
> >>>>>>>>>>>>>>>>>> who then don't have to run the tests and test-patch themselves.  We didn't
> >>>>>>>>>>>>>>>>>> turn Hudson off when it was working previously and there were known
> >>>>>>>>>>>>>>>>>> failures.  I think one of the reasons we have more failing tests now is the
> >>>>>>>>>>>>>>>>>> higher cost of doing Hudson's work (not a great excuse I know).  This is
> >>>>>>>>>>>>>>>>>> particularly true now because several of the failing tests involve tests
> >>>>>>>>>>>>>>>>>> timing out, making the whole testing regime even longer.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Every single patch would get a -1 and need investigation.  Currently, that
> >>>>>>>>>>>>>>>>> would be about 83 investigations between MR and HDFS issues that are in
> >>>>>>>>>>>>>>>>> patch available state.  Shouldn't we focus on getting these tests fixed or
> >>>>>>>>>>>>>>>>> removed?  Also, I need to get MAPREDUCE-2172 fixed (applies to HDFS as
> >>>>>>>>>>>>>>>>> well) before I turn this on.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>> Nige
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> Connect to me at http://www.facebook.com/dhruba
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Todd Lipcon
> >>>>> Software Engineer, Cloudera
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Todd Lipcon
> >>> Software Engineer, Cloudera
> >>
> >>
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
