raid (contrib) test hanging: TestBlockFixer

I forced 2 thread dumps.  Both hung in the same place.  Filed 
https://issues.apache.org/jira/browse/MAPREDUCE-2283  This is a blocker for 
turning on MR precommit.

Cheers,
Nige

On Jan 25, 2011, at 11:19 PM, Nigel Daley wrote:

> Started another trial run of MR precommit testing:
> https://hudson.apache.org/hudson/view/G-L/view/Hadoop/job/PreCommit-MAPREDUCE-Build/17/
> 
> Let's see if the 17th time is a charm...
> 
> Nige
> 
> On Jan 7, 2011, at 5:14 PM, Todd Lipcon wrote:
> 
>> On Fri, Jan 7, 2011 at 2:11 PM, Nigel Daley <nda...@mac.com> wrote:
>> 
>>> Hrm, the MR precommit test I'm running has hung (been running for 14 hours
>>> so far).  FWIW, 2 HDFS precommit tests are hung too.  I suspect it could be
>>> the NFS mounts on the machines.  I forced a thread dump which you can see in
>>> the console:
>>> https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/10/console
>>> 
>>> 
>> Strange, haven't seen a hang like that before in handleConnectionFailure. It
>> should retry for 15 minutes max in that loop.
>> 
>> 
>>> Any other ideas why these might be hanging?
>>> 
>>> 
>> There is an HDFS bug right now that can cause hangs on some tests -
>> HDFS-1529 - would appreciate if someone can take a look. But I don't think
>> this is responsible for the MR hang above.
>> 
>> -Todd
>> 
>> 
>>> On Jan 5, 2011, at 5:42 PM, Todd Lipcon wrote:
>>> 
>>>> On Wed, Jan 5, 2011 at 4:39 PM, Nigel Daley <nda...@mac.com> wrote:
>>>> 
>>>>> Thanks for looking into it Todd.  Let's first see if you think it can be
>>>>> fixed quickly.  Let me know.
>>>>> 
>>>>> 
>>>> No problem, it wasn't too bad after all. Patch up on HADOOP-7087 which
>>>> fixes this test timeout for me.
>>>> 
>>>> -Todd
>>>> 
>>>> 
>>>>> On Jan 5, 2011, at 4:33 PM, Todd Lipcon wrote:
>>>>> 
>>>>>> On Wed, Jan 5, 2011 at 4:19 PM, Nigel Daley <nda...@mac.com> wrote:
>>>>>> 
>>>>>>> Todd, would love to get
>>>>>>> https://issues.apache.org/jira/browse/MAPREDUCE-2121 fixed first since
>>>>>>> this is failing every night on trunk.
>>>>>>> 
>>>>>> 
>>>>>> What if we disable that test, move that issue to 0.22 blocker, and then
>>>>>> enable the test-patch? I'll also look into that one today, but if it's
>>>>>> something that will take a while to fix, I don't think we should hold off
>>>>>> the useful testing for all the other patches.
>>>>>> 
>>>>>> -Todd
>>>>>> 
>>>>>> On Jan 5, 2011, at 2:45 PM, Todd Lipcon wrote:
>>>>>>> 
>>>>>>>> Hi Nigel,
>>>>>>>> 
>>>>>>>> MAPREDUCE-2172 has been fixed for a while. Are there any other particular
>>>>>>>> JIRAs you think need to be fixed before the MR test-patch queue gets
>>>>>>>> enabled? I have a lot of outstanding patches and doing all the test-patch
>>>>>>>> turnaround manually on 3 different boxes is a real headache.
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> -Todd
>>>>>>>> 
>>>>>>>> On Tue, Dec 21, 2010 at 1:33 PM, Nigel Daley <nda...@mac.com> wrote:
>>>>>>>> 
>>>>>>>>> Ok, HDFS is now enabled.  You'll see a stream of updates shortly on the
>>>>>>>>> ~30 Patch Available HDFS issues.
>>>>>>>>> 
>>>>>>>>> Nige
>>>>>>>>> 
>>>>>>>>> On Dec 20, 2010, at 12:42 PM, Jakob Homan wrote:
>>>>>>>>> 
>>>>>>>>>> I committed HDFS-1511 this morning.  We should be good to go.  I can
>>>>>>>>>> haz snooty robot butler?
>>>>>>>>>> 
>>>>>>>>>> On Fri, Dec 17, 2010 at 8:31 PM, Konstantin Boudnik <c...@apache.org> wrote:
>>>>>>>>>>> Thanks Jakob. I am wasted already but I can do it on Sun, I think,
>>>>>>>>>>> unless it is done earlier.
>>>>>>>>>>> --
>>>>>>>>>>> Take care,
>>>>>>>>>>> Konstantin (Cos) Boudnik
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Dec 17, 2010 at 19:41, Jakob Homan <jgho...@gmail.com> wrote:
>>>>>>>>>>>> Ok.  I'll get a patch out for 1511 tomorrow, unless someone wants to
>>>>>>>>>>>> whip one up tonight.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Dec 17, 2010 at 7:22 PM, Nigel Daley <nda...@mac.com> wrote:
>>>>>>>>>>>>> I agree with Cos on fixing HDFS-1511 first. Once that is done I'll
>>>>>>>>>>>>> enable hdfs patch testing.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Nige
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Sent from my iPhone4
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Dec 17, 2010, at 7:01 PM, Konstantin Boudnik <c...@apache.org> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> One more issue that needs to be addressed before test-patch is turned
>>>>>>>>>>>>>> on for HDFS is
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HDFS-1511
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Take care,
>>>>>>>>>>>>>> Konstantin (Cos) Boudnik
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 16:17, Konstantin Boudnik <c...@apache.org> wrote:
>>>>>>>>>>>>>>> Considering that because of these 4 faulty cases every patch will be
>>>>>>>>>>>>>>> -1'ed, a patch author will still have to look at it and make a comment
>>>>>>>>>>>>>>> why this particular -1 isn't valid. Less work, perhaps, but messier
>>>>>>>>>>>>>>> IMO. I'm not blocking it - I just feel like there's a better way.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Take care,
>>>>>>>>>>>>>>> Konstantin (Cos) Boudnik
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 15:55, Jakob Homan <jgho...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> If HDFS is added to the test-patch queue right now we get
>>>>>>>>>>>>>>>>> nothing but dozens of -1'ed patches.
>>>>>>>>>>>>>>>> There aren't dozens of patches being submitted currently. The -1
>>>>>>>>>>>>>>>> isn't the important thing, it's the grunt work of actually running
>>>>>>>>>>>>>>>> (and waiting for) the tests, test-patch, etc. that Hudson does so that
>>>>>>>>>>>>>>>> the developer doesn't have to.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur <dhr...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> +1, thanks for doing this.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:19 PM, Jakob Homan <jgho...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> So, with test-patch updated to show the failing tests, saving the
>>>>>>>>>>>>>>>>>> developers the need to go and verify that the failed tests are all
>>>>>>>>>>>>>>>>>> known, how do people feel about turning on test-patch again for HDFS
>>>>>>>>>>>>>>>>>> and mapred?  I think it'll help prevent any more tests from entering
>>>>>>>>>>>>>>>>>> the "yeah, we know" category.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> jg
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Nov 17, 2010 at 5:08 PM, Jakob Homan <jho...@yahoo-inc.com> wrote:
>>>>>>>>>>>>>>>>>>> True, each patch would get a -1 and the failing tests would need to be
>>>>>>>>>>>>>>>>>>> verified as those known bad (BTW, it would be great if Hudson could list
>>>>>>>>>>>>>>>>>>> which tests failed in the message it posts to JIRA).  But that's still quite
>>>>>>>>>>>>>>>>>>> a bit less error-prone work than if the developer runs the tests and
>>>>>>>>>>>>>>>>>>> test-patch themselves.  Also, with 22 being cut, there are a lot of patches
>>>>>>>>>>>>>>>>>>> up in the air and several developers are juggling multiple patches.  The
>>>>>>>>>>>>>>>>>>> more automation we can have, even if it's not perfect, will decrease errors
>>>>>>>>>>>>>>>>>>> we may make.
>>>>>>>>>>>>>>>>>>> -jg
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Nigel Daley wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Nov 17, 2010, at 3:11 PM, Jakob Homan wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> It's also ready to run on MapReduce and HDFS but we won't turn it on
>>>>>>>>>>>>>>>>>>>>>> until these projects build and test cleanly.  Looks like both these projects
>>>>>>>>>>>>>>>>>>>>>> currently have test failures.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Assuming the projects are compiling and building, is there a reason to
>>>>>>>>>>>>>>>>>>>>> not turn it on despite the test failures? Hudson is invaluable to developers
>>>>>>>>>>>>>>>>>>>>> who then don't have to run the tests and test-patch themselves.  We didn't
>>>>>>>>>>>>>>>>>>>>> turn Hudson off when it was working previously and there were known
>>>>>>>>>>>>>>>>>>>>> failures.  I think one of the reasons we have more failing tests now is the
>>>>>>>>>>>>>>>>>>>>> higher cost of doing Hudson's work (not a great excuse I know).  This is
>>>>>>>>>>>>>>>>>>>>> particularly true now because several of the failing tests involve tests
>>>>>>>>>>>>>>>>>>>>> timing out, making the whole testing regime even longer.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Every single patch would get a -1 and need investigation.  Currently, that
>>>>>>>>>>>>>>>>>>>> would be about 83 investigations between MR and HDFS issues that are in
>>>>>>>>>>>>>>>>>>>> patch available state.  Shouldn't we focus on getting these tests fixed or
>>>>>>>>>>>>>>>>>>>> removed?  Also, I need to get MAPREDUCE-2172 fixed (applies to HDFS as
>>>>>>>>>>>>>>>>>>>> well) before I turn this on.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>> Nige
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Connect to me at http://www.facebook.com/dhruba
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Todd Lipcon
>>>>>>>> Software Engineer, Cloudera
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>> 
>>> 
>> 
>> 
>> -- 
>> Todd Lipcon
>> Software Engineer, Cloudera
> 
