This is related to the earlier thread where I asked how to save off the
intermediate state, but more generally: how do you debug the project,
specifically the integration tests?  Do you typically run them through
Eclipse with special profiles?

I'm still trying to track down an odd failure in crunch-hbase when
swapping out the dependencies to use CDH4.1.x.  The test failure seems
to indicate the test is joining the same PCollection on itself.

Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 63.13 sec <<< FAILURE!
testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 62.789 sec  <<< FAILURE!
java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[bird,bird, zebra,zebra, horse,horse, donkey,donkey]>
        at org.junit.Assert.fail(Assert.java:93)
        at org.junit.Assert.failNotEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:128)
        at org.junit.Assert.assertEquals(Assert.java:147)
        at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:257)
        at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)

and sometimes:

Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 71.958 sec <<< FAILURE!
testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 71.469 sec  <<< FAILURE!
java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[dog,dog, cat,cat]>
        at org.junit.Assert.fail(Assert.java:93)
        at org.junit.Assert.failNotEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:128)
        at org.junit.Assert.assertEquals(Assert.java:147)
        at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:259)
        at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
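
To sanity-check the "joined on itself" reading against the two assertion
messages, here is a small standalone sketch.  The class SelfJoinCheck and
the helper looksLikeSelfJoin are made up for illustration and are not part
of Crunch or of WordCountHBaseIT; the pair strings are copied from the
failure output.  It only tests whether both halves of every pair are
identical, which is consistent with a self-join on a unique key but does
not prove one.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Crunch.
final class SelfJoinCheck {

    // Returns true when every "left,right" pair has identical halves,
    // which is consistent with a collection being joined with itself.
    static boolean looksLikeSelfJoin(List<String> pairs) {
        if (pairs.isEmpty()) {
            return false; // no rows, so no evidence either way
        }
        for (String pair : pairs) {
            String[] halves = pair.split(",", -1);
            if (halves.length != 2 || !halves[0].equals(halves[1])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Values copied from the two AssertionError messages.
        List<String> expected = Arrays.asList("cat,zebra", "cat,donkey", "dog,bird");
        List<String> firstRun = Arrays.asList(
                "bird,bird", "zebra,zebra", "horse,horse", "donkey,donkey");
        List<String> secondRun = Arrays.asList("dog,dog", "cat,cat");

        System.out.println("expected:  " + looksLikeSelfJoin(expected));  // false
        System.out.println("firstRun:  " + looksLikeSelfJoin(firstRun));  // true
        System.out.println("secondRun: " + looksLikeSelfJoin(secondRun)); // true
    }
}
```

Both failing runs pass the check while the expected output does not, which
fits the self-join theory without saying anything about its cause.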

Most likely due to the same reason Crunch requires a special build of
HBase 0.94.1, I've found I need to mix and match CDH4 versions as
shown by the attached patch.  For the Crunch core build I need to use
all of the latest 2.0.0 code but for testing crunch-hbase I need to
use the mrv1 fork for hadoop-core and hadoop-minicluster.  I wouldn't
think that either of those would affect the tests unless the files used
for the intermediate state were somehow not being stored correctly.  The
fact that the test fails differently from run to run does make me wonder
about a concurrency issue, but I'm not sure where.

Any pointers on debugging would be helpful.
Micah

On Thu, Jan 24, 2013 at 2:24 PM, Micah Whitacre <[email protected]> wrote:
> I am creating an entirely new profile simply to keep my changes
> separate from what is in apache/master.
>
> Thanks for the hint about the "naive" approach.  Previously I had the
> following:
>
>             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>             <hadoop.client.version>2.0.0-mr1-cdh4.1.1</hadoop.client.version>
>             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>
> If I follow what you did and change it to:
>
>             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>             <hadoop.client.version>2.0.0-cdh4.1.1</hadoop.client.version>
>             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>
> The build gets farther.  I now have a different failure in
> crunch-hbase I'll start working on.
>
> Thanks for your help.
> Micah
>
>
> On Thu, Jan 24, 2013 at 12:23 PM, Josh Wills <[email protected]> wrote:
>> Micah,
>>
>> I did the naive thing and just swapped in 2.0.0-cdh4.1.2 for 2.0.0-alpha in
>> the crunch.platform=2 profile in the top level POM and then added in the
>> Cloudera repositories. That works for me-- does it work for you? It sounds
>> to me like you're creating an entirely new profile.
>>
>> J
>>
>>
>> On Thu, Jan 24, 2013 at 7:58 AM, Micah Whitacre <[email protected]> wrote:
>>>
>>> Running dependency:tree on both projects shows that the version of
>>> Avro is 1.7.0 under both profiles.  I wish it was that easy.  :)
>>>
>>> On Thu, Jan 24, 2013 at 9:53 AM, Josh Wills <[email protected]> wrote:
>>> >
>>> >
>>> >
>>> > On Thu, Jan 24, 2013 at 6:40 AM, Micah Whitacre <[email protected]> wrote:
>>> >>
>>> >> Taking a step back and comparing what is being generated for a normal
>>> >> successful test run of "-Dcrunch.platform=2" I do see a p1 and p2
>>> >> directory being created, with the expected materialized output being
>>> >> in the p1 directory.  So I'm still curious about tracking all of the
>>> >> intermediate state but it doesn't look like it is an issue with regard
>>> >> to creating the output in the wrong directory.
>>> >
>>> >
>>> > That's a relief. :)
>>> >
>>> > I think the issue with temp outputs has to do with our use of the
>>> > TemporaryPath libraries for creating, well, temporary paths. We do this
>>> > so we play nicely with CI frameworks, but you might need to disable it
>>> > for investigating intermediate outputs.
>>> >
>>> > Re: the specific error you're seeing, that looks interesting. I wonder
>>> > if it's an Avro version change or some such thing. Will see if I can
>>> > replicate it.
>>> >
>>> >
>>> > --
>>> > Director of Data Science
>>> > Cloudera
>>> > Twitter: @josh_wills
>>
>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera
>> Twitter: @josh_wills

Attachment: cdh4.1.3-versions.patch
Description: Binary data