Ack, sorry-- was checking email on my phone and didn't see the patch. I can
replicate it locally, digging in now.


On Mon, Jan 28, 2013 at 6:47 PM, Whitacre,Micah <[email protected]> wrote:

>  The patch should contain the specifics but I've tested using 4.1.1,
> 4.1.2, and 4.1.3. Each gives the same results.
>
>
>
>
> On Jan 28, 2013, at 20:44, "Josh Wills" <[email protected]> wrote:
>
>   I usually run them in Eclipse, but not using a particularly special run
> configuration (I think.) Let me see if I can replicate that one-- which CDH
> version?
>
>
> On Mon, Jan 28, 2013 at 3:13 PM, Micah Whitacre <[email protected]> wrote:
>
>> Related to this thread, where I asked how to save off the intermediate
>> state: in general, how do you debug the project, specifically the IT
>> tests?  Do you typically run through Eclipse with special
>> profiles?
>>
>> I'm still trying to track down an odd failure in crunch-hbase when
>> swapping out the dependencies to use CDH4.1.x.  The test failure seems
>> to indicate the test is joining the same PCollection on itself.
>>
>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 63.13 sec <<< FAILURE!
>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 62.789 sec  <<< FAILURE!
>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[bird,bird, zebra,zebra, horse,horse, donkey,donkey]>
>>         at org.junit.Assert.fail(Assert.java:93)
>>         at org.junit.Assert.failNotEquals(Assert.java:647)
>>         at org.junit.Assert.assertEquals(Assert.java:128)
>>         at org.junit.Assert.assertEquals(Assert.java:147)
>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:257)
>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>>
>> and sometimes:
>>
>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 71.958 sec <<< FAILURE!
>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 71.469 sec  <<< FAILURE!
>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[dog,dog, cat,cat]>
>>         at org.junit.Assert.fail(Assert.java:93)
>>         at org.junit.Assert.failNotEquals(Assert.java:647)
>>         at org.junit.Assert.assertEquals(Assert.java:128)
>>         at org.junit.Assert.assertEquals(Assert.java:147)
>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:259)
>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>>
>> Most likely due to the same reason Crunch requires a special build of
>> HBase 0.94.1, I've found I need to mix and match CDH4 versions as
>> shown by the attached patch.  For the Crunch core build I need to use
>> all of the latest 2.0.0 code but for testing crunch-hbase I need to
>> use the mrv1 fork for hadoop-core and hadoop-minicluster.  I wouldn't
>> think that either of those would affect the tests unless somehow the
>> files used for the intermediate states were not being temporarily
>> stored correctly.  The fact that the test fails differently does make
>> me wonder about a concurrency issue but I'm not sure where.
>>
>> Any pointers on debugging would be helpful.
>> Micah
>>
>> On Thu, Jan 24, 2013 at 2:24 PM, Micah Whitacre <[email protected]>
>> wrote:
>> > I am creating an entirely new profile simply to keep my changes
>> > separate from what is in apache/master.
>> >
>> > Thanks for the hint about the "naive" approach.  Previously I had the
>> > following:
>> >
>> >             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>> >             <hadoop.client.version>2.0.0-mr1-cdh4.1.1</hadoop.client.version>
>> >             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>> >
>> > If I follow what you did and change it to:
>> >
>> >             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>> >             <hadoop.client.version>2.0.0-cdh4.1.1</hadoop.client.version>
>> >             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>> >
>> > The build gets farther.  I now have a different failure in
>> > crunch-hbase I'll start working on.
>> >
>> > Thanks for your help.
>> > Micah
>> >
>> >
>> > On Thu, Jan 24, 2013 at 12:23 PM, Josh Wills <[email protected]> wrote:
>> >> Micah,
>> >>
>> >> I did the naive thing and just swapped in 2.0.0-cdh4.1.2 for 2.0.0-alpha
>> >> in the crunch.platform=2 profile in the top level POM and then added in
>> >> the Cloudera repositories. That works for me-- does it work for you? It
>> >> sounds to me like you're creating an entirely new profile.
>> >>
>> >> J
>> >>
>> >>
>> >> On Thu, Jan 24, 2013 at 7:58 AM, Micah Whitacre <[email protected]>
>> >> wrote:
>> >>>
>> >>> Running dependency:tree on both projects shows that the version of
>> >>> Avro is 1.7.0 under both profiles.  I wish it was that
>> >>> easy.  :)
>> >>>
>> >>> On Thu, Jan 24, 2013 at 9:53 AM, Josh Wills <[email protected]> wrote:
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Thu, Jan 24, 2013 at 6:40 AM, Micah Whitacre <[email protected]>
>> >>> > wrote:
>> >>> >>
>> >>> >> Taking a step back and comparing what is being generated for a normal
>> >>> >> successful test run of "-Dcrunch.platform=2" I do see a p1 and p2
>> >>> >> directory being created, with the expected materialized output being
>> >>> >> in the p1 directory.  So I'm still curious about tracking all of the
>> >>> >> intermediate state but it doesn't look like it is an issue with regard
>> >>> >> to creating the output in the wrong directory.
>> >>> >
>> >>> >
>> >>> > That's a relief. :)
>> >>> >
>> >>> > I think the issue with temp outputs has to do with our use of the
>> >>> > TemporaryPath libraries for creating, well, temporary paths. We do this
>> >>> > so we play nicely with CI frameworks, but you might need to disable it
>> >>> > for investigating intermediate outputs.
>> >>> >
>> >>> > Re: the specific error you're seeing, that looks interesting. I wonder
>> >>> > if it's an Avro version change or some such thing. Will see if I can
>> >>> > replicate it.
>> >>> >
>> >>> >
>> >>> > --
>> >>> > Director of Data Science
>> >>> > Cloudera
>> >>> > Twitter: @josh_wills
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Director of Data Science
>> >> Cloudera
>> >> Twitter: @josh_wills
>>
>
>
>
>  --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
