Ack, sorry-- was checking email on my phone and didn't see the patch. I can replicate it locally, digging in now.
On Mon, Jan 28, 2013 at 6:47 PM, Whitacre,Micah <[email protected]> wrote:

> The patch should contain the specifics but I've tested using 4.1.1,
> 4.1.2, and 4.1.3. Each gives the same results.
>
> On Jan 28, 2013, at 20:44, "Josh Wills" <[email protected]> wrote:
>
> I usually run them in Eclipse, but not using a particularly special run
> configuration (I think.) Let me see if I can replicate that one-- which CDH
> version?
>
>
> On Mon, Jan 28, 2013 at 3:13 PM, Micah Whitacre <[email protected]> wrote:
>
>> Related to this thread, where I asked how to save off the intermediate
>> state but in general how do you debug the project, specifically for
>> the IT tests? Do you typically run through Eclipse with special
>> profiles?
>>
>> I'm still trying to track down an odd failure in crunch-hbase when
>> swapping out the dependencies to use CDH4.1.x. The test failure seems
>> to indicate the test is joining the same PCollection on itself.
>>
>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 63.13 sec <<< FAILURE!
>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT) Time elapsed: 62.789 sec <<< FAILURE!
>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]>
>> but was:<[bird,bird, zebra,zebra, horse,horse, donkey,donkey]>
>> at org.junit.Assert.fail(Assert.java:93)
>> at org.junit.Assert.failNotEquals(Assert.java:647)
>> at org.junit.Assert.assertEquals(Assert.java:128)
>> at org.junit.Assert.assertEquals(Assert.java:147)
>> at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:257)
>> at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>>
>> and sometimes:
>>
>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 71.958 sec <<< FAILURE!
>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT) Time elapsed: 71.469 sec <<< FAILURE!
>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]>
>> but was:<[dog,dog, cat,cat]>
>> at org.junit.Assert.fail(Assert.java:93)
>> at org.junit.Assert.failNotEquals(Assert.java:647)
>> at org.junit.Assert.assertEquals(Assert.java:128)
>> at org.junit.Assert.assertEquals(Assert.java:147)
>> at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:259)
>> at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>>
>> Most likely due to the same reason Crunch requires a special build of
>> HBase 0.94.1, I've found I need to mix and match CDH4 versions as
>> shown by the attached patch. For the Crunch core build I need to use
>> all of the latest 2.0.0 code but for testing crunch-hbase I need to
>> use the mrv1 fork for hadoop-core and hadoop-minicluster. I wouldn't
>> think that either of those would affect the tests unless somehow the
>> files used for the intermediate states were not being temporarily
>> stored correctly. The fact that the test fails differently does make
>> me wonder about a concurrency issue but I'm not sure where.
>>
>> Any pointers on debugging would be helpful.
>> Micah
>>
>> On Thu, Jan 24, 2013 at 2:24 PM, Micah Whitacre <[email protected]> wrote:
>> > I am creating an entirely new profile simply to keep my changes
>> > separate from what is in apache/master.
>> >
>> > Thanks for the hint about the "naive" approach. Previously I had the following:
>> >
>> >   <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>> >   <hadoop.client.version>2.0.0-mr1-cdh4.1.1</hadoop.client.version>
>> >   <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>> >
>> > If I follow what you did and change it to:
>> >
>> >   <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>> >   <hadoop.client.version>2.0.0-cdh4.1.1</hadoop.client.version>
>> >   <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>> >
>> > The build gets farther. I now have a different failure in
>> > crunch-hbase I'll start working on.
>> >
>> > Thanks for your help.
>> > Micah
>> >
>> >
>> > On Thu, Jan 24, 2013 at 12:23 PM, Josh Wills <[email protected]> wrote:
>> >> Micah,
>> >>
>> >> I did the naive thing and just swapped in 2.0.0-cdh4.1.2 for 2.0.0-alpha in
>> >> the crunch.platform=2 profile in the top level POM and then added in the
>> >> Cloudera repositories. That works for me-- does it work for you? It sounds
>> >> to me like you're creating an entirely new profile.
>> >>
>> >> J
>> >>
>> >>
>> >> On Thu, Jan 24, 2013 at 7:58 AM, Micah Whitacre <[email protected]> wrote:
>> >>>
>> >>> Running dependency:tree on both projects shows that the version of
>> >>> Avro is 1.7.0 for running under both profiles. I wish it was that
>> >>> easy. :)
>> >>>
>> >>> On Thu, Jan 24, 2013 at 9:53 AM, Josh Wills <[email protected]> wrote:
>> >>> >
>> >>> > On Thu, Jan 24, 2013 at 6:40 AM, Micah Whitacre <[email protected]> wrote:
>> >>> >>
>> >>> >> Taking a step back and comparing what is being generated for a normal
>> >>> >> successful test run of "-Dcrunch.platform=2" I do see a p1 and p2
>> >>> >> directory being created, with the expected materialized output being
>> >>> >> in the p1 directory. So I'm still curious about tracking all of the
>> >>> >> intermediate state but it doesn't look like it is an issue with regard
>> >>> >> to creating the output in the wrong directory.
>> >>> >
>> >>> > That's a relief. :)
>> >>> >
>> >>> > I think the issue with temp outputs has to do with our use of the
>> >>> > TemporaryPath libraries for creating, well, temporary paths. We do this so
>> >>> > we play nicely with CI frameworks, but you might need to disable it for
>> >>> > investigating intermediate outputs.
>> >>> >
>> >>> > Re: the specific error you're seeing, that looks interesting. I wonder if
>> >>> > it's an Avro version change or some such thing. Will see if I can replicate
>> >>> > it.
>> >>> >
>> >>> > --
>> >>> > Director of Data Science
>> >>> > Cloudera
>> >>> > Twitter: @josh_wills
>
> CONFIDENTIALITY NOTICE This message and any included attachments are from
> Cerner Corporation and are intended only for the addressee. The information
> contained in this message is confidential and may constitute inside or
> non-public information under international, federal, or state securities
> laws. Unauthorized forwarding, printing, copying, distribution, or use of
> such information is strictly prohibited and may be unlawful. If you are not
> the addressee, please promptly delete this message and notify the sender of
> the delivery error by e-mail or you may call Cerner's corporate offices in
> Kansas City, Missouri, U.S.A at (+1) (816)221-1024.
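For reference, Josh's "naive" approach from the Jan 24 message (swapping 2.0.0-cdh4.1.2 in for 2.0.0-alpha in the crunch.platform=2 profile of the top-level POM, then adding the Cloudera repositories) might look roughly like the POM fragment below. This is a sketch under assumptions: only the version strings and the crunch.platform property come from the thread. The profile id, the property-based activation block, the repository id, and the repository URL are illustrative guesses, not copied from the Crunch POM.

```xml
<!-- Sketch only: profile id, activation, and repository entry are assumed
     for illustration; the real Crunch top-level POM may be structured differently. -->
<profile>
  <!-- hypothetical profile id -->
  <id>hadoop-2</id>
  <activation>
    <!-- assumed: profile is switched on by -Dcrunch.platform=2 -->
    <property>
      <name>crunch.platform</name>
      <value>2</value>
    </property>
  </activation>
  <properties>
    <!-- previously 2.0.0-alpha, per the thread -->
    <hadoop.version>2.0.0-cdh4.1.2</hadoop.version>
  </properties>
  <repositories>
    <!-- assumed Cloudera repository entry; verify the URL against Cloudera's docs -->
    <repository>
      <id>cloudera-repos</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>
</profile>
```

With property-based activation like this, passing -Dcrunch.platform=2 to Maven would enable the profile. Micah's quoted snippets suggest hadoop.client.version may also need adjusting (mr1 vs. non-mr1 artifacts), depending on which module is being built and tested.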
