Unfortunately, it doesn't look like this is just a test failure, as running against a CDH4.1.1 cluster fails in exactly the same manner. Here is a copy of the code I used [1].

[1] - http://pastebin.com/QLEc5fmG
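(The pastebin above is the actual code. Purely as an illustration of the shape of the pipeline under discussion, here is a minimal sketch of a two-table HBase read-and-join in Crunch, assuming the crunch-hbase HBaseSourceTarget and Join APIs of that era; the table names, column family/qualifier, and output path are placeholders, not values from the real test.)

  // Illustrative sketch only (not the pastebin code): read two HBase tables
  // through crunch-hbase and join them on the row key.
  import org.apache.crunch.MapFn;
  import org.apache.crunch.PTable;
  import org.apache.crunch.Pair;
  import org.apache.crunch.Pipeline;
  import org.apache.crunch.impl.mr.MRPipeline;
  import org.apache.crunch.io.hbase.HBaseSourceTarget;
  import org.apache.crunch.lib.Join;
  import org.apache.crunch.types.writable.Writables;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.util.Bytes;

  public class TwoTableJoinSketch {
    public static void main(String[] args) throws Exception {
      Pipeline pipeline = new MRPipeline(TwoTableJoinSketch.class, HBaseConfiguration.create());

      // Each read gets its own Scan; per the thread below, the suspicion is that
      // both splits end up bound to the same HTable even though the Scans differ.
      Scan wordsScan = new Scan();
      wordsScan.addFamily(Bytes.toBytes("counts"));
      Scan joinScan = new Scan();
      joinScan.addFamily(Bytes.toBytes("counts"));

      PTable<ImmutableBytesWritable, Result> words =
          pipeline.read(new HBaseSourceTarget("words", wordsScan));
      PTable<ImmutableBytesWritable, Result> others =
          pipeline.read(new HBaseSourceTarget("joinTable", joinScan));

      // Re-key both tables by the row key string so they can be joined.
      PTable<String, String> left = keyByRow(words);
      PTable<String, String> right = keyByRow(others);

      PTable<String, Pair<String, String>> joined = Join.join(left, right);
      pipeline.writeTextFile(joined, "/tmp/join-output");
      pipeline.done();
    }

    private static PTable<String, String> keyByRow(PTable<ImmutableBytesWritable, Result> in) {
      return in.parallelDo(new MapFn<Pair<ImmutableBytesWritable, Result>, Pair<String, String>>() {
        @Override
        public Pair<String, String> map(Pair<ImmutableBytesWritable, Result> kv) {
          // Row key plus a single "counts:count" cell, both as strings (placeholder family/qualifier).
          String row = Bytes.toString(kv.first().get(), kv.first().getOffset(), kv.first().getLength());
          String val = Bytes.toString(
              kv.second().getValue(Bytes.toBytes("counts"), Bytes.toBytes("count")));
          return Pair.of(row, val);
        }
      }, Writables.tableOf(Writables.strings(), Writables.strings()));
    }
  }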
On Tue, Jan 29, 2013 at 8:44 AM, Micah Whitacre <[email protected]> wrote:
> The problem of reading from the same table twice seems interesting.
> At one point, when trying to figure out the problem, I tweaked the test
> to run the joinedTable through the same wordCount steps to make sure
> everything was read and then persisted correctly. So the flow of the
> test became:
>
> write to wordcount table
> wordcount
> write to join table
> wordcount the join table (output to a different table)
> attempt to join words with others
>
> That flow would work as expected but still fail on the last join. So
> it seems like it is reading in correctly from HBase.
>
> I am working on building a standalone example and will report back
> the findings.
>
> thanks for your help,
> micah
>
>
> On Mon, Jan 28, 2013 at 11:55 PM, Josh Wills <[email protected]> wrote:
>> I have to call it a night, but this is an odd one.
>>
>> The basic problem seems to be that we are reading from the same table
>> twice -- it seems like the HTable object is the same on both splits (always
>> reading from the words table, or always reading from the joinTableName
>> table), but the Scan object appears to get updated. I verified this by using
>> a different column family on the joinTableName table and seeing that the
>> test returned no output for the join, which is what we would expect if one
>> of the reads had no input.
>>
>> Looking in the code, I don't see a place where the 0.92.1 and 0.90.4 code
>> differ significantly in terms of the input format, record reader, etc. I'm
>> on the road this week, but I'd like to work on this one some more when I'm
>> back in SF and can sit down with my co-workers who know more HBase than I do.
>>
>> Out of curiosity -- is it just the unit test that fails, or can you run a
>> real HBase MR job that suffers from this problem?
>>
>> J
>>
>>
>> On Mon, Jan 28, 2013 at 7:26 PM, Josh Wills <[email protected]> wrote:
>>>
>>> Ack, sorry -- was checking email on my phone and didn't see the patch. I
>>> can replicate it locally, digging in now.
>>>
>>>
>>> On Mon, Jan 28, 2013 at 6:47 PM, Whitacre, Micah
>>> <[email protected]> wrote:
>>>>
>>>> The patch should contain the specifics, but I've tested using 4.1.1,
>>>> 4.1.2, and 4.1.3. Each gives the same results.
>>>>
>>>>
>>>> On Jan 28, 2013, at 20:44, "Josh Wills" <[email protected]> wrote:
>>>>
>>>> I usually run them in Eclipse, but not using a particularly special run
>>>> configuration (I think). Let me see if I can replicate that one -- which
>>>> CDH version?
>>>>
>>>>
>>>> On Mon, Jan 28, 2013 at 3:13 PM, Micah Whitacre <[email protected]>
>>>> wrote:
>>>>>
>>>>> Related to this thread, where I asked how to save off the intermediate
>>>>> state: in general, how do you debug the project, specifically the IT
>>>>> tests? Do you typically run through Eclipse with special profiles?
>>>>>
>>>>> I'm still trying to track down an odd failure in crunch-hbase when
>>>>> swapping out the dependencies to use CDH4.1.x. The test failure seems
>>>>> to indicate the test is joining the same PCollection with itself.
>>>>>
>>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 63.13
>>>>> sec <<< FAILURE!
>>>>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT) Time
>>>>> elapsed: 62.789 sec <<< FAILURE!
>>>>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]>
>>>>> but was:<[bird,bird, zebra,zebra, horse,horse, donkey,donkey]>
>>>>> at org.junit.Assert.fail(Assert.java:93)
>>>>> at org.junit.Assert.failNotEquals(Assert.java:647)
>>>>> at org.junit.Assert.assertEquals(Assert.java:128)
>>>>> at org.junit.Assert.assertEquals(Assert.java:147)
>>>>> at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:257)
>>>>> at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>>>>>
>>>>> and sometimes:
>>>>>
>>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 71.958
>>>>> sec <<< FAILURE!
>>>>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT) Time
>>>>> elapsed: 71.469 sec <<< FAILURE!
>>>>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]>
>>>>> but was:<[dog,dog, cat,cat]>
>>>>> at org.junit.Assert.fail(Assert.java:93)
>>>>> at org.junit.Assert.failNotEquals(Assert.java:647)
>>>>> at org.junit.Assert.assertEquals(Assert.java:128)
>>>>> at org.junit.Assert.assertEquals(Assert.java:147)
>>>>> at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:259)
>>>>> at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>>>>>
>>>>> Most likely for the same reason Crunch requires a special build of
>>>>> HBase 0.94.1, I've found I need to mix and match CDH4 versions as
>>>>> shown by the attached patch. For the Crunch core build I need to use
>>>>> all of the latest 2.0.0 code, but for testing crunch-hbase I need to
>>>>> use the mrv1 fork for hadoop-core and hadoop-minicluster. I wouldn't
>>>>> think that either of those would affect the tests unless somehow the
>>>>> files used for the intermediate states were not being stored correctly
>>>>> in the temporary location. The fact that the test fails differently
>>>>> from run to run does make me wonder about a concurrency issue, but I'm
>>>>> not sure where.
>>>>>
>>>>> Any pointers on debugging would be helpful.
>>>>> Micah
>>>>>
>>>>> On Thu, Jan 24, 2013 at 2:24 PM, Micah Whitacre <[email protected]>
>>>>> wrote:
>>>>> > I am creating an entirely new profile simply to keep my changes
>>>>> > separate from what is in apache/master.
>>>>> >
>>>>> > Thanks for the hint about the "naive" approach. Previously I had the
>>>>> > following:
>>>>> >
>>>>> > <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>>>>> > <hadoop.client.version>2.0.0-mr1-cdh4.1.1</hadoop.client.version>
>>>>> > <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>>>>> >
>>>>> > If I follow what you did and change it to:
>>>>> >
>>>>> > <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>>>>> > <hadoop.client.version>2.0.0-cdh4.1.1</hadoop.client.version>
>>>>> > <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>>>>> >
>>>>> > the build gets farther. I now have a different failure in
>>>>> > crunch-hbase I'll start working on.
>>>>> >
>>>>> > Thanks for your help.
>>>>> > Micah
>>>>> >
>>>>> >
>>>>> > On Thu, Jan 24, 2013 at 12:23 PM, Josh Wills <[email protected]>
>>>>> > wrote:
>>>>> >> Micah,
>>>>> >>
>>>>> >> I did the naive thing and just swapped in 2.0.0-cdh4.1.2 for
>>>>> >> 2.0.0-alpha in the crunch.platform=2 profile in the top-level POM
>>>>> >> and then added in the Cloudera repositories. That works for me --
>>>>> >> does it work for you? It sounds to me like you're creating an
>>>>> >> entirely new profile.
>>>>> >>
>>>>> >> J
>>>>> >>
>>>>> >> On Thu, Jan 24, 2013 at 7:58 AM, Micah Whitacre
>>>>> >> <[email protected]> wrote:
>>>>> >>>
>>>>> >>> Running dependency:tree on both projects shows that the version of
>>>>> >>> Avro is 1.7.0 under both profiles. I wish it were that easy. :)
>>>>> >>>
>>>>> >>> On Thu, Jan 24, 2013 at 9:53 AM, Josh Wills <[email protected]>
>>>>> >>> wrote:
>>>>> >>> > On Thu, Jan 24, 2013 at 6:40 AM, Micah Whitacre
>>>>> >>> > <[email protected]> wrote:
>>>>> >>> >>
>>>>> >>> >> Taking a step back and comparing what is generated for a normal,
>>>>> >>> >> successful test run of "-Dcrunch.platform=2", I do see a p1 and
>>>>> >>> >> a p2 directory being created, with the expected materialized
>>>>> >>> >> output in the p1 directory. So I'm still curious about tracking
>>>>> >>> >> all of the intermediate state, but it doesn't look like the
>>>>> >>> >> problem is output being created in the wrong directory.
>>>>> >>> >
>>>>> >>> > That's a relief. :)
>>>>> >>> >
>>>>> >>> > I think the issue with temp outputs has to do with our use of the
>>>>> >>> > TemporaryPath libraries for creating, well, temporary paths. We do
>>>>> >>> > this so we play nicely with CI frameworks, but you might need to
>>>>> >>> > disable it when investigating intermediate outputs.
>>>>> >>> >
>>>>> >>> > Re: the specific error you're seeing, that looks interesting. I
>>>>> >>> > wonder if it's an Avro version change or some such thing. Will see
>>>>> >>> > if I can replicate it.
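(For reference, one way to poke at the intermediate state mentioned above: instead of relying on the per-test TemporaryPath, point the pipeline at a fixed temp directory and turn on debug logging. This is a sketch under the assumption that the "crunch.tmp.dir" runtime parameter and Pipeline.enableDebug() exist in the Crunch version in use; the path is just an example.)

  import org.apache.crunch.Pipeline;
  import org.apache.crunch.impl.mr.MRPipeline;
  import org.apache.hadoop.conf.Configuration;

  public class DebugPipelines {
    // Builds a pipeline whose temp outputs land in a fixed, known directory so
    // the p1/p2 intermediate directories are easy to find while a job runs.
    public static Pipeline newDebugPipeline(Class<?> jarClass) {
      Configuration conf = new Configuration();
      conf.set("crunch.tmp.dir", "/tmp/crunch-debug"); // assumed property name; example path
      Pipeline pipeline = new MRPipeline(jarClass, conf);
      pipeline.enableDebug(); // more verbose logging for the pipeline's jobs
      return pipeline;
    }
  }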
