Applied the fix (CRUNCH-160) with the dependency adjustments I mentioned previously, and all tests are passing.
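For anyone following along, the dependency adjustments boil down to the CDH4 version properties quoted further down the thread (the thread tries both the mr1 and plain client artifacts). A minimal sketch of how they might be wired into a Maven profile -- the property names and versions are taken from the thread, while the profile id and repository block are illustrative, pointing at Cloudera's public artifact repository:

```xml
<!-- Hypothetical CDH4 profile; property names and versions are from the
     thread below, the profile id and repository block are illustrative. -->
<profile>
  <id>cdh4</id>
  <properties>
    <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
    <hadoop.client.version>2.0.0-cdh4.1.1</hadoop.client.version>
    <hbase.version>0.92.1-cdh4.1.1</hbase.version>
  </properties>
  <repositories>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>
</profile>
```

Something like `mvn clean install -Pcdh4` would then activate it, alongside `-Dcrunch.platform=2` as discussed below.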
On Sun, Feb 3, 2013 at 7:53 PM, Josh Wills <[email protected]> wrote:
> Hey Micah,
>
> This should fix the join issue:
> https://issues.apache.org/jira/browse/CRUNCH-160
>
> Let me know if it works for you.
>
> J
>
> On Wed, Jan 30, 2013 at 6:08 AM, Josh Wills <[email protected]> wrote:
>> Okay, good to know. I'll be back in SF on Friday and will sit down w/some
>> of my friends who know HBase better than I do and take another look.
>>
>> J
>>
>> On Tue, Jan 29, 2013 at 9:12 AM, Micah Whitacre <[email protected]> wrote:
>>> Unfortunately it doesn't look like this is just a test failure, as
>>> running against a CDH4.1.1 cluster fails in the exact same manner.
>>> Here is a copy of the code I used[1].
>>>
>>> [1] - http://pastebin.com/QLEc5fmG
>>>
>>> On Tue, Jan 29, 2013 at 8:44 AM, Micah Whitacre <[email protected]> wrote:
>>>> The problem of reading from the same table twice seems interesting.
>>>> At one point, when trying to figure out the problem, I tweaked the test
>>>> to run the joinedTable through the same wordCount steps to make sure
>>>> everything was read and then persisted correctly. So the flow of the
>>>> test became:
>>>>
>>>> write to the wordcount table
>>>> wordcount
>>>> write to the join table
>>>> wordcount the join table (output to a different table)
>>>> attempt to join words with others
>>>>
>>>> That flow would work as expected but still fail on the last join. So
>>>> it seems like it is reading in correctly from HBase.
>>>>
>>>> I am working on building a stand-alone example and will report back
>>>> the findings.
>>>>
>>>> thanks for your help,
>>>> micah
>>>>
>>>> On Mon, Jan 28, 2013 at 11:55 PM, Josh Wills <[email protected]> wrote:
>>>>> I have to call it a night, but this is an odd one.
>>>>> The basic problem seems to be that we are reading from the same table
>>>>> twice -- it seems like the HTable object is the same on both splits
>>>>> (always reading from the words table, or always reading from the
>>>>> joinTableName table), but the Scan object appears to get updated. I
>>>>> verified this by using a different column family on the joinTableName
>>>>> table and seeing that the test returned no output for the join, which
>>>>> is what we would expect if one of the reads had no input.
>>>>>
>>>>> Looking in the code, I don't see a place where the 0.92.1 and 0.90.4
>>>>> code differ significantly in terms of the input format, record reader,
>>>>> etc. I'm on the road this week, but I'd like to work on this one some
>>>>> more when I'm back in SF and can sit down with my co-workers who know
>>>>> more HBase than I do.
>>>>>
>>>>> Out of curiosity -- is it just the unit test that fails, or can you
>>>>> run a real HBase MR job that suffers from this problem?
>>>>>
>>>>> J
>>>>>
>>>>> On Mon, Jan 28, 2013 at 7:26 PM, Josh Wills <[email protected]> wrote:
>>>>>> Ack, sorry -- was checking email on my phone and didn't see the
>>>>>> patch. I can replicate it locally, digging in now.
>>>>>>
>>>>>> On Mon, Jan 28, 2013 at 6:47 PM, Whitacre, Micah <[email protected]> wrote:
>>>>>>> The patch should contain the specifics, but I've tested using 4.1.1,
>>>>>>> 4.1.2, and 4.1.3. Each gives the same results.
>>>>>>>
>>>>>>> On Jan 28, 2013, at 20:44, "Josh Wills" <[email protected]> wrote:
>>>>>>>
>>>>>>> I usually run them in Eclipse, but not using a particularly special
>>>>>>> run configuration (I think). Let me see if I can replicate that one
>>>>>>> -- which CDH version?
>>>>>>> On Mon, Jan 28, 2013 at 3:13 PM, Micah Whitacre <[email protected]> wrote:
>>>>>>>> Related to this thread, where I asked how to save off the
>>>>>>>> intermediate state: in general, how do you debug the project,
>>>>>>>> specifically the IT tests? Do you typically run through Eclipse
>>>>>>>> with special profiles?
>>>>>>>>
>>>>>>>> I'm still trying to track down an odd failure in crunch-hbase when
>>>>>>>> swapping out the dependencies to use CDH4.1.x. The test failure
>>>>>>>> seems to indicate the test is joining the same PCollection with
>>>>>>>> itself.
>>>>>>>>
>>>>>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 63.13 sec <<< FAILURE!
>>>>>>>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 62.789 sec <<< FAILURE!
>>>>>>>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]>
>>>>>>>> but was:<[bird,bird, zebra,zebra, horse,horse, donkey,donkey]>
>>>>>>>>     at org.junit.Assert.fail(Assert.java:93)
>>>>>>>>     at org.junit.Assert.failNotEquals(Assert.java:647)
>>>>>>>>     at org.junit.Assert.assertEquals(Assert.java:128)
>>>>>>>>     at org.junit.Assert.assertEquals(Assert.java:147)
>>>>>>>>     at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:257)
>>>>>>>>     at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>>>>>>>>
>>>>>>>> and sometimes:
>>>>>>>>
>>>>>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 71.958 sec <<< FAILURE!
>>>>>>>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 71.469 sec <<< FAILURE!
>>>>>>>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]>
>>>>>>>> but was:<[dog,dog, cat,cat]>
>>>>>>>>     at org.junit.Assert.fail(Assert.java:93)
>>>>>>>>     at org.junit.Assert.failNotEquals(Assert.java:647)
>>>>>>>>     at org.junit.Assert.assertEquals(Assert.java:128)
>>>>>>>>     at org.junit.Assert.assertEquals(Assert.java:147)
>>>>>>>>     at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:259)
>>>>>>>>     at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>>>>>>>>
>>>>>>>> Most likely for the same reason that Crunch requires a special
>>>>>>>> build of HBase 0.94.1, I've found I need to mix and match CDH4
>>>>>>>> versions as shown by the attached patch. For the Crunch core build
>>>>>>>> I need to use all of the latest 2.0.0 code, but for testing
>>>>>>>> crunch-hbase I need to use the MRv1 fork for hadoop-core and
>>>>>>>> hadoop-minicluster. I wouldn't think that either of those would
>>>>>>>> affect the tests unless somehow the files used for the intermediate
>>>>>>>> state were not being stored correctly. The fact that the test fails
>>>>>>>> differently each time does make me wonder about a concurrency
>>>>>>>> issue, but I'm not sure where.
>>>>>>>>
>>>>>>>> Any pointers on debugging would be helpful.
>>>>>>>> Micah
>>>>>>>>
>>>>>>>> On Thu, Jan 24, 2013 at 2:24 PM, Micah Whitacre <[email protected]> wrote:
>>>>>>>>> I am creating an entirely new profile simply to keep my changes
>>>>>>>>> separate from what is in apache/master.
>>>>>>>>>
>>>>>>>>> Thanks for the hint about the "naive" approach.
>>>>>>>>> Previously I had the following:
>>>>>>>>>
>>>>>>>>>   <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>>>>>>>>>   <hadoop.client.version>2.0.0-mr1-cdh4.1.1</hadoop.client.version>
>>>>>>>>>   <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>>>>>>>>>
>>>>>>>>> If I follow what you did and change it to:
>>>>>>>>>
>>>>>>>>>   <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>>>>>>>>>   <hadoop.client.version>2.0.0-cdh4.1.1</hadoop.client.version>
>>>>>>>>>   <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>>>>>>>>>
>>>>>>>>> the build gets farther. I now have a different failure in
>>>>>>>>> crunch-hbase that I'll start working on.
>>>>>>>>>
>>>>>>>>> Thanks for your help.
>>>>>>>>> Micah
>>>>>>>>>
>>>>>>>>> On Thu, Jan 24, 2013 at 12:23 PM, Josh Wills <[email protected]> wrote:
>>>>>>>>>> Micah,
>>>>>>>>>>
>>>>>>>>>> I did the naive thing and just swapped in 2.0.0-cdh4.1.2 for
>>>>>>>>>> 2.0.0-alpha in the crunch.platform=2 profile in the top-level POM
>>>>>>>>>> and then added in the Cloudera repositories. That works for me --
>>>>>>>>>> does it work for you? It sounds to me like you're creating an
>>>>>>>>>> entirely new profile.
>>>>>>>>>>
>>>>>>>>>> J
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 24, 2013 at 7:58 AM, Micah Whitacre <[email protected]> wrote:
>>>>>>>>>>> Running dependency:tree on both projects shows that the version
>>>>>>>>>>> of Avro is 1.7.0 when running under both profiles. I wish it was
>>>>>>>>>>> that easy.
>>>>>>>>>>> :)
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 24, 2013 at 9:53 AM, Josh Wills <[email protected]> wrote:
>>>>>>>>>>>> On Thu, Jan 24, 2013 at 6:40 AM, Micah Whitacre <[email protected]> wrote:
>>>>>>>>>>>>> Taking a step back and comparing against what is generated by
>>>>>>>>>>>>> a normal, successful test run with "-Dcrunch.platform=2", I do
>>>>>>>>>>>>> see p1 and p2 directories being created, with the expected
>>>>>>>>>>>>> materialized output in the p1 directory. So I'm still curious
>>>>>>>>>>>>> about tracking all of the intermediate state, but it doesn't
>>>>>>>>>>>>> look like it is an issue of the output being created in the
>>>>>>>>>>>>> wrong directory.
>>>>>>>>>>>>
>>>>>>>>>>>> That's a relief. :)
>>>>>>>>>>>>
>>>>>>>>>>>> I think the issue with temp outputs has to do with our use of
>>>>>>>>>>>> the TemporaryPath libraries for creating, well, temporary
>>>>>>>>>>>> paths. We do this so we play nicely with CI frameworks, but you
>>>>>>>>>>>> might need to disable it for investigating intermediate
>>>>>>>>>>>> outputs.
>>>>>>>>>>>>
>>>>>>>>>>>> Re: the specific error you're seeing, that looks interesting. I
>>>>>>>>>>>> wonder if it's an Avro version change or some such thing. Will
>>>>>>>>>>>> see if I can replicate it.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Director of Data Science
>>>>>>>>>>>> Cloudera
>>>>>>>>>>>> Twitter: @josh_wills
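P.S. A guess at the mechanism behind the "same HTable on both splits" symptom Josh describes above, offered purely as speculation: HBase's TableInputFormat locates its table through a single Configuration key ("hbase.mapreduce.inputtable" in the 0.92.x line), so if two sources configure the same per-job Configuration, the second write silently clobbers the first and every split resolves the same table, even though each split can still carry its own serialized Scan. A self-contained stand-in sketch, using java.util.Properties in place of Hadoop's Configuration (table names borrowed from the test):

```java
import java.util.Properties;

public class InputTableCollision {
    // Key TableInputFormat uses to locate its table in HBase 0.92.x.
    static final String INPUT_TABLE = "hbase.mapreduce.inputtable";

    public static void main(String[] args) {
        // Stand-in for the single per-job Hadoop Configuration.
        Properties jobConf = new Properties();

        // The first source configures its table...
        jobConf.setProperty(INPUT_TABLE, "words");
        // ...then the second source silently overwrites the same key.
        jobConf.setProperty(INPUT_TABLE, "joinTableName");

        // Both reads now resolve the same table name.
        System.out.println(jobConf.getProperty(INPUT_TABLE)); // prints "joinTableName"
    }
}
```

If this is what is happening, it would also explain why switching the column family on joinTableName made one side of the join come back empty.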
