Hey Micah,

This should fix the join issue: https://issues.apache.org/jira/browse/CRUNCH-160
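For anyone skimming the thread below: the failing assertions print output like [dog,dog, cat,cat], which is exactly what an inner join produces when both inputs are the same dataset, consistent with the two HBase table reads collapsing into one. A minimal, self-contained sketch of that symptom (plain Java maps stand in for the two PTables; all names here are illustrative, not the actual test code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SelfJoinSymptom {

  // A toy inner join over two key->value maps, standing in for joining
  // the "words" table with the "others" table in the failing test.
  static List<String> innerJoin(Map<String, String> left, Map<String, String> right) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, String> e : left.entrySet()) {
      String other = right.get(e.getKey());
      if (other != null) {
        out.add(e.getValue() + "," + other);
      }
    }
    Collections.sort(out); // deterministic ordering for display
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> words = new HashMap<>();
    words.put("k1", "cat");
    words.put("k2", "dog");
    Map<String, String> others = new HashMap<>();
    others.put("k1", "zebra");
    others.put("k2", "bird");

    // Expected behavior: each word pairs with the other table's value.
    System.out.println(innerJoin(words, others)); // prints [cat,zebra, dog,bird]

    // The bug's symptom: if both splits end up reading the same underlying
    // table, the join effectively pairs the dataset with itself.
    System.out.println(innerJoin(words, words)); // prints [cat,cat, dog,dog]
  }
}
```

The second call mirrors the failure mode reported in the thread: identical left,right pairs on every key instead of cross-table pairs.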
Let me know if it works for you.

J

On Wed, Jan 30, 2013 at 6:08 AM, Josh Wills <[email protected]> wrote:
> Okay, good to know. I'll be back in SF on Friday and will sit down with some
> of my friends who know HBase better than I do and take another look.
>
> J
>
> On Tue, Jan 29, 2013 at 9:12 AM, Micah Whitacre <[email protected]> wrote:
>> Unfortunately, it doesn't look like this is just a test failure, as
>> running against a CDH4.1.1 cluster fails in exactly the same manner.
>> Here is a copy of the code I used [1].
>>
>> [1] - http://pastebin.com/QLEc5fmG
>>
>> On Tue, Jan 29, 2013 at 8:44 AM, Micah Whitacre <[email protected]> wrote:
>>> The problem of reading from the same table twice seems interesting.
>>> At one point, while trying to figure out the problem, I tweaked the test
>>> to run the joined table through the same wordcount steps to make sure
>>> everything was read and then persisted correctly. So the flow of the
>>> test became:
>>>
>>> write to wordcount table
>>> wordcount
>>> write to join table
>>> wordcount the join table (output to a different table)
>>> attempt to join words with others
>>>
>>> That flow would work as expected but still fail on the last join. So
>>> it seems like it is reading in correctly from HBase.
>>>
>>> I am working on building a standalone example and will report back
>>> the findings.
>>>
>>> thanks for your help,
>>> micah
>>>
>>> On Mon, Jan 28, 2013 at 11:55 PM, Josh Wills <[email protected]> wrote:
>>>> I have to call it a night, but this is an odd one.
>>>>
>>>> The basic problem seems to be that we are reading from the same table
>>>> twice -- it seems like the HTable object is the same on both splits
>>>> (always reading from the words table, or always reading from the
>>>> joinTableName table), but the Scan object appears to get updated.
>>>> I verified this by using a different column family on the joinTableName
>>>> table and seeing that the test returned no output for the join, which is
>>>> what we would expect if one of the reads had no input.
>>>>
>>>> Looking in the code, I don't see a place where the 0.92.1 and 0.90.4 code
>>>> differ significantly in terms of the input format, record reader, etc. I'm
>>>> on the road this week, but I'd like to work on this one some more when I'm
>>>> back in SF and can sit down with my co-workers who know more about HBase
>>>> than I do.
>>>>
>>>> Out of curiosity -- is it just the unit test that fails, or can you run a
>>>> real HBase MR job that suffers from this problem?
>>>>
>>>> J
>>>>
>>>> On Mon, Jan 28, 2013 at 7:26 PM, Josh Wills <[email protected]> wrote:
>>>>> Ack, sorry -- I was checking email on my phone and didn't see the patch.
>>>>> I can replicate it locally; digging in now.
>>>>>
>>>>> On Mon, Jan 28, 2013 at 6:47 PM, Whitacre, Micah <[email protected]> wrote:
>>>>>> The patch should contain the specifics, but I've tested using 4.1.1,
>>>>>> 4.1.2, and 4.1.3. Each gives the same results.
>>>>>>
>>>>>> On Jan 28, 2013, at 20:44, "Josh Wills" <[email protected]> wrote:
>>>>>>
>>>>>> I usually run them in Eclipse, but not using a particularly special run
>>>>>> configuration (I think). Let me see if I can replicate that one -- which
>>>>>> CDH version?
>>>>>>
>>>>>> On Mon, Jan 28, 2013 at 3:13 PM, Micah Whitacre <[email protected]> wrote:
>>>>>>> Related to this thread, where I asked how to save off the intermediate
>>>>>>> state: in general, how do you debug the project, specifically the IT
>>>>>>> tests? Do you typically run through Eclipse with special profiles?
>>>>>>> I'm still trying to track down an odd failure in crunch-hbase when
>>>>>>> swapping out the dependencies to use CDH4.1.x. The test failure seems
>>>>>>> to indicate that the test is joining the same PCollection with itself.
>>>>>>>
>>>>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 63.13 sec <<< FAILURE!
>>>>>>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 62.789 sec <<< FAILURE!
>>>>>>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]>
>>>>>>> but was:<[bird,bird, zebra,zebra, horse,horse, donkey,donkey]>
>>>>>>>     at org.junit.Assert.fail(Assert.java:93)
>>>>>>>     at org.junit.Assert.failNotEquals(Assert.java:647)
>>>>>>>     at org.junit.Assert.assertEquals(Assert.java:128)
>>>>>>>     at org.junit.Assert.assertEquals(Assert.java:147)
>>>>>>>     at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:257)
>>>>>>>     at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>>>>>>>
>>>>>>> and sometimes:
>>>>>>>
>>>>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 71.958 sec <<< FAILURE!
>>>>>>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 71.469 sec <<< FAILURE!
>>>>>>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]>
>>>>>>> but was:<[dog,dog, cat,cat]>
>>>>>>>     at org.junit.Assert.fail(Assert.java:93)
>>>>>>>     at org.junit.Assert.failNotEquals(Assert.java:647)
>>>>>>>     at org.junit.Assert.assertEquals(Assert.java:128)
>>>>>>>     at org.junit.Assert.assertEquals(Assert.java:147)
>>>>>>>     at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:259)
>>>>>>>     at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
>>>>>>>
>>>>>>> Most likely for the same reason Crunch requires a special build of
>>>>>>> HBase 0.94.1, I've found I need to mix and match CDH4 versions, as
>>>>>>> shown by the attached patch. For the Crunch core build I need to use
>>>>>>> all of the latest 2.0.0 code, but for testing crunch-hbase I need to
>>>>>>> use the mrv1 fork for hadoop-core and hadoop-minicluster. I wouldn't
>>>>>>> think that either of those would affect the tests unless somehow the
>>>>>>> temporary files used for the intermediate state were not being stored
>>>>>>> correctly. The fact that the test fails differently each time does
>>>>>>> make me wonder about a concurrency issue, but I'm not sure where.
>>>>>>>
>>>>>>> Any pointers on debugging would be helpful.
>>>>>>> Micah
>>>>>>>
>>>>>>> On Thu, Jan 24, 2013 at 2:24 PM, Micah Whitacre <[email protected]> wrote:
>>>>>>>> I am creating an entirely new profile simply to keep my changes
>>>>>>>> separate from what is in apache/master.
>>>>>>>>
>>>>>>>> Thanks for the hint about the "naive" approach.
>>>>>>>> Previously I had the following:
>>>>>>>>
>>>>>>>>   <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>>>>>>>>   <hadoop.client.version>2.0.0-mr1-cdh4.1.1</hadoop.client.version>
>>>>>>>>   <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>>>>>>>>
>>>>>>>> If I follow what you did and change it to:
>>>>>>>>
>>>>>>>>   <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
>>>>>>>>   <hadoop.client.version>2.0.0-cdh4.1.1</hadoop.client.version>
>>>>>>>>   <hbase.version>0.92.1-cdh4.1.1</hbase.version>
>>>>>>>>
>>>>>>>> the build gets farther. I now have a different failure in
>>>>>>>> crunch-hbase that I'll start working on.
>>>>>>>>
>>>>>>>> Thanks for your help.
>>>>>>>> Micah
>>>>>>>>
>>>>>>>> On Thu, Jan 24, 2013 at 12:23 PM, Josh Wills <[email protected]> wrote:
>>>>>>>>> Micah,
>>>>>>>>>
>>>>>>>>> I did the naive thing and just swapped in 2.0.0-cdh4.1.2 for
>>>>>>>>> 2.0.0-alpha in the crunch.platform=2 profile in the top-level POM
>>>>>>>>> and then added in the Cloudera repositories. That works for me --
>>>>>>>>> does it work for you? It sounds to me like you're creating an
>>>>>>>>> entirely new profile.
>>>>>>>>>
>>>>>>>>> J
>>>>>>>>>
>>>>>>>>> On Thu, Jan 24, 2013 at 7:58 AM, Micah Whitacre <[email protected]> wrote:
>>>>>>>>>> Running dependency:tree on both projects shows that the version of
>>>>>>>>>> Avro is 1.7.0 when running under both profiles. I wish it was that
>>>>>>>>>> easy.
>>>>>>>>>> :)
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 24, 2013 at 9:53 AM, Josh Wills <[email protected]> wrote:
>>>>>>>>>>> On Thu, Jan 24, 2013 at 6:40 AM, Micah Whitacre <[email protected]> wrote:
>>>>>>>>>>>> Taking a step back and comparing against what is generated for a
>>>>>>>>>>>> normal successful test run of "-Dcrunch.platform=2", I do see a p1
>>>>>>>>>>>> and p2 directory being created, with the expected materialized
>>>>>>>>>>>> output in the p1 directory. So I'm still curious about tracking
>>>>>>>>>>>> all of the intermediate state, but it doesn't look like it is an
>>>>>>>>>>>> issue with the output being created in the wrong directory.
>>>>>>>>>>>
>>>>>>>>>>> That's a relief. :)
>>>>>>>>>>>
>>>>>>>>>>> I think the issue with temp outputs has to do with our use of the
>>>>>>>>>>> TemporaryPath libraries for creating, well, temporary paths. We do
>>>>>>>>>>> this so we play nicely with CI frameworks, but you might need to
>>>>>>>>>>> disable it for investigating intermediate outputs.
>>>>>>>>>>>
>>>>>>>>>>> Re: the specific error you're seeing, that looks interesting. I
>>>>>>>>>>> wonder if it's an Avro version change or some such thing. Will see
>>>>>>>>>>> if I can replicate it.
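The "naive" version swap described in the thread amounts to a small change in the top-level POM's crunch.platform=2 profile. A sketch of what that change might look like (the property name matches the one quoted in the thread; the repository id and URL are my assumptions, based on Cloudera's public Maven repository, not taken from the thread):

```xml
<!-- Sketch: inside the crunch.platform=2 profile of the top-level pom.xml.
     The repository id/url below are assumptions for illustration. -->
<properties>
  <hadoop.version>2.0.0-cdh4.1.2</hadoop.version>
</properties>
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
```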
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Director of Data Science
>>>>>>>>>>> Cloudera <http://www.cloudera.com>
>>>>>>>>>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
