Re: Viewing intermediate states for debugging

Josh Wills Wed, 30 Jan 2013 06:09:28 -0800

Okay, good to know. I'll be back in SF on Friday and will sit down w/some
of my friends who know HBase better than I do and take another look.


J


On Tue, Jan 29, 2013 at 9:12 AM, Micah Whitacre <[email protected]> wrote:

> Unfortunately it doesn't look like this is just a test failure as
> running against a CDH4.1.1 cluster fails in the exact same manner.
> Here is a copy of the code I used[1]
>
> [1] - http://pastebin.com/QLEc5fmG
>
> On Tue, Jan 29, 2013 at 8:44 AM, Micah Whitacre <[email protected]>
> wrote:
> > The problem of reading from the same table twice seems interesting.
> > At one point when trying to figure out the problem I tweaked the test
> > to run the joinedTable through the same wordCount steps to make sure
> > everything was read and then persisted correctly.  So the flow of the
> > test became:
> >
> > write to wordcount table
> > wordcount
> > write to join table
> > wordcount the join table (output to a different table)
> > attempt to join words with others.
> >
> > That flow worked as expected up to the last join, which still failed.  So
> > it seems the data is being read in correctly from HBase.
> >
> > I am working on building a standalone example and will report back
> > with the findings.
> >
> > thanks for your help,
> > micah
> >
> >
> > On Mon, Jan 28, 2013 at 11:55 PM, Josh Wills <[email protected]>
> > wrote:
> >> I have to call it a night, but this is an odd one.
> >>
> >> The basic problem seems to be that we are reading from the same table
> >> twice-- it seems like the HTable object is the same on both splits (always
> >> reading from the words table, or always reading from the joinTableName
> >> table), but the Scan object appears to get updated. I verified this by using
> >> a different column family on the joinTableName table and seeing that the
> >> test returned no output for the join, which is what we would expect if one
> >> of the reads had no input.
> >>
> >> Looking in the code, I don't see a place where the 0.92.1 and 0.90.4 code
> >> differ significantly in terms of the input format, record reader, etc. I'm
> >> on the road this week, but I'd like to work on this one some more when I'm
> >> back in SF and can sit down with my co-workers who know more HBase than I
> >> do.
> >>
> >> Out of curiosity-- is it just the unit test that fails, or can you run a
> >> real HBase MR job that suffers from this problem?
> >>
> >> J
> >>
> >>
> >> On Mon, Jan 28, 2013 at 7:26 PM, Josh Wills <[email protected]>
> >> wrote:
> >>>
> >>> Ack, sorry-- was checking email on my phone and didn't see the patch. I
> >>> can replicate it locally, digging in now.
> >>>
> >>>
> >>> On Mon, Jan 28, 2013 at 6:47 PM, Whitacre,Micah
> >>> <[email protected]> wrote:
> >>>>
> >>>> The patch should contain the specifics but I've tested using 4.1.1,
> >>>> 4.1.2, and 4.1.3. Each gives the same results.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Jan 28, 2013, at 20:44, "Josh Wills" <[email protected]> wrote:
> >>>>
> >>>> I usually run them in Eclipse, but not using a particularly special run
> >>>> configuration (I think.) Let me see if I can replicate that one-- which CDH
> >>>> version?
> >>>>
> >>>>
> >>>> On Mon, Jan 28, 2013 at 3:13 PM, Micah Whitacre <[email protected]>
> >>>> wrote:
> >>>>>
> >>>>> Related to this thread, where I asked how to save off the intermediate
> >>>>> state: in general, how do you debug the project, specifically the IT
> >>>>> tests?  Do you typically run through Eclipse with special profiles?
> >>>>>
> >>>>> I'm still trying to track down an odd failure in crunch-hbase when
> >>>>> swapping out the dependencies to use CDH4.1.x.  The test failure seems
> >>>>> to indicate the test is joining the same PCollection on itself.
> >>>>>
> >>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 63.13 sec <<< FAILURE!
> >>>>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 62.789 sec  <<< FAILURE!
> >>>>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[bird,bird, zebra,zebra, horse,horse, donkey,donkey]>
> >>>>>         at org.junit.Assert.fail(Assert.java:93)
> >>>>>         at org.junit.Assert.failNotEquals(Assert.java:647)
> >>>>>         at org.junit.Assert.assertEquals(Assert.java:128)
> >>>>>         at org.junit.Assert.assertEquals(Assert.java:147)
> >>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:257)
> >>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
> >>>>>
> >>>>> and sometimes:
> >>>>>
> >>>>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 71.958 sec <<< FAILURE!
> >>>>> testWordCount(org.apache.crunch.io.hbase.WordCountHBaseIT)  Time elapsed: 71.469 sec  <<< FAILURE!
> >>>>> java.lang.AssertionError: expected:<[cat,zebra, cat,donkey, dog,bird]> but was:<[dog,dog, cat,cat]>
> >>>>>         at org.junit.Assert.fail(Assert.java:93)
> >>>>>         at org.junit.Assert.failNotEquals(Assert.java:647)
> >>>>>         at org.junit.Assert.assertEquals(Assert.java:128)
> >>>>>         at org.junit.Assert.assertEquals(Assert.java:147)
> >>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.run(WordCountHBaseIT.java:259)
> >>>>>         at org.apache.crunch.io.hbase.WordCountHBaseIT.testWordCount(WordCountHBaseIT.java:202)
> >>>>>
> >>>>> Most likely due to the same reason Crunch requires a special build of
> >>>>> HBase 0.94.1, I've found I need to mix and match CDH4 versions as
> >>>>> shown by the attached patch.  For the Crunch core build I need to use
> >>>>> all of the latest 2.0.0 code but for testing crunch-hbase I need to
> >>>>> use the mrv1 fork for hadoop-core and hadoop-minicluster.  I wouldn't
> >>>>> think that either of those would affect the tests unless somehow the
> >>>>> files used for the intermediate states were not being temporarily
> >>>>> stored correctly.  The fact that the test fails differently from run to
> >>>>> run does make me wonder about a concurrency issue, but I'm not sure where.
> >>>>>
> >>>>> Any pointers on debugging would be helpful.
> >>>>> Micah
> >>>>>
> >>>>> On Thu, Jan 24, 2013 at 2:24 PM, Micah Whitacre <[email protected]>
> >>>>> wrote:
> >>>>> > I am creating an entirely new profile simply to keep my changes
> >>>>> > separate from what is in apache/master.
> >>>>> >
> >>>>> > Thanks for the hint about the "naive" approach.  Previously I had the
> >>>>> > following:
> >>>>> >
> >>>>> >             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
> >>>>> >             <hadoop.client.version>2.0.0-mr1-cdh4.1.1</hadoop.client.version>
> >>>>> >             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
> >>>>> >
> >>>>> > If I follow what you did and change it to:
> >>>>> >
> >>>>> >             <hadoop.version>2.0.0-cdh4.1.1</hadoop.version>
> >>>>> >             <hadoop.client.version>2.0.0-cdh4.1.1</hadoop.client.version>
> >>>>> >             <hbase.version>0.92.1-cdh4.1.1</hbase.version>
> >>>>> >
> >>>>> > The build gets farther.  I now have a different failure in
> >>>>> > crunch-hbase I'll start working on.
> >>>>> >
> >>>>> > Thanks for your help.
> >>>>> > Micah
> >>>>> >
> >>>>> >
> >>>>> > On Thu, Jan 24, 2013 at 12:23 PM, Josh Wills <[email protected]>
> >>>>> > wrote:
> >>>>> >> Micah,
> >>>>> >>
> >>>>> >> I did the naive thing and just swapped in 2.0.0-cdh4.1.2 for
> >>>>> >> 2.0.0-alpha in the crunch.platform=2 profile in the top level POM and
> >>>>> >> then added in the Cloudera repositories. That works for me-- does it
> >>>>> >> work for you? It sounds to me like you're creating an entirely new
> >>>>> >> profile.
> >>>>> >>
> >>>>> >> J
> >>>>> >>
> >>>>> >>
> >>>>> >> On Thu, Jan 24, 2013 at 7:58 AM, Micah Whitacre
> >>>>> >> <[email protected]>
> >>>>> >> wrote:
> >>>>> >>>
> >>>>> >>> running dependency:tree on both projects shows that the version of
> >>>>> >>> Avro is 1.7.0 for running under both profiles.  I wish it was that
> >>>>> >>> easy.  :)
> >>>>> >>>
> >>>>> >>> On Thu, Jan 24, 2013 at 9:53 AM, Josh Wills <[email protected]>
> >>>>> >>> wrote:
> >>>>> >>> >
> >>>>> >>> >
> >>>>> >>> >
> >>>>> >>> > On Thu, Jan 24, 2013 at 6:40 AM, Micah Whitacre
> >>>>> >>> > <[email protected]>
> >>>>> >>> > wrote:
> >>>>> >>> >>
> >>>>> >>> >> Taking a step back and comparing what is being generated for a
> >>>>> >>> >> normal successful test run of "-Dcrunch.platform=2" I do see a p1
> >>>>> >>> >> and p2 directory being created, with the expected materialized
> >>>>> >>> >> output being in the p1 directory.  So I'm still curious about
> >>>>> >>> >> tracking all of the intermediate state but it doesn't look like it
> >>>>> >>> >> is an issue with regard to creating the output in the wrong
> >>>>> >>> >> directory.
> >>>>> >>> >
> >>>>> >>> >
> >>>>> >>> > That's a relief. :)
> >>>>> >>> >
> >>>>> >>> > I think the issue with temp outputs has to do with our use of the
> >>>>> >>> > TemporaryPath libraries for creating, well, temporary paths. We do
> >>>>> >>> > this so we play nicely with CI frameworks, but you might need to
> >>>>> >>> > disable it for investigating intermediate outputs.
> >>>>> >>> >
> >>>>> >>> > Re: the specific error you're seeing, that looks interesting. I
> >>>>> >>> > wonder if it's an Avro version change or some such thing. Will see
> >>>>> >>> > if I can replicate it.
> >>>>> >>> >
> >>>>> >>> >
> >>>>> >>> > --
> >>>>> >>> > Director of Data Science
> >>>>> >>> > Cloudera
> >>>>> >>> > Twitter: @josh_wills
> >>>>> >>
> >>>>> >>
> >>>>> >>
> >>>>> >>
> >>>>> >> --
> >>>>> >> Director of Data Science
> >>>>> >> Cloudera
> >>>>> >> Twitter: @josh_wills
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Director of Data Science
> >>>> Cloudera
> >>>> Twitter: @josh_wills
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Director of Data Science
> >>> Cloudera
> >>> Twitter: @josh_wills
> >>
> >>
> >>
> >>
> >> --
> >> Director of Data Science
> >> Cloudera
> >> Twitter: @josh_wills
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
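
For anyone following the thread: the two failure outputs quoted above (every pair made of the same word twice) have the shape you would get if both sides of the join ended up reading the same table. Below is a minimal sketch in plain Java, with no Crunch or HBase involved. The row keys, the table contents, and the join helper are all hypothetical and simplified; they are not the data or code from WordCountHBaseIT, and the sketch only illustrates the symptom, not the cause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SelfJoinSketch {

    /** A (row key, word) pair. The row keys used below are made up for illustration. */
    static final class Row {
        final String key;
        final String word;

        Row(String key, String word) {
            this.key = key;
            this.word = word;
        }
    }

    /** Nested-loop inner join on row key, emitting "leftWord,rightWord". */
    static List<String> join(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        for (Row l : left) {
            for (Row r : right) {
                if (l.key.equals(r.key)) {
                    out.add(l.word + "," + r.word);
                }
            }
        }
        return out;
    }

    /** Hypothetical stand-in for the words table. */
    static List<Row> wordsTable() {
        return Arrays.asList(new Row("k1", "cat"), new Row("k2", "dog"));
    }

    /** Hypothetical stand-in for the join table; "horse" has no partner in the words table. */
    static List<Row> joinTable() {
        return Arrays.asList(
                new Row("k1", "zebra"), new Row("k2", "bird"), new Row("k3", "horse"));
    }

    public static void main(String[] args) {
        // Intended join: each side reads its own table.
        System.out.println(join(wordsTable(), joinTable()));
        // Both sides read the join table: only identical pairs, including "horse".
        System.out.println(join(joinTable(), joinTable()));
        // Both sides read the words table: only identical pairs.
        System.out.println(join(wordsTable(), wordsTable()));
    }
}
```

In this sketch the unmatched word only shows up when the join table is read on both sides, which is the same pattern as "horse,horse" in the first failure above. Which table "wins" would then decide which of the two failure outputs you see.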
