On Tue, Nov 7, 2017 at 6:10 AM, Sean Busbey <bus...@apache.org> wrote:

> > Should I be able to see the machine dir when I look at nightlies output?
> > (Was trying to see what else is running).
>
> Ah. we don't have the same machine sampling on nightly as we do in
> precommit. I am 80% on a patch for HBASE-19189 (run test ad-hoc
> repeatedly) that includes pulling that information gathering into a
> place where we could also use it in nightly.
>

Sweet.

> Did we ever figure out how many cores we expect our tests to need? It
> looks like the Hadoop nodes have 8 cores. (with 2 executors that means
> 4 is our fair share)
>

At the end of the thread inquiry I suggested that we don't use enough
cores, that we could up our fork counts and tests would complete in less
time. I wanted to experiment some w/ high fork counts -- 16 or so -- to
see if concurrent running brought on more failure.

St.Ack

> On Tue, Nov 7, 2017 at 8:05 AM, Sean Busbey <bus...@apache.org> wrote:
> > surefire results get zipped up (we were filling the jenkins hosts with
> > old test logs previously) and stored in a file called "test_logs.zip"
> > for each jvm run. So if that happened in the jdk7 run for branch-1.2,
> > it'd be in artifacts -> output-jdk7 -> test_logs.zip.
> >
> > I don't know if the archival process grabs things from surefire that
> > aren't the surefire XML files, but we can update it to do so if it
> > doesn't.
> >
> > On Mon, Nov 6, 2017 at 11:39 PM, Stack <st...@duboce.net> wrote:
> >> I see this in the 1.2 nightly just when it gives up the ghost....
> >>
> >> [WARNING] Corrupted STDOUT by directly writing to native stream in
> >> forked JVM 2. See FAQ web page and the dump file
> >> /testptch/hbase/hbase-server/target/surefire-reports/2017-11-06T20-11-30_219-jvmRun2.dumpstream
> >>
> >> .. but the pointed to dumpstream doesn't seem to be around post build.
> >> Am I looking in the wrong place?
> >>
> >> Thanks,
> >>
> >> S
> >>
> >>
> >> On Mon, Nov 6, 2017 at 8:20 PM, Stack <st...@duboce.net> wrote:
> >>
> >>> On Mon, Nov 6, 2017 at 8:35 AM, Sean Busbey <sean.bus...@gmail.com> wrote:
> >>>
> >>>> Given that all of the old post-commit tests have been posting that
> >>>> they're failing to JIRAs for what looks like a month, is there any
> >>>> reason not to switch to the new tests that also say they're failing?
> >>>>
> >>>
> >>> No reason.
> >>>
> >>>
> >>>> The reason HBASE-18467 has been sitting on hold this whole time has
> >>>> been because the new nightly branch tests keep complaining about
> >>>> failures.
> >>>>
> >>>
> >>> Looking just now, it looks like killed-off test runs.
> >>>
> >>> +1 on move to nightlies.
> >>>
> >>> Can I help?
> >>>
> >>> Should I be able to see the machine dir when I look at nightlies output?
> >>> (Was trying to see what else is running).
> >>>
> >>> Thanks Sean,
> >>> St.Ack
> >>>
> >>>
> >>>> On Mon, Nov 6, 2017 at 10:21 AM, Sean Busbey <sean.bus...@gmail.com>
> >>>> wrote:
> >>>> > It looks like old tests branch-1.2 and branch-1.3 are failing with
> >>>> > some maven enforcer problem that we thought we had fixed a few times
> >>>> > before. It's probably fixable by changing the version of maven they
> >>>> > use, but I'd much rather any test effort go into the last mile of
> >>>> > getting our new nightly tests working.
> >>>> >
> >>>> > I'll start picking this up as soon as I close out HBASE-18784.
> >>>> >
> >>>> > Please consider branch-1.2 release blocked. :(
> >>>> >
> >>>> > On Mon, Nov 6, 2017 at 10:19 AM, Stack <st...@duboce.net> wrote:
> >>>> >> Our builds seem pretty sick up on builds.apache.org even after the
> >>>> >> miracle work by Allen W containing errant hadoop processes. Looking
> >>>> >> at 1.2 and 1.3, we don't even get off the ground. Anyone been taking
> >>>> >> a look?
> >>>> >>
> >>>> >> When I try to run the branch-1.2 and branch-1.3 unit tests locally,
> >>>> >> about ten tests or so time out. Have others tried branch-1 test runs
> >>>> >> recently?
> >>>> >>
> >>>> >> Thanks,
> >>>> >> S
> >>>> >>
> >>>> >>
> >>>> >> On Mon, Aug 21, 2017 at 1:54 PM, Stack <st...@duboce.net> wrote:
> >>>> >>
> >>>> >>> Loads of tests timing out in test runs -- then they all pass. Anyone
> >>>> >>> have any input? I'm trying to take a look as background task...
> >>>> >>>
> >>>> >>> S
> >>>> >>>
> >>>> >>> On Tue, Jul 11, 2017 at 7:05 PM, Stack <st...@duboce.net> wrote:
> >>>> >>>
> >>>> >>>> Thanks Appy.
> >>>> >>>>
> >>>> >>>> Anyone looking at the 'ERROR ExecutionException Java heap space...'
> >>>> >>>> errors on patch builds or failed forking? Seems common enough. Here
> >>>> >>>> are complaints that remote JVM went away:
> >>>> >>>>
> >>>> >>>> https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/7617/artifact/patchprocess/patch-unit-hbase-server.txt
> >>>> >>>> https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/7616/artifact/patchprocess/patch-unit-hbase-server.txt
> >>>> >>>>
> >>>> >>>> Then this succeeds....
> >>>> >>>>
> >>>> >>>> https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/7614/artifact/patchprocess/patch-unit-hbase-server.txt
> >>>> >>>>
> >>>> >>>> And we are good for a while.
> >>>> >>>>
> >>>> >>>> Then heap issues:
> >>>> >>>>
> >>>> >>>> https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/7607/artifact/patchprocess/patch-unit-hbase-server.txt
> >>>> >>>>
> >>>> >>>> Are the zombies back?
> >>>> >>>>
> >>>> >>>> St.Ack
> >>>> >>>>
> >>>> >>>> On Tue, Jul 11, 2017 at 12:33 AM, Apekshit Sharma <a...@cloudera.com>
> >>>> >>>> wrote:
> >>>> >>>>
> >>>> >>>>> Fixed 'trends' in flaky dashboard.
> >>>> >>>>> Since I changed the test names in the last fix, the dots in the
> >>>> >>>>> name were messing up the CSS selectors. :)
> >>>> >>>>>
> >>>> >>>>>
> >>>> >>>>> On Mon, Jul 10, 2017 at 11:34 AM, Apekshit Sharma <a...@cloudera.com>
> >>>> >>>>> wrote:
> >>>> >>>>>
> >>>> >>>>> > Quick update on flaky dashboard:
> >>>> >>>>> > Flaky dashboard wasn't working earlier because our trunk build was
> >>>> >>>>> > broken. After trunk was fixed, the format of log lines in
> >>>> >>>>> > consoleText was not the same, so findHangingTests.py was not able
> >>>> >>>>> > to parse it correctly for broken/hanging/timeout tests. That's
> >>>> >>>>> > been fixed now in HBASE-18341
> >>>> >>>>> > <https://issues.apache.org/jira/browse/HBASE-18341>.
> >>>> >>>>> > Drob brought up in other thread that 'trends' isn't working. It's
> >>>> >>>>> > probably because I changed test names (which are used as keys in
> >>>> >>>>> > python dicts) from just class name to package name+classname
> >>>> >>>>> > (without common org.apache.hadoop.hbase prefix). I had to do it
> >>>> >>>>> > because we have some tests with same class name but in different
> >>>> >>>>> > packages.
> >>>> >>>>> >
> >>>> >>>>> > I'll take a look at it sometime this week (unless someone wants to
> >>>> >>>>> > take it up and work on this beautiful piece of infra ;) )
> >>>> >>>>> >
> >>>> >>>>> >
> >>>> >>>>> > On Thu, Jul 6, 2017 at 11:25 PM, Stack <st...@duboce.net> wrote:
> >>>> >>>>> >
> >>>> >>>>> >> On Thu, Jul 6, 2017 at 3:45 PM, Sean Busbey <bus...@apache.org>
> >>>> >>>>> >> wrote:
> >>>> >>>>> >>
> >>>> >>>>> >> > that sounds like our project structure is broken. Please make
> >>>> >>>>> >> > sure there's a jira that tracks it and I'll take a look later.
> >>>> >>>>> >> >
> >>>> >>>>> >>
> >>>> >>>>> >> Filed HBASE-18331 for now.
> >>>> >>>>> >>
> >>>> >>>>> >> I can take a look too later.
> >>>> >>>>> >>
> >>>> >>>>> >> St.Ack
> >>>> >>>>> >>
> >>>> >>>>> >>
> >>>> >>>>> >> > On Thu, Jul 6, 2017 at 6:15 PM, Stack <st...@duboce.net> wrote:
> >>>> >>>>> >> >
> >>>> >>>>> >> > > I tried publishing hbase-3.0.0-SNAPSHOT... so hbase-checkstyle
> >>>> >>>>> >> > > was up in repo (presuming it relied on an aged-out snapshot).
> >>>> >>>>> >> > > Seems to have 'fixed' it for now....
> >>>> >>>>> >> > >
> >>>> >>>>> >> > > St.Ack
> >>>> >>>>> >> > >
> >>>> >>>>> >> > > On Thu, Jul 6, 2017 at 12:50 PM, Stack <st...@duboce.net> wrote:
> >>>> >>>>> >> > >
> >>>> >>>>> >> > > > The 3.0.0-SNAPSHOT looks suspicious ... the hbase version....
> >>>> >>>>> >> > > > St.Ack
> >>>> >>>>> >> > > >
> >>>> >>>>> >> > > > On Thu, Jul 6, 2017 at 12:49 PM, Stack <st...@duboce.net> wrote:
> >>>> >>>>> >> > > >
> >>>> >>>>> >> > > >> On Thu, Jul 6, 2017 at 12:48 PM, Stack <st...@duboce.net> wrote:
> >>>> >>>>> >> > > >>
> >>>> >>>>> >> > > >>> Checkstyle is currently broke on our builds... looking.
> >>>> >>>>> >> > > >>> St.Ack
> >>>> >>>>> >> > > >>>
> >>>> >>>>> >> > > >>
> >>>> >>>>> >> > > >> Works if I run it locally (of course)
> >>>> >>>>> >> > > >> St.Ack
> >>>> >>>>> >> > > >>
> >>>> >>>>> >> > > >>
> >>>> >>>>> >> > > >>>
> >>>> >>>>> >> > > >>> [ERROR] Failed to execute goal
> >>>> >>>>> >> > > >>> org.apache.maven.plugins:maven-checkstyle-plugin:2.17:checkstyle
> >>>> >>>>> >> > > >>> (default-cli) on project hbase: Execution default-cli of goal
> >>>> >>>>> >> > > >>> org.apache.maven.plugins:maven-checkstyle-plugin:2.17:checkstyle
> >>>> >>>>> >> > > >>> failed: Plugin
> >>>> >>>>> >> > > >>> org.apache.maven.plugins:maven-checkstyle-plugin:2.17 or one of
> >>>> >>>>> >> > > >>> its dependencies could not be resolved: Could not find artifact
> >>>> >>>>> >> > > >>> org.apache.hbase:hbase-checkstyle:jar:3.0.0-SNAPSHOT in Nexus
> >>>> >>>>> >> > > >>> (http://repository.apache.org/snapshots) -> [Help 1]
> >>>> >>>>> >> > > >>> [ERROR]
> >>>> >>>>> >> > > >>> [ERROR] To see the full stack trace of the errors, re-run Maven
> >>>> >>>>> >> > > >>> with the -e switch.
> >>>> >>>>> >> > > >>> [ERROR] Re-run Maven using the -X switch to enable full debug
> >>>> >>>>> >> > > >>> logging.
> >>>> >>>>> >> > > >>> [ERROR]
> >>>> >>>>> >> > > >>> [ERROR] For more information about the errors and possible
> >>>> >>>>> >> > > >>> solutions, please read the following articles:
> >>>> >>>>> >> > > >>> [ERROR] [Help 1]
> >>>> >>>>> >> > > >>> http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException
> >>>> >>>>> >> > > >>> Build step 'Invoke top-level Maven targets' marked build as
> >>>> >>>>> >> > > >>> failure
> >>>> >>>>> >> > > >>> Performing Post build task...
> >>>> >>>>> >> > > >>> Match found for :.* : True
> >>>> >>>>> >> > > >>> Logical operation result is TRUE
> >>>> >>>>> >> > > >>> Running script : # Run zombie detector script
> >>>> >>>>> >> > > >>> ./dev-support/zombie-detector.sh --jenkins ${BUILD_ID}
> >>>> >>>>> >> > > >>> [a3159d73] $ /bin/bash -xe /tmp/hudson1697041977582083402.sh
> >>>> >>>>> >> > > >>> + ./dev-support/zombie-detector.sh --jenkins 3320
> >>>> >>>>> >> > > >>> Thu Jul 6 01:37:09 UTC 2017 We're ok: there is no zombie test
> >>>> >>>>> >> > > >>>
> >>>> >>>>> >> > > >>>
> >>>> >>>>> >> > > >>> On Fri, Jun 30, 2017 at 2:43 PM, Sean Busbey <bus...@apache.org>
> >>>> >>>>> >> > > >>> wrote:
> >>>> >>>>> >> > > >>>
> >>>> >>>>> >> > > >>>> jacoco was added ages ago. I'd guess that something changed on
> >>>> >>>>> >> > > >>>> the machines we use to cause it to stop working.
> >>>> >>>>> >> > > >>>>
> >>>> >>>>> >> > > >>>> On Thu, Jun 29, 2017 at 12:02 PM, Stack <st...@duboce.net> wrote:
> >>>> >>>>> >> > > >>>>
> >>>> >>>>> >> > > >>>> > On Wed, Jun 28, 2017 at 8:43 AM, Josh Elser <els...@apache.org>
> >>>> >>>>> >> > > >>>> > wrote:
> >>>> >>>>> >> > > >>>> >
> >>>> >>>>> >> > > >>>> > >
> >>>> >>>>> >> > > >>>> > > On 6/27/17 7:20 PM, Stack wrote:
> >>>> >>>>> >> > > >>>> > >
> >>>> >>>>> >> > > >>>> > >> * test-patch's whitespace plugin can be configured to ignore
> >>>> >>>>> >> > > >>>> > >>> some files (but I can't think of any we'd care to so
> >>>> >>>>> >> > > >>>> > >>> whitelist)
> >>>> >>>>> >> > > >>>> > >>>
> >>>> >>>>> >> > > >>>> > >>> Generated files.
> >>>> >>>>> >> > > >>>> > >>
> >>>> >>>>> >> > > >>>> > >
> >>>> >>>>> >> > > >>>> > > Oh my goodness, yes, please.
> >>>> >>>>> >> > > >>>> > > This has been such a pain in the rear for me
> >>>> >>>>> >> > > >>>> > > as I've been rebasing space quota patches. Sometimes, the
> >>>> >>>>> >> > > >>>> > > spaces in pb-gen'ed code are removed by folks before commit,
> >>>> >>>>> >> > > >>>> > > other times they aren't.
> >>>> >>>>> >> > > >>>> > >
> >>>> >>>>> >> > > >>>> >
> >>>> >>>>> >> > > >>>> > Agree sir. It's a distraction at least.
> >>>> >>>>> >> > > >>>> >
> >>>> >>>>> >> > > >>>> > I see Jacoco report here now:
> >>>> >>>>> >> > > >>>> > https://builds.apache.org/job/HBase-Trunk_matrix/jdk=JDK%201.8%20(latest),label=Hadoop/3277/
> >>>> >>>>> >> > > >>>> >
> >>>> >>>>> >> > > >>>> > Maybe it has been there always and I just haven't noticed.
> >>>> >>>>> >> > > >>>> >
> >>>> >>>>> >> > > >>>> > It's all 0%. We need to turn on stuff?
> >>>> >>>>> >> > > >>>> >
> >>>> >>>>> >> > > >>>> > St.Ack
> >>>> >>>>> >> > > >>>> >
> >>>> >>>>> >
> >>>> >>>>> > --
> >>>> >>>>> >
> >>>> >>>>> > -- Appy
> >>>> >>>>>
> >>>> >>>>> --
> >>>> >>>>>
> >>>> >>>>> -- Appy
> >>>> >
> >>>> > --
> >>>> > Sean
> >>>>
> >>>> --
> >>>> Sean