I am currently trying to identify all the test cases that are stopping our build. Here are three of them. Each one hits an OOM after the time shown:

TestExampleQueries: 10.0 minutes
TestTpchSingleMode: 50.0 minutes
TestReverseImplicitCast: 50.0 minutes

Travis currently allows 3 GB for the entire build. If we are exceeding that, we should also look at how to reduce our memory footprint. Building the code on Jenkins/Travis while skipping the test cases is not very convincing either.

I am chatting with the Travis folks in another window. Is there anything specific you would like me to discuss with them?

Thoughts?

Yash

On Fri, Jun 27, 2014 at 11:27 PM, Jacques Nadeau wrote:
> Sounds like the Travis instance doesn't have enough memory to run our
> tests.
>
> On Fri, Jun 27, 2014 at 6:08 AM, Yash Sharma wrote:
>
> > A *single-forked* surefire run doesn't help. Neither does an *unbounded
> > timeout* limit.
> >
> > Some part of the tests is stuck in an infinite wait state and ends with an
> > OOM. Here is the full log for the Travis build with no max memory limit on
> > the JVM:
> >
> > https://api.travis-ci.org/jobs/28583768/log.txt?deansi=true
> >
> > > Exception in thread "0d42f23c-c1a8-417a-bee5-672d4449ebac:frag:0:0 - Producer Thread" java.lang.OutOfMemoryError: Direct buffer memory
> > > at java.nio.Bits.reserveMemory(Bits.java:658)
> > > at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> > > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
> > > at
> > > io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434)
> > > at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179)
> > > at io.netty.buffer.PoolArena.allocate(PoolArena.java:168)
> > > at io.netty.buffer.PoolArena.allocate(PoolArena.java:98)
> > > at
> > > io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:46)
> > > at
> > > io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:66)
> > > at
> > > org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:144)
> > > at
> > > org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:151)
> > > at
> > > org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:306)
> > > at
> > > org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:158)
> > > at
> > > org.apache.drill.exec.vector.AllocationHelper.allocate(AllocationHelper.java:31)
> > > at
> > > org.apache.drill.exec.physical.impl.ScanBatch$Mutator.allocate(ScanBatch.java:281)
> > > at
> > > org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:137)
> > > at
> > > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:112)
> > > at
> > > org.apache.drill.exec.physical.impl.producer.ProducerConsumerBatch$Producer.run(ProducerConsumerBatch.java:122)
> > > at java.lang.Thread.run(Thread.java:744)
> > >
> > > No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.
> >
> > On Fri, Jun 27, 2014 at 10:24 AM, Jacques Nadeau wrote:
> >
> > > I believe you're getting killed for excessive memory consumption.
> > > Dropping to a single surefire fork should help.
> > >
> > > On Jun 26, 2014 8:43 PM, "Yash Sharma" wrote:
> > >
> > > > The build still failed on Travis after increasing the timeout to
> > > > 200000 ms. We need to find an appropriate value for it.
> > > > It fails with this error, which typically appears in the timeout case:
> > > >
> > > > Failed to execute goal
> > > > org.apache.maven.plugins:maven-surefire-plugin:2.17:test
> > > > (default-test) on project drill-java-exec: ExecutionException:
> > > > java.lang.RuntimeException: The forked VM terminated without properly
> > > > saying goodbye. VM crash or System.exit called?
> > > > [ERROR] Command was /bin/sh -c cd
> > > > /home/travis/build/yssharma/incubator-drill/exec/java-exec &&
> > > > /usr/lib/jvm/java-7-oracle/jre/bin/java -Xms512m -Xmx2g
> > > > -Ddrill.exec.http.enabled=false
> > > > -Ddrill.exec.sys.store.provider.local.write=false -XX:MaxPermSize=256M
> > > > -XX:MaxDirectMemorySize=2096M -XX:+CMSClassUnloadingEnabled -jar
> > > > /home/travis/build/yssharma/incubator-drill/exec/java-exec/target/surefire/surefirebooter5087846151760741500.jar
> > > >
> > > > Will keep digging.
> > > >
> > > > @Jacques: I will re-submit the command-line-configurable patch soon,
> > > > and will also dig into the surefire forks you mentioned.
> > > >
> > > > Yash
> > > >
> > > > On Fri, Jun 27, 2014 at 12:39 AM, Yash Sharma wrote:
> > > >
> > > > > It was not failing because of the git plugin; rather, it's how Travis
> > > > > takes the clone. Travis uses git clone --depth=50 for fast builds.
> > > > >
> > > > > I was able to get a clean build with help from Travis team member
> > > > > Hiro. The current build is still running on my box. I will share the
> > > > > status on completion.
> > > > >
> > > > > I have added a JIRA for this and will add a patch soon:
> > > > > https://issues.apache.org/jira/browse/DRILL-1083
> > > > >
> > > > > Peace,
> > > > > Yash
> > > > >
> > > > > On Wed, Jun 25, 2014 at 10:00 AM, Jacques Nadeau wrote:
> > > > >
> > > > > > Yeah, we have to disable tests on the Apache hardware because our
> > > > > > tests are too hungry. I'm trying to get some alternatives to work. If
> > > > > > someone wanted to try to figure out whether we could run on Travis
> > > > > > with fork 1, that would be great. Right now it's failing because of
> > > > > > the git plugin.
> > > > > > You can try Travis on your local fork to find a config that works.
> > > > > >
> > > > > > On Jun 24, 2014 8:04 PM, "Yash Sharma" wrote:
> > > > > >
> > > > > > > The final build, #57, which skipped the tests, was successful.
> > > > > > >
> > > > > > > In builds #55 and #56, most of the failing tests failed with a
> > > > > > > TimedOut exception. The other exceptions were IllegalState (child
> > > > > > > level allocators not closed), plus one InterruptedException that
> > > > > > > probably occurred only because the test case was terminated.
> > > > > > >
> > > > > > > On Wed, Jun 25, 2014 at 2:59 AM, Timothy Chen wrote:
> > > > > > >
> > > > > > > > Looks like lots of tests timed out and errored?
> > > > > > > >
> > > > > > > > Tim
> > > > > > > >
> > > > > > > > On Tue, Jun 24, 2014 at 11:53 AM, Yash Sharma wrote:
> > > > > > > >
> > > > > > > > > *fingers-crossed* :)
> > > > > > > > >
> > > > > > > > > On Wed, Jun 25, 2014 at 12:19 AM, Jacques Nadeau wrote:
> > > > > > > > >
> > > > > > > > > > I kicked off another build with a clean install. Good catch.
> > > > > > > > > > Hopefully that will put things back on track.
> > > > > > > > > >
> > > > > > > > > > On Tue, Jun 24, 2014 at 11:46 AM, Yash Sharma wrote:
> > > > > > > > > >
> > > > > > > > > > > I'm not able to reproduce the exact same error right now, but
> > > > > > > > > > > I can see that it was related to the DRILL-1024 commit, where
> > > > > > > > > > > the hive-storage code was moved out of java-exec. The
> > > > > > > > > > > *drillOI* definition moved from config.fmpp (java-exec) to
> > > > > > > > > > > config.fmpp (hive-exec).
> > > > > > > > > > >
> > > > > > > > > > > The Jenkins build was still failing in java-exec, which means
> > > > > > > > > > > the old ObjectInspectorHelper class was still present and was
> > > > > > > > > > > probably looking for the tdd definition in config.fmpp
> > > > > > > > > > > (java-exec).
> > > > > > > > > > >
> > > > > > > > > > > Jenkins used 'mvn install' rather than 'mvn clean install', so
> > > > > > > > > > > maybe it was still referring to the old ObjectInspectorHelper
> > > > > > > > > > > class.
> > > > > > > > > > >
> > > > > > > > > > > I'm still not sure. I will try to reproduce the exact error.
> > > > > > > > > > >
> > > > > > > > > > > Yash
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jun 24, 2014 at 10:46 PM, Jacques Nadeau wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hey guys,
> > > > > > > > > > > >
> > > > > > > > > > > > I just saw that the build on Jenkins is failing. Is any
> > > > > > > > > > > > committer interested in trying to troubleshoot?
> > > > > > > > > > > >
> > > > > > > > > > > > https://builds.apache.org/job/drill-scm/54
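For a rough sense of the memory budget discussed in this thread, consider the forked surefire JVM command line quoted in the Jun 26 message. It caps the heap at 2g, PermGen at 256M and direct memory at 2096M, and those caps alone already exceed the 3 GB that Travis allows. The Java sketch below adds those ceilings together. It is only an estimate under stated assumptions: it treats "3 gigs" as 3 GiB, and it ignores thread stacks, the code cache, GC overhead and the parent Maven JVM, so the real footprint would be higher. The class and method names (MemoryBudget, parseSize, worstCase) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: rough worst-case memory estimate for a forked surefire JVM.
 * Sketch only. It sums the heap, PermGen and direct-memory caps and ignores
 * thread stacks, code cache, GC overhead and the parent Maven JVM.
 */
public class MemoryBudget {

    /** Parses a HotSpot-style size such as "2g", "2096M" or "512m" into bytes. */
    static long parseSize(String value) {
        String v = value.trim();
        char unit = Character.toLowerCase(v.charAt(v.length() - 1));
        long multiplier;
        switch (unit) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            default:  return Long.parseLong(v); // no suffix: plain byte count
        }
        return Long.parseLong(v.substring(0, v.length() - 1)) * multiplier;
    }

    /** Sums the configured ceilings and returns the total in bytes. */
    static long worstCase(Map<String, String> caps) {
        long total = 0;
        for (String size : caps.values()) {
            total += parseSize(size);
        }
        return total;
    }

    public static void main(String[] args) {
        // Values copied from the forked JVM command line quoted in the thread.
        Map<String, String> caps = new LinkedHashMap<>();
        caps.put("-Xmx", "2g");
        caps.put("-XX:MaxPermSize", "256M");
        caps.put("-XX:MaxDirectMemorySize", "2096M");

        long total = worstCase(caps);
        long travisLimit = parseSize("3g"); // assumption: "3 gigs" means 3 GiB
        System.out.printf("configured ceiling: %d MiB, budget: %d MiB, over by %d MiB%n",
                total >> 20, travisLimit >> 20, (total - travisLimit) >> 20);
    }
}
```

Run as-is, it reports a configured ceiling of 4400 MiB against a 3072 MiB budget, about 1328 MiB over, before counting any of the excluded overheads. Lowering -XX:MaxDirectMemorySize is one knob this arithmetic points at, since the Producer Thread died with "OutOfMemoryError: Direct buffer memory". Whether the tests still pass with a smaller cap would need to be checked.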
