I am currently trying to find all the test cases that are stopping our
build. Here are three of them. Each one runs out of memory (OOM) after the
time shown:

TestExampleQueries 10.0 minutes.
TestTpchSingleMode 50.0 minutes.
TestReverseImplicitCast 50.0 minutes.
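
To narrow down which tests grow direct memory, one option might be to log the
JVM's "direct" buffer pool before and after each test. Below is a minimal
sketch using only the JDK; DirectMemoryProbe and directBytesUsed are
hypothetical names, not existing Drill code. It assumes the allocations go
through ByteBuffer.allocateDirect, as they do in the stack trace quoted
further down in this thread.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper (not part of Drill): reads the JVM's direct buffer pool usage.
final class DirectMemoryProbe {

    /** Returns the bytes currently held by the "direct" buffer pool, or -1 if it is not exposed. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        // Stand-in for a test body: hold a 1 MiB direct buffer while measuring.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
        long after = directBytesUsed();
        System.out.println("direct pool grew by " + (after - before)
                + " bytes while holding a buffer of " + buffer.capacity() + " bytes");
    }
}
```

Calling directBytesUsed() from a JUnit @Before/@After pair would be one way to
see which of the tests above never releases its buffers.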

Travis currently allows 3 GB for the entire build.
If we are exceeding that, we should probably also look at how to reduce
the memory footprint. Building the code on Jenkins/Travis while skipping
the test cases is not very convincing either.

I am chatting with the Travis folks in another window. Is there anything
specific you would like me to raise with them?

Thoughts?

Yash




On Fri, Jun 27, 2014 at 11:27 PM, Jacques Nadeau <[email protected]> wrote:

> Sounds like the Travis instance doesn't have enough memory to run our
> tests.
>
>
> On Fri, Jun 27, 2014 at 6:08 AM, Yash Sharma <[email protected]> wrote:
>
> > A *single-forked* Surefire run doesn't help. Neither does an *unbounded*
> > *timeout* limit.
> >
> > Some part of the tests goes into an infinite wait state and ends with
> > OOM.
> > Here is the full log for the Travis build with no max memory limit on the JVM:
> >
> > https://api.travis-ci.org/jobs/28583768/log.txt?deansi=true
> >
> >
> >
> >
> > > Exception in thread "0d42f23c-c1a8-417a-bee5-672d4449ebac:frag:0:0 - Producer Thread" java.lang.OutOfMemoryError: Direct buffer memory
> > >       at java.nio.Bits.reserveMemory(Bits.java:658)
> > >       at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> > >       at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
> > >       at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434)
> > >       at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179)
> > >       at io.netty.buffer.PoolArena.allocate(PoolArena.java:168)
> > >       at io.netty.buffer.PoolArena.allocate(PoolArena.java:98)
> > >       at io.netty.buffer.PooledByteBufAllocatorL.newDirectBuffer(PooledByteBufAllocatorL.java:46)
> > >       at io.netty.buffer.PooledByteBufAllocatorL.directBuffer(PooledByteBufAllocatorL.java:66)
> > >       at org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:144)
> > >       at org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:151)
> > >       at org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:306)
> > >       at org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:158)
> > >       at org.apache.drill.exec.vector.AllocationHelper.allocate(AllocationHelper.java:31)
> > >       at org.apache.drill.exec.physical.impl.ScanBatch$Mutator.allocate(ScanBatch.java:281)
> > >       at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:137)
> > >       at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:112)
> > >       at org.apache.drill.exec.physical.impl.producer.ProducerConsumerBatch$Producer.run(ProducerConsumerBatch.java:122)
> > >       at java.lang.Thread.run(Thread.java:744)
> > >
> > > No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.
> > >
> > >
> >
> >
> >
> > On Fri, Jun 27, 2014 at 10:24 AM, Jacques Nadeau <[email protected]>
> > wrote:
> >
> > > I believe you're getting killed for excessive memory consumption.
> > > Dropping to a single Surefire fork should help.
> > > On Jun 26, 2014 8:43 PM, "Yash Sharma" <[email protected]> wrote:
> > >
> > > > The build still failed on Travis after increasing the timeout to
> > > > 200000 ms. I need to find an appropriate value for it.
> > > > It fails with this error, which typically shows up in the timeout case:
> > > >
> > > >
> > > > Failed to execute goal
> > > > org.apache.maven.plugins:maven-surefire-plugin:2.17:test
> > > > (default-test) on project drill-java-exec: ExecutionException:
> > > > java.lang.RuntimeException: The forked VM terminated without properly
> > > > saying goodbye. VM crash or System.exit called?
> > > > [ERROR] Command was /bin/sh -c cd
> > > > /home/travis/build/yssharma/incubator-drill/exec/java-exec &&
> > > > /usr/lib/jvm/java-7-oracle/jre/bin/java -Xms512m -Xmx2g
> > > > -Ddrill.exec.http.enabled=false
> > > > -Ddrill.exec.sys.store.provider.local.write=false -XX:MaxPermSize=256M
> > > > -XX:MaxDirectMemorySize=2096M -XX:+CMSClassUnloadingEnabled -jar
> > > > /home/travis/build/yssharma/incubator-drill/exec/java-exec/target/surefire/surefirebooter5087846151760741500.jar
> > > >
> > > >
> > > > Will keep digging.
> > > >
> > > >
> > > > @Jacques: I will re-submit the command line configurable patch soon.
> > > > Will also dig into the surefire forks you mentioned.
> > > >
> > > >
> > > >
> > > > Yash
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Jun 27, 2014 at 12:39 AM, Yash Sharma <[email protected]>
> > wrote:
> > > >
> > > > > It was not failing because of the git plugin; rather, it is how
> > > > > Travis takes the clone.
> > > > > Travis uses git clone --depth=50 for fast building.
> > > > >
> > > > > I was able to get a clean build with the help of Travis team member
> > > > > Hiro. The current build is still running here on my box. I will share
> > > > > the status on completion.
> > > > >
> > > > > Have added JIRA for the same, will add a patch soon:
> > > > > https://issues.apache.org/jira/browse/DRILL-1083
> > > > >
> > > > > Peace,
> > > > > Yash
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jun 25, 2014 at 10:00 AM, Jacques Nadeau <
> [email protected]
> > >
> > > > > wrote:
> > > > >
> > > > > >> Yeah, we have to disable tests on the Apache hardware as our tests
> > > > > >> are too hungry.  I'm trying to get some alternatives to work. If
> > > > > >> someone wanted to try to figure out if we could run on Travis with
> > > > > >> fork 1, that would be great. Right now it's failing because of the
> > > > > >> git plugin.  You can try Travis on your local fork to try to find a
> > > > > >> config that works.
> > > > >> On Jun 24, 2014 8:04 PM, "Yash Sharma" <[email protected]> wrote:
> > > > >>
> > > > > >> > The final build #57, which skipped tests, was successful.
> > > > > >> >
> > > > > >> > The majority of the tests in builds #55 and #56 failed due to a
> > > > > >> > TimedOut exception. Other exceptions were IllegalState (child-level
> > > > > >> > allocators not closed) and one instance of InterruptedException,
> > > > > >> > which probably occurred only because of the test case termination.
> > > > >> >
> > > > >> >
> > > > >> > On Wed, Jun 25, 2014 at 2:59 AM, Timothy Chen <
> [email protected]>
> > > > >> wrote:
> > > > >> >
> > > > >> > > Looks like lots of tests timed out and errored?
> > > > >> > >
> > > > >> > > Tim
> > > > >> > >
> > > > >> > > On Tue, Jun 24, 2014 at 11:53 AM, Yash Sharma <
> > [email protected]>
> > > > >> wrote:
> > > > >> > > > *fingers-crossed* :)
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Wed, Jun 25, 2014 at 12:19 AM, Jacques Nadeau <
> > > > >> [email protected]>
> > > > >> > > wrote:
> > > > >> > > >
> > > > > >> > > >> I kicked off another build with clean install.  Good catch.
> > > > > >> > > >> Hopefully that will put things back on track.
> > > > >> > > >>
> > > > >> > > >>
> > > > >> > > >> On Tue, Jun 24, 2014 at 11:46 AM, Yash Sharma <
> > > [email protected]
> > > > >
> > > > >> > > wrote:
> > > > >> > > >>
> > > > > >> > > >> > I am not able to reproduce the exact error currently, but I
> > > > > >> > > >> > see that it was related to the DRILL-1024 commit where the
> > > > > >> > > >> > hive-storage code was moved out of java-exec. The *drillOI*
> > > > > >> > > >> > definition has moved from config.fmpp (java-exec) to
> > > > > >> > > >> > config.fmpp (hive-exec).
> > > > > >> > > >> >
> > > > > >> > > >> > The Jenkins build was still failing in java-exec, which means
> > > > > >> > > >> > that the old ObjectInspectorHelper class was still present and
> > > > > >> > > >> > was probably looking for the tdd definition in config.fmpp
> > > > > >> > > >> > (java-exec).
> > > > > >> > > >> >
> > > > > >> > > >> > Jenkins used 'mvn install' rather than 'mvn clean install', so
> > > > > >> > > >> > maybe it was still referring to the old ObjectInspectorHelper
> > > > > >> > > >> > class.
> > > > > >> > > >> >
> > > > > >> > > >> > Still not sure. Will try reproducing the exact error.
> > > > >> > > >> >
> > > > >> > > >> > Yash
> > > > >> > > >> >
> > > > >> > > >> >
> > > > >> > > >> >
> > > > >> > > >> >
> > > > >> > > >> >
> > > > >> > > >> > On Tue, Jun 24, 2014 at 10:46 PM, Jacques Nadeau <
> > > > >> > [email protected]>
> > > > >> > > >> > wrote:
> > > > >> > > >> >
> > > > >> > > >> > > Hey guys,
> > > > >> > > >> > >
> > > > > >> > > >> > > I just saw that the build on Jenkins is failing.  Any
> > > > > >> > > >> > > committer interested in trying to troubleshoot?
> > > > >> > > >> > >
> > > > >> > > >> > > https://builds.apache.org/job/drill-scm/54
> > > > >> > > >> > >
> > > > >> > > >> >
> > > > >> > > >>
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>
