If you give me 5 sample queries, a simple harness should be easy to create.

On Aug 2, 2015 9:06 AM, "Abdel Hakim Deneche" <[email protected]> wrote:
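A minimal sketch of such a harness, in case it is useful. It assumes the Drill JDBC driver is on the classpath and a drillbit is reachable locally; the connection URL and the single query below are placeholders for the actual node and the 5 sample queries, not values from this thread:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class DrillConcurrencyHarness {
      // Placeholder connection URL; adjust host/port for the node under test.
      private static final String URL = "jdbc:drill:drillbit=localhost:31010";
      // Placeholder: substitute the 5 sample window-function queries here.
      private static final List<String> QUERIES = Arrays.asList(
          "SELECT employee_id, SUM(salary) OVER (PARTITION BY department_id) FROM cp.`employee.json`");

      public static void main(String[] args) throws Exception {
        final int iterations = 500;  // matches the 500-iteration run described below
        final int concurrency = 10;  // "10 queries at a time"
        for (int i = 0; i < iterations; i++) {
          List<String> shuffled = new ArrayList<>(QUERIES);
          Collections.shuffle(shuffled);  // run all queries in a random order
          ExecutorService pool = Executors.newFixedThreadPool(concurrency);
          for (final String sql : shuffled) {
            pool.submit(() -> {
              try (Connection c = DriverManager.getConnection(URL);
                   Statement s = c.createStatement();
                   ResultSet rs = s.executeQuery(sql)) {
                while (rs.next()) { /* drain all rows */ }
              } catch (Exception e) {
                e.printStackTrace();
              }
            });
          }
          pool.shutdown();
          pool.awaitTermination(1, TimeUnit.HOURS);
          // Pause between iterations so finishing fragments can release memory,
          // mirroring the "wait a couple of minutes" step described below.
          TimeUnit.MINUTES.sleep(2);
        }
      }
    }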
> @yulia,
> Just checked the available disk space and there is more than enough for
> the dump :(
>
> @Jacques,
> You'll need the test framework to be able to reproduce this. I'm already
> using one single node and it's just a matter of running a bunch of window
> function queries concurrently and repeating this a lot.
>
> Looking at the memory growth, it seems to become stable after some time,
> which may suggest it's not a memory leak. I still have 3 questions I will
> try to find answers for:
> - why doesn't Netty release memory chunks when no queries are running (up
> to 5GB if you run enough iterations)?
> - are all those allocated chunks being used when you run one more
> iteration, or does Netty only use some of them and leave the rest
> allocated for no reason? (I should be able to get this from the memory
> logs I already have)
> - is this an "expected" behavior of Netty's allocator that we should just
> learn to live with?
>
> On Fri, Jul 31, 2015 at 10:40 PM, yuliya Feldman <[email protected]> wrote:
>
> > How much memory is your JVM taking?
> > Do you even have enough disk space to dump it?
> >
> > From: Abdel Hakim Deneche <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Sent: Friday, July 31, 2015 9:19 PM
> > Subject: Re: Suspicious direct memory consumption when running queries
> > concurrently
> >
> > I tried getting a jmap dump multiple times without success; each time it
> > crashes the JVM with the following exception:
> >
> > Dumping heap to /home/mapr/private-sql-hadoop-test/framework/myfile.hprof
> > ...
> > Exception in thread "main" java.io.IOException: Premature EOF
> >     at sun.tools.attach.HotSpotVirtualMachine.readInt(HotSpotVirtualMachine.java:248)
> >     at sun.tools.attach.LinuxVirtualMachine.execute(LinuxVirtualMachine.java:199)
> >     at sun.tools.attach.HotSpotVirtualMachine.executeCommand(HotSpotVirtualMachine.java:217)
> >     at sun.tools.attach.HotSpotVirtualMachine.dumpHeap(HotSpotVirtualMachine.java:180)
> >     at sun.tools.jmap.JMap.dump(JMap.java:242)
> >     at sun.tools.jmap.JMap.main(JMap.java:140)
> >
> > On Mon, Jul 27, 2015 at 3:45 PM, Jacques Nadeau <[email protected]> wrote:
> >
> > > An allocate -> release cycle all on the same thread goes into a
> > > per-thread cache.
> > >
> > > A bunch of Netty arena settings are configurable. The big issue, I
> > > believe, is that the limits are soft limits implemented by the
> > > allocation-time release mechanism. As such, if you allocate a bunch of
> > > memory, then release it all, that won't necessarily trigger any actual
> > > chunk releases.
> > >
> > > On Mon, Jul 27, 2015 at 12:47 PM, Abdel Hakim Deneche <[email protected]> wrote:
> > >
> > > > @Jacques, my understanding is that chunks are not owned by a specific
> > > > thread, but they are part of a specific memory arena which is in turn
> > > > only accessed by specific threads. Do you want me to find which
> > > > threads are associated with the same arena where we have hanging
> > > > chunks?
> > > >
> > > > On Mon, Jul 27, 2015 at 11:04 AM, Jacques Nadeau <[email protected]> wrote:
> > > >
> > > > > It sounds like your statement is that we're caching too many unused
> > > > > chunks. Hanifi and I previously discussed implementing a separate
> > > > > flushing mechanism to release unallocated chunks that are hanging
> > > > > around. The main question is: why are so many chunks hanging around,
> > > > > and what threads are they associated with? A jmap dump and analysis
> > > > > should allow you to determine which thread owns the excess chunks.
> > > > > My guess would be the RPC pool, since those are long-lasting (as
> > > > > opposed to the WorkManager pool, which is contracting).
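Since the attach-based jmap dump quoted above was crashing the JVM, one possible workaround (a sketch, not something tried in this thread) is to trigger the dump from inside the drillbit process through the standard com.sun.management.HotSpotDiagnosticMXBean, which bypasses the sun.tools.attach path entirely. Note that an .hprof dump covers only the Java heap, so it would show the PoolChunk/PoolArena objects that track the chunks rather than the off-heap memory itself; that is still enough to count hanging chunks per arena.

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.io.IOException;
    import java.lang.management.ManagementFactory;

    public class InProcessHeapDump {
      /** Writes an .hprof file without going through the jmap attach mechanism. */
      public static void dump(String outputFile, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
            ManagementFactory.getPlatformMBeanServer(),
            "com.sun.management:type=HotSpotDiagnostic",
            HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(outputFile, liveObjectsOnly);  // e.g. dump("/tmp/drillbit.hprof", true)
      }
    }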
> > > > > On Mon, Jul 27, 2015 at 9:53 AM, Abdel Hakim Deneche <[email protected]> wrote:
> > > > >
> > > > > > When running a set of mostly window function queries concurrently
> > > > > > on a single drillbit with an 8GB max direct memory, we are seeing
> > > > > > a continuous increase of direct memory allocation.
> > > > > >
> > > > > > We repeat the following steps multiple times:
> > > > > > - we launch an "iteration" of tests that runs all queries in a
> > > > > > random order, 10 queries at a time
> > > > > > - after the iteration finishes, we wait for a couple of minutes to
> > > > > > give Drill time to release the memory held by the finishing
> > > > > > fragments
> > > > > >
> > > > > > Using Drill's memory logger ("drill.allocator") we were able to
> > > > > > get snapshots of how memory is used internally by Netty. We only
> > > > > > focused on the number of allocated chunks; if we take this number
> > > > > > and multiply it by 16MB (Netty's chunk size) we get approximately
> > > > > > the same value reported by Drill's direct memory allocation.
> > > > > > Here is a graph that shows the evolution of the number of
> > > > > > allocated chunks over a 500-iteration run (I'm working on
> > > > > > improving the plots):
> > > > > >
> > > > > > http://bit.ly/1JL6Kp3
> > > > > >
> > > > > > In this specific case, after the first iteration Drill was
> > > > > > allocating ~2GB of direct memory, and this number kept rising
> > > > > > after each iteration to ~6GB. We suspect this caused one of our
> > > > > > previous runs to crash the JVM.
> > > > > >
> > > > > > If we only focus on the log lines between iterations (when Drill's
> > > > > > memory usage is below 10MB), then all allocated chunks are at most
> > > > > > at 2% usage. At some point we end up with 288 nearly empty chunks,
> > > > > > yet the next iteration causes more chunks to be allocated!
> > > > > >
> > > > > > Is this expected?
> > > > > >
> > > > > > PS: I am running more tests and will update this thread with more
> > > > > > information.
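For reference on the numbers above: the 16MB chunk size is Netty's default of pageSize << maxOrder (8192 << 11 = 16,777,216 bytes), so the 288 nearly empty chunks amount to roughly 4.5GB of held direct memory. The arena settings Jacques mentions are configurable, either through the io.netty.allocator.* system properties or directly on a PooledByteBufAllocator. A small sketch of the arithmetic; the arena counts passed to the constructor are illustrative only, not a recommendation:

    import io.netty.buffer.PooledByteBufAllocator;

    public class AllocatorDefaults {
      public static void main(String[] args) {
        int pageSize = 8192;                  // io.netty.allocator.pageSize default
        int maxOrder = 11;                    // io.netty.allocator.maxOrder default
        int chunkSize = pageSize << maxOrder; // 16 MiB: the chunk size quoted above
        long heldBytes = 288L * chunkSize;    // 288 nearly empty chunks ~= 4.5 GiB
        System.out.printf("chunk = %d bytes, 288 chunks = %d bytes%n", chunkSize, heldBytes);

        // Arena count and page/order are settable per allocator instance
        // (equivalently via -Dio.netty.allocator.numDirectArenas, etc.).
        PooledByteBufAllocator alloc = new PooledByteBufAllocator(
            true /* preferDirect */, 0 /* nHeapArena */, 2 /* nDirectArena */,
            pageSize, maxOrder);
        System.out.println(alloc);
      }
    }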
