Hm, that's interesting, because:
- I haven't yet seen query planning itself cause an OOM
- if it were catalog data related to the tables involved in the query,
the initial topic size quoted below would have been bigger

Can you share some diagnostic data, like the query text, the definitions
and stats for the tables involved, the hs_err_pid file written on crash, etc.?
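
Roughly something like this should pull the table-side information
(the host and table names below are just placeholders):

  impala-shell -i <impalad-host> -q "SHOW CREATE TABLE db.tbl"
  impala-shell -i <impalad-host> -q "SHOW TABLE STATS db.tbl"
  impala-shell -i <impalad-host> -q "SHOW COLUMN STATS db.tbl"

The hs_err_pid<pid>.log is normally written to the impalad working
directory (or wherever -XX:ErrorFile points).
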
On Tue, 21 Aug 2018 at 20:32, Brock Noland <br...@phdata.io> wrote:
>
> Hi Jeezy,
>
> Thanks, good tip.
>
> The metastore (MS) is quite small; even in mysqldump format it is only
> 12MB. The largest catalog-update I could find is only 1.5MB, which
> should be easy to process with 32GB of heap. Lastly, it's possible we
> can reproduce it by running the query the impalad was processing
> during the issue; I'm going to wait until the users head home to try
> that. It doesn't appear reproducible by the method you describe,
> though: when we restarted, the crash did not recur until users started
> running queries.
>
> I0820 19:45:25.106437 25474 statestore.cc:568] Preparing initial
> catalog-update topic update for impalad@XXX:22000. Size = 1.45 MB
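>
> For reference, I pulled that out of the statestore log with something
> like this (the log path is from memory and may differ on your install):
>
>   grep "Preparing initial catalog-update topic" \
>     /var/log/statestore/statestored.INFO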
>
> Brock
>
> On Tue, Aug 21, 2018 at 1:18 PM, Jeszy <jes...@gmail.com> wrote:
> > Hey,
> >
> > If it happens shortly after a restart, there is a fair chance you're
> > crashing while processing the initial catalog topic update. Statestore
> > logs will tell you how big that was (it takes more memory to process
> > it than the actual size of the update).
> > If this is the case, it should also be reproducible, i.e. the daemon
> > will keep restarting and running out of memory on the initial update
> > until you clear the metadata cache, either by restarting the catalog
> > service or via a (global) invalidate metadata.
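> >
> > Roughly, something like this (the impalad host below is just a
> > placeholder):
> >
> >   impala-shell -i <any-impalad> -q "INVALIDATE METADATA"
> >   # or restart the catalog service (e.g. via Cloudera Manager)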
> >
> > HTH
> > On Tue, 21 Aug 2018 at 20:13, Brock Noland <br...@phdata.io> wrote:
> >>
> >> Hi folks,
> >>
> >> I've got an Impala CDH 5.14.2 cluster with a handful of users, 2-3, at
> >> any one time. All of a sudden the JVM inside the Impalad started
> >> running out of memory.
> >>
> >> I got a heap dump, but the heap was 32GB (the host has 240GB), so it's
> >> very large. Thus I wasn't able to get the Memory Analyzer Tool (MAT) to
> >> open it. I was able to get JHAT to open it by setting JHAT's heap to
> >> 160GB, but it's pretty unwieldy, so much of the JHAT functionality
> >> doesn't work.
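> >>
> >> For the record, roughly what I ran (the dump file path below is made
> >> up):
> >>
> >>   jhat -J-Xmx160g /path/to/impalad_heap.hprof
> >>   # instance counts can then be pulled from jhat's OQL page with
> >>   # queries like:
> >>   #   select count(heap.objects('org.apache.impala.analysis.Analyzer'))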
> >>
> >> I am spelunking around, but I'm really curious whether there are some
> >> places I should check...
> >>
> >> I am only an occasional reader of the Impala source, so I am just
> >> pointing out things that felt interesting:
> >>
> >> * Impalad was restarted shortly before the JVM OOM
> >> * Joining Parquet on S3 with Kudu
> >> * Only 13 instances of org.apache.impala.catalog.HdfsTable
> >> * 176836 instances of org.apache.impala.analysis.Analyzer - this feels
> >> odd to me. I remember a bug a while back in Hive where it would clone
> >> the query tree until it ran out of memory.
> >> * 176796 of those have _user fields pointing at the same user
> >> * org.apache.impala.thrift.TQueryCtx@0x7f90975297f8 has 11048
> >> org.apache.impala.analysis.Analyzer$GlobalState objects pointing at
> >> it.
> >> * There is only a single instance of
> >> org.apache.impala.thrift.TQueryCtx alive in the JVM, which appears to
> >> indicate there is only a single query running. I've tracked that query
> >> down in CM. The users need to compute stats, but I don't feel that is
> >> relevant to this JVM OOM condition.
> >>
> >> Any pointers on what I might look for?
> >>
> >> Cheers,
> >> Brock
