Jeszy - yes, unfortunately I cannot share the query details at this time. No hs_err file was generated.
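For next time, I want the JVM to have a known place to write its fatal-error log. A rough sketch, assuming impalad's embedded JVM picks up JAVA_TOOL_OPTIONS (the log path below is made up for illustration):

  # Direct hs_err files to a findable location; %p expands to the JVM's pid
  export JAVA_TOOL_OPTIONS="-XX:ErrorFile=/var/log/impala/hs_err_pid%p.log"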
Philip - Yeah, that seems to be the way to go.

On Tue, Aug 21, 2018 at 1:51 PM, Philip Zeyliger <[email protected]> wrote:
> Hi Brock,
>
> If you want to make Eclipse MAT more usable, set JAVA_TOOL_OPTIONS="-Xmx2g
> -XX:+HeapDumpOnOutOfMemoryError" and you should see the max heap at 2GB,
> thereby making Eclipse MAT friendlier. Folks have also been using
> http://www.jxray.com/.
>
> The query itself will also be interesting. If there's something like a
> loop in analyzing it, you could imagine that showing up as an OOM. The heap
> dump should tell us.
>
> -- Philip
>
> On Tue, Aug 21, 2018 at 11:32 AM Brock Noland <[email protected]> wrote:
>
>> Hi Jeszy,
>>
>> Thanks, good tip.
>>
>> The MS is quite small. Even mysqldump format is only 12MB. The largest
>> catalog-update I could find is only 1.5MB, which should be easy to
>> process with 32GB of heap. Lastly, it's possible we can reproduce by
>> running the query the impalad was processing during the issue; I'm
>> going to wait until after the users head home to try, but it doesn't
>> appear reproducible via the method you describe. When we restarted, it
>> did not reproduce until users started running queries.
>>
>> I0820 19:45:25.106437 25474 statestore.cc:568] Preparing initial
>> catalog-update topic update for impalad@XXX:22000. Size = 1.45 MB
>>
>> Brock
>>
>> On Tue, Aug 21, 2018 at 1:18 PM, Jeszy <[email protected]> wrote:
>> > Hey,
>> >
>> > If it happens shortly after a restart, there is a fair chance you're
>> > crashing while processing the initial catalog topic update. Statestore
>> > logs will tell you how big that was (it takes more memory to process
>> > it than the actual size of the update).
>> > If this is the case, it should also be reproducible, i.e. the daemon
>> > will keep restarting and running OOM on the initial update until you
>> > clear the metadata cache, either by restarting catalog or via a
>> > (global) invalidate metadata.
>> >
>> > HTH
>> > On Tue, 21 Aug 2018 at 20:13, Brock Noland <[email protected]> wrote:
>> >>
>> >> Hi folks,
>> >>
>> >> I've got an Impala CDH 5.14.2 cluster with a handful of users, 2-3, at
>> >> any one time. All of a sudden the JVM inside the impalad started
>> >> running out of memory.
>> >>
>> >> I got a heap dump, but the heap was 32GB (the host has 240GB), so it's
>> >> very large. Thus I wasn't able to get Memory Analyzer Tool (MAT) to
>> >> open it. I was able to get JHAT to open it by setting JHAT's heap to
>> >> 160GB, but it's so unwieldy that much of the JHAT functionality
>> >> doesn't work.
>> >>
>> >> I am spelunking around, but am really curious if there are places I
>> >> should check.
>> >>
>> >> I am only an occasional reader of the Impala source, so I am just
>> >> pointing out things which felt interesting:
>> >>
>> >> * Impalad was restarted shortly before the JVM OOM
>> >> * Joining Parquet on S3 with Kudu
>> >> * Only 13 instances of org.apache.impala.catalog.HdfsTable
>> >> * 176836 instances of org.apache.impala.analysis.Analyzer - this feels
>> >> odd to me. I remember a bug a while back in Hive where it would clone
>> >> the query tree until it ran OOM.
>> >> * 176796 of those _user fields point at the same user
>> >> * org.apache.impala.thrift.TQueryCtx@0x7f90975297f8 has 11048
>> >> org.apache.impala.analysis.Analyzer@GlobalState objects pointing at
>> >> it.
>> >> * There is only a single instance of
>> >> org.apache.impala.thrift.TQueryCtx alive in the JVM, which appears to
>> >> indicate there is only a single query running.
>> >> I've tracked that query down in CM. The users need to compute stats,
>> >> but I don't feel that is relevant to this JVM OOM condition.
>> >>
>> >> Any pointers on what I might look for?
>> >>
>> >> Cheers,
>> >> Brock
>>
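PS for anyone finding this thread later - roughly the commands involved, as a sketch (the dump path and pid below are made up; adjust heap sizes to your host):

  # Give jhat itself a very large heap so it can load the 32GB dump
  jhat -J-Xmx160g /tmp/java_pid12345.hprof

  # Per Jeszy's suggestion: clear the catalog cache globally
  # instead of restarting catalogd
  impala-shell -q 'INVALIDATE METADATA;'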
