Jeszy - yes, unfortunately I cannot share the query details at this time. No hs_err file was generated.
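For next time, I want the JVM to have a known place to write its fatal-error log. A rough sketch, assuming impalad's embedded JVM picks up JAVA_TOOL_OPTIONS (the log path below is made up for illustration):

  # Direct hs_err files to a findable location; %p expands to the JVM's pid
  export JAVA_TOOL_OPTIONS="-XX:ErrorFile=/var/log/impala/hs_err_pid%p.log"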
Philip - Yeah, that seems to be the way to go.

On Tue, Aug 21, 2018 at 1:51 PM, Philip Zeyliger <[email protected]> wrote:
> Hi Brock,
>
> If you want to make Eclipse MAT more usable, set JAVA_TOOL_OPTIONS="-Xmx2g
> -XX:+HeapDumpOnOutOfMemoryError" and you should see the max heap at 2GB,
> thereby making Eclipse MAT friendlier. Folks have also been using
> http://www.jxray.com/.
>
> The query itself will also be interesting. If there's something like a
> loop in analyzing it, you could imagine that showing up as an OOM. The heap
> dump should tell us.
>
> -- Philip
>
> On Tue, Aug 21, 2018 at 11:32 AM Brock Noland <[email protected]> wrote:
>
>> Hi Jeszy,
>>
>> Thanks, good tip.
>>
>> The MS is quite small. Even mysqldump format is only 12MB. The largest
>> catalog-update I could find is only 1.5MB, which should be easy to
>> process with 32GB of heap. Lastly, it's possible we can reproduce by
>> running the query the impalad was processing during the issue; I'm
>> going to wait until after the users head home to try, but it doesn't
>> appear reproducible via the method you describe. When we restarted, it
>> did not reproduce until users started running queries.
>>
>> I0820 19:45:25.106437 25474 statestore.cc:568] Preparing initial
>> catalog-update topic update for impalad@XXX:22000. Size = 1.45 MB
>>
>> Brock
>>
>> On Tue, Aug 21, 2018 at 1:18 PM, Jeszy <[email protected]> wrote:
>> > Hey,
>> >
>> > If it happens shortly after a restart, there is a fair chance you're
>> > crashing while processing the initial catalog topic update. Statestore
>> > logs will tell you how big that was (it takes more memory to process
>> > it than the actual size of the update).
>> > If this is the case, it should also be reproducible, i.e. the daemon
>> > will keep restarting and running OOM on the initial update until you
>> > clear the metadata cache, either by restarting catalog or via a
>> > (global) invalidate metadata.
>> >
>> > HTH
>> > On Tue, 21 Aug 2018 at 20:13, Brock Noland <[email protected]> wrote:
>> >>
>> >> Hi folks,
>> >>
>> >> I've got an Impala CDH 5.14.2 cluster with a handful of users, 2-3, at
>> >> any one time. All of a sudden the JVM inside the impalad started
>> >> running out of memory.
>> >>
>> >> I got a heap dump, but the heap was 32GB (the host has 240GB), so it's
>> >> very large. Thus I wasn't able to get Memory Analyzer Tool (MAT) to
>> >> open it. I was able to get JHAT to open it by setting JHAT's heap to
>> >> 160GB, but it's so unwieldy that much of the JHAT functionality
>> >> doesn't work.
>> >>
>> >> I am spelunking around, but am really curious if there are places I
>> >> should check.
>> >>
>> >> I am only an occasional reader of the Impala source, so I am just
>> >> pointing out things which felt interesting:
>> >>
>> >> * Impalad was restarted shortly before the JVM OOM
>> >> * Joining Parquet on S3 with Kudu
>> >> * Only 13 instances of org.apache.impala.catalog.HdfsTable
>> >> * 176836 instances of org.apache.impala.analysis.Analyzer - this feels
>> >> odd to me. I remember a bug a while back in Hive where it would clone
>> >> the query tree until it ran OOM.
>> >> * 176796 of those _user fields point at the same user
>> >> * org.apache.impala.thrift.TQueryCtx@0x7f90975297f8 has 11048
>> >> org.apache.impala.analysis.Analyzer@GlobalState objects pointing at
>> >> it.
>> >> * There is only a single instance of
>> >> org.apache.impala.thrift.TQueryCtx alive in the JVM, which appears to
>> >> indicate there is only a single query running.
>> >> I've tracked that query down in CM. The users need to compute stats,
>> >> but I don't feel that is relevant to this JVM OOM condition.
>> >>
>> >> Any pointers on what I might look for?
>> >>
>> >> Cheers,
>> >> Brock
>>
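PS for anyone finding this thread later - roughly the commands involved, as a sketch (the dump path and pid below are made up; adjust heap sizes to your host):

  # Give jhat itself a very large heap so it can load the 32GB dump
  jhat -J-Xmx160g /tmp/java_pid12345.hprof

  # Per Jeszy's suggestion: clear the catalog cache globally
  # instead of restarting catalogd
  impala-shell -q 'INVALIDATE METADATA;'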
