On Wed, Aug 1, 2018 at 6:02 AM, Quanlong Huang <[email protected]> wrote:
> Hi Todd and Philip,
>
> The idea of separate ClassLoader is cool. But I would vote for executing
> java UDFs in a separate JVM. We don't need to create a JVM for each UDF.
> They can share the same JVM with a configurable hard limit. If the JVM for
> UDFs runs out of heap size, we can destroy the JVM and fail the queries.
> (Not sure if this is doable.) The drawback is that some innocent queries
> using java UDFs will fail too, but the cluster will still be healthy.

I think this would be great, but we need to make sure not to regress
performance significantly. For example, maybe we'd need to expose row
batches as a shared memory region and then map them as a DirectByteBuffer
on the Java side. I think this would probably be a lot of work, though, and
maybe Java UDF usage is uncommon enough that it may not be worth all of
the coding?

-Todd

> The patch adding JVM pause logs and exposing JMX metrics is really
> helpful. I'd like to join the code review.
>
> Thanks,
> Quanlong
>
> At 2018-08-01 05:47:25, "Philip Zeyliger" <[email protected]> wrote:
> >Hi Quanlong,
> >
> >Thank you for the incident note! You might be interested in
> >https://gerrit.cloudera.org/#/c/10998/ which is adding some instrumentation
> >to make it easier to notice with monitoring tools that we're running out of
> >memory or having large GC pauses.
> >
> >-- Philip
> >
> >On Tue, Jul 31, 2018 at 8:21 AM Todd Lipcon <[email protected]> wrote:
> >
> >> Hi Quanlong,
> >>
> >> Thanks for the writeup. I wonder if we could address two of the issues
> >> with a single approach of using a separate ClassLoader instance when we
> >> load and instantiate custom UDFs. For one, using a custom ClassLoader
> >> means that we could probably ensure that the user's jar is checked first
> >> even if there is a conflicting class on the Impala classpath.
> >> For two, if we close the custom classloader and ensure that we don't
> >> retain any references to it or to classes created by it (i.e. the UDF
> >> classes), then we should be able to collect leaked memory on the next
> >> GC. In the particular case you pointed out, the leak is in static fields
> >> of the UDF, and those static fields would get destructed when the
> >> classloader is destructed.
> >>
> >> On the downside, making this change could harm performance if any UDFs
> >> are relying on their static initializer blocks to do expensive setup.
> >> Maybe a compromise would be to make this behavior optional, or to cache
> >> the ClassLoader via a WeakReference across instantiations of a UDF so
> >> that, if there isn't memory pressure, the UDF doesn't need to be
> >> re-class-loaded repeatedly.
> >>
> >> As for tracking memory explicitly, the way that HBase does it is via
> >> somewhat careful accounting inline with all of the code that allocates
> >> memory. I don't think that approach is feasible in UDFs because the UDF
> >> code is provided by end users, and they aren't likely to want to
> >> carefully track their heap usage. I think the only way we could
> >> effectively "sandbox" UDFs would be to actually execute them in a
> >> separate JVM with a limited heap size, but that would be somewhat
> >> complex to implement without a big performance drop.
> >>
> >> -Todd
> >>
> >> On Tue, Jul 31, 2018 at 7:53 AM, Quanlong Huang <[email protected]>
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > Just to draw your attention to a patch, I'd like to share an incident
> >> > with you.
> >> >
> >> > Two months ago we encountered an incident in one of our Impala
> >> > clusters in version 2.5-cdh5.7.3. Queries were piled up and most of
> >> > them were in CREATED state.
> >> > It looked like the infamous incident where the catalogd is loading
> >> > metadata of huge tables and finally OOMs, so metadata loading requests
> >> > from the impalads get stuck. However, the catalogd looked good this
> >> > time. The logs showed it's not loading any metadata. Then I looked
> >> > into the statestored and found many timeout logs like "Unable to send
> >> > topic update message to subscriber impalad@xxx:22000, received error:
> >> > RPC timed out". Choosing one of the problematic impalads and digging
> >> > into its log, I found most of the errors are OOMs. The logs look like:
> >> >
> >> > E0524 13:46:46.200798  4405 impala-server.cc:1344] There was an error
> >> > processing the impalad catalog update. Requesting a full topic update
> >> > to recover: OutOfMemoryError: Java heap space
> >> >     at com.cloudera.impala.hive.executor.UdfExecutor.evaluate(UdfExecutor.java:402)
> >> >     at com.cloudera.impala.hive.executor.UdfExecutor.evaluate(UdfExecutor.java:329)
> >> > Caused by: java.lang.reflect.InvocationTargetException
> >> >     at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> >> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> > E0524 13:49:06.433274  4405 impala-server.cc:1344] There was an error
> >> > processing the impalad catalog update. Requesting a full topic update
> >> > to recover: OutOfMemoryError: Java heap space
> >> >     at java.lang.reflect.Method.invoke(Method.java:498)
> >> >     at com.cloudera.impala.hive.executor.UdfExecutor.evaluate(UdfExecutor.java:394)
> >> >     ... 1 more
> >> > Caused by: java.lang.OutOfMemoryError: Java heap space
> >> > E0524 13:56:44.460489  4405 impala-server.cc:1344] There was an error
> >> > processing the impalad catalog update.
> >> > Requesting a full topic update to recover: OutOfMemoryError: Java heap space
> >> > E0524 14:03:45.630472  4405 impala-server.cc:1344] There was an error
> >> > processing the impalad catalog update. Requesting a full topic update
> >> > to recover: OutOfMemoryError: Java heap space
> >> >
> >> > At that time I could only restart the whole cluster since most of the
> >> > impalads were showing OOM errors. It was quite painful, but after that
> >> > the cluster recovered.
> >> >
> >> > After I dug into the query history and replayed the queries in a test
> >> > cluster, I realized it was caused by a UDF we had added one day
> >> > before. Our users want to parse JSON strings, but there are no builtin
> >> > functions for this (IMPALA-376). So we simply added the Hive UDF
> >> > get_json_object by
> >> >
> >> > create function get_json_object location
> >> > 'hdfs://xxx/hive-exec-1.1.0-cdh5.7.3.jar'
> >> > SYMBOL='org.apache.hadoop.hive.ql.udf.UDFJson';
> >> >
> >> > No problem with this query. But there's a bug in hive-1.1.0-cdh5.7.3
> >> > that causes a memory leak if more than one UDFJson object is running
> >> > in a JVM (HIVE-16196). That's why the impalads got OOM: memory in the
> >> > JVM is not tracked. The bug in UDFJson is fixed in later Hive versions
> >> > (https://github.com/cloudera/hive/commit/8397f933ea3ee5e2087b4def6b58ed5d77026108#diff-e0b4cff52301169fede5c9550e968b67).
> >> > However, using new Hive jars (e.g.
> >> > hive-exec-1.1.0-cdh5.16.0-SNAPSHOT-core.jar) in the CREATE FUNCTION
> >> > statement does not work, since the class name is still the same and
> >> > the Hive jars in version cdh5.7.3 are already in the CLASSPATH. Impala
> >> > will still load the old implementation of UDFJson. To work around
> >> > this, I copied the implementation of UDFJson under another class name,
> >> > then used my custom jar file instead.
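The kind of leak described above can be illustrated with a minimal sketch. This is not the actual Hive code, and the class and field names here are made up; it only shows why static state in a UDF escapes any per-instance lifetime:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative HIVE-16196-style leak (names invented, not Hive's code):
// a cache held in a *static* field belongs to the class, not to any UDF
// instance, so it survives every instance and keeps growing for the
// lifetime of the JVM -- invisible to Impala's memory accounting.
class LeakyJsonUdf {
    // Shared by all instances; never released when instances are dropped.
    static final Map<String, Object> PARSE_CACHE = new HashMap<>();

    Object evaluate(String jsonPath) {
        // Each distinct input pins one more entry in the JVM-wide map.
        return PARSE_CACHE.computeIfAbsent(jsonPath, k -> new Object());
    }
}
```

Note that renaming the class (as in the workaround above) only ensures a fixed implementation gets loaded; it does not make static state like this trackable.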
> >> > This is not the final fix, since memory used in java UDFs is not
> >> > tracked. It's still unsafe. Looking into the C++ implementation of
> >> > get_json_object (https://github.com/nazgul33/impala-get-json-object-udf)
> >> > provided by an early comment in IMPALA-376, it still does not track
> >> > the memory. So I wrote another implementation, and it's ready for code
> >> > review: https://gerrit.cloudera.org/c/10950/ (it has passed the
> >> > pre-review test: https://jenkins.impala.io/job/pre-review-test/189/).
> >> > Yes, this is the purpose of this email. Thank you for reading!
> >> >
> >> > Further thoughts:
> >> > * We might need to mention in the docs that adding java UDFs may not
> >> > use the given jar file if the Impala CLASSPATH already contains a jar
> >> > file containing the same class.
> >> > * We should avoid using Hive builtin UDFs and any other Java UDFs,
> >> > since their memory is not tracked.
> >> > * How can we track memory used in the JVM? HBase, a pure java project,
> >> > is able to track its MemStore and BlockCache size. Can we learn from
> >> > it?
> >> >
> >> > Thanks,
> >> > Quanlong
> >> > --
> >> > Quanlong Huang
> >> > Software Developer, Hulu
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >

--
Todd Lipcon
Software Engineer, Cloudera
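Todd's separate-ClassLoader suggestion earlier in the thread could look roughly like the following sketch. This is hypothetical, not Impala's actual code: `UdfLoader` and its API are invented for illustration, and a real version would need parent-last delegation so the user's jar wins over the Impala classpath:

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of per-UDF classloading (not Impala's actual code).
// Loading UDF classes through a dedicated URLClassLoader means that once
// the loader and its classes are unreferenced, their static fields become
// collectible on the next GC. A WeakReference caches the loader between
// instantiations, so the UDF is not re-class-loaded unless the JVM is
// under memory pressure.
class UdfLoader {
    private static WeakReference<URLClassLoader> cached = new WeakReference<>(null);

    static synchronized Class<?> loadUdf(URL[] jars, String className) throws Exception {
        URLClassLoader loader = cached.get();
        if (loader == null) {
            // NOTE: URLClassLoader delegates parent-first by default;
            // preferring the user's jar over the Impala classpath would
            // require overriding loadClass() for parent-last delegation.
            loader = new URLClassLoader(jars, UdfLoader.class.getClassLoader());
            cached = new WeakReference<>(loader);
        }
        return Class.forName(className, true, loader);
    }
}
```

Dropping the cached loader (or letting the GC clear the WeakReference) is what would release any leaked static fields in the UDF classes.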
