On Wed, Aug 1, 2018 at 6:02 AM, Quanlong Huang <[email protected]> wrote:
> Hi Todd and Philip,
>
> The idea of separate ClassLoader is cool. But I would vote for executing
> java UDFs in a separate JVM. We don't need to create a JVM for each UDF.
> They can share the same JVM with a configurable hard limit. If the JVM for
> UDFs runs out of heap size, we can destroy the JVM and fail the queries.
> (Not sure if this is doable.) The drawback is that some innocent queries
> using java UDFs will fail too, but the cluster will still be healthy.

I think this would be great, but we need to make sure not to regress
performance significantly. For example, maybe we'd need to expose row
batches as a shared memory region and then map them as a DirectByteBuffer
on the Java side. I think this would probably be a lot of work, though, and
maybe Java UDF usage is uncommon enough that it may not be worth all of
the coding?

-Todd

> The patch adding JVM pause logs and exposing JMX metrics is really
> helpful. I'd like to join the code review.
>
> Thanks,
> Quanlong
>
> At 2018-08-01 05:47:25, "Philip Zeyliger" <[email protected]> wrote:
> >Hi Quanlong,
> >
> >Thank you for the incident note! You might be interested in
> >https://gerrit.cloudera.org/#/c/10998/ which is adding some instrumentation
> >to make it easier to notice with monitoring tools that we're running out of
> >memory or having large GC pauses.
> >
> >-- Philip
> >
> >On Tue, Jul 31, 2018 at 8:21 AM Todd Lipcon <[email protected]> wrote:
> >
> >> Hi Quanlong,
> >>
> >> Thanks for the writeup. I wonder if we could address two of the issues
> >> with a single approach of using a separate ClassLoader instance when we
> >> load and instantiate custom UDFs. For one, using a custom ClassLoader
> >> means that we could probably ensure that the user's jar is checked first
> >> even if there is a conflicting class on the Impala classpath.
> >> For two, if we close the custom classloader and ensure that we don't
> >> retain any references to it or to classes created by it (i.e. the UDF
> >> classes), then we should be able to collect leaked memory on the next
> >> GC. In the particular case you pointed out, the leak is in static fields
> >> of the UDF, and those static fields would get destructed when the
> >> classloader is destructed.
> >>
> >> On the downside, making this change could harm performance if any UDFs
> >> are relying on their static initializer blocks to do expensive setup.
> >> Maybe a compromise would be to make this behavior optional, or to cache
> >> the ClassLoader via a WeakReference across instantiations of a UDF so
> >> that, if there isn't memory pressure, the UDF doesn't need to be
> >> re-class-loaded repeatedly.
> >>
> >> As for tracking memory explicitly, the way that HBase does it is via
> >> somewhat careful accounting inline with all of the code that allocates
> >> memory. I don't think that approach is feasible in UDFs because the UDF
> >> code is provided by end users, and they aren't likely to want to
> >> carefully track their heap usage. I think the only way we could
> >> effectively "sandbox" UDFs would be to actually execute them in a
> >> separate JVM with a limited heap size, but that would be somewhat
> >> complex to implement without a big performance drop.
> >>
> >> -Todd
> >>
> >> On Tue, Jul 31, 2018 at 7:53 AM, Quanlong Huang <[email protected]>
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > Just to draw your attention to a patch, I'd like to share an incident
> >> > with you.
> >> >
> >> > Two months ago we encountered an incident in one of our Impala
> >> > clusters in version 2.5-cdh5.7.3. Queries were piled up and most of
> >> > them were in CREATED state.
> >> > It looked like the infamous incident where the catalogd is loading
> >> > metadata of huge tables and finally OOMs, so metadata loading requests
> >> > from the impalads get stuck. However, the catalogd looked good this
> >> > time. The logs showed it's not loading any metadata. Then I looked
> >> > into the statestored and found many timeout logs like "Unable to send
> >> > topic update message to subscriber impalad@xxx:22000, received error:
> >> > RPC timed out". Choosing one of the problematic impalads and digging
> >> > into its log, I found most of the errors are OOMs. The logs look like:
> >> >
> >> > E0524 13:46:46.200798  4405 impala-server.cc:1344] There was an error
> >> > processing the impalad catalog update. Requesting a full topic update
> >> > to recover: OutOfMemoryError: Java heap space
> >> >     at com.cloudera.impala.hive.executor.UdfExecutor.evaluate(UdfExecutor.java:402)
> >> >     at com.cloudera.impala.hive.executor.UdfExecutor.evaluate(UdfExecutor.java:329)
> >> > Caused by: java.lang.reflect.InvocationTargetException
> >> >     at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> >> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> > E0524 13:49:06.433274  4405 impala-server.cc:1344] There was an error
> >> > processing the impalad catalog update. Requesting a full topic update
> >> > to recover: OutOfMemoryError: Java heap space
> >> >     at java.lang.reflect.Method.invoke(Method.java:498)
> >> >     at com.cloudera.impala.hive.executor.UdfExecutor.evaluate(UdfExecutor.java:394)
> >> >     ... 1 more
> >> > Caused by: java.lang.OutOfMemoryError: Java heap space
> >> > E0524 13:56:44.460489  4405 impala-server.cc:1344] There was an error
> >> > processing the impalad catalog update.
> >> > Requesting a full topic update to recover: OutOfMemoryError: Java heap space
> >> > E0524 14:03:45.630472  4405 impala-server.cc:1344] There was an error
> >> > processing the impalad catalog update. Requesting a full topic update
> >> > to recover: OutOfMemoryError: Java heap space
> >> >
> >> > At that time I could only restart the whole cluster since most of the
> >> > impalads were showing OOM errors. It was quite painful, but after that
> >> > the cluster recovered.
> >> >
> >> > After I dug into the query history and replayed the queries in a test
> >> > cluster, I realized it was caused by a UDF we had added one day
> >> > before. Our users want to parse JSON strings, but there are no builtin
> >> > functions for this (IMPALA-376). So we simply added the Hive UDF
> >> > get_json_object by
> >> >
> >> > create function get_json_object location
> >> > 'hdfs://xxx/hive-exec-1.1.0-cdh5.7.3.jar'
> >> > SYMBOL='org.apache.hadoop.hive.ql.udf.UDFJson';
> >> >
> >> > No problem with this query. But there's a bug in hive-1.1.0-cdh5.7.3
> >> > that causes a memory leak if more than one UDFJson object is running
> >> > in a JVM (HIVE-16196). That's why the impalads got OOM: memory in the
> >> > JVM is not tracked. The bug in UDFJson is fixed in later Hive versions
> >> > (https://github.com/cloudera/hive/commit/8397f933ea3ee5e2087b4def6b58ed5d77026108#diff-e0b4cff52301169fede5c9550e968b67).
> >> > However, using new Hive jars (e.g.
> >> > hive-exec-1.1.0-cdh5.16.0-SNAPSHOT-core.jar) in the CREATE FUNCTION
> >> > statement does not work, since the class name is still the same and
> >> > the Hive jars in version cdh5.7.3 are already in the CLASSPATH. Impala
> >> > will still load the old implementation of UDFJson. To work around
> >> > this, I copied the implementation of UDFJson under another class name,
> >> > then used my custom jar file instead.
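The kind of leak described above can be illustrated with a minimal sketch. This is not the actual Hive code, and the class and field names here are made up; it only shows why static state in a UDF escapes any per-instance lifetime:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative HIVE-16196-style leak (names invented, not Hive's code):
// a cache held in a *static* field belongs to the class, not to any UDF
// instance, so it survives every instance and keeps growing for the
// lifetime of the JVM -- invisible to Impala's memory accounting.
class LeakyJsonUdf {
    // Shared by all instances; never released when instances are dropped.
    static final Map<String, Object> PARSE_CACHE = new HashMap<>();

    Object evaluate(String jsonPath) {
        // Each distinct input pins one more entry in the JVM-wide map.
        return PARSE_CACHE.computeIfAbsent(jsonPath, k -> new Object());
    }
}
```

Note that renaming the class (as in the workaround above) only ensures a fixed implementation gets loaded; it does not make static state like this trackable.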
> >> > This is not the final fix, since memory used in java UDFs is not
> >> > tracked. It's still unsafe. Looking into the C++ implementation of
> >> > get_json_object (https://github.com/nazgul33/impala-get-json-object-udf)
> >> > provided by an early comment in IMPALA-376, it still does not track
> >> > the memory. So I wrote another implementation, and it's ready for code
> >> > review: https://gerrit.cloudera.org/c/10950/ (it has passed the
> >> > pre-review test: https://jenkins.impala.io/job/pre-review-test/189/).
> >> > Yes, this is the purpose of this email. Thank you for reading!
> >> >
> >> > Further thoughts:
> >> > * We might need to mention in the docs that adding java UDFs may not
> >> > use the given jar file if the Impala CLASSPATH already contains a jar
> >> > file containing the same class.
> >> > * We should avoid using Hive builtin UDFs and any other Java UDFs,
> >> > since their memory is not tracked.
> >> > * How can we track memory used in the JVM? HBase, a pure java project,
> >> > is able to track its MemStore and BlockCache size. Can we learn from
> >> > it?
> >> >
> >> > Thanks,
> >> > Quanlong
> >> > --
> >> > Quanlong Huang
> >> > Software Developer, Hulu
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >

--
Todd Lipcon
Software Engineer, Cloudera
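Todd's separate-ClassLoader suggestion earlier in the thread could look roughly like the following sketch. This is hypothetical, not Impala's actual code: `UdfLoader` and its API are invented for illustration, and a real version would need parent-last delegation so the user's jar wins over the Impala classpath:

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of per-UDF classloading (not Impala's actual code).
// Loading UDF classes through a dedicated URLClassLoader means that once
// the loader and its classes are unreferenced, their static fields become
// collectible on the next GC. A WeakReference caches the loader between
// instantiations, so the UDF is not re-class-loaded unless the JVM is
// under memory pressure.
class UdfLoader {
    private static WeakReference<URLClassLoader> cached = new WeakReference<>(null);

    static synchronized Class<?> loadUdf(URL[] jars, String className) throws Exception {
        URLClassLoader loader = cached.get();
        if (loader == null) {
            // NOTE: URLClassLoader delegates parent-first by default;
            // preferring the user's jar over the Impala classpath would
            // require overriding loadClass() for parent-last delegation.
            loader = new URLClassLoader(jars, UdfLoader.class.getClassLoader());
            cached = new WeakReference<>(loader);
        }
        return Class.forName(className, true, loader);
    }
}
```

Dropping the cached loader (or letting the GC clear the WeakReference) is what would release any leaked static fields in the UDF classes.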
