Since this thread has got complex, I'm posting this update here at the top level.

Thanks to folks, especially Andy and Rob, for the suggestions and for investigating.

After a lot more testing at our end I believe we now have some workarounds.

First, at least on Java 17, the process growth does seem to level out. Despite what I just said to Rob, having just checked our soak tests, a Jena 4.7.0/Java 17 test with a 500MB max heap has lasted for 7 days. Process size oscillates between 1.5GB and 2GB but hasn't gone above that in a week. The oscillation is almost entirely the cycling of the direct memory buffers used by Jetty; empirically those cycle up to something comparable to the set max heap size, at least for us.
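
That tallies with the JVM's defaults: unless -XX:MaxDirectMemorySize is set explicitly, the direct-buffer limit defaults to roughly the max heap size. For reference, a sketch of the kind of launch settings involved, assuming the standalone jar (the values and config file name are illustrative, not a recommendation - in particular we haven't soak-tested an explicit direct-memory cap):

    java -Xmx500m -XX:MaxDirectMemorySize=512m \
         -jar fuseki-server.jar --config=config.ttl

In principle -XX:MaxDirectMemorySize would let the Jetty buffer churn be capped independently of the heap.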

While this week-long test was on 4.7.0, based on earlier tests I suspect 4.8.0 (and now 4.9.0) would also level out, at least on a timescale of days.

The key has been setting the max heap low. At 2GB, and even at 1GB (the default on a 4GB machine), we saw higher peak levels of direct buffers, and the overall process size grew to around 3GB, at which point the container is killed on the small machines. Java 17 also seems to be better behaved than Java 11, so switching to that probably helped as well.

Given that actual heap usage is low (50MB heap, 60MB non-heap), needing 2GB to run in feels high, but it is workable. So my previously suggested rule of thumb for this low-memory regime - allow about 4x the max heap size for the whole process - seems to work.
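
For the record, the rough budget behind that rule of thumb, using our numbers (the direct-buffer figure assumes the default cap of roughly the max heap size; the remainder is the unaccounted native growth this thread is about):

    max heap (committed, worst case)       ~0.5GB
    direct buffers (Jetty NIO churn)       ~0.5GB
    metaspace and other non-heap           ~0.1GB
    unaccounted native / allocator slack   ~0.4-0.9GB
    -------------------------------------------------
    observed process size                   1.5-2GB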

Second, we're now pretty confident the issue lies in Jetty 10+.

We've built a fuseki-server 4.9.0 with Jetty replaced by version 9.4.51.v20230217. This required some minor source changes to compile and pass tests. On a local bare-metal test, where we previously saw process growth up to 1.5-2GB, this build has run stably using less than 500MB for 4 hours.
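
In case anyone wants to try the same thing, the core of it is pinning the Jetty version in the Jena build, plus the minor source fixes mentioned above. A sketch only - the property name below is from memory and may not match the current Jena parent pom:

    <properties>
      <!-- hypothetical property name; check the Jena parent pom -->
      <ver.jetty>9.4.51.v20230217</ver.jetty>
    </properties>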

We'll set a longer-term test running in the target containerized environment to confirm things, but we're quite hopeful this will be stable in the long term.

I realise Jetty 9.4.x is out of community support, but Eclipse says EOL is "unlikely to happen before 2025". So, while this may not be a solution for the Jena project itself, it could give us a workaround at the cost of doing custom builds.

Dave


On 03/07/2023 14:20, Dave Reynolds wrote:
We have a very strange problem with recent Fuseki versions when running (in Docker containers) on small machines. We suspect a Jetty issue, but it's not clear.

Wondering if anyone has seen anything like this.

This is a production service, but with tiny data (~250k triples, ~60MB as N-Quads). It runs on 4GB machines with a Java heap allocation of 500MB [1].

We used to run 3.16 on JDK 8 (AWS Corretto, for the long-term support) with no problems.

After switching to Fuseki 4.8.0 on JDK 11, the process grows in the space of a day or so to ~3GB of memory, at which point the 4GB machine becomes unviable and things get OOM-killed.

The strange thing is that this growth happens while the system is answering no SPARQL queries at all - just regular health-ping checks and (Prometheus) metrics scrapes from the monitoring systems.

Furthermore, the space being consumed is not visible to any of the JVM metrics:
- Heap and non-heap are stable at around 100MB total (mostly non-heap metaspace).
- Mapped buffers stay at 50MB and remain long-term stable.
- Direct memory buffers are allocated up to around 500MB and then reclaimed. Since there are no SPARQL queries at all, we assume this is Jetty NIO buffers being churned as a result of the metrics scrapes. However, this direct-buffer behaviour seems stable: it cycles between 0 and 500MB on approximately a 10-minute cycle, but is stable over a period of days and shows no leaks. (See the snippet below for a direct way to read these pools.)
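
A minimal, standalone way to read those same pools, independent of our monitoring stack - this just dumps the standard JVM buffer-pool MXBeans (the "direct" and "mapped" pools referred to above):

    import java.lang.management.BufferPoolMXBean;
    import java.lang.management.ManagementFactory;

    public class BufferPools {
        public static void main(String[] args) {
            // One bean per pool: typically "direct" and "mapped"
            for (BufferPoolMXBean pool :
                    ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                System.out.printf("%s: buffers=%d used=%dMB capacity=%dMB%n",
                        pool.getName(),
                        pool.getCount(),
                        pool.getMemoryUsed() / (1024 * 1024),
                        pool.getTotalCapacity() / (1024 * 1024));
            }
        }
    }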

Yet the Java process grows from an initial 100MB to at least 3GB. This can occur in the space of a couple of hours, or can take up to a day or two, with no predictability in how fast.

Presumably there is some low-level native space allocated by Jetty (JNI?) which is invisible to all the JVM metrics and is not being reliably reclaimed.
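
One way to at least narrow this down (a suggestion only; we haven't run it for long periods) is the JVM's native memory tracking, which accounts for native allocations the JVM itself makes:

    # start the JVM with tracking enabled (has some overhead)
    -XX:NativeMemoryTracking=summary

    # then query the running process
    jcmd <pid> VM.native_memory summary

If the growth does not show up there either, that points at allocations outside the JVM's own accounting (e.g. allocator behaviour or native libraries), which would fit the "invisible to JVM metrics" symptom.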

Trying 4.6.0, which we've had fewer problems with elsewhere, the process seems to grow to around 1GB (plus up to 0.5GB for the cycling direct memory buffers) and then stays stable (at least on a three-day soak test). We could live with allocating 1.5GB to a system that should only need a few hundred MB, but we're concerned that it may not be stable in the really long term and, in any case, would rather be able to update to more recent Fuseki versions.

Trying 4.8.0 on Java 17, it grows rapidly to around 1GB again, but then keeps ticking up slowly at random intervals. We project that it would take a few weeks to grow to the scale it did under Java 11, but it would still eventually kill the machine.

Anyone seen anything remotely like this?

Dave

[1] A 500MB heap may be overkill, but there can be some complex queries, and that should still leave plenty of space for OS buffers etc. in the remaining memory on a 4GB machine.


