Dave, can you say a bit more about the profiling methodology? Are you using a tool such as VisualVM to collect the data? Or do you just use the system monitor?

Marco
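For reference, the figures discussed in this thread (heap, non-heap, and the mapped/direct buffer pools) are exposed by the JVM itself over JMX, so they can be read with VisualVM, scraped into Prometheus, or polled with a few lines of plain Java. A minimal, dependency-free sketch, not necessarily the setup used for the graphs below:

    import java.lang.management.BufferPoolMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.util.List;

    // Poll the JVM's own view of heap, non-heap and NIO buffer pools,
    // the same figures a tool like VisualVM reports.
    public class JvmMemoryPoll {
        public static void main(String[] args) throws InterruptedException {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            List<BufferPoolMXBean> pools =
                    ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
            while (true) {
                System.out.printf("heap=%dMB non-heap=%dMB%n",
                        mem.getHeapMemoryUsage().getUsed() >> 20,
                        mem.getNonHeapMemoryUsage().getUsed() >> 20);
                for (BufferPoolMXBean p : pools) {
                    // The "direct" and "mapped" pools cover NIO buffers;
                    // they do NOT cover raw malloc/JNI allocations.
                    System.out.printf("  %s: used=%dMB capacity=%dMB count=%d%n",
                            p.getName(), p.getMemoryUsed() >> 20,
                            p.getTotalCapacity() >> 20, p.getCount());
                }
                Thread.sleep(10_000);
            }
        }
    }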
On Tue, Jul 11, 2023 at 8:57 AM Dave Reynolds <dave.e.reyno...@gmail.com> wrote:

> For interest[*] ...
>
> This is what the core JVM metrics look like when transitioning from a Jetty 10 to a Jetty 9.4 instance. You can see the direct buffers cycling up to 500MB (which happens to be the max heap setting) on Jetty 10, nothing on Jetty 9. The drop in Mapped buffers is just because TDB hadn't been asked any queries yet.
>
> https://www.dropbox.com/scl/fi/9afhrztbb36fvzqkuw996/fuseki-jetty10-jetty9-transition.png?rlkey=7fpj4x1pn5mjnf3jjwenmp65m&dl=0
>
> Here are the same metrics around the time of triggering a TDB backup, showing the mapped buffer use for TDB but no significant impact on heap etc.
>
> https://www.dropbox.com/scl/fi/0s40vpizf94c4w3m2awna/fuseki-jetty10-backup.png?rlkey=ai31m6z58w0uex8zix8e9ctna&dl=0
>
> These are all on the same instance as the RES memory trace:
>
> https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8&dl=0
>
> Dave
>
> [*] I've been staring at metric graphs for so many days I may have a distorted notion of what's interesting :)
>
> On 11/07/2023 08:39, Dave Reynolds wrote:
> > After a 10-hour test of 4.9.0 with Jetty 9.4 on java 17 in the production, containerized environment, it is indeed very stable.
> >
> > It runs at less than 6% of memory on a 4GB machine, compared to peaks of ~50% for versions with Jetty 10. RES shows as 240K with 35K shared (presumably mostly libraries).
> >
> > A copy of the trace is:
> > https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8&dl=0
> >
> > The high spikes on the left of the image are the prior run with out-of-the-box 4.7.0 on the same JVM.
> >
> > The small spike at 06:00 is a dump, so TDB was able to touch and scan all the (modest) data with a very minor blip in resident size (as you'd hope). JVM stats show the mapped buffers for TDB jumping up but confirm the heap is stable at < 60M, non-heap at 60M.
> >
> > Dave
> >
> > On 10/07/2023 20:52, Dave Reynolds wrote:
> >> Since this thread has got complex, I'm posting this update here at the top level.
> >>
> >> Thanks to folks, especially Andy and Rob, for suggestions and for investigating.
> >>
> >> After a lot more testing at our end I believe we now have some workarounds.
> >>
> >> First, at least on java 17, the process growth does seem to level out. Despite what I just said to Rob, having just checked our soak tests, a jena 4.7.0/java 17 test with a 500MB max heap has lasted for 7 days. Process size oscillates between 1.5GB and 2GB but hasn't gone above that in a week. The oscillation is almost entirely the cycling of the direct memory buffers used by Jetty. Empirically those cycle up to something comparable to the set max heap size, at least for us.
> >>
> >> While this week-long test was on 4.7.0, based on earlier tests I suspect 4.8.0 (and now 4.9.0) would also level out, at least on a timescale of days.
> >>
> >> The key has been setting the max heap low. At 2GB, and even 1GB (the default on a 4GB machine), we saw higher peak levels of direct buffers, and overall process size grew to around 3GB, at which point the container is killed on the small machines. Java 17 also seems to be better behaved than java 11, so switching to that probably helped too.
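For context, that observation matches the HotSpot default: unless -XX:MaxDirectMemorySize is set explicitly, the ceiling for direct buffer allocation defaults to roughly the max heap size, which would explain the direct buffers cycling up to about -Xmx. A standalone sketch to observe that ceiling on a given JVM (nothing Fuseki-specific; the class name is illustrative):

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    // Allocate direct buffers until the JVM refuses. Run with e.g.
    //   java -Xmx500m DirectCap                                (cap defaults to max heap)
    //   java -Xmx500m -XX:MaxDirectMemorySize=128m DirectCap   (explicit cap)
    // and compare where the OutOfMemoryError lands.
    public class DirectCap {
        public static void main(String[] args) {
            List<ByteBuffer> hold = new ArrayList<>(); // keep buffers reachable
            long allocatedMb = 0;
            try {
                while (true) {
                    hold.add(ByteBuffer.allocateDirect(16 << 20)); // 16MB chunks
                    allocatedMb += 16;
                }
            } catch (OutOfMemoryError e) {
                // HotSpot reports "Direct buffer memory" when the cap is hit
                System.out.println("Direct allocation stopped after ~"
                        + allocatedMb + "MB: " + e.getMessage());
            }
        }
    }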
> >> Given the actual heap is low (50MB heap, 60MB non-heap), needing 2GB to run in feels high but is workable. So my previously suggested rule of thumb, that in this low-memory regime you should allow 4x the max heap size, seems to work.
> >>
> >> Second, we're now pretty confident the issue is jetty 10+.
> >>
> >> We've built a fuseki-server 4.9.0 with Jetty replaced by version 9.4.51.v20230217. This required some minor source changes to compile and pass tests. On a local bare-metal test, where we previously saw process growth up to 1.5-2GB, this build has run stably using less than 500MB for 4 hours.
> >>
> >> We'll set a longer-term test running in the target containerized environment to confirm, but we are quite hopeful this will be stable in the long term.
> >>
> >> I realise Jetty 9.4.x is out of community support, but eclipse say EOL is "unlikely to happen before 2025". So, while this may not be a solution for the Jena project, it could give us a workaround at the cost of doing custom builds.
> >>
> >> Dave
> >>
> >> On 03/07/2023 14:20, Dave Reynolds wrote:
> >>> We have a very strange problem with recent fuseki versions when running (in docker containers) on small machines. We suspect a jetty issue, but it's not clear.
> >>>
> >>> Wondering if anyone has seen anything like this.
> >>>
> >>> This is a production service but with tiny data (~250k triples, ~60MB as NQuads). It runs on 4GB machines with a java heap allocation of 500MB[1].
> >>>
> >>> We used to run 3.16 on jdk 8 (AWS Corretto for the long-term support) with no problems.
> >>>
> >>> Switching to fuseki 4.8.0 on jdk 11, the process grows in the space of a day or so to reach ~3GB of memory, at which point the 4GB machine becomes unviable and things get OOM killed.
> >>>
> >>> The strange thing is that this growth happens when the system is answering no SPARQL queries at all, just regular health ping checks and (prometheus) metrics scrapes from the monitoring systems.
> >>>
> >>> Furthermore, the space being consumed is not visible to any of the JVM metrics:
> >>> - Heap and non-heap are stable at around 100MB total (mostly non-heap metaspace).
> >>> - Mapped buffers stay at 50MB and remain stable long term.
> >>> - Direct memory buffers are allocated up to around 500MB and then reclaimed. Since there are no SPARQL queries at all, we assume this is jetty NIO buffers being churned as a result of the metric scrapes. However, this direct buffer behaviour seems stable: it cycles between 0 and 500MB on approximately a 10-minute cycle, is stable over a period of days, and shows no leaks.
> >>>
> >>> Yet the java process grows from an initial 100MB to at least 3GB. This can occur in the space of a couple of hours or can take up to a day or two, with no predictability in how fast.
> >>>
> >>> Presumably there is some low-level JNI space allocated by Jetty (?) which is invisible to all the JVM metrics and is not being reliably reclaimed.
> >>>
> >>> Trying 4.6.0, which we've had fewer problems with elsewhere, it seems to grow to around 1GB (plus up to 0.5GB for the cycling direct memory buffers) and then stays stable (at least on a three-day soak test). We could live with allocating 1.5GB to a system that should only need a few hundred MB, but we are concerned that it may not be stable in the really long term and, in any case, would rather be able to update to more recent fuseki versions.
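One way to quantify the space that is invisible to the JVM metrics described above is to compare what the JVM can account for with the kernel's view of the process. A Linux-only sketch, assuming /proc is readable from inside the container (class name and output format are illustrative):

    import java.io.IOException;
    import java.lang.management.BufferPoolMXBean;
    import java.lang.management.ManagementFactory;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // Compare JVM-accounted memory (heap + non-heap + NIO buffer pools)
    // with the resident set size the kernel reports for this process.
    // A large, growing gap indicates native allocations the JVM metrics
    // cannot see (malloc arenas, JNI, etc.).
    public class RssGap {
        static long rssMb() throws IOException {
            for (String line : Files.readAllLines(Path.of("/proc/self/status"))) {
                if (line.startsWith("VmRSS:")) {
                    // e.g. "VmRSS:   245760 kB"
                    return Long.parseLong(line.replaceAll("\\D", "")) / 1024;
                }
            }
            return -1; // not on Linux
        }

        public static void main(String[] args) throws IOException {
            var mem = ManagementFactory.getMemoryMXBean();
            long jvmVisible = mem.getHeapMemoryUsage().getUsed()
                    + mem.getNonHeapMemoryUsage().getUsed();
            for (BufferPoolMXBean p :
                    ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                jvmVisible += p.getMemoryUsed(); // "direct" and "mapped" pools
            }
            long rss = rssMb();
            System.out.printf("JVM-visible=%dMB RSS=%dMB gap=%dMB%n",
                    jvmVisible >> 20, rss, rss - (jvmVisible >> 20));
        }
    }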
> >>> Trying 4.8.0 on java 17, it grows rapidly to around 1GB again but then keeps ticking up slowly at random intervals. We project that it would take a few weeks to grow to the scale it did under java 11, but it would still eventually kill the machine.
> >>>
> >>> Anyone seen anything remotely like this?
> >>>
> >>> Dave
> >>>
> >>> [1] 500M heap may be overkill, but there can be some complex queries, and that should still leave plenty of space for OS buffers etc. in the remaining memory on a 4GB machine.

--
---
Marco Neumann