Re: Fuseki lock-up - any ideas?

Martynas Jusevičius Tue, 16 Feb 2021 04:55:56 -0800

Not Fuseki-related per se, but I've seen something similar when the
HTTP client runs out of connections.
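
As a rough illustration of that failure mode, here is a toy sketch. It is
not Jena or Apache HttpClient code: BoundedPool and send_request are
hypothetical stand-ins for a client-side connection pool with a fixed
limit. If responses are never closed, each request keeps its slot, and once
the limit is reached the next caller just waits. The sketch uses a short
timeout so it terminates; a client waiting with no timeout would block
indefinitely, which from the outside looks like a hang.

```python
import threading


class BoundedPool:
    """Toy stand-in for an HTTP client's connection pool (hypothetical, not a real API)."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)

    def acquire(self, timeout):
        # True if a connection slot was leased within `timeout` seconds.
        return self._slots.acquire(timeout=timeout)

    def release(self):
        self._slots.release()


def send_request(pool, close_response, timeout=0.1):
    """Lease a slot, pretend to send a request, and return the slot
    only if the caller closes the response."""
    if not pool.acquire(timeout):
        return "timed out waiting for a connection"
    try:
        return "ok"
    finally:
        if close_response:
            pool.release()


# Responses are never closed: the third request finds no free slot.
leaky_pool = BoundedPool(max_connections=2)
leaky = [send_request(leaky_pool, close_response=False) for _ in range(3)]

# Responses are always closed: every request gets a slot.
tidy_pool = BoundedPool(max_connections=2)
tidy = [send_request(tidy_pool, close_response=True) for _ in range(3)]

print(leaky)
print(tidy)
```

Whether this applies here depends on how the client uses RDFConnection,
which the original post does not show.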


On Tue, Feb 16, 2021 at 11:50 AM Dave Reynolds
<dave.e.reyno...@gmail.com> wrote:
>
> We have a mysterious problem with fuseki in production that we've not
> seen before.  Posting in case anyone has seen something similar and has
> any advice but I realise there's not really much here to go on.
>
> Environment:
>     Fuseki 3.17 (was 3.16, tried upgrade just in case) using TDB1
>     OpenJDK Java 8
>     Docker container (running in k8s pod)
>     AWS EBS file system
>     O(2k) small updates per day (uses RDFConnection to send update)
>     Variable read request rate but issue hits at low request levels
>
> Symptoms are that fuseki receives an update request but never completes it:
>
>      INFO  550175  POST http://localhost:3030/ds
>      INFO  550175  Update
>      INFO  550175  204 No Content (20 ms)
>      INFO  550176  POST http://localhost:3030/ds
>      INFO  550176  Update
> -->
>      INFO  550178  Query = ASK { ?s ?p ?o }
>      INFO  550178  GET http://localhost:3030/ds?query=ASK+%7B+%3Fs+%3Fp+%3Fo+%7D
>      INFO  550179  GET http://localhost:3030/ds?query=ASK+%7B+%3Fs+%3Fp+%3Fo+%7D
>      INFO  550179  Query = ASK { ?s ?p ?o }
>
> So no 204 return from request 550176.
>
> From that point on fuseki continues to log incoming read queries but
> does not answer any of them and the update request never terminates.
> Acts as if there's some form of deadlock.
>
> Update requests are serialised, there's never more than one in flight at
> a time.
>
> It's not the update itself that's the issue. It's small and if the
> container is restarted with the same data and the same update sequence
> is reapplied it all works fine.
>
> The jvm stats all look completely fine in the prometheus records.
>
> The various parts of this set up have been in various production
> settings without problems in the past. In particular, we've run the
> exact same pattern of mixed updates and queries in fuseki in a k8s
> environment for two years without ever having a lockup. But on a new
> deployment it's happening every few days.
>
> There are differences between the new and old deployments but the ones
> we've identified seem very unlikely to be the cause. We've not used
> RDFConnection in the client before but can't see how that could affect
> this. We don't often run with TDB on EBS but we do have a dozen
> instances of that around which haven't had problems. We have generally
> shifted to AWS Corretto as the jvm but we have plenty of OpenJDK
> instances around without problems. The docker image is slightly unusual
> in using the s6 overlay init system rather than running fuseki as the
> root process but again can't see how this might cause these symptoms and
> other uses of that, with fuseki, have been fine.
>
> We'll find a workaround eventually, possibly involving shifting to TDB2,
> but posting in case anyone has had an experience similar enough to this
> to give us some hints.
>
> Dave
>
