Disabling federated query
Spurred by Kamalraj's question… Is there an easy way to disable federated querying in Fuseki? None seems to be mentioned at <https://jena.apache.org/documentation/query/service.html>. I'm happy to document it/put it in JIRA if there is/isn't a way. Best regards, Alex -- Alexander Dutton Linked Open Data Architect, Office of the CIO; data.ox.ac.uk, OxPoints IT Services, University of Oxford, ℡ 01865 (6)13483
Re: operations on xsd:duration
Hi Ewa,

On 06/12/13 16:03, Ewa Szwed wrote:
> Got it!
> Done:
>
> *BIND(str(floor(fn:days-from-duration(?date_of_death - ?date_of_birth)
> / 365)) as ?age_at_death)*

Careful; that'll only get you the days component¹. Even if it did what you were hoping for, you'll end up with off-by-one errors if they die close to their birthday, due to the cumulative effects of leap years. You probably want:

BIND(fn:years-from-duration(?date_of_death - ?date_of_birth) AS ?age_at_death)

Best regards,

Alex

¹ <http://www.w3.org/TR/xpath-functions/#func-days-from-duration>

--
Alexander Dutton
Linked Open Data Architect, Office of the CIO; data.ox.ac.uk, OxPoints
IT Services, University of Oxford
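The leap-year point can be checked outside SPARQL. Below is a minimal Python sketch (not the query itself) comparing "total days divided by 365" against a calendar-based age. The dates and function names are made up for illustration.

```python
from datetime import date


def age_by_days(born, died):
    """Age estimated as whole days elapsed divided by 365 (ignores leap days)."""
    return (died - born).days // 365


def age_by_calendar(born, died):
    """Age counted in completed calendar years."""
    had_birthday = (died.month, died.day) >= (born.month, born.day)
    return died.year - born.year - (0 if had_birthday else 1)


# Hypothetical person who dies five days before their 80th birthday.
born = date(1920, 1, 10)
died = date(2000, 1, 5)

print(age_by_days(born, died), age_by_calendar(born, died))
```

Twenty leap days fall between the two dates, which is enough to push the days-based estimate over the 80-year mark even though the 80th birthday was never reached.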
TDB, Fuseki, large journal file, and maxing a CPU
Hi all,

We're running Fuseki with a few TDB datasets, and it seems to be acting rather inefficiently. Here are the version numbers:

> [root@opendata ~]# /usr/bin/java -jar
> /usr/share/java/fuseki-server.jar --version
> Jena: VERSION: 2.7.5-SNAPSHOT
> Jena: BUILD_DATE: 2012-10-21T09:26:22+0100
> ARQ: VERSION: 2.9.5-SNAPSHOT
> ARQ: BUILD_DATE: 2012-10-21T09:29:20+0100
> TDB: VERSION: 0.9.5-SNAPSHOT
> TDB: BUILD_DATE: 2012-10-21T09:40:32+0100
> Fuseki: VERSION: 0.2.6-SNAPSHOT
> Fuseki: BUILD_DATE: 2012-10-21T09:44:10+0100

Here's top:

> top - 11:29:56 up 123 days, 18:52, 1 user, load average: 1.06, 1.20, 1.27
> Tasks: 208 total, 1 running, 207 sleeping, 0 stopped, 0 zombie
> Cpu(s):100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 6132016k total, 5290072k used, 841944k free, 93864k buffers
> Swap: 499704k total, 499704k used, 0k free, 1290944k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 22324 fuseki 20 0 6624m 1.2g 23m S 99.3 20.0 1172:44 java

You can see the trends at <http://opendata.oucs.ox.ac.uk/oucs.ox.ac.uk/opendata.oucs.ox.ac.uk/cpu.html>.

The journal files look like this:

> [root@opendata tdb]# ls */journal.jrnl -lh
> -rw-r--r-- 1 fuseki fuseki 3.5M Jun 15 03:55 courses/journal.jrnl
> -rw-r--r-- 1 fuseki fuseki 22M Jun 15 03:37 equipment/journal.jrnl
> -rw-r--r-- 1 fuseki fuseki 5.2M Jun 15 02:17 itservices/journal.jrnl
> -rw-r--r-- 1 fuseki fuseki 448M Jun 15 03:59 public/journal.jrnl
> -rw-r--r-- 1 fuseki fuseki 18M Jun 12 14:41 seesec/journal.jrnl

Looking at the Fuseki logs, there have been various quiet periods where there shouldn't have been any read locks, and I would have thought the journals would have been cleared during them (particularly as the non-public stores don't attract search engines or "users").

We're getting quite a number of Java heap space errors. java has a "-Xmx1g" (that's right, right? :D).

The TDB DBs also seem to be growing over time disproportionately to any increase in triples. For example, the entire TDB directory for our public store is 28GB on disk; dumping it and reloading it recently put it at 97MB. The trend can be seen at <http://opendata.oucs.ox.ac.uk/oucs.ox.ac.uk/opendata.oucs.ox.ac.uk/df.html>; the sudden drops on the by-year graph are me dumping and reloading. The increase in disk usage in the last few days is, I suspect, something else.

I'm thinking this could be managed by periodically shutting down Fuseki, applying the journal, reloading the store, and then setting Fuseki going again. However, I'm loath to do this without understanding why it gets the way it does.

Any thoughts? Answers of "yes, we've fixed this; you need to upgrade" are perfectly reasonable ;-).

Yours,

Alex

--
Alexander Dutton
Linked Open Data Architect, Office of the CIO; data.ox.ac.uk, OxPoints
IT Services, University of Oxford
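For reference, the dump-and-reload step mentioned above could be scripted roughly as follows. This is only a sketch: it builds the two command lines using Jena's tdbdump and tdbloader tools with their --loc option, and it assumes Fuseki has already been stopped by whatever means the host uses. The paths and the function name are hypothetical, and loading into a fresh directory (rather than in place) is my own choice, not something described in the post.

```python
import shlex


def dump_and_reload_commands(tdb_dir, dump_file, fresh_dir):
    """Return the shell commands for a TDB dump and reload, in order.

    Assumes nothing else (e.g. Fuseki) has the dataset open while they run.
    """
    # tdbdump writes the whole dataset as N-Quads to stdout.
    dump = ["tdbdump", f"--loc={tdb_dir}"]
    # tdbloader bulk-loads a file into the TDB database at --loc.
    load = ["tdbloader", f"--loc={fresh_dir}", str(dump_file)]
    return [
        f"{shlex.join(dump)} > {shlex.quote(str(dump_file))}",
        shlex.join(load),
    ]


# Hypothetical paths, for illustration only.
for command in dump_and_reload_commands(
    "/srv/tdb/public", "/tmp/public.nq", "/srv/tdb/public.new"
):
    print(command)
```

Once the load has finished, the fresh directory would need swapping in for the old one before Fuseki is started again.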
Re: Binding causes hang in Fuseki
Hi Rob,

On 29/01/13 18:11, Rob Walpole wrote:
> Am I doing something wrong here?

The short answer is that the inner SELECT is evaluated first, leading to the results being calculated in the second case in a rather inefficient way.

In the first inner SELECT ?deselected is bound, so it's quite quick to find all its ancestors. In the second, all possible ?deselected and ?ancestor pairs are returned by the inner query, which are then (effectively) filtered to remove all the pairs where ?deselected isn't whatever it was BINDed to.

Here's more from the spec: <http://www.w3.org/TR/sparql11-query/#subqueries>.

I /think/ ARQ is able to perform some optimisations along these lines, but obviously not for your query.

Best regards,

Alex

PS. You don't need to do URI("http://?"); you can do a straight IRI literal:

--
Alexander Dutton
Developer, Office of the CIO; data.ox.ac.uk, OxPoints
IT Services, University of Oxford
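To make the cost difference concrete, here is a toy Python model of the two evaluation orders. It is not how ARQ works internally; the PARENT table and the function names are invented purely for illustration.

```python
# Toy hierarchy: child -> parent. Entirely made-up data.
PARENT = {"d": "c", "c": "b", "b": "a", "y": "x"}


def ancestors(node):
    """Walk up the hierarchy and collect every ancestor of node."""
    found = []
    while node in PARENT:
        node = PARENT[node]
        found.append(node)
    return found


def bound_first(target):
    """Like the first query: the node is fixed before ancestors are sought."""
    rows = [(target, a) for a in ancestors(target)]
    return rows, len(rows)  # (results, rows the inner step produced)


def pairs_then_filter(target):
    """Like the second query: every pair is produced, then filtered."""
    all_pairs = [(n, a) for n in PARENT for a in ancestors(n)]
    kept = [pair for pair in all_pairs if pair[0] == target]
    return kept, len(all_pairs)


print(bound_first("c"))
print(pairs_then_filter("c"))
```

Both functions return the same rows, but the second one has to materialise every pair in the hierarchy first, and that intermediate result grows with the size of the whole dataset rather than with the one node of interest.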
Re: Dealing with expensive queries
We (data.ox.ac.uk) do query throttling at the application level; each IP has an 'intensity' score, increased by the duration of each query, but which decays over time (at a rate of 0.05s/s). Once this hits fifteen seconds we delay the query, and once it hits thirty we refuse it (but include a Retry-After header). This should slow down the very intensive agents, but leave most people well alone. Here's the code:

https://github.com/ox-it/humfrey/blob/master/humfrey/sparql/views/core.py#L172

I do intend to get query timeouts working, but I'd like to set them per-request (as far as Fuseki is concerned). This needs me to find the time to finish off JENA-218. This will allow me to trust logged in users with longer timeouts, and have no timeouts on pages where the user doesn't get to specify the query.

I also asked about this back in the day on answers.semanticweb.com; the answers there might be quite useful:

http://answers.semanticweb.com/questions/3771

That said, I don't think we've ever had problems with excessive querying. Most expensive querying will be the result of naïveté ("give me everything", non-overlapping joins), and the only time I've noticed that is when it's been me making the mistake. Maybe as more people cotton on to SPARQL it'll become more of an issue for us…

Best regards,

Alex

On 29/11/12 21:41, Sarven Capadisli wrote:
> I would like to better control over my public SPARQL Endpoints
> (using Fuseki) due to some harmless looking, but expensive queries
> coming in. My initial thoughts were, and not necessarily the ones I
> want to take:
>
> * Block the IP or agent
> * Use default query timeout values
> * Start using API keys or other authentication
> * Catch the exact queries from httpd and block it off
> * Handle certain query types from Fuseki or TDB
>
> But, more importantly, I'd love to her some of the actions you all
> take on your endpoints. If you can point me to any documentation or
> some of the common practices out there, that'd be awesome as well.

--
Alexander Dutton
Developer, Office of the CIO; data.ox.ac.uk, OxPoints
IT Services, University of Oxford, ℡ 01865 (6)13483
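For anyone who doesn't want to read through the humfrey source, the scoring scheme described above can be sketched in a few lines of Python. This is a simplified illustration written from the description in this message, not the humfrey code: the class and method names are hypothetical, and the way Retry-After is computed (time until the score decays back to the refusal threshold) is an assumption.

```python
import math
import time

DECAY_RATE = 0.05        # seconds of score forgiven per second elapsed
DELAY_THRESHOLD = 15.0   # at or above this score, delay the query
REFUSE_THRESHOLD = 30.0  # at or above this score, refuse the query


class IntensityThrottle:
    """Per-IP score that grows with query time and decays linearly."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._scores = {}  # ip -> (score, time the score was last updated)

    def _decayed(self, ip):
        now = self._clock()
        score, then = self._scores.get(ip, (0.0, now))
        score = max(0.0, score - DECAY_RATE * (now - then))
        self._scores[ip] = (score, now)
        return score

    def check(self, ip):
        """Return (verdict, retry_after_seconds) before running a query."""
        score = self._decayed(ip)
        if score >= REFUSE_THRESHOLD:
            # Assumption: suggest waiting until the score has decayed
            # back down to the refusal threshold.
            wait = math.ceil((score - REFUSE_THRESHOLD) / DECAY_RATE)
            return "refuse", max(1, wait)
        if score >= DELAY_THRESHOLD:
            return "delay", None
        return "allow", None

    def record(self, ip, duration):
        """Add a finished query's duration (in seconds) to the IP's score."""
        score = self._decayed(ip)
        self._scores[ip] = (score + duration, self._clock())
```

The caller would invoke check() before running a query and record() afterwards with the measured duration. How long a "delay" verdict should sleep is left to the caller, since the message doesn't say.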
Re: HTTP Error 500: Currently in a transaction
Hi all, Turns out I'm an idiot. It was our staging instance that was throwing the errors, and it has a rather large journal file. (And everything I was doing to fix the problem I was doing on the wrong machine. Sigh.) Sorry for the noise. Best regards, Alex
HTTP Error 500: Currently in a transaction
Hi all,

Our TDB-backed SPARQL endpoint is intermittently returning "HTTP Error 500: Currently in a transaction". The Googlebot is prodding pages on the site, which perform SPARQL queries, which occasionally 500. When subsequently viewing those pages the queries succeed.

There doesn't seem to be anything in the server logs. The journal files are empty, and the server has been restarted a couple of times without a change in behaviour.

Any ideas?

Here are server versions (a trunk build from today):

Jena: VERSION: 2.7.5-SNAPSHOT
Jena: BUILD_DATE: 20121020-1638
ARQ: VERSION: 2.9.5-SNAPSHOT
ARQ: BUILD_DATE: 20121020-1638
TDB: VERSION: 0.9.5-SNAPSHOT
TDB: BUILD_DATE: 20121020-1638
Fuseki: VERSION: 0.2.6-SNAPSHOT
Fuseki: BUILD_DATE: 2012-10-20T23:21:00+0100

Best regards,

Alex

--
Alexander Dutton
Developer, Office of the CIO; data.ox.ac.uk, OxPoints
IT Services, University of Oxford