Disabling federated query

2014-02-14 Thread Alexander Dutton

Spurred by Kamalraj's question…

Is there an easy way to disable federated querying in Fuseki? None seems 
to be mentioned at 
<https://jena.apache.org/documentation/query/service.html>.
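
(For anyone finding this later: by "federated querying" I mean SPARQL 1.1's
SERVICE clause, which makes the server issue HTTP requests to arbitrary
remote endpoints on behalf of whoever submitted the query. A minimal
example, with a made-up endpoint URL:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
  SERVICE <http://remote.example.org/sparql> {
    ?person foaf:name ?name .
  }
}

It's the "requests to arbitrary remote endpoints" part that one might
reasonably want to switch off.)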


I'm happy to document it if there is a way, or file it in JIRA if there isn't.

Best regards,

Alex


--
Alexander Dutton
Linked Open Data Architect, Office of the CIO; data.ox.ac.uk, OxPoints
IT Services, University of Oxford, ℡ 01865 (6)13483


Re: operations on xsd:duration

2013-12-06 Thread Alexander Dutton
Hi Ewa,

On 06/12/13 16:03, Ewa Szwed wrote:
> Got it!
> Done:
>
> BIND(str(floor(fn:days-from-duration(?date_of_death - ?date_of_birth) / 365)) as ?age_at_death)

Careful; that'll only get you the days component¹. And even if it did what
you were hoping for, you'd end up with off-by-one errors for people who
die close to their birthday, due to the cumulative effect of leap years.
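
For example, someone born 1900-03-01 who dies 1975-02-27 (two days short
of their 75th birthday) has lived 75 × 365 + 18 leap days − 2 = 27,391
days; 27391 / 365 ≈ 75.04, so floor() gives 75 when the age at death is
actually 74.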

You probably want:

BIND(fn:years-from-duration(?date_of_death - ?date_of_birth) AS
?age_at_death)
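
Spelled out in full it would look something like this (ex: is a stand-in
for whatever vocabulary your data actually uses; ARQ should already know
the fn: prefix, but declaring it does no harm):

PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX ex: <http://example.org/>  # hypothetical

SELECT ?person ?age_at_death WHERE {
  ?person ex:date_of_birth ?date_of_birth ;
          ex:date_of_death ?date_of_death .
  BIND(fn:years-from-duration(?date_of_death - ?date_of_birth)
       AS ?age_at_death)
}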

Best regards,

Alex

¹ <http://www.w3.org/TR/xpath-functions/#func-days-from-duration>

-- 
Alexander Dutton
Linked Open Data Architect, Office of the CIO; data.ox.ac.uk, OxPoints
IT Services, University of Oxford






TDB, Fuseki, large journal file, and maxing a CPU

2013-06-15 Thread Alexander Dutton
Hi all,

We're running Fuseki with a few TDB datasets, and it seems to be acting
rather inefficiently.

Here are the version numbers:

> [root@opendata ~]# /usr/bin/java -jar
> /usr/share/java/fuseki-server.jar --version
> Jena:   VERSION: 2.7.5-SNAPSHOT
> Jena:   BUILD_DATE: 2012-10-21T09:26:22+0100
> ARQ:VERSION: 2.9.5-SNAPSHOT
> ARQ:BUILD_DATE: 2012-10-21T09:29:20+0100
> TDB:VERSION: 0.9.5-SNAPSHOT
> TDB:BUILD_DATE: 2012-10-21T09:40:32+0100
> Fuseki: VERSION: 0.2.6-SNAPSHOT
> Fuseki: BUILD_DATE: 2012-10-21T09:44:10+0100

Here's top:

> top - 11:29:56 up 123 days, 18:52,  1 user,  load average: 1.06, 1.20, 1.27
> Tasks: 208 total,   1 running, 207 sleeping,   0 stopped,   0 zombie
> Cpu(s): 100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:   6132016k total,  5290072k used,   841944k free,    93864k buffers
> Swap:   499704k total,   499704k used,        0k free,  1290944k cached
>
>   PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 22324 fuseki 20   0 6624m 1.2g  23m S 99.3 20.0  1172:44  java



You can see the trends at
<http://opendata.oucs.ox.ac.uk/oucs.ox.ac.uk/opendata.oucs.ox.ac.uk/cpu.html>.

The journal files look like this:

> [root@opendata tdb]# ls */journal.jrnl -lh
> -rw-r--r-- 1 fuseki fuseki 3.5M Jun 15 03:55 courses/journal.jrnl
> -rw-r--r-- 1 fuseki fuseki  22M Jun 15 03:37 equipment/journal.jrnl
> -rw-r--r-- 1 fuseki fuseki 5.2M Jun 15 02:17 itservices/journal.jrnl
> -rw-r--r-- 1 fuseki fuseki 448M Jun 15 03:59 public/journal.jrnl
> -rw-r--r-- 1 fuseki fuseki  18M Jun 12 14:41 seesec/journal.jrnl

Looking at the Fuseki logs, there have been various quiet periods during
which there shouldn't have been any read locks held, so I would have
expected the journals to have been flushed (particularly as the
non-public stores don't attract search engines or "users").

We're also getting rather a number of Java heap space errors. The java
process has "-Xmx1g" (that's right, right? :D).

The TDB databases also seem to be growing over time, disproportionately
to any increase in triples. For example, the entire TDB directory for our
public store is 28GB on disk; dumping and reloading it recently put it at
97MB. The trend can be seen at
<http://opendata.oucs.ox.ac.uk/oucs.ox.ac.uk/opendata.oucs.ox.ac.uk/df.html>;
the sudden drops on the by-year graph are me dumping and reloading. The
increase in disk usage in the last few days is, I suspect, something
else.

I'm thinking this could be managed by periodically shutting down Fuseki,
applying the journal, reloading the store, and then setting Fuseki going
again. However, I'm loath to do this without understanding why it gets
the way it does.

Any thoughts? Answers of "yes, we've fixed this; you need to upgrade"
are perfectly reasonable ;-).

Yours,

Alex

-- 
Alexander Dutton
Linked Open Data Architect, Office of the CIO; data.ox.ac.uk, OxPoints
IT Services, University of Oxford






Re: Binding causes hang in Fuseki

2013-01-29 Thread Alexander Dutton


Hi Rob,

On 29/01/13 18:11, Rob Walpole wrote:
> Am I doing something wrong here?

The short answer is that the inner SELECT is evaluated first; in your
second query that means the results are calculated in a rather
inefficient way.

In the first, ?deselected is bound inside the inner SELECT, so it's quite
quick to find all its ancestors.

In the second, all possible ?deselected and ?ancestor pairs are returned
by the inner query, which are then (effectively) filtered to remove all
the pairs where ?deselected isn't whatever it was BINDed to.

Here's more from the spec:
<http://www.w3.org/TR/sparql11-query/#subqueries>.

I /think/ ARQ is able to perform some optimisations along these lines,
but obviously not for your query.
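
To illustrate with made-up terms (assume PREFIX ex: <http://example.org/>;
this isn't your actual query), the difference is between:

# fast: ?deselected is already fixed when the inner pattern is matched
SELECT ?ancestor WHERE {
  { SELECT ?ancestor WHERE {
      BIND(<http://example.org/node42> AS ?deselected)
      ?deselected ex:parent+ ?ancestor .
  } }
}

and:

# slow: the inner SELECT enumerates every ?deselected/?ancestor pair,
# and only afterwards is the result joined with the outer BIND
SELECT ?ancestor WHERE {
  BIND(<http://example.org/node42> AS ?deselected)
  { SELECT ?deselected ?ancestor WHERE {
      ?deselected ex:parent+ ?ancestor .
  } }
}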

Best regards,

Alex

PS. You don't need to do URI("http://…"); you can write a straight IRI
literal: <http://…>.

-- 
Alexander Dutton
Developer, Office of the CIO; data.ox.ac.uk, OxPoints
IT Services, University of Oxford



Re: Dealing with expensive queries

2012-11-30 Thread Alexander Dutton

We (data.ox.ac.uk) do query throttling at the application level; each
IP has an 'intensity' score, increased by the duration of each query,
but which decays over time (at a rate of 0.05s/s). Once this hits
fifteen seconds we delay the query, and once it hits thirty we refuse
it (but include a Retry-After header). This should slow down the very
intensive agents, but leave most people well alone. Here's the code:

https://github.com/ox-it/humfrey/blob/master/humfrey/sparql/views/core.py#L172
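
(To put numbers on that: a query that takes ten seconds adds ten seconds
to your score, which then takes 200 seconds of quiet to decay away, and
you'd need to accumulate fifteen seconds of not-yet-decayed query time
before we start delaying you.)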

I do intend to get query timeouts working, but I'd like to set them
per-request (as far as Fuseki is concerned). This needs me to find the
time to finish off JENA-218. This will allow me to trust logged in
users with longer timeouts, and have no timeouts on pages where the
user doesn't get to specify the query.

I also asked about this back in the day on answers.semanticweb.com;
the answers there might be quite useful:

http://answers.semanticweb.com/questions/3771

That said, I don't think we've ever had problems with excessive
querying. Most expensive querying will be the result of naïveté ("give
me everything", non-overlapping joins), and the only time I've noticed
that is when it's been me making the mistake. Maybe as more people
cotton on to SPARQL it'll become more of an issue for us…

Best regards,

Alex


On 29/11/12 21:41, Sarven Capadisli wrote:
> I would like to better control over my public SPARQL Endpoints
> (using Fuseki) due to some harmless looking, but expensive queries
> coming in. My initial thoughts were, and not necessarily the ones I
> want to take:
> 
> * Block the IP or agent
> * Use default query timeout values
> * Start using API keys or other authentication
> * Catch the exact queries from httpd and block it off
> * Handle certain query types from Fuseki or TDB
> 
> But, more importantly, I'd love to hear some of the actions you all
> take on your endpoints. If you can point me to any documentation or
> some of the common practices out there, that'd be awesome as well.


-- 
Alexander Dutton
Developer, Office of the CIO; data.ox.ac.uk, OxPoints
IT Services, University of Oxford, ℡ 01865 (6)13483


Re: HTTP Error 500: Currently in a transaction

2012-10-21 Thread Alexander Dutton

Hi all,

Turns out I'm an idiot. It was our staging instance that was throwing 
the errors, and it has a rather large journal file. (And everything I 
was doing to fix the problem, I was doing on the wrong machine.)


Sigh. Sorry for the noise.

Best regards,

Alex


HTTP Error 500: Currently in a transaction

2012-10-20 Thread Alexander Dutton

Hi all,

Our TDB-backed SPARQL endpoint is intermittently returning "HTTP Error 
500: Currently in a transaction". The Googlebot is prodding pages on the 
site, which perform SPARQL queries, which occasionally 500. When those 
pages are subsequently viewed, the queries succeed.


There doesn't seem to be anything in the server logs. The journal files 
are empty, and the server has been restarted a couple of times without a 
change in behaviour.


Any ideas?

Here are server versions (a trunk build from today):

Jena:   VERSION: 2.7.5-SNAPSHOT
Jena:   BUILD_DATE: 20121020-1638
ARQ:VERSION: 2.9.5-SNAPSHOT
ARQ:BUILD_DATE: 20121020-1638
TDB:VERSION: 0.9.5-SNAPSHOT
TDB:BUILD_DATE: 20121020-1638
Fuseki: VERSION: 0.2.6-SNAPSHOT
Fuseki: BUILD_DATE: 2012-10-20T23:21:00+0100


Best regards,

Alex

--
Alexander Dutton
Developer, Office of the CIO; data.ox.ac.uk, OxPoints
IT Services, University of Oxford


