[ https://issues.apache.org/jira/browse/JENA-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022151#comment-17022151 ]
Andy Seaborne commented on JENA-1826:
-------------------------------------

Using the {{riot}} command-line tool shows the effect.

This uses the non-pretty form of RDF/XML and takes 2 seconds, including program startup:

{{riot --out RDFXML data.nt}}

This uses the pretty form of RDF/XML and takes 5m 20s:

{{riot --pretty RDFXML data.nt}}

Some of the rules mentioned in https://jena.apache.org/documentation/io/rdfxml_howto.html#advanced-rdfxml-output might affect the output usefully, but none are obvious to me.

Fixing this uncommon case (though it has happened before) in the RDF/XML writer may be quite a task if it isn't fairly obvious which steps cause the problem. Someone needs to take a few minutes to look and assess the situation.

An alternative is for Fuseki to switch to plain RDF/XML output. This may get feedback, but any visible change does.

> Fuseki RDF/XML response never finishes
> --------------------------------------
>
>                 Key: JENA-1826
>                 URL: https://issues.apache.org/jira/browse/JENA-1826
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Fuseki
>    Affects Versions: Jena 3.14.0
>         Environment: Ubuntu 16.04
> java version "1.8.0_201"
> Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
> Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
>            Reporter: Osma Suominen
>            Priority: Major
>         Attachments: W00067442800.ttl, data.nt
>
>
> I have a web app running SPARQL CONSTRUCT queries against Fuseki and
> generating web pages. I noticed that Fuseki started hogging all CPU cores a
> few hours after it was restarted. It turned out that some of the CONSTRUCT
> queries take a very long time to complete - at least 40 minutes but probably
> more and it seems quite likely they will never finish.
> I was able to turn this into a fairly minimal example. I've attached a 1.3MB
> Turtle file (~29k triples) with all the data necessary to demonstrate the
> problem.
> Start Fuseki like this: {{./fuseki-server --file W00067442800.ttl /ds}}
> Then open the Fuseki web UI and run this SPARQL query against the dataset:
> {noformat}
> PREFIX schema: <http://schema.org/>
> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
> CONSTRUCT {
>   <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> ?p ?o .
>   ?o schema:name ?oname ;
>     skos:prefLabel ?olabel .
>   ?inst ?instprop ?instval .
>   ?instval schema:name ?instvalName ;
>     skos:prefLabel ?instvalLabel .
> }
> WHERE {
>   {
>     <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> ?p ?o .
>     OPTIONAL {
>       {
>         ?o schema:name ?oname
>       } UNION {
>         ?o skos:prefLabel ?olabel
>       }
>     }
>   } UNION {
>     {
>       <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> schema:workExample ?inst
>     } OPTIONAL {
>       {
>         ?inst ?instprop ?instval .
>         OPTIONAL {
>           {
>             ?instval schema:name ?instvalName
>           } UNION {
>             ?instval skos:prefLabel ?instvalLabel
>           }
>         }
>       }
>     }
>   }
> }
> {noformat}
> If you select Turtle as the content type, the query will finish in around 3
> seconds (plus rendering the result in the browser takes a while). If instead
> you select XML as the format, the query will just keep running, with Fuseki
> taking over a single CPU core completely. With several such queries running,
> all the CPU cores will eventually be used.
> This can also be demonstrated using curl (with the above query saved as
> {{query.rq}}):
> {noformat}
> curl -H 'Accept: text/turtle' --data-urlencode "qu...@query.rq" http://localhost:3030/ds/sparql
> {noformat}
> works fine and gives you the Turtle output;
> {noformat}
> curl -H 'Accept: application/rdf+xml' --data-urlencode "qu...@query.rq" http://localhost:3030/ds/sparql
> {noformat}
> never seems to finish.
> What's perhaps even worse, even a query timeout setting doesn't help. If I
> start Fuseki with a 10 second query timeout, i.e. {{--timeout 10000}}, it
> still won't stop the query from hogging the CPU forever.
> I'm guessing that
> the problem is in the final stages of the query processing, when the results
> just have to be serialized into the correct syntax, and the timeout is no
> longer applied in this stage.
> I discovered this problem while running Fuseki 3.5.0, but it happens with the
> most recent release 3.14.0 as well.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)