[ 
https://issues.apache.org/jira/browse/JENA-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022151#comment-17022151
 ] 

Andy Seaborne commented on JENA-1826:
-------------------------------------

Using the {{riot}} command line tool shows the effect:

This uses the non-pretty form of RDF/XML:
{{riot --out RDFXML data.nt}}
takes 2 seconds, including program startup.

This uses the pretty form of RDF/XML and takes 5m 20s.
{{riot --pretty RDFXML data.nt}}
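
To reproduce the comparison in a repeatable way, the two invocations above can be wrapped in a small timing script. This is only a sketch: it assumes {{riot}} is on the PATH and that {{data.nt}} is the attachment from this issue. The helper names ({{riot_command}}, {{time_command}}) are made up for illustration, and the cap on how long to wait is an arbitrary choice.

```python
import subprocess
import sys
import time


def riot_command(pretty, path):
    """Build the riot argument list, using the same flags as the commands above."""
    flag = "--pretty" if pretty else "--out"
    return ["riot", flag, "RDFXML", path]


def time_command(cmd, timeout=None):
    """Run cmd, discard its stdout, and return the elapsed wall-clock seconds.

    Returns None if the command is still running after `timeout` seconds
    (subprocess.run kills the child process in that case).
    """
    start = time.perf_counter()
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return time.perf_counter() - start


def compare(path, timeout=600):
    """Time plain and pretty RDF/XML output for one input file."""
    for pretty in (False, True):
        cmd = riot_command(pretty, path)
        elapsed = time_command(cmd, timeout=timeout)
        label = "gave up" if elapsed is None else f"{elapsed:.1f}s"
        print(" ".join(cmd), "->", label)
```

Calling {{compare("data.nt")}} prints one line per writer. The timings will vary by machine, so the figures quoted above are the reference, not the script's output.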

Some of the rules mentioned in
https://jena.apache.org/documentation/io/rdfxml_howto.html#advanced-rdfxml-output
might affect the output usefully, but none are obvious to me.

Fixing this uncommon case (though it has happened before) in the RDF/XML writer 
may be quite a task if it isn't fairly obvious which steps cause the problem. 
Someone needs to take a few minutes to look and assess the situation.

An alternative is for Fuseki to switch to plain RDF/XML output. This may get 
feedback, but any visible change does.
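
Independently of any server-side change, a client can avoid the slow path by asking for Turtle, which the report below says finishes in a few seconds, and by setting its own timeout. The following is a rough sketch using only the Python standard library. The function names are hypothetical, the endpoint is the one from the report, and the query is sent as a URL-encoded {{query}} form parameter in a POST. The timeout applies to blocking socket operations on the client; it does not stop Fuseki from continuing to work on an abandoned request.

```python
import urllib.parse
import urllib.request


def construct_request(endpoint, query, accept="text/turtle"):
    """Build a POST request for a SPARQL query with an explicit Accept header."""
    body = urllib.parse.urlencode({"query": query}).encode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": accept,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def run_construct(endpoint, query, accept="text/turtle", timeout=30):
    """Send the query and return the response body as text.

    `timeout` (seconds) is enforced by the client socket, not by the server.
    """
    request = construct_request(endpoint, query, accept)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, {{run_construct("http://localhost:3030/ds/sparql", open("query.rq").read())}} would fetch the CONSTRUCT result as Turtle, assuming Fuseki is running as described in the report.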


> Fuseki RDF/XML response never finishes
> --------------------------------------
>
>                 Key: JENA-1826
>                 URL: https://issues.apache.org/jira/browse/JENA-1826
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Fuseki
>    Affects Versions: Jena 3.14.0
>         Environment: Ubuntu 16.04
> java version "1.8.0_201"
> Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
> Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
>            Reporter: Osma Suominen
>            Priority: Major
>         Attachments: W00067442800.ttl, data.nt
>
>
> I have a web app running SPARQL CONSTRUCT queries against Fuseki and 
> generating web pages. I noticed that Fuseki started hogging all CPU cores a 
> few hours after it was restarted. It turned out that some of the CONSTRUCT 
> queries take a very long time to complete - at least 40 minutes but probably 
> more and it seems quite likely they will never finish.
> I was able to turn this into a fairly minimal example. I've attached a 1.3MB 
> Turtle file (~29k triples) with all the data necessary to demonstrate the 
> problem. 
> Start Fuseki like this: {{./fuseki-server --file W00067442800.ttl /ds}}
> Then open the Fuseki web UI and run this SPARQL query against the dataset:
> {noformat}
> PREFIX schema: <http://schema.org/>       
> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>                   
> CONSTRUCT {
>   <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> ?p ?o .
>   ?o schema:name ?oname ;
>     skos:prefLabel ?olabel .
>   ?inst ?instprop ?instval .
>   ?instval schema:name ?instvalName ;
>     skos:prefLabel ?instvalLabel .
> }
> WHERE {
>   {
>     <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> ?p ?o .
>     OPTIONAL {
>       {
>         ?o schema:name ?oname 
>       }             UNION             {
>         ?o skos:prefLabel ?olabel 
>       }           
>     }         
>   }         UNION         {
>     {
>       <http://urn.fi/URN:NBN:fi:bib:me:W00067442800> schema:workExample ?inst 
>     }           OPTIONAL {
>       {
>         ?inst ?instprop ?instval .
>         OPTIONAL {
>           {
>             ?instval schema:name ?instvalName 
>           }                 UNION                 {
>             ?instval skos:prefLabel ?instvalLabel 
>           }               
>         }             
>       }      
>     }         
>   }       
> }
> {noformat}
> If you select Turtle as the content type, the query will finish in around 3 
> seconds (plus rendering the result in the browser takes a while). If instead 
> you select XML as the format, the query will just keep running, with Fuseki 
> taking over a single CPU core completely. With several such queries running, 
> all the CPU cores will eventually be used.
> This can also be demonstrated using curl (with the above query saved as 
> {{query.rq}}):
> {noformat}
> curl -H 'Accept: text/turtle' --data-urlencode "qu...@query.rq" 
> http://localhost:3030/ds/sparql
> {noformat}
> works fine and gives you the Turtle output;
> {noformat}
> curl -H 'Accept: application/rdf+xml' --data-urlencode "qu...@query.rq" 
> http://localhost:3030/ds/sparql
> {noformat}
> never seems to finish.
> What's perhaps even worse, even a query timeout setting doesn't help. If I 
> start Fuseki with a 10 second query timeout, i.e. {{--timeout 10000}}, it 
> still won't stop the query from hogging the CPU forever. I'm guessing that 
> the problem is in the final stages of the query processing, when the results 
> just have to be serialized into the correct syntax, and the timeout is no 
> longer applied in this stage.
> I discovered this problem while running Fuseki 3.5.0, but it happens with the 
> most recent release 3.14.0 as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
