Re: Optimising path to root concept SPARQL query

2016-02-02 Thread Andy Seaborne

On 01/02/16 15:13, Joël Kuiper wrote:

Well, the query does what it needs to do: for a given concept, find the path to a
root. For example:

query:
SELECT DISTINCT ?parent
WHERE {
  GRAPH ?g {
    <…> rdfs:subClassOf+ ?parent .
  }}

output:
parent
http://purl.bioontology.org/ontology/ICD10CM/J86
http://purl.bioontology.org/ontology/ICD10CM/J85-J86
http://purl.bioontology.org/ontology/ICD10CM/J00-J99
http://www.w3.org/2002/07/owl#Thing

It’s just that it’s really slow, so I was wondering if there was a way of
optimising this (either with some hints, or by using reasoners that understand
transitivity).



Can you quantify "really slow"?  This version of the query has a 
grounded starting point.


It is not obvious to me why it would be slow, though what the data looks
like overall is a major factor, doubly so if running from a cold start.

The GRAPH ?g does turn it into a loop over all graph names, so it evaluates
the path pattern repeatedly, and on a large dataset that is likely to lead
to disk activity on each iteration.

Using the default union graph is faster (it evaluates the path only once),
although it also finds paths that span graphs, so it is semantically different.
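A toy simulation in plain Python (not how TDB actually evaluates queries) of that semantic difference: `per_graph` mimics `GRAPH ?g { ... rdfs:subClassOf+ ?parent }` by running the path search once per named graph, while `union_graph` merges all graphs first, as a union default graph would. The graph names `g1`/`g2` and classes `A`/`B`/`C` are hypothetical.

```python
from collections import defaultdict


def ancestors(edges, start):
    """Classes reachable from start via one or more (child, parent) edges,
    i.e. the bindings of ?parent in `start rdfs:subClassOf+ ?parent`."""
    parents_of = defaultdict(set)
    for child, parent in edges:
        parents_of[child].add(parent)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for parent in parents_of[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def per_graph(dataset, start):
    """GRAPH ?g { ... }: one path evaluation per named graph; paths never cross graphs."""
    rows = []
    for graph_name, edges in dataset.items():
        for parent in sorted(ancestors(edges, start)):
            rows.append((graph_name, parent))
    return rows


def union_graph(dataset, start):
    """Union default graph: merge all named graphs, then evaluate the path once."""
    merged = [edge for edges in dataset.values() for edge in edges]
    return ancestors(merged, start)


# Hypothetical dataset: the hierarchy A -> B -> C is split across two graphs.
dataset = {
    "g1": [("A", "B")],
    "g2": [("B", "C")],
}
print(per_graph(dataset, "A"))            # [('g1', 'B')]
print(sorted(union_graph(dataset, "A")))  # ['B', 'C']
```

Only the union version reaches `C`, because the path to it crosses from `g1` into `g2`; the per-graph version also repeats the search once per graph name.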



Andy

Incidentally, I got round to putting in an optimization for "?x
:predicate+ ?z" (not "*"), which greatly improves that one case, though
none of the examples are of that form.


Re: Optimising path to root concept SPARQL query

2016-02-01 Thread buehmann

Something like this could work, if there are no cycles:

select ?class where {
  <…> rdfs:subClassOf* ?mid .
  ?mid rdfs:subClassOf* ?class .
}
group by ?class
order by count(?mid)
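The idea behind this query: for each ancestor ?class, COUNT(?mid) counts the classes lying between the start concept and ?class (both ends included), so in a single-parent hierarchy without cycles the count grows by one per step, and sorting by it yields the path from the concept up to the root. A plain-Python sketch of the same counting, with a hypothetical start concept `X` standing in for the IRI missing from the archive:

```python
from collections import Counter


def reflexive_ancestors(parents, node):
    """node plus everything reachable upward (like `node rdfs:subClassOf* ?x`)."""
    seen, stack = {node}, [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def order_by_mid_count(parents, concept):
    counts = Counter()
    for mid in reflexive_ancestors(parents, concept):   # <concept> rdfs:subClassOf* ?mid
        for cls in reflexive_ancestors(parents, mid):   # ?mid rdfs:subClassOf* ?class
            counts[cls] += 1                            # GROUP BY ?class ... COUNT(?mid)
    return sorted(counts, key=counts.__getitem__)       # ORDER BY COUNT(?mid)


# Hypothetical single-parent chain; "X" stands in for the elided start concept.
parents = {
    "X": ["J86"],
    "J86": ["J85-J86"],
    "J85-J86": ["J00-J99"],
    "J00-J99": ["owl:Thing"],
}
print(order_by_mid_count(parents, "X"))
# ['X', 'J86', 'J85-J86', 'J00-J99', 'owl:Thing']
```

With a cycle, every class on the cycle gets the same count, so the ordering no longer describes a path; that is why the query needs an acyclic hierarchy.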

On 01.02.2016 16:29, Andy Seaborne wrote:

On 01/02/16 13:05, Joël Kuiper wrote:
Concretely, I’m using this to get the path to the root for a specific
concept, e.g.:


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?parent
WHERE {
  GRAPH ?g {
    <…> rdfs:subClassOf+ ?parent .
  }}

This would return all the (distinct) intermediates.


So this is a rather different query to the first example!

The first one did not return ?concept but was searching over all
?concept, which suggested to me that this was the cause of the expense.


Also, rdfs:subClassOf* and rdfs:subClassOf+ are different. The * form
also matches zero-length paths, i.e.

?anything * ?anything

so it looked like there was a huge amount of work going on.


DISTINCT isn't needed if ?g is just one graph.

What do you use named graphs for in your data?

Can you put the data online somewhere?

Andy




On 01 Feb 2016, at 13:49, Andy Seaborne wrote:

On 01/02/16 12:11, Joël Kuiper wrote:

Hey all,

I’m trying to run a query to find the path to the root concept of a 
graph.

The entries are defined using rdfs:subClassOf.

Currently I’m using:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?parent ?label
WHERE {
  GRAPH ?g {
    ?concept rdfs:subClassOf* ?parent
  }}

However, on a moderately sized dataset (< 1M triples), this query
sometimes takes /minutes/.
I suspect it has to do with the disk-based TDB (since I hear my HDD
spin a lot), but still.
Is there a way to optimise this query, maybe by using a different
reasoner? And if so, how would that reasoner be used?


Thanks in advance,

Joël



If you are using a reasoner and also doing rdfs:subClassOf*, then that is
doing excessive redundant work.

(Otherwise, with TDB it materialises more nodes than it needs to.)

The root is a node without a parent, so this might help, without a
reasoner:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?top
WHERE {
  GRAPH ?g {
    ?concept rdfs:subClassOf ?top
    FILTER NOT EXISTS { ?top rdfs:subClassOf ?x }
  }
}









Re: Optimising path to root concept SPARQL query

2016-02-01 Thread Andy Seaborne

On 01/02/16 13:05, Joël Kuiper wrote:

Concretely, I’m using this to get the path to the root for a specific concept, e.g.:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?parent
WHERE {
  GRAPH ?g {
    <…> rdfs:subClassOf+ ?parent .
  }}

This would return all the (distinct) intermediates.


So this is a rather different query to the first example!

The first one did not return ?concept but was searching over all
?concept, which suggested to me that this was the cause of the expense.


Also, rdfs:subClassOf* and rdfs:subClassOf+ are different. The * form
also matches zero-length paths, i.e.

?anything * ?anything

so it looked like there was a huge amount of work going on.
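A small Python sketch of why the unbound `*` query does so much more work than `+`. It follows the SPARQL 1.1 rule that a zero-length path with both ends unbound matches every subject and object in the graph, not just the classes, so each of those nodes pairs with itself. The toy graph (three classes plus two hypothetical label literals) is made up for illustration.

```python
def closure(parents, start, include_self):
    """include_self=True behaves like subClassOf* (zero or more steps),
    include_self=False like subClassOf+ (one or more steps)."""
    result = {start} if include_self else set()
    visited, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            result.add(parent)
            if parent not in visited:
                visited.add(parent)
                stack.append(parent)
    return result


def all_pairs(parents, graph_nodes, include_self):
    """Both ?concept and ?parent unbound: start from every node of the graph."""
    return {(n, a) for n in graph_nodes for a in closure(parents, n, include_self)}


parents = {"A": ["B"], "B": ["C"]}
# Every subject and object in the (hypothetical) graph, including label literals.
graph_nodes = {"A", "B", "C", '"A label"', '"B label"'}

print(len(all_pairs(parents, graph_nodes, include_self=False)))  # 3: (A,B) (A,C) (B,C)
print(len(all_pairs(parents, graph_nodes, include_self=True)))   # 8: the 3 above + 5 self-pairs
```

On real data every resource and literal in the graph contributes a self-pair under `*`, which is the "huge amount of work" for a query that was only meant to follow rdfs:subClassOf.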


DISTINCT isn't needed if ?g is just one graph.

What do you use named graphs for in your data?

Can you put the data online somewhere?

Andy




On 01 Feb 2016, at 13:49, Andy Seaborne wrote:

On 01/02/16 12:11, Joël Kuiper wrote:

Hey all,

I’m trying to run a query to find the path to the root concept of a graph.
The entries are defined using rdfs:subClassOf.

Currently I’m using:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?parent ?label
WHERE {
  GRAPH ?g {
    ?concept rdfs:subClassOf* ?parent
  }}

However, on a moderately sized dataset (< 1M triples), this query sometimes takes
/minutes/.
I suspect it has to do with the disk-based TDB (since I hear my HDD spin a
lot), but still.
Is there a way to optimise this query, maybe by using a different reasoner? And
if so, how would that reasoner be used?

Thanks in advance,

Joël



If you are using a reasoner and also doing rdfs:subClassOf*, then that is doing
excessive redundant work.

(Otherwise, with TDB it materialises more nodes than it needs to.)

The root is a node without a parent, so this might help, without a reasoner:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?top
WHERE {
  GRAPH ?g {
    ?concept rdfs:subClassOf ?top
    FILTER NOT EXISTS { ?top rdfs:subClassOf ?x }
  }
}







Re: Optimising path to root concept SPARQL query

2016-02-01 Thread Joël Kuiper
True, fair enough. But that works for our use case too :-) (it’s a machine
learning classification task, where we use the ancestors as features, rather
than just the “leaves”).
What would be the fastest way of constructing such a list for all concepts in
the graph?
Maybe just dump all the rdfs:subClassOf triples as an adjacency list and do some
graph processing on that (without SPARQL)?
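A sketch of the adjacency-list approach in plain Python (an assumption about how one might do it, not something tested on the thread's data): read (subclass, superclass) pairs, then compute every class's full ancestor set once, reusing each parent's already-computed set instead of re-walking the shared upper levels. It assumes the hierarchy is acyclic and raises on a cycle; the class names are hypothetical.

```python
from collections import defaultdict


def ancestors_for_all(edges):
    """Map every class to the frozenset of all its ancestors (its subClassOf+ closure).

    Assumes the hierarchy is acyclic; a cycle raises ValueError.
    """
    parents = defaultdict(set)
    nodes = set()
    for child, parent in edges:
        parents[child].add(parent)
        nodes.update((child, parent))

    memo = {}
    in_progress = set()

    def visit(node):
        if node in memo:
            return memo[node]
        if node in in_progress:
            raise ValueError(f"cycle through {node!r}")
        in_progress.add(node)
        result = set()
        for parent in parents[node]:
            result.add(parent)
            result |= visit(parent)   # reuse the parent's memoised ancestors
        in_progress.discard(node)
        memo[node] = frozenset(result)
        return memo[node]

    return {node: visit(node) for node in nodes}


# Hypothetical edges: (subclass, superclass)
edges = [
    ("ExampleConcept", "J86"),
    ("J86", "J85-J86"),
    ("J85", "J85-J86"),
    ("J85-J86", "J00-J99"),
    ("J00-J99", "owl:Thing"),
]
features = ancestors_for_all(edges)
print(sorted(features["ExampleConcept"]))
# ['J00-J99', 'J85-J86', 'J86', 'owl:Thing']
```

Each set can be used directly as a feature set. The recursion depth equals the depth of the hierarchy, which should be fine for shallow hierarchies like the ICD-10 example, but very deep chains would hit Python's default recursion limit.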

Joël


> On 01 Feb 2016, at 16:21, buehmann wrote:
> 
> There is no guarantee that this query gives a path; instead it gives all
> ancestor classes of the given class. In your example it might have worked,
> but that is more by chance.
> 
> On 01.02.2016 16:13, Joël Kuiper wrote:
>> Well, the query does what it needs to do: for a given concept, find the path
>> to a root. For example:
>> 
>> query:
>> SELECT DISTINCT ?parent
>> WHERE {
>>   GRAPH ?g {
>>     <…> rdfs:subClassOf+ ?parent .
>>   }}
>> 
>> output:
>> parent
>> http://purl.bioontology.org/ontology/ICD10CM/J86
>> http://purl.bioontology.org/ontology/ICD10CM/J85-J86
>> http://purl.bioontology.org/ontology/ICD10CM/J00-J99
>> http://www.w3.org/2002/07/owl#Thing
>> 
>> It’s just that it’s really slow, so I was wondering if there was a way of
>> optimising this (either with some hints, or by using reasoners that understand
>> transitivity).
>> 
>> Joël
>> 
>>> On 01 Feb 2016, at 15:42, Paul Tyson wrote:
>>> 
>>> I don't know that you can get such results from SPARQL directly. I would
>>> get a flat list of subclass relations in XML (.srx) or JSON and then process
>>> it with XSLT or JavaScript to write out the class hierarchy.
>>> 
>>> Regards,
>>> --Paul
>>> 
>>> On Feb 1, 2016, at 07:05, Joël Kuiper wrote:
>>> 
>>>> This message has no content.
>> 
> 



Re: Optimising path to root concept SPARQL query

2016-02-01 Thread buehmann
There is no guarantee that this query gives a path; instead it gives all
ancestor classes of the given class. In your example it might have
worked, but that is more by chance.
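The point shows up as soon as a class has more than one superclass. This Python sketch (hypothetical class names, acyclic hierarchy assumed) enumerates every path from a class up to a root: the ancestor set `{A, B, Thing}` that `rdfs:subClassOf+` returns is the union of two distinct paths, not one path.

```python
def root_paths(parents, node):
    """Every path from node up to a root (a class with no superclass).

    Assumes no cycles; a cycle would recurse forever.
    """
    supers = parents.get(node, [])
    if not supers:
        return [[node]]
    paths = []
    for parent in supers:
        for tail in root_paths(parents, parent):
            paths.append([node] + tail)
    return paths


# Hypothetical diamond: C has two superclasses that share a root.
parents = {"C": ["A", "B"], "A": ["Thing"], "B": ["Thing"]}

paths = root_paths(parents, "C")
print(paths)                  # [['C', 'A', 'Thing'], ['C', 'B', 'Thing']]
ancestors = {n for path in paths for n in path[1:]}
print(sorted(ancestors))      # ['A', 'B', 'Thing']
```

In a heavily multi-parented hierarchy the number of distinct paths can grow much faster than the number of ancestors, so enumerating paths is only practical when multiple inheritance is rare.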


On 01.02.2016 16:13, Joël Kuiper wrote:

Well, the query does what it needs to do: for a given concept, find the path to a
root. For example:

query:
SELECT DISTINCT ?parent
WHERE {
  GRAPH ?g {
    <…> rdfs:subClassOf+ ?parent .
  }}

output:
parent
http://purl.bioontology.org/ontology/ICD10CM/J86
http://purl.bioontology.org/ontology/ICD10CM/J85-J86
http://purl.bioontology.org/ontology/ICD10CM/J00-J99
http://www.w3.org/2002/07/owl#Thing

It’s just that it’s really slow, so I was wondering if there was a way of
optimising this (either with some hints, or by using reasoners that understand
transitivity).

Joël


On 01 Feb 2016, at 15:42, Paul Tyson wrote:

I don't know that you can get such results from SPARQL directly. I would get a
flat list of subclass relations in XML (.srx) or JSON and then process it with
XSLT or JavaScript to write out the class hierarchy.

Regards,
--Paul

On Feb 1, 2016, at 07:05, Joël Kuiper <j...@joelkuiper.eu> wrote:


This message has no content.






Re: Optimising path to root concept SPARQL query

2016-02-01 Thread Joël Kuiper
Well, the query does what it needs to do: for a given concept, find the path to a
root. For example:

query:
SELECT DISTINCT ?parent
WHERE {
  GRAPH ?g {
    <…> rdfs:subClassOf+ ?parent .
  }}

output:
parent
http://purl.bioontology.org/ontology/ICD10CM/J86
http://purl.bioontology.org/ontology/ICD10CM/J85-J86
http://purl.bioontology.org/ontology/ICD10CM/J00-J99
http://www.w3.org/2002/07/owl#Thing

It’s just that it’s really slow, so I was wondering if there was a way of
optimising this (either with some hints, or by using reasoners that understand
transitivity).

Joël 

> On 01 Feb 2016, at 15:42, Paul Tyson wrote:
> 
> I don't know that you can get such results from SPARQL directly. I would get
> a flat list of subclass relations in XML (.srx) or JSON and then process it
> with XSLT or JavaScript to write out the class hierarchy.
> 
> Regards,
> --Paul
> 
> On Feb 1, 2016, at 07:05, Joël Kuiper wrote:
> 
>> This message has no content.



Re: Optimising path to root concept SPARQL query

2016-02-01 Thread Paul Tyson
I don't know that you can get such results from SPARQL directly. I would get a
flat list of subclass relations in XML (.srx) or JSON and then process it with
XSLT or JavaScript to write out the class hierarchy.
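A minimal sketch of this suggestion, using Python instead of XSLT or JavaScript. It assumes the flat list comes from a query such as `SELECT ?sub ?super WHERE { ?sub rdfs:subClassOf ?super }` returned in the standard SPARQL JSON results format, and prints the hierarchy as an indented tree. The variable names `sub`/`super`, the helper `binding`, and the shortened class names in the sample are assumptions; real results would carry full IRIs.

```python
import json
from collections import defaultdict


def hierarchy_from_sparql_json(text):
    """Build superclass -> [subclasses] from SPARQL JSON results binding ?sub and ?super."""
    data = json.loads(text)
    children = defaultdict(list)
    subs, supers = set(), set()
    for row in data["results"]["bindings"]:
        sub, sup = row["sub"]["value"], row["super"]["value"]
        children[sup].append(sub)
        subs.add(sub)
        supers.add(sup)
    roots = sorted(supers - subs)  # superclasses that never appear as a subclass
    return children, roots


def render(children, roots):
    """Indented tree, two spaces per level; skips a child already on the current branch."""
    lines = []

    def walk(node, depth, branch):
        lines.append("  " * depth + node)
        for child in sorted(children.get(node, ())):
            if child not in branch:
                walk(child, depth + 1, branch | {child})

    for root in roots:
        walk(root, 0, {root})
    return "\n".join(lines)


def binding(sub, sup):
    """Hypothetical helper producing one result row in the SPARQL JSON layout."""
    return {"sub": {"type": "uri", "value": sub}, "super": {"type": "uri", "value": sup}}


sample = json.dumps({
    "head": {"vars": ["sub", "super"]},
    "results": {"bindings": [
        binding("J86", "J85-J86"),
        binding("J85-J86", "J00-J99"),
        binding("J00-J99", "owl:Thing"),
    ]},
})
print(render(*hierarchy_from_sparql_json(sample)))
```

For the sample this prints `owl:Thing` at the top with `J00-J99`, `J85-J86` and `J86` indented one level further each.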

Regards,
--Paul

> On Feb 1, 2016, at 07:05, Joël Kuiper wrote:
> 
> This message has no content.


Re: Optimising path to root concept SPARQL query

2016-02-01 Thread Joël Kuiper
Concretely, I’m using this to get the path to the root for a specific concept, e.g.:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?parent
WHERE {
  GRAPH ?g {
    <…> rdfs:subClassOf+ ?parent .
  }}

This would return all the (distinct) intermediates.

> On 01 Feb 2016, at 13:49, Andy Seaborne wrote:
> 
> On 01/02/16 12:11, Joël Kuiper wrote:
>> Hey all,
>> 
>> I’m trying to run a query to find the path to the root concept of a graph.
>> The entries are defined using rdfs:subClassOf.
>> 
>> Currently I’m using:
>> 
>> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
>> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>> 
>> SELECT ?parent ?label
>> WHERE {
>>   GRAPH ?g {
>>     ?concept rdfs:subClassOf* ?parent
>>   }}
>> 
>> However, on a moderately sized dataset (< 1M triples), this query sometimes
>> takes /minutes/.
>> I suspect it has to do with the disk-based TDB (since I hear my HDD spin a
>> lot), but still.
>> Is there a way to optimise this query, maybe by using a different reasoner?
>> And if so, how would that reasoner be used?
>> 
>> Thanks in advance,
>> 
>> Joël
>> 
> 
> If you are using a reasoner and also doing rdfs:subClassOf*, then that is doing
> excessive redundant work.
> 
> (Otherwise, with TDB it materialises more nodes than it needs to.)
> 
> The root is a node without a parent, so this might help, without a reasoner:
> 
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> 
> SELECT ?top
> WHERE {
>   GRAPH ?g {
>     ?concept rdfs:subClassOf ?top
>     FILTER NOT EXISTS { ?top rdfs:subClassOf ?x }
>   }
> }
> 



Re: Optimising path to root concept SPARQL query

2016-02-01 Thread Joël Kuiper

> On 01 Feb 2016, at 13:49, Andy Seaborne wrote:
> 
> On 01/02/16 12:11, Joël Kuiper wrote:
>> Hey all,
>> 
>> I’m trying to run a query to find the path to the root concept of a graph.
>> The entries are defined using rdfs:subClassOf.
>> 
>> Currently I’m using:
>> 
>> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
>> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>> 
>> SELECT ?parent ?label
>> WHERE {
>>   GRAPH ?g {
>>     ?concept rdfs:subClassOf* ?parent
>>   }}
>> 
>> However, on a moderately sized dataset (< 1M triples), this query sometimes
>> takes /minutes/.
>> I suspect it has to do with the disk-based TDB (since I hear my HDD spin a
>> lot), but still.
>> Is there a way to optimise this query, maybe by using a different reasoner?
>> And if so, how would that reasoner be used?
>> 
>> Thanks in advance,
>> 
>> Joël
>> 
> 
> If you are using a reasoner and also doing rdfs:subClassOf*, then that is doing
> excessive redundant work.
> 
> (Otherwise, with TDB it materialises more nodes than it needs to.)
> 
> The root is a node without a parent, so this might help, without a reasoner:
> 
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> 
> SELECT ?top
> WHERE {
>   GRAPH ?g {
>     ?concept rdfs:subClassOf ?top
>     FILTER NOT EXISTS { ?top rdfs:subClassOf ?x }
>   }
> }
> 

I wasn’t using a reasoner yet (it’s just the bare Fuseki TDB assembler file).
If I understand correctly, your query does not find the path through all the
intermediate nodes?
E.g. A subClassOf B subClassOf C …?
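One way to get the chain itself (A, B, C, …) rather than an unordered ancestor set is to walk upward with breadth-first search, keeping a back-pointer for every class visited; the first class without a superclass ends the shortest chain. A plain-Python sketch with hypothetical class names, where `parents` would be filled from the (subclass, superclass) pairs:

```python
from collections import deque


def shortest_path_to_root(parents, start):
    """Shortest chain start -> ... -> root, where a root has no superclass.

    Returns None if every upward branch loops back (no root reachable).
    """
    came_from = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        supers = parents.get(node, [])
        if not supers:                      # reached a root: rebuild the chain
            chain = []
            while node is not None:
                chain.append(node)
                node = came_from[node]
            return chain[::-1]
        for parent in supers:
            if parent not in came_from:     # visit each class once, so cycles terminate
                came_from[parent] = node
                queue.append(parent)
    return None


parents = {"A": ["B"], "B": ["C"]}
print(shortest_path_to_root(parents, "A"))   # ['A', 'B', 'C']
```

When a class has several superclasses this returns only one of the possible chains (the shortest); it does not say which chain is the "right" one.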

Thanks :-) 



Re: Optimising path to root concept SPARQL query

2016-02-01 Thread Andy Seaborne

On 01/02/16 12:11, Joël Kuiper wrote:

Hey all,

I’m trying to run a query to find the path to the root concept of a graph.
The entries are defined using rdfs:subClassOf.

Currently I’m using:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?parent ?label
WHERE {
  GRAPH ?g {
    ?concept rdfs:subClassOf* ?parent
  }}

However, on a moderately sized dataset (< 1M triples), this query sometimes takes
/minutes/.
I suspect it has to do with the disk-based TDB (since I hear my HDD spin a
lot), but still.
Is there a way to optimise this query, maybe by using a different reasoner? And
if so, how would that reasoner be used?

Thanks in advance,

Joël



If you are using a reasoner and also doing rdfs:subClassOf*, then that is
doing excessive redundant work.

(Otherwise, with TDB it materialises more nodes than it needs to.)
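A toy illustration of that redundancy in plain Python (not Jena's reasoner): once the subClassOf transitive closure has been materialised, as RDFS rule rdfs11 does, a single-step lookup already returns every ancestor, whereas walking `+` over the closed graph returns the same set while inspecting more edges. The chain `A -> B -> C -> D` is hypothetical, and the sketch models only rdfs11, not the other RDFS rules.

```python
def materialise_closure(edges):
    """Naive fixpoint of rdfs11: (a sub b) and (b sub c) entail (a sub c)."""
    closed = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed


def one_step(edges, start):
    """`start rdfs:subClassOf ?x` with no path operator."""
    return {sup for sub, sup in edges if sub == start}


def path_plus(edges, start):
    """`start rdfs:subClassOf+ ?x` by graph search; also counts edges inspected."""
    parents = {}
    for sub, sup in edges:
        parents.setdefault(sub, []).append(sup)
    seen, stack, inspected = set(), [start], 0
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            inspected += 1
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen, inspected


asserted = {("A", "B"), ("B", "C"), ("C", "D")}
inferred = materialise_closure(asserted)

print(len(inferred))                    # 6 triples after materialisation
print(sorted(one_step(inferred, "A")))  # ['B', 'C', 'D']
print(path_plus(asserted, "A")[1])      # 3 edges inspected
print(path_plus(inferred, "A")[1])      # 6 edges inspected for the same answer
```

On a deep hierarchy the gap widens, since every inferred shortcut edge is inspected again by the path search.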

The root is a node without a parent, so this might help, without a reasoner:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?top
WHERE {
  GRAPH ?g {
    ?concept rdfs:subClassOf ?top
    FILTER NOT EXISTS { ?top rdfs:subClassOf ?x }
  }
}
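The same root test in plain Python, as a sketch over a hypothetical list of (subclass, superclass) pairs. One caveat if a reasoner is added later: RDFS entailment also produces reflexive triples `?c rdfs:subClassOf ?c` for classes (rule rdfs10), and then `FILTER NOT EXISTS { ?top rdfs:subClassOf ?x }` rejects every candidate. The sketch therefore ignores self-loops by default, which in SPARQL would correspond to adding `FILTER(?x != ?top)` inside the NOT EXISTS block.

```python
def roots(edges, ignore_self_loops=True):
    """Superclasses with no superclass of their own: the FILTER NOT EXISTS test.

    With ignore_self_loops=True, a reflexive (c, c) pair does not count as a parent.
    """
    has_parent = {
        sub for sub, sup in edges
        if not (ignore_self_loops and sub == sup)
    }
    return {sup for _, sup in edges if sup not in has_parent}


edges = [
    ("J86", "J85-J86"),
    ("J85-J86", "J00-J99"),
    ("J00-J99", "owl:Thing"),
    ("owl:Thing", "owl:Thing"),   # the kind of reflexive triple a reasoner adds
]
print(roots(edges))                           # {'owl:Thing'}
print(roots(edges, ignore_self_loops=False))  # set()
```

Returning a set also removes the duplicates the SPARQL version produces, since it yields one ?top row per direct child of the root.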



Optimising path to root concept SPARQL query

2016-02-01 Thread Joël Kuiper
Hey all, 

I’m trying to run a query to find the path to the root concept of a graph. 
The entries are defined using rdfs:subClassOf.

Currently I’m using:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?parent ?label
WHERE {
  GRAPH ?g {
    ?concept rdfs:subClassOf* ?parent
  }}

However, on a moderately sized dataset (< 1M triples), this query sometimes takes
/minutes/.
I suspect it has to do with the disk-based TDB (since I hear my HDD spin a
lot), but still.
Is there a way to optimise this query, maybe by using a different reasoner? And
if so, how would that reasoner be used?

Thanks in advance, 

Joël