[ 
https://issues.apache.org/jira/browse/JENA-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424468#comment-17424468
 ] 

Justin commented on JENA-2176:
------------------------------

Hi [~andy],

> When looked up in the data, there is simply no match. It is simpler to 
> delegate this to the lookup than test the subject every time.

But doesn't the lookup involve disk access? It must take some amount of time, 
right? Even if it is simpler to implement, it could be costly for some queries.

I was investigating why a query was taking longer than we expected, and I saw 
many such lookups. When I rearranged the triple patterns in the query to 
minimize these lookups, the query did speed up.

If an appreciable amount of time is spent on lookups for things that we know 
in advance will not be found, that feels like a mistake.

> In the case of TDB2 (and TDB1) execution of a basic graph pattern is not by 
> RDF term but by internal id, and the internal ids do not indicate whether it 
> is a URI, blank node or literal.

Ah, so that is why this optimization isn't just a couple-line patch, right?


> There are some optimizations possible by knowing a variable can only be a URI 
> but they are not general and so can not be used everywhere.

I wonder if we could tell Jena that it can assume certain properties are 
object properties and others are datatype properties. Would that be far enough 
upstream to allow a general optimization? If a variable was bound from the 
object position of a triple with a datatype property, then Jena should never 
need to do a lookup with that variable in the subject position of a triple 
pattern.
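
To make the idea concrete, here is a minimal sketch in Python (not Jena code; 
the property set and the function name are made up for illustration). It 
assumes we could declare up front which properties are datatype properties, 
and that the earlier patterns are required matches rather than optional ones:

```python
# Hypothetical declaration: properties whose objects are always literals.
DATATYPE_PROPERTIES = {"http://example.com/hasName"}


def is_known_literal(variable, earlier_patterns):
    """Return True if `variable` must hold a literal.

    `earlier_patterns` is a list of (subject, predicate, object) tuples for
    the required triple patterns evaluated before the current one. If any of
    them binds `variable` in the object position of a declared datatype
    property, the value can only be a literal, so a lookup with it in the
    subject position can never match and could be skipped.
    """
    return any(
        obj == variable and pred in DATATYPE_PROPERTIES
        for (_subj, pred, obj) in earlier_patterns
    )
```

With the data from the issue, ex:hasName would qualify as a datatype property, 
while ex:hasPart would not, because it has both a URI and a literal as 
objects. So the "lala" lookup in that example would still need some other 
mechanism.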


Do you have any tips on how to get the total time spent doing lookups for each 
BGP? I'm thinking of something like a heatmap that could be overlaid on the 
query to see where time is being spent.
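
The kind of data I have in mind could be collected roughly like this (again a 
Python sketch rather than Jena code; `timed_lookup` and the `lookup` callable 
are hypothetical stand-ins for whatever actually executes a pattern). It 
accumulates wall-clock time and call counts per pattern:

```python
import time
from collections import defaultdict

# Accumulated seconds and number of lookups, keyed by the pattern text.
total_seconds = defaultdict(float)
lookup_counts = defaultdict(int)


def timed_lookup(pattern, lookup):
    """Run `lookup()` and charge its elapsed time to `pattern`."""
    start = time.perf_counter()
    try:
        return lookup()
    finally:
        # Recorded even if the lookup raises, so failed lookups are counted.
        total_seconds[pattern] += time.perf_counter() - start
        lookup_counts[pattern] += 1
```

Sorting `total_seconds` by value would then show which patterns dominate.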


Thanks,

Justin

> TDB2 queries can execute quadpatterns with a literal in the subject position
> ----------------------------------------------------------------------------
>
>                 Key: JENA-2176
>                 URL: https://issues.apache.org/jira/browse/JENA-2176
>             Project: Apache Jena
>          Issue Type: Question
>          Components: TDB2
>    Affects Versions: Jena 4.2.0
>            Reporter: Justin
>            Priority: Major
>         Attachments: z.rq, z.ttl
>
>
> Hello,
> If you try to put a triple into a TDB2 with a literal in the subject position 
> you get the following:
> {noformat}
>  ERROR riot :: [line: 6, col: 18] Subject is not a URI or blank node
> {noformat}
> So far so good.
> But since literals can not be in the subject position of a triple a query 
> against a TDB2 should never attempt to find a literal in the subject position 
> of a triple, right? It would be a waste of time.
> But if I am reading the logs correctly that is what appears to happen:
> {noformat}
> root@ec6206bb523f:/mnt/tdb_42# cat /mnt/z.ttl 
 @prefix ex: <http://example.com/> .
> ex:apple ex:hasPart ex:skin .
>  ex:skin ex:hasName "Skin" .
>  ex:file ex:hasPart "lala" .
> root@ec6206bb523f:/mnt/tdb_42# 
>  root@ec6206bb523f:/mnt/tdb_42# cat /mnt/z.rq 
 prefix ex: <http://example.com/>
> select * where
> { ?s ex:hasPart ?o . optional { ?o ?p ?o1 . }
> }
>  
> root@ec6206bb523f:/mnt/tdb_42# /mnt/apache-jena-4.2.0/bin/tdb2.tdbloader 
> --loc=`pwd` /mnt/z.ttl
>  00:31:49 INFO loader :: Loader = LoaderPhased
>  00:31:49 INFO loader :: Start: /mnt/z.ttl
>  00:31:49 INFO loader :: Finished: /mnt/z.ttl: 3 tuples in 0.07s (Avg: 40)
>  00:31:49 INFO loader :: Finish - index SPO
>  00:31:49 INFO loader :: Start replay index SPO
>  00:31:49 INFO loader :: Index set: SPO => SPO->POS, SPO->OSP
>  00:31:49 INFO loader :: Index set: SPO => SPO->POS, SPO->OSP [3 items, 0.0 
> seconds]
>  00:31:49 INFO loader :: Finish - index OSP
>  00:31:49 INFO loader :: Finish - index POS
>  root@ec6206bb523f:/mnt/tdb_42# /mnt/apache-jena-4.2.0/bin/tdb2.tdbquery -v 
> --loc=`pwd` --query=/mnt/z.rq
 1 PREFIX ex: <http://example.com/>
 2
 3 SELECT *
 4 WHERE
 5 { ?s ex:hasPart ?o
 6 OPTIONAL
 7 { ?o ?p ?o1 }
 8 }
>  
>  00:31:59 INFO exec :: QUERY
 PREFIX ex: <http://example.com/>
>  
>  SELECT *
>  WHERE
>  
>  { ?s ex:hasPart ?o OPTIONAL  { ?o ?p ?o1 }
> }
>  00:31:59 INFO exec :: ALGEBRA
>  (conditional
>  (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s 
> <http://example.com/hasPart> ?o))
>  (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?o ?p ?o1)))
>  00:32:00 INFO exec :: TDB
>  (conditional
>  (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?s 
> <http://example.com/hasPart> ?o))
>  (quadpattern (quad <urn:x-arq:DefaultGraphNode> ?o ?p ?o1)))
 00:32:00 INFO exec :: Execute :: ?s <http://example.com/hasPart> ?o
>  00:32:00 INFO exec :: TDB
 (quadpattern (quad <urn:x-arq:DefaultGraphNode> <http://example.com/skin> 
> ?p ?o1))
 00:32:00 INFO exec :: Execute :: <http://example.com/skin> ?p ?o1
>  00:32:00 INFO exec :: TDB
>  (quadpattern (quad <urn:x-arq:DefaultGraphNode> "lala" ?p ?o1))
>  00:32:00 INFO exec :: Execute :: "lala" ?p ?o1
>  --------------------------------------------
> |s|o|p|o1|
> ============================================
> |ex:apple|ex:skin|ex:hasName|"Skin"|
> |ex:file|"lala"| | |
> --------------------------------------------
> {noformat}
> Doesn't this:
> {noformat}
>  00:32:00 INFO exec :: TDB
>  (quadpattern (quad <urn:x-arq:DefaultGraphNode> "lala" ?p ?o1))
>  00:32:00 INFO exec :: Execute :: "lala" ?p ?o1
> {noformat}
>  mean a lookup was done in the TDB2 for a triple with the literal "lala" in 
> the subject position? If so, shouldn't lookups like that be ignored as they 
> will never find matching triples in the TDB2?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
