On 6 December 2011 15:44, Jérôme <[email protected]> wrote:

> Thank you Andy,
>
> it was the cost of serializing and deserializing.
>
> My second problem (yes, i have another one ;-) ) is:
>

By the way - replying to unrelated threads and changing the subject risks
your email not being seen.  I, for one, don't always check threads that I'm
not involved in.


>
> The goal of my queries is to find "paragraphs" that contain
> "words" matching a regex.
> My triplestore stores approximately 1.600.000 triples.
> For example: find paragraphs (in my RDF model) containing the word
> "example" - here the corresponding query:
>
> PREFIX ram:<...>
> PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>
> SELECT ?Response
> WHERE
> {
> ?Response rdf:type <http://www.tei-c.org/ns/1.0#p> .
> ?Objet_1 rdf:type <http://prodescartes.greyc.fr/annotations#word> .
> ?Objet_1 ram:contents ?Objet_1_content .
> FILTER regex(?Objet_1_content,"example") .
> ?Response ram:contains ?Objet_1 .
> }
>
> I get the result in 0.5 seconds
>
> Now, when I'm looking for paragraphs containing "example" and "help":
>
> SELECT ?Response
> WHERE
> {
>
> ?Response rdf:type <http://www.tei-c.org/ns/1.0#p> .
>
> ?Objet_1 rdf:type <http://example.com#word> .
> ?Objet_1 ram:contents ?Objet_1_content .
> FILTER regex(?Objet_1_content,"example") .
> ?Response ram:contains ?Objet_1 .
>
> ?Objet_2 rdf:type <http://example.com#word> .
> ?Objet_2 ram:contents ?Objet_2_content .
> FILTER regex(?Objet_2_content,"help") .
> ?Response ram:contains ?Objet_2 .
>
> }
>
> I get the result in...10 minutes. ResultSet is around 50 results.
>
> Why is it so long?
>

It's doing a cross-product of the results, but you're also asking the
question in a complicated way.

Try:

SELECT ?Response
WHERE
{
  ?Response rdf:type <http://www.tei-c.org/ns/1.0#p> .
  ?Objet_1 rdf:type <http://example.com#word> .
  ?Objet_1 ram:contents ?Objet_1_content .
  FILTER (regex(?Objet_1_content,"example")
       && regex(?Objet_1_content,"help") )
  ?Response ram:contains ?Objet_1 .
}


>
> The "funniest" is when i remove constraints on words:
> I remove those 2 lines:
> ?Objet_1 rdf:type <http://example.com#word> .
> ?Objet_2 rdf:type <http://example.com#word> .
>
> Fuseki answers me faster...
>

Less work to do.

With cross products in a query (two triple patterns not connected by sharing
a variable) there can be a multiplication of additional work.  The
optimizer should have chosen a different strategy, but it is better to write
the query as above.
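
A minimal sketch of the two-word case, assuming the two words may be
different tokens in the same paragraph (the single-variable query above only
matches one word that satisfies both regexes).  Each triple pattern here
shares a variable with a pattern before it, so no step starts from an
unconnected set of bindings.  Whether the engine keeps the written order
depends on its optimizer, so treat this as something to try rather than a
guaranteed fix.  The ram: prefix stays elided, as in the original message.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ram: <...>   # elided in the original message; fill in the real namespace

SELECT ?Response
WHERE
{
  # Start from the first word and filter it early.
  ?Objet_1 ram:contents ?Objet_1_content .
  FILTER regex(?Objet_1_content, "example")
  ?Objet_1 rdf:type <http://example.com#word> .

  # Walk to the paragraph that contains that word.
  ?Response ram:contains ?Objet_1 .
  ?Response rdf:type <http://www.tei-c.org/ns/1.0#p> .

  # Only now look at the other words of the same paragraph.
  ?Response ram:contains ?Objet_2 .
  ?Objet_2 rdf:type <http://example.com#word> .
  ?Objet_2 ram:contents ?Objet_2_content .
  FILTER regex(?Objet_2_content, "help")
}
```

The result set is the same as for the original two-word query; only the
order in which the patterns are written changes.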


>
> Thank you.
> Jérôme
>

Andy
