I'm not a Rya expert, but generally speaking, in any RDF/SPARQL system with 
large data, string comparisons on parts of IRIs are a bad idea: every candidate 
binding that reaches the filter has to be converted to a string and then matched 
against the regex.  That is very expensive over large data.

A good data model usually avoids the need for string searches on parts of 
resource IRIs.

Does the DBpedia data model provide any help in this area?  E.g. does it 
declare list pages as being of type List, and category pages as being of type 
Category?  If so, you could use a direct equal/not-equal comparison on 
resources instead of a string conversion plus regex match, which is usually 
much more efficient.

E.g. 

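  PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  SELECT * WHERE {
     ?pages rdfs:label ?labels .
     # Exclude a page if ANY of its rdf:type values is Category or List;
     # a plain ?type != ... filter would still let such a page through
     # via its other types.
     FILTER NOT EXISTS { ?pages rdf:type <http://dbpedia.org/ontology/Category> }
     FILTER NOT EXISTS { ?pages rdf:type <http://dbpedia.org/ontology/List> }
     ?pages ?property ?objects .
  }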
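
If DBpedia turns out not to declare such types, a fallback that still works on 
the IRIs is the SPARQL 1.1 STRSTARTS function in place of regex.  This is only a 
sketch: it avoids regular-expression matching, but every candidate IRI is still 
converted to a string, so it lowers the per-row cost without reducing how many 
rows reach the filter.  Whether Rya evaluates STRSTARTS any faster than regex 
has not been verified here.

  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  SELECT * WHERE {
     ?pages rdfs:label ?labels .
     # Plain prefix tests instead of regex; still a per-row STR() conversion.
     FILTER (!STRSTARTS(STR(?pages), "http://dbpedia.org/resource/Category:"))
     FILTER (!STRSTARTS(STR(?pages), "http://dbpedia.org/resource/List"))
     ?pages ?property ?objects .
  }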


--
Mark Wallace

-----Original Message-----
From: Puja Valiyil [mailto:puja...@gmail.com] 
Sent: Wednesday, September 07, 2016 4:00 PM
To: dev@rya.incubator.apache.org
Subject: Re: regarding slow query execution

Hi Pranav,
I'm not sure that the free text index would be useful in your case -- we assume 
that regexes are applied to string literals (not URIs converted to strings).
I think you would probably have the best luck breaking your query into a few 
queries, since that would avoid doing a complex join client side, but I'm not 
sure anything would really get this query into the sub-second or minutes range 
for total execution time.  The last statement in your query 
(?pages ?property ?objects) is particularly expensive, so if you could 
logically move that into a separate query, you'd probably see a performance 
improvement.
Someone who is more familiar with SPARQL may be able to suggest a way of doing 
that as a single query, but I'm not sure offhand how (maybe using the DESCRIBE 
or CONSTRUCT keywords?).
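
A rough sketch of that split, assuming a client loop drives it: the first query 
returns only the filtered page IRIs and labels, and the second is run once per 
batch of IRIs from the first, using a SPARQL 1.1 VALUES block.  The two resource 
IRIs in the VALUES block are hypothetical placeholders, the batch size is up to 
the client, and whether Rya handles VALUES efficiently has not been verified.

  # Query 1: filtered pages only (same filters as the original query).
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  SELECT ?pages ?labels WHERE {
     ?pages rdfs:label ?labels .
     FILTER (!regex(str(?pages), '^http://dbpedia.org/resource/Category:'))
     FILTER (!regex(str(?pages), '^http://dbpedia.org/resource/List'))
  }

  # Query 2: run separately for each batch of ?pages values from query 1.
  # The IRIs below are hypothetical placeholders.
  SELECT ?pages ?property ?objects WHERE {
     VALUES ?pages { <http://dbpedia.org/resource/Example_A>
                     <http://dbpedia.org/resource/Example_B> }
     ?pages ?property ?objects .
  }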

On Wed, Sep 7, 2016 at 8:22 AM, pranav.puri <pranav.p...@orkash.com> wrote:

> Hi
>
> I am using simple RDF regex filters.
>
> The query being evaluated is this:
>
> SELECT * WHERE {
> ?pages <http://www.w3.org/2000/01/rdf-schema#label> ?labels .
> FILTER (!regex(str(?pages),'^http://dbpedia.org/resource/Category:')).
> FILTER (!regex(str(?pages), '^http://dbpedia.org/resource/List')).
> ?pages ?property ?objects .
>  }
>
> Can you please give me some direction on using the free text 
> index or pre-computed joins to perform free text search?
>
>
>
> On Tuesday 06 September 2016 06:58 PM, Puja Valiyil wrote:
>
>> Hi Pranav,
>> I think this is likely due to the fact that filters are being 
>> evaluated client side (so it's possible you are bringing back all 110 
>> million triples multiple times in your query). Can you send us your query so 
>> we can verify?
>>
>> Sent from my iPhone
>>
>> On Sep 6, 2016, at 8:49 AM, Meier, Caleb <caleb.me...@parsons.com> wrote:
>>>
>>> Hey Pranav,
>>>
>>> Even if the filtering is occurring client side, that is still 
>>> strange behavior.  What does your query look like?
>>>
>>>
>>> -----Original Message-----
>>> From: pranav.puri [mailto:pranav.p...@orkash.com]
>>> Sent: Monday, September 05, 2016 3:38 AM
>>> To: dev@rya.incubator.apache.org
>>> Subject: regarding slow query execution
>>>
>>> Dear All
>>>
>>>   The execution time for SPARQL queries with regex filters is much
>>> longer (10-12 minutes) than for queries with no filters applied.
>>>
>>> The queries are being run on tables containing triples from the DBpedia
>>> dataset.
>>> Each table (i.e. spo, osp, po) contains 110 million entries. I am 
>>> currently using a three-node Accumulo cluster.
>>>
>>> Please suggest some ways to improve the query execution time.
>>>
>>> Regards
>>> Pranav
>>>
>>
>
