On 5/24/11 3:33 PM, Curran Kelleher wrote:
Hi Kingsley,
Thanks for your clarification, but I don't understand why 'first
encounter != distinct'. I was thinking that DISTINCT just causes
duplicate solutions to be excluded from the result set, just like
DISTINCT in SQL.
Yes, but you are speaking about a result set with a Distinct item rather
than a solution to the question: Find me all distinct properties in a
3-tuple, then chop my solution down to a single record, so to speak.
The SPARQL reference states
<http://www.w3.org/TR/rdf-sparql-query/#modDistinct> "The DISTINCT
solution modifier eliminates duplicate solutions.
Yes, but that isn't contrary to the point above.
Specifically, each solution that binds the same variables to the same
RDF terms as another solution is eliminated from the solution set."
This sounds like a first encounter would be added to the result set,
and any subsequent encounters would simply not be added to the result set.
Yes, but to the DBMS, the cost of figuring out that subsequent encounters
exist != 0.
In my (probably naive) understanding, when the engine sees a solution
that has not yet been added to the result set, the engine would add it
to the result set. At that point with our example query, the LIMIT
would be reached and the result could be returned without traversing
any more triples. Am I missing something?
Digest my comments above re. actual cost-optimization matter. It isn't 0
and it isn't just a case of first match.
Is there some reason why the engine would need to compute the entire
result set before applying the limit? Granted, the SPARQL
reference says "duplicates are eliminated before either limit or
offset is applied", but this is in terms of the abstract result set
specification and can be optimized around (i.e. check the LIMIT
condition after adding each solution to the result set) without
changing the correctness of the results returned. Thanks again!
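The early-exit idea raised above (checking the LIMIT after each newly seen solution) can be sketched as a toy model in Python. This is only an illustration, assuming solutions arrive as a stream in scan order; the names `distinct_with_limit` and `counting_scan` are hypothetical, and nothing here describes how Virtuoso or the DBpedia endpoint actually plans or executes the query.

```python
from typing import Hashable, Iterable, Iterator, List, Tuple

# (subject, property, object); a toy stand-in for a triple store (assumption)
Triple = Tuple[str, str, str]


def distinct_with_limit(values: Iterable[Hashable], limit: int) -> Iterator[Hashable]:
    """Yield distinct values in first-seen order, stopping once `limit` are found."""
    if limit <= 0:
        return
    seen = set()  # bookkeeping that DISTINCT requires and plain LIMIT does not
    for value in values:
        if value in seen:
            continue
        seen.add(value)
        yield value
        if len(seen) >= limit:
            return  # early exit: no further input is read


def counting_scan(triples: List[Triple], counter: List[int]) -> Iterator[str]:
    """Yield the property of each triple, recording how many triples were read."""
    for _subject, prop, _obj in triples:
        counter[0] += 1
        yield prop


triples = [
    ("s1", "p1", "o1"),
    ("s2", "p1", "o2"),
    ("s3", "p2", "o3"),
    ("s4", "p3", "o4"),
]

# LIMIT 1: the first triple already supplies one distinct property.
read = [0]
first = list(distinct_with_limit(counting_scan(triples, read), limit=1))

# LIMIT 2: the second distinct property only shows up at the third triple.
read_two = [0]
first_two = list(distinct_with_limit(counting_scan(triples, read_two), limit=2))
```

In this model the amount of input read depends on where the distinct values fall in the scan order, and the `seen` set is the extra cost DISTINCT adds. Whether a real engine can stream results this way depends on the query plan it chooses.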
The delta you see when comparing SPARQL queries with or without DISTINCT
== the cost of determining, via the query optimizer, that the DISTINCT
condition is true.
Kingsley
Best regards,
Curran
On Tue, May 24, 2011 at 1:16 PM, Kingsley Idehen
<kide...@openlinksw.com> wrote:
On 5/24/11 1:08 PM, Curran Kelleher wrote:
Greetings,
The problem remains, the following query doesn't execute on the
public DBPedia endpoint
<http://dbpedia.org/snorql/?query=select+distinct+%3Fproperty+where+%7B%0D%0A+++++%3Fs+%3Fproperty+%3Fo.%0D%0A%7D+limit+1>,
even with a limit:
select distinct ?property where {
    ?s ?property ?o.
} limit 1
Without 'distinct' it does work:
select ?property where {
    ?s ?property ?o.
} limit 1
Why might this be?
Because Distinct requires more work.
Shouldn't the engine be able to work this one out quickly even
with 'distinct', as it needs to only traverse a single triple to
compute the result?
Really? First encounter != distinct :-)
It seems the engine is doing some unnecessary computation to do
with 'distinct' and is timing out because of it.
LIMIT doesn't simplify the Distinct computation. It simply limits
the resultset size.
Kingsley
Best regards,
Curran
On Tue, May 24, 2011 at 9:17 AM, Kingsley Idehen
<kide...@openlinksw.com> wrote:
On 5/24/11 8:30 AM, Mohamed Morsey wrote:
> Hi Sarasi,
>
> I've performed that query with limit 1000, and it worked on one of our
> local endpoints, and it ended within 3 minutes.
> So I guess that the maximum time allowed for a query on the official
> endpoint is relatively low, but the query itself is executable with limit.
>
> Hope that helps.
>
All,
If we want to have a live DBpedia endpoint that serves the whole world,
we have to have it configured in such a way that it forces use of OFFSET
and LIMIT.
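As a rough client-side illustration of working with OFFSET and LIMIT, the sketch below builds a sequence of paged SPARQL queries in Python. The helper name `paged_queries` and the page size of 1000 are assumptions made for the example, not settings of the DBpedia endpoint. ORDER BY is included because SPARQL only makes LIMIT/OFFSET slices predictable when the results are ordered; on a large dataset that ordering has a cost of its own.

```python
from typing import Iterator


def paged_queries(select_body: str, order_by: str, page_size: int, pages: int) -> Iterator[str]:
    """Yield SPARQL query strings that walk a result set one page at a time."""
    for page in range(pages):
        offset = page * page_size
        yield f"{select_body}\nORDER BY {order_by}\nLIMIT {page_size}\nOFFSET {offset}"


# Hypothetical query body, modelled on the queries discussed in this thread.
body = "SELECT ?property WHERE {\n    ?s ?property ?o .\n}"
queries = list(paged_queries(body, "?property", page_size=1000, pages=3))
```

Each string in `queries` can then be sent as a separate request, so that no single request asks for the whole result set.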
We are very experienced with DBMS-oriented data exposed to massive numbers
of concurrent users, and this dates back to periods prior to the pervasive
Web of today, thus we've configured the DBpedia SPARQL endpoint with this
experience in hand.
Once again, the DBpedia endpoint is for everyone, so we deliberately
protect against inconsiderate use, e.g. attempting to get massive result
sets in single query passes at the expense of others.
Also remember, you can make application or service specific instances of
the DBpedia SPARQL endpoint via a number of offerings on EC2 if you seek
hassle free reconstruction of DBpedia.
Links:
1. http://blog.dbpedia.org/2011/01/31/dbpedia-36-ami-available/ .
--
Regards,
Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
------------------------------------------------------------------------------
vRanger cuts backup time in half-while increasing security.
With the market-leading solution for virtual backup and recovery,
you get blazing-fast, flexible, and affordable data protection.
Download your free trial now.
http://p.sf.net/sfu/quest-d2dcopy1
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
<mailto:Dbpedia-discussion@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion