On 5/24/11 3:33 PM, Curran Kelleher wrote:
> Hi Kingsley,
>
> Thanks for your clarification, but I don't understand why 'first encounter != distinct'. I was thinking that DISTINCT just causes duplicate solutions to be excluded from the result set, just like DISTINCT in SQL.

Yes, but you are describing a result set that contains a distinct item, rather than the solution to the question you actually asked: find me all distinct properties across the 3-tuples (triples), then chop that solution down to a single record, so to speak.

> The SPARQL reference states <http://www.w3.org/TR/rdf-sparql-query/#modDistinct>: "The DISTINCT solution modifier eliminates duplicate solutions.

Yes, but that doesn't contradict my point above.

> Specifically, each solution that binds the same variables to the same RDF terms as another solution is eliminated from the solution set." This sounds like a first encounter would be added to the result set, and any subsequent encounters would simply not be added to the result set.

Yes, but to the DBMS, the cost of figuring out whether subsequent encounters exist != 0.

> In my (probably naive) understanding, when the engine sees a solution that has not yet been added to the result set, the engine would add it to the result set. At that point, with our example query, the LIMIT would be reached and the result could be returned without traversing any more triples. Am I missing something?

Digest my comments above regarding the actual cost-optimization matter. The cost isn't 0, and it isn't just a case of first match.

> Is there some reason why the engine would need to compute the entire result set before applying the limit? Granted, the SPARQL reference says "duplicates are eliminated before either limit or offset is applied", but this is in terms of the abstract result set specification and can be optimized around (i.e. check the LIMIT condition after adding each solution to the result set) without changing the correctness of the results returned. Thanks again!

The delta you see when comparing SPARQL queries with and without DISTINCT == the cost, via the query optimizer, of determining that the DISTINCT condition is True.
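To make the two positions above concrete, here is a minimal Python sketch contrasting two ways of evaluating `SELECT DISTINCT ?property ... LIMIT n`: materializing every distinct value before applying LIMIT (the order the spec describes abstractly), versus streaming with a seen-set and stopping once n new values have been emitted. It is not how Virtuoso or any other real engine is implemented; the function names and the toy triples are hypothetical. Each function returns the answer plus the number of triples it scanned.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def materialize_then_limit(triples: Iterable[Triple], limit: int) -> Tuple[List[str], int]:
    """Compute the full DISTINCT set first, then apply LIMIT."""
    seen = set()
    ordered: List[str] = []
    scanned = 0
    for _s, p, _o in triples:
        scanned += 1
        if p not in seen:  # duplicate check on every solution
            seen.add(p)
            ordered.append(p)
    return ordered[:limit], scanned


def stream_with_early_exit(triples: Iterable[Triple], limit: int) -> Tuple[List[str], int]:
    """Emit each new property as soon as it is seen; stop once LIMIT is reached."""
    seen = set()
    out: List[str] = []
    scanned = 0
    for _s, p, _o in triples:
        scanned += 1
        if p not in seen:  # still one hash lookup per solution
            seen.add(p)
            out.append(p)
            if len(out) == limit:
                break  # remaining triples are never examined
    return out, scanned


# Hypothetical toy data: the first two triples share the property "p1".
triples = [("s1", "p1", "o1"), ("s2", "p1", "o2"), ("s3", "p2", "o3")]
print(materialize_then_limit(triples, 1))  # (['p1'], 3)
print(stream_with_early_exit(triples, 1))  # (['p1'], 1)
print(stream_with_early_exit(triples, 2))  # (['p1', 'p2'], 3)
```

Under these assumptions, streaming makes LIMIT 1 cheap, as Curran argues, but each additional row requested means scanning past duplicates with a lookup per solution, which is the non-zero cost Kingsley points to. Whether a particular engine's plan streams like this is not something the sketch establishes.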

Kingsley

> Best regards,
> Curran

On Tue, May 24, 2011 at 1:16 PM, Kingsley Idehen <kide...@openlinksw.com <mailto:kide...@openlinksw.com>> wrote:

    On 5/24/11 1:08 PM, Curran Kelleher wrote:
    > Greetings,
    >
    > The problem remains: the following query doesn't execute on the
    > public DBpedia endpoint
    > <http://dbpedia.org/snorql/?query=select+distinct+%3Fproperty+where+%7B%0D%0A+++++%3Fs+%3Fproperty+%3Fo.%0D%0A%7D+limit+1>,
    > even with a limit:
    >
    > select distinct ?property where {
    >      ?s ?property ?o.
    > } limit 1
    >
    > Without 'distinct' it does work:
    >
    > select ?property where {
    >      ?s ?property ?o.
    > } limit 1
    >
    > Why might this be?

    Because Distinct requires more work.


    > Shouldn't the engine be able to work this one out quickly even
    > with 'distinct', as it needs to only traverse a single triple to
    > compute the result?

    Really? First encounter != distinct :-)

    > It seems the engine is doing some unnecessary computation to do
    > with 'distinct' and is timing out because of it.

    LIMIT doesn't simplify the DISTINCT computation. It simply limits
    the result set size.

    Kingsley



    > Best regards,
    > Curran

    On Tue, May 24, 2011 at 9:17 AM, Kingsley Idehen
    <kide...@openlinksw.com <mailto:kide...@openlinksw.com>> wrote:

        On 5/24/11 8:30 AM, Mohamed Morsey wrote:
        > Hi Sarasi,
        >
        > I've performed that query with limit 1000, and it worked on one of
        > our local endpoints, and it ended within 3 minutes.
        > So I guess that the maximum time allowed for a query on the
        > official endpoint is relatively low, but the query itself is
        > executable with limit.
        >
        > Hope that helps.
        >
        All,

        If we want to have a live DBpedia endpoint that serves the whole
        world, we have to have it configured in such a way that it forces
        use of OFFSET and LIMIT.

        We are very experienced with DBMS-oriented data exposed to massive
        numbers of concurrent users, and this experience dates back to
        periods prior to the pervasive Web of today; thus we've configured
        the DBpedia SPARQL endpoint with this experience in hand.

        Once again, the DBpedia endpoint is for everyone, so we deliberately
        protect against inconsiderate use, e.g. attempting to get massive
        result sets in single query passes at the expense of others.
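A minimal Python sketch of the OFFSET/LIMIT paging pattern described above. The helper name `paged_queries`, the page size of 1000 (borrowed from Mohamed's test) and the page count are illustrative assumptions, not endpoint settings. ORDER BY is included because the SPARQL spec notes that LIMIT/OFFSET slices are only useful when the solution order is made predictable. The sketch only builds query strings; sending them to an endpoint is left out.

```python
from typing import Iterator

# Hypothetical base query: the DISTINCT-properties query from this thread,
# with ORDER BY so that successive pages neither overlap nor skip solutions.
BASE_QUERY = "SELECT DISTINCT ?property WHERE { ?s ?property ?o . }\nORDER BY ?property"


def paged_queries(base_query: str, page_size: int, pages: int) -> Iterator[str]:
    """Yield one query per page by appending LIMIT and an increasing OFFSET."""
    for page in range(pages):
        yield f"{base_query}\nLIMIT {page_size}\nOFFSET {page * page_size}"


for q in paged_queries(BASE_QUERY, page_size=1000, pages=3):
    print(q.splitlines()[-2:])  # first page prints ['LIMIT 1000', 'OFFSET 0']
```

Paging bounds the size of each response, but it does not necessarily make each request cheap: to return a later page of an ordered DISTINCT result, an engine may still have to compute and sort the solutions that precede it.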

        Also remember, you can make application- or service-specific
        instances of the DBpedia SPARQL endpoint via a number of offerings
        on EC2 if you seek hassle-free reconstruction of DBpedia.

        Links:

        1. http://blog.dbpedia.org/2011/01/31/dbpedia-36-ami-available/ .

        --

        Regards,

        Kingsley Idehen
        President & CEO
        OpenLink Software
        Web: http://www.openlinksw.com
        Weblog: http://www.openlinksw.com/blog/~kidehen
        Twitter/Identi.ca: kidehen






        
------------------------------------------------------------------------------
        vRanger cuts backup time in half-while increasing security.
        With the market-leading solution for virtual backup and recovery,
        you get blazing-fast, flexible, and affordable data protection.
        Download your free trial now.
        http://p.sf.net/sfu/quest-d2dcopy1
        _______________________________________________
        Dbpedia-discussion mailing list
        Dbpedia-discussion@lists.sourceforge.net
        <mailto:Dbpedia-discussion@lists.sourceforge.net>
        https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion



    




    










