Re: The truth about SPARQL Endpoint availability

Kingsley Idehen Sun, 06 Mar 2011 06:57:50 -0800

On 3/6/11 7:56 AM, Richard Cyganiak wrote:

On 6 Mar 2011, at 12:16, Christopher Gutteridge wrote:

Talk of how many triples are in a store puts me in mind of this quote
     "Measuring programming progress by lines of code is like measuring aircraft 
building progress by weight."

Well, but you know that quality on the Web of Data is measured in million 
triples! ;-)


Jokes aside, as long as triple store performance is a frequent limiting factor, 
triple counts are important.

“We can't load that dataset, it would be another 200MT, this would kill our 
store”
“Their dataset is only 100kT, so how come their endpoint is so slow?”
“Well if you have a million triples then you should be ok with any of the major 
stores on the hardware you already have.”
“Given the load rate we typically get on our store, loading this dataset should 
take till Tuesday.”
“Wow, this new dataset increases the total number of triples in the LOD Cloud 
by 3%!”

You might object to some, but surely not all, of these uses of triple counts.

there's very few webmasters out there willing to do extra work just so we can 
make pretty graphs.

Aside: As a maker of pretty graphs, I can tell you that you would be surprised.

Enjoy your Sunday!

Richard

In addition to the above, smart SPARQL-FED [1] isn't achievable without good stats about SPARQL endpoints. Locality-aware cost optimization is very dependent on metadata [2] gleaned from remote data sources associated with a SPARQL endpoint. What's good for SQL is well and truly good for SPARQL re. data virtualization, assuming Triple/Quad stores are a sub-category of DBMS. We can leverage voID when making SPARQL endpoint description metadata. It's actually very important from a pragmatic viewpoint, especially if we truly believe in the crystallization of the Web as a Global Data Space.

I don't expect users or Web developers to write SPARQL-FED, but I do expect them to assume and/or demand the Linked Data experience that SPARQL-FED, SPARQL Endpoint Metadata, and voID facilitate.


Links:

1. http://www.w3.org/TR/sparql-features/#Basic_federated_query - SPARQL-FED

2. http://www.w3.org/TR/sparql-features/#Service_description - SPARQL endpoint metadata


Kingsley



Ian Davis wrote:

Is the number of triples that important? With all respect to the
people on this list, I think there's a tendency to obsess over triple
counts. Aren't we past that bootstrap phase of being awed when we see
millions of triples being produced?  I thought we'd moved towards
being more focussed on quality and utility of data than sheer numbers?

Besides, for me the most interesting datasets are those that are
continually changing as they reflect the real world and I'd like to
see us work towards metrics for freshness and coverage.


On Sun, Mar 6, 2011 at 11:20 AM, Tim Berners-Lee <ti...@w3.org> wrote:

Maybe the count of triples should be special-cased in the SPARQL server code:
spotted on input and the store size returned,
if it is reasonable for the endpoint to keep track of the size of its store.
(Do they anyway?)

Tim

On 2011-03-05, at 11:58, Bill Roberts wrote:

Thanks Hugh - as someone running a couple of SPARQL endpoints, I'd certainly 
prefer if people don't run a global count too often (or at all). It is indeed 
something that makes typical SPARQL implementations work very hard.

But it's a good reminder that we should provide an alternative, and I'll look into 
providing triple counts in voiD.

Bill
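
A minimal sketch of what publishing a triple count in a voiD description could look like, written here as a small Python helper that emits Turtle. The dataset and endpoint URIs are placeholders, and the sketch uses the plain void:triples property rather than the void:statItem/scovo pattern queried elsewhere in this thread, so a query against it would need to match void:triples instead.

```python
def void_description(dataset_uri: str, endpoint_url: str, triples: int) -> str:
    """Return a Turtle snippet describing one dataset and its triple count.

    All three arguments are supplied by the publisher; nothing is looked up.
    """
    if triples < 0:
        raise ValueError("triple count cannot be negative")
    return (
        "@prefix void: <http://rdfs.org/ns/void#> .\n"
        "\n"
        f"<{dataset_uri}> a void:Dataset ;\n"
        f"    void:sparqlEndpoint <{endpoint_url}> ;\n"
        f"    void:triples {triples} .\n"
    )


if __name__ == "__main__":
    # Placeholder URIs, for illustration only.
    print(void_description("http://example.org/dataset",
                           "http://example.org/sparql",
                           1000000))
```

The count is a value the store owner already knows or can compute offline, so publishing it this way costs nothing at query time.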


On 5 Mar 2011, at 15:14, Hugh Glaser wrote:

Hi,
On 5 Mar 2011, at 14:22, Andrea Splendiani wrote:

Hi,

I think it depends on the store. I've tried some (from the endpoint list) and 
some return an answer pretty quickly. Some don't, and some don't support 
count.
However, one could collect this information only from the stores that answer the 
count query; no need to try all the time.

I am happy for a store implementor or owner to disagree, but I find it very 
unlikely that the owner of a store with a decent chunk of data (> 1M triples, 
say) would be happy for someone to keep issuing such a query, even if they did 
decide to give enough resources to execute it.
I would quickly blacklist such a site.

VoID:
is this a good query:
select * where { ?s <http://rdfs.org/ns/void#numberOfTriples> ?o }

I'm no SPARQL or voiD guru, but I think you need a bit more wrapping in the 
scovo stuff, so more like:

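PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX void:  <http://rdfs.org/ns/void#>
PREFIX scovo: <http://purl.org/NET/scovo#>

# ?uris is not bound by this pattern, so that column comes back empty.
SELECT DISTINCT ?endpoint ?uri ?triples ?uris WHERE
          { ?ds a void:Dataset .
            ?ds void:sparqlEndpoint ?uri .
            ?ds rdfs:label ?endpoint .
            ?ds void:statItem [ scovo:dimension void:numberOfTriples ; rdf:value ?triples ] .
          }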

Try it at

http://kwijibo.talis.com/voiD/

or

http://void.rkbexplorer.com/
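
For anyone who would rather script this than paste the query into a form, here is a rough sketch of building the request URL for the voiD query above, assuming the store accepts the standard SPARQL protocol GET with a "query" parameter. The endpoint address used below is a placeholder: the thread gives the front pages of the two voiD stores, not their endpoint paths. The unbound ?uris variable is left out of this version.

```python
from urllib.parse import urlencode

# The voiD statistics query from this thread, with its prefixes spelled out.
VOID_QUERY = """\
PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX void:  <http://rdfs.org/ns/void#>
PREFIX scovo: <http://purl.org/NET/scovo#>

SELECT DISTINCT ?endpoint ?uri ?triples WHERE {
  ?ds a void:Dataset .
  ?ds void:sparqlEndpoint ?uri .
  ?ds rdfs:label ?endpoint .
  ?ds void:statItem [ scovo:dimension void:numberOfTriples ; rdf:value ?triples ] .
}
"""


def build_request_url(endpoint: str, query: str = VOID_QUERY) -> str:
    """Return a GET URL that sends `query` to `endpoint` (SPARQL protocol)."""
    # Append to an existing query string if the endpoint URL already has one.
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode({"query": query})


if __name__ == "__main__":
    # Hypothetical endpoint address, for illustration only.
    print(build_request_url("http://example.org/sparql"))
```

The resulting URL can be fetched with any HTTP client; the sketch stops short of sending the request so that it makes no assumption about result formats or about what a given store supports.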


I guess Pierre-Yves might like to enhance his page by querying a voiD store to 
also give basic stats.
Or someone might like to do a store reporter that uses (a) voiD endpoint(s) 
plus Pierre-Yves's data (he has a SPARQL endpoint), to do so.
And maybe the CKAN endpoint would have extra useful data as well.
A real Semantic Web application that queried more than one SPARQL endpoint - 
now that would be a novelty!
Fancy the challenge, it is the weekend?! :-)

ciao
Hugh

it doesn't seem viable if so.

ciao,
Andrea


Il giorno 05/mar/2011, alle ore 13.49, Hugh Glaser ha scritto:

Nice idea, but,... :-)

SELECT (count(*) as ?c) WHERE {?s ?p ?o}

is a pretty anti-social thing to do to a store.
At best, a store of any size will spend a while thinking, and then quite 
rightly decide they have burnt enough resources, and return some sort of error.

For a properly maintained site, of course, the VoiD description will give lots 
of similar information.
Best
Hugh

On 5 Mar 2011, at 13:06, Andrea Splendiani wrote:

Hi, very nice!
I have a small suggestion:

why don't you ask "count(*) where {?s ?p ?o}" to the endpoint ?
Or ask for the number of graphs ?
Both pieces of information, number of triples and number of graphs, if logged and 
compared over time, can give a practical view of the liveliness of the content 
of the endpoint.
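
One way the comparison over time might be sketched, assuming the counts have already been logged as (date, count) pairs by whatever means the endpoint allows. The function names and the sample figures are made up for illustration.

```python
from datetime import date


def triple_count_changes(samples):
    """Return (date, delta) pairs between consecutive logged triple counts.

    `samples` is a list of (date, count) tuples in any order.
    """
    ordered = sorted(samples)
    return [
        (later_day, later_count - earlier_count)
        for (_, earlier_count), (later_day, later_count) in zip(ordered, ordered[1:])
    ]


def is_lively(samples):
    """Treat an endpoint as lively if its count changed between any two samples."""
    return any(delta != 0 for _, delta in triple_count_changes(samples))


if __name__ == "__main__":
    # Invented sample log, for illustration only.
    log = [
        (date(2011, 3, 1), 1000),
        (date(2011, 3, 2), 950),
        (date(2011, 3, 3), 1000),
    ]
    print(triple_count_changes(log))
    print(is_lively(log))
```

A changing count only shows that content is being added or removed; it says nothing about whether individual triples were updated in place.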

best,
Andrea Splendiani


Il giorno 28/feb/2011, alle ore 18.55, Pierre-Yves Vandenbussche ha scritto:

Hello all,

Have you already encountered problems of SPARQL endpoint accessibility?
Do you feel frustrated that they are never available when you need them?
Do you develop an application using these services but wonder if it is reliable?

Here is a tool [1] that lets you check the availability of public SPARQL endpoints 
and monitor it over the last hours/days.
Stay informed of a particular (or all) endpoint status changes through RSS 
feeds.
All availability information generated by this tool is accessible through a 
SPARQL endpoint.

This tool fetches the list of public SPARQL endpoints from CKAN [2] open data and 
tests each one for availability every hour.

[1]
http://labs.mondeca.com/sparqlEndpointsStatus/index.html

[2]
http://ckan.net/


Pierre-Yves Vandenbussche.

Andrea Splendiani
Senior Bioinformatics Scientist
Centre for Mathematical and Computational Biology
+44(0)1582 763133 ext 2004

andrea.splendi...@bbsrc.ac.uk

--
Hugh Glaser,
           Intelligence, Agents, Multimedia
           School of Electronics and Computer Science,
           University of Southampton,
           Southampton SO17 1BJ
Work: +44 23 8059 3670, Fax: +44 23 8059 3045
Mobile: +44 78 9422 3822, Home: +44 23 8061 5652

http://www.ecs.soton.ac.uk/~hg/

--
Christopher Gutteridge --
http://id.ecs.soton.ac.uk/person/1248


You should read the ECS Web Team blog:
http://blogs.ecs.soton.ac.uk/webteam/



--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
