Re: The truth about SPARQL Endpoint availability
Maybe the count of triples should be special-cased in the SPARQL server code: spotted on input, and the store size returned, if it is reasonable for the endpoint to keep track of the size of its store. (Do they anyway?)

Tim

On 2011-03-05, at 11:58, Bill Roberts wrote:

Thanks Hugh - as someone running a couple of SPARQL endpoints, I'd certainly prefer if people don't run a global count too often (or at all). It is indeed something that makes typical SPARQL implementations work very hard. But it's a good reminder that we should provide an alternative, and I'll look into providing triple counts in voiD.

Bill

On 5 Mar 2011, at 15:14, Hugh Glaser wrote:

Hi,

On 5 Mar 2011, at 14:22, Andrea Splendiani wrote:

Hi, I think it depends on the store. I've tried some (from the endpoint list) and some return an answer pretty quickly. Some don't, and some don't support count. However, one could have this information only for the stores that answer the count query; no need to try all the time.

I am happy for a store implementor or owner to disagree, but I find it very unlikely that the owner of a store with a decent chunk of data (1M triples, say) would be happy for someone to keep issuing such a query, even if they did decide to give enough resources to execute it. I would quickly blacklist such a site.

VoID: is this a good query:

select * where { ?s <http://rdfs.org/ns/void#numberOfTriples> ?o }

I'm no SPARQL or voiD guru, but I think you need a bit more wrapping in the scovo stuff, so more like:

SELECT DISTINCT ?endpoint ?uri ?triples ?uris
WHERE {
  ?ds a void:Dataset .
  ?ds void:sparqlEndpoint ?uri .
  ?ds rdfs:label ?endpoint .
  ?ds void:statItem [ scovo:dimension void:numberOfTriples ; rdf:value ?triples ] .
}

Try it at http://kwijibo.talis.com/voiD/ or http://void.rkbexplorer.com/

I guess Pierre-Yves might like to enhance his page by querying a voiD store to also give basic stats.
Or someone might like to do a store reporter that uses (a) voiD endpoint(s) plus Pierre-Yves's data (he has a SPARQL endpoint) to do so. And maybe the CKAN endpoint would have extra useful data as well. A real Semantic Web application that queried more than one SPARQL endpoint - now that would be a novelty! Fancy the challenge, it is the weekend?! :-)

ciao
Hugh

it doesn't seem viable if so.

ciao,
Andrea

On 5 Mar 2011, at 13:49, Hugh Glaser wrote:

Nice idea, but... :-)

SELECT (count(*) as ?c) WHERE {?s ?p ?o}

is a pretty anti-social thing to do to a store. At best, a store of any size will spend a while thinking, and then quite rightly decide it has burnt enough resources, and return some sort of error. For a properly maintained site, of course, the voiD description will give lots of similar information.

Best
Hugh

On 5 Mar 2011, at 13:06, Andrea Splendiani wrote:

Hi, very nice! I have a small suggestion: why don't you ask count(*) where {?s ?p ?o} to the endpoint? Or ask for the number of graphs? Both pieces of information, number of triples and number of graphs, if logged and compared over time, can give a practical view of the liveliness of the content of the endpoint.

best,
Andrea Splendiani

On 28 Feb 2011, at 18:55, Pierre-Yves Vandenbussche wrote:

Hello all,

Have you already encountered problems of SPARQL endpoint accessibility? Do you feel frustrated that they are never available when you need them? Are you developing an application using these services but wondering whether it is reliable?

Here is a tool [1] that lets you check the availability of public SPARQL endpoints and monitor them over the last hours/days. Stay informed of status changes for a particular endpoint (or all of them) through RSS feeds. All availability information generated by this tool is accessible through a SPARQL endpoint.

This tool fetches public SPARQL endpoints from CKAN open data [2]. From this list, it runs tests every hour for availability.
[1] http://labs.mondeca.com/sparqlEndpointsStatus/index.html
[2] http://ckan.net/

Pierre-Yves Vandenbussche.

Andrea Splendiani
Senior Bioinformatics Scientist
Centre for Mathematical and Computational Biology
+44(0)1582 763133 ext 2004
andrea.splendi...@bbsrc.ac.uk

--
Hugh Glaser, Intelligence, Agents, Multimedia
School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ
Work: +44 23 8059 3670, Fax: +44 23 8059 3045
Mobile: +44 78 9422 3822, Home: +44 23 8061 5652
http://www.ecs.soton.ac.uk/~hg/
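The voiD query quoted in this message leaves its prefixes implicit. As a rough sketch of how a client might ask a voiD store for published triple counts instead of issuing a global count, the Python below spells the prefixes out, builds a SPARQL-protocol GET URL, and reads counts out of a SPARQL JSON results document. The scovo namespace URI, the endpoint address http://example.org/sparql, and the function names are assumptions made for illustration; the unbound ?uris variable from the quoted query is left out.

```python
import json
from urllib.parse import urlencode

# Prefixes the query in the thread leaves implicit. The void namespace is the
# one used in the thread; the scovo namespace is an assumption of this sketch.
PREFIXES = (
    "PREFIX void: <http://rdfs.org/ns/void#>\n"
    "PREFIX scovo: <http://purl.org/NET/scovo#>\n"
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)

VOID_QUERY = PREFIXES + (
    "SELECT DISTINCT ?endpoint ?uri ?triples\n"
    "WHERE {\n"
    "  ?ds a void:Dataset .\n"
    "  ?ds void:sparqlEndpoint ?uri .\n"
    "  ?ds rdfs:label ?endpoint .\n"
    "  ?ds void:statItem [ scovo:dimension void:numberOfTriples ;\n"
    "                      rdf:value ?triples ] .\n"
    "}\n"
)


def build_request_url(endpoint, query):
    """Return a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def triple_counts(results_json):
    """Map each endpoint URI to its triple count, given SPARQL JSON results."""
    doc = json.loads(results_json)
    return {
        row["uri"]["value"]: int(row["triples"]["value"])
        for row in doc["results"]["bindings"]
    }


if __name__ == "__main__":
    # Hypothetical voiD store address, used only to show the URL shape.
    print(build_request_url("http://example.org/sparql", VOID_QUERY))
```

The point of the design is that the expensive counting happens once, on the publisher's side, and every client reads the stored figure.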
CfP - Semantic Web in a Mobile World
Call for Papers
===

Special Issue of the Journal of Web Semantics on Semantic Web in a Mobile World

*** Apologies for Cross-Postings ***

Description
===

Mobile users’ information needs are becoming more important than ever. It is estimated that access to the Internet by mobile phones will exceed that by desktop computers by 2013 (http://www.gartner.com/it/page.jsp?id=1278413). This will make Internet-enabled mobile devices the key access point to the information and service infrastructure of the Internet. When accessing the web with a mobile device, one is not only overwhelmed by the information provided but also hampered by the limitations of mobile devices, such as limited interaction possibilities and smaller display size. On the other hand, mobile devices provide various sensors, for instance GPS/aGPS, accelerometer, compass, microphone, and cameras. These sensors make it possible to capture the users’ context and use it to address the mobile users’ needs.

The Semantic Web is of high benefit for mobile computing. Not only does it provide a common interlingua for devices to reason and negotiate about context, it also provides a lot of “facts” that can be used in inferring context, such as knowledge about individuals, places, organizations, and events. The availability of such information has blossomed since the advent of the Linked Data (http://linkeddata.org/) movement in 2007. However, such Mobile Semantic Computing is still in its early beginnings.

In this special issue, we invite original work in the area of Mobile Semantic Computing that shows innovative solutions and demonstrates the benefits of semantic technologies for mobile devices and mobile applications. Besides a solid research contribution, we expect systems and applications that have been evaluated with respect to their usefulness. In addition, we would like to see contributions that share their data sets or make the services they use publicly available.
Topics of Interest
===

The topics of interest for this special issue include but are not limited to:

- RDF/Linked Data storage and processing on mobile devices
- Data and information management on mobile devices
- Reasoning on mobile devices
- Mobile indexing and retrieving of multimedia data such as audio, video, images, and text
- Pub-/sub-systems and middleware for mobile semantic applications
- Scalability and performance of semantic mobile technologies
- Mobile semantic user profiling and context modeling
- Mobile semantic cloud computing
- Interoperability of mobile semantic applications
- Browsing semantic data on mobile devices
- Mobile semantic annotation and peer tagging
- Mobile semantic mash-ups
- Mobile semantic multimedia
- Mobile applications for the social semantic web
- Mobile semantic e-learning and collaboration
- Location-aware mobile semantic applications
- Mobile semantic eGovernment applications and services
- Innovative and novel user interfaces for mobile semantic applications
- Development methods and tools for mobile semantic applications
- Privacy and security for mobile semantic devices and applications
- Data sets for the mobile semantic web

Important Dates
===

- *** Submission of papers: October 1, 2011 ***
- Acceptance/revision notification: December 20, 2011
- Revised manuscript due: February 29, 2012
- Final acceptance notification: May 1, 2012
- Final manuscript due: June 15, 2012
- Tentative publication: August/September 2012

Submissions
===

Please see the author guidelines for detailed instructions before you submit: http://www.elsevier.com/wps/find/journaldescription.cws_home/671322/authorinstructions

Submissions should be conducted through Elsevier’s Electronic Submission System (http://ees.elsevier.com/jws/).
More details on the Journal of Web Semantics can be found on its homepage: http://www.elsevier.com/locate/websem

Guest Editors
===

- Ansgar Scherp (University of Koblenz-Landau, Germany)
- Anupam Joshi (University of Maryland Baltimore County, USA)

--
Dr. Ansgar Scherp
University of Koblenz-Landau
Institute for Web Science and Technology
Universitaetsstrasse 1
D-56070 Koblenz, Germany
Phone: +49(0)261/287-2717
Fax : +49(0)261/287-2721
Mail : sch...@uni-koblenz.de
~ http://kreuzverweis.com - Media with a Meaning
Re: The truth about SPARQL Endpoint availability
Is the number of triples that important? With all respect to the people on this list, I think there's a tendency to obsess over triple counts. Aren't we past that bootstrap phase of being awed when we see millions of triples being produced? I thought we'd moved towards being more focussed on quality and utility of data than sheer numbers?

Besides, for me the most interesting datasets are those that are continually changing as they reflect the real world, and I'd like to see us work towards metrics for freshness and coverage.

On Sun, Mar 6, 2011 at 11:20 AM, Tim Berners-Lee ti...@w3.org wrote:

Maybe the count of triples should be special-cased in the SPARQL server code: spotted on input, and the store size returned, if it is reasonable for the endpoint to keep track of the size of its store. (Do they anyway?)

Tim

On 2011-03-05, at 11:58, Bill Roberts wrote:

Thanks Hugh - as someone running a couple of SPARQL endpoints, I'd certainly prefer if people don't run a global count too often (or at all). It is indeed something that makes typical SPARQL implementations work very hard. But it's a good reminder that we should provide an alternative, and I'll look into providing triple counts in voiD.

Bill

On 5 Mar 2011, at 15:14, Hugh Glaser wrote:

Hi,

On 5 Mar 2011, at 14:22, Andrea Splendiani wrote:

Hi, I think it depends on the store. I've tried some (from the endpoint list) and some return an answer pretty quickly. Some don't, and some don't support count. However, one could have this information only for the stores that answer the count query; no need to try all the time.

I am happy for a store implementor or owner to disagree, but I find it very unlikely that the owner of a store with a decent chunk of data (1M triples, say) would be happy for someone to keep issuing such a query, even if they did decide to give enough resources to execute it. I would quickly blacklist such a site.
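Andrea's suggestion quoted in this thread, logging the number of triples and comparing it over time, can be sketched as follows. This is only an illustration: the log format, the function names, and the "share of intervals with a change" measure are assumptions, and the counts are meant to come from a published voiD description rather than from a global count query. An unchanged count does not prove unchanged content, so this is at best a crude proxy for the freshness metrics asked for above.

```python
from datetime import date


def count_changes(log):
    """Return (later_date, delta) for each pair of consecutive observations.

    `log` is a list of (date, triple_count) tuples in any order.
    """
    ordered = sorted(log)
    return [
        (later_day, later_count - earlier_count)
        for (_, earlier_count), (later_day, later_count) in zip(ordered, ordered[1:])
    ]


def liveliness(log):
    """Fraction of observation intervals in which the triple count changed."""
    changes = count_changes(log)
    if not changes:
        return 0.0
    changed = sum(1 for _, delta in changes if delta != 0)
    return changed / len(changes)


if __name__ == "__main__":
    # Hypothetical observations for one endpoint.
    observations = [
        (date(2011, 3, 1), 1_000_000),
        (date(2011, 3, 2), 1_000_000),
        (date(2011, 3, 3), 1_004_500),
        (date(2011, 3, 4), 1_003_900),
    ]
    print(count_changes(observations))
    print(liveliness(observations))
```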
Re: The truth about SPARQL Endpoint availability
Yes, I was puzzling over this. And then what other useful things might be special-cased. (classes?, even a dump of rdfs:label?) But it sort of sticks in the craw to do that.

And I keep coming back to the fact that there is already a way of doing this in the Linked Data world. If you have the voiD description in your endpoint, then it all just works. And it can be queried or browsed, etc.

So for example (in our case) querying http://southampton.rkbexplorer.com/sparql/ with

{ ?s void:sparqlEndpoint <http://southampton.rkbexplorer.com/sparql/> }

gives

?s = <http://southampton.rkbexplorer.com/id/void>

And bingo! Browse the store metadata as LD or SPARQL to your heart's content, as you would with any other data we offer from that store. And whatever other metadata about the store is wanted can be proposed as different vocabs or extensions to voiD.

One thing I am puzzling over, though: should http://southampton.rkbexplorer.com/sparql/ be Linked Data? Currently if you ask for RDF we give 406 Not Acceptable. It might be helpful to 303 to an RDF description; and if so, would it look different from the voiD description? Or certainly somehow getting back an rdfs:seeAlso would be valid. Or is this already sorted out somewhere that I have missed? (I told you I'm not a voiD guru :-) )

Best
Hugh

On 6 Mar 2011, at 11:20, Tim Berners-Lee wrote:

Maybe the count of triples should be special-cased in the SPARQL server code: spotted on input, and the store size returned, if it is reasonable for the endpoint to keep track of the size of its store. (Do they anyway?)

Tim

On 2011-03-05, at 11:58, Bill Roberts wrote:

Thanks Hugh - as someone running a couple of SPARQL endpoints, I'd certainly prefer if people don't run a global count too often (or at all). It is indeed something that makes typical SPARQL implementations work very hard. But it's a good reminder that we should provide an alternative, and I'll look into providing triple counts in voiD.
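Hugh's open question, whether a request for RDF at the endpoint URI should get a 303 to a description rather than a 406, could be sketched like this. It shows one possible answer, not what rkbexplorer.com does or should do: the function names are hypothetical, the set of RDF media types is an illustrative subset, and the Accept parsing deliberately ignores q-values. The redirect target is the voiD URI mentioned in the message.

```python
# Media types this sketch treats as "asking for RDF" (an illustrative subset).
RDF_TYPES = {"application/rdf+xml", "text/turtle"}

# voiD description URI taken from the message above.
VOID_URI = "http://southampton.rkbexplorer.com/id/void"


def accepted_types(accept_header):
    """Return the media types named in an Accept header, ignoring q-values."""
    return {
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
        if part.strip()
    }


def respond_to_endpoint_get(accept_header):
    """Return (status, headers) for a plain GET on the endpoint URI.

    RDF requests are redirected (303 See Other) to the voiD description;
    anything else gets the usual HTML page.
    """
    if accepted_types(accept_header) & RDF_TYPES:
        return 303, {"Location": VOID_URI}
    return 200, {"Content-Type": "text/html"}


if __name__ == "__main__":
    print(respond_to_endpoint_get("text/turtle, application/rdf+xml;q=0.9"))
    print(respond_to_endpoint_get("text/html"))
```

A 303 keeps the endpoint URI and its description as separate resources, which is the usual Linked Data pattern; returning the description directly, or an rdfs:seeAlso as Hugh suggests, would be other options.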
Re: The truth about SPARQL Endpoint availability
Talk of how many triples are in a store puts me in mind of this quote: "Measuring programming progress by lines of code is like measuring aircraft building progress by weight."

At every stage we should be able to answer a key question from someone setting up a linked data site for the first time, and that question is "What's in it for me?"

If we (the community interested in the development of Linked Data) want to get data on linkage and size of datasets, then the tools had better do that automatically, because there are very few webmasters out there willing to do extra work just so we can make pretty graphs.

Ian Davis wrote:

Is the number of triples that important? With all respect to the people on this list, I think there's a tendency to obsess over triple counts. Aren't we past that bootstrap phase of being awed when we see millions of triples being produced? I thought we'd moved towards being more focussed on quality and utility of data than sheer numbers?

Besides, for me the most interesting datasets are those that are continually changing as they reflect the real world, and I'd like to see us work towards metrics for freshness and coverage.

On Sun, Mar 6, 2011 at 11:20 AM, Tim Berners-Lee ti...@w3.org wrote:

Maybe the count of triples should be special-cased in the SPARQL server code: spotted on input, and the store size returned, if it is reasonable for the endpoint to keep track of the size of its store. (Do they anyway?)

Tim

On 2011-03-05, at 11:58, Bill Roberts wrote:

Thanks Hugh - as someone running a couple of SPARQL endpoints, I'd certainly prefer if people don't run a global count too often (or at all). It is indeed something that makes typical SPARQL implementations work very hard. But it's a good reminder that we should provide an alternative, and I'll look into providing triple counts in voiD.
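If the tools are to publish dataset size automatically, as this message argues, one small piece would be a routine that writes the voiD description for the webmaster. The sketch below emits Turtle in the shape that the voiD query quoted in this thread expects (void:statItem with a scovo:dimension). The function name, the example URIs, and the scovo namespace URI are assumptions; a real tool would take the count from the store's own bookkeeping.

```python
def void_description(dataset_uri, endpoint_uri, label, triple_count):
    """Return a Turtle voiD description carrying the dataset's triple count."""
    # Escape backslashes and double quotes so the label is a valid Turtle string.
    safe_label = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "@prefix void: <http://rdfs.org/ns/void#> .\n"
        "@prefix scovo: <http://purl.org/NET/scovo#> .\n"  # assumed namespace
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
        "\n"
        f"<{dataset_uri}> a void:Dataset ;\n"
        f'    rdfs:label "{safe_label}" ;\n'
        f"    void:sparqlEndpoint <{endpoint_uri}> ;\n"
        "    void:statItem [ scovo:dimension void:numberOfTriples ;\n"
        f"                    rdf:value {int(triple_count)} ] .\n"
    )


if __name__ == "__main__":
    # Hypothetical dataset, endpoint, and count.
    print(void_description(
        "http://example.org/id/void",
        "http://example.org/sparql",
        "Example dataset",
        1234567,
    ))
```

Regenerating this file whenever data is loaded would keep the published count current without any extra work from the site owner.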
Re: The truth about SPARQL Endpoint availability
I believe that people coming from a MySQL (well, MyISAM specifically) background would assume a global COUNT to be fast, since it's an O(1) operation on a MyISAM table with a primary key.

Another way to go would be to add a NOOP command to SPARQL, surely?

Dan

On 6 Mar 2011, at 11:20, Tim Berners-Lee wrote:

Maybe the count of triples should be special-cased in the SPARQL server code: spotted on input, and the store size returned, if it is reasonable for the endpoint to keep track of the size of its store. (Do they anyway?)

Tim

On 2011-03-05, at 11:58, Bill Roberts wrote:

Thanks Hugh - as someone running a couple of SPARQL endpoints, I'd certainly prefer if people don't run a global count too often (or at all). It is indeed something that makes typical SPARQL implementations work very hard. But it's a good reminder that we should provide an alternative, and I'll look into providing triple counts in voiD.

Bill

On 5 Mar 2011, at 15:14, Hugh Glaser wrote:

Hi,

On 5 Mar 2011, at 14:22, Andrea Splendiani wrote:

Hi, I think it depends on the store. I've tried some (from the endpoint list) and some return an answer pretty quickly. Some don't, and some don't support count. However, one could have this information only for the stores that answer the count query; no need to try all the time.

I am happy for a store implementor or owner to disagree, but I find it very unlikely that the owner of a store with a decent chunk of data (1M triples, say) would be happy for someone to keep issuing such a query, even if they did decide to give enough resources to execute it. I would quickly blacklist such a site.

VoID: is this a good query:

select * where { ?s <http://rdfs.org/ns/void#numberOfTriples> ?o }

I'm no SPARQL or voiD guru, but I think you need a bit more wrapping in the scovo stuff, so more like:

SELECT DISTINCT ?endpoint ?uri ?triples ?uris
WHERE {
  ?ds a void:Dataset .
  ?ds void:sparqlEndpoint ?uri .
  ?ds rdfs:label ?endpoint .
  ?ds void:statItem [ scovo:dimension void:numberOfTriples ; rdf:value ?triples ] .
}
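Tim's suggestion of special-casing the global count, together with Dan's comparison to a storage engine that keeps its row count to hand, might look roughly like the sketch below. It is a toy: the class name, the regular expression, and the in-memory set standing in for a triple store are all assumptions, and only the exact "count everything" query shape is recognised. No real SPARQL server is claimed to work this way.

```python
import re

# Recognises only the plain global count, for example:
#   SELECT (count(*) as ?c) WHERE {?s ?p ?o}
GLOBAL_COUNT = re.compile(
    r"^\s*SELECT\s*\(\s*COUNT\s*\(\s*\*\s*\)\s+AS\s+\?(\w+)\s*\)\s*"
    r"(?:WHERE\s*)?\{\s*\?(\w+)\s+\?(\w+)\s+\?(\w+)\s*\.?\s*\}\s*$",
    re.IGNORECASE,
)


class CountingStore:
    """Toy store that always knows its own size."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def size(self):
        # len() on a set is O(1): the size is maintained as triples are added.
        return len(self._triples)

    def query(self, sparql):
        match = GLOBAL_COUNT.match(sparql)
        # The three pattern variables must differ, otherwise it is not the
        # "match every triple" pattern.
        if match and len({match.group(2), match.group(3), match.group(4)}) == 3:
            return {match.group(1): self.size()}
        raise NotImplementedError("general SPARQL evaluation is outside this sketch")


if __name__ == "__main__":
    store = CountingStore()
    store.add("ex:a", "ex:knows", "ex:b")
    store.add("ex:b", "ex:knows", "ex:c")
    print(store.query("SELECT (count(*) as ?c) WHERE {?s ?p ?o}"))
```

The trade-off is the one Hugh raises elsewhere in the thread: each special case is cheap on its own, but the list of queries worth special-casing has no obvious end.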
Re: The truth about SPARQL Endpoint availability
On 6 Mar 2011, at 12:16, Christopher Gutteridge wrote:

Talk of how many triples are in a store puts me in mind of this quote: "Measuring programming progress by lines of code is like measuring aircraft building progress by weight."

Well, but you know that quality on the Web of Data is measured in million triples! ;-)

Jokes aside, as long as triple store performance is a frequent limiting factor, triple counts are important.

“We can't load that dataset, it would be another 200MT, this would kill our store”

“Their dataset is only 100kT, so how come their endpoint is so slow?”

“Well if you have a million triples then you should be ok with any of the major stores on the hardware you already have.”

“Given the load rate we typically get on our store, loading this dataset should take till Tuesday.”

“Wow, this new dataset increases the total number of triples in the LOD Cloud by 3%!”

You might object to some, but surely not all, of these uses of triple counts.

there are very few webmasters out there willing to do extra work just so we can make pretty graphs.

Aside: As a maker of pretty graphs, I can tell you that you would be surprised.

Enjoy your Sunday!
Richard

Ian Davis wrote:

Is the number of triples that important? With all respect to the people on this list, I think there's a tendency to obsess over triple counts. Aren't we past that bootstrap phase of being awed when we see millions of triples being produced? I thought we'd moved towards being more focussed on quality and utility of data than sheer numbers?

Besides, for me the most interesting datasets are those that are continually changing as they reflect the real world, and I'd like to see us work towards metrics for freshness and coverage.

On Sun, Mar 6, 2011 at 11:20 AM, Tim Berners-Lee ti...@w3.org wrote:

Maybe the count of triples should be special-cased in the SPARQL server code: spotted on input, and the store size returned, if it is reasonable for the endpoint to keep track of the size of its store.
(Do they anyway?)

Tim
Re: The truth about SPARQL Endpoint availability
Thanks Richard, that's really useful. I'm hoping to be talking to lots of people this year who are thinking of dipping their toe in the water, and it's really helpful to have some clear soundbites of why you should bother to do things, rather than appeal to people's better nature.

Richard Cyganiak wrote:

On 6 Mar 2011, at 12:16, Christopher Gutteridge wrote:

Talk of how many triples are in a store puts me in mind of this quote

Measuring programming progress by lines of code is like measuring aircraft building progress by weight.

Well, but you know that quality on the Web of Data is measured in million triples! ;-)

Jokes aside, as long as triple store performance is a frequent limiting factor, triple counts are important.

“We can't load that dataset, it would be another 200MT, this would kill our store”

“Their dataset is only 100kT, so how come their endpoint is so slow?”

“Well if you have a million triples then you should be ok with any of the major stores on the hardware you already have.”

“Given the load rate we typically get on our store, loading this dataset should take till tuesday.”

“Wow, this new dataset increases the total number of triples in the LOD Cloud by 3%!”

You might object to some, but surely not all, of these uses of triple counts.

there's very few webmasters out there willing to do extra work just so we can make pretty graphs.

Aside: As a maker of pretty graphs, I can tell you that you would be surprised.

Enjoy your Sunday!

Richard

Ian Davis wrote:

Is the number of triples that important? With all respect to the people on this list, I think there's a tendency to obsess over triple counts. Aren't we past that bootstrap phase of being awed when we see millions of triples being produced? I thought we'd moved towards being more focussed on quality and utility of data than sheer numbers?
Besides, for me the most interesting datasets are those that are continually changing as they reflect the real world and I'd like to see us work towards metrics for freshness and coverage.

On Sun, Mar 6, 2011 at 11:20 AM, Tim Berners-Lee ti...@w3.org wrote:

Maybe the count of triples should be special-cased in the sparql server code, spotted on input and the store size returned. if it is reasonable for the endpoint to keep track of the size of its store. (Do they anyway?)

Tim
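Tim's suggestion of special-casing the global count could look roughly like the sketch below. It is only an illustration under stated assumptions: `CountingStore`, `is_global_count` and `answer` are hypothetical names, the store is a toy in-memory set rather than any real triple store, and the pattern recognises only the plain unrestricted count query quoted in this thread, not every equivalent spelling.

```python
import re

# Hypothetical pattern: recognises only the plain "count every triple" query,
# e.g. SELECT (count(*) as ?c) WHERE {?s ?p ?o}
_GLOBAL_COUNT = re.compile(
    r"^select\s*\(\s*count\s*\(\s*\*\s*\)\s+as\s+\?(\w+)\s*\)\s*"
    r"(?:where\s*)?\{\s*\?(\w+)\s+\?(\w+)\s+\?(\w+)\s*\.?\s*\}$",
    re.IGNORECASE,
)


def is_global_count(query: str) -> bool:
    """Return True if the query is the unrestricted triple count."""
    normalized = " ".join(query.split())  # collapse all whitespace runs
    match = _GLOBAL_COUNT.match(normalized)
    if match is None:
        return False
    # The three pattern variables must differ, otherwise the pattern
    # (e.g. {?s ?s ?o}) restricts the match and is not a global count.
    s, p, o = match.group(2), match.group(3), match.group(4)
    return len({s, p, o}) == 3


class CountingStore:
    """Toy in-memory store (an assumption) that always knows its own size."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def size(self):
        return len(self._triples)


def answer(store, query):
    """Short-circuit the global count; None means 'hand over to the normal engine'."""
    if is_global_count(query):
        return store.size()
    return None
```

Whether this is worthwhile depends on the store being able to report its size cheaply, which is exactly the open question in Tim's message.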
Re: The truth about SPARQL Endpoint availability
On 3/6/11 7:56 AM, Richard Cyganiak wrote:

On 6 Mar 2011, at 12:16, Christopher Gutteridge wrote:

Talk of how many triples are in a store puts me in mind of this quote

Measuring programming progress by lines of code is like measuring aircraft building progress by weight.

Well, but you know that quality on the Web of Data is measured in million triples! ;-)

Jokes aside, as long as triple store performance is a frequent limiting factor, triple counts are important.

“We can't load that dataset, it would be another 200MT, this would kill our store”

“Their dataset is only 100kT, so how come their endpoint is so slow?”

“Well if you have a million triples then you should be ok with any of the major stores on the hardware you already have.”

“Given the load rate we typically get on our store, loading this dataset should take till tuesday.”

“Wow, this new dataset increases the total number of triples in the LOD Cloud by 3%!”

You might object to some, but surely not all, of these uses of triple counts.

there's very few webmasters out there willing to do extra work just so we can make pretty graphs.

Aside: As a maker of pretty graphs, I can tell you that you would be surprised.

Enjoy your Sunday!

Richard

In addition to the above, smart SPARQL-FED [1] isn't achievable without good stats about SPARQL endpoints. Locality aware cost optimization is very dependent on metadata [2] gleaned from remote data sources associated with a SPARQL endpoint. What's good for SQL is well and truly good for SPARQL re. data virtualization, assuming Triple/Quad stores are a sub-category of DBMS. We can leverage voID when making SPARQL endpoint description metadata. It's actually very important from a pragmatic view point, especially if we truly believe in the crystallization of the Web as a Global Data Space.
I don't expect users or Web developers to write SPARQL-FED, but I do expect them to assume and/or demand the Linked Data experience that SPARQL-FED, SPARQL Endpoint Metadata, and voID facilitate.

Links:

1. http://www.w3.org/TR/sparql-features/#Basic_federated_query - SPARQL-FED
2. http://www.w3.org/TR/sparql-features/#Service_description -- SPARQL endpoint metadata.

Kingsley
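As a rough illustration of Kingsley's point that cost optimization for federated queries depends on endpoint statistics, the sketch below orders endpoints by a published triple count so that the smallest source is contacted first. Everything here is an assumption made for the example: the `EndpointStats` type, the `plan_order` function, the example.org URLs and the counts are hypothetical, and smallest-first is just one simple heuristic, not how any particular SPARQL-FED implementation plans its queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointStats:
    """Statistics for one endpoint, e.g. as read from a voiD description."""
    url: str
    triples: int


def plan_order(stats):
    """Return endpoint URLs ordered smallest-first by triple count.

    Assumption: a smaller dataset tends to yield fewer intermediate
    results, so it is queried first. Real planners use richer metadata.
    """
    return [s.url for s in sorted(stats, key=lambda s: s.triples)]


# Hypothetical endpoints and counts, for illustration only.
endpoints = [
    EndpointStats("http://example.org/big/sparql", 200_000_000),
    EndpointStats("http://example.org/small/sparql", 100_000),
    EndpointStats("http://example.org/medium/sparql", 1_000_000),
]
order = plan_order(endpoints)
```

Without the counts, a planner has no basis for preferring one order over another, which is the practical argument for publishing them.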
Re: The truth about SPARQL Endpoint availability
Pierre-Yves, Peter,

On 2 Mar 2011, at 16:59, Pierre-Yves Vandenbussche wrote:

If someone from CKAN could answer these questions:

Is it normal in CKAN to have more than one endpoint? (I think so)

No, http://ckan.net/package/geospecies is the only CKAN record where two endpoints were entered. All other packages have zero or one.

Is it possible in CKAN endpoint to differentiate between main and alternative endpoint?

The question of alternative endpoints hasn't really come up yet. For geospecies someone just added it ad hoc.

Actually, the LOD Cloud Cache (hope Hugh isn't listening!) has its own CKAN package: http://ckan.net/package/lod-cloud-cache

Its SPARQL endpoint is already listed there. So I took the liberty to remove it from the GeoSpecies package.

At some point we should explore making the list of packages mirrored in the LOD Cache explicit on CKAN.

Best,
Richard