Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices
I usually dislike commenting on such discussions, as I don't find them particularly productive, but 1) since the number of people pointing me to this thread is growing, 2) it contains some wrong statements, and 3) I feel that this thread has been hijacked from a topic that I consider productive and important, I hope you won't mind me giving a comment. I wanted to keep it brief, but I failed.

Let's start with the wrong statements. First, although I take responsibility as a co-creator of Linked Open Numbers, I surely cannot take full credit for it. The dataset was a shared effort by a number of people in Karlsruhe over a few days, and thus calling the whole thing "Denny's numbers dataset" is simply wrong, given the effort my colleagues spent on it. It is fine to call it "Karlsruhe's numbers dataset" or simply "Linked Open Numbers", but giving me sole attribution is too much of an honor.

Second, although it is claimed that Linked Open Numbers are "by design and known to everybody in the core community, not data but noise", being one of the co-designers of the system I have to disagree. It is noise by design. One of my motivations for LON was to raise a few points for discussion, and at the same time to provide a dataset fully adhering to Linked Open Data principles. We were obviously able to get the first goal right, and we didn't do too badly on the second, even though we got an interesting list of bugs from Richard Cyganiak which, regrettably, we still have not fixed. I am very sorry for that. But, to make the point very clear again: this dataset was designed to follow LOD principles as well as possible, to be correct, and to have an implementation so simple that we are usually up, so anyone can use LON as a testing ground. From a number of mails and personal communications I know that LON has been used in that sense, and some developers even found it useful for other features, like our provision of number names in several languages.

So, what is called "noise by design" here is actually an actively used dataset that managed to raise, as we had hoped, discussions about the point of counting triples, was a factor in the discussion about literals as subjects, made us rethink the notion of semantics and the computational properties of RDF entities in a different way, and is involved in the discussion about the quality of LOD. With respect to that, in my opinion, LON has met and exceeded its goals, but I understand anyone who disagrees. Besides that, it was, and is, huge fun.

Now to some topics of the discussion. On the issue of the LOD cloud diagram: I want to express my gratitude to all the people involved for the effort they voluntarily put into its development and maintenance. I find it especially great that it is becoming increasingly transparent how the diagram is created and how the datasets are selected. Chris has referred to a set of conditions that are expected for inclusion, and before the creation of the newest iteration there was an explicit call on this mailing list to gather more information. I can only echo the sentiment that if someone is unhappy with that diagram, they are free to create their own and put it online. The data is available, the SVG is available and editable, and they use licenses that allow modification and republishing. Enrico is right that a system like Watson (or Sindice), which automatically gathers datasets from the Web instead of using a manually submitted and managed catalog, will probably turn out to be the better approach.
Watson used to have an overview with statistics on its current content, and I really loved that overview, but this feature has been disabled for a few months now. If it were available, especially in a graphical format that could be easily reused in slides -- for example, graphs on the growth of the number of triples, datasets, etc., and graphs on the change of cohesion, vocabulary reuse, etc. over time within the Watson corpus -- I have no doubt that such graphs and data would be widely reused, and would in many instances replace the current usage of the cloud diagram. (I am furthermore curious about Enrico's statement that the Semantic Web =/= Linked Open Data and wonder what he means here, but that is a completely different thread.)

Finally, to what I consider most important in this thread: I also find it a shame that this thread has been hijacked, especially since the original topic was so interesting. The original email by Anja was not about the LOD cloud, but rather about -- as the title of the thread still suggests -- the compliance of LOD with some best practices. Instead of the question "is X in the diagram?", I would much rather see a discussion on: are the selected quality criteria good criteria? Why are some of them so little followed? How can we improve the situation? Anja has pointed to a wealth of openly available numbers (no pun intended) that have not been discussed at all.
Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices
On Oct 21, 2010, at 23:43, Denny Vrandecic wrote:

Second, although it is claimed that Linked Open Numbers are "by design and known to everybody in the core community, not data but noise", being one of the co-designers of the system I have to disagree. It is noise by design.

Even though I reread my message before sending, I missed the quite relevant "not" in the second sentence. It should read "It is not noise by design." :P Cheers, Denny :)
Re: Ontos links to LOD
Hugh, I'm also not a cURL specialist, but I assume the problem is the # in our GUID. Unfortunately, for historical reasons, we have the hash sign in there, and it is a real part of the GUID, not a fragment identifier. You would have to URL-encode our GUID, i.e. replace the # with %23 manually, in order to get it to run properly. Try doing this in http://www.sameas.org/?uri=http://dbpedia.org/resource/Berners-Lee with the ontosearch.com URL - it will work. Concerning your question about additional sameAs data, wait for my answer to Alex. Best, Christian
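To make that concrete, a minimal sketch with curl; the GUID here is invented for illustration, and real Ontos GUIDs will differ:

  # Everything after an unencoded '#' is treated as a fragment identifier
  # and never sent to the server, so a (hypothetical) GUID like
  #   http://www.ontosearch.com/2008/01/rdf/EID-2e701#85c3
  # must have its hash sign percent-encoded as %23 before it is passed on:
  curl "http://www.sameas.org/?uri=http://www.ontosearch.com/2008/01/rdf/EID-2e701%2385c3"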
AW: Ontos links to LOD
Hi Alex, Hugh,

You can use our API to query the db using a DBpedia (or Freebase etc.) identifier. Here is an example (for Barack Obama):

  http://news.ontos.com/api/ontology?query={get:attrents,offset:0,limit:30,typeFilter:http://www.ontosearch.com/2008/02/ontosminer-ns/domain/common/english#Person,attrnames:[http://www.ontosearch.com/2008/02/ontosminer-ns/domain/common/english/dbpedia#sameAs],attrvals:[http://dbpedia.org/resource/Barack_Obama]}

At the moment you have to know the Ontos type to do this, i.e. you have to match DBpedia types to Ontos types yourself. We are planning to extend this functionality in order to make this simpler. You can of course use our API to make queries by object name or fulltext and use the resulting objects as starting points for exploring the db, i.e. to find additional ones. Also see Kingsley's hint. Regards, Christian

-Original Message-
From: Alexandre Passant [mailto:alexandre.pass...@deri.org]
Sent: Thursday, 21 October 2010 15:27
To: Christian Ehrlich
Cc: public-lod@w3.org
Subject: Re: Ontos links to LOD

Hi Christian, On 20 Oct 2010, at 10:33, Christian Ehrlich wrote:

Dear all, Please note that Ontos is about to integrate its news portal [http://news.ontos.com] into the Linked Data Cloud. Ontos' GUIDs for objects are now dereferenceable - the resulting RDF contains owl:sameAs attributes to DBpedia, Freebase and others (check e.g. the entry for Barack Obama [http://www.ontosearch.com/2008/01/rdf/EID-2e70185c38e929aa90049982de43414c]). Within the portal Ontos crawls news articles from diverse online sources, uses its cutting-edge NLP technology to extract facts (objects and relations between them), merges this information with existing facts, and stores them together with references to the original news articles - all of this fully automatically. Facts from Ontos' portal are accessible via a RESTful HTTP API. Fetching data is free - in order to receive an API key, developers have to register (e-mail address only!) at Ontos' homepage [http://www.ontos.com].

Is there a way to query the API with a DBpedia identifier rather than an Ontos one? Or at least, is there somewhere a DBpedia 2 Ontos service (or a SPARQL endpoint where I can get that information)? Thanks, Alex.

For humans Ontos provides a search interface at http://www.ontosearch.com. It allows you to look up objects in the database and view the respective summaries in HTML or RDF. The generated RDF currently contains only a small part of the existing information (e.g. no article references yet), and owl:sameAs is only supported for Persons and Organizations. Ontos will extend the respective content step by step. Any tests with our API as well as comments are highly appreciated. Regards, Christian

-- Christian Ehrlich Ontos AG Phone: +49 341 21559-10 Fax: +49 341 21559-11 Mobile: +49 173 8745000 christian.ehrl...@ontos.com http://www.ontos.com

-- Dr. Alexandre Passant Digital Enterprise Research Institute National University of Ireland, Galway :me owl:sameAs http://apassant.net/alex .
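As a hedged aside for command-line use: the braces, brackets and hash signs in that query string are not URL-safe, so they need percent-encoding. A sketch with curl, assuming the endpoint accepts the encoded form (an API key, per the registration note in the announcement above, may also be required):

  # -G makes curl append the --data-urlencode value as an encoded
  # query-string parameter rather than a POST body
  curl -G "http://news.ontos.com/api/ontology" \
    --data-urlencode "query={get:attrents,offset:0,limit:30,typeFilter:http://www.ontosearch.com/2008/02/ontosminer-ns/domain/common/english#Person,attrnames:[http://www.ontosearch.com/2008/02/ontosminer-ns/domain/common/english/dbpedia#sameAs],attrvals:[http://dbpedia.org/resource/Barack_Obama]}"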
AW: AW: ANN: LOD Cloud - Statistics and compliance with best practices
Hi Denny, thank you for your smart and insightful comments.

I also find it a shame that this thread has been hijacked, especially since the original topic was so interesting. The original email by Anja was not about the LOD cloud, but rather about -- as the title of the thread still suggests -- the compliance of LOD with some best practices. Instead of the question "is X in the diagram?", I would much rather see a discussion on: are the selected quality criteria good criteria? Why are some of them so little followed? How can we improve the situation?

Absolutely. Opening up the discussion on these topics is exactly the reason why we compiled the statistics. In order to guide the discussion back to this topic, maybe it is useful to repost the original link: http://www4.wiwiss.fu-berlin.de/lodcloud/state/

A quick initial comment concerning the term "quality criteria". I think it is essential to distinguish between: 1. the quality of the way data is published, meaning to what extent publishers comply with best practices (a possible set of best practices is listed in the document), and 2. the quality of the data itself. I think Enrico's comment was going in this direction.

The Web of documents is an open system built on people agreeing on standards and best practices. Open system means in this context that everybody can publish content and that there are no restrictions on the quality of the content. This is in my opinion one of the central facts that made the Web successful. The same is true for the Web of Data. There obviously cannot be any restrictions on what people can/should publish (including different opinions on a topic, but also including pure SPAM). As on the classic Web, it is the job of the information/data consumer to figure out which data it wants to believe and use (definition of information quality = usefulness of information, which is a subjective thing). Thus it also does not make sense to discuss the objective quality of the data that should be included in the LOD cloud (objective quality just does not exist), and it makes much more sense to discuss the major issues that we are still having with regard to compliance with publishing best practices.

Anja has pointed to a wealth of openly available numbers (no pun intended) that have not been discussed at all. For example, only 7.5% of the data sources provide a mapping of proprietary vocabulary terms to other vocabulary terms. For anyone building applications to work with LOD, this is a real problem.

Yes, this is also the figure that scared me most. But in order to figure out what really needs to be done, and how the criteria for good data on the Semantic Web need to look, we need to get back to Anja's original questions.

I think that is a question we may try to tackle in Shanghai in some form, I at least would find that an interesting topic.

Same with me. Shanghai was also the reason for the timing of the post. Cheers, Chris
Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices
The Web of documents is an open system built on people agreeing on standards and best practices. Open system means in this context that everybody can publish content and that there are no restrictions on the quality of the content. This is in my opinion one of the central facts that made the Web successful.

+100

The same is true for the Web of Data. There obviously cannot be any restrictions on what people can/should publish (including different opinions on a topic, but also including pure SPAM). As on the classic Web, it is the job of the information/data consumer to figure out which data it wants to believe and use (definition of information quality = usefulness of information, which is a subjective thing).

+100

The fact that there is obviously a lot of low quality data on the current Web should not encourage us to publish masses of low-quality data and then celebrate ourselves for having achieved a lot. The current Web tolerates buggy markup, broken links, and questionable content of all types. But I hope everybody agrees that the Web is successful because of this tolerance, not because of the buggy content itself. Quite to the contrary, the Web has been broadly adopted because of the wealth of commonly agreed high-quality content. If you continue to live the linked data landfill style, it will fall back on you, reputation-wise, funding-wise, and career-wise. Some rules hold in ecosystems of all kinds and sizes. Best Martin
AW: AW: ANN: LOD Cloud - Statistics and compliance with best practices
Hi Martin,

The fact that there is obviously a lot of low quality data on the current Web should not encourage us to publish masses of low-quality data and then celebrate ourselves for having achieved a lot. The current Web tolerates buggy markup, broken links, and questionable content of all types. But I hope everybody agrees that the Web is successful because of this tolerance, not because of the buggy content itself. Quite to the contrary, the Web has been broadly adopted because of the wealth of commonly agreed high-quality content.

Sure, where is the problem? The same holds for the Web of Data: there is a lot of high quality content and a lot of low quality content. Which means - as on the classic Web - that the data consumer needs to decide which content it wants to use. If the Web has proved anything, it is that having a completely open architecture is a crucial factor for being able to succeed on a global scale. The Web of Linked Data also aims at global scale. Thus, I will keep on betting on open solutions without curation or any other bottleneck.

If you continue to live the linked data landfill style, it will fall back on you, reputation-wise, funding-wise, and career-wise. Some rules hold in ecosystems of all kinds and sizes.

Sorry, you are leaving the grounds of scientific discussion here, and I will thus not comment. Best, Chris
Low Quality Data (was before Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices)
Martin and all, Can somebody point me to papers, or maybe give their own definition of low quality data when it comes to LOD? What are the criteria for data to be considered low quality? Thanks, Juan Sequeda +1-575-SEQ-UEDA www.juansequeda.com
Re: Low Quality Data (was before Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices)
Hi, On 22 October 2010 15:47, Juan Sequeda juanfeder...@gmail.com wrote:

Martin and all, Can somebody point me to papers, or maybe give their own definition of low quality data when it comes to LOD? What are the criteria for data to be considered low quality?

I asked this in the context of Linked Data on Semantic Overflow: http://www.semanticoverflow.com/questions/1072/quality-indicators-for-linked-data-datasets There is some good discussion and pointers in there. Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
Re: Low Quality Data (was before Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices)
Hi Juan,

Martin and all, Can somebody point me to papers, or maybe give their own definition of low quality data when it comes to LOD? What are the criteria for data to be considered low quality?

An overview of the literature on data quality, including the different definitions of the term and the like, can be found in my PhD thesis. See: http://www.diss.fu-berlin.de/diss/servlets/MCRFileNodeServlet/FUDISS_derivate_2736/02_Chapter2-Information-Quality.pdf?hosts= and also http://www.diss.fu-berlin.de/2007/217/indexe.html All this is from 2008, so I guess there will also be newer stuff around, but the text should properly reflect the state of the art back then. Cheers, Chris
Types of Data Source on the LOD Cloud
Hi, The LOD cloud analysis [1] is a really great piece of work. I wanted to pick up on one aspect of the analysis for further discussion: whether data is published by the data owner or by a third party. It seems to me that there are broadly three categories into which a dataset might fall:

* Primary -- published and maintained directly by the data owner, e.g. the BBC
* Secondary -- published and maintained by a third party, e.g. by scraping, wrapping or otherwise converting a data source
* Tertiary -- published and maintained by a third party, usually a mirror or aggregation of primary/secondary sources. This might be a direct mirror, or involve some additional creativity, e.g. re-modelling some aspects of another dataset. Mirrors typically provide additional services, e.g. a SPARQL endpoint where the primary source doesn't provide one.

If we consider the different categories we can see that:

* Growth of the web of data is best served by encouraging more Primary sources. The current community can't scale to add more Secondary sources, so adoption is best driven by data owners.
* Sustainability and usage of Linked Data are best served by encouraging more Tertiary sources. Availability of useful, current aggregations of data, wrapped in services, will help drive more consumption.

What do others think? Cheers, L. [1]. http://www4.wiwiss.fu-berlin.de/lodcloud/state/ -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
Schema Mappings (was Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices)
Hi, On 22 October 2010 09:35, Chris Bizer ch...@bizer.de wrote:

Anja has pointed to a wealth of openly available numbers (no pun intended) that have not been discussed at all. For example, only 7.5% of the data sources provide a mapping of proprietary vocabulary terms to other vocabulary terms. For anyone building applications to work with LOD, this is a real problem.

Yes, this is also the figure that scared me most.

This might be low for a good reason: people may be creating proprietary terms because they don't feel well served by existing vocabularies, and hence defining mappings (or even just reusing terms) may be difficult or even impossible. This also strikes me as an opportunity: someone could usefully build a service (perhaps built on facilities in Sindice) that aggregates schema information and provides tools for expressing simple mappings and equivalencies. It could fill a dual role: recommend more common/preferred terms, whilst simultaneously providing machine-readable equivalencies. I know that Uberblic provides some mapping tools in this area, allowing for the creation of a more normalized view across the web, but I'm not sure how much of that is resurfaced. Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
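To make the kind of mapping under discussion concrete, here is a minimal sketch in Turtle; the ex: vocabulary is invented for illustration, while the FOAF terms are real:

  @prefix ex:   <http://example.org/vocab#> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .

  # Link proprietary terms to widely used ones, so that generic LOD
  # clients can fall back on the common vocabulary.
  ex:Author    rdfs:subClassOf        foaf:Person .
  ex:fullName  rdfs:subPropertyOf     foaf:name .
  ex:homepage  owl:equivalentProperty foaf:homepage .

Publishing a handful of triples like these alongside a proprietary vocabulary is exactly the kind of mapping the 7.5% figure counts.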
Concordance, Reconciliation, and shared identifiers
Hi, The announcement that the Guardian has begun cataloguing other identifiers (e.g. ISBN, MusicBrainz) within its API [1] is a nice illustration that the importance of cross-linking between datasets is starting to become more generally accepted. Setting aside the debate about what constitutes linked data, I think it's important that this community tracks these various initiatives to help explore the trade-offs between different approaches, as well as to build bridges with the wider developer community. A great project would be for someone to produce a Linked Data wrapper for the Guardian API that allows linking *in* to their data, based on ISBNs and MusicBrainz ids. It's on my TODO list, but then so is a lot of other stuff ;)

If we look back a few months we can see signs of the importance of cross-linking appearing in other projects. Google Refine (nee Freebase Gridworks) has the notion of a reconciliation service that is used to build and set links [2]. Yahoo meanwhile have their concordance service [3, 4], which is basically a sameAs.org service for building cross-links between geo data. Again, it would be interesting to build bridges between different communities by showing how one can achieve the same effects with Linked Data, as well as integrating Linked Data into those services by providing gateway services, e.g. implementing the same API but backed by RDF. This is what I did for Gridworks, but the same could be extended to other services. Cheers, L.

[1]. http://www.guardian.co.uk/open-platform/blog/linked-data-open-platform
[2]. http://www.ldodds.com/blog/2010/08/gridworks-reconciliation-api-implementation/
[3]. http://blog.programmableweb.com/2010/04/05/yahoos-new-geo-concordance-a-geographic-rosetta-stone/
[4]. http://developer.yahoo.com/geo/geoplanet/guide/api-reference.html#api-concordance

-- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
Re: AW: AW: ANN: LOD Cloud - Statistics and compliance with best practices
I happen to agree with Martin here. My concern is that the naïveté of most of the research in LOD creates the illusion that data integration is an easily solvable problem -- while it is well known to be the most important open problem in the database community (30+ years of research), where there is a huge amount of money, research, and resources invested in it. This will eventually backfire on us - the whole community, including me - since people will not trust us anymore. Specifically, you can't deny that in practice the mythical picture gives this illusion; otherwise, why have it? cheers --e.
Re: Correct Usage of rdfs:isDefinedBy in Vocabulary Specifications with a Hash-based URI Pattern
On Oct 21, 2010, at 10:05 AM, KangHao Lu (Kenny) wrote:

Hello Martin, I don't think my argument would be very logical, but we can't wait for rule engines to discuss this.

Note, however, the majority of the Web vocabularies use the same URI for the entity name reference and the descriptor reference; see the link provided by Michael Hausenblas: http://code.google.com/p/void-impl/issues/detail?id=45 and in particular the little survey by Richard Cyganiak posted on that page. I personally would argue that in the case of ontologies / vocabularies, the conceptual difference between the entity and the descriptor is a lot less significant than when it comes to data, since an ontology is, by definition, a specification, i.e. a document.

Basically I like this approach; that is, I don't like the fact that some ontologies have '#' as the end character, and there should not be one URI for the ontology document and a different URI for the *conceptual* ontology. IIRC, 3 years ago Tim was very shocked by those ontologies that have '#' as the end character and claimed that this is not a good idea (and he would bring up this issue at the TAG or awwaw, I can't remember). The argument was that the string after '#' has the meaning of a 'local identifier' (so we use #I, #i for WebIDs because 'I' is a 'local identifier'), and identifiers can't be empty strings (or this might break some systems, I guess). I somehow agree with that, and Toby's use of my: to identify an ontology makes me a little bit uncomfortable. I have no idea if there was any followup after Tim brought this to the TAG or awwaw.

I have another argument, namely that you should distinguish the concept from the document only if the following criterion is satisfied: the time when the thing with the hash URI is created and the time when the document is created have a *clear* difference. So this holds for people, so people should not use document URIs. This holds for organizations, because you create the website of an organization maybe some years after the organization is founded. The problem is 'ontology'. I don't know whether you should call the structure an ontology, or whether it becomes an ontology once it is written down, but I don't think the difference in timing is very *clear*.

I agree it's not as clear as the other cases, but an argument for making the distinction would be that the same ontology can be encoded by different documents. For example, an OWL/RDF ontology *is* an RDF graph, but that graph can be represented/encoded/choose your word... in a variety of different documents with different syntax rules. This is an old and familiar distinction, really, between type and token: one work called Moby Dick, many copies of it; one third letter of the alphabet, as many copies of the C character as you can shake a stick at, and so on.

A similar example is when you want to give a URI to a Python module. I would not end it with '#' because I don't see why we need to distinguish the 'module document' from the 'module'.

That case is much blurrier, I agree. But imagine an algorithm implemented once in Python and elsewhere in C++. (Really, this is possible.) Same algorithm, very different documents. Ontologies are (arguably) more like algorithms than pieces of code. Pat Hayes

A module is a kind of document, and so is an ontology. So, owl:Ontology rdfs:subClassOf foaf:Document! Well, this is a theory. If there's a common practice of using '#'-ending URIs for ontologies, maybe we should accept it. No strong opinion. Wasn't this discussed at AWWAW? Just curious.
Cheers, -- Kenny
WebID: http://dig.csail.mit.edu/People/kennyluck#I
What is WebID: http://esw.w3.org/WebID

IHMC (850)434 8903 or (650)494 3973
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32502 (850)291 0667 mobile
phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
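A sketch of the two URI patterns being debated above, in Turtle; the example.org URIs are invented, and the triples only illustrate the distinction rather than recommend either option:

  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

  # Pattern A: one URI names both the conceptual ontology and the
  # document that encodes it (the majority practice per Richard's survey).
  <http://example.org/vocab> a owl:Ontology .

  # Pattern B: the '#'-terminated URI names the conceptual ontology,
  # kept distinct from the document <http://example.org/vocab>.
  <http://example.org/vocab#> a owl:Ontology ;
      rdfs:isDefinedBy <http://example.org/vocab> .

Pattern B is the '#'-as-end-character style that the thread says made Tim uncomfortable; pattern A collapses Pat's type/token distinction into a single name.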
Re: Low Quality Data (was before Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices)
On 10/22/10 10:47 AM, Juan Sequeda wrote:

Martin and all, Can somebody point me to papers, or maybe give their own definition of low quality data when it comes to LOD? What are the criteria for data to be considered low quality?

My Subjective Data Quality Factors:

1. Unambiguous Names -- resolvable URI-based names
2. Data Representation Format Dexterity -- HTTP + content negotiation, which loosens the coupling between model semantics and data representation
3. Platform-Agnostic Data Access -- HTTP delivers this well
4. Change Sensitivity -- speaks for itself, hopefully
5. Provenance -- data about the data (metadata) that helps establish Who, What, When, Where, and ~Why re. curation
6. Mesh Navigability -- inference context enables this.

This is why I say: look at Data like a cube of sugar, especially when trying to fashion Linked Data oriented business models. 1-6 nullify many of the concerns about data-driven business models:

1. Wholesale imports (crawls) that reconstitute data in a new data space -- #1 allows you to brand your data; when combined with licensing it also allows you to track conformance (remember, Web architecture makes the Web sticky via HTTP logs amongst other things, so entropy is your friend, ultimately)
2. Attribution -- ditto
3. Data consumer identity -- WebID will put an end to API keys (major relics), so QoS based on quality factors #2-6 is absolutely plausible.

Kingsley

-- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
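To illustrate factor 2 above, a small sketch using curl against DBpedia; DBpedia is chosen only as a familiar example of HTTP content negotiation, and the exact redirect targets may vary:

  # Ask for RDF/XML: the server should answer with a 303 redirect
  # to a data document describing the resource.
  curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Berlin

  # Ask for HTML: same name, negotiated into a human-readable page.
  curl -I -H "Accept: text/html" http://dbpedia.org/resource/Berlin

One name, several representations: that is the loose coupling between model semantics and data representation the factor refers to.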
Re: Concordance, Reconciliation, and shared identifiers
On 10/22/10 11:47 AM, Leigh Dodds wrote:

Hi, The announcement that the Guardian has begun cataloguing other identifiers (e.g. ISBN, MusicBrainz) within its API [1] is a nice illustration that the importance of cross-linking between datasets is starting to become more generally accepted. Setting aside the debate about what constitutes linked data, I think it's important that this community tracks these various initiatives to help explore the trade-offs between different approaches, as well as to build bridges with the wider developer community. A great project would be for someone to produce a Linked Data wrapper for the Guardian API that allows linking *in* to their data, based on ISBNs and MusicBrainz ids. It's on my TODO list, but then so is a lot of other stuff ;)

We've had sponger meta cartridges [1] for the Guardian API since its early incarnations. Anyone that uses URIBurner [2] ends up with a look-up pass through the Guardian API (amongst a boat load of others) en route to the final URIBurner-generated Linked Data graph. URIBurner then pings PTSW, which ultimately leads to data in the LOD Cloud Cache we maintain. Sindice also does URIBurner lookups, and URIBurner also looks up Sindice (sameas.org etc.). End result: a dynamic Web of Linked Data. We stopped counting its size in 2007 :-) Of course, others too should make wrappers for these APIs so that their perspectives are expressed in the burgeoning Web of Linked Data etc.

If we look back a few months we can see signs of the importance of cross-linking appearing in other projects. Google Refine (nee Freebase Gridworks) has the notion of a reconciliation service that is used to build and set links [2]. Yahoo meanwhile have their concordance service [3, 4], which is basically a sameAs.org service for building cross-links between geo data. Again, it would be interesting to build bridges between different communities by showing how one can achieve the same effects with Linked Data, as well as integrating Linked Data into those services by providing gateway services, e.g. implementing the same API but backed by RDF. This is what I did for Gridworks, but the same could be extended to other services.

On our part, we've been doing so since Linked Data inception, and will continue to do so :-)

Links:
1. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtSponger -- Sponger
2. http://uriburner.com -- Virtuoso Sponger Service

Cheers, L.
[1]. http://www.guardian.co.uk/open-platform/blog/linked-data-open-platform
[2]. http://www.ldodds.com/blog/2010/08/gridworks-reconciliation-api-implementation/
[3]. http://blog.programmableweb.com/2010/04/05/yahoos-new-geo-concordance-a-geographic-rosetta-stone/
[4]. http://developer.yahoo.com/geo/geoplanet/guide/api-reference.html#api-concordance

-- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices
On 10/21/10 11:56 PM, Martin Hepp wrote:

Hi all: I think that Enrico really made two very important points: 1. The LOD bubbles diagram has very high visibility inside and outside of the community (up to the point that broad audiences believe the diagram would define relevance or quality).

True re. visibility. Subjective quality bearer, I think not :-)

2. Its creators have a special responsibility (in particular as scientists) to maintain the diagram in a way that enhances insight and understanding, rather than conveying false facts and confusing people.

Methinks the creators executed on a marketing plan. Personally, I enjoy the fact that people otherwise tagged as geeks have ended up demonstrating potent marketing prowess, really.

So Kingsley's argument that anybody could provide a better diagram does not really hold.

Uh? I said: more diagrams, each addressing a specific realm of interest (and bias). Here are some examples:

1. http://linkedopencommerce.com -- you know about this one clearly
2. http://www.mkbergman.com/wp-content/themes/ai3/images/2009Posts/090212_lodd_cloud.jpg
3. http://umbel.org/images/081010_lod_constellation.png -- UMBEL (TBox oriented)
4. http://www.mquter.qut.edu.au/bio/bio2rdf.jpg -- Bio2RDF
5. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/ClickableVirtSpongerCloud/sponger-cloud.png -- Old Dynamic Linked Data Cloud via Sponger (URIBurner in current LOD cloud)

It will harm the community as a whole, sooner or later, if the diagram misses the point, simply based on the popularity of this diagram.

How does a single diagram define a community? IMHO, attacking those that have made contributions that have become popular says more about problems in our community.

We really have to make up our minds what we want here.

Everyone is entitled to their own biases (context lenses); that's the fundamental beauty of the Web, especially the Linked Data variant that's rapidly taking shape. Until the Web stops us from projecting our individual biases, I stand by my position, i.e., let a million LOD cloud variants rain :-) I strongly encourage you to make an alternative pictorial that addresses the areas that concern you. Of course, if there is language that may concern you in the report from Chris, then that's a different matter re. deeming the LOD cloud pictorial canonical.

And to be frank, despite other design decisions, it is really ridiculous that Chris justifies the inclusion of Denny's numbers dataset as valid Linked Data, because that dataset is, by design and known to everybody in the core community, not data but noise.

Do you really believe that most powerpoint viewers actually drill down this deep into the pictorial? When people ask me about the cloud, i.e., what it signifies etc., my answer goes like this: Linked Data exists on the Web, and it's on a discernible exponential curve (meaning critical mass to VCs and suits). I don't make any statements about subjective quality. If people ask: how is this going to affect business models? I show them the Linked Open Commerce cloud collection, and they get it, pronto! Pronto! implying there's money to be made, but biz models remain fuzzy since Linked Data QoS factors haven't really crystallized, due to the basic Linked Data concept remaining mercurial to comprehend. Thus, you get your typical powerpoint effect: seed planted, hockey stick potential is sorta there, now let me go figure how to make this less-fuzzy-web-opportunity my own money-making reality etc.
This is the linked data landfill mindset that I have kept on complaining about. You make it very easy for others to discard the idea of linked data as a whole.

Come on! Seriously, where would Linked Data be without the LOD cloud pictorial re. mindshare acquisition? Let's just make more realm/bias-specific pictorials and associated data set analysis reports, so that newly uncovered Linked Data dimensions go viral like the LOD cloud. Thus, I can only agree with you once you've taken corrective action :-)

Best Martin

-- Regards, Kingsley Idehen President & CEO OpenLink Software Web: http://www.openlinksw.com Weblog: http://www.openlinksw.com/blog/~kidehen Twitter/Identi.ca: kidehen
Please allow JS access to Ontologies and LOD
Hi All,

Currently nearly all of the web of linked data is blocked from access by client-side scripts (JavaScript), because browsers enforce the same-origin policy; CORS [1], now implemented in the major browsers, is the mechanism that lets a server opt in to cross-origin access. Whilst this is important for all data, there are many of you reading this who have it in your power to expose huge chunks of the RDF on the web to JS clients: if you manage any of the common ontologies or anything in the LOD cloud diagram, please do take a few minutes from your day to expose the single HTTP header needed.

Long story short, to allow JS clients to access our open data we need to add one small HTTP response header, which will allow HEAD/GET and POST requests. The header is:

  Access-Control-Allow-Origin: *

This is both XMLHttpRequest (W3C) and XDomainRequest (Microsoft) compatible, and supported by all the major browser vendors. Instructions for common servers follow.

If you're on Apache then you can send this header by simply adding the following line to a .htaccess file in the dir you want to expose (probably site-root):

  Header add Access-Control-Allow-Origin *

For nginx:

  add_header Access-Control-Allow-Origin *;

see: http://wiki.nginx.org/NginxHttpHeadersModule

For IIS see: http://technet.microsoft.com/en-us/library/cc753133(WS.10).aspx

In PHP you add the following line before any output has been sent from the server:

  header("Access-Control-Allow-Origin", "*");

For anything else you'll need to check the relevant docs, I'm afraid. Best TIA, Nathan

[1] http://dev.w3.org/2006/waf/access-control/
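A quick way to check whether a given resource already sends the header, sketched with curl; the FOAF namespace is used here only as an example target:

  # Request the document and inspect the response headers; you want to
  # see "Access-Control-Allow-Origin: *" somewhere in the output.
  curl -I -H "Origin: http://example.org" http://xmlns.com/foaf/0.1/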
Re: [foaf-protocols] Please allow JS access to Ontologies and LOD
+1 Thanks for the heads up. I added:

  Header add Access-Control-Allow-Origin *

to my .htaccess and everything worked fine. Easy! :)
[Fwd: XRD 1.0 currently up for OASIS Standard vote]
FYI, you should probably be aware of this. Don't underestimate it either; just take a look at the To: list.

Original Message
Subject: XRD 1.0 currently up for OASIS Standard vote
Date: Fri, 22 Oct 2010 16:25:53 -0700
From: Will Norris w...@willnorris.com
Reply-To: activity-stre...@googlegroups.com
To: activity-stre...@googlegroups.com, gene...@lists.openid.net, bo...@lists.openid.net, sp...@lists.openid.net, oa...@googlegroups.com, oexcha...@googlegroups.com, portableconta...@googlegroups.com, salmon-proto...@googlegroups.com, diso-proj...@googlegroups.com, webfin...@googlegroups.com

(apologies up front for those that get multiple copies of this due to the wide cross-post) For those that haven't been following the current status of XRD, I wanted to ping you and let you know that it is currently being voted on for consideration as an OASIS Standard. As a quick recap, XRD is a simple format for describing resources, with one immediate use being discovery of social web services. It is a direct evolution of XRDS (used for discovery in OpenID 2.0) and XRDS-Simple (used in OAuth Discovery, Portable Contacts, et al). XRD is currently in use as the descriptor format powering WebFinger. You can read the full spec at: http://docs.oasis-open.org/xri/xrd/v1.0/xrd-1.0.html I'm writing to these communities to encourage anyone whose company is an OASIS member to find out who your OASIS representative is and encourage them to vote. The ballot is open through October 31, so we only have just over a week left. Ballot: http://www.oasis-open.org/apps/org/workgroup/voting/ballot.php?id=1955 (OASIS login required) Thanks, Will Norris
Re: Please allow JS access to Ontologies and LOD
Hi Nathan,

I implemented this header on http://productdb.org/ (since I had the code open). Can someone confirm that it does what's expected (i.e. allows off-domain requesting of data from productdb.org)?

One important thing to note: the PHP snippet you gave was slightly wrong. The correct form is:

  header("Access-Control-Allow-Origin: *");

Cheers, Ian
Re: Please allow JS access to Ontologies and LOD
Hi Ian,

Thanks, I can confirm the change has been successful :) However, one small note: the conneg URIs such as http://productdb.org/gtin/00319980033520 do not expose the header, and thus can't be used. To test it yourself, simply do a curl -I request on the resource, for instance:

  curl -I http://productdb.org/gtin/00319980033520.rdf

Also, I've just uploaded a small script which lets you enter the URI of an RDF/XML document; it'll try to pull it, parse it and display it as Turtle for you - which is a good test of both CORS and the script ;)

  http://webr3.org/apps/play/api/test

FYI, Dan has also made the change, so the FOAF vocab is now exposed to JS. Best and thanks again, Nathan
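For anyone wanting to reproduce the conneg gap Nathan mentions, a hedged sketch (assuming productdb.org still responds as described above):

  # The direct document URL exposes the CORS header:
  curl -I http://productdb.org/gtin/00319980033520.rdf

  # The content-negotiated URL reportedly does not; compare its headers:
  curl -I -H "Accept: application/rdf+xml" http://productdb.org/gtin/00319980033520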