Re: Think before you write Semantic Web crawlers
I wonder, are there ways to link RDF data so that conventional crawlers do not crawl it, but only the semantic-web-aware ones do? I am not sure how the current practice of linking by link tag in the HTML headers could cause this, but it may be the case that those heavy loads come from crawlers having nothing to do with the semantic web... Maybe we should start linking to our RDF/XML, Turtle and N-Triples files and publishing sitemap info in RDFa...

Best,
Jiri

On 06/22/2011 09:00 AM, Steve Harris wrote:

While I don't exactly agree with Andreas that it's the site owner's fault, this is something that publishers of non-semantic data have to deal with. If you publish a large collection of interlinked data which looks interesting to conventional crawlers and is expensive to generate, conventional web crawlers will be all over it. The main difference is that a greater percentage of those are written properly, to follow robots.txt and the guidelines about hit frequency (maximum 1 request per second per domain, no parallel crawling). Has someone published similar guidelines for semantic web crawlers? The ones that don't behave themselves get banned, either in robots.txt, or explicitly by the server.

- Steve

On 2011-06-22, at 06:07, Martin Hepp wrote:

Hi Daniel,

Thanks for the link! I will relay this to the relevant site-owners. However, I still challenge Andreas' statement that the site-owners are to blame for publishing large amounts of data on small servers. One can publish 10,000 PDF documents on a tiny server without being hit by DoS-style crazy crawlers. Why should the same not hold if I publish RDF? But for sure, it is necessary to advise all publishers of large RDF datasets to protect themselves against hungry crawlers and actual DoS attacks. Imagine if a large site was brought down by a botnet exploiting Semantic Sitemap information for DoS attacks, focusing on the large dump files. This could end LOD experiments for that site.
Best
Martin

On Jun 21, 2011, at 10:24 AM, Daniel Herzig wrote:

Hi Martin,

Have you tried to put a Squid [1] as a reverse proxy in front of your servers and use delay pools [2] to catch hungry crawlers?

Cheers,
Daniel

[1] http://www.squid-cache.org/
[2] http://wiki.squid-cache.org/Features/DelayPools

On 21.06.2011, at 09:49, Martin Hepp wrote:

Hi all:

For the third time in a few weeks, we had massive complaints from site-owners that Semantic Web crawlers from universities visited their sites in a way close to a denial-of-service attack, i.e., crawling data with maximum bandwidth in a parallelized approach. It's clear that a single, stupidly written crawler script, run from a powerful university network, can quickly create a terrible traffic load. Many of the scripts we saw
- ignored robots.txt,
- ignored clear crawling speed limitations in robots.txt,
- did not identify themselves properly in the HTTP request header or lacked contact information therein,
- used no mechanisms at all for limiting the default crawling speed and re-crawling delays.

This irresponsible behavior can be the final reason for site-owners to say farewell to academic/W3C-sponsored semantic technology. So please, please advise all of your colleagues and students NOT to write simple crawler scripts for the Billion Triples Challenge or whatever without familiarizing themselves with the state of the art in friendly crawling.

Best wishes
Martin Hepp
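For what it's worth, the core of the "friendly crawling" Martin asks for fits in a few lines of Python's standard library. This is only a minimal sketch: the robots.txt content, URLs and User-Agent string below are invented for illustration, and a real crawler would fetch robots.txt from the target host rather than inline it.

```python
from urllib.robotparser import RobotFileParser

# An example robots.txt, inlined so the sketch is self-contained;
# a real crawler would fetch it from http://host/robots.txt first.
ROBOTS = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS.splitlines())

def crawl_plan(urls, agent="*"):
    """Return (allowed_urls, delay_seconds): the URLs robots.txt permits,
    and how long to pause between requests (never less than 1 second)."""
    allowed = [u for u in urls if rp.can_fetch(agent, u)]
    delay = max(rp.crawl_delay(agent) or 1, 1)
    return allowed, delay

urls = ["http://example.org/data.rdf", "http://example.org/private/dump.nt"]
allowed, delay = crawl_plan(urls)
# The fetch loop would then request the allowed URLs one at a time
# (no parallelism), sleeping `delay` seconds between requests, and
# identify itself with a descriptive User-Agent carrying contact info.
```

This covers exactly the four failures Martin lists: honoring Disallow rules, honoring Crawl-delay, rate-limiting by default, and (via the User-Agent) being identifiable.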
Re: Schema.org in RDF ...
On 06/12/2011 08:19 PM, Richard Cyganiak wrote:

Hi Danny,

On 12 Jun 2011, at 17:57, Danny Ayers wrote:

We explicitly know the “expected types” of properties, and I'd like to keep that information in a structured form rather than burying it in prose. As far as I can see, rdfs:range is the closest available term in W3C's data modeling toolkit, and it *is* correct as long as data publishers use the terms with the “expected type.”

I don't think it is that close to expected type

I didn't say it's close to “expected type”. I said that we want to keep the information in a structured form, and that rdfs:range is the closest construct available in the W3C toolkit.

Hi,

Why not make a new property for such loose semantics (and make rdfs:range a subproperty of it)? Surely we didn't go out of our way to have great flexibility, compared to controlled vocabularies, for nothing...

#something :hasColour #wet .

then we get

#wet a :Colour .

If you apply RDFS/OWL reasoning to broken data, you get more broken data. I don't understand why anyone would be surprised by that.

I am surprised someone wants to publish broken data.

Best,
Jiri
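The "more broken data" point can be made concrete by applying the rdfs:range entailment rule by hand. A minimal sketch in plain Python, with triples as tuples; the :hasColour / :wet / :Colour names come from the example in the thread above:

```python
# The RDFS range rule (rdfs3): from (p rdfs:range C) and (s p o),
# infer (o rdf:type C).
RDFS_RANGE = "rdfs:range"
RDF_TYPE = "rdf:type"

def apply_range_rule(triples):
    """One pass of the rdfs3 entailment rule over a set of triples."""
    ranges = {p: c for (p, r, c) in triples if r == RDFS_RANGE}
    return {(o, RDF_TYPE, ranges[p])
            for (s, p, o) in triples if p in ranges}

data = {
    (":hasColour", RDFS_RANGE, ":Colour"),   # schema: range declaration
    (":something", ":hasColour", ":wet"),    # broken instance data
}
inferred = apply_range_rule(data)
# inferred == {(":wet", "rdf:type", ":Colour")} -- garbage in, garbage out
```

The rule itself is sound; it simply propagates whatever the publisher asserted, which is Richard's point.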
Re: Semantics of rdfs:seeAlso (Was: Is it best practices to use a rdfs:seeAlso link to a potentially multimegabyte PDF?)
On 01/13/2011 01:09 PM, Dave Reynolds wrote:

On Thu, 2011-01-13 at 06:29 -0500, Tim Berners-Lee wrote:

This is the Linked Open Data list. The Linked Data world is a well-defined bit of engineering. It has co-opted the rdfs:seeAlso semantics of if you are looking up x, load y from the much earlier FOAF work.

Where is this well-defined bit of engineering defined in such a way that makes that co-option clear? [*] Assuming a particular use of rdfs:seeAlso as a convention for some community (e.g. FOAF) that wants to adopt that particular pattern is just fine. Updating the specs in the future to narrow the interpretation to support this assumed usage might be OK, so long as due process is followed, but that hasn't happened yet. Complaining when others go by the existing spec does not seem reasonable.

The URI space is full of empty space waiting for you to define terms with whatever semantics you like for your own use. But one can't argue philosophically that for some reason the URI rdfs:seeAlso should have some other meaning when people are using it and there have been specs.

Those specs support Martin's usage, as his quotes from them clearly demonstrated.

One *can* argue that the RDFS spec is definitive, and it is very loose in its definition.

Loose in the sense of allowing a range of values, but as a specification it is unambiguous in this case, as Martin has already pointed out: When such representations may be retrieved, no constraints are placed on the format of those representations.

We could look at maybe asking for an erratum to the spec to make it clear and introduce the other term in the same spec. Or mint a sub-property of rdfs:seeAlso which provides the additional constraints.

Dave

+1

I also consider it part of Linked Data that the authoritative definition of a term is the one obtained by dereferencing it, which in the case of RDFS is http://www.w3.org/2000/01/rdf-schema# .
Best,
Jiri

[*] And yes, I'm well aware of [1], which does mention the FOAF convention, but it does so just as one convention in passing; there's no clear suggestion in there that tools should rely on that convention for arbitrary linked data.

[1] http://www.w3.org/DesignIssues/LinkedData.html
Re: Is it best practices to use a rdfs:seeAlso link to a potentially multimegabyte PDF?, existing predicate for linking to PDF?
On 01/10/2011 01:45 PM, William Waites wrote:

* [2011-01-10 08:55:59 +] Phil Archer phil.arc...@talis.com wrote:

] However... a property should not imply any content type AFAIAC. That's
] the job of the HTTP Headers. If software de-references an rdfs:seeAlso
] object and only expects RDF then it should have a suitable accept
] header. If the server can't respond with that content type, there are
] codes to handle that.

I disagree that we should rely on HTTP headers for this. Consider local processing of a large multi-graph dataset. These kinds of properties can act as hints to process one graph or another without the need to dereference anything. (I tend to think of a graph as equivalent to the document obtained by dereferencing the graph's name.) Slightly more esoteric are graphs made available over ftp, finger, freenet, etc.. Let's take advantage of HTTP where appropriate but not mix up the transport and content unnecessarily.

Cheers,
-w

I agree; there is nothing wrong in having a subProperty which includes more information, whether it be about the subject or object of the triple, regardless of whether it's about content type or anything else. I believe it is good practice to specify the domain and range of a property as precisely as possible. Failing to do so begs for usage which is either wrong by the original intention or makes the meaning of the property very fuzzy, which in both cases results in less useful data.

Best,
Jiri
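Phil's suggestion - that a client expecting RDF should say so through content negotiation rather than infer it from the property - boils down to setting an Accept header on the request. A minimal sketch with Python's stdlib urllib; the URL and the particular media types are illustrative, not prescribed by any spec quoted here:

```python
from urllib.request import Request

# A client that only wants RDF advertises that preference explicitly,
# instead of assuming anything from the predicate that pointed it here.
req = Request(
    "http://example.org/resource",
    headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"},
)

# urllib.request.urlopen(req) would then either return an RDF
# serialization or an error status such as 406 Not Acceptable,
# which the client can handle explicitly.
```

The counterpoint in the thread stands, though: none of this helps when the graphs are processed locally or retrieved over a non-HTTP transport.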
Re: Any reason for ontology reuse?
I would like to add that successful books are published translated into other languages. So, analogously, one who wishes his ideas to be proliferated the most should publish them in as many ontologies as possible. The flaws of the analogy are that the differences between ontologies grow far bigger than the differences between (western) languages, so the semantic loss by translation may be big. A second thing is that currently hardly anyone publishes information about which data is the original and which are derivative translations. Hopefully with machine-processable languages the cost of translation will be negligible compared to natural languages, but we should keep records of the process, because it means a possible decline of semantic quality.

Best,
Jiri

On 12/04/2010 03:10 PM, Hugh Glaser wrote:

This is really rather a fun reflection. I like Toby's analogy, but I think that it can usefully be improved. Instead of considering publishing in English, we are publishing in the equivalent of natural language. So the different vocabularies might correspond better to the different NLs around. If I am able to publish in English, with a significant smattering of German or Latin for words that might be missing from English, then with a little effort, someone can more easily understand what I am saying, especially given zeitgeist, context, et cetera, usw. However, I am probably best keeping to English if I can. On the other hand, spraying around lots of words from lots of different vocabularies makes it much harder and more fragile to understand than sticking to one obscure one, or even inventing my own, as it means the consumer needs to go to lots of sources to work out what is meant. In fact, grabbing words from a bunch of different NLs is quite an easy, if vulnerable, encoding mechanism.
I have been known to write down four-digit numbers using transliteration of the numbers from different languages, as a mnemonic which would be just that bit of a challenge to someone who stumbled on it. I guess that is one reason why I am not as averse to minting URIs as some people.

Cheers

On 4 Dec 2010, at 13:07, Martin Hepp wrote:

Simple rules:

1. It is better to use an existing ontology than to invent your own.
2. It is better to use the most popular existing ontology than a less popular existing ontology.
3. It is better to publish your data using your own ontology than not to publish your data at all.
4. It is better to use a good (*) private ontology for publishing your data than a messy private ontology.

(*) A good ontology is one that preserves the largest share of the original conceptual distinctions in your data, i.e. it does not require merging entity types that are distinct in the original data, as long as this distinction matters for potential data consumers.

Whether option #1 is feasible depends on
1. how much time and money you are willing to put into lifting / publishing your data (that will be a matter of economic incentives), and
2. how complicated it is to populate that ontology from the available data and the local schemas.

Best
Martin

On 04.12.2010, at 09:27, Toby Inkster wrote:

On Fri, 3 Dec 2010 18:15:08 -0200 Percy Enrique Rivera Salas privera.sa...@gmail.com wrote:

I would like to know which are the specific reason(s) for reusing terms from well-known vocabularies in the process of publishing Linked Data on the Web?

Consider this question: I would like to know which are the specific reason(s) for reusing well-known words in the process of publishing English text on the Web? Answer: When you're writing something in English, you should avoid inventing new words unless you're fairly sure that a word for the concept you're trying to describe does not exist.
This is because if you invent a new word, you need to describe what it means for other people to be able to understand you. And even when you do that, you've increased the cognitive load for your readers. URIs are the vocabulary of linked data, just like words are the vocabulary of the English language. For analogous reasons, you should avoid minting new URIs when an existing URI will do. If you mint a new URI that means the same as an existing one, then not only do you have to go to the effort of documenting its meaning, but consumers have to perform extra work (such as subproperty/subclass inferencing) to understand it.

--
Toby A Inkster
mailto:m...@tobyinkster.co.uk
http://tobyinkster.co.uk

martin hepp
e-business web science research group
universitaet der bundeswehr muenchen
e-mail: h...@ebusiness-unibw.org
phone: +49-(0)89-6004-4217
fax: +49-(0)89-6004-4620
www: http://www.unibw.de/ebusiness/ (group)
http://www.heppnetz.de/ (personal)
skype: mfhepp
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
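Toby's "extra work" can be made concrete: a consumer that meets an unfamiliar property has to chase rdfs:subPropertyOf declarations before it can treat the data as if the well-known term had been used. A small sketch in plain Python, with triples as tuples; the ex:hasNick property and its declaration are invented for illustration:

```python
SUBPROP = "rdfs:subPropertyOf"

def rewrite_to_known(triples, known):
    """Apply the rdfs7 rule: from (p rdfs:subPropertyOf q) and (s p o),
    entail (s q o) -- the extra inferencing step a consumer must run
    when a publisher mints a needless synonym of a well-known term."""
    parents = {p: q for (p, r, q) in triples if r == SUBPROP}
    out = set(triples)
    for (s, p, o) in triples:
        if p in parents and parents[p] in known:
            out.add((s, parents[p], o))
    return out

data = {
    ("ex:hasNick", SUBPROP, "foaf:nick"),   # documentation of the new term
    ("ex:alice", "ex:hasNick", "ali"),      # data using the synonym
}
entailed = rewrite_to_known(data, known={"foaf:nick"})
# entailed now also contains ("ex:alice", "foaf:nick", "ali")
```

Had the publisher simply used foaf:nick, none of this machinery would be needed - which is the point of the analogy.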
Re: Is 303 really necessary?
On 11/28/2010 02:52 PM, Giovanni Tummarello wrote:

- the rest of the web continues to use 200

Tim, yes, but the rest of the web will use 200 also to show what we would consider 208, e.g. http://www.rottentomatoes.com/celebrity/antonio_banderas/ - see the triples http://inspector.sindice.com/inspect?url=http://www.rottentomatoes.com/celebrity/antonio_banderas/#TRIPLES

http://www.rottentomatoes.com/celebrity/antonio_banderas/ is clearly a web page, but it's also an actor; it is pointed to as such by the graphs in other pages, and the same page contains the OpenGraph triple type actor. We should not get ourselves into the position of having to evangelize everyone to change something for reasons that are really not apparent to the normal web world. I think the solution we should be seeking should consider RDFa publishing via a normal 200 code, as in the example above, absolutely OK. An agent would then be able to distinguish which properties apply to the page and which to the actor by looking at the... properties themselves, I guess? Sad but possibly unavoidable?

Giovanni

Hi,

I agree with this. This problem is caused by Linked Data conflating identifiers with locators - what is important is that one can get information about a unique name by using it as a locator. The issue of whether some events in the process or outcome of the information retrieval should somehow affect the user's perception of the name (is it a document or xyz?) is a can of worms most implementers don't want to tackle, and they have a point. I don't want to maintain all the apps I once coded so that they support whatever the latest HTTP semantics trend is, when there is a widely used standard for extensible, *evolvable* information representation (RDF) which I am already expecting to receive about the name I am retrieving info about. So let's not presume that by dereferencing a URI and getting back a document, the URI is the document's identifier - it is its locator.
It can be its identifier too, but let's leave that for publishers to decide - that has been the point of my previous post on the topic ( http://lists.w3.org/Archives/Public/public-lod/2010Nov/0325.html )

Best regards,
Jiří Procházka
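For contrast, the status-code-driven reading that Jiri argues against can be sketched in a few lines: under it, the HTTP status, not the retrieved data, decides what the URI names. This is a toy classifier; the rule names and return values are mine, chosen only to make the two positions comparable:

```python
def classify(uri, status, location=None):
    """The httpRange-14-style reading: a 200 means the URI names the
    document itself; a 303 means it names some (possibly non-document)
    thing described at the Location target; anything else is unknown."""
    if status == 200:
        return ("document", uri)
    if status == 303 and location:
        return ("thing-described-at", location)
    return ("unknown", uri)

# A 200 forces the "it's a document" conclusion discussed above,
# whereas Jiri's proposal would leave that question to the data itself:
verdict = classify(
    "http://www.rottentomatoes.com/celebrity/antonio_banderas/", 200)
```

Under Jiri's alternative, `classify` would always return something like ("locator", uri) and any document-vs-concept distinction would come from triples in the response.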
Simpler alternative of Linked Data semantics (was Re: Is 303 really necessary?)
On 11/28/2010 06:45 PM, Kingsley Idehen wrote:

On 11/28/10 9:46 AM, Jiří Procházka wrote:

snip

This problem is caused by Linked Data conflating identifiers with locators - what is important is that one can get information about a unique name by using it as a locator.

Linked Data (meme or actual concept) doesn't conflate Locators with Identifiers. A URI is a generic Identifier. A URL (a Locator / Address) is an Identifier. The problem remains in not understanding the URI abstraction. One issue you can't tack on Linked Data is failure to distinguish between a Name Reference and an Address Reference implemented via the elegance of the URI abstraction.

The issue of whether some events in the process or outcome of the information retrieval should somehow affect the user's perception of the name (is it a document or xyz?) is a can of worms most implementers don't want to tackle, and they have a point.

It wasn't a can of worms before the Web. The issue of Resource in URI [1] has led to overloading that creates the illusion you describe, across many quarters and their associated commentators.

I don't want to maintain all the apps I once coded so that they support whatever the latest HTTP semantics trend is, when there is a widely used standard for extensible, *evolvable* information representation (RDF) which I am already expecting to receive about the name I am retrieving info about. So let's not presume that by dereferencing a URI and getting back a document, the URI is the document's identifier - it is its locator.

Yes, it's the URL of a Document, and if the content-type is one of the RDF formats, or any other syntax for representing EAV-model structured data - via hypermedia - then it's the URL of an Entity Descriptor Document - a document that provides a full representation of its Subject via a Description expressed in a Graph Pictorial comprised of Attribute=Value pairs coalesced around a Subject Name (a Resolvable Identifier, e.g. an HTTP URI).
It can be its identifier too, but let's leave that for publishers to decide - that has been the point of my previous post on the topic ( http://lists.w3.org/Archives/Public/public-lod/2010Nov/0325.html )

If you mean, let the publisher decide via Content and Mime Type what this is about, then an emphatic YES!!

That is the option which was promoted till now, but some people chose not to oblige for whatever reason, and judging by the amount of discussion about it, it is a problem. If a publisher makes available structured data about some concept at a URI, he probably means the URI identifies the concept, not the data documents, and I think if one wants to use that data, he needs to try to understand the publisher, not tell him he is wrong because [insert XX pages of HTTP URI semantics], however flawed in neglecting the standards you may consider him to be - welcome the Linked Data (tag^H^H^Hstatus-code-)soup. I'm fond of RDF's take on URIs == names. What I mean is:

a) letting the publisher decide which names are for documents (and the relations between them) and which are for concepts. On dereferencing (200 returned), one should not think hmm yup, this definitely must be a name for a document, but rather hmm, I dereferenced this URI and got back some document - the document exists, but its name isn't specified - like a blank node, with similar drawbacks; thus:

b) letting the publisher decide that not via Content and Mime Type, but in the structured data itself, because that is most probably what the consumer will be able to parse and understand anyway, and there exists a well-established standard for it, the Resource Description Framework, which fulfills more of this great document ( http://www.w3.org/DesignIssues/Evolution.html ) than HTTP, which isn't a one ring to rule all transport protocols. (Other data formats than RDF should have equivalent ways of expressing that too.)

This practice doesn't need to be standardized. You and me can use it now if we wish.
It has both advantages, covering a wider quality range of linked data, and disadvantages - suddenly all documents with no data about them have no names! Tragedy? No, we have their locators, and if there is something said about the locators, we can assume it is about the document stored there, unless said otherwise by the publisher (this is the difference from the standard Linked Data perspective). This doesn't make the ambiguity go away - that is impossible, since it depends on the publisher - but I believe it is a simpler, more forward-compatible way to go around it, fitting more world-views.

Best,
Jiri

snip

Links:
1. http://lists.w3.org/Archives/Public/www-tag/2009Aug/.html -- TimBL's own account re. origins of Resource in URI. This is the problem!!
Re: Simpler alternative of Linked Data semantics (was Re: Is 303 really necessary?)
On 11/29/2010 12:48 AM, Kingsley Idehen wrote:

On 11/28/10 3:52 PM, Jiří Procházka wrote:

snip

b) letting the publisher decide that not via Content and Mime Type, but in the structured data itself, because that is most probably what the consumer will be able to parse and understand anyway, and there exists a well-established standard for it, the Resource Description Framework, which fulfills more of this great document ( http://www.w3.org/DesignIssues/Evolution.html ) than HTTP, which isn't a one ring to rule all transport protocols. (Other data formats than RDF should have equivalent ways of expressing that too.)

Yes-ish, but my point is this:

1. Publisher (owner of a Linked Data Server) serves up data via Documents at URLs
2. Linked Data Client (agent) accesses data by exploiting content negotiation when de-referencing URIs (Name or Address)
3. Publisher sends Document Content to the client with metadata (HTTP response headers and/or within the content via triples or head/link/ exploitation re. HTML) - this is where Mime Type comes into play too
4. Linked Data Client processes metadata and content en route to understanding what it has received.

snip
Re: Role of URI and HTTP in Linked Data
On 11/10/2010 11:44 AM, Nathan wrote:

Hi Jiří,

Jiří Procházka wrote:

Hi, having read all of the past week's and still ongoing discussion about HTTP status codes, URIs and most importantly their meaning from the Linked Data perspective, I want to share my thoughts on this topic. I don't mean to downplay anyone's work, but I think the role of the URI and HTTP specifications (especially their semantics) in Linked Data is overemphasized, which unnecessarily complicates things.

The URI is what makes Linked Data, Linked Data; it's the only hook to the real world, and via the domain name system + domain registration process it gives us a hook on accountability, which is critically important.

I am by no means giving up these utilities by what I suggest.

#bar, as described by http://example.com/foo, resolves in two ways: (1) http://example.com/foo as a name for the literal description/graph, and (2) http://example.com/foo as a way of saying the author of the description available at http://example.com/foo stated X, and was responsible as delegated by the owners of example.com, where X is (1) and provable by the HTTP messages and logs. A status code of 200 vs 303 to some other domain or URI vs 4xx or 5xx plays a big part in that chain of accountability / validity / trust.

I don't think Linked Data consumers should *have to* care about what status codes an HTTP request returns - it shouldn't be part of the core Linked Data semantics. Of course it can be beneficial for clients to listen to them to get more information, but treating the HTTP library as a simple function should be allowed (either it returns data or not). When someone 303s (nice verb) to a different domain, it obviously means he trusts it to maintain the description of his URI.
Also never forget that Linked Data is just Links with literals - a Link as in a hyperlink. It's the description of a relationship between two things (names or literals) which makes a link a link; thus each link is a statement, statements form descriptions, and descriptions are literal things. Triples are statements, Graphs are descriptions. There's a lot more to the simple triple with http URIs than many realise; sure, it makes a nice RDF data bus for us and gives us an almost universal data format, which we can exploit and bring to the fore via linked data, but that's just the tip of the iceberg, and ultimately of very little use without the URI and HTTP.

a few notes..

I think we can all agree that the core idea of Linked Data is that information is expressed using unique identifiers (URIs) I can simply use to get useful information about the thing the identifier represents (hence the mandated, relatively simple, widely supported transfer protocol HTTP).

as above, that's not the core of linked data, that's the surface.

So let's stick with this. Let's just treat URIs as RDF does - as simple names. When we dereference a URI we get back some useful data and that's it.

So, that'll be like mailto: or pop: or tel: then..

I don't follow here. I don't know of any standardized ways of getting structured data out of such URIs.

If we want to express that the data fetched is in fact a document, we use the wdrs:isDefinedBy property. The data fetched is just data, and any info about it should be contained in it.

Expressing that the data fetched is in fact a document is indeed optional, but any response is always a message, a description, a /literal/ thing; you can't pretend it doesn't exist, it does - to say a description is anything other than that is like me saying you're an apple and insisting everybody believe me. Literals are self-identifying, self-naming things.

I don't get what you mean here either. Are you talking about RDF semantics here or general ontological philosophy?
If you are talking about RDF, then be aware that literals can have names - URIs assigned to literals. If you are talking about the latter, then I don't get you at all. I am advocating making Linked Data as simple as possible, avoiding abstract ontological definitions (among which I count the notion of literal). The fact that what you say is incomprehensible to me further strengthens my opinion.

Why? Why no Content-Location? There is no reason to require additional complexity, building extra information layers. Publishing the document information in the data itself would most probably be simpler for both the publishing and the consuming party. Treating HTTP as a simple blackbox is what is mostly done in practice anyway.

Read-only world then?

Not really, writing can be simple too, but we probably would want to draw the line somewhere unless we want Linked Data to require a universal RPC framework specification.

What if someone doesn't publish the document data? Would it mean the URI we dereferenced refers both to the thing described and the description of it? Kind of.

There is no kind of. The description is a literal thing all of its own; it's the same thing regardless
Re: Role of URI and HTTP in Linked Data
On 11/10/2010 11:26 PM, Nathan wrote:

Jiří Procházka wrote:

On 11/10/2010 11:44 AM, Nathan wrote:

snip

I don't think Linked Data consumers should *have to* care about what status codes an HTTP request returns - it shouldn't be part of the core Linked Data semantics. Of course it can be beneficial for clients to listen to them to get more information, but treating the HTTP library as a simple function should be allowed (either it returns data or not). When someone 303s (nice verb) to a different domain, it obviously means he trusts it to maintain the description of his URI.

snap, I don't think they should either. I also don't think they should have to constantly ask is this a document or a toucan? - it could all be so much easier.
I think you have missed the point of the second part of my original email - I think it is flawed to try to enforce URI == 1 thing by some system (especially if you want to maintain RDF as one of the supported structured data formats (I dare say the major one)), as nothing can be completely unambiguous (in RDF) - that is something the publisher needs to keep in mind and work towards. The key is not inferring any information which would increase ambiguity, which my simple solution preserves, and it solves the "is this a document or a toucan?" problem if the original data is unambiguous (if it isn't, it's not like the consumer can do anything about it anyway). ps: 303 doesn't say "you'll find it here!", it says "maybe you try here instead?" So let's stick with this. Let's just treat URIs as RDF does - as simple names. When we dereference a URI we get back some useful data and that's it. So, that'll be like mailto: or pop: or tel: then.. I don't follow here. I don't know of any standardized ways of getting structured data out of such URIs. That's the point: RDF treats all URIs the same. You're saying we should treat URIs as RDF does, as nothing more than a logical hook - that doesn't do us much good practically when we want to dereference and get back some useful data. It is useful - I don't advocate using any URIs other than HTTP with Linked Data, because with HTTP URIs we use the HyperText *Transfer* Protocol, which gets us some useful data without having to cut up the URIs. Best, Jiri signature.asc Description: OpenPGP digital signature
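Jiri's "treat the HTTP library as a simple function (either it returns data or not)" view can be sketched in a few lines. This is an illustrative sketch only: `fetch` is a hypothetical low-level function, stubbed here with a dictionary standing in for the Web, and the status-code handling deliberately collapses 200-after-303 and direct 200 into the same outcome, which is exactly the black-box behaviour being argued for.

```python
# Sketch of "HTTP as a black box": the consumer asks for a URI and gets data
# back (or nothing), without caring whether the server answered 200 directly
# or 303-redirected first. `fetch` is a hypothetical function, stubbed below.

def dereference(uri, fetch, max_hops=5):
    """Return the payload for `uri`, following redirects, or None."""
    for _ in range(max_hops):
        status, value = fetch(uri)
        if status == 200:
            return value          # value is the response body
        if status in (301, 302, 303, 307):
            uri = value           # value is the Location header
        else:
            return None           # 4xx/5xx: no data, and that's all we track
    return None                   # too many hops

# A stubbed web: a 303 chain and a direct 200 look identical to the consumer.
web = {
    "http://example.com/toucan":     (303, "http://example.com/toucan.rdf"),
    "http://example.com/toucan.rdf": (200, "<rdf data about the toucan>"),
    "http://example.com/gone":       (404, None),
}
print(dereference("http://example.com/toucan", web.get))
print(dereference("http://example.com/gone", web.get))
```

Whether collapsing that distinction discards the accountability chain Nathan describes is, of course, the crux of the disagreement.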
Role of URI and HTTP in Linked Data
Hi, having read all of the past week's and still ongoing discussion about HTTP status codes, URIs and most importantly their meaning from a Linked Data perspective, I want to share my thoughts on this topic. I don't mean to downplay anyone's work, but I think the role of the URI and HTTP specifications (especially their semantics) in Linked Data is overemphasized, which unnecessarily complicates things. I think we can all agree that the core idea of Linked Data is that information is expressed using unique identifiers (URIs) which I can simply use to get useful information about the thing the identifier represents (thus mandating a relatively simple, widely supported transfer protocol: HTTP). So let's stick with this. Let's just treat URIs as RDF does - as simple names. When we dereference a URI we get back some useful data and that's it. If we want to express that the data fetched is in fact a document, we use the wdrs:isDefinedBy property. The data fetched is just data, and any info about it should be contained in it. Why? Why no Content-Location? There is no reason to require additional complexity, building extra information layers. Publishing the document information in the data itself would most probably be simpler for both the publishing and the consuming party. Treating HTTP as a simple black box is what is mostly done in practice anyway. What if someone doesn't publish the document data? Would it mean the URI we dereferenced refers both to the thing described and the description of it? Kind of. What I mean is that the consumer side can add additional information to the data about the document (when and how fast it was fetched etc.), and if the data doesn't contain info about the document already, it could add it: uri wdrs:isDefinedBy [ wdrs:location uri ] . # or something like this Non-RDF data should use its equivalents. That is the most important thing I had to say - let's keep semantics in the data. 
I believe it is quite important that the range of wdrs:isDefinedBy is a document class, which should be the domain of wdrs:location. I am going to explain why I think so, but beware, at this point I get a bit philosophical :) What is pretty awesome about RDF, and something Linked Data could learn from, is how it handled the ontological (used as a philosophical term) issues - existence, being and reality. In order to support maximum expressiveness and compatibility with various world-views, it says the least about them. A big part of that is dealing with identity - if a caterpillar turns into a butterfly, is it still the same thing? Am I still I when I get older and change? RDF doesn't offer any answers to such questions, nor to whether there are only information resources and other resources. There are just names which identify objects or concepts, which we describe with names, and the final description matches some number of objects or concepts we know; the better the description is, the lower that number is. RDFS classes are used to describe various aspects of objects or concepts, which allow us to express ourselves much less ambiguously, using properties with defined domain and range. On the other hand, we can describe those aspects separately if we consider them a separate entity. For example someone can say I am averagely skilled as an English speaker, or that my English skill is mediocre, or that I am one of the averagely skilled English speakers. Similarly one could say a book is 3 characters long as its content, or that a book is 20 characters long as its title, or that a book is 3000 characters long as the description received on dereferencing. It shouldn't matter whether I consider a book's name part of it or not, if I use properties defined as unambiguously as possible. However, vocabularies with not very well defined terms (consider an example length property), which generally mimic natural language properties, are used widely, which is why we should have wdrs:isDefinedBy. 
The point of this philosophical exercise was to say that we shouldn't be saying a URI represents one resource, or trying to define what resources are or what existence is, but rather recognizing the context of the original information when modifying it (especially when amending). Best, Jiri Prochazka PS: It might be useful to also have wdrs:isPrimarilyDefinedBy (as rdfs:subPropertyOf wdrs:isDefinedBy). signature.asc Description: OpenPGP digital signature
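The consumer-side step Jiri proposes above - after dereferencing, add default document metadata only if the publisher supplied none - can be sketched as below. Triples are plain 3-tuples of strings for illustration, and `wdrs:location` follows the email's own tentative suggestion ("# or something like this"), not an established POWDER term.

```python
# Sketch of Jiri's consumer-side rule: after dereferencing a URI, if the
# fetched data carries no wdrs:isDefinedBy statement about it, add one
# pointing at a blank node for the document. "_:doc" stands in for a fresh
# blank node; "wdrs:location" is the email's hypothetical property.

IS_DEFINED_BY = "wdrs:isDefinedBy"
LOCATION = "wdrs:location"

def ensure_document_info(uri, triples):
    """Return triples plus default document metadata for `uri` if missing."""
    if any(s == uri and p == IS_DEFINED_BY for s, p, o in triples):
        return list(triples)      # publisher already described the document
    doc = "_:doc"                 # fresh blank node for the description
    return list(triples) + [
        (uri, IS_DEFINED_BY, doc),
        (doc, LOCATION, uri),
    ]

fetched = [("ex:toucan", "rdf:type", "ex:Bird")]
result = ensure_document_info("ex:toucan", fetched)
for t in result:
    print(t)
```

The key design point is idempotence: running the rule over data that already states its own document information changes nothing, so publisher-supplied semantics always win.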
Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices
Hi everyone, I think it is important not to forget that the Semantic Web's goal of creating a unified model for information exchange in a decentralized, heterogeneous network of systems, aiming for the lowest common denominator, implies that many requirements for data quality will not be met, simply because they differ from person to person. It is a matter of paradigm - a way of working with the data - so it should come as no surprise that various groups of like-minded people define their own requirements, especially in the area of discoverability. I find it quite surprising that no more such standards as Linked Data and LOD exist. Perhaps once more of them exist, community tracking and comparison included in Semantic Web introduction materials would help proliferate a more accurate image of the Semantic Web... Of course it would be great if information about data complying with such initiatives were generated by automated tools (no Submit URL please), for example by applying the data discoverability algorithm they endorse (not sure if LD has something like this - follow-your-nose?), if discoverability is in their focus. Best, Jiri Prochazka On 10/21/2010 09:23 PM, Enrico Motta wrote: Chris, I strongly agree with the points made by Martin and Giovanni. Of course the LOD initiative has had a lot of positive impact and you cannot be blamed for being successful, but at the same time I am worried that the success and visibility of the LOD cloud is having some rather serious negative consequences. Specifically: 1) lots of people, even within the SW community, now routinely describe the LOD as the 'semantic web'. This is not only dramatically incorrect (and bad for students and people who want to know about the SW) but also an obstacle to progress: anything which is not in the LOD diagram does not exist, and this is really not good for the SW community as a whole (including the people at the centre of the LOD initiative). 
Even worse, in the past 12-18 months I have noticed that this viewpoint has also been embraced by funding bodies, and linking to LOD is becoming a necessary condition for a SW project. Again, I think this is undesirable - see also Martin's email on this thread. 2) Because the LOD is perceived as the 'official SW' and because resources in the LOD have to comply with a number of guidelines, people also assume that LOD resources exhibit higher quality. Unfortunately in our experience this is not really the case, and this also generates negative consequences. That is, if LOD is the 'official high quality SW' and there are so many issues with the data, people automatically assume that the rest of the SW is a lot worse, even though this is not necessarily the case. So, as other people have already said, maybe it is time to re-examine the design criteria for LOD and the way this is presented? For instance, it would be beneficial to the community if LOD were to focus more on quality issues, rather than linking for the sake of linking. And in addition, a less static approach to listing resources could improve the visibility of so much more stuff out there. Enrico PS I agree with you that it would be much better if somebody would set up a crawler, properly crawl the Web of Data and then provide a catalog of all datasets. Actually this is exactly what our Watson system does, see http://watson.kmi.open.ac.uk At 13:12 +0100 21/10/10, Giovanni Tummarello wrote: But again: I agree that crawling the Web of Data and then deriving a dataset catalog as well as meta-data about the datasets directly from the crawled data would be clearly preferable and would also scale way better. Thus: Could somebody please start a crawler and build such a catalog? As long as nobody does this, I will keep on using CKAN. Hi Chris, all, I can only restate that within Sindice we're very open to anyone who wanted to develop data analysis apps creating catalogs automatically. 
At the moment: a MapReduce job a couple of weeks ago gave in excess of 100k independent datasets. How many are interlinked etc.? To be analyzed. Our interest (and the interest of the Semantic Web vision I want to sponsor) is to make sure RDFa sites are fully included, and so are those who provide markup which can be translated in an automatic/agreeable way (so no scraping or sponging) into RDF (that is, anything that any23.org can turn into triples). If you were indeed interested in running or developing your algorithms on our running dataset, no problem; the code can be made open source so it would run on other similarly structured datasets. This said, yes, I too think that in this phase a CKAN-like repository can be an interesting aggregation point, why not. But I do think the diagram, which made great sense as an example when Richard started it, is now at risk of providing a disservice, in line with the point Martin is making. The diagram
Re: RDF Extensibility
On 07/06/2010 11:05 PM, Pat Hayes wrote: On Jul 6, 2010, at 9:34 AM, Jiří Procházka wrote: [snipped] In the case of a) I haven't cleared up my thoughts yet, but generally I would like to know: How are semantic extensions to work together in an automated system? Well, the semantics always defines some notion of entailment, and your system is supposed to respect that notion: not draw invalid conclusions, draw as many valid conclusions as you feel are useful, don't say things are inconsistent when they aren't, etc. Otherwise, you have free rein. So, if you have several semantic extensions, they each provide a set of such entailments and they should add up to one single set of legal entailments. How to let an agent know that the data is described using a new RDF extension which the client doesn't know, and that the data could be (or definitely is) false if it is interpreted using vanilla RDF semantics? Not false, if it's a semantic extension (they can't contradict the RDF semantics, only extend it). But the same point more generally: how do we know, given some RDF, what semantic extensions are appropriate to use when interpreting it? That is a VERY good question. This is something that RDF2 could most usefully tackle, if only in a first-step (ham-fisted?) kind of a way. We were aware that this was an issue in the first WG, but it was just too far outside our charter, and our energy level, to tackle properly. One obvious (?) thing to say is that using a construction from a namespace which is associated with the definition of an RDF semantic extension is deemed to bring along the necessary interpretation conditions from the extension, so that for example if I use owl:sameAs in some RDF, then I mean it to be understood using the OWL semantic conditions. We all do this without remarking upon it, but loosely, and to make this precise and normative would be a very interesting (and useful) exercise. (An issue already here is, which version of the OWL semantics is intended? 
Does the use in RDF also import the OWL-DL syntactic restrictions on its use, for example?) That is indeed what I had in mind. I think sooner or later this has to be dealt with, and I think the sooner the better... I don't think the namespace thing is obvious, since I don't think there is a concept of namespace defined in RDF. It is just some graph containing some terms related to a semantic extension of RDF. How does the processing application know? Which entailment rules are to be applied to the graph? How should the patterns triggering application of a rule be defined? Having multiple rulesets, in what order and how should they be applied? What about rules modifying rules and rulesets? How to define the interpretation of graphs (which rulesets to apply, which to ignore)? Are a graph and entailment rules everything that is used in interpretation according to a semantic extension, or are there also attributes like graph consistency (if so, how do we pass them on? As added triples?)? These are just questions I am pulling off the top of my head... Best, Jiri Pat b) How should my system know that the data which is just being processed is a new revision of RDF/XML and not malformed RDF/XML, when forward compatibility was out of sight, out of mind when RDF/XML was designed? Best, Jiri Prochazka IHMC (850)434 8903 or (650)494 3973 40 South Alcaniz St. (850)202 4416 office Pensacola (850)202 4440 fax FL 32502 (850)291 0667 mobile phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes signature.asc Description: OpenPGP digital signature
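Pat's "obvious (?)" suggestion - that using a term from an extension's namespace brings that extension's interpretation conditions along - can be sketched as a naive namespace scan. Everything here is illustrative: the namespace-to-extension table covers just RDFS and OWL, splitting a URI at its last `#` or `/` is only a heuristic for "namespace", and (as the thread itself notes) nothing in this scheme says which *version* of each semantics is meant.

```python
# A naive pass at Pat's suggestion: treat the namespaces of the terms a graph
# uses as declaring which semantic extensions apply when interpreting it.
# The table below is an illustrative assumption, not a registry that exists.

KNOWN_EXTENSIONS = {
    "http://www.w3.org/2000/01/rdf-schema#": "RDFS",
    "http://www.w3.org/2002/07/owl#": "OWL",
}

def namespace_of(term):
    """Split a URI at the last '#' or '/' to approximate its namespace."""
    for sep in ("#", "/"):
        if sep in term:
            return term.rsplit(sep, 1)[0] + sep
    return term

def extensions_used(triples):
    """Which known semantic extensions do this graph's terms invoke?"""
    return {KNOWN_EXTENSIONS[ns]
            for t in triples for term in t
            if (ns := namespace_of(term)) in KNOWN_EXTENSIONS}

graph = [
    ("ex:a", "http://www.w3.org/2002/07/owl#sameAs", "ex:b"),
    ("ex:a", "http://www.w3.org/2000/01/rdf-schema#label", "ex:c"),
]
print(extensions_used(graph))
```

Jiri's objection survives the sketch intact: the hard part is not detecting the namespaces but deciding, once detected, which rulesets to apply, in what order, and with which version of their semantics.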
RDF Extensibility
On 07/06/2010 03:35 PM, Toby Inkster wrote: On Tue, 6 Jul 2010 14:03:19 +0200 Michael Schneider schn...@fzi.de wrote: So, if :s lit :o . must not have a semantic meaning, what about lit rdf:type rdf:Property . ? As, according to what you say above, you are willing to allow for literals in subject position, this triple is fine for you syntactically. But what about its meaning? Would this also be officially defined to have no meaning? It would have a meaning. It would just be a false statement. The same as the following is a false statement: foaf:Person a rdf:Property . Why do you think so? I believe it is valid RDF and even valid under the RDFS semantic extension. Maybe OWL says something about disjointness of RDF properties and classes; a URI can be many things. I think there are issues about RDF extensibility which haven't been solved, and they concern: a) semantics b) serializations In the case of a) I haven't cleared up my thoughts yet, but generally I would like to know: How are semantic extensions to work together in an automated system? How to let an agent know that the data is described using a new RDF extension which the client doesn't know, and that the data could be (or definitely is) false if it is interpreted using vanilla RDF semantics? b) How should my system know that the data which is just being processed is a new revision of RDF/XML and not malformed RDF/XML, when forward compatibility was out of sight, out of mind when RDF/XML was designed? Best, Jiri Prochazka signature.asc Description: OpenPGP digital signature
Re: Show me the money - (was Subjects as Literals)
On 07/01/2010 09:11 PM, Henry Story wrote: Social Web Architect http://bblfish.net/ On 1 Jul 2010, at 21:03, Tim Finin wrote: On 7/1/10 2:51 PM, Henry Story wrote: ... So just as a matter of interest, imagine a new syntax came along that allowed literals in subject position. Could you not write a serialiser for it that turned "123" length 3 . into _:b owl:sameAs "123"; length 3 . ? So that really you'd have to do no work at all? Just wondering. Isn't owl:sameAs defined to be a relation between two URI references? Not sure. It is, so this won't work under OWL DL... In OWL Full I think it will. I asked about this recently on this list... In any case I suppose it would be simple to create such an identity relation. Even if not, it is symmetric and would have the above imply { "123" owl:sameAs _:b . } It does indeed imply that, though you can't write it out like that in most serialisations, other than N3. And being able to write it out makes it easy to explain what symmetry means. I think people keep confusing syntax and semantics for some reason, even on the semantic web. Henry signature.asc Description: OpenPGP digital signature
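Henry's closing point - that symmetry is a fact about the graph's semantics, not about what a serialisation can write down - can be made concrete with a tiny closure over an in-memory graph. This is a sketch with assumed conventions: triples are 3-tuples of strings, a quoted string marks a literal, and the set of symmetric predicates is hard-coded.

```python
# Henry's point in miniature: in memory nothing stops us adding the mirrored
# triple for a symmetric predicate, even if a given syntax cannot write a
# literal in subject position. Representation conventions here are assumed.

SYMMETRIC = {"owl:sameAs"}

def symmetric_closure(triples):
    """Add (o, p, s) for every (s, p, o) whose predicate is symmetric."""
    out = set(triples)
    out |= {(o, p, s) for s, p, o in triples if p in SYMMETRIC}
    return out

g = {("_:b", "owl:sameAs", '"123"'), ("_:b", "ex:length", "3")}
closed = symmetric_closure(g)
# The implied triple with the literal in subject position now exists in the
# model, regardless of whether any serialisation could express it.
print(('"123"', "owl:sameAs", "_:b") in closed)
```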
Re: Subjects as Literals, [was Re: The Ordered List Ontology]
On 06/30/2010 09:09 PM, Pat Hayes wrote: On Jun 30, 2010, at 11:50 AM, Nathan wrote: Pat Hayes wrote: On Jun 30, 2010, at 6:45 AM, Toby Inkster wrote: On Wed, 30 Jun 2010 10:54:20 +0100 Dan Brickley dan...@danbri.org wrote: That said, I'm sure sameAs and differentIndividual (or however it is called) claims could probably make a mess, if added or removed... You can create some pretty awesome messes even without OWL: # An rdf:List that loops around... <#mylist> a rdf:List ; rdf:first <#Alice> ; rdf:rest <#mylist> . # A looping, branching mess... <#anotherlist> a rdf:List ; rdf:first <#anotherlist> ; rdf:rest <#anotherlist> . They might be messy, but they are *possible* structures using pointers, which is what the RDF vocabulary describes. It's just about impossible to guarantee that messes can't happen when all you are doing is describing structures in an open-world setting. But I think the cure is to stop thinking that possible-messes are a problem to be solved. So, there is dung in the road. Walk round it. Could we also apply that to the 'subjects as literals' general discussion that's going on then? For example I've heard people saying that it encourages bad 'linked data' practise by using examples like { 'London' a x:Place } - whereas I'd immediately counter with { x:London a 'Place' }. Surely all of the subjects-as-literals arguments can be countered with 'walk round it', and further good practise could be aided by a few simple notes on best practise for linked data etc. I wholly agree. Allowing literals in subject position in RDF is a no-brainer. (BTW, it would also immediately solve the 'bugs in the RDF rules' problem.) These arguments against it are nonsensical. The REAL argument against it is that it will mess up OWL-DL, or at any rate it *might* mess up OWL-DL. 
I wonder, when using owl:sameAs or related properties to name literals, to be able to say other useful things about them in normal triples (datatype, language, etc.), does it break OWL DL (or any other formalism which is the basis of some ontology extending RDF semantics)? Or would it if rdf:sameAs were introduced? Best, Jiri The Description Logic police are still in charge :-) Pat Best, Nathan IHMC (850)434 8903 or (650)494 3973 40 South Alcaniz St. (850)202 4416 office Pensacola (850)202 4440 fax FL 32502 (850)291 0667 mobile phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes signature.asc Description: OpenPGP digital signature
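Pat's "walk round it" advice for Toby's looping lists amounts to ordinary cycle detection on the consumer side. A minimal sketch, assuming triples as 3-tuples and the standard rdf:first/rdf:rest list vocabulary:

```python
# "Walk round the dung": a consumer traversing an rdf:List can simply detect
# loops instead of hoping publishers never create them. A `seen` set and a
# length cap guard against both cycles and absurdly long (or branching) lists.

def list_members(head, triples, max_len=10_000):
    """Collect rdf:first values from `head`, stopping if the list loops."""
    members, seen, node = [], set(), head
    while node != "rdf:nil" and node not in seen and len(members) < max_len:
        seen.add(node)
        members += [o for s, p, o in triples if s == node and p == "rdf:first"]
        rests = [o for s, p, o in triples if s == node and p == "rdf:rest"]
        node = rests[0] if rests else "rdf:nil"   # missing tail: treat as end
    looped = node in seen
    return members, looped

# Toby's looping list from the thread:
g = [("#mylist", "rdf:first", "#Alice"),
     ("#mylist", "rdf:rest", "#mylist")]
print(list_members("#mylist", g))
```

The mess stays possible, as Pat says; it just stops being the consumer's problem.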
Re: Inclusion of additional (non dereferencable) data?
Hi Nathan, the origin of this pattern is RDF itself - a graph is just a piece of paper where you can write any statements... But I understand - your view of Linked Data is much different, introducing more complex notions of things and their descriptions, in which it is desirable, after getting a description, to identify the role which statements relating other things play. I see that someone could want to express statements which relate to the described thing - which make sense from the point of view of its description but are not in the description received by dereferencing the URIs of the subject/object of the statement, or even contradict it, perhaps when in doubt of their future. Other than that I see it as bad practice, because of the reasons you explained (the possibility of going out of date being the most important). I believe Linked Data encourages dereferencing. What Linked Data brings into the Semantic Web puzzle is a recommended algorithm for how to get data (follow your nose?) and maybe that is the thing you really are looking for... I wonder if there is a need for a vocabulary allowing people to describe recommendations of URIs to surely dereference, or perhaps a measurement of URI importance in the graph, feeding a more complex algorithm... Regards, Jiri Prochazka On 06/10/2010 05:24 PM, Nathan wrote: All, Here's a common example of what I'm referring to. Suppose we have a (foaf) document http://ex.org/bobsmith which includes the following triples: :me foaf:knows <http://example.org/joe_bloggs#me> . <http://example.org/joe_bloggs#me> a foaf:Person ; foaf:name "Joe Bloggs"@en . In Linked Data terms one could suggest that the description of Joe Bloggs doesn't 'belong' in this document (although clearly it can be here). I can quite easily see how this trend came about; there are benefits: it's both an optimisation method (saves dereferencing) and it's an inclusion of human-presentable information (which aids display / comprehension in 'foaf viewers'). 
However, there are drawbacks too: the data could easily go out of date / out of sync, and it's not dereferencable (the adverse effects in this example aren't specifically clear, but in other use-cases they are considerable). Over and above these simple thoughts, I'm quite sure that there are bigger architectural and best practise considerations (for a web of data), for example: - does this create an environment where we are encouraged not to dereference linked data (or where it is common to look local first)? - does this point to bigger issues, such as not having a single global predicate for a default human-presentable 'name' for all things that can be 'named' (given a URI) - even though many candidates are available? - should 'reading ahead' (dereferencing all linked data before presentation to a user / trying to glean an understanding) be encouraged over providing a limited local subset of the data which could easily be inaccurate or out of date? - is there a gut instinct in the community that most data will ultimately end up being presented to a human somewhere along the line, and is this driving us to make such design decisions? Any thoughts or strong feelings on the issue(s)? And is anybody aware of whether this practise came about more by accident than by design? Best, Nathan signature.asc Description: OpenPGP digital signature
Re: Fwd: backronym proposal: Universal Resource Linker
Why 'URL', when it is pretty clearly defined and still a significant portion of web users don't understand it? I'd rather embrace 'web address' - even non-tech users would understand that. Best, Jiri Prochazka On 04/18/2010 12:18 PM, Dan Brickley wrote: So - I'm serious. The term 'URI' has never really worked as something most Web users encounter and understand. For RDF, SemWeb and linked data efforts, this is a problem as our data model is built around URIs. If 'URL' can be brought back from limbo as a credible technical term, and rebranded around the concept of 'linkage', I think it'll go a long way towards explaining what we're up to with RDF. Thoughts? Dan -- Forwarded message -- From: Dan Brickley dan...@danbri.org Date: Sun, Apr 18, 2010 at 11:52 AM Subject: backronym proposal: Universal Resource Linker To: u...@w3.org Cc: Tim Berners-Lee ti...@w3.org I'll keep this short. The official term for Web identifiers, URI, isn't widely known or understood. The I18N-friendly variant IRI confuses many (are we all supposed to migrate to use it, or just in our specs?), while the most widely used, understood and (for many) easiest to pronounce, 'URL' (for Uniform Resource Locator), has been relegated to 'archaic form' status. At the slightest provocation this community disappears down the rathole of URI-versus-URN, and until this all settles down we are left with an uncomfortable disconnect between how those in-the-know talk about Web identifiers and the many others who merely use them. As of yesterday, I've been asked but what is a URI? one too many times. I propose a simple-minded fix: restore 'URL' as the most general term for Web identifiers, and re-interpret 'URL' as Universal Resource Linker. Most people won't care, but if they investigate, they'll find out about the re-naming. This approach avoids URN vs URI kinds of distinction, scores 2 out of 3 for use of intelligible words, and is equally appropriate to classic browser/HTML, SemWeb and other technical uses. 
What's not to like? The Web is all about links, and urls are how we make them... cheers, Dan signature.asc Description: OpenPGP digital signature
Re: Comments on Data 3.0 manifesto
So essentially, all this is a cover-up maneuver to sell RDF to people masked as something else, more familiar? If so, I understand why you feel this is necessary; after all, the goal is not to sell the customer what he asked for, but what he really wanted but couldn't fully express or didn't realize (this time the customer being tech folk). Anyway, I'd rather use and try to market RDF as it is; maybe it's a bit too fast for some, but I guess I've left too few people in utter confusion yet to try such different ways :) But before proceeding with your plan to fix RDF + Linked Data marketing, I ask you to also consider what was done right in marketing RDF, besides what wasn't. For example RDF has a clear name (Data 3.0? not a very good name IMHO), the core model is very simple and has been very well explained numerous times. On the other hand, your manifesto sounds a bit too complex, more like a spec than a manifesto. For the effect I think you are aiming at, you need something very simple and striking... Not to mention it is the first time I am hearing about the EAV model; we are all from different backgrounds, so this terminology won't have much of an impact, I fear, though it is still good to introduce yet distant communities ;) For me the greatest value of RDF and Linked Data lies in semantics - the ontologies (RDFS/OWL) - which, as far as I understand it, the EAV model doesn't touch at all, which in my eyes makes it only a bit better than tabular data models (rectangular, as someone nicely coined it some time ago somewhere). Overall it seems to me like building a sand island in the middle of a wide river to ease construction of bridges across it... I guess you have tried building a bridge without the island a few times and it collapsed every time, so I understand why you are building the island. But maybe I have better steel and my bridges would last... maybe... On one hand I am glad we try these various ways, and on the other I keep asking myself whether the gain outweighs the price of fragmentation... 
Best, Jiri Prochazka On 04/17/2010 10:51 PM, Kingsley Idehen wrote: John Erickson wrote: Hi Kingsley! Reading between the lines, I think I grok where you are trying to go with your manifesto. For it to be an effective, stand-alone document I think a few pieces are needed: 1. What is your GOAL? It should be clearly stated, something like: to promote best practices for standards-compliant access to structured data object (or entity) descriptors by getting data architects to do X instead of Y, etc. Okay, I'll see what I can do. This document is really a continuation of a document that's actually missing from the Web, sadly. A long time ago (at the start of Web 2.0), there was a Data 2.0 manifesto by Alex James (now at Microsoft), so in classic two-fer fashion I've opted to kill two birds with a single stone: 1. Linked Data incomprehension (Technical and Political) 2. Data 2.0 manifesto upgrade and update. 2. What is your MOTIVATION? I think this is implicit in your current text --- your argument seems to be that TBL's Four Principles are not enough --- but you need to make your motivations explicit and JUSTIFY them. If TBL's principles are too nebulous, explain concisely why and what the implications are. Keep in mind that they seem to be good enough for many practitioners today. ;) My motivation is simply this: Get RDF out of the way! The RDF incomprehension cloud is only second to what's heading across Northern Europe from Iceland, re. obscuring a myriad of routes to Linked Data comprehension. How can we spend 12+ years on the basic issue of EAV + de-referencable identifiers? Compounded by poor monikers such as: Information Resource and Non-Information Resource. We have Data Objects (Entities, Data Items etc.) and their associated Descriptor Documents (Representation Carriers or Senses); it's always been so! 
Note, RDF the Data Model doesn't exist in the minds of the broader Web audience (I am not sending an inbound meme to the Semantic Web Community; my meme is being beamed to a wider audience that's taking way too long to grok the essence of the Linked Data matter). I (and many others) are utterly fed up with trying to accentuate the fact that RDF is based on a Graph Data Model. The initial "RDF/XML is RDF" conflation has dealt a fatal blow to RDF re. broad-audience communications. EAV has been with us forever; people already use applications that are based on this model, across all major operating systems. Why not triangulate from this position (top down) instead of bottom up (which ultimately reeks of NIH rather than a Cool Tweak)? 3. Be SPECIFIC about what practitioners must do moving forward. I think you've made a good start on this, to the extent that you have lots of SHOULDs. I would argue that more specificity of a different kind is needed; if data architects SHOULD be following more abstract EAV conceptualizations, what exactly should they do in practice?
Re: [Patterns] Materialize Inferences (was Re: Triple materialization at publisher level)
SWRL has a much nicer RDF representation than RIF, so that might also be an alternative, though the expressiveness might vary. Best, Jiri Prochazka On 04/08/2010 10:16 AM, Ivan Herman wrote: On Apr 7, 2010, at 14:45, Dan Brickley wrote: A guideline might be: As typing information becomes broader and more inclusive, it also becomes less informative: to know that something is a Thing is rarely useful. It is difficult to say whether a class is at a useful level of specificity without taking into account other datasets, tools and services that use it; however, an intuitive grasp of mid-level concepts often provides useful guidance. In addition, Linked Data apps have a particular concern for cross-referencing information about specific things; it is therefore often useful to include inferred identifiers (owl:sameAs etc.) based on analysis of properties (owl:FunctionalProperty, owl:InverseFunctionalProperty) etc. Ok, that's not very friendly text but I hope it might be useful. Basically rdf:type owl:Thing is boring, but owl:sameAs x:anotherID is very useful... I am a little bit concerned by the open-endedness of this. As an information consumer I would like to have at least some information or hints as to which inferences are materialized and which are not. As a thought experiment: what about providing a set of RIF (Core) rules that describe which inferences are materialized? It is possible to express RDFS as well as OWL-RL via RIF rules and, what is even more important in this context, any subsets thereof. Human clients may look at those rules and, with little training, may understand what is happening in the simpler cases; after all, many publishers will decide to use 2-3 rules only (eg, subproperty and subclass inferences). Machine clients may even choose to instantiate the inferences themselves with some local rule engine if their CPU/bandwidth ratio makes that more attractive. 
I know, the current RIF syntax is not all that beautiful (but I would hope that alternative syntaxes will come to the fore, mainly if the demand is there) and I am not sure whether rule engines bound to RDF environments (like Jena Rules or FuXi) already implement RIF Core (although I believe/hope they would). But that seems to be a possible way to go nevertheless... Ivan (To avoid misunderstanding: with his W3C Position's hat down:-) cheers, Dan Ivan Herman, W3C Semantic Web Activity Lead Home: http://www.w3.org/People/Ivan/ mobile: +31-641044153 PGP Key: http://www.ivan-herman.net/pgpkey.html FOAF: http://www.ivan-herman.net/foaf.rdf signature.asc Description: OpenPGP digital signature
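[Editor's note: the "2-3 rules only" subset Ivan mentions (subclass and subproperty inferences) is small enough to sketch as a naive forward-chaining loop. This is a toy illustration of what a publisher would be materializing, not RIF and not any real rule engine; the rules correspond to the RDFS entailment patterns usually called rdfs9 and rdfs7:

```python
# Toy forward-chaining materialization of two RDFS rules over a set of
# (subject, predicate, object) triples. Illustrative sketch only.
def materialize(triples):
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in triples:
            if p == "rdf:type":
                # rdfs9: (x type C) and (C subClassOf D) => (x type D)
                for s2, p2, o2 in triples:
                    if p2 == "rdfs:subClassOf" and s2 == o:
                        new.add((s, "rdf:type", o2))
            # rdfs7: (x P y) and (P subPropertyOf Q) => (x Q y)
            for s2, p2, o2 in triples:
                if p2 == "rdfs:subPropertyOf" and s2 == p:
                    new.add((s, o2, o))
        if not new <= triples:
            triples |= new
            changed = True
    return triples
```

A machine client with spare CPU could run exactly such a loop locally instead of fetching the materialized dump, which is the CPU/bandwidth trade-off the message describes.]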
Re: ontology mapping etiquette (was What is the class of a Named Graph?)
On 02/22/2010 09:44 PM, Richard Cyganiak wrote: On 22 Feb 2010, at 19:36, Jiří Procházka wrote: I wonder if we as a group of people interested in Semantic Web could come up with etiquette for ontology mapping. Interesting topic! My €0.02: If the other vocabulary is likely to be - more stable - more mature - more likely to be widely used - more likely to be around for a longer time then you should map your terms to it. If not, don't. So IMO the rdfg vocabulary should map to the SPARQL Service Description vocabulary as soon as it becomes REC, but SPARQL-SD should NOT map to rdfg. Hi Richard, that also seems reasonable to me at first, but when thinking about it more thoroughly, there is value in both ontologies doing the mapping to the other. Yes, if both sides agree, then two-way mappings are great. But this is only realistic if both vocabularies rate about equally on the criteria above. As an extreme example, it would be totally unrealistic to expect the RDFS vocabulary to link back to every vocabulary that has some sort of label/name property (all of which should be subproperties of rdfs:label). Yes, I had in mind especially equivalentClass/Property relations and the like, where it doesn't scale well, not subproperties. snip Certain mapping statements make sense from PoV of one ontology, but not the other. I don't know what you mean. An example might help. But anyway, if you map to my ontology, but from my POV that mapping doesn't make sense, then I'm certainly not going to map back to yours. I mean when the philosophies of the creators of the ontologies aren't mutually compatible. I'm unable to come up with an example, but let's just say that someday we will have religious ontologies... 
snip If we allow ourselves to go a bit further, I thought it would be great if there was some community-developed service which would, in automated fashion, give advice for improvement and rate user-submitted (better yet, WoD-collected) ontologies, judging their quality of design - most importantly re-usability, which basically means how well it is aligned to other similar ontologies. This would probably be very difficult, at least because of varying opinions on this... I guess the database community has something to say about that. I think that's a different issue. When it comes to rating the “quality” of a vocabulary, then the amount of mappings to other vocabularies is a very minor factor. First, because other things (especially amount of uptake and strength of the surrounding community) are much more important. Second, because adding the mappings is so easy. No vocabulary will succeed or fail because of its inclusion or lack of mappings. Strength of community and amount of uptake matter really a lot, but the next thing you are interested in is how compatible an ontology is with the rest of your knowledge - how good the mappings are and whether it has mappings to its competitors, because they can have mappings to other ontologies you do not (and how good those are). Nevertheless, I agree that we need services that support us in finding high-quality vocabularies, and that help drive the improvement of existing ones. But it's a complex subject, there are many existing efforts (Watson, Talis Schema Cache, Falcons Concept Search, ontologydesignpatterns.org, and I probably missed a few), and to me it's not obvious what the right approach is. Perhaps we don't need better ways of finding and creating vocabularies, but better ways of finding and creating communities around a domain that can then jointly agree on a vocabulary. Great point! I would love to see some development in this area... 
All the best, Richard There are more things to talk about regarding this, but this is what I have in mind so far. Best, Jiri Best, Richard Best, Jiri Hope that helps. thanks, .greg [1] http://www.w3.org/TR/sparql11-service-description/#id41794 signature.asc Description: OpenPGP digital signature
ontology mapping etiquette (was What is the class of a Named Graph?)
On 02/22/2010 01:53 PM, Richard Cyganiak wrote: Jiri, On 22 Feb 2010, at 10:51, Jiří Procházka wrote: I wonder if we as a group of people interested in Semantic Web could come up with etiquette for ontology mapping. Interesting topic! My €0.02: If the other vocabulary is likely to be - more stable - more mature - more likely to be widely used - more likely to be around for a longer time then you should map your terms to it. If not, don't. So IMO the rdfg vocabulary should map to the SPARQL Service Description vocabulary as soon as it becomes REC, but SPARQL-SD should NOT map to rdfg. Hi Richard, that also seems reasonable to me at first, but when thinking about it more thoroughly, there is value in both ontologies doing the mapping to the other. Danbri recently touched on this on IRC in relation to reciprocal WebID owl:sameAs relations. What one source says in RDF is what it considers true, which in our case would also mean the mapping makes sense from the point of view of both ontologies if reciprocal. So I would advocate doing reciprocal mappings, if they can agree on the common mapping. This brings another issue... Certain mapping statements make sense from PoV of one ontology, but not the other. Should it be dropped and just have the both-sides-approved mapping? I'm in favour of publishing it just with the ontology for which it makes sense. Dumping it would encourage one big federated web ontology, which is a nice dream but not what I believe is suitable for the real world and web (thanks to its relativistic nature). If we allow ourselves to go a bit further, I thought it would be great if there was some community-developed service which would, in automated fashion, give advice for improvement and rate user-submitted (better yet, WoD-collected) ontologies, judging their quality of design - most importantly re-usability, which basically means how well it is aligned to other similar ontologies. 
This would probably be very difficult, at least because of varying opinions on this... I guess the database community has something to say about that. There are more things to talk about regarding this, but this is what I have in mind so far. Best, Jiri Best, Richard Best, Jiri Hope that helps. thanks, .greg [1] http://www.w3.org/TR/sparql11-service-description/#id41794 signature.asc Description: OpenPGP digital signature
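[Editor's note: one reason the reciprocity question is less pressing for equivalence mappings than it first appears is that owl:equivalentClass is symmetric and transitive by definition, so a consumer can close over the published statements whichever side publishes them. A sketch of that closure, grouping classes into equivalence sets; the class names and the `equivalence_groups` helper are hypothetical, purely for illustration:

```python
# Merge owl:equivalentClass pairs into equivalence groups. It does not
# matter which ontology published each pair - the relation is symmetric,
# so one-directional publishing still yields the full groups.
def equivalence_groups(mappings):
    """mappings: iterable of (classA, classB) owl:equivalentClass pairs."""
    groups = []  # disjoint sets of mutually equivalent class names
    for a, b in mappings:
        hits = [g for g in groups if a in g or b in g]
        merged = set().union({a, b}, *hits)
        groups = [g for g in groups if g not in hits] + [merged]
    return groups
```

This is only the logical side; the etiquette question in the thread (who *should* publish the statement) remains a social one.]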
Re: What is the class of a Named Graph?
What you pointed at is the property sd:namedGraph. The upcoming SPARQL standard doesn't define any class for named graphs. I support using: http://www.w3.org/2004/03/trix/rdfg-1/Graph Best, Jiri On 02/21/2010 10:40 AM, Michael Hausenblas wrote: Nathan, Any further input before I start using rdfg-1:Graph when describing graphs? I'd suggest you forget about both references and go with the upcoming SPARQL standard [1]. Cheers, Michael [1] http://www.w3.org/TR/2010/WD-sparql11-service-description-20100126/#id41744 signature.asc Description: OpenPGP digital signature
Re: Ontology for semantic web technologies field
If he is looking for ontologies about ontologies, there is also the lightweight VANN ontology: http://vocab.org/vann/.html Best, Jiri Prochazka On 01/13/2010 02:11 PM, Aldo Gangemi wrote: Uh, this guy seems to look for ontologies that are *about* semantic web technologies, not for a repository of ontologies :) To my knowledge, the following models can help you: OMV [1] (Ontology Metadata Vocabulary) by UPM describes ontology metadata and is used in services like Oyster for annotation and management of ontology repositories C-ODO Light [2] has been developed by CNR STLab for the NeOn project; it describes ontology-design-related entities (ontologies, tools, design activities, etc.): it is a modular, pattern-based ontology network (see also [4] for ontology design patterns) Sweet Tools [3] by Mike Bergman has a minimal vocabulary but has basic data about hundreds of SW-related tools [1] http://omv.ontoware.org/2005/05/ontology [2] http://www.ontologydesignpatterns.org/cpont/codo/codolight.owl [3] http://www.mkbergman.com/new-version-sweet-tools-sem-web/ [4] http://www.ontologydesignpatterns.org Ciao Aldo On 13 Jan 2010, at 10:23, Michael Hausenblas wrote: Is there an ontology for the general field of semantic web technologies itself or a repository of ontologies? A repository not really, but the best overview I'm aware of is [1]. Cheers, Michael [1] http://esw.w3.org/topic/VocabularyMarket -- Dr. Michael Hausenblas LiDRC - Linked Data Research Centre DERI - Digital Enterprise Research Institute NUIG - National University of Ireland, Galway Ireland, Europe Tel. 
+353 91 495730 http://linkeddata.deri.ie/ http://sw-app.org/about.html From: ProjectParadigm-ICT-Program metadataport...@yahoo.com Date: Tue, 12 Jan 2010 19:40:19 - To: Semantic Web community semantic-...@w3.org, Linked Data community public-lod@w3.org Subject: Ontology for semantic web technologies field Resent-From: Linked Data community public-lod@w3.org Resent-Date: Tue, 12 Jan 2010 19:40:56 + Dear listers, Is there an ontology for the general field of semantic web technologies itself or a repository of ontologies? Milton Ponson GSM: +297 568 5908 Rainbow Warriors Core Foundation PO Box 1154, Oranjestad Aruba, Dutch Caribbean www.rainbowwarriors.net Project Paradigm: A structured approach to bringing the tools for sustainable development to all stakeholders worldwide www.projectparadigm.info EarthForge: Creating ICT tools for NGOs worldwide for Project Paradigm www.earthforge.info, www.developmentforge.info MetaPortal: providing online access to web sites and repositories of data and information for sustainable development www.metaportal.info _ Aldo Gangemi Senior Researcher Semantic Technology Lab (STLab) Institute for Cognitive Science and Technology, National Research Council (ISTC-CNR) Via Nomentana 56, 00161, Roma, Italy Tel: +390644161535 Fax: +390644161513 aldo.gang...@cnr.it http://www.stlab.istc.cnr.it http://www.istc.cnr.it/createhtml.php?nbr=71 skype aldogangemi signature.asc Description: OpenPGP digital signature