Re: Think before you write Semantic Web crawlers

2011-06-22 Thread Jiří Procházka
I wonder, are there ways to link RDF data so that conventional crawlers do
not crawl it, but only the semantic-web-aware ones do?
I am not sure how the current practice of linking by a link tag in the
HTML headers could cause this, but it may be the case that those heavy loads
come from crawlers having nothing to do with the semantic web...
Maybe we should start linking to our RDF/XML, Turtle and N-Triples files
and publishing sitemap info in RDFa...
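
For reference, this is the linking practice I mean - a sketch with
hypothetical paths, using the common autodiscovery convention in the HTML
head:

  <link rel="alternate" type="application/rdf+xml" href="/data/foo.rdf" />
  <link rel="alternate" type="text/turtle" href="/data/foo.ttl" />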

Best,
Jiri

On 06/22/2011 09:00 AM, Steve Harris wrote:
 While I don't agree with Andreas exactly that it's the site owner's fault, 
 this is something that publishers of non-semantic data have to deal with.
 
 If you publish a large collection of interlinked data which looks interesting 
 to conventional crawlers and is expensive to generate, conventional web 
 crawlers will be all over it. The main difference is that a greater 
 percentage of those are written properly, to follow robots.txt and the 
 guidelines about hit frequency (maximum 1 request per second per domain, no 
 parallel crawling).
 
 Has someone published similar guidelines for semantic web crawlers?
 
 The ones that don't behave themselves get banned, either in robots.txt, or 
 explicitly by the server. 
 
 - Steve
 
 On 2011-06-22, at 06:07, Martin Hepp wrote:
 
 Hi Daniel,
 Thanks for the link! I will relay this to relevant site-owners.

 However, I still challenge Andreas' statement that the site-owners are to 
 blame for publishing large amounts of data on small servers.

 One can publish 10,000 PDF documents on a tiny server without being hit by 
 DoS-style crazy crawlers. Why should the same not hold if I publish RDF?

 But for sure, it is necessary to advise all publishers of large RDF datasets 
 to protect themselves against hungry crawlers and actual DoS attacks.

 Imagine if a large site was brought down by a botnet that is exploiting 
 Semantic Sitemap information for DoS attacks, focussing on the large dump 
 files. 
 This could end LOD experiments for that site.


 Best

 Martin


 On Jun 21, 2011, at 10:24 AM, Daniel Herzig wrote:


 Hi Martin,

 Have you tried to put a Squid [1] as a reverse proxy in front of your 
 servers and use delay pools [2] to catch hungry crawlers?

 Cheers,
 Daniel

 [1] http://www.squid-cache.org/
 [2] http://wiki.squid-cache.org/Features/DelayPools
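
 For illustration, a minimal squid.conf sketch of the delay-pool idea (an
 untuned illustration only; a real deployment would scope the pool with
 ACLs matching specific clients):

   # one class-1 (aggregate) pool: a single bandwidth bucket for matched traffic
   delay_pools 1
   delay_class 1 1
   # refill the bucket at 64 KB/s, with a 64 KB burst ceiling
   delay_parameters 1 65536/65536
   delay_access 1 allow all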

 On 21.06.2011, at 09:49, Martin Hepp wrote:

 Hi all:

 For the third time in a few weeks, we had massive complaints from 
 site-owners that Semantic Web crawlers from Universities visited their 
 sites in a way close to a denial-of-service attack, i.e., crawling data 
 with maximum bandwidth in a parallelized approach.

 It's clear that a single, stupidly written crawler script, run from a 
 powerful University network, can quickly create terrible traffic load. 

 Many of the scripts we saw

 - ignored robots.txt,
 - ignored clear crawling speed limitations in robots.txt,
 - did not identify themselves properly in the HTTP request header or 
 lacked contact information therein, 
 - used no mechanisms at all for limiting the default crawling speed and 
 re-crawling delays.
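
 For example, a speed limitation of that kind can be stated in robots.txt
 like this (Crawl-delay is a de-facto extension honoured by many, though
 not all, crawlers; the paths are hypothetical):

   User-agent: *
   Crawl-delay: 10
   Disallow: /dumps/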

 This irresponsible behavior can be the final reason for site-owners to say 
 farewell to academic/W3C-sponsored semantic technology.

 So please, please - advise all of your colleagues and students to NOT 
 write simple crawler scripts for the billion triples challenge or 
 whatsoever without familiarizing themselves with the state of the art in 
 friendly crawling.
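
 A minimal sketch of what friendly crawling means in code (assuming
 Python 3.6+; the crawler name and URL list are hypothetical): identify
 yourself, honour robots.txt and its Crawl-delay hint, and never
 parallelize:

   import time
   import urllib.robotparser
   import urllib.request

   USER_AGENT = "ExampleUniCrawler/0.1 (mailto:crawler-admin@example.edu)"

   rp = urllib.robotparser.RobotFileParser("http://example.org/robots.txt")
   rp.read()
   # honour the site's Crawl-delay, defaulting to 1 request per second
   delay = rp.crawl_delay(USER_AGENT) or 1.0

   for url in ["http://example.org/data/1.rdf", "http://example.org/data/2.rdf"]:
       if not rp.can_fetch(USER_AGENT, url):
           continue  # robots.txt forbids this path
       req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
       with urllib.request.urlopen(req) as resp:
           resp.read()  # process the data here
       time.sleep(delay)  # sequential, rate-limited - no parallel crawling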

 Best wishes

 Martin Hepp




 





Re: Schema.org in RDF ...

2011-06-12 Thread Jiří Procházka
On 06/12/2011 08:19 PM, Richard Cyganiak wrote:
 Hi Danny,
 
 On 12 Jun 2011, at 17:57, Danny Ayers wrote:
 We explicitly know the “expected types” of properties, and I'd like to keep 
 that information in a structured form rather than burying it in prose. As 
 far as I can see, rdfs:range is the closest available term in W3C's data 
 modeling toolkit, and it *is* correct as long as data publishers use the 
 terms with the “expected type.”

 I don't think it is that close to “expected type”
 
 I didn't say it's close to “expected type”. I said that we want to keep the 
 information in a structured form, and that rdfs:range is the closest 
 construct available in the W3C toolkit.

Hi,
Why not mint a new property for such loose semantics (and make
rdfs:range a subproperty of it)?
Surely we didn't go out of our way to gain great flexibility, compared to
controlled vocabularies, for nothing...
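
A minimal Turtle sketch of that idea (the property name and namespace are
hypothetical):

  @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix ex:   <http://example.org/meta#> .

  # values of a property carrying ex:expectedType are expected,
  # but not entailed, to belong to the given class
  ex:expectedType a rdf:Property .

  rdfs:range rdfs:subPropertyOf ex:expectedType .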

 <#something> :hasColour <#wet> .

 then we get

 <#wet> a :Colour .
 
 If you apply RDFS/OWL reasoning to broken data, you get more broken data. I 
 don't understand why anyone would be surprised by that.

I am surprised someone wants to publish broken data.

Best,
Jiri





Re: Semantics of rdfs:seeAlso (Was: Is it best practices to use a rdfs:seeAlso link to a potentially multimegabyte PDF?)

2011-01-13 Thread Jiří Procházka
On 01/13/2011 01:09 PM, Dave Reynolds wrote:
 On Thu, 2011-01-13 at 06:29 -0500, Tim Berners-Lee wrote:
 
 This is the Linked Open Data list.
 The Linked Data world is a well-defined bit of engineering.
 It has co-opted the rdfs:seeAlso semantics of “if you are looking up x,
 load y” from the much earlier FOAF work.
 
 Where is this well-defined bit of engineering defined in such a way
 that makes that co-option clear? [*]
 
 Assuming a particular use of rdfs:seeAlso as a convention for some
 community (e.g. FOAF) that wants to adapt that particular pattern is
 just fine.
 
 Updating specs in the future to narrow the interpretation to support
 this assumed usage might be OK, so long as due process is followed,
 but that hasn't happened yet.
 
 Complaining when others go by the existing spec does not seem
 reasonable.
 
 The URI space is full of empty space waiting for you to define terms
 with whatever semantics you like for your own use.
 But one can't argue philosophically that for some reason
 the URI rdfs:seeAlso should have some other meaning when people are
 using it and there have been specs.
 
 Those specs support Martin's usage, as his quotes from them clearly
 demonstrated.
 
 One *can* argue that the RDFS spec is definitive, and it is very loose in 
 its definition.
 
 Loose in the sense of allowing a range of values but as a specification
 it is unambiguous in this case, as Martin has already pointed out:
 
 “When such representations may be retrieved, no constraints are placed
 on the format of those representations.”
 
 We could look at maybe asking for an erratum to the spec
 to make it clear and introduce the other term in the same spec.
 
 Or mint a sub-property of rdfs:seeAlso which provides the additional
 constraints.
 
 Dave

+1
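
(For illustration, a minimal Turtle sketch of the sub-property Dave
suggests, in a hypothetical namespace:

  ex:seeAlsoRdf rdfs:subPropertyOf rdfs:seeAlso ;
      rdfs:comment "The object resolves to an RDF representation." .
)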

I also consider it part of Linked Data that the authoritative definition
of a term is the one obtained by dereferencing it, which in the case of
RDFS is http://www.w3.org/2000/01/rdf-schema# .

Best,
Jiri
 [*] And yes, I'm well aware of [1] which does mention the foaf
 convention but it does so just as one convention in passing, there's no
 clear suggestion in there that tools should rely on that convention for
 arbitrary linked data.
 
 [1] http://www.w3.org/DesignIssues/LinkedData.html 
 
 
 





Re: Is it best practices to use a rdfs:seeAlso link to a potentially multimegabyte PDF?, existing predicate for linking to PDF?

2011-01-10 Thread Jiří Procházka
On 01/10/2011 01:45 PM, William Waites wrote:
 * [2011-01-10 08:55:59 +] Phil Archer phil.arc...@talis.com wrote:
 
 ] However... a property should not imply any content type AFAIAC. That's 
 ] the job of the HTTP Headers. If software de-references an rdfs:seeAlso 
 ] object and only expects RDF then it should have a suitable accept 
 ] header. If the server can't respond with that content type, there are 
 ] codes to handle that.
 
 I disagree that we should rely on HTTP headers for this.
 Consider local processing of a large multi-graph dataset.
 These kinds of properties can act as hints to process one
 graph or another without the need to dereference something.
 (I tend to think of a graph as equivalent to the document 
 obtained by dereferencing the graph's name.)
 
 Slightly more esoteric are graphs made available over 
 ftp, finger, freenet, etc. Let's take advantage of HTTP
 where appropriate but not mix up the transport and 
 content unnecessarily.
 
 Cheers,
 -w

I agree; there is nothing wrong in having a subProperty which includes
more information, whether it be about the subject or object of the
triple, regardless of whether it is about content type or anything else.
I believe it is good practice to specify the domain and range of a property
as precisely as possible. Failing to do so invites usage which either is
wrong by the original intention or makes the meaning of the property very
fuzzy, which in both cases results in less useful data.
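
For illustration, a Turtle sketch of such a sub-property with its domain
and range pinned down (the names are hypothetical):

  ex:pdfRendition rdfs:subPropertyOf rdfs:seeAlso ;
      rdfs:domain foaf:Document ;
      rdfs:range  ex:PDFDocument ;
      rdfs:comment "Links a document to a PDF rendition of it." .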

Best,
Jiri





Re: Any reason for ontology reuse?

2010-12-04 Thread Jiří Procházka
I would like to add that successful books are published in translation to
other languages. By analogy, one who wishes his ideas to proliferate
the most should publish them in as many ontologies as possible.
The flaws of the analogy are that the differences between ontologies can
grow far bigger than the differences between (Western) languages, so the
semantic loss in translation may be large. A second issue is that currently
hardly anyone publishes information about which data is the original and
which are derivative translations.
Hopefully with machine-processable languages the costs of translation
will be negligible compared to natural languages, but we should keep
records of the process, because it implies a possible decline of semantic
quality.

Best,
Jiri

On 12/04/2010 03:10 PM, Hugh Glaser wrote:
 This is really rather a fun reflection.
 I like Toby's analogy, but I think that it can usefully be improved.
 Instead of considering publishing in English, we are publishing in the 
 equivalent of natural language.
 So the different vocabularies might correspond better to the different NLs 
 around.
 If I am able to publish in English, with a significant smattering of German 
 or Latin for words that might be missing from English, then with a little 
 effort, someone can more easily understand what I am saying, especially given 
 zeitgeist, context, et cetera, usw.
 However, I am probably best keeping to English if I can.
 On the other hand, spraying around lots of words from lots of different 
 vocabularies makes it much harder and fragile to understand than sticking to 
 one obscure one or even inventing my own, as it means the consumer needs to 
 go to lots of sources to work out what is meant.
 
 In fact, grabbing words from a bunch of different NLs is quite an easy, if 
 vulnerable, encoding mechanism.
 I have been known to write down four-digit numbers using transliterations of 
 the numbers from different languages, as a mnemonic which would be just that 
 bit of a challenge to someone who stumbled on it.
 I guess that is one reason why I am not as averse to minting URIs as some 
 people.
 Cheers
 
 On 4 Dec 2010, at 13:07, Martin Hepp wrote:
 
 Simple rules:

 1. It is better to use an existing ontology than inventing your own.
 2. It is better to use the most popular existing ontology than a less 
 popular existing ontology.
 3. It is better to publish your data using your own ontology than not 
 publishing your data at all.
 4. It is better to use a good (*) private ontology for publishing your data 
 than using a messy private ontology.

 (*) A good ontology is one that preserves the largest share of the original 
 conceptual distinctions in your data, i.e. it does not require merging 
 entity types that are distinct in the original data, as long as this 
 distinction matters for potential data consumers.

 Whether option #1 is feasible depends on

 1. how much time and money you are willing to invest in lifting / publishing 
 your data (that will be a matter of economic incentives).
 2. how complicated it is to populate that ontology based on the available 
 data and the local schemas.

 Best

 Martin

 On 04.12.2010, at 09:27, Toby Inkster wrote:

 On Fri, 3 Dec 2010 18:15:08 -0200
 Percy Enrique Rivera Salas privera.sa...@gmail.com wrote:

 I would like to know, which are the specific reason(s),
 for reuse terms from well-known vocabularies in the process of Publish
 Linked Data on the Web?

 Consider this question: I would like to know, which are the specific
 reason(s) for reusing well-known words in the process of publishing
 English text on the Web?

 Answer: When you're writing something in English, you should avoid
 inventing new words unless you're fairly sure that a word for the
 concept you're trying to describe does not exist. This is because if
 you invent a new word, you need to describe what it means for other
 people to be able to understand you. And even when you do that, you've
 increased the cognitive load for your readers.

 URIs are the vocabulary of linked data, just like words are the
 vocabulary of the English language. For analogous reasons, you should
 avoid minting new URIs when an existing URI will do. If you mint a new
 URI that means the same as an existing one, then not only do you have
 to go to the effort of documenting its meaning, but consumers have to
 perform extra work (such as subproperty/subclass inferencing) to
 understand it.
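
 For instance (a hypothetical Turtle sketch), a freshly minted term can
 only be reconciled with a well-known one through extra mapping data,
 which consumers must then process by inference:

   ex:fullName rdfs:subPropertyOf foaf:name .
   # a consumer applies the RDFS subPropertyOf rule to read
   # { ?s ex:fullName ?o } as implying { ?s foaf:name ?o }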

 -- 
 Toby A Inkster
 mailto:m...@tobyinkster.co.uk
 http://tobyinkster.co.uk



 
 martin hepp
 e-business & web science research group
 universitaet der bundeswehr muenchen

 e-mail:  h...@ebusiness-unibw.org
 phone:   +49-(0)89-6004-4217
 fax: +49-(0)89-6004-4620
 www: http://www.unibw.de/ebusiness/ (group)
 http://www.heppnetz.de/ (personal)
 skype:   mfhepp
 twitter: mfhepp

 Check out GoodRelations for E-Commerce on the Web of Linked Data!
 

Re: Is 303 really necessary?

2010-11-28 Thread Jiří Procházka


On 11/28/2010 02:52 PM, Giovanni Tummarello wrote:
 - the rest of the web continue to use 200

 Tim
 
 yes but the rest of the web will use 200 also to show what we would
 consider 208, e.g.
 
 http://www.rottentomatoes.com/celebrity/antonio_banderas/
 
 see the triples
 http://inspector.sindice.com/inspect?url=http://www.rottentomatoes.com/celebrity/antonio_banderas/#TRIPLES
 
 http://www.rottentomatoes.com/celebrity/antonio_banderas/
 
 is clearly a web page but it's also an actor; it is pointed to as such by
 their graph in other pages, and the same page contains the opengraph
 triple  type actor
 
 We should not get ourselves into the position of having to evangelize
 everyone to change something for reasons that are really not apparent to
 the normal web world. I think the solution we should be seeking should
 consider RDFa publishing via a normal 200 code, as in the example above,
 absolutely OK.
 
 an agent would then be able to distinguish which properties apply to
 the page and which to the actor by looking at the... properties
 themselves, I guess? Sad but possibly unavoidable?
 
 Giovanni

Hi,
I agree with this.
This problem is caused by the fact that Linked Data conflates identifiers
with locators - what is important is that one can get information about a
unique name by using it as a locator. The issue of whether some events in
the process or outcome of the information retrieval should somehow affect
the user's perception of the name (is it a document or xyz?) is a can of
worms most implementers don't want to tackle, and they have a point. I
don't want to maintain all the apps I once coded so that they support
whatever the latest HTTP semantics trend is, when there is a widely used
standard for extensible, *evolvable* information representation (RDF)
which I am already expecting to receive about the name I am retrieving
info about. So let's not presume that by dereferencing a URI and getting
back a document, the URI is the document's identifier - it is its locator.
It can be its identifier too, but let's leave that for publishers to decide
- that has been the point of my previous post on the topic (
http://lists.w3.org/Archives/Public/public-lod/2010Nov/0325.html )

Best regards,
Jiří Procházka





Simpler alternative of Linked Data semantics (was Re: Is 303 really necessary?)

2010-11-28 Thread Jiří Procházka
On 11/28/2010 06:45 PM, Kingsley Idehen wrote:
 On 11/28/10 9:46 AM, Jiří Procházka wrote:
snip
 This problem is caused by the fact that Linked Data conflates identifiers
 with locators - what is important is that one can get information about a
 unique name by using it as a locator.
 
 Linked Data (meme or actual concept) doesn't conflate Locators with
 Identifiers. A URI is a generic Identifier. A URL (a Locator / Address)
 is an Identifier.
 
 The problem remains in not understanding the URI abstraction.
 
 One issue you can't tack on Linked Data is failure to distinguish
 between a Name Reference and an Address Reference implemented via
 elegance of URI abstraction.
 
   The issue of whether some events in the
 process or outcome of the information retrieval should somehow affect
 the user's perception of the name (is it a document or xyz?) is a can of
 worms most implementers don't want to tackle, and they have a point.
 
 It wasn't a can of worms before the Web. The issue of Resource in URI
 [1] has led to overloading that creates the illusion you describe,
 across many quarters and their associated commentators.
 
   I
 don't want to maintain all the apps I once coded so that they support
 whatever the latest HTTP semantics trend is, when there is a widely used
 standard for extensible, *evolvable* information representation (RDF)
 which I am already expecting to receive about the name I am retrieving
 info about. So let's not presume that by dereferencing a URI and getting
 back a document, the URI is the document's identifier - it is its locator.
 
 Yes, it's the URL of a Document, and if the content-type is one of the
 RDF formats, or any other syntax for representing EAV-model structured
 data -- via hypermedia -- then it's the URL of an Entity Descriptor
 Document -- a document that provides a full representation of its
 Subject via a Description expressed in a Graph Pictorial comprised of
 Attribute=Value pairs coalesced around a Subject Name (a Resolvable
 Identifier, e.g. an HTTP URI).
 
 It
 can be its identifier too, but let's leave that for publishers to decide
 - that has been the point of my previous post on the topic (
 http://lists.w3.org/Archives/Public/public-lod/2010Nov/0325.html )
 
 If you mean, let the publisher decide via Content and Mime Type what
 this is about, then emphatic YES!!

That is the option which has been promoted till now, but some people chose
not to comply, for whatever reason, and judging by the amount of
discussion about it, it is a problem. If a publisher makes available
structured data about some concept at a URI, he probably means that the URI
identifies the concept, not the data documents, and I think if one wants
to use that data, he needs to try to understand the publisher, not tell
him he is wrong because [insert XX pages of HTTP & URI semantics],
however flawed you may consider neglecting the standards to be - welcome
to the Linked Data (tag^H^H^Hstatus-code-)soup.

I'm fond of RDF's take on URIs == names. What I mean is:
a)  letting the publisher decide which names are for documents (and the
relations between them) and which are for concepts. On dereferencing (200
returned), one should not think “hmm, yup, this definitely must be the
name of a document”, but “hmm, I dereferenced this URI and got back some
document - the document exists, but its name isn't specified” - like a
blank node, with similar drawbacks; thus:
b)  letting the publisher decide that not via Content and Mime Type, but
in the structured data itself, because that is most probably what the
consumer will be able to parse and understand anyway, and there exists a
well-established standard for it, the Resource Description Framework, which
fulfills more of this great document (
http://www.w3.org/DesignIssues/Evolution.html ) than HTTP, which isn't a
one ring to rule all the transport protocols. (Other data formats than
RDF should have equivalent ways of expressing that too.)
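
A Turtle sketch of b), with hypothetical URIs, reusing the
wdrs:isDefinedBy pattern from my earlier post - the publisher names the
document in the data itself rather than via headers:

  <http://example.org/toucan> a ex:Toucan ;
      wdrs:isDefinedBy [ wdrs:location <http://example.org/toucan> ] .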

This practice doesn't need to be standardized. You and I can use it now
if we wish. It has both advantages - covering a wider quality range of
linked data - and disadvantages: suddenly all documents with no data
about them have no names! Tragedy? No: we have their locators, and if
something is said about a locator, we can assume it is about the document
stored there, unless the publisher says otherwise (this is the difference
from the standard Linked Data perspective).
This doesn't make the ambiguity go away - that is impossible, since it
depends on the publisher - but I believe it is a simpler, more
forward-compatible way to work around it, fitting more world-views.

Best,
Jiri

snip
 Links:
 
 1. http://lists.w3.org/Archives/Public/www-tag/2009Aug/.html --
 TimBL's own account re. origins of Resource in URI. This is the problem!!
 





Re: Simpler alternative of Linked Data semantics (was Re: Is 303 really necessary?)

2010-11-28 Thread Jiří Procházka
On 11/29/2010 12:48 AM, Kingsley Idehen wrote:
 On 11/28/10 3:52 PM, Jiří Procházka wrote:
 On 11/28/2010 06:45 PM, Kingsley Idehen wrote:
 On 11/28/10 9:46 AM, Jiří Procházka wrote:
 snip

 I'm fond of RDF's take on URIs == names. What I mean is:
 a)  letting the publisher decide which names are for documents (and the
 relations between them) and which are for concepts. On dereferencing (200
 returned), one should not think “hmm, yup, this definitely must be the
 name of a document”, but “hmm, I dereferenced this URI and got back some
 document - the document exists, but its name isn't specified” - like a
 blank node, with similar drawbacks; thus:
 b)  letting the publisher decide that not via Content and Mime Type, but
 in the structured data itself, because that is most probably what the
 consumer will be able to parse and understand anyway, and there exists a
 well-established standard for it, the Resource Description Framework, which
 fulfills more of this great document (
 http://www.w3.org/DesignIssues/Evolution.html ) than HTTP, which isn't a
 one ring to rule all the transport protocols. (Other data formats than
 RDF should have equivalent ways of expressing that too.)
 
 Yes-ish, but my point is this:
 
 1. Publisher (owner of Linked Data Server) serves up data via Documents
 at URLs
 2. Linked Data Client (agents) accesses data by exploiting content
 negotiation when de-referencing URIs (Name or Address)
 3. Publisher sends Document Content to client with metadata (HTTP
 response headers and/or within the content via triples or head/link/
 exploitation re. HTML) -- this is where Mime Type comes into play too
 4. Linked Data Client processes metadata and content en route to
 understanding what it has received.
 
 This practice doesn't need to be standardized. You and I can use it now
 if we wish. It has both advantages - covering a wider quality range of
 linked data - and disadvantages: suddenly all documents with no data
 about them have no names! Tragedy? No: we have their locators, and if
 something is said about a locator, we can assume it is about the document
 stored there, unless the publisher says otherwise (this
 is the difference from the standard Linked Data

Re: Role of URI and HTTP in Linked Data

2010-11-10 Thread Jiří Procházka
On 11/10/2010 11:44 AM, Nathan wrote:
 Hi Jiří,
 
 Jiří Procházka wrote:
 Hi,
 having read all of the past week's and still ongoing discussion about HTTP
 status codes, URIs and most importantly their meaning from a Linked Data
 perspective, I want to share my thoughts on this topic.

 I don't mean to downplay anyone's work, but I think the role of the URI
 and HTTP specifications (especially their semantics) in Linked Data is
 overemphasized, which unnecessarily complicates things.
 
 The URI is what makes Linked Data, Linked Data, it's the only hook to
 the real world, and via the domain name system + domain registration
 process gives us a hook on accountability, which is critically
 important. 

I am by no means giving up these utilities by what I suggest.

 #bar, as described by http://example.com/foo resolves in
 two ways:
 (1) http://example.com/foo as a name for the literal description/graph
 (2) http://example.com/foo as a way of saying the author of the
 description available at http://example.com/foo, stated X, and was
 responsible as delegated by the owners of example.com, where X is (1)
 and provable by the HTTP messages and logs. A status code of 200 vs 303
 to some other domain or URI vs 4xx or 5xx plays a big part in that chain
 of accountability / validity / trust.

I don't think Linked Data consumers should *have to* care about what
status codes an HTTP request returns - it shouldn't be part of the core
Linked Data semantics. Of course it can be beneficial for clients to
listen to them to get more information, but treating an HTTP library as a
simple function should be allowed (either it returns data or not).
If someone 303s (nice verb) to a different domain, it obviously
means he trusts it to maintain the description of his URI.

 Also never forget that Linked Data is just Links with literals, a Link
 as in a hyperlink; it's the description of a relationship between two
 things (names or literals) which makes a link a link, thus each link is a
 statement, statements form descriptions, and descriptions are literal
 things. Triples are statements, Graphs are descriptions.
 
 There's a lot more to the simple triple with http URIs than many
 realise, sure it makes a nice RDF data bus for us and gives us an almost
 universal data format, which we can exploit and bring to the fore via
 linked data, but that's just the tip of the iceberg, and ultimately of
 very little use without the URI and HTTP.
 
 a few notes..
 
 I think we can all agree that the core idea of Linked Data is that
 information is expressed using unique identifiers (URIs) which I can
 simply use to get useful information about the thing the identifier
 represents (hence the mandated, relatively simple, widely supported
 transfer protocol, HTTP).
 
 as above, that's not the core of linked data, that's the surface.
 
 So let's stick with this. Let's just treat URIs as RDF does - as simple
 names. When we dereference a URI, we get back some useful data and
 that's it.
 
 So, that'll be like mailto: or pop: or tel: then..

I don't follow here. I don't know of any standardized ways of getting
structured data out of such URIs.

 If we want to express that the data fetched is in fact a
 document, we use the wdrs:isDefinedBy property. The data fetched is
 just data, and any info about it should be contained in it.
 
 Expressing that the data fetched is in fact a document is indeed
 optional, but any response is always a message, a description, a
 /literal/ thing; you can't pretend it doesn't exist, it does - to say a
 description is anything other than that is like me saying you're an
 apple and insisting everybody believe me. Literals are self-identifying,
 self-naming things.

I don't get what you mean here either. Are you talking about RDF
semantics here or general ontological philosophy? If you are talking
about RDF, then be aware that literals can have names - URIs assigned to
literals. If you are talking about the latter, then I don't get you at all.
I am advocating making Linked Data as simple as possible, avoiding
abstract ontological definitions (in which I count the notion of
literal). The fact that what you say is incomprehensible to me further
strengthens my opinion.

 Why? Why no Content-Location? There is no reason to require additional
 complexity by building extra information layers. Publishing the document
 information in the data itself would most probably be simpler for both
 the publishing and the consuming party. Treating HTTP as a simple
 black box is what is mostly done in practice anyway.
 
 Read only world then?

Not really; writing can be simple too, but we would probably want to
draw the line somewhere unless we want Linked Data to require a
universal RPC framework specification.

 What if someone doesn't publish the document data? Would it mean the URI
 we dereferenced refers both to the thing described and the description
 of it? Kind of.
 
 There is no kind of. The description is a literal thing all of its own;
 it's the same thing regardless

Re: Role of URI and HTTP in Linked Data

2010-11-10 Thread Jiří Procházka
On 11/10/2010 11:26 PM, Nathan wrote:
 Jiří Procházka wrote:
 On 11/10/2010 11:44 AM, Nathan wrote:
 Hi Jiří,

 Jiří Procházka wrote:
 Hi,
 having read all of the past week's and still ongoing discussion about
 HTTP status codes, URIs and most importantly their meaning from a Linked
 Data perspective, I want to share my thoughts on this topic.

 I don't mean to downplay anyone's work, but I think the role of the URI
 and HTTP specifications (especially their semantics) in Linked Data is
 overemphasized, which unnecessarily complicates things.
 The URI is what makes Linked Data, Linked Data, it's the only hook to
 the real world, and via the domain name system + domain registration
 process gives us a hook on accountability, which is critically
 important. 

 I am by no means giving up these utilities by what I suggest.

 #bar, as described by http://example.com/foo resolves in
 two ways:
 (1) http://example.com/foo as a name for the literal description/graph
 (2) http://example.com/foo as a way of saying the author of the
 description available at http://example.com/foo, stated X, and was
 responsible as delegated by the owners of example.com, where X is (1)
 and provable by the HTTP messages and logs. A status code of 200 vs 303
 to some other domain or URI vs 4xx or 5xx plays a big part in that chain
 of accountability / validity / trust.

 I don't think Linked Data consumers should *have to* care about what
 status codes an HTTP request returns - it shouldn't be part of the core
 Linked Data semantics. Of course it can be beneficial for clients to
 listen to them to get more information, but treating an HTTP library as a
 simple function should be allowed (either it returns data or not).
 If someone 303s (nice verb) to a different domain, it obviously
 means he trusts it to maintain the description of his URI.
 
 snap, I don't think they should either; I also don't think they should
 have to constantly ask “is this a document or a toucan?” - it could all
 be so much easier.

I think you have missed the point of the second part of my original
email - I think it is flawed to try to enforce URI == 1 thing by some
system (especially if you want to maintain RDF as one of the supported
structured data formats (I dare say the major one)), as nothing can
be completely unambiguous (in RDF) - that is something the publisher
needs to keep in mind and work towards.
The key is not to infer any information which would increase ambiguity,
which my simple solution ensures; it also solves the “is this a document
or a toucan?” problem if the original data is unambiguous (if it isn't,
it's not like the consumer can do anything about it anyway).

 ps: 303 doesn't say “you'll find it here!”, it says “maybe you try here
 instead?”
 
 So lets stick with this. Lets just treat URIs as RDF does - as simple
 names. When we dereference an URI we get back some useful data and
 that's it.
 So, that'll be like mailto: or pop: or tel: then..

 I don't follow here. I don't know of any standardized ways of getting
 structured data out of such URIs.
 
 That's the point: RDF treats all URIs the same. You're saying we should
 treat URIs as RDF does, as nothing more than a logical hook - which
 doesn't do us much good practically when we want to dereference and get
 back some useful data.

It is useful - I don't advocate using any URIs other than HTTP URIs with
Linked Data, because with HTTP URIs we use the HyperText *Transfer*
Protocol, which gets us some useful data without having to cut up the URIs.

Best,
Jiri





Role of URI and HTTP in Linked Data

2010-11-09 Thread Jiří Procházka
Hi,
having read all of the past week's and still ongoing discussion about HTTP
status codes, URIs and most importantly their meaning from a Linked Data
perspective, I want to share my thoughts on this topic.

I don't mean to downplay anyone's work, but I think the role of the URI and
HTTP specifications (especially their semantics) in Linked Data is
overemphasized, which unnecessarily complicates things.
I think we can all agree that the core idea of Linked Data is that
information is expressed using unique identifiers (URIs) which I can simply
use to get useful information about the thing the identifier represents
(hence the mandated, relatively simple, widely supported transfer
protocol, HTTP).

So let's stick with this. Let's just treat URIs as RDF does - as simple
names. When we dereference a URI, we get back some useful data and
that's it. If we want to express that the data fetched is in fact a
document, we use the wdrs:isDefinedBy property. The data fetched is
just data, and any info about it should be contained in it.
Why? Why no Content-Location? There is no reason to require additional
complexity by building extra information layers. Publishing the document
information in the data itself would most probably be simpler for both
the publishing and the consuming party. Treating HTTP as a simple
black box is what is mostly done in practice anyway.

What if someone doesn't publish the document data? Would it mean the URI
we dereferenced refers both to the thing described and the description
of it? Kind of. What I mean is that the consumer side can add additional
information to the data about the document (when and how fast it was
fetched, etc.), and if the data doesn't contain info about the document
already, it could add it:
  <uri> wdrs:isDefinedBy [ wdrs:location <uri> ] . # or something like this
Non-RDF data should use their equivalents.
That is the most important thing I had to say - let's keep semantics in
the data.

I believe it is quite important that the range of wdrs:isDefinedBy is a
document class, which should be the domain of wdrs:location.
I am going to explain why I think so, but beware: at this point I get a
bit philosophical :)

What is pretty awesome about RDF, and something Linked Data could
learn from, is how it handles the ontological (in the philosophical sense)
issues - existence, being and reality. In order to support maximum
expressiveness and compatibility with various world-views, it says the
least about them. A big part of that is dealing with identity - if a
caterpillar turns into a butterfly, is it still the same thing? Am I still
I when I get older and change? RDF doesn't offer any answers to such
questions, nor to whether there are only information resources and other
resources. There are just names which identify objects or concepts,
which we describe with names, and the final description matches some
number of objects or concepts we know; the better the description
is, the lower that number is.

RDFS classes are used to describe various aspects of objects or
concepts, which allow us to express ourselves much less ambiguously,
using properties with a defined domain and range. On the other hand we can
describe those aspects separately if we consider them a separate entity.
For example, someone can say I am averagely skilled as an English
speaker, or that my English skill is mediocre, or that I am one of the
averagely skilled English speakers. Similarly, one could say a book is
3 characters long as to its content, or 20 characters long as to its
title, or 3000 characters long as to the description received on
dereferencing. It shouldn't matter whether I consider a book's name part
of it or not, if I use as unambiguously defined properties as possible.
However, vocabularies with not very well defined terms (consider an
example length property), which generally mimic natural-language
properties, are widely used, which is why we should have
wdrs:isDefinedBy.
The point of this philosophical exercise was to say that we shouldn't be
saying a URI represents one resource, or trying to define what
resources are or what existence is, but rather recognizing the context of
the original information when modifying it (especially when amending it).

Best,
Jiri Prochazka

PS: It might be useful to also have wdrs:isPrimarilyDefinedBy (as
rdfs:subPropertyOf wdrs:isDefinedBy).






Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices

2010-10-21 Thread Jiří Procházka
Hi everyone,
I think it is important not to forget that the semantic web goal of
creating a unified model for information exchange in a decentralized,
heterogeneous network of systems, aiming for the lowest common
denominator, implies that many requirements for data quality will not be
met, simply because they differ for various people. It is a matter of
paradigm - a way of working with the data - so it should come as no
surprise that various groups of like-minded people define their own
requirements, especially in the area of discoverability.

I find it quite surprising that more such standards as Linked Data
and LOD do not exist. Perhaps once more of them exist, community tracking
and comparisons included in semantic web introduction materials would
help proliferate a more accurate image of the semantic web...

Of course it would be great if information about data complying with such
initiatives were generated by automated tools (no “Submit URL” please),
for example by applying the data discoverability algorithm they endorse
(not sure if LD has something like this - follow-your-nose?), if
discoverability is in their focus.

Best,
Jiri Prochazka

On 10/21/2010 09:23 PM, Enrico Motta wrote:
 Chris
 
 I strongly agree with the points made by Martin and Giovanni. Of course
 the LOD initiative has had a lot of positive impact and you cannot be
 blamed for being successful, but at the same time I am worried that the
 success and visibility of the LOD cloud is having some rather serious
 negative consequences. Specifically:
 
 1) lots of people, even within the SW community, now routinely describe
 the LOD as the 'semantic web'.  This is not only dramatically incorrect
 (and bad for students and people who want to know about the SW) but also
 an obstacle to progress: anything which is not in the LOD diagram does
 not exist, and this is really not good for the SW community as a whole
 (including the people at the centre of the LOD initiative).  Even worse,
 in the past 12-18 months  I have noticed that this viewpoint has also
 been embraced by funding bodies and linking to LOD is becoming a
 necessary condition for a SW project. Again, I think this is undesirable
 - see also Martin's email on this thread.
 
 2) Because the LOD is perceived as the 'official SW' and because
 resources in the LOD have to comply with a number of guidelines, people
 also assume that LOD resources exhibit higher quality. Unfortunately, in
 our experience this is not really the case, and this also generates
 negative consequences. That is, if LOD is the 'official high-quality SW'
 and there are so many issues with the data, people automatically
 assume that the rest of the SW is a lot worse, even though this is not
 necessarily the case.
 
 So, as other people have already said, maybe it is time to re-examine
 the design criteria for LOD and the way this is presented?  For
 instance, it would be beneficial to the community if LOD were to focus
 more on quality issues, rather than linking for the sake of linking. 
 And in addition, a less static approach to listing resources could
 improve the visibility of so much more stuff out there.
 
 
 Enrico
 
 PS
 
 
 I agree with you that it would be much better if somebody would set up a
 crawler, properly crawl the Web of Data and then provide a catalog about
 all datasets.
 
 Actually this is exactly what our Watson system does, see
 http://watson.kmi.open.ac.uk
 
 
 
 At 13:12 +0100 21/10/10, Giovanni Tummarello wrote:
   But again: I agree that crawling the Web of Data and then deriving
 a dataset
  catalog as well as meta-data about the datasets directly from the
 crawled
  data would be clearly preferable and would also scale way better.

  Thus: Could somebody please start a crawler and build such a catalog?

  As long as nobody does this, I will keep on using CKAN.


 Hi Chris, all

 I can only restate that within Sindice we're very open to anyone who
 wants to develop data analysis apps creating catalogs automatically.
 At the moment, a map-reduce job a couple of weeks ago gave in excess of
 100k independent datasets. How many are interlinked, etc.? To be analyzed.

 Our interest (and the interest of the Semantic Web vision I want to
 sponsor) is to make sure RDFa sites are fully included, and so are those
 who provide markup which can be translated into RDF in an
 automatic/agreeable way (so no scraping or sponging) - that
 is, anything that any23.org can turn into triples.

 If you were indeed interested in running or developing your
 algorithms on our running dataset, no problem; the code can be made
 open source so it would run on other similarly structured datasets.

 This said, yes, I think too that in this phase a CKAN-like repository
 can be an interesting aggregation point, why not.

 But I do think the diagram, which made great sense as an example when
 Richard started it, is now at risk of providing a disservice,
 which is in line with what Martin has pointed out.

 The diagram 

Re: RDF Extensibility

2010-07-07 Thread Jiří Procházka
On 07/06/2010 11:05 PM, Pat Hayes wrote:
 
 On Jul 6, 2010, at 9:34 AM, Jiří Procházka wrote:

 [snipped]

 In the case of a) I haven't cleared up my thoughts yet, but generally I
 would like to know:
 How are semantic extensions to work together in an automated system?
 
 Well, the semantics always defines some notion of entailment, and your
 system is supposed to respect that notion: not draw invalid conclusions,
 draw as many valid conclusions as you feel are useful, don't say things
 are inconsistent when they aren't, etc.. Otherwise, you have free rein.
 So, if you have several semantic extensions, they each provide a set
 of such entailments and they should add up to one single set of legal
 entailments.
 
 How to let an agent know that the data is described using a new RDF
 extension which the client doesn't know, and that the data could be (or
 definitely is) false if it is interpreted using vanilla RDF semantics?
 
 Not false, if it's a semantic extension (they can't contradict the RDF
 semantics, only extend it). But the same point more generally: how do we
 know, given some RDF, what semantic extensions are appropriate to be
 used when interpreting it? That is a VERY good question. This is
 something that RDF2 could most usefully tackle, if only in a first-step
 (ham-fisted?) kind of a way. We were aware that this was an issue in the
 first WG, but it was just too far outside our charter, and our energy
 level, to tackle properly. One obvious (?) thing to say is that using a
 construction from a namespace which is associated with the definition of
 any RDF semantic extension is deemed to bring along the necessary
 interpretation conditions from the extension, so that for example if I
 use owl:sameAs in some RDF, then I mean it to be understood using the
 OWL semantic conditions. We all do this without remarking upon it, but
 loosely, and to make this precise and normative would be a very
 interesting (and useful) exercise. (An issue already here is: which
 version of the OWL semantics is intended? Does the use in RDF also
 import the OWL-DL syntactic restrictions on its use, for example?)

That is indeed what I had in mind. I think sooner or later this has to
be dealt with, and I think the sooner the better...

I don't think the namespace thing is obvious, since I don't think there
is a concept of namespace defined in RDF. It is just some graph
containing some terms related to a semantic extension of RDF. How does
the processing application know? Which entailment rules are to be
applied to the graph? How should the patterns triggering application of
a rule be defined? Having multiple rulesets, in what order and how should
they be applied? What about rules modifying rules and rulesets? How to
define the interpretation of graphs (which rulesets to apply, which to
ignore)? Are a graph and entailment rules everything that is used in
interpretation according to a semantic extension, or are there also some
attributes like graph consistency (if so, how do we pass them on? as
added triples?)?
These are just questions I am pulling off the top of my head...

Best,
Jiri

 Pat
 
 

 b) How should my system know that the data which is just being processed
 is a new revision of RDF/XML and not malformed RDF/XML, when forward
 compatibility was out of sight, out of mind when RDF/XML was designed?

 Best,
 Jiri Prochazka

 
 
 IHMC (850)434 8903 or (650)494 3973
 40 South Alcaniz St.   (850)202 4416   office
 Pensacola(850)202 4440   fax
 FL 32502  (850)291 0667   mobile
 phayesAT-SIGNihmc.us   http://www.ihmc.us/users/phayes
 
 
 
 
 





RDF Extensibility

2010-07-06 Thread Jiří Procházka
On 07/06/2010 03:35 PM, Toby Inkster wrote:
 On Tue, 6 Jul 2010 14:03:19 +0200
 Michael Schneider schn...@fzi.de wrote:
 
 So, if 

 :s "lit" :o .

 must not have a semantic meaning, what about

 "lit" rdf:type rdf:Property .

 ? As, according to what you say above, you are willing to allow for
 literals in subject position, this triple is fine for you
 syntactically. But what about its meaning? Would this also be
 officially defined to have no meaning?
 
 It would have a meaning. It would just be a false statement. The
 same as the following is a false statement:
 
   foaf:Person a rdf:Property .

Why do you think so?
I believe it is valid RDF and even valid under the RDFS semantic extension.
Maybe OWL says something about the disjointness of RDF properties and
classes; a URI can be many things.

I think there are issues about RDF extensibility which haven't been
solved and they concern:
a) semantics
b) serializations

In the case of a) I haven't cleared up my thoughts yet, but generally I
would like to know:
How are semantic extensions to work together in an automated system?
How to let an agent know that the data is described using a new RDF
extension which the client doesn't know, and that the data could be (or
definitely is) false if it is interpreted using vanilla RDF semantics?

b) How should my system know that the data which is just being processed
is a new revision of RDF/XML and not malformed RDF/XML, when forward
compatibility was out of sight, out of mind when RDF/XML was designed?

Best,
Jiri Prochazka





Re: Show me the money - (was Subjects as Literals)

2010-07-01 Thread Jiří Procházka
On 07/01/2010 09:11 PM, Henry Story wrote:
 
 Social Web Architect
 http://bblfish.net/
 
 On 1 Jul 2010, at 21:03, Tim Finin wrote:
 
 On 7/1/10 2:51 PM, Henry Story wrote:
 ...
 So just as a matter of interest, imagine a new syntax came along that 
 allowed literals in
 subject position, could you not write a serialiser for it that turned
"123" length 3 .
 Into
  _:b owl:sameAs "123";
  length 3.
 ?
 So that really you'd have to do no work at all?
 Just wondering

 Isn't owl:sameAs defined to be a relation between two
 URI references?  
 
 Not sure.

It is; this won't work under OWL DL... In OWL Full I think it will.
I asked about this recently on this list...

 In any case I suppose it would be simple to create such an identity relation. 
 
 Even if not, it is symmetric and
 would have the above imply { "123" owl:sameAs _:b . }
 
 It does indeed imply that, though you can't write it out like that 
 in most serialisations, other than N3.
 
 And being able to write it out, makes it easy to explain what symmetry means.
 
 I think people keep confusing syntax and semantics for some reason, even on
 the semantic web.
 
 Henry





Re: Subjects as Literals, [was Re: The Ordered List Ontology]

2010-06-30 Thread Jiří Procházka
On 06/30/2010 09:09 PM, Pat Hayes wrote:
 
 On Jun 30, 2010, at 11:50 AM, Nathan wrote:
 
 Pat Hayes wrote:
 On Jun 30, 2010, at 6:45 AM, Toby Inkster wrote:
 On Wed, 30 Jun 2010 10:54:20 +0100
 Dan Brickley dan...@danbri.org wrote:
 That said, i'm sure sameAs and differentIndividual (or however it is
 called) claims could probably make a mess, if added or removed...

 You can create some pretty awesome messes even without OWL:

# An rdf:List that loops around...

<#mylist> a rdf:List ;
rdf:first <#Alice> ;
rdf:rest <#mylist> .

# A looping, branching mess...

<#anotherlist> a rdf:List ;
rdf:first <#anotherlist> ;
rdf:rest <#anotherlist> .

 They might be messy, but they are *possible* structures using
 pointers, which is what the RDF vocabulary describes.  It's just about
 impossible to guarantee that messes can't happen when all you are
 doing is describing structures in an open-world setting. But I think
 the cure is to stop thinking that possible messes are a problem to be
 solved. So, there is dung in the road. Walk round it.

 Could we also apply that to the 'subjects as literals' general
 discussion that's going on then?

 For example I've heard people saying that it encourages bad 'linked
 data' practise by using examples like { 'London' a x:Place } - whereas
 I'd immediately counter with { x:London a 'Place' }.

 Surely all of the subjects as literals arguments can be countered with
 'walk round it', and further good practise could be aided by a few
 simple notes on best practise for linked data etc.
 
 I wholly agree. Allowing literals in subject position in RDF is a
 no-brainer. (BTW, it would also immediately solve the 'bugs in the RDF
 rules' problem.) These arguments against it are nonsensical. The REAL
 argument against it is that it will mess up OWL-DL, or at any rate it
 *might* mess up OWL-DL.

I wonder: when using owl:sameAs or something related to name literals, to
be able to say other useful things about them in normal triples (datatype,
language, etc.), does it break OWL DL (or any other formalism which is the
base of some ontology extending RDF semantics)? Or would it if an
rdf:sameAs were introduced?
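
For concreteness, the pattern I mean, as an N3-ish sketch with
hypothetical names (legal in OWL Full at best, certainly not in OWL DL,
where owl:sameAs links individuals):

  ex:oneTwoThree owl:sameAs "123" ;
      ex:datatype xsd:string .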

Best,
Jiri

 
 The Description Logic police are still in charge:-)
 
 Pat
 
 
 

 Best,

 Nathan


 
 
 IHMC (850)434 8903 or (650)494 3973
 40 South Alcaniz St.   (850)202 4416   office
 Pensacola(850)202 4440   fax
 FL 32502  (850)291 0667   mobile
 phayesAT-SIGNihmc.us   http://www.ihmc.us/users/phayes
 
 
 
 
 
 





Re: Inclusion of additional (non dereferencable) data?

2010-06-10 Thread Jiří Procházka
Hi Nathan,

the origin of this pattern is in RDF - a graph is just a piece of paper
where you can write any statements...

But I understand - your view of Linked Data is much different,
introducing more complex notions of things and their descriptions in
which it is desirable, after getting a description, to identify the role
which statements relating other things play.

I can see that someone might want to express statements which relate to
the described thing - statements that make sense from the point of view of
its description but are not in the description received by dereferencing
the URIs of the subject/object of the statement, or even contradict it,
perhaps when in doubt about their future.

Other than that, I see it as bad practice, for the reasons you
explained (the possibility of going out of date being the most important).
I believe Linked Data encourages dereferencing.
What Linked Data brings into the Semantic Web puzzle is a recommended
algorithm for how to get data (follow your nose?) and maybe that is the
thing you are really looking for...

I wonder if there is a need for a vocabulary allowing people to describe
recommendations of URIs that should definitely be dereferenced, or perhaps
a measurement of URI importance in the graph, feeding a more complex
algorithm...
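
(Such a vocabulary might look like this - a purely hypothetical Turtle
sketch:

  ex:recommendDereferencing a rdf:Property ;
      rdfs:comment "The subject graph recommends that consumers dereference the object URI." .
)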

Regards,
Jiri Prochazka


On 06/10/2010 05:24 PM, Nathan wrote:
 All,
 
 Here's a common example of what I'm referring to, suppose we have a
 (foaf) document http://ex.org/bobsmith which includes the following
 triples:
 
   :me foaf:knows <http://example.org/joe_bloggs#me> .
 
   <http://example.org/joe_bloggs#me> a foaf:Person ;
 foaf:name "Joe Bloggs"@en .
 
 In Linked Data terms one could suggest that the description of Joe
 Bloggs doesn't 'belong' in this document (although clearly it can be here).
 
 I can quite easily see how this trend came about; there are benefits: it's
 both an optimisation method (it saves dereferencing) and it's an inclusion
 of human-presentable information (which aids display / comprehension in
 'foaf viewers').
 
 However, there are drawbacks too: the data could easily go out of date /
 out of sync, and it's not dereferenceable (the adverse effects in this
 example aren't specifically clear, but in other use-cases they are
 considerable).
 
 Over and above these simple thoughts, I'm quite sure that there are
 bigger architectural and best practise considerations (for a web of
 data), for example:
 
  - does this create an environment where we are encouraged not to
 dereference linked data (or where it is common to look local-first)
 
  - does this point to bigger issues such as not having a single global
 predicate for a default human presentable 'name' for all things that can
 be 'named' (given a URI) - even though many candidates are available.
 
  - should 'reading ahead' (dereferencing all linked data before
 presentation to a user / trying to glean an understanding) be encouraged
 over providing a limited local subset of the data which could easily be
 inaccurate or out of date.
 
  - is there a gut instinct in the community that most data will
 ultimately end up being presented to a human somewhere along the line,
 and this is driving us to make such design decisions.
 
 Any thoughts or strong feelings on the issue(s)? And is anybody aware of
 whether this practice came about more by accident than by design?
 
 Best,
 
 Nathan
 





Re: Fwd: backronym proposal: Universal Resource Linker

2010-04-18 Thread Jiří Procházka
Why 'URL', when it is pretty clearly defined and still a significant
portion of web users don't understand it?

I'd rather embrace 'web address' - even non-tech users would understand
that.

Best,
Jiri Prochazka

On 04/18/2010 12:18 PM, Dan Brickley wrote:
 So - I'm serious. The term 'URI' has never really worked as something
 most Web users encounter and understand.
 
 For RDF, SemWeb and linked data efforts, this is a problem as our data
 model is built around URIs.
 
 If 'URL' can be brought back from limbo as a credible technical term,
 and rebranded around the concept of 'linkage', I think it'll go a long
 way towards explaining what we're up to with RDF.
 
 Thoughts?
 
 Dan
 
 
 -- Forwarded message --
 From: Dan Brickley dan...@danbri.org
 Date: Sun, Apr 18, 2010 at 11:52 AM
 Subject: backronym proposal: Universal Resource Linker
 To: u...@w3.org
 Cc: Tim Berners-Lee ti...@w3.org
 
 
 I'll keep this short. The official term for Web identifiers, URI,
 isn't widely known or understood. The I18N-friendly variant IRI
 confuses many (are we all supposed to migrate to use it; or just in
 our specs?), while the most widely used, understood and (for many)
 easiest to pronounce, 'URL' (for Uniform Resource Locator) has been
 relegated to 'archaic form' status. At the slightest provocation this
 community disappears down the rathole of URI-versus-URN, and until
 this all settles down we are left with an uncomfortable disconnect
 between how those in-the-know talk about Web identifiers, and the
 many others who merely use them.
 
 As of yesterday, I've been asked "but what is a URI?" one too many
 times. I propose a simple-minded fix: restore 'URL' as the most
 general term for Web identifiers, and re-interpret 'URL' as Universal
 Resource Linker. Most people won't care, but if they investigate,
 they'll find out about the re-naming. This approach avoids URN vs URI
 kinds of distinction, scores 2 out of 3 for use of intelligible words,
 and is equally appropriate to classic browser/HTML, SemWeb and other
 technical uses. What's not to like? The Web is all about links, and
 urls are how we make them...
 
 cheers,
 
 Dan
 





Re: Comments on Data 3.0 manifesto

2010-04-17 Thread Jiří Procházka
So essentially, all this is a cover-up maneuver to sell RDF to people,
masked as something else, more familiar?

If so, I understand why you feel this is necessary; after all, the goal
is not to sell the customer what he asked for, but what he really wanted
but didn't realize or couldn't fully express (this time the customer
being tech folk).
Anyway, I'd rather use and try to market RDF as it is; maybe it's a bit
too fast for some, but I guess I've left too few people in utter
confusion yet to try such different ways :)

But before proceeding with your plan to fix RDF + Linked Data marketing,
I ask you to also consider what was done right in marketing RDF, besides
what wasn't.

For example, RDF has a clear name (Data 3.0? not a very good name IMHO),
the core model is very simple, and it has been explained very well many
times.
On the other hand, your manifesto sounds a bit too complex, more like a
spec than a manifesto. For the effect I think you are aiming for, you
need something very simple and striking...
Not to mention this is the first time I am hearing about the EAV model;
we all come from different backgrounds, so I fear this terminology won't
have much of an impact, though it is still good for introducing it to
as-yet-distant communities ;)

For me, the greatest value of RDF and Linked Data lies in the semantics -
the ontologies (RDFS/OWL) - which, as far as I understand it, the EAV
model doesn't touch at all; in my eyes that makes it only a bit better
than tabular data models ("rectangular", as someone nicely coined it
some time ago somewhere).
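
To make the comparison concrete, a sketch (ex: is a hypothetical
namespace): an EAV row and an RDF triple line up one-to-one, but only
RDF lets the attribute itself carry schema-level semantics:

  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix ex:   <http://example.org/> .

  # The EAV row (entity, attribute, value) = (item42, colour, red),
  # expressed as an RDF triple:
  ex:item42 ex:colour ex:red .

  # What plain EAV does not offer - semantics attached to the attribute:
  ex:colour rdfs:subPropertyOf ex:visualProperty ;
      rdfs:domain ex:Product .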

Overall it seems to me like building a sand island in the middle of a
wide river to ease construction of bridges across it... I guess you have
tried building a bridge without the island a few times and it collapsed
every time, so I understand why you are building the island. But maybe
I've got better steel and my bridges would last... maybe...
On one hand I am glad we try these various ways, and on the other I keep
asking myself whether the gain outweighs the price of fragmentation...

Best,
Jiri Prochazka

On 04/17/2010 10:51 PM, Kingsley Idehen wrote:
 John Erickson wrote:
 Hi Kingsley!

 Reading between the lines, I think I grok where you are trying to go
 with your manifesto. For it to be an effective, stand-alone document
 I think a few pieces are needed:

 1. What is your GOAL? It should be clearly stated, something like: "to
 promote best-practices for standards-compliant access to structured
 data object (or entity) descriptors by getting data architects to do X
 instead of Y", etc.
   
 
 Okay, I'll see what I can do.
 
 This document is really a continuation of a document that's actually
 missing from the Web, sadly.
 
 A long time ago (start of Web 2.0), there was a Data 2.0 manifesto by
 Alex James (now at Microsoft), so in classic two-fer fashion I've opted
 to kill two birds with a single stone:
 
 1. Linked Data incomprehension (Technical and Political)
 
 2. Data 2.0 manifesto upgrade and update.
 
 2. What is your MOTIVATION? I think this is implicit in your current
 text --- your argument seems to be that TBL's Four Principles are
 not enough --- but you need to make your motivations explicit and
 JUSTIFY them. If TBL's principles are too nebulous, explain concisely
 why and what the implications are. Keep in mind that they seem to be
 good enough for many practitioners today. ;)
   
 My motivation is simply this: Get RDF out of the way!
 The RDF incomprehension cloud is only second to what's heading across
 Northern Europe from Iceland, re. obscuring a myriad of routes to Linked
 Data comprehension.
 
 How can we spend 12+ years on the basic issue of EAV + de-referencable
 identifiers? Compounded by poor monikers such as "Information Resource"
 and "Non-Information Resource". We have Data Objects (Entities, Data
 Items etc.) and their associated Descriptor Documents (Representation
 Carriers or Senses); it's always been so!
 
 Note, "RDF the Data Model" doesn't exist in the minds of the broader
 Web audience (I am not sending an inbound meme to the Semantic Web
 Community; my meme is being beamed to a wider audience that's taking way
 too long to grok the essence of the Linked Data matter).
 
 I (and many others) are utterly fed up with trying to accentuate the
 fact that RDF is based on a Graph Data Model. The initial "RDF/XML is
 RDF" conflation has dealt a fatal blow to RDF re. broad-audience
 communications.
 
 EAV has been with us forever, people already use applications that are
 based on this model, across all major operating systems. Why not
 triangulate from this position (top down) instead of bottom up (which
 ultimately reeks of NIH rather than a Cool Tweak)?
 
 3. Be SPECIFIC about what practitioners must do moving forward. I
 think you've made a good start on this, to the extent that you have
 lots of SHOULDS. I would argue that more specificity of a different
 kind is needed; if data architects SHOULD be following more abstract
 EAV conceptualizations, what exactly should they do in practice?
   
 
 

Re: [Patterns] Materialize Inferences (was Re: Triple materialization at publisher level)

2010-04-10 Thread Jiří Procházka
SWRL has a much nicer RDF representation than RIF, so that might also be
an alternative, though the expressiveness may vary.
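
Whichever rule language is used to describe them, the materialized
inferences themselves are just triples. A minimal sketch in Turtle of
what a publisher might materialize (the ex: namespace is hypothetical):

  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  @prefix ex:   <http://example.org/> .

  # Published schema and data:
  ex:parentOf rdfs:subPropertyOf ex:relatedTo .
  ex:alice ex:parentOf ex:bob .
  ex:alice foaf:mbox <mailto:alice@example.org> .
  ex:a2 foaf:mbox <mailto:alice@example.org> .

  # Inferences the publisher could materialize alongside the data:
  ex:alice ex:relatedTo ex:bob .   # from the rdfs:subPropertyOf axiom
  ex:alice owl:sameAs ex:a2 .      # from foaf:mbox being an owl:InverseFunctionalProperty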

Best,
Jiri Prochazka

On 04/08/2010 10:16 AM, Ivan Herman wrote:
 
 On Apr 7, 2010, at 14:45 , Dan Brickley wrote:
 
 A guideline might be:

 As typing information becomes broader and more inclusive, it also
 becomes less informative: to know that something is a Thing is rarely
 useful. It is difficult to say whether a class is at a useful level of
 specificity without taking into account other datasets, tools and
 services that use it; however, an intuitive grasp of mid-level
 concepts often provides useful guidance. In addition, Linked Data apps
 have a particular concern for cross-referencing information about
 specific things, so it is often useful to include inferred
 identifiers (owl:sameAs etc.) based on analysis of properties
 (owl:FunctionalProperty, owl:InverseFunctionalProperty) etc.

 Ok, that's not very friendly text but hope it might be useful.
 Basically "rdf:type owl:Thing" is boring, but "owl:sameAs x:anotherID"
 is very useful...

 
 I am a little bit concerned by the open-endedness of this. As an information 
 consumer I would like to have at least some information or hints as to which 
 inferences are materialized and which are not.
 
 As a thought experiment: what about providing a set of RIF (Core) rules that 
 describe which inferences are materialized? It is possible to express RDFS as 
 well as OWL-RL via RIF rules and, what is even more important in this 
 context, any subsets thereof. Human clients may look at those rules and, with 
 little training, may understand what is happening for the simpler cases; 
 after all, many publishers will decide to use 2-3 rules only (eg, subproperty 
 and subclass inferences). Machine clients may even choose to instantiate the 
 inferences themselves with some local rule engine if their CPU/bandwidth 
 ratio makes that more attractive.
 
 I know, the current RIF syntax is not all that beautiful (but I would hope 
 that alternative syntaxes will come to the fore, mainly if the demand is 
 there) and I am not sure whether rule engines, bound to RDF environments 
 (like Jena Rules or Fuxi) already implement RIF Core (although I believe/hope 
 they would). But that seems to be a possible way to go nevertheless...
 
 Ivan
 
 (To avoid misunderstanding: with his W3C Position's hat down:-)
 
 
 cheers,

 Dan

 
 
 
 Ivan Herman, W3C Semantic Web Activity Lead
 Home: http://www.w3.org/People/Ivan/
 mobile: +31-641044153
 PGP Key: http://www.ivan-herman.net/pgpkey.html
 FOAF: http://www.ivan-herman.net/foaf.rdf
 
 
 
 
 





Re: ontology mapping etiquette (was What is the class of a Named Graph?)

2010-02-23 Thread Jiří Procházka
On 02/22/2010 09:44 PM, Richard Cyganiak wrote:
 On 22 Feb 2010, at 19:36, Jiří Procházka wrote:
 I wonder if we as a group of people
 interested in the Semantic Web could come up with etiquette for
 ontology mapping.

 Interesting topic! My €0.02: If the other vocabulary is likely to be

 - more stable
 - more mature
 - more likely to be widely used
 - more likely to be around for a longer time

 then you should map your terms to it. If not, don't.

 So IMO the rdfg vocabulary should map to the SPARQL Service Description
 vocabulary as soon as it becomes REC, but SPARQL-SD should NOT map to
 rdfg.

 Hi Richard, that also seems reasonable to me at first, but thinking
 about it more thoroughly, there is value in both ontologies mapping
 to each other.
 
 Yes, if both sides agree, then two-way mappings are great. But this is
 only realistic if both vocabularies rate about equally on the criteria
 above. As an extreme example, it would be totally unrealistic to expect
 the RDFS vocabulary to link back to every vocabulary that has some sort
 of label/name property (all of which should be subproperties of
 rdfs:label).

Yes, I had in mind especially equivalentClass/Property relations and the
like, where it doesn't scale well, not subproperties.
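
For instance (ex1:/ex2: are hypothetical namespaces; the foaf:name
mapping is, as far as I know, one that FOAF itself publishes):

  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  @prefix ex1:  <http://example.org/vocab1#> .
  @prefix ex2:  <http://example.org/vocab2#> .

  # One-directional mapping, published only by the smaller vocabulary:
  foaf:name rdfs:subPropertyOf rdfs:label .

  # Reciprocal equivalence, which only works if both sides agree and
  # each publishes its half:
  ex1:Document owl:equivalentClass ex2:Document .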

 snip
 Certain mapping statements make sense from the PoV of one ontology, but
 not the other.
 
 I don't know what you mean. An example might help. But anyway, if you
 map to my ontology, but from my POV that mapping doesn't make sense,
 then I'm certainly not going to map back to yours.

I mean when the philosophies of the creators of the ontologies aren't
mutually compatible. I can't come up with an example right now, but
let's just say that someday we will have religious ontologies...

 snip
 If we allow ourselves to go a bit further, I thought it would be great
 if there were some community-developed service which would, in an
 automated fashion, give advice for improvement and rate user-submitted
 (better yet, WoD-collected) ontologies, judging their quality of design
 - most importantly re-usability, which basically means how well they are
 aligned with other, similar ontologies. This would probably be very
 difficult, at least because of varying opinions on this... I guess the
 database community has something to say about that.
 
 I think that's a different issue. When it comes to rating the “quality”
 of a vocabulary, then the amount of mappings to other vocabularies is a
 very minor factor. First, because other things (especially amount of
 uptake and strength of the surrounding community) are much more
 important. Second, because adding the mappings is so easy. No vocabulary
 will succeed or fail because of its inclusion or lack of mappings.

Strength of community and amount of uptake matter really a lot, but the
next thing you are interested in is how compatible an ontology is with
the rest of your knowledge - how good its mappings are and whether it has
mappings to its competitors, because they may have mappings to other
ontologies yours does not (and how good those are).

 Nevertheless, I agree that we need services that support us in finding
 high-quality vocabularies, and that help drive the improvement of
 existing ones. But it's a complex subject, there are many existing
 efforts (Watson, Talis Schema Cache, Falcons Concept Search,
 ontologydesignpatterns.org, and I probably missed a few), and to me it's
 not obvious what is the right approach.
 
 Perhaps we don't need better ways of finding and creating vocabularies,
 but better ways of finding and creating communities around a domain that
 can then jointly agree on a vocabulary.

Great point! I would love to see some development in this area...

 All the best,
 Richard
 
 

 There are more things to talk about regarding this, but this is what I
 have in mind so far.

 Best,
 Jiri

 Best,
 Richard



 Best,
 Jiri


 Hope that helps.

 thanks,
 .greg


 [1] http://www.w3.org/TR/sparql11-service-description/#id41794




 





ontology mapping etiquette (was What is the class of a Named Graph?)

2010-02-22 Thread Jiří Procházka
On 02/22/2010 01:53 PM, Richard Cyganiak wrote:
 Jiri,
 
 On 22 Feb 2010, at 10:51, Jiří Procházka wrote:
 I wonder if we as a group of people
 interested in the Semantic Web could come up with etiquette for
 ontology mapping.
 
 Interesting topic! My €0.02: If the other vocabulary is likely to be
 
 - more stable
 - more mature
 - more likely to be widely used
 - more likely to be around for a longer time
 
 then you should map your terms to it. If not, don't.
 
 So IMO the rdfg vocabulary should map to the SPARQL Service Description
 vocabulary as soon as it becomes REC, but SPARQL-SD should NOT map to rdfg.

Hi Richard, that also seems reasonable to me at first, but thinking
about it more thoroughly, there is value in both ontologies mapping to
each other. Danbri recently touched on this on IRC in relation to
reciprocal WebID owl:sameAs relations. What one source says in RDF is
what it considers true, which in our case would also mean the mapping
makes sense from the point of view of both ontologies if it is
reciprocal. So I would advocate reciprocal mappings, if both sides can
agree on a common mapping. This brings up another issue...

Certain mapping statements make sense from the PoV of one ontology, but
not the other. Should they be dropped, keeping just the
both-sides-approved mapping? I'm in favour of publishing each statement
just with the ontology for which it makes sense. Dropping them would
encourage one big federated web ontology, which is a nice dream but not
what I believe is suitable for the real world and the web (thanks to its
relativistic nature).

If we allow ourselves to go a bit further, I thought it would be great
if there were some community-developed service which would, in an
automated fashion, give advice for improvement and rate user-submitted
(better yet, WoD-collected) ontologies, judging their quality of design
- most importantly re-usability, which basically means how well they are
aligned with other, similar ontologies. This would probably be very
difficult, at least because of varying opinions on this... I guess the
database community has something to say about that.

There are more things to talk about regarding this, but this is what I
have in mind so far.

Best,
Jiri

 Best,
 Richard
 
 

 Best,
 Jiri


 Hope that helps.

 thanks,
 .greg


 [1] http://www.w3.org/TR/sparql11-service-description/#id41794


 





Re: What is the class of a Named Graph?

2010-02-21 Thread Jiří Procházka
What you pointed at is a property sd:namedGraph. The upcoming SPARQL
standard doesn't define any class for named graphs.
I support using: http://www.w3.org/2004/03/trix/rdfg-1/Graph
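
A minimal sketch of what I mean (the graph URI is hypothetical):

  @prefix rdfg: <http://www.w3.org/2004/03/trix/rdfg-1/> .

  <http://example.org/graphs/g1> a rdfg:Graph .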

Best,
Jiri

On 02/21/2010 10:40 AM, Michael Hausenblas wrote:
 
 Nathan,
 
 Any further input before I start using rdfg-1:Graph when describing graphs?
 
 I'd suggest you forget about both references and go with the upcoming
 SPARQL standard [1].
 
 Cheers,
   Michael
 
 [1] 
 http://www.w3.org/TR/2010/WD-sparql11-service-description-20100126/#id41744
 





Re: Ontology for semantic web technologies field

2010-01-13 Thread Jiří Procházka
If he is looking for ontologies about ontologies, there is also the
lightweight VANN ontology:

http://vocab.org/vann/.html
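
For example, a minimal sketch of the kind of annotations VANN provides
(the vocabulary URI is hypothetical):

  @prefix vann: <http://purl.org/vocab/vann/> .

  <http://example.org/myvocab#>
      vann:preferredNamespacePrefix "mv" ;
      vann:preferredNamespaceUri "http://example.org/myvocab#" .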

Best,
Jiri Prochazka

On 01/13/2010 02:11 PM, Aldo Gangemi wrote:
 Uh, this guy seems to be looking for ontologies that are *about*
 semantic web technologies, not for a repository of ontologies :)
 To my knowledge, the following models can help you:
 
 OMV [1] (Ontology Metadata Vocabulary) by UPM describes ontology
 metadata and is used in services like Oyster for the annotation and
 management of ontology repositories
 
 C-ODO Light [2] has been developed by CNR STLab for the NeOn project; it
 describes ontology-design-related entities (ontologies, tools, design
 activities, etc.): it is a modular, pattern-based ontology network (see
 also [4] for ontology design patterns)
 
 Sweet Tools [3] by Mike Bergman has a minimal vocabulary but has basic
 data about hundreds of SW-related tools
 
 [1] http://omv.ontoware.org/2005/05/ontology
 [2] http://www.ontologydesignpatterns.org/cpont/codo/codolight.owl
 [3] http://www.mkbergman.com/new-version-sweet-tools-sem-web/
 [4] http://www.ontologydesignpatterns.org
 
 Ciao
 Aldo
 
 On 13 Jan 2010, at 10:23, Michael Hausenblas wrote:
 

 Is there an ontology for the general field of semantic web
 technologies itself
 or a repository of ontologies?

 A repository not really, but the best overview I'm aware of is [1].

 Cheers,
  Michael

 [1] http://esw.w3.org/topic/VocabularyMarket

 -- 
 Dr. Michael Hausenblas
 LiDRC - Linked Data Research Centre
 DERI - Digital Enterprise Research Institute
 NUIG - National University of Ireland, Galway
 Ireland, Europe
 Tel. +353 91 495730
 http://linkeddata.deri.ie/
 http://sw-app.org/about.html



 From: ProjectParadigm-ICT-Program metadataport...@yahoo.com
 Date: Tue, 12 Jan 2010 19:40:19 -
 To: Semantic Web community semantic-...@w3.org, Linked Data community
 public-lod@w3.org
 Subject: Ontology for semantic web technologies field
 Resent-From: Linked Data community public-lod@w3.org
 Resent-Date: Tue, 12 Jan 2010 19:40:56 +

 Dear listers,

 Is there an ontology for the general field of semantic web
 technologies itself
 or a repository of ontologies?

 Milton Ponson
 GSM: +297 568 5908
 Rainbow Warriors Core Foundation
 PO Box 1154, Oranjestad
 Aruba, Dutch Caribbean
 www.rainbowwarriors.net
 Project Paradigm: A structured approach to bringing the tools for
 sustainable
 development to all stakeholders worldwide
 www.projectparadigm.info
 EarthForge: Creating ICT tools for NGOs worldwide for Project Paradigm
 www.earthforge.info, www.developmentforge.info
 MetaPortal: providing online access to web sites and repositories of
 data and
 information for sustainable development
 www.metaportal.info

 This email and any files transmitted with it are confidential and
 intended
 solely for the use of the individual or entity to whom they are
 addressed. If
 you have received this email in error please notify the system
 manager. This
 message contains confidential information and is intended only for the
 individual named. If you are not the named addressee you should not
 disseminate, distribute or copy this e-mail.



 
 
 
 _
 
 Aldo Gangemi
 
 Senior Researcher
 Semantic Technology Lab (STLab)
 Institute for Cognitive Science and Technology,
 National Research Council (ISTC-CNR)
 Via Nomentana 56, 00161, Roma, Italy
 Tel: +390644161535
 Fax: +390644161513
 aldo.gang...@cnr.it
 http://www.stlab.istc.cnr.it
 http://www.istc.cnr.it/createhtml.php?nbr=71
 skype aldogangemi
 
 


