Re: SPARQL results in RDF

2013-09-21 Thread William Waites
Hi Hugh,

You can get results in RDF if you use CONSTRUCT -- which is basically
a special case of SELECT that returns 3-tuples and uses set semantics
(does not allow duplicates), but I imagine that you are aware of this.

Returning RDF for SELECT, where the result set consists of n-tuples
with n != 3, is difficult because there is no direct way to represent
such tuples.

Also problematic is that there *is* a concept of order in SPARQL query
results, while there is none in RDF.

The same goes for the use of bag semantics, which allows duplicates --
that also does not really work with RDF.

These, again, could be kludged with reification, but that is not very
elegant. 

So most SELECT results are not directly representable in RDF.
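For completeness, the working group's own test suite kludges exactly
this with a result-set vocabulary. Roughly (property names from
memory, so treat this as a sketch rather than gospel):

```turtle
@prefix rs: <http://www.w3.org/2001/sw/DataAccess/tests/result-set#> .

[] a rs:ResultSet ;
   rs:resultVariable "name", "age" ;
   rs:solution [
       rs:index 1 ;                            # order, made explicit
       rs:binding [ rs:variable "name" ; rs:value "Alice" ] ;
       rs:binding [ rs:variable "age"  ; rs:value 42 ]
   ] .
```

One blank node per row and one per binding: workable, but you can see
why nobody calls it elegant.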

Cheers,
-w



Linked Prolog

2013-06-20 Thread William Waites
On Thu, 20 Jun 2013 01:28:54 +, エリクソン トーレ t-eriks...@so.taisho.co.jp said:

 ex:distance ex:earth ex:moon 381550 25150 u:km.

 (Ab)using RDF I was able to (barely) document my semantics
 directly in turtle.  Where is the semantics and syntax of your
 example described? Your data might be linked, but as a
 prospective consumer of it I'm feeling a bit lost :-)

I just made it up. But not out of thin air. It's basically a prolog
assertion with a turtle-esque surface syntax -- that's why the
predicate comes first, so you can have arbitrary arity. The
definitions of ex:distance (and ex:moon, ex:earth, u:km) could be
obtained by dereferencing those URIs in the same way it's done with
RDF.
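Purely hypothetically (none of this is actually served anywhere),
dereferencing ex:distance might yield a description in the same
invented notation, say:

```turtle
# hypothetical response from dereferencing ex:distance
ex:arity ex:distance 5.
ex:argument ex:distance 1 ex:Body.        # earth
ex:argument ex:distance 2 ex:Body.        # moon
ex:argument ex:distance 3 xsd:decimal.    # the distance
ex:argument ex:distance 4 xsd:decimal.    # the +/- uncertainty
ex:argument ex:distance 5 ex:Unit.        # km
```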

In fact it takes the two best ideas from RDF -- URIs as identifiers
for real-world things, and the mechanism of dereferencing these URIs
to get more information. Pace assertions about a normative meaning of
Linked Data from the RDF WG (of which I am a member), I think these
two ideas are the essence of Linked Data.

I'm not seriously advocating this right now, it's just an example or
thought experiment to answer your question and there's too much sunk
investment in RDF for such a radical change. In fact if we were making
radical changes, thinking about lambda expressions might be better
than doing it this way. Maybe for RDF 3.0...

-w



Re: Proof: Linked Data does not require RDF

2013-06-19 Thread William Waites
On Tue, 18 Jun 2013 23:32:42 +, エリクソン トーレ t-eriks...@so.taisho.co.jp said:

 I would be interested in seeing some linked data that is
 incompatible with RDF while still adhering to rules like using
 global identifiers and typed links.


@prefix ex: <http://example.org/> .
@prefix u: <http://example.org/units> .

ex:distance ex:earth ex:moon 381550 25150 u:km.

This relation has a typed link (ex:distance) between two
non-informational resources (ex:earth, ex:moon). It has a distance
that has units as well as a datatype, and a +/- uncertainty thrown in
for good measure. I could even imagine the ex:distance predicate to be
self-describing in the usual way, defining its arity and the meaning
and type of its arguments.

I think this can quite sensibly be called Linked Data and whilst with
sufficient contortions (reification, abuse of datatypes, perhaps
anonymous or parametrised predicates) it can be shoehorned into RDF,
it really doesn't happen naturally or obviously enough that it could
be called compatible in my opinion.
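For illustration, the usual shoehorning would be something like the
n-ary relation pattern, with an invented intermediate node (all the
property names here are made up):

```turtle
ex:measurement1 a ex:DistanceMeasurement ;
    ex:from ex:earth ;
    ex:to ex:moon ;
    ex:value 381550 ;
    ex:uncertainty 25150 ;
    ex:unit u:km .
```

Six triples and a freshly minted node to say what the one-line
relation says directly.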

Happy hacking,
-w

 



Re: Content negotiation for Turtle files

2013-02-06 Thread William Waites
On Wed, 06 Feb 2013 11:45:10 +, Richard Light rich...@light.demon.co.uk 
said:

 In a web development context, JSON would probably come second
 for me as a practical proposition, in that it ties in nicely
 with widely-supported javascript utilities.

If it were up to me, XML with all the pointy brackets that make my
eyes bleed would be deprecated everywhere. Most if not all modern
programming languages have good support for JSON, web browsers support
it natively as well, and it's much easier to work with since it mostly
maps directly onto built-in datatypes.

 To me, Turtle is symptomatic of a world in which people are
 still writing far too many Linked Data examples and resources by
 hand, and want something that is easier to hand-write than
 RDF/XML.  I don't really see how that fits in with the promotion
 of the idea of machine-processible web-based data.

Kind of agree. Turtle is a relic of trying to make a machine-readable,
quasi-prose representation of data, suitable for both machines and
people. But it's not general enough -- you can only use it to write
RDF, which means you need specialised tools. That is saddening because
(especially with some of the N3 enhancements) it's quite an elegant
approach.

Cheers,
-w




WebID vs. JSON (Was: Re: Think before you write Semantic Web crawlers)

2011-06-22 Thread William Waites
What does WebID have to do with JSON? They're somehow representative
of two competing trends.

The RDF/JSON, JSON-LD, etc. work is supposed to be about making it
easier to work with RDF for your average programmer, to remove the
need for complex parsers, etc. and generally to lower the barriers.

The WebID arrangement is about raising barriers. Not the same kind of
barriers: certainly the intent isn't to make programmers' lives more
difficult, but rather to provide a good way to do distributed
authentication without falling into the traps of PKI and such.

While I like WebID, and I think it is very elegant, the fact is that I
can use just about any HTTP client to retrieve a document, whereas
getting RDF-processing clients, agents, whatever, to do it will
require quite a lot of work [1]. This is one reason why, for example,
4store's arrangement of /sparql/ for read operations and /data/ and
/update/ for write operations is *so* much easier to work with than
Virtuoso's OAuth and WebID arrangement - I can just restrict access
using all of the normal tools like apache, nginx, squid, etc.
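For instance, a sketch of the sort of thing I mean with nginx (the
paths and port are illustrative, assuming the store listens on
localhost:8080):

```nginx
# reads are public
location /sparql/ {
    proxy_pass http://localhost:8080;
}

# writes need plain old HTTP basic auth
location ~ ^/(data|update)/ {
    auth_basic "rdf store";
    auth_basic_user_file /etc/nginx/htpasswd;
    proxy_pass http://localhost:8080;
}
```

No client changes at all for readers, and any HTTP client that can do
basic auth can write.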

So in the end we have, on the one hand, work being done to address the
perception that RDF is difficult to work with, and on the other a
suggestion to put authentication infrastructure widely in place which,
whilst obviously filling a need, stands to make working with the data
behind it more difficult.

How do we balance these two tendencies?

[1] examples of non-WebID aware clients: rapper / rasqal, python
rdflib, curl, the javascript engine in my web browser that doesn't
properly support client certificates, etc.
-- 
William Waites  mailto:w...@styx.org
http://river.styx.org/ww/  sip:w...@styx.org
F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45



Re: WebID vs. JSON (Was: Re: Think before you write Semantic Web crawlers)

2011-06-22 Thread William Waites
* [2011-06-22 16:00:49 +0100] Kingsley Idehen kide...@openlinksw.com wrote:

] explain to me how the convention you espouse enables me confine access 
] to a SPARQL endpoint for:
] 
] A person identified by URI based Name (WebID) that a member of a 
] foaf:Group (which also has its own WebID).

This is not a use case I encounter much. Usually I have some
application code that needs write access to the store and some public
code (maybe javascript in a browser, maybe some program run by a third
party) that needs read access.

If the answer is to teach my application code about WebID, it's going
to be a hard sell because really I want to be working on other things
than protocol plumbing.

If you then go further and say that *all* access to the endpoint needs
to use WebID because of resource-management issues, then every client
now needs to do a bunch of things that end with shaving a yak before
they can even start on working on whatever they were meant to be
working on.

On the other hand, arranging things so that access control can be done
by existing tools without burdening the clients is a lot easier, if
less general. And easier is what we want working with RDF to be.

Cheers,
-w




Re: Squaring the HTTP-range-14 circle [was Re: Schema.org in RDF ...]

2011-06-15 Thread William Waites
* [2011-06-14 08:55:09 -0700] Pat Hayes pha...@ihmc.us wrote:

] Well, you have got me confused. Are you saying here that it does
] in fact make sense to say that a description of the eiffel tower
] is 356M tall? 

I'm just saying that things like this will be published -- because the
publisher is confused, or mistaken, or doesn't think making the
distinction is important or convenient -- and consumers of the data
have to deal with it.

We should encourage publishers to do a better job, but some of them
will balk, and sometimes -- as with the schema.org effort that started
this thread -- big, important publishers with a lot of influence will
balk. If we're lucky we can convince them to fix it; otherwise writers
of software that consumes the data and tries to reason with it have to
work out a way to be robust in the face of this kind of ambiguity.

That's all.

-w



Re: Schema.org in RDF ...

2011-06-09 Thread William Waites
* [2011-06-07 09:22:01 +0100] Michael Hausenblas michael.hausenb...@deri.org 
wrote:

] Something I don't understand. If I read well all savvy discussions  
] so far, publishers behind http://schema.org URIs are unlikely to  
] ever provide any RDF description,
] 
] What makes you so sure about that not one day in the (near?) future  
] the Schema.org URIs will serve RDF or JSON, FWIW, additionally to  
] HTML? ;)

I suspect the prevailing view within Google is that autoneg is not
used in the real world. For example,

 https://groups.google.com/group/golang-nuts/msg/b882b153a3acd58e

(Brad was the creator of LiveJournal, now at Google, and I don't
think his view expressed there is uncommon).

So perhaps not the *near* future. Some other arrangement that does
not use the Accept header (and that seems to mean different URIs
then) is probably more likely, but this is just a guess.

Cheers,
-w



implied datasets

2011-05-23 Thread William Waites
This is the RDF version of the question I just sent to the CKAN list
[1]. It is somewhat of a policy question, and I believe that in RDF
terms the open world means the answer is basically: yes, you can say
what you want.

Consider the diagram here,

  http://semantic.ckan.net/group/?group=http://ckan.net/group/lld

this is interconnections between library datasets. You'll notice there
is a partition. This partition is not really there.

Here's why. In library world, perhaps more than elsewhere, it is
common to do things like this,

<http://example.org/issn/1234-5678> a bibo:Journal;
    # blah blah blah some descriptions;
    owl:sameAs <urn:issn:1234-5678>.

This is because there are standard identifiers for lots of the things
found in libraries, and they even have a urn: namespace. So when
publishing this data it is a lot easier to use these than to go out
and use something like Silk to try to find links. The links are
already implied by the identifiers we have in hand.

So given two such datasets, they are indeed connected in the way we
think of RDF datasets as being connected. Not necessarily with
semantics as strict as owl:sameAs -- we would probably not choose to
actually materialise its productions here, especially since the
entities might be modelled in different, incompatible ways, and
owl:sameAs is really not the right predicate to be using -- but at
least connected with semantics along the lines of rdfs:seeAlso. The
point is, the two datasets are transitively connected.

But because we have no extant dataset that contains all the ISSNs,
particularly all ISSNs where the identifier is expressed as a urn:
URI, we have nothing to put in our voiD linkset -- which is how the
relationships between these datasets are represented at a high
level. So we have an apparent partition.

What I propose to do here, is invent an implied dataset, the one that
contains in principle the entire list of ISSNs. Something like,

<urn:issn:0000-0000> a rdf:Resource.
<urn:issn:0000-0001> a rdf:Resource.
...

but which in principle should contain an X a rdf:Resource statement
for everything in the valid lexical space of urn:issn, which may be
(countably) infinite for all I know.

Then for each dataset that I have that uses the links to this space, I
count them up and make a linkset pointing at this imaginary dataset.

Obviously the same strategy applies anywhere there exist standard
identifiers that are not HTTP URIs.
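In voiD terms the linkset would look roughly like this (the name of
the implied dataset is of course invented):

```turtle
@prefix void: <http://rdfs.org/ns/void#> .

:lib1 a void:Dataset ;
    void:subset [
        a void:Linkset ;
        void:target :lib1, :impliedISSNs ;   # the imaginary dataset
        void:linkPredicate owl:sameAs ;
        void:triples 12345                   # however many links we counted
    ] .
```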

Does this make sense?

Can we sensibly talk about and even assert the existence of a dataset
of infinite size? (whatever existence means).

Is this an abuse of DCat/voiD?

Is this class of datasets a subset of sameAs.org (assuming sameAs.org
to be complete in principle)?

Cheers,
-w

[1] http://lists.okfn.org/pipermail/ckan-discuss/2011-May/001269.html



Re: implied datasets

2011-05-23 Thread William Waites
* [2011-05-23 11:34:56 -0400] glenn mcdonald gl...@furia.com wrote:

] It seems to me that this is another demonstration of confusion that wouldn't
] happen if we all understood RDF IDs to be pure identifiers that belong to
] the graph representation of a dataset and nothing else. ISSN numbers are not
] graph-node IDs, they are real-world conceptual identifiers like social
] security numbers or SKUs or country codes. Many different data-structure
] might reference them in very different ways, so it should be fairly clear
] that they cannot uniquely identify anything but themselves, and thus they
] should themselves be represented in RDF as nodes. So the above should be
] more like:

Hi Glenn,

That may be so, but it misses the point. The point is that there is a
field, be it a URI or a literal, however modelled, that can be used to
join two datasets. This join field is hidden, in that there exists no
(known) dataset containing all the possible values it can take on.

So when you are trying to describe datasets you have a situation where
you can say that DS1 and DS2 are indirectly linked, and you want to
make that link explicit so that you can put it on diagrams and such.

Saying,

  DS1 indirectlyLinkedTo DS2

is no good because then you get O(n^2) such statements, which makes
your visualisation messy; furthermore you don't know, without
examining them, that they have any common values on the join field,
so they may not actually be linked except in a degenerate sense.

Inventing a dataset that contains only the join field lets you say
something useful and coherent about the relationship between DS1 and
DS2.

There is nothing in this that requires the datasets themselves to be
RDF. See my other post to ckan-discuss on the same topic expressed in
terms of the relationships between CSV files.

Cheers,
-w



Re: implied datasets

2011-05-23 Thread William Waites
* [2011-05-23 14:46:47 +0100] Leigh Dodds leigh.do...@talis.com wrote:

] I'm not sure that the dataset is imaginary, but what you're doing
] seems eminently sensible to me. I've been working on a little project
] that I hope to release shortly that aims to facilitate this kind of
] linking, especially where those non-URI identifiers, or Literal Keys
] [1] are used to build patterned URIs.

The thing is, as with Hugh's suggestion, as a curator of datasets I
have little control or influence over how the dataset authors choose
to do this. I have noticed a common pattern though (urn:issn for
example) and encouraging patterns like this is helpful I think.

] It may be more natural to thing of these more as services though than
] datasets. i.e. a service that accepts some keys as input and returns a
] set of assertions. In this case the assertions would be links to other
] datasets.

This is a bit different. I was thinking of an implied dataset that 
would have no links outwards at all. 

] Subsets if they only asserted sameAs links, but I think you're
] suggesting that this may be too strict. I think there's potentially a
] whole set of related predicate based services [2] that provide
] useful indexes of existing datasets, or expose additional annotations
] of extra sources.

So this would be a separation of edge-labelled graphs into a bunch
of perhaps more manageable basic (V,E) graphs. An interesting way
of chopping things up.

The reason I think sameAs is too strict, aside from people putting
sameAs when they really mean similarTo, can be shown by another
library example. Broadly there seem to be two strategies for
representing things like books, the flat BIBO style and the more
elaborate FRBR/WEMI style. So if I have two datasets, one in each,
I might have something like,

ds1:flc a bibo:Book;
    dc:title "The Feynman Lectures on Computation";
    dc:creator [ foaf:name "Richard Feynman" ];
    dc:language "eng";
    owl:sameAs <urn:isbn:0738202967>.

ds2:flc a frbr:Manifestation;
    frbr:manifestationOf [
        a frbr:Expression;
        dc:language "en";
        frbr:expressionOf [
            a frbr:Work;
            dc:title "The Feynman Lectures on Computation";
            dc:creator [ foaf:name "Richard Feynman" ]
        ]
    ];
    owl:sameAs <urn:isbn:0738202967>.

Both authors have done something prima facie reasonable with the
sameAs, but if you actually run it transitively you get into trouble.
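Spelled out: since owl:sameAs is symmetric and transitive, a reasoner
is entitled to conclude

```turtle
ds1:flc owl:sameAs ds2:flc .       # both are sameAs <urn:isbn:0738202967>
ds1:flc a frbr:Manifestation .     # so the flat record is a Manifestation
ds2:flc a bibo:Book .              # and the Manifestation is a Book
```

and now every property of either resource applies to both, which is
almost certainly not what either author intended.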

This also goes to what Glenn was saying. These datasets are obviously
related in a meaningful way, there may well be useful ways for someone
who studies them to draw links between them but it isn't as simple as
saying they both have things of the same type. In fact what type
assertions are appropriate to clarify the relationship between these
datasets is the type of analysis that I would want to facilitate, not
try to do up front. What I can say is they both have references (that
may or may not be strictly believable) to this funny
non-dereferenceable URI (or equivalently, string literal of a certain
kind).

Cheers,
-w




Re: See UK

2011-05-21 Thread William Waites
* [2011-05-21 14:59:26 +0300] Denny Vrandecic denny.vrande...@kit.edu wrote:

] Very impressive!

Yes indeed!

] How much of rewriting would be needed to use data from other countries?

Sadly the data seems to stop at Hadrian's Wall. Perhaps this
should properly be "See England (and possibly Wales)" :(

Cheers,
-w



ANN: Semantic CKAN - Revisited

2011-04-17 Thread William Waites
A new version of RDF infrastructure for CKAN has now been deployed at

http://semantic.ckan.net/

this represents a complete reimplementation and re-thinking of how to
manage a distributed set of metadata catalogues.

Features:

* Aggregation of all known CKAN instances (if there are any
  missing, please let me know and I'll add them)
* Search and filtering across all data sources
* Dataset metadata represented using DCat and, if available, voiD
* Retrieval of RDF representations using content-type negotiation.
* Where voiD information is known, navigable visualisations of
  the datasets and their neighbours, likewise for curated groups.
* CKAN-compatible API for using existing CKAN tools to inspect the
  aggregated data from one place

Some interesting pages to look at:

  DBpedia
http://semantic.ckan.net/record/dcc6715c-bf94-4a89-bbf3-35933da795a5.html
  Linked Library Data
http://semantic.ckan.net/group/?group=http://ckan.net/group/lld




Re: ANN: Semantic CKAN - Revisited

2011-04-17 Thread William Waites
* [2011-04-17 19:58:28 +0200] Pierre-Yves Vandenbussche 
py.vandenbuss...@gmail.com wrote:

] Nice work William,

Thank you Pierre-Yves.

] Do you plan to add navigation links beetween ckan.net and semantic.ckan.net?

From semantic.ckan.net there are links back to *.ckan.net on each
catalog record page. On ckan.net (and the others, once they are
upgraded) there are links to the various RDF representations, and
ckan.net will also content-type negotiate and redirect to
semantic.ckan.net. But there is no link to the HTML pages and
visualisations - I leave that to the discretion of the other CKAN
developers.

There are some corner cases where this is not true - these old CKAN
instances http://tinyurl.com/5skavxk don't give back a URI for the
dataset in their API calls and the result is that we end up with these
datasets being blank nodes. Hopefully they will be upgraded soon.

There is also no particular way to find out who is responsible for the
maintenance of a CKAN site and what version of the software it is
running. In practice it usually ends up being OKFN staff so sending a
message to the ckan-discuss list is likely to alert the right person,
but this is not a very scalable situation. There is a ticket to add an
API call to CKAN to address this.

Cheers,
-w



Re: LOD Cloud Cache Stats

2011-04-05 Thread William Waites
So I don't have answers to your questions, but do have some
observations about the results, particularly the counts of
distinct predicates.

The top one is rdf:type, which makes sense. Below that we have ones
used in reification. Who knew there was actually that much reified
data out there? I wonder where it comes from, and what happened to
the consensus that reification is not a good idea and should be
deprecated?

SELECT DISTINCT ?graph, COUNT(?s) AS ?count WHERE {
    GRAPH ?graph { ?s ?p <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> }
} ORDER BY DESC(?count) LIMIT 50

This query times out, but it would be interesting to know
the answer, who is the source of all of these reifications?

Next is rdfs:label -- ok, fine. After that, a sizeable chunk of data
has to do with rows and columns in CSV tables coming from data.gov.
How is a mechanical transliteration from CSV to RDF, without any
modelling, useful? It just makes the data a couple of orders of
magnitude bigger and a few orders of magnitude more cumbersome to
deal with. I mean, being able to refer to a specific spreadsheet cell
is useful, but how does actually materialising all of them do
anything but take up disk space and slow down queries?
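The pattern I mean looks roughly like this (URIs invented, but
representative):

```turtle
# one cell of one row of one CSV file
<http://example.gov/data/1017/entry/17> a <http://example.gov/data/1017/Entry> ;
    <http://example.gov/data/1017/column_3> "42" .
```

Two triples and three long URIs to carry a number that occupied two
bytes in the original CSV, multiplied by every cell of every table.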

Cheers,
-w



Re: Introducing Vocabularies of a Friend (VOAF)

2011-01-25 Thread William Waites
* [2011-01-25 11:21:45 -0500] Kingsley Idehen kide...@openlinksw.com wrote:

] Hmm. Is it the Name or Description that's important?
] 
] But what about discerning meaning from the VOAF graph?

Humans looking at documents and trying to understand a system do so
in a very different way from machines. While what you suggest might
be strictly true according to the way RDF and formal logic work, it
isn't the way humans work (otherwise the strong AI project of the
past half-century might have succeeded by now). So we should try to
arrange things in a way that is both consistent with what the
machines want and as easy as possible for humans to understand. That
Hugh, an expert in the domain, had trouble figuring it out because of
poetic references to well-known concepts suggests that there is some
room for improvement.

Cheers,
-w
-- 
William Waites  mailto:w...@styx.org
http://eris.okfn.org/ww/ sip:w...@styx.org
F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45



Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-20 Thread William Waites
* [2011-01-20 14:29:35 +] Nathan nat...@webr3.org wrote:

]   RDF Publishers MUST perform Case Normalization and Percent-Encoding 
] Normalization on all URIs prior to publishing. When using relative URIs 
] publishers SHOULD include a well defined base using a serialization 
] specific mechanism. Publishers are advised to perform additional 
] normalization steps as specified by URI (RFC 3986) where possible.
] 
]   RDF Consumers MAY normalize URIs they encounter and SHOULD perform 
] Case Normalization and Percent-Encoding Normalization.
] 
]   Two RDF URIs are equal if and only if they compare as equal, 
] character by character, as Unicode strings.
] 
] For many reasons it would be good to solve this at the publishing phase, 
] allow normalization at the consuming phase (can't be precluded as 
] intermediary components may normalize), and keep simple case sensitive 
] string comparison throughout the stack and specs (so implementations 
] remain simple and fast.)
] 
] Does anybody find the above disagreeable?


Sounds about right to me, but what about port numbers,
http://example.org/ vs http://example.org:80/?

-w




Re: URI Comparisons: RFC 2616 vs. RDF

2011-01-19 Thread William Waites
* [2011-01-19 11:11:20 -0500] Kingsley Idehen kide...@openlinksw.com wrote:

] On 1/19/11 10:59 AM, Nathan wrote:
] htTp://lists.W3.org/Archives/Public/public-lod/2011Jan/ - Personally 
] I'd hope that any statements made using these URIs (asserted by man or 
] machine) would remain valid regardless of the (incorrect?-)casing. 
]
] Okay for Data Source Address Ref. (URL), no good for Entity (Data Item 
] or Data Object) Name Ref., bar system specific handling via IFP property 
] or owl:sameAs :-)

FWIW I've just added a FuXi builtin for the curate tool [1]
that does URI comparisons using ll.uri [2] (deliberately
pushing the choice of place on the ladder into a library).
It is used like this:

@prefix curate: <http://eris.okfn.org/ww/2010/12/curate#>.

{ ?s1 ?p1 ?o1 .
  ?s2 ?p2 ?o2 .
  ?s1 curate:cmpURI ?s2 } =>
{ ?s1 = ?s2 }.

And results in statements like this:

<HTTP://example.org:80/> = <HTTP://example.org:80/>,
    <http://EXAMPLE.ORG/>,
    <http://example.org/> .

<http://EXAMPLE.ORG/> = <HTTP://example.org:80/>,
    <http://EXAMPLE.ORG/>,
    <http://example.org/> .

<http://example.org/> = <HTTP://example.org:80/>,
    <http://EXAMPLE.ORG/>,
    <http://example.org/> .

Cheers,
-w

[1] https://bitbucket.org/okfn/curate/src/1f6ba3c360c3/curate/builtins.py#cl-9
[2] http://www.livinglogic.de/Python/url/Howto.html



Re: Property for linking from a graph to HTTP connection meta-data?

2011-01-17 Thread William Waites
* [2011-01-17 16:39:27 +0100] Martin Hepp martin.h...@ebusiness-unibw.org 
wrote:

] Does anybody know of a standard property for linking a RDF graph to a  
] http:GetRequest, http:Connection, or http:Response instance? Maybe  
] rdfs:seeAlso (@TBL: ;- ))?

If you suppose that the name of the graph is the same as the request
URI (it will not always be, of course), you can link in the other
direction, from http:Request, using http:requestURI. I am not sure
that http:requestURI has a standard inverse though.
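So, sketching the direction that does exist (namespace and exact
terms as I remember them, so check before relying on this):

```turtle
@prefix http: <http://www.w3.org/2006/http#> .

[] a http:GetRequest ;
   http:requestURI "http://example.org/dataset" .

# the graph named <http://example.org/dataset> is then found by
# matching the graph name against http:requestURI; no inverse
# property is needed, though none is standardised either.
```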

Cheers,
-w



Re: Property for linking from a graph to HTTP connection meta-data?

2011-01-17 Thread William Waites

* [2011-01-17 23:09:01 +0100] Martin Hepp martin.h...@ebusiness-unibw.org 
wrote:

] # Link the graph to the HTTP header info from the data transformation
] foo:dataset rdfs:seeAlso foo:ResponseMetaData .

Actually this seems like a use case for OPMV. So I think you'd
do something like,

foo:dataset opmv:wasGeneratedBy [ a opmv:Process;
    opmv:used foo:ResponseMetaData;
    opmv:used <http://example.org/foo.xml>
].

This would have the side-effect of making your graph an 
opmv:Artifact but that actually makes sense.

Cheers,
-w



Re: Is it best practices to use a rdfs:seeAlso link to a potentially multimegabyte PDF?, existing predicate for linking to PDF?

2011-01-10 Thread William Waites
* [2011-01-10 08:55:59 +] Phil Archer phil.arc...@talis.com wrote:

] However... a property should not imply any content type AFAIAC. That's 
] the job of the HTTP Headers. If software de-references an rdfs:seeAlso 
] object and only expects RDF then it should have a suitable accept 
] header. if the server can't respond with that content type, there are 
] codes to handle that.

I disagree that we should rely on HTTP headers for this.
Consider local processing of a large multi-graph dataset.
These kinds of properties can act as hints to process one
graph or another without the need to dereference anything.
(I tend to think of a graph as equivalent to the document
obtained by dereferencing the graph's name.)

Slightly more esoteric are graphs made available over
ftp, finger, freenet, etc. Let's take advantage of HTTP
where appropriate but not mix up transport and
content unnecessarily.

Cheers,
-w
-- 
William Waitesmailto:w...@styx.org
http://eris.okfn.org/ww/ sip:w...@styx.org
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664



Re: Is vCard range restriction on org:siteAddress necessary?

2011-01-04 Thread William Waites
* [2011-01-04 11:49:43 +] Dave Reynolds dave.e.reyno...@gmail.com wrote:

] Is VCard that bad? It fits your example below just fine.

The only problem I see with the example is that we don't have counties
in Scotland, we have districts. In Quebec and Louisiana and other
historically Catholic places there are parishes. Is Scotland a state
in the American sense? Not really. You could use things like vc:county
and vc:state and just say that the naming is bad, I guess.
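E.g., with the vocabulary's more neutral terms you would probably end
up with something like this (property names from memory; treat as a
sketch):

```turtle
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .

[] a vcard:Address ;
    vcard:locality "Linlithgow" ;
    vcard:region "West Lothian" ;      # a district, not a county or a state
    vcard:country-name "Scotland" .
```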

Geonames tackles this problem in a language-neutral way by having
several levels of administrative areas but they also construct a
hierarchy which might be a little verbose for this use case.

Cheers,
-w



CKAN Curation Tool

2010-12-06 Thread William Waites
Hi all, 

I did some work over the past couple of days to try to imagine how a
package curation tool might work. This means a tool that looks at
packages on CKAN, applies some rules, and produces some output. The
output might be instructions to add a tag to a package, or it might be
to add a package to a group. This last is the main use case, really --
trying to answer the question: given a package and some rules about
group membership, does it qualify?

I tried to approach this in a general way, and what I arrived at was
actually quite easy to implement. On the other hand it is a
command-line tool, and writing rules, whilst straightforward enough,
requires some knowledge of inference rules. Ideas on how to make it
more user-friendly are more than welcome.

Here's a very brief summary of how it works. It first reads an RDF
description of a package and a set of rules. The set of rules can
include operators like "try to get this web page", or even "add this
tag to the package" or "add this package to a group". It compiles the
ruleset and then feeds the description through, triggering these
operations. Any inferred statements and relationships are printed out
and (optionally) any desired changes are saved back to CKAN through
the API.
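To give the flavour, a rule might look something like this (the
predicates here are invented for illustration; the real built-ins are
documented in the overview below):

```n3
@prefix curate: <http://eris.okfn.org/ww/2010/12/curate#> .
@prefix ex: <http://example.org/ckan#> .        # hypothetical namespace

# if a package is tagged "rdf" and its download URL answers,
# put it in the lld group
{ ?pkg ex:tag "rdf" .
  ?pkg ex:downloadUrl ?url .
  ?url ex:reachable true } =>
{ ?pkg ex:memberOf <http://ckan.net/group/lld> } .
```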

A somewhat longer explanation, with worked examples can be found at,

  http://packages.python.org/curate/overview.html

For this to be truly useful, a much larger library of built-in
predicates and a good bunch of example rulesets would be necessary at
the very least.

Comments and suggestions most welcome -- indeed eagerly sought.

Cheers,
-w
-- 
William Waites
http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664



Re: Is 303 really necessary?

2010-11-28 Thread William Waites
* [2010-11-27 15:24:53 -0500] Tim Berners-Lee ti...@w3.org wrote:

] http://www.w3.org/2008/site/images/logo-w3c-mobile-lg.png
]   we know we can use to refer to the image in its PNG version.
] http://www.w3.org/2008/site/images/logo-w3c-mobile-lg
]   we know nothing about.
] 
] Because the fetch returned a content-location header,
] we are now not allowed to use that URI to refer to anything -- it could
] after all refer to the Eiffel Tower, or the W3C as an organization
] according to the new system.
] 
] Does this make sense?

Yes and no. I see the distinction between representation and
description, but I don't think the line is necessarily so sharp. For
example, you could make
http://www.w3.org/2008/site/images/logo-w3c-mobile-lg
respond with,

The W3C logo, white text on a teal background the characters
W and 3 apparently raised and C apparently sunken. It is
the large version of the logo intended for use with mobile
browsers.

when asked for text/plain (pace alt). Maybe this is useful for blind
people.  For them it functions as a representation but is written
using descriptive language. I could imagine formalising the
descriptive language in RDF and returning that when asked for a
different content-type. Maybe I should do some background reading in
semiotics to get this clearer in my mind.

In the meantime,

% curl -I http://bnb.bibliographica.org/entry/GB5105626
HTTP/1.0 303 See Other
Server: nginx/0.7.65
Date: Sat, 27 Nov 2010 23:44:54 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Pragma: no-cache
Cache-Control: no-cache
Vary: Accept
Location: http://bnb.bibliographica.org/entry/GB5105626.rdf
X-Cache: MISS from localhost
X-Cache-Lookup: MISS from localhost:80
Via: 1.0 localhost (squid/3.0.STABLE19)
Connection: close

Cheers,
-w
-- 
William Waites
http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664



Re: Is 303 really necessary?

2010-11-26 Thread William Waites
* [2010-11-26 15:15:42 +0100] Bob Ferris z...@elbklang.net wrote:
]
] I wrote a note as an attempt to clarify a bit the terms Resource, 
] Information Resource and Document and their relations (from my point of 
] view). Maybe this helps to figure out the bits of the current confusion. 

So taking a cue from this thread, I've implemented something that I
think is in line with the original suggestion for a new dataset that
I'm working on. If you request, e.g.

http://bnb.bibliographica.org/entry/GB8102507

with an Accept header indicating an interest in RDF data, you will get
a 200 response with a Content-Location header indicating that what is
returned is actually the GB8102507.rdf document. It seems to me that
this is enough information that a client needn't be confused between
the document and the book, "A Good Man in Africa". There is
foaf:primaryTopic linkage in the document that should also adequately
explain the state of affairs.
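
The two arrangements can be sketched side by side; `respond` and its
return shape are invented here purely for illustration:

```python
# Hedged sketch of the two server behaviours discussed in this thread,
# for a non-information resource like /entry/GB8102507.

def respond(path, accept, use_303=False):
    # Crude content negotiation: pick the concrete document.
    doc = path + (".rdf" if "rdf" in accept else ".html")
    if use_303:
        # The standard httpRange-14 arrangement: redirect to the document.
        return 303, {"Location": doc}
    # The arrangement described above: answer 200 directly, signalling
    # via Content-Location that the body is the document, not the book.
    return 200, {"Content-Type": accept, "Content-Location": doc}
```

Either way the client has enough information to distinguish document
from book; the question is only which clients actually use it.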

However it seems that some clients are confused -- Tabulator, for
instance, as was pointed out on IRC the other day.

My question is, should I change the behaviour to the standard 303
redirect or leave it as a stake in the ground saying that this is a
reasonable arrangement?

Cheers,
-w

-- 
William Waites
http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664



FW: Failed to port datastore to RDF, will go Mongo

2010-11-24 Thread William Waites
Friedrich, I'm forwarding your message to one of the W3 lists.

Some of your questions could be easily answered (e.g. for euro in your
context, you don't have a predicate for that, you have an Observation
with units of a currency and you could take the currency from
dbpedia, the predicate is units).

But I think your concerns are quite valid generally and your
experience reflects that of most web site developers that encounter
RDF.

LOD list, Friedrich is a clueful developer, responsible for
http://bund.offenerhaushalt.de/ amongst other things. What can we
learn from this? How do we make this better?

-w


- Forwarded message from Friedrich Lindenberg friedr...@pudo.org -

From: Friedrich Lindenberg friedr...@pudo.org
Date: Wed, 24 Nov 2010 11:56:20 +0100
Message-Id: a9089567-6107-4b43-b442-d09dcc0c3...@pudo.org
To: wdmmg-discuss wdmmg-disc...@lists.okfn.org
Subject: [wdmmg-discuss] Failed to port datastore to RDF, will go Mongo

(reposting to list):

Hi all, 

As an action from OGDCamp, Rufus and I agreed that we should resume porting 
WDMMG to RDF in order to make the data model more flexible and to allow a 
merger between WDMMG, OffenerHaushalt and similar other projects. 

After a few days, I'm now over the whole idea of porting WDMMG to RDF. Having 
written a long technical pro/con email before (that I assume contained nothing 
you don't already know), I think the net effect of using RDF would be the 
following: 

* Lots of coolness, sucking up to linked data people.
* Further research regarding knowledge representation.

vs.

* Unstable and outdated technological base. No triplestore I have seen so far 
seemed on par with MySQL 4. 
* No freedom wrt to schema, instead modelling overhead. Spent 30 minutes trying 
to find a predicate for Euro.
* Scares off developers. Invested 2 days researching this, which is how long it 
took me to implement OHs backend the first time around. Project would need to 
be sustained through linked data grad students.
* Less flexibility wrt to analytics, querying and aggregation. SPARQL not so 
hot.
* Good chance of chewing up the UI, much harder to implement editing.

I normally enjoy learning new stuff. This is just painful. Most of the above 
points are probably based on my ignorance, but it really shouldn't take a PhD 
to process some gov spending tables. 

I'll now start a mongo effort because I really think this should go schema-free 
+ I want to get stuff moving. If you can hold off loading Uganda and Israel for 
a week that would of course be very cool, we could then try to evaluate how far 
this went. Progress will be at: http://bitbucket.org/pudo/wdmmg-core 

Friedrich



___
wdmmg-discuss mailing list
wdmmg-disc...@lists.okfn.org
http://lists.okfn.org/mailman/listinfo/wdmmg-discuss

- End forwarded message -

-- 
William Waites
http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664



Re: FW: Failed to port datastore to RDF, will go Mongo

2010-11-24 Thread William Waites
... on the plus side, Friedrich wrote:

] * Lots of coolness, sucking up to linked data people.

I don't see these as particularly good things in themselves. The
solutions have to be obviously technically sound and convenient to
use. Drinking the kool-aid is not helpful.

* [2010-11-24 08:05:08 -0500] Kingsley Idehen kide...@openlinksw.com wrote:
] 
] Is your data available as a dump?

UK data for 2009 that I made is available at:

   http://semantic.ckan.net/dataset/cra/2009/dump.nt.bz2
   http://semantic.ckan.net/dataset/cra/2009/dump.nq.bz2

But this was done more or less by hand, and repurposing the CSV ->
SDMX scripts (this was done before QB became best practice) is not
easy. Still, from a modeling perspective they might be a good starting
point.

But having to ask a question in the right place and the answer being a
good starting point is maybe different from doing a google search and
finding easy-to-follow recipes that can be immediately plugged into some
web app.

Cheers,
-w
-- 
William Waites
http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664



Re: Failed to port datastore to RDF, will go Mongo

2010-11-24 Thread William Waites
* [2010-11-24 22:44:53 +] Toby Inkster t...@g5n.co.uk wrote:
] 
] Or, to put a different slant on it: a competent developer who has spent
] years using SQL databases day-to-day finds it easier to use SQL and the
] relational data model than a different data model and different query
] language that he's spent a few days trying out.

I don't think that's what's happening here, or at least not
entirely. People coming from a RDB background expect things like SUM,
COUNT, INSERT, DELETE, not to mention GROUP BY to work. But SPARQL 1.1
is still very new, each store implements them in slightly different
ways with slightly different syntax, sometimes requiring workarounds
in application code. With RDBs we have good libraries for abstracting
away these differences. We still require people to pay a lot closer
attention to what the underlying plumbing is and how it works (and if
the binary package they got with their OS might be out of date or has
to be compiled from source or even patched - the horror!). These
things prevent people from getting on with what they see as the task
at hand. 
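
The library-abstraction point can be made concrete with a toy shim;
the dialect names and syntax quirks here are invented for the example
(real per-store differences vary):

```python
# One function that emits the right aggregate syntax per store,
# the way SQL toolkits hide RDB dialects from application code.

def count_query(graph_pattern, dialect="sparql11"):
    if dialect == "legacy":
        # hypothetical older store predating standard SPARQL 1.1 aggregates
        return "SELECT COUNT(*) WHERE { %s }" % graph_pattern
    # standard SPARQL 1.1 form with an explicit AS binding
    return "SELECT (COUNT(*) AS ?n) WHERE { %s }" % graph_pattern
```

Application code would call `count_query` and never see which store is
underneath -- exactly the layer that was missing at the time.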

Cheers,
-w
-- 
William Waites
http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664



ANN: British National Bibliography

2010-11-22 Thread William Waites
Following up on the earlier announcement [1] that the British Library
[2] has made the British National Bibliography [3] available under a
public domain dedication, the JISC Open Bibliography [4] project has
worked to make this data more useable.

The data has been loaded into a Virtuoso store that is queryable
through the SPARQL Endpoint [5], and the URIs that we have assigned
each record use the ORDF [6] software to make them dereferenceable,
supporting content auto-negotiation as well as embedding RDFa
in the HTML representation.

The data contains some 3 million individual records and some 173
million triples. Indexing the data was a very CPU intensive process
taking approximately three days. Transforming and loading the source
data took about five hours.

For more detail see http://eris.okfn.org/ww/2010/11/bl

   1. 
http://openbiblio.net/2010/11/17/jisc-openbibliography-british-library-data-release/
   2. http://www.bl.uk/
   3. http://www.bl.uk/bibliographic/natbib.html
   4. http://openbiblio.net/
   5. http://bnb.bibliographica.org/sparql
   6. http://ordf.org/

-- 
William Waites
http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664



Re: ANN: British National Bibliography

2010-11-22 Thread William Waites
* [2010-11-22 16:25:35 +] Richard Light rich...@light.demon.co.uk wrote:
] 
] But is it reliable?  I looked up the book I wrote (Presenting XML, SAMS 
] Net, 1997) and find that it claims it was written by Laura Alschuler. 
] How did that happen?

That's what's in the source data, the 111443rd entry in BNBrdfdc12.xml

I have no idea the error rate for problems of this type in the data -
finding that out is a research project in itself. A next step is to
feed it through the link discovery tools at DERI and Berlin (are they
both the same (Silk) or is DERI's something different?) and then see
what kinds of inconsistencies we can find.

A bit easier, what we really need now is a good way to make these
corrections and feed them back into the BL. For the RDF clued like
yourself, I could take a patch in the form of a corrected graph and
put it in the store.
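
Treating a graph as a set of triples, such a patch reduces to set
arithmetic (a hypothetical sketch, not how the store actually applies
changes; the triples below are invented):

```python
def apply_patch(graph, removals, additions):
    """Apply a 'corrected graph' patch: drop the triples being
    corrected and add their replacements.  All three arguments are
    sets of (subject, predicate, object) tuples."""
    return (graph - removals) | additions
```

A correction like the misattributed author above would then be a pair
of small removal/addition sets rather than a whole replacement record.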

Cheers,
-w
-- 
William Waites
http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664



Re: [open-bibliography] ANN: British National Bibliography

2010-11-22 Thread William Waites
(trimming the Cc a little bit but as Peter rightly points out we
should probably trim it further for continued discussion)

* [2010-11-22 17:33:33 +] Richard Light rich...@light.demon.co.uk wrote:

] Absolutely, though lets hope it's a random error and not something 
] systematic.  I went to download the file in question to have a look, but 
] as it's a 450MB XML document, which Firefox is gallantly trying to load 
] for me to read, I suspect I will fail.  Any chance of these resources 
] being offered as zip files?

The cleaned record, which I would agree should not be deleted but
superseded, can be retrieved as

http://bnb.bibliographica.org/entry/GB97W9726.rdf 

So what do we do about this? If it won't appear in further corrected
data from the BL, we should mint a new URI for it. This might be
directly in bibliographica.org. The identifier/slug shouldn't be used
because that's the BNB identifier. Easiest thing is just to make a
hash. 

So if you do a search now you'll see two records for that book, the
incorrect one from the original data and a hand-made one based on that
record and what I could easily find with google.

So the new record is at:

http://bibliographica.org/entry/c4bb7da2c60413acc06f2369746da92b

(anyone with a suggestion about how to make better identifiers please
pipe up).

As far as downloading the source data, I would suggest using wget(1).

Cheers,
-w
-- 
William Waites
http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664



Re: [open-bibliography] ANN: British National Bibliography

2010-11-22 Thread William Waites
* [2010-11-22 20:35:55 +] Richard Light rich...@light.demon.co.uk wrote:

] Thanks for this.  A follow-up nitpick: the sameAs URL
] 
] http://bibliographica.org/entity/f30566181677c26b17a024c0145f91cd
] 
] for the author gives a 404.

Right. There is (as yet) no particular graph there, so what happens is
that a SPARQL CONSTRUCT for that URI as subject is done. I've turned on
sameAs processing in the query (only that one) and now it should give
some more useful information. Time will tell how well that scales.
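
The sameAs expansion can be illustrated in pure Python over
(subject, predicate, object) tuples -- the real implementation does
this inside the SPARQL query instead, and the URIs below are made up:

```python
def sameas_closure(triples):
    """Merge nodes linked by owl:sameAs (union-find), then rewrite
    every remaining triple onto canonical representatives."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)

    return {(find(s), p, find(o))
            for s, p, o in triples if p != "owl:sameAs"}
```

Statements about either of two merged URIs then come back attached to
one canonical node, which is what dereferencing the 404ing entity URI
should now surface.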

Cheers,
-w
-- 
William Waites
http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664



Re: Semantic Ambiguity

2010-11-14 Thread William Waites
* [2010-11-12 15:33:01 +0100] Henry Story henry.st...@bblfish.net wrote:

] I'd start differently. Start with the social web, and simple terms such
] as foaf and sioc. The build up meanings from the ground up, piece by
] piece by introducing value at each point in the game.

FOAF avoided a minefield by using foaf:knows instead of e.g
foaf:friend. Still, what exactly it means that a foaf:knows b is kept
deliberately vague. It probably has as many interpretations as there
are FOAF profiles.

Maybe there is some basic consensus about the meaning which is the
intersection of all (non-pathological) interpretations. But choosing
the appropriate interpretation depends very much on the context or
purpose of the communication or task at hand. In your slides I think
you have implicitly assumed a context which has something to do with
very basic questions of identity -- this is useful but is hardly the
only context in which foaf:knows links between people can be
considered and it isn't at all clear if the assumptions you make will
hold in other contexts.

] Global naming is going to be useful, but by taking such a big
] problem, the linked data community is just confronting many big problems
] simultaneously, which is why it can seem intractable. The network effect
] will end up working itself out. 

This seems very hand-wavy to me. I agree that global naming is
useful. But sorting out the myriad interpretations of these global
names is a hard problem that I don't think is going to just work
itself out.

Cheers,
-w
-- 
William Waites
http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664




Semantic Ambiguity

2010-11-12 Thread William Waites
On Fri, Nov 12, 2010 at 08:40:14AM -0500, Patrick Durusau wrote:
 
 Semantic ambiguity isn't going to go away. It is part and parcel of the
 very act of communication. 
 
 [...]
 
 Witness the lack of uniform semantics in the linked data community over
 something as common as sameAs. As the linked data community expands, so
 are the number of interpretations of sameAs. 
 
 Why can't we fashion solutions for how we are rather than wishing for
 solution for how we aren't? 

I was at a lecture by Dave Robertson [0] the other day where
he talked about some of the ideas behind one of his current
projects [1]. Particularly relevant was the idea of completely
abandoning any attempts at global semantics and instead working
on making sure the semantics are clear on a local communication
channel (as I understood it).

So maybe that would mean a different meaning for sameAs in
different datasets, and that's just fine as long as the reader
is aware of that and fashions some transformation from their
notion of sameAs to their peer's, mutatis mutandis for other
predicates and classes.

In some ways this is similar to how we use language. If I'm
talking to a computer scientist I'll use a different but 
overlapping sub-language of English than if I'm talking to the
postman. If I'm talking to a non-native English speaker I'll
modify my speech so as to be more easily understood. Around
here, "tea" means supper, but a short distance to the South
it more likely means a snack with cakes and cucumber sandwiches.

The important thing is a context of communication which modifies
-- and disambiguates -- meaning. This might be touched on in the
RDF Semantics with the not often mentioned idea of an
interpretation of a graph.

How does this square with the apparent tendency to want to treat
statements as overarching universal truths?

Cheers,
-w

[0] http://www.dai.ed.ac.uk/groups/ssp/members/dave.htm
[1] http://socialcomputer.eu/



Re: Is 303 really necessary?

2010-11-05 Thread William Waites
On Fri, Nov 05, 2010 at 09:34:43AM +, Leigh Dodds wrote:
 
 Are you suggesting that Linked Data crawlers could/should look at the
 status code and use that to infer new statements about the resources
 returned? If so, I think that's the first time I've seen that
 mentioned, and am curious as to why someone would do it. Surely all of
 the useful information is in the data itself.

Provenance and debugging. It would be quite possible to 
record the fact that this set of triples, G, were obtained
by dereferencing this uri N, at a certain time, from a
certain place, with a request that looked like this and a
response that had these headers and response code. The 
class of information that is kept for [0]. If N appeared
in G, that could lead directly to inferences involving the
provenance information. If later reasoning is concerned at
all with the trustworthiness or up-to-dateness of the 
data it could look at this as well.

Keeping this quantity of information around might quickly
turn out to be too data-intensive to be practical, but
that's more of an engineering question. I think it does
make some sense to do this in principle at least.
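
One possible shape for the per-fetch provenance record described
above -- the field names are illustrative, not a standard vocabulary:

```python
from dataclasses import dataclass

@dataclass
class FetchProvenance:
    uri: str            # the URI N that was dereferenced
    status: int         # HTTP response code (200, 303, ...)
    headers: dict       # response headers as received
    fetched_at: str     # when the request was made
    triples: frozenset  # the graph G obtained from the body

    def is_redirect(self):
        """Whether the fetch itself was a redirect (e.g. 303)."""
        return 300 <= self.status < 400
```

Later reasoning about trustworthiness or freshness could then consult
these records alongside the triples themselves.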

Cheers,
-w

[0] http://river.styx.org/ww/2010/10/corscheck





Re: Is 303 really necessary?

2010-11-04 Thread William Waites
On Thu, Nov 04, 2010 at 01:22:09PM +, Ian Davis wrote:
 Hi all,
 
 The subject of this email is the title of a blog post I wrote last
 night questioning whether we actually need to continue with the 303
 redirect approach for Linked Data. My suggestion is that replacing it
 with a 200 is in practice harmless and that nothing actually breaks on
 the web. Please take a moment to read it if you are interested.

cf. the other discussion about RDF URI References and IRIs:
where a resource is given an IRI that is not a valid
URI, as far as HTTP is concerned we can't dereference it
properly, so we need some kind of document -> description
indirection... Though in general I think the best practice
is only to give resources IRIs that are also valid URIs...

Cheers,
-w



Re: Please allow JS access to Ontologies and LOD

2010-10-27 Thread William Waites
On Wed, Oct 27, 2010 at 09:00:52PM +, Hugh Glaser wrote:
 Great stuff - thanks for the advice.
 Done for sameas.org and *.rkbexplorer.com
 
 However, did it via .htaccess, and would prefer to do it in
 /etc/httpd/http.conf, not least because the vhosts seems to make it end up
 with two of them (which I assume is not illegal?)
 Can anyone tell me the http.conf line that does the same thing, to help a
 lazy citizen :-)

Same as in .htaccess I believe. I just put

Header add Access-Control-Allow-Origin *

for ckan.net in the apache config.

Cheers,
-w



Re: WordNet RDF

2010-09-21 Thread William Waites
On 10-09-20 23:11, Vasiliy Faronov wrote:
 
 Have you looked at the GOLD ontology[1]?
 
 [1] http://linguistics-ontology.org/gold/

No, somehow I had missed that. It looks like just the
thing! (could benefit from some examples though, sample
sentences and how they would be represented with GOLD).

Thank you for the link!

-w

-- 
William Waites   w...@styx.org
Mob: +44 789 798 9965
Fax: +44 131 464 4948
CD70 0498 8AE4 36EA 1CD7  281C 427A 3F36 2130 E9F5





Re: WordNet RDF

2010-09-20 Thread William Waites
On 10-09-20 12:45, Antoine Isaac wrote:
 Very interesting! I'm curious though: what's the application scenario
 that made you create this version?

(hopefully this is closely enough related that my reply
below isn't a non-sequitur)

I worked on a toy NLP bot that might expose some real
uses for representing natural language in RDF [0]. The
basic premise was to allow users to describe bibliographic
data (works and authors and such) in simple natural
language sentences and have it output RDF (FRBR-esque) [1].

(Motivated partly by the fact that I am terrible at user
interface design and had a very hard time trying to make
a web interface that allowed users to enter data with
anything other than a very simple structure).

One vocabulary that I missed while doing this is something
to represent parts of speech and grammatical syntax in
natural language. I invented something ad-hoc but it might
be useful to have a more completely thought out way to do
this. You can see some examples in the first link.

 How do you make the distinction between the two situations--I mean,
 based on which elements in the Wordnet data?

The approach that I took -- and keep in mind this was a
toy, I have doubts about the scalability of doing things
this way -- was to (1) parse the natural language sentence into an
annotated syntax tree as an intermediate form (represented
in RDF) and then (2) run specially crafted N3 inference
rules over it to generate the desired output. The inference
rules encode the semantic relationships between concepts
existing in (or across) sentences. I mostly worked with
inference rules that hinged on the main verb in the sentence
(which also happens to be the top of the syntax tree).

In principle, a complete enough set of such inference
rules (most likely restricted to a particular domain of
discourse; a truly general set would be very hard, if it is
possible at all) would resolve the ambiguity. In the case
that makes sense there would be useful entailments, in the
case that doesn't there wouldn't. I saw this kind of
resolution of syntactic ambiguity happen a couple of times.
Resolution of homonyms might work similarly.

I'm not so sure the structure of creating a class hierarchy
based on orthographical accident makes sense. Where the
words do have a common conceptual root, certainly. But in
the crack example I don't think so. They are (probably)
completely different concepts that just happen to be denoted
by the same string. I might be wrong but I don't think that
wordnet contains enough information to make this choice.

Cheers,
-w

[0]
http://blog.okfn.org/2010/08/09/cataloguing-bibliographic-data-with-natural-language-and-rdf/
[1] http://pastebin.ca/1913826
-- 
William Waites   w...@styx.org
Mob: +44 789 798 9965
Fax: +44 131 464 4948
CD70 0498 8AE4 36EA 1CD7  281C 427A 3F36 2130 E9F5





Re: New LOD ESW wikipage about Data Licensing

2010-09-19 Thread William Waites
An idle thought. Suppose I take two datasets, licensed
differently, and combine them. Maybe I do something
clever to capture provenance information in how they
are combined (a combination of opmv and evopat comes
to mind). If the licenses are defined at a suitable
granularity (is the cc vocabulary enough?) I can then
derive the resulting terms by doing something like
the intersection of rights granted in the source
licenses.

So (copyleft \cap public domain) = copyleft, etc.

I wonder about constructing inference rules for this...
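
A toy model of that rights-intersection idea: represent each licence
as the set of rights it grants and intersect. The licence names and
rights below are invented for the example, not the cc vocabulary:

```python
LICENSES = {
    "public-domain": {"use", "modify", "redistribute", "relicense"},
    "copyleft":      {"use", "modify", "redistribute"},
    "no-derivs":     {"use", "redistribute"},
}

def combined_rights(*names):
    """Terms of a combined dataset = intersection of rights granted
    by the source licences; name the result if a known licence
    grants exactly that set."""
    rights = set.intersection(*(set(LICENSES[n]) for n in names))
    for name, grant in LICENSES.items():
        if grant == rights:
            return name
    return rights  # no named licence fits; return the raw right-set
```

This reproduces the (copyleft ∩ public domain) = copyleft case above;
real licences have conditions as well as grants, so a faithful model
would need more than plain sets.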

If the combination is done in a way that is reversible,
simply selecting some triples from different sources,
for example, rather than putting provenance and
license information on graphs [0], putting it on
individual triples might be nice. But then we need
some token for a triple, ideally in a global way where
if the same triple occurs independently in two places,
two people making tokens for it will end up with the
same token...
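
A sketch of such a content-derived token: hash a canonical
serialisation of the triple, so the same triple produced independently
in two places yields the same token. Real canonicalisation (literals,
datatypes, and especially blank nodes) is harder than this:

```python
import hashlib

def triple_token(s, p, o):
    """Global token for one triple via a canonical N-Triples-like
    string; identical triples hash to identical tokens anywhere."""
    canonical = "<%s> <%s> <%s> ." % (s, p, o)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```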

Hrmmm... As I said, idle thoughts...

Cheers,
-w

[0] I'm not sure graph isn't a misnomer, or at least
loose language. An RDF graph is a set, I think, and
you can make a standard graph relative to a predicate
by taking vertices from subject and object and
edges from IEXT(predicate). Is this splitting hairs?

-- 
William Waites   w...@styx.org
Mob: +44 789 798 9965
Fax: +44 131 464 4948
CD70 0498 8AE4 36EA 1CD7  281C 427A 3F36 2130 E9F5





Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-08 Thread William Waites
On 10-09-08 17:47, Ted Thibodeau Jr wrote:
 
 On Sep 8, 2010, at 01:31 AM, Peter DeVries wrote:
 
 I am kind of annoyed by the CKAN site.
 
 I'm right there with you, Peter.
 
 Anja, you say you can edit without logging in but please note that
 the doc page [1] about this database says --
 
• Please register to CKAN bevor editing or adding any packages.

The login issues should be fixed now. Something had changed
at Google and Yahoo that was causing them to return 501
Unimplemented errors when the association was made. Updating
python-openid to a newer version (2.2.5) appears to have
solved the problem. Please let me know if anyone has further
troubles logging in.

 
 When I ignore that and do dive into editing DBpedia's listing, 
 I discover --

(Leaving your comments intact for the ckan-discuss list, you
are correct that it is an RDBMS system and that starts
showing through clearly when people used to thinking in an
RDF or EAV way start throwing data at it. I am particularly
interested in looking at ways to improve this, keeping in
mind that it is a running system with real users and a lot
of effort that has gone into building it -- so we need to be
gentle).

 - The notes field uses Markdown markup, which I've never 
   encountered anywhere else, and must now learn (or fake).
 
 - There must be a singular author, with a singular email address.
   DBpedia doesn't have a singular author, and there are several
   URIs which might be relevant to have here -- and they are not
   mailto: URIs.  The best is an http: URI ... but there is no way
   to make this present, except as part of the literal associated
   with the mailto: URI.
 
 - There must be a singular maintainer, with a singular email address.
   Same issues as with author.
 
 - There are 14+ CKAN Resource links listed [2] in the documentation, 
   but the form appears to only take 5 (at least, 4 were previously
   filled on the DBpedia page, and filling in the 5th didn't magically
   cause a 6th to open, nor was there a link to create a 6th).  OH!
   Until I Preview the page -- and now there's an empty set of 
   Resource boxes ... so I can add one more, and Preview, and maybe
   add one more, and Preview, and maybe  Painful.
 
 - The licensure choices separate CC-ShareAlike and CC-Attribution, but
   do not list CC-Attribution-ShareAlike [3].  cc-by-sa is distinct from
   cc-by -- and also from cc-by-nc-sa (CC-Attribution-NonCommercial-
   ShareAlike), among others.  Clarity of presentation is VERY important 
   for licensing!
 
 - There appears to be an arbitrary limit on the number of Extras 
   key-value pairs associated with any given data set ... which means
   that *truly* densely connected data sets will be short-changed.
 
 From all I can see here, this is an RDB-based thing, not RDF-based.  That's 
 disappointing, to say the least.
 
 All in all, the experience is challenging at best, when listing one 
 data set.  But I have several more to deal with, and today's the 
 deadline!  Hurrah!
 
 *sighs*
 
 Ted
 
 
 
 [1] 
 http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation#How_do_I_add_a_dataset_to_CKAN_or_edit_an_existing_dataset.3F
 [2] 
 http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation#CKAN_resource_links
 [3] 
 http://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License
 
 
 
 
 --
 A: Yes.  http://www.guckes.net/faq/attribution.html
 | Q: Are you sure?
 | | A: Because it reverses the logical flow of conversation.
 | | | Q: Why is top posting frowned upon?
 
 Ted Thibodeau, Jr.   //   voice +1-781-273-0900 x32
 Evangelism  Support //mailto:tthibod...@openlinksw.com
  //  http://twitter.com/TallTed
 OpenLink Software, Inc.  //  http://www.openlinksw.com/
 10 Burlington Mall Road, Suite 265, Burlington MA 01803
  http://www.openlinksw.com/weblogs/uda/
 OpenLink Blogs  http://www.openlinksw.com/weblogs/virtuoso/
http://www.openlinksw.com/blog/~kidehen/
 Universal Data Access and Virtual Database Technology Providers
 
 
 
 
 


-- 
William Waites   william.wai...@okfn.org
Mob: +44 789 798 9965Open Knowledge Foundation
Fax: +44 131 464 4948Edinburgh, UK

RDF Indexing, Clustering and Inferencing in Python
http://ordf.org/



Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

2010-09-08 Thread William Waites
On 10-09-08 18:36, Leigh Dodds wrote:
 Hi,
 
 I've updated several packages and not had any issues. While CKAN may
 not be to everyone's taste, it's much, much better than the previous
 approaches, which were largely opaque.

Thank you for this Leigh.

 The fact that we will now have more structured data describing the
 cloud so that it can be analysed is another big win. Converting the
 data to RDF is easy. The CKAN API is simple and easy to use.

In fact, for python hackers, there is http://bitbucket.org/ww/ckanrdf
which is another kettle of fish, but it will crawl the CKAN API and
put a DCAT representation into an rdflib store (the precise way this
is handled -- see http://ordf.org/ is another topic that I would be
very happy to discuss). Anyone is, of course, perfectly welcome to
roll their own as was done for the current LOD work.

 Maybe the grumbling can be converted into useful contributions to the
 CKAN code base, which is open source and being used by a number of
 different organisations. No one has bothered to create anything better
 in the past, so using what's available and looking for ways to improve
 it seems like a more constructive approach IMHO.

I would suggest that unless there is particular interest on the
public-lod list about the workings of CKAN and how it could be
improved that we could continue discussion on the
ckan-disc...@lists.okfn.org list. Contributions of code, ideas and
(constructive) criticism alike are more than welcome.

Cheers,
-w

-- 
William Waites   william.wai...@okfn.org
Mob: +44 789 798 9965Open Knowledge Foundation
Fax: +44 131 464 4948Edinburgh, UK

RDF Indexing, Clustering and Inferencing in Python
http://ordf.org/



CKAN API Performance Fixed

2010-09-05 Thread William Waites
 As was noticed during the crawling of the CKAN
API over the past few days for creating the diagram,
there were some performance problems. These
were caused by one of the other ckan sites (there
are about a dozen localised ones) that was
misbehaving and eating CPU.

The main (and largest) site has been moved to
a dedicated machine and some caching has been
put in place to further boost performance (though
changes may take up to 15 minutes to appear in
the API).

The misbehaving site, which was heavily customised,
has been temporarily disabled as well.

Please let me know if there are any further problems,
but with luck it should be smooth sailing from now
on.

Cheers,
-w

-- 
William Waites   william.wai...@okfn.org
Mob: +44 789 798 9965Open Knowledge Foundation
Fax: +44 131 464 4948Edinburgh, UK

RDF Indexing, Clustering and Inferencing in Python
http://ordf.org/



Re: Org. Namespace Example

2010-06-23 Thread William Waites
On 10-06-23 23:31, Toby Inkster wrote:
 Firstly, bridges and beaches are not typically considered organisations.
   

Sentient, self-organised bridges and beaches? On second
thought, maybe they should be foaf:Person.

-w

-- 
William Waites   william.wai...@okfn.org
Mob: +44 789 798 9965Open Knowledge Foundation
Fax: +44 131 464 4948Edinburgh, UK

RDF Indexing, Clustering and Inferencing in Python
http://ordf.org/



Re: Organization types predicates vs classes

2010-06-08 Thread William Waites
On 10-06-08 04:27, Todd Vincent wrote:
 
 By adding OrganizationType to the Organization data model, you provide
 the ability to modify the type of organization and can then represent
 both (legal) entities and (legally unrecognized) organizations.

:foo rdf:type :SomeKindOfOrganisation .

vs.

:foo org:organisationType :SomeKindOfOrganisation .

I don't really see the need for an extra predicate
with almost identical semantics to rdf:type. There
is nothing stopping a subject from having more than
one type.

Having a special predicate doesn't really help with
modification: you could easily do the same thing
with rdf:type and still run up against the problem
that there is no good way of specifying *when* a
particular statement is true (OPMV notwithstanding).
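To illustrate the point that a subject can simply carry several rdf:type
statements, here is a minimal sketch with plain tuples standing in for a
real RDF store; the graph contents and CURIE-style names are illustrative:

```python
# A graph is just a set of (subject, predicate, object) tuples here.
RDF_TYPE = "rdf:type"

graph = {
    (":foo", RDF_TYPE, "org:Organization"),
    (":foo", RDF_TYPE, ":SomeKindOfOrganisation"),
}

def types_of(graph, subject):
    """Return every rdf:type object asserted for a subject."""
    return {o for s, p, o in graph if s == subject and p == RDF_TYPE}
```

Both types come back for :foo, with no need for a second typing predicate.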

Cheers,
-w

-- 
William Waites   william.wai...@okfn.org
Mob: +44 789 798 9965Open Knowledge Foundation
Fax: +44 131 464 4948Edinburgh, UK



Re: Organization types predicates vs classes

2010-06-08 Thread William Waites
On 10-06-08 11:48, Dan Brickley wrote:
 Yes, exactly. The schema guarantees things will have multiple types.
 The art is to know when to bother mentioning each type. Saying things
 are an rdfs:Resource is rarely interesting. 
   

FWIW, I actually put (using an inferencer) rdfs:Resource on
everything in [1][2] because I use the fresnel vocabulary to
display things. This means I can make a generic lens like this,

:resourceLens a fresnel:Lens ;
fresnel:purpose fresnel:defaultLens ;
fresnel:classLensDomain rdfs:Resource ;
fresnel:showProperties (
rdf:type
fresnel:allProperties
) .

to use as a default.
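The "rdfs:Resource on everything" step can be sketched as a trivial
inference pass: assert rdf:type rdfs:Resource for every subject in the
graph, so that a fresnel:defaultLens keyed on rdfs:Resource matches all of
them. Plain tuples stand in for the rdflib store; the sample triples are
made up:

```python
RDF_TYPE = "rdf:type"
RDFS_RESOURCE = "rdfs:Resource"

def assert_resource(graph):
    """Add (s, rdf:type, rdfs:Resource) for every subject in the graph."""
    return graph | {(s, RDF_TYPE, RDFS_RESOURCE) for s, _, _ in graph}

g = {(":doc1", "dc:title", "Example"), (":doc2", RDF_TYPE, "bibo:Book")}
g2 = assert_resource(g)
```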

[1] http://knowledgeforge.net/pdw/ordf/
[2] http://bibliographica.org/

-- 
William Waites   william.wai...@okfn.org
Mob: +44 789 798 9965Open Knowledge Foundation
Fax: +44 131 464 4948Edinburgh, UK



Re: Organization ontology

2010-06-03 Thread William Waites
On 10-06-03 09:01, Dan Brickley wrote:
 I don't find anything particularly troublesome about the org: vocab on
 this front. If you really want to critique culturally-loaded
 ontologies, I'd go find one that declares class hierarchies with terms
 like 'Terrorist' without giving any operational definitions...
   

I must admit when I looked at the org vocabulary I had a feeling
that there were some assumptions buried in it, but I discarded a
couple of draft emails trying to articulate them.

I think it stems from org:FormalOrganization being a thing that is
legally recognised and org:OrganizationalUnit (btw, any
particular reason for using the North American spelling here?)
being an entity that is not recognised outside of the FormalOrganization.

Organisations can become recognised in some circumstances
despite never having solicited outside recognition from a state --
this might happen in a court proceeding after some collective
wrongdoing. Conversely, you might have something that can
behave like a kind of organisation, e.g. a class in a class-action
lawsuit, without the internal structure present in most organisations.

Is a state an Organisation?

Organisational units can often be semi-autonomous (e.g. legally
recognised) subsidiaries of a parent or holding company. What
about quangos or crown corporations (i.e. corporations owned
by the state)? They have legal recognition but are really like
subsidiaries or units.

Some types of legally recognised organisations don't have a
distinct legal personality, e.g. a partnership or unincorporated
association, so they cannot be said to have rights and responsibilities;
rather, the members have joint (or joint and several) rights and
responsibilities. This may seem like splitting hairs, but from a
legal perspective it's an important distinction, at least in some
legal environments. The description provided in the vocabulary
is really only true for corporations or limited companies.

I think the example, eg:contract1, is misleading, since this is
an inappropriate way to model a contract. A contract has two
or more parties. A contract might include a duty to fill a role
on the part of one party, but it is not normally something that
has to do with membership.

Membership usually has a particular meaning as applied to
cooperatives and not-for-profits. They usually wring their hands
extensively about what exactly membership means. This concept
normally doesn't apply to other types of organisations and does
not normally have much to do with the concept of a role. The
president of ${big_corporation} cannot be said to have any kind
of membership relationship to that corporation, for example.

I think there might be more, but I don't think it's a problem of
embedding Westminster assumptions, because I don't think
the vocabulary fits very well even in the UK and Commonwealth
countries when you start looking at it closely.

Thoughts?

Cheers,
-w

-- 
William Waites   william.wai...@okfn.org
Mob: +44 789 798 9965Open Knowledge Foundation
Fax: +44 131 464 4948Edinburgh, UK