Enhancing open data with identifiers

2014-10-31 Thread Leigh Dodds
I thought I'd share a link to this UKODI/Thomson Reuters white paper
which was published today:

http://theodi.org/guides/data-identifiers-white-paper

Cheers,

L.

-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Dbpedia is down?

2014-10-11 Thread Leigh Dodds
Hi,

Dbpedia has been down for maintenance since yesterday evening. Does
anyone know when it will be back up?

All resource URIs return:

"The web-site you are currently trying to access is under maintenance
at this time. We are sorry for any inconvenience this has caused."

I'd have reported this to the bug tracker listed on the dbpedia
support page, but that link is also broken:

http://sourceforge.net/tracker/?group_id=190976

Is there a location where planned maintenance is noted? Similarly, is
there somewhere to go to check service status and updates on fault
finding?

Thanks,

L.

-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: URIs within URIs

2014-08-28 Thread Leigh Dodds
Hi,

I documented all the variations of this form of URI construction I was
aware of in the Rebased URI pattern:

http://patterns.dataincubator.org/book/rebased-uri.html

This covers generating one URI from another. What that new URI returns
is a separate concern.

Cheers,

L.

On Fri, Aug 22, 2014 at 4:56 PM, Bill Roberts  wrote:
> Hi Luca
>
> We certainly find a need for that kind of feature (as do many other linked 
> data publishers) and our choice in our PublishMyData platform has been the 
> URL pattern {domain}/resource?uri={url-encoded external URI} to expose info 
> in our databases about URIs in other domains.
>
> If there was a standard URL route for this scenario, we'd be glad to 
> implement it
>
> Best regards
>
> Bill
>
> On 22 Aug 2014, at 16:44, Luca Matteis  wrote:
>
>> Dear LOD community,
>>
>> I'm wondering whether there has been any research regarding the idea
>> of having URIs contain an actual URI, that would then resolve
>> information about what the linked dataset states about the input URI.
>>
>> Example:
>>
>> http://foo.com/alice -> returns data about what foo.com has regarding alice
>>
>> http://bar.com/endpoint?uri=http%3A%2F%2Ffoo.com%2Falice -> doesn't
>> just resolve the alice URI above, but returns what bar.com wants to
>> say about the alice URI
>>
>> For that matter http://bar.com/?uri=http%3A%2F%2Ffoo.com%2Falice could 
>> return:
>>
>> <http://bar.com/?uri=http%3A%2F%2Ffoo.com%2Falice> a void:Dataset .
>> <http://foo.com/alice> <#some> <#data> .
>>
>> I know SPARQL endpoints already have this functionality, but was
>> wondering whether any formal research was done towards this direction
>> rather than a full-blown SPARQL endpoint.
>>
>> The reason I'm looking for this sort of thing is because I simply need
>> to ask certain third-party datasets whether they have data about a URI
>> (inbound links).
>>
>> Best,
>> Luca
>>
>
>



-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



ORCID as Linked Data

2014-06-17 Thread Leigh Dodds
I discovered this today:

curl -v -L -H "Accept: text/turtle" http://orcid.org/0000-0003-0837-2362

A fairly new addition to the ORCID service I think.

With many DOIs already supporting Linked Data views, this makes a nice
addition to the academic linked data landscape.

Still lots of room for improvement, but definitely a step forwards.

Cheers,

L.

-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: rdf:HTML datatype in RDF 1.1

2014-04-02 Thread Leigh Dodds
The value space is defined as being a DocumentFragment. I'm not clear
on whether DOM4 has changed the meaning of that, but a fragment is a
collection of nodes, which don't necessarily have a common root
element.

So I think either is valid.
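
For illustration, both of the following would then be legal rdf:HTML
literals in Turtle (the subject and property names here are just
placeholders):

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://example.org/ns#> .

ex:doc ex:summary "<p>Hello world!</p>"^^rdf:HTML .
ex:doc ex:note "Hello <b>world</b>!"^^rdf:HTML .

The first parses to a fragment with a single root element; the second
parses to a fragment with several sibling nodes (text, a <b> element,
more text) and no common root.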

L.

On Wed, Apr 2, 2014 at 11:54 AM, john.walker  wrote:
> Simple question on this which wasn't immediately obvious from the
> recommendation [1].
>
> Is it expected that the string has a single top-level element:
>
> "Hello world!"
>
> Or is it OK to include fragments like:
>
> "Hello world!"
> "Hello world!"
> "Hello world!"
> "Hello world!"
>
> Regards,
>
> John
>
> [1] http://www.w3.org/TR/rdf11-concepts/#section-html



-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Exchanging Links with LINK and UNLINK

2013-12-11 Thread Leigh Dodds
Hi,

The "HTTP Link and Unlink Methods" RFC [1] specifies how to use the
LINK/UNLINK HTTP methods to support exchanging links between resources
on the web.

To explore these ideas I've created a Ruby implementation based on
Rack middleware, which means it can be easily integrated into any
Ruby-based web framework [2].

There are a couple of link stores provided, including one based on a
SPARQL 1.1 compliant endpoint.

Supplemented with suitable authentication, I think this provides an
interesting way to exchange links between Linked Data publishers. No
special mechanism is needed, just existing protocols. It's nicely
aligned with existing web infrastructure.
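
As a sketch of what an exchange might look like (the host, path and
target URI are illustrative), a publisher could assert a new link
with:

LINK /id/place/bath HTTP/1.1
Host: data.example.org
Link: <http://dbpedia.org/resource/Bath,_Somerset>; rel="http://www.w3.org/2002/07/owl#sameAs"

and remove it again with an equivalent UNLINK request, getting back a
normal HTTP status code in each case.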

I thought I'd share this with the community as I don't feel we've
settled on a common pattern for exchanging this kind of information
between publishers.

Cheers,

L.

[1]. http://tools.ietf.org/html/draft-snell-link-method-08
[2]. https://github.com/ldodds/link-middleware

-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: How to publish SPARQL endpoint limits/metadata?

2013-10-08 Thread Leigh Dodds
Hi,

As others have suggested, extending service descriptions would be the
best way to do this. This might make a nice little community project.

It would be useful to itemise the types of limits that might be
faced, then look at how best to model them.

Perhaps something we could do on the list?
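
As a starting point, a service description could be extended along
these lines (sd: is the standard SPARQL 1.1 Service Description
vocabulary; the lim: terms are made up purely for illustration):

@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix lim: <http://example.org/ns/sparql-limits#> .

<#service> a sd:Service ;
    sd:endpoint <http://example.org/sparql> ;
    lim:maxResultRows 10000 ;
    lim:maxExecutionTimeSeconds 60 .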

Cheers,

L.



On Tue, Oct 8, 2013 at 10:46 AM, Frans Knibbe | Geodan
 wrote:
> Hello,
>
> I am experimenting with running SPARQL endpoints and I notice the need to
> impose some limits to prevent overloading/abuse. The easiest and I believe
> fairly common way to do that is to LIMIT the number of results that the
> endpoint will return for a single query.
>
> I now wonder how I can publish the fact that my SPARQL endpoint has a LIMIT
> and that is has a certain value.
>
> I have read the thread Public SPARQL endpoints:managing (mis)-use and
> communicating limits to users, but that seemed to be about how to
> communicate limits during querying. I would like to know if there is a way
> to communicate limits before querying is started.
>
> It seems to me that a logical place to publish a limit would be in the
> metadata of the SPARQL endpoint. Those metadata could contain all limits
> imposed on the endpoint, and perhaps other things like a SLA or a
> maintenance schedule... data that could help in the proper use of the
> endpoint by both software agents and human users.
>
> So perhaps my enquiry really is about a standard for publishing SPARQL
> endpoint metadata, and how to access them.
>
> Greetings,
> Frans
>
>
> --
> Geodan
> President Kennedylaan 1
> 1079 MB Amsterdam (NL)
>
> T +31 (0)20 - 5711 347
> E frans.kni...@geodan.nl
> www.geodan.nl | disclaimer
> --



-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: ANN: DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links

2013-10-04 Thread Leigh Dodds
Hi Hugh,

Hasn't dbpedia always suffered from this? I've tended to do the same
as you and have encountered similar inconsistencies. I've never really
figured out whether it's down to inconsistent encoding in the data
conversion or something else.

Cheers,

L.


On Fri, Oct 4, 2013 at 1:42 PM, Hugh Glaser  wrote:
> Hi.
> Chris has suggested I send the following to the LOD list, as it may be of 
> interest to several people:
>
> Hi Chris.
> Great stuff!
>
> I have a question.
> Or would you prefer I put it on the LOD list for discussion?
>
> It is about url encoding.
>
> Dbpedia:
> http://dbpedia.org/page/Ashford_%28borough%29 is not found
> http://dbpedia.org/page/Ashford_(borough) works, and redirects to
> http://dbpedia.org/resource/Borough_of_Ashford
> Wikipedia:
> http://en.wikipedia.org/wiki/Ashford_%28borough%29 works
> http://en.wikipedia.org/wiki/Ashford_(borough) works
> Both go to the page with content of 
> http://en.wikipedia.org/wiki/Borough_of_Ashford although the URL in the 
> address bar doesn't change.
>
> So the problem:
> I usually find things in wikipedia, and then use the last bit to construct 
> the dbpedia URI - I suspect lots of people do this.
> But as you can see, the url encoded URI, which can often be found in the 
> wild, won't allow me to do this.
> There are of course many wikipedia URLs with "(" and ")" in them - (artist), 
> (programmer), (borough) etc.
> It is also the same with comma and single quote.
>
> I think this may be different from 3.8, but can't be sure - is it intended?
>
> Very best
> Hugh



-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Minimizing data volume

2013-09-09 Thread Leigh Dodds
Hi,

Before using compression, you might also consider whether you need to
represent all of this information as RDF in the first place.

For example, rather than include the large geometries as literals, why
not store them as separate documents and let clients fetch the
geometries when needed, rather than as part of a SPARQL query?

Geometries can be served using standard HTTP compression techniques
and will benefit from caching.

You can provide summary statistics (including size of the document,
and properties of the described area, e.g. centroids) in the RDF to
help address a few common requirements, allowing clients to only fetch
the geometries they need, as they need them.

This can greatly reduce the volume of data you have to store and
provides clients with more flexibility.
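
A sketch of what that might look like in Turtle (all of the names and
values here are hypothetical):

@prefix geom: <http://example.org/ns/geometry#> .

<http://example.org/id/area/42>
    geom:centroid "POINT(5.29 52.13)" ;
    geom:boundary <http://example.org/geometry/area/42> ;
    geom:boundarySizeBytes 2097152 .

A client can work with the centroid and size immediately, and only
fetch the boundary document (served with gzip compression and cache
headers) when it actually needs the full geometry.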

Cheers,

L.


On Mon, Sep 9, 2013 at 10:47 AM, Frans Knibbe | Geodan
 wrote:
> Hello,
>
> In my line of work (geographical information) I often deal with high volume
> data. The high volume is caused by single facts having a big size. A single
> 2D or 3D geometry is often encoded as a single text string and can consist
> of thousands of numbers (coordinates). It is easy to see that this can cause
> performance issues with transferring and processing data. So I wonder about
> the state of the art in minimizing data volume in Linked Data. I know that
> careful publication of data will help a bit: multiple levels of detail could
> be published, coordinates could use significant digits (they almost never
> do), but it seems to me that some kind of compression is needed too. Is
> there something like a common approach to data compression at the moment?
> Something that is understood by both publishers and consumers of data?
>
> Regards,
> Frans
>
>



-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Open Data Rights Statements

2013-08-13 Thread Leigh Dodds
Hi,

Yes, I'm aware of L4LOD. It's essentially the same as LIMO, ccREL
and, to a lesser extent, ODRL. All of these attempt to provide terms
for describing the key facets of licences. The benefit of ccREL is
that all of the CC licences are already described using those terms,
so the machine-readable metadata already exists.

Thanks for the pointer to the paper.

However, from a quick skim I must admit to being confused by their Real
World example in Section 3.6. While the logic might be correct in the
derivation of the combined licence, it's not a great example because:

* You can't create a new derived dataset using data published under a
no-derivatives licence -- so the scenario isn't allowed
* The legal terms of the ODbL licence indicate that any derivatives
that are shared publicly must be done so under the ODbL or a
compatible licence designated by the publisher -- the derived licence
is neither

So while there may be some value in being able to automatically
create summaries of the combined obligations/permissions of licences,
I think this is at most useful for helping you understand your
obligations, not for creating new downstream licences. That's partly
because it glosses over important legal points in the terms, and
partly because the community is not best served by a proliferation of
licences. Convergence creates simplicity.

Cheers,

L.



On Mon, Aug 12, 2013 at 5:53 PM, Ghislain Atemezing
 wrote:
> Hi Leigh,
> Nice work indeed! I confess I didn't go through all the guide.
>
>> This work looks at the implications of various open licences on the
>> creation of derived datasets. There's a blog post with pointers here:
>>
>> http://theodi.org/blog/exploring-compatibility-between-data-licences
>>
>> If anyone has any comments then please let me know.
>
> I was wondering if there were connection with the work of Serena et al. at
> INRIA (WIMIX team) on License composition...basically with this ontology
> L4LOD (Licenses for Linked Open Data) [1], and this paper [2] explains all
> the logic behind.
>
>
> Cheers,
> Ghislain
>
>
>
> [1] http://ns.inria.fr/l4lod/v2/l4lod_v2.html
> [2] http://www-sop.inria.fr/members/Serena.Villata/Resources/icail2013.pdf
> --
> Ghislain Atemezing
> EURECOM, Multimedia Communications Department
> Campus SophiaTech
> 450, route des Chappes, 06410 Biot, France.
> e-mail: auguste.atemez...@eurecom.fr & ghislain.atemez...@gmail.com
> Tel: +33 (0)4 - 9300 8178
> Fax: +33 (0)4 - 9000 8200
> Web: http://www.eurecom.fr/~atemezin
>



-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: License LINK Headers and Linked Data

2013-08-12 Thread Leigh Dodds
Hi Mike,

On Mon, Aug 12, 2013 at 5:34 PM, mike amundsen
 wrote:
> "A HEAD request can be made on a resource to check its licensing..."
>
> Since HEAD does not resolve the LINK URLs, agents can check for the
> *existence* of licensing information, but not necessarily determine the
> licensing context.
>
> If the LINK @href or one of the associated @rel values is a URI/IRI that the
> agent recognizes (knows ahead of time) then that MAY provide sufficient
> context for the agent to make a judgment on whether the representation is
> marked with an acceptable license.
>
> Failing that, the agent will need to deref the LINK @href and parse/process
> the response in order to make a judgment on the appropriateness of the
> licensing of the initial response.

Yes, that's exactly what I meant by "check its licensing". I didn't
mean that the header itself communicated all of the necessary
information.

Thanks for spelling it out! :)

L.


-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



License LINK Headers and Linked Data

2013-08-12 Thread Leigh Dodds
Hi,

There's one aspect of my document on publishing machine-readable
rights statements that I want to flag to this community.

Specifically, it's the section on including references to licence and
rights statements from LINK headers in HTTP responses:

https://github.com/theodi/open-data-licensing/blob/master/guides/publisher-guide.md#linking-to-rights-statements-from-web-apis

While that information can also be published in RDF, as part of the
Linked Data response, I think adding LINK headers is very important
too, for several reasons:

Linked Data applications and browsers will commonly encounter new
resources, and the licensing information should be immediately clear.
Having this accessible outside of the response body allows user agents
to detect licences before they start retrieving data from a new
source. This will allow users to place pre-conditions on what type of
data they want to harvest/collect/process.

A HEAD request can be made on a resource to check its licensing,
before data is actually retrieved.
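
For example (the dataset URL is hypothetical; rel="license" is a
registered link relation), a response to a HEAD request might look
like:

curl -I http://data.example.org/datasets/spending

HTTP/1.1 200 OK
Link: <http://opendatacommons.org/licenses/by/1.0/>; rel="license"
Content-Type: text/turtle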

Cheers,

L.

-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Open Data Rights Statements

2013-08-12 Thread Leigh Dodds
Hi,

A quick follow-up to my previous announcement. The schema and user
guides have been updated based on feedback I've received from the
wider community. I've also just published a follow-up piece of work
that I think is also relevant to this community.

This work looks at the implications of various open licences on the
creation of derived datasets. There's a blog post with pointers here:

http://theodi.org/blog/exploring-compatibility-between-data-licences

If anyone has any comments then please let me know.

Cheers,

L.

On Tue, Jul 2, 2013 at 9:23 AM, Leigh Dodds  wrote:
> Hi,
>
> At the UK Open Data Institute we've been working on some guidance and
> a new vocabulary to help support the publication of machine-readable
> rights statements for open data. The vocabulary builds on existing
> work in this area (e.g. Dublin Core and Creative Commons) but
> addresses a few issues that we felt were underspecified.
>
> The vocabulary is intended to work in a wide variety of contexts, from
> simple JSON documents and data packaging formats through to Linked
> Data and Web APIs.
>
> The work is now at a stage where we're keen to get wider feedback from
> the community.
>
> You can read a background on the work in this introductory blog post
> on the UK ODI blog:
>
> http://theodi.org/blog/machine-readable-rights-statements
>
> The draft schema can be found here:
>
> http://schema.theodi.org/odrs/
>
> And there are publisher and re-user guides to accompany it:
>
> https://github.com/theodi/open-data-licensing/blob/master/guides/publisher-guide.md
> https://github.com/theodi/open-data-licensing/blob/master/guides/reusers-guide.md
>
> We would love to hear your feedback on the work. If you do have issues
> or comments, then can I ask that you submit them as an issue to our
> github project:
>
> https://github.com/theodi/open-data-licensing/issues
>
> Thanks,
>
> L.
>
> --
> Leigh Dodds
> Freelance Technologist
> Open Data, Linked Data Geek
> t: @ldodds
> w: ldodds.com
> e: le...@ldodds.com



-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Open Data Rights Statements

2013-07-08 Thread Leigh Dodds
Hi Bernard,

On Fri, Jul 5, 2013 at 7:12 PM, Bernard Vatant
wrote:

> Hello David
>
> Thanks for the ping, LOV lurking on public-lod anyway ...
> But since we are in public, just a reminder that the simplest way to
> suggest new vocabularies to LOV is through
> http://lov.okfn.org/dataset/lov/suggest/
>
> But we always of course appreciate direct conversation, and ORDS is
> definitely on the queue.
>
> @Leigh do you think this preliminary version is worth including in LOV as
> is (if nothing else for history) or do we wait for a more "mature" version?
>

I say go ahead and include it. I don't envisage any major changes to
structure, although we may add some new properties in future.

I'll also look at including alternate serializations to the existing Turtle
file.

Cheers,

L.

-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com


Re: Open Data Rights Statements

2013-07-02 Thread Leigh Dodds
Hi Andrea,

On Tue, Jul 2, 2013 at 11:19 AM, Andrea Perego
 wrote:
> That's very interesting, thank you, Leigh.
>
> I wonder whether you plan to consider work carried out in the framework of
> the Open Data Rights Language (ODRL) CG of W3C [1].

Yes, I'm aware of that work. ODRL is a general purpose rights
expression language that can describe re-use policies. This is similar
to the existing Creative Commons ccREL vocabulary, which also captures
the permissions, etc. that are described by a licence.

The ODRS vocabulary doesn't attempt to describe licences themselves.
It's intended more as a way to annotate the relationship between a
dataset and one or more licences. Those licences could be given a
machine-readable description using ccREL or ODRL. So I think the
vocabularies are compatible.

I've already added an issue to cover describing this relationship a little more.

> Also, do you plan to support the notion of "licence type"? This is being
> used, e.g., in vocabularies like ADMS.SW [2] and the DCAT-AP (DCAT
> Application Profile for EU data portals) [3].

Looking at the DCAT profile it seems that licence type is a category
of licence, e.g. public domain, royalties required, etc. To me, this
overlaps with what ccREL and ODRL already cover, but at a more
coarse-grained level.

I think for the purposes of the ODRS vocabulary we'll leave the
description of licenses reasonably opaque and defer to other
vocabularies to describe those in more detail. However we do
distinguish between separate licenses that relate to the data and
copyrightable aspects of the dataset.

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Open Data Rights Statements

2013-07-02 Thread Leigh Dodds
Hi,

At the UK Open Data Institute we've been working on some guidance and
a new vocabulary to help support the publication of machine-readable
rights statements for open data. The vocabulary builds on existing
work in this area (e.g. Dublin Core and Creative Commons) but
addresses a few issues that we felt were underspecified.

The vocabulary is intended to work in a wide variety of contexts, from
simple JSON documents and data packaging formats through to Linked
Data and Web APIs.

The work is now at a stage where we're keen to get wider feedback from
the community.

You can read a background on the work in this introductory blog post
on the UK ODI blog:

http://theodi.org/blog/machine-readable-rights-statements

The draft schema can be found here:

http://schema.theodi.org/odrs/

And there are publisher and re-user guides to accompany it:

https://github.com/theodi/open-data-licensing/blob/master/guides/publisher-guide.md
https://github.com/theodi/open-data-licensing/blob/master/guides/reusers-guide.md

We would love to hear your feedback on the work. If you do have issues
or comments, then can I ask that you submit them as an issue to our
github project:

https://github.com/theodi/open-data-licensing/issues

Thanks,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Business Models, Profitability, and Linked Data

2013-06-10 Thread Leigh Dodds
Hi,

On Mon, Jun 10, 2013 at 12:00 PM, Kingsley Idehen
 wrote:
> On 6/10/13 4:18 AM, Leigh Dodds wrote:
>>
>> Hi,
>>
>> On Fri, Jun 7, 2013 at 5:52 PM, Kingsley Idehen 
>> wrote:
>>>
>>> There have been a few recent threads on the LOD and Semantic
>>> Web mailing lists that boil down to the fundamental issues of
>>> profitability, business models, and Linked Data.
>>>
>>> Situation Analysis
>>> ==
>>>
>>> Business Model Issue
>>> 
>>>
>>> The problem with "Data"-oriented business models is that you
>>> ultimately have to deal with the issue of wholesale data copying
>>> without attribution. That's the key issue; everything else is
>>> a futile dance around this concern.
>>
>> Why do you think that attribution is the key issue with data oriented
>> businesses?
>
> Its the key to provenance. It's the key making all contributors to the data
> value chain visible.

I don't disagree that attribution and provenance are important,
especially for Open Data, but also whenever it becomes important to
understand sources of data.

> As I've already stated, the big problem here is wholesale copying and
> reproduction without attribution. Every data publisher has to deal with this
> problem, at some point, when crafting a data oriented business model.

Every data publisher that aggregates or collects data from other
sources certainly needs to understand -- for their own workflow --
where data originates.

>> I've spoken with a number of firms who have business models based on
>> data supply and have never once heard attribution being mentioned as
>> an issue for themselves or their customers. So I'm curious why you
>> think this is a problem.
>
> And are those data suppliers conforming to patterns such as those associated
> with publicly available Linked Open Data? Can they provide open access to
> data and actually have a functional business model based on the
> aforementioned style of data publication?

No they weren't using Linked Open Data. No they weren't publishing
open data (it was commercially licensed for the most part). But they
all had successful business models.

But I understood you to be making a general statement about a key
issue that is common to all data business models, one that Linked Data
then solves.

I agree that every data aggregator needs to understand their workflow,
to manage their own processes. I agree that publishing details of data
provenance and attribution is important, particularly for Open Data.
And absolutely agree that Linked Data can help there.

Maybe I'm misunderstanding your point but I'm not seeing evidence that
attribution is a key business issue that data businesses have to solve
in order to be successful. You said that "everything else is a futile
dance around this concern" which I found surprising, so I'm curious
about the evidence. I'm curious about the general business drivers,
regardless of whether the data is Linked or Open.

Making the data Linked is a solution; making the data Open might also
be a solution, but also presents its own challenges.

Sometimes it's important to know how the sausage is made, sometimes it's not.

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Business Models, Profitability, and Linked Data

2013-06-10 Thread Leigh Dodds
Hi,

On Mon, Jun 10, 2013 at 9:26 AM, Víctor Rodríguez Doncel
 wrote:
>
> While attribution may not be hindering any business, it would be nice being
> able to specify in a machine readable form the way it should be made...

Yes, there's definitely scope to do more there, and it's something I'm
working on at the moment.

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Business Models, Profitability, and Linked Data

2013-06-10 Thread Leigh Dodds
Hi,

On Fri, Jun 7, 2013 at 5:52 PM, Kingsley Idehen  wrote:
> There have been a few recent threads on the LOD and Semantic
> Web mailing lists that boil down to the fundamental issues of
> profitability, business models, and Linked Data.
>
> Situation Analysis
> ==
>
> Business Model Issue
> 
>
> The problem with "Data"-oriented business models is that you
> ultimately have to deal with the issue of wholesale data copying
> without attribution. That's the key issue; everything else is
> a futile dance around this concern.

Why do you think that attribution is the key issue with data oriented
businesses?

I've spoken with a number of firms who have business models based on
data supply and have never once heard attribution being mentioned as
an issue for themselves or their customers. So I'm curious why you
think this is a problem.

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: There's No Money in Linked Data

2013-05-18 Thread Leigh Dodds
Hi Pascal,

It's good to draw attention to these issues. At ISWC 2009 Tom Heath,
Kaitlin Thaney, Jordan Hatcher and I ran a workshop on legal and
social issues for data sharing [1, 2]. Key themes from the workshop
were around the importance of clear licensing, norms for attribution,
and including machine-readable licence data.

At the time I did a survey of the current state of licensing of the
Linked Data cloud; there's a write-up [3] and diagram [4].

Looking over your analysis, I don't think the picture has changed
considerably since then. We need to work harder to ensure that data is
clearly licensed. But this is a general problem for Open Data, not
just Linked Open Data.

You don't say in your paper how you did the analysis. Did you use the
metadata from the LOD group on Datahub [5]? At the time I had to do
mine manually, but it wouldn't be hard to automate some of this now,
perhaps to create a regularly updated set of indicators.

One criterion that agents might apply when conducting "Follow Your
Nose" consumption of Linked Data is the licensing of the target data,
e.g. ignore links to datasets that are not licensed for your
particular usage.

Cheers,

L.

[1]. http://opendatacommons.org/events/iswc-2009-legal-social-sharing-data-web/
[2]. http://blog.okfn.org/2009/11/05/slides-from-open-data-session-at-iswc-2009/
[3]. http://blog.ldodds.com/2010/01/01/rights-statements-on-the-web-of-data/
[4]. http://www.flickr.com/photos/ldodds/4043803502/
[5]. http://datahub.io/group/lodcloud

On Sat, May 18, 2013 at 3:15 AM, Pascal Hitzler
 wrote:
> We just finished a piece indicating serious legal issues regarding the
> commercialization of Linked Data - this may be of general interest, hence
> the post. We hope to stimulate discussions on this issue (hence the
> provokative title).
>
> Available from
> http://knoesis.wright.edu/faculty/pascal/pub/nomoneylod.pdf
>
> Abstract.
> Linked Data (LD) has been an active research area for more than 6 years and
> many aspects about publishing, retrieving, linking, and cleaning Linked Data
> have been investigated. There seems to be a broad and general agreement that
> in principle LD datasets can be very useful for solving a wide variety of
> problems ranging from practical industrial analytics to highly specific
> research problems. Having these notions in mind, we started exploring the
> use of notable LD datasets such as DBpedia, Freebase, Geonames and others
> for a commercial application. However, it turns out that using these
> datasets in realistic settings is not always easy. Surprisingly, in many
> cases the underlying issues are not technical but legal barriers erected by
> the LD data publishers. In this paper we argue that these barriers are often
> not justified, detrimental to both data publishers and users, and are often
> built without much consideration of their consequences.
>
> Authors:
> Prateek Jain, Pascal Hitzler, Krzysztof Janowicz, Chitra Venkatramani
>
> --
> Prof. Dr. Pascal Hitzler
> Kno.e.sis Center, Wright State University, Dayton, OH
> pas...@pascal-hitzler.de   http://www.knoesis.org/pascal/
> Semantic Web Textbook: http://www.semantic-web-book.org
> Semantic Web Journal: http://www.semantic-web-journal.net
>
>



-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Summarising dbpedia country coverage

2013-05-15 Thread Leigh Dodds
Thought this might be interesting for people on here. I wrote a script
to summarise the geographic coverage of dbpedia:

http://blog.ldodds.com/2013/05/15/summarising-geographic-coverage-of-dbpedia-and-wikipedia/

Lots more potential here, both for creating proper Linked Data for the
results, and for further analysis.

What other Linked Data sets include a range of geographic locations?

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com




Re: Content negotiation negotiation

2013-04-24 Thread Leigh Dodds
The first two indicate that responses vary based on the Accept header,
as both have a Vary: Accept header. The third doesn't, so it doesn't
support negotiation.

None of the URLs advertise what formats are available. That's not a
requirement for content negotiation, although it'd be useful.
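
A quick manual check is to ask for the headers and look for Vary (the
URL here is a placeholder):

curl -s -I -H "Accept: application/rdf+xml" http://example.org/id/thing

If the response includes "Vary: Accept" then the representation is
negotiated on the Accept header.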

Cheers,

L.


On Wed, Apr 24, 2013 at 2:17 PM, Phillip Lord
 wrote:
>
> Hmmm.
>
> So, taking a look at these three URLs, can you tell me
> a) which of these support content negotiation, and b) what formats
> they provide.
>
> http://dx.doi.org/10.3390/fi4041004
> http://dx.doi.org/10.1594/PANGAEA.527932
> http://dx.doi.org/10.1000/182
>
> I tried vapor -- it seems to work by probing with application/rdf+xml,
> but it appears to work by probing. I can't find any of the headers
> mentioned either, although perhaps I am looking wrongly.
>
> Phil
>
>
>
> Hugh Glaser  writes:
>
>> Ah of course - thanks Mark, silly me.
>> So I look at the Link: header for something like
>> curl -L -i http://dbpedia.org/resource/Luton
>> Which gives me the information I want.
>>
>> Anyone got any offers for how I would use Linked Data to get this into my 
>> RDF store?
>>
>> So then I can do things something like:
>> SELECT ?type ?source FROM { <http://dbpedia.org/resource/Luton> ?foo ?file .
>> ?file ?type ?source . }
>> (I think).
>>
>> I suppose it would need to actually be returned from a URI at the site - I
>> can't get a header as URI resolution - right?
>> And I would need an ontology?
>>
>> Cheers.
>>
>> On 23 Apr 2013, at 19:49, Mark Baker 
>>  wrote:
>>
>>> On Tue, Apr 23, 2013 at 1:42 PM, Hugh Glaser  wrote:
>>>>
>>>> On 22 Apr 2013, at 12:18, Phillip Lord  
>>>> wrote:
>>>> 
>>>>> We need to check for content negotiation; I'm not clear, though, how we
>>>>> are supposed to know what forms of content are available. Is there
>>>>> anyway we can tell from your website that content negotiation is
>>>>> possible?
>>>> Ah, and interesting question.
>>>> I don't know of any, but maybe someone else does?
>>>
>>> Client-side conneg, look for Link rel=alternate headers in response
>>>
>>> Server-side conneg, look for "Vary: Content-Type" in response
>>>
>>> Mark.
>>
>>
>>
>
> --
> Phillip Lord,   Phone: +44 (0) 191 222 7827
> Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk
> School of Computing Science,
> http://homepages.cs.ncl.ac.uk/phillip.lord
> Room 914 Claremont Tower,   skype: russet_apples
> Newcastle University,   twitter: phillord
> NE1 7RU
>



-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: SPARQL, philosophy n'stuff..

2013-04-22 Thread Leigh Dodds
Hi Barry,

On Mon, Apr 22, 2013 at 9:17 AM, Barry Norton  wrote:
>
> I'm sorry, but you seem to have misunderstood the use of a graph URI
> parameter in indirect graph addressing for GSP.
>
> I wish all GSP actions addressed graphs directly, Queries were all GETs, and
> that Updates were all PATCH documents, but a degree of pragmatism has been
> applied.

I think Mark's point was that SPARQL 1.1/GSP specify fixed query
parameters (query, graph) in the specification, requiring clients to
construct URIs rather than using hypermedia.

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Fwd: Re: Public SPARQL endpoints:managing (mis)-use and communicating limits to users.

2013-04-19 Thread Leigh Dodds
Hi,

On Fri, Apr 19, 2013 at 11:55 AM, Kingsley Idehen
 wrote:
> ...
> If you have OFFSET and LIMIT in use, you can reflect the new state of
> affairs when the next GET is performed i.e, lets say you have OFFSET 20 and
> LIMIT 20, the URL with OFFSET 40 is the request for the next batch of
> results from the solution and the one that would reflect the new state of
> affairs.

This requires the client to page from the outset. Ideally there would
be a way for a server to force paging where it needed to. At the
moment though there's no way for a server to indicate that it has done
that, e.g. by including a "next page" link in the results.

This also moves us towards a more hypermedia approach where clients
don't need to construct URIs: the server provides them.

The community could decide on some extension elements/keys that could
be used in SPARQL XML/JSON results formats to achieve this. If the
link element in the existing format were a little more flexible [1]
then this option would be available. We could still use the atom link
element as an extension, though, with existing rel values (which
addresses other use cases).
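
A sketch of what that could look like in the XML results format,
treating the atom link element as an extension (this is not part of
the standard format):

<sparql xmlns="http://www.w3.org/2005/sparql-results#"
        xmlns:atom="http://www.w3.org/2005/Atom">
  <head>
    <variable name="s"/>
    <atom:link rel="next"
        href="http://example.org/sparql?query=...&amp;offset=40"/>
  </head>
  <results>
    ...
  </results>
</sparql>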


[1]. http://www.w3.org/2009/sparql/wiki/Feature:Query_response_linking

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Re: Public SPARQL endpoints:managing (mis)-use and communicating limits to users.

2013-04-19 Thread Leigh Dodds
Hi,

On Fri, Apr 19, 2013 at 8:49 AM, Jerven Bolleman
 wrote:
>  Original Message 
> Subject: Re: Public SPARQL endpoints:managing (mis)-use and communicating
> limits to users.
> Date: Thu, 18 Apr 2013 23:21:46 +0200
> From: Jerven Bolleman 
> To: Rob Warren 
>
> Hi Rob,
>
> There is a fundamental problem with HTTP status codes.
> Lets say a user submits a complex but small sparql request.
>
> My server sees the syntax is good and starts to reply in good faith.
> This means the server starts the http response and sends an 200 OK
> Some results are being send
> However, during the evaluation the server gets an exception.
> What to do? I can't change the status code anymore...
>
> Waiting until server know the query can be answered is not feasible because
> that would mean
> the server can't start giving replies as soon as possible. Which likely
> leads
> to connection timeouts. Using HTTP status codes when responses are likely to
> be larger
> than 1 MB works badly in practice.

That's not really true. I can download multi-gigabyte files over HTTP
without any problem. The issue is more with servers sending a 200 OK
response when they can't actually guarantee that they can fulfil the
request.

There are always going to be things like hardware failures that might
mean requests fail, e.g. leading to truncated or no responses, but
servers shouldn't be sending 200 responses if there are expected
failure conditions. For example, timing out a query after a 200
response has been sent seems wrong to me.

There are work arounds:

* Response formats, particularly those intended for streaming, could
support markup that indicates that results have been terminated,
perhaps with pointers to the next page. SPARQL XML & JSON could be
extended in this way; it's difficult to do with RDF/XML, etc. This
would allow a server to terminate streaming but still give a client a
valid response, potentially with a link to further results.

* Not responding directly at all: serve a 202 Accepted for (expensive)
queries and route the user to another resource from which they can
fetch the query results. Data can be prepared asynchronously and the
server can respond correctly for a timed-out query.

The latter wouldn't necessarily involve changes to SPARQL formats or
the protocol.
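
A sketch of that asynchronous flow (the host and URLs are
illustrative):

POST /sparql HTTP/1.1
Host: example.org
Content-Type: application/sparql-query

SELECT ...

HTTP/1.1 202 Accepted
Location: http://example.org/queries/1234

The client then polls http://example.org/queries/1234, which can serve
a 200 with the full results once they're ready, or a proper error
status if the query timed out or failed.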

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Restpark - Minimal RESTful API for querying RDF triples

2013-04-18 Thread Leigh Dodds
Hi,

On Thu, Apr 18, 2013 at 4:23 PM, Alan Ruttenberg
 wrote:
> Luca,
>
> In the past I have suggested a simple way to create simple restful services
> based on SPARQL. This could easily be implemented as an extension to your
> beginning of restpark.
>
> The idea is to have the definition of a service be a sparql query with
> blanks, and possibly some extra annotations.

That's essentially what we called "SPARQL Stored Procedures" in Kasabi:
SPARQL queries bound to URIs with parameters injected from the query
string. We also had transformation of results using XSLT. Swirrl have
implemented this as "named queries" [1], and I used their name when
writing up the pattern [2].

One set of annotations I'm planning on adding to sparql-doc [3] is the
parameters that need to be injected and, optionally, a path to bind
the query to when mounted. The goal is to allow a package of
queries to be mounted at a URL and used as named queries.
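
As a sketch of the idea (this isn't the actual sparql-doc syntax, just
an illustration of the shape such annotations might take):

# @param type - injected from the 'type' query string parameter
# @path /queries/by-type
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?resource WHERE {
  ?resource rdf:type ?type .
}

A GET on /queries/by-type?type=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FPerson
would bind ?type and run the query.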

[1]. http://blog.swirrl.com/articles/new-publishmydata-feature-named-queries
[2]. http://patterns.dataincubator.org/book/named-query.html
[3]. http://blog.ldodds.com/2013/01/30/sparql-doc/

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: SPARQL, philosophy n'stuff..

2013-04-18 Thread Leigh Dodds
Hi,

On Thu, Apr 18, 2013 at 12:54 PM, Kingsley Idehen
 wrote:
> On 4/18/13 7:44 AM, Leigh Dodds wrote:
>>
>> But I bet you learnt it in stages using a pedagogical approach that
>> guided you towards the basic building blocks first. And I expect there
>> were other reasons -- network effects -- why learning English was
>> worth up-front effort. We're not there with SPARQL.
>
> Do you have an example of any declarative query language that meets the goal
> in question, assuming I am interpreting your comments accurately?

I think you misunderstood me. I don't think any declarative query
language (or technology) can meet the wider goal, because many of the
issues are non-technical.

My specific point in that comment was that SPARQL is still not widely
deployed or in-use enough that someone might just sit down and learn
it simply because it's a core skill or technology. That's changing, but
it's still very far from, e.g. SQL, in that regard.

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: SPARQL, philosophy n'stuff..

2013-04-18 Thread Leigh Dodds
Hi,

On Thu, Apr 18, 2013 at 12:21 PM, Jürgen Jakobitsch SWC
 wrote:
> i think there's yet another point overlooked :
>
> what we are trying to do is to create barrier free means of
> communication on data level in a globalized world. this effort requires
> a common language.

Did you mean a common *query* language?

I'm not sure I agree. Mainly because no-one has yet created such a
thing, so we might find out that the bigger challenges are elsewhere.
I guess time will tell :)

I used to think that there might be convergence around common query
languages for APIs, but there's little evidence of that happening.

> my personal view is that providing simplier subsets of such a language
> (an api) only leads to the fact that nobody will learn the language (see
> pocket calculators,...), although there's hardly anything easier than to
> write a sparql query, it can be learned in a day.
>
> i do not really understand where this "the developer can't sparql, so
> let's provide something similar (easier)" - idea comes from.

Well if our goal is to create barrier free data sharing and re-use
then we should focus on achieving that regardless of technology, and
should be open to a variety of approaches. We can't decide that SPARQL
is the right solution and then just expect everyone to learn it.

Maybe it only takes a day to learn SPARQL, but personally I find that
usually I can get up to speed with a custom API in a few minutes, so
that's even faster.

And it turns out that often the issue isn't just learning SPARQL
alone, it's also learning the data model [1].

> did anyone provide me with a wrapper for the english language? nope, had
> to learn it.

But I bet you learnt it in stages using a pedagogical approach that
guided you towards the basic building blocks first. And I expect there
were other reasons -- network effects -- why learning English was
worth the up-front effort. We're not there with SPARQL.

Cheers,

L.

[1]. http://blog.ldodds.com/2011/06/16/giving-rdf-datasets-more-affordance/

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Restpark - Minimal RESTful API for querying RDF triples

2013-04-18 Thread Leigh Dodds
Hi,

On Thu, Apr 18, 2013 at 12:01 PM, Luca Matteis  wrote:
> Thanks Paul,
>
> That is exactly what my point was entirely about. Many service don't expose
> their SQL interface, so why should Linked Data?
>
> Regarding this Linked Data API, it seems to still require a SPARQL endpoint.
> In fact it states that it is a proxy for SPARQL. Would it simply be possible
> to implement this API without SPARQL on top of a regular database that
> contains triples?

While the specification talks about mapping to a SPARQL endpoint, the
processing model would potentially allow you to use different
backends. Servicing a Linked Data API request involves several steps:

1. Mapping the request to a query (currently a SPARQL SELECT) to
identify the list of resources of interest
2. Mapping the request to a query (currently a SPARQL CONSTRUCT) to
produce a description of each item on the list
3. Serialising the results

Broadly speaking you could swap out steps 1 & 2.

For example you could map the first step to a search query that
produces a list of results from a search engine, or a SQL query that
extracts the resources from a database. You could map the second step
to requests to a document database that fetches pre-existing
descriptions of each item.

The API supports a number of filtering and sorting options, which will
add some complexity to both stages, but I don't think there are any
show-stoppers in there.

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Restpark - Minimal RESTful API for querying RDF triples

2013-04-18 Thread Leigh Dodds
Hi Paul,

On Thu, Apr 18, 2013 at 11:54 AM, Paul Groth  wrote:
> Hi Leigh
>
> The problem is that it's really easy to write sparql queries that are
> inefficient when you don't know the data [1] and even when you do the
> flexibility of sparql means that people can easily end-up writing complex
> hard to process queries.

Totally agree with your assessment, I was just observing that there
are a number of factors in play which result in a design trade-off,
meaning there is no right answer or winning solution.

My experience is much the same as yours, which is why I've been
experimenting with APIs over SPARQL and worked with Jeni and Dave on
the design of the Linked Data API. I think it's pretty good, but don't
think we've done a good job yet of documenting it. I also suspect
there's an even simpler subset or profile in there, but I've not had
the time yet to dig through and see what kinds of APIs people are
building with it.

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Restpark - Minimal RESTful API for querying RDF triples

2013-04-18 Thread Leigh Dodds
Hi Hugh,

On Thu, Apr 18, 2013 at 10:56 AM, Hugh Glaser  wrote:
> (Yes, Linked Data API is cool!, and thanks for getting back to the main 
> subject, although I somehow doubt anyone is expecting to read anything about 
> it in this thread now :-) )

I'm still hoping we might return to the original topic :)

What this discussion, and in fact most related discussions about
SPARQL as a web service, seems to overlook is that there are several
different issues in play here:

* Whether SPARQL is more accessible to developers than other forms of
web API. For example, is the learning curve harder or easier?

* Whether offering query languages like SPARQL, SQL, YQL, etc is a
sensible option when offering a public API and what kinds of quality
of service can be wrapped around that. Or do other forms of API offer
more options for providing quality of service by trading off power of
query expression?

* Techniques for making SPARQL endpoints scale in scenarios where the
typical query patterns are unknown (which is true of most public
endpoints). Scaling and quality of service considerations for a public
web service and a private enterprise endpoint are different. Not all
of the techniques that people use, e.g. query timeouts or partial
results, are actually standardised, so there's plenty of scope for
more exploration here.

* Whether SPARQL is the only query language we need for RDF, or for
more general graph databases, or whether there is room for other
forms of graph query language.

The Linked Data API was designed to provide a simplified read-only API
that is less expressive than full SPARQL. The goals were to make
something easier to use, but not preclude helping developers towards
using full SPARQL if that's what they wanted. It also fills a
shortfall in most Linked Data publishing approaches, i.e. that
getting lists of things, possibly as a paged list, possibly with some
simple filtering, is not easy. We don't need a full graph query
language for that. The Linked Data Platform is looking at that area
too, but it's also got a lot more requirements it's trying to address.

Cheers,

L.

--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Coping with gaps in linked data (UK postcodes)?

2013-04-12 Thread Leigh Dodds
Hi Stephen,

Really your only option is to mint your own URIs, but then later build
in links to the official URIs if/when they become available:

http://patterns.dataincubator.org/book/proxy-uris.html

The postcodes make good Natural Keys for building your URIs. This will
help to automatically generate links not just to your data, but also
to the official version when available.
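
For example (your own domain stands in for example.org here, and the
official URI is entirely hypothetical, since no NI scheme exists yet):

@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://data.example.org/id/postcode/BT11AA>
    owl:sameAs <http://data.nisra.example/id/postcode/BT11AA> .

Because the postcode itself appears in both URIs, the equivalence
statements can be generated mechanically once the official URIs are
published.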

Cheers,

L.


On Fri, Apr 12, 2013 at 2:08 PM, Cresswell, Stephen
 wrote:
>
> Hello,
>
> In our application, we wish to publish linked data, including addresses
> with postcode URIs.  The postcode URIs provided by Ordnance Survey for
> England, Wales and Scotland are really useful, with the postcode URIs
> dereferencing to provide useful information including co-ordinates.
>
> However, the geographical extent of our data includes Northern Ireland,
> which is outside the scope of (British) Ordnance Survey and not included
> in their dataset.  The equivalent postcode data for Northern Ireland is
> available from the NI government body NISRA, but it is not on an open
> license.
>
> This leaves us with a question about what URIs to use for Northern
> Ireland postcodes, as we know of no existing URI scheme for Northern
> Ireland postcodes.
>
> If we generate postcode URIs using the same pattern as the rest of the
> UK, those URIs would be in the Ordnance Survey's domain, but NI
> postcodes are not actually in their dataset and they won't dereference,
> so that seems wrong.
>
> If we are to have dereferencable URIs, we would presumably have to host
> them in our own domain, which is definitely not the most appropriate
> place for them to be.  If we buy a license to use the NI postcode data,
> we still wouldn't be able to republish it as linked data.  Presumably,
> however, there is some geographical information that is open and could
> be published, e.g. courser geographical information based on just the
> postcode district.
>
> Does anyone have any advice on best practice, either for the specific
> problem (NI postcodes) or for the general problem of how to cope with
> URIs based on an existing coding scheme (e.g. postcodes), where the
> published URIs don't cover all of the original codes?
>
> Stephen Cresswell
> The Stationery Office
>
>
> This email is confidential and may also be privileged and/or proprietary to 
> The Stationery Office Limited. It may be read, copied and used only by the 
> intended recipient(s). Any unauthorised use of this email is strictly 
> prohibited. If you have received this email in error please contact us 
> immediately and delete it and any copies you have made. Thank you for your 
> cooperation.
> The Stationery Office Limited is registered in England under Company No. 
> 3049649 at 1-5 Poland Street, London, W1F 8PR
>
>
>



-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Content negotiation for Turtle files

2013-02-06 Thread Leigh Dodds
Hi,

On Wed, Feb 6, 2013 at 9:54 AM, Bernard Vatant
 wrote:
> ...
> But what I still don't understand is the answer of Vapour when requesting
> RDF/XML :
>
> 1st request while dereferencing resource URI without specifying the desired
> content type (HTTP response code should be 303 (redirect)): Passed
> 2nd request while dereferencing resource URI without specifying the desired
> content type (Content type should be 'application/rdf+xml'): Failed
> 2nd request while dereferencing resource URI without specifying the desired
> content type (HTTP response code should be 200): Passed

From a purely HTTP and Content Negotiation point of view, if a client
doesn't specify an Accept header then it's perfectly legitimate for a
server to return a default format of its choosing. I think it could
also decide to serve a 300 status code and prompt the client to choose
an option that's available.

From an interoperability point of view, having a default format that
clients can rely on is reasonable. Until now, RDF/XML has been the
standardised format that we can all rely on, although shortly we may
all collectively decide to prefer Turtle. So ensuring that RDF/XML is
available seems like a reasonable thing for a validator to try and
test for.

But there are several ways that test could have been carried out. E.g.
Vapour could have checked that there was an RDF/XML version and
provided you with some reasons why that would be useful. Perhaps as a
warning, rather than a fail.

The explicit check for RDF/XML being available AND being the default
preference of the server is raising the bar slightly, but it's still
trying to aim for interop.

Personally I think I'd implement this kind of check as "ensure there
is at least one valid RDF serialisation available, either RDF/XML or
Turtle". I wouldn't force a default on a server, particularly as we
know that many clients can consume multiple formats.

This is where automated validation tools have to tread carefully:
while they play an excellent role in encouraging consistency, the
tests they perform and the feedback they give need to have some
nuance.

Cheers,

L.

-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Linked Data Adoption Challenges Poll

2012-09-13 Thread Leigh Dodds
Hi,

You might need to clarify your questions. I can have a guess at what
they mean, but my guesses may not be right.

Presumably you are also targeting this poll at people trying (and
failing) to adopt linked data. In that case you might want to broaden
the base of potential respondents. People on these lists may not
reflect all of the issues.

Cheers,

L.

On Thu, Sep 13, 2012 at 5:34 PM, Kingsley Idehen  wrote:
> All,
>
> I've created a poll oriented towards capturing data about issues that folks
> find most challenging re., Linked Data Adoption.
>
> Please cast your vote as the results will be useful to all Linked Data
> stakeholders.
>
> Link: http://poll.fm/3w0cb .
>
> --
>
> Regards,
>
> Kingsley Idehen
> Founder & CEO
> OpenLink Software
> Company Web: http://www.openlinksw.com
> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca handle: @kidehen
> Google+ Profile: https://plus.google.com/112399767740508618350/about
> LinkedIn Profile: http://www.linkedin.com/in/kidehen
>
>
>
>
>



-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com



Re: Can we create better links by playing games?

2012-06-20 Thread Leigh Dodds
On Wed, Jun 20, 2012 at 2:19 PM, Melvin Carvalho
 wrote:
>
>
> On 20 June 2012 15:11, Kingsley Idehen  wrote:
>>
>> On 6/19/12 3:23 PM, Martin Hepp wrote:
>>>
>>> [1] Games with a Purpose for the Semantic Web, IEEE Intelligent Systems,
>>> Vol. 23, No. 3, pp. 50-60, May/June 2008.
>>
>>
>> Do the games at: http://ontogame.sti2.at/games/, still work? The more data
>> quality oriented games the better re. LOD and the Semantic Web in general.
>>
>> Others: Are there any other games out there?
>
>
> iand is working on a game:
>
> http://blog.iandavis.com/2012/05/21/wolfie/

Is that relevant? :)

L.



Re: Decommissioning a linked data site

2012-06-01 Thread Leigh Dodds
Hi,

On Fri, Jun 1, 2012 at 3:30 PM, Bradley Allen  wrote:
> Leigh- This is great. The question that comes up for me out of what you've
> written for unpublishing brings me back to Antoine's question: is it
> appropriate to use a relation other than owl:sameAs that more specific to
> the domain of the affected datasets being mapped, or is the nature of
> unpublishing such that one would, as opposed to my reasoning earlier, be as
> broad as possible in asserting equivalence, and use owl:sameAs in every such
> case?

Really interesting question, and this might prompt me to revise the pattern :)

So, generally, I advocate using the appropriate equivalence relation
that relates to a specific domain. As I wrote in [1] its best to use
the most appropriate equivalence link, as they have varying semantics.

But for the unpublishing use case I think I'd personally lean towards
*always* using owl:sameAs at least in the case where we are returning
a 301 status code. I've previously come to the conclusion [2] that a
301 implies a sameAs statement. The intent seems very similar to a
sameAs. Rewriting local links to use a new location is very similar to
smushing descriptions in an RDF dataset such that statements only
relate to the new URI.
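
As a minimal sketch of capturing that implication (Python with
requests and rdflib; the old URI is invented, and I'm assuming an
absolute URI in the Location header):

import requests
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

old = "http://old.example.org/id/thing"  # invented
r = requests.head(old, allow_redirects=False, timeout=10)

g = Graph()
if r.status_code == 301:
    # record the 301 as an implied equivalence
    g.add((URIRef(old), OWL.sameAs, URIRef(r.headers["Location"])))
print(g.serialize(format="turtle"))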

However I can see arguments to the effect that the new authority might
have a slightly different definition of a resource than the original
publisher, such that an owl:sameAs might be inappropriate. That's why
I left the advice in the pattern slightly open ended: I think it may
need to be evaluated on a case by case basis, but owl:sameAs seems
like a good workable default to me.

Cheers,

L.

[1]. http://patterns.dataincubator.org/book/equivalence-links.html
[2]. http://www.ldodds.com/blog/2007/03/the-semantics-of-301-moved-permanently/



Re: Decommissioning a linked data site

2012-06-01 Thread Leigh Dodds
Hi,

On Fri, Jun 1, 2012 at 7:34 AM, Antoine Isaac  wrote:
> @Tim:
>
>> For total extra kudos, provide query rewriting rules
>> from yours site to LoC data, linked so that you can write a program
>> to start with a sparql query which fails
>> and figures out from metadata how to turn it into one which works!
>
>
> Is the combination of 301 + owl:sameAs that we have used for RAMEAU, e.g,
> http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11932889r
> good enough?
> Or would you recommend more/different?

I've started to capture some advice here:

http://patterns.dataincubator.org/book/unpublish.html

Cheers,

L



New draft of Linked Data Patterns book

2012-06-01 Thread Leigh Dodds
Hi,

There's a new draft of the Linked Data patterns book available:

http://patterns.dataincubator.org/book/

There have been a number of revisions across the pattern catalogue,
including addition of new introductory sections to each chapter. There
are a total of 12 new patterns, many of which cover data management
patterns relating to use of named graphs.

Cheers,

L.



Re: looking for skos vocabularies

2012-05-17 Thread Leigh Dodds
Hi,

There's a pretty comprehensive set of links available here:

http://www.w3.org/2001/sw/wiki/SKOS/Datasets

Cheers,

L.

On Thu, May 17, 2012 at 4:43 PM, Christian Morbidoni
 wrote:
> Hi,
>
> I've been looking for same example of skos vocabulary to use as a real world
> test case in a project.
> Surprisingly I cannot find so much around...do someone know about an archive
> of skos vocabularies or some good example of skos in use?
> I'm starting to wonder...is people using skos out there?
>
> best,
>
> Christian
>



Re: Layered Data

2012-05-04 Thread Leigh Dodds
Hi Pablo,

On Fri, May 4, 2012 at 10:37 AM, Pablo Mendes  wrote:
>
> Interesting thoughts. It would be nice to have some "default" widely
> accepted facets within an extensible model.

Thanks.

> I had a somewhat related discussion with Niko Popitsch last year on how
> "database views" could look like in the LOD world. The discussion was a
> follow up to his talk:
> Keep Your Triples Together: Modeling a RESTful, Layered Linked Data Store
> http://cs.univie.ac.at/research/research-groups/multimedia-information-systems/publikation/infpub/2910/

Thanks for the pointer, I'll take a look :)

Cheers,

L.



Layered Data

2012-05-04 Thread Leigh Dodds
Hi,

I've written up some thoughts on considering datasets as "layers" that
can be combined to create useful aggregations. The concept originated
with Dan Brickley and I see the RDF WG are considering the term as an
alternative to "named graph". My own usage is more general. I thought
I'd share a link here to see what people thought.

The paper is at:

http://ldodds.com/papers/layered-data.html

And a blog post with some commentary here:

http://www.ldodds.com/blog/2012/05/layered-data-a-paper-some-commentary/

Cheers,

L.



Re: Datatypes with no (cool) URI

2012-04-04 Thread Leigh Dodds
(apologies if this is a re-post, I don't think it made it through y'day)

Hi

On Tue, Apr 3, 2012 at 6:29 PM, Dave Reynolds  wrote:
> On 03/04/12 16:38, Sarven Capadisli wrote:
>>
>> On 12-04-03 02:33 PM, Phil Archer wrote:
>>>
>>> I'm hoping for a bit of advice and rather than talk in the usual generic
>>> terms I'll use the actual example I'm working on.
>>>
>>> I want to define the best way to record a person's sex (this is related
>>> to the W3C GLD WG's forthcoming spec on describing a Person [1]). To
>>> encourage interoperability, we want people to use a controlled
>>> vocabulary and there are several that cover this topic.
...
>>
>> Perhaps I'm looking at your problem the wrong way, but have you looked
>> at the SDMX Concepts:
>>
>> http://purl.org/linked-data/sdmx/2009/code#sex
>>
>> -Sarven
>>
>
> I was going to suggest that :)

+1. A custom datatype doesn't seem correct in this case. Treating
gender as a category/classification captures both the essence that
there's more than one category & that people may differ in how they
would assign classifications.
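
For instance, here's a minimal sketch with rdflib (I'm assuming the
usual SDMX-RDF dimension and code namespaces; the person URI is
invented):

from rdflib import Graph, Namespace, URIRef

SDMX_DIM = Namespace("http://purl.org/linked-data/sdmx/2009/dimension#")
SDMX_CODE = Namespace("http://purl.org/linked-data/sdmx/2009/code#")

g = Graph()
# sex as a code-list concept, not a custom datatype
g.add((URIRef("http://example.org/id/person/1"),  # invented
       SDMX_DIM.sex,
       SDMX_CODE["sex-F"]))
print(g.serialize(format="turtle"))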

I wrote a bit about Custom Datatypes here:

http://patterns.dataincubator.org/book/custom-datatype.html

This use case aside, there ought to be more information to guide
people towards how to do this correctly.

See also:

http://www.w3.org/TR/swbp-xsch-datatypes/

Cheers,

L.



Re: Document Action: 'The Hypertext Transfer Protocol (HTTP) Status Code 308 (Permanent Redirect)' to Experimental RFC (draft-reschke-http-status-308-07.txt)

2012-03-27 Thread Leigh Dodds
Hi James,

On Tue, Mar 27, 2012 at 2:15 AM, James Leigh  wrote:
> Could this 308 (Permanent Redirect) give us a way to cache a probe URI's
> definition document location?
>
> An issue people have with httpRange-14 is that 303 redirects can't be
> cached. If we could agree to use a 308 response as a cache-able
> alternative to 303, we could reduce server load and speed client URI
> processing (by caching the result of a probe URI).

I'm missing how that would help; could you elaborate? The semantics of
that response code are that the resource has permanently moved, which
seems very different to a 303.

A strict reading and application of the rules would suggest that the
new URI should be considered a replacement of the original, so sameAs,
rather than "a description of".

L.



Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

2012-03-27 Thread Leigh Dodds
Hi,

On Tue, Mar 27, 2012 at 2:02 PM, Jonathan A Rees  wrote:
> ...
> There is a difference, since what is described could be an IR that
> does not have the description as content. A prime example is any DOI,
> e.g.
>
> http://dx.doi.org/10.1371/journal.pcbi.1000462
>
> (try doing conneg for RDF). The identified resource is an IR as you
> suggest, but the representation (after the 303 redirect) is not its
> content.

A couple of comments here:

1. It's not any DOI. I believe CrossRef are still the only registrar
that support this, but I might have missed an announcement. That's
still 50m DOIs though (quick probe sketch below).

2. Are you sure it's an Information Resource? The DOI handbook [1]
notes that while typically used to identify intellectual property a
DOI can be used to identify anything. The CrossRef guidelines [2]
explain that "[a]s a matter of current policy, the CrossRef DOI
identifies the work, not its various potential manifestations...".

Is a FRBR work an Information Resource? Personally I'd say not, but
others may disagree. But as Dan Brickley has noted elsewhere in the
discussion, there's other nuances to take into account.
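
On point 1, here's roughly how I'd probe the conneg support (a sketch
with Python's requests library, expecting the 303 redirect you
describe):

import requests

doi = "http://dx.doi.org/10.1371/journal.pcbi.1000462"
r = requests.get(doi, headers={"Accept": "application/rdf+xml"},
                 allow_redirects=False, timeout=10)
# expect a 303 pointing at a separate metadata document
print(r.status_code, r.headers.get("Location"))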

[1]. http://www.doi.org/handbook_2000/intro.html#1.6
[2]. http://crossref.org/02publishers/15doi_guidelines.html

Cheers,

L.



Re: What would break? Re: httpRange-14

2012-03-26 Thread Leigh Dodds
Hi,

On Mon, Mar 26, 2012 at 7:59 PM, Kingsley Idehen  wrote:
> On 3/26/12 2:09 PM, Leigh Dodds wrote:
>>
>> Hi Kingsley,
>>
>> On Mon, Mar 26, 2012 at 6:38 PM, Kingsley Idehen
>>  wrote:
>>>
>>> ...
>>> Leigh,
>>>
>>> Everything we've built in the Linked Data realm leverages the findings of
>>> HttpRange-14 re. Name/Address (Reference/Access) disambiguation. Our
>>> Linked
>>> Data clients adhere to these findings. Our Linked Data servers do the
>>> same.
>>
>> By "we" I assume you mean OpenLink. Here's where I asked the original
>> question [1]. Handily Ian Davis published an example resource that
>> returns a 200 OK when you de-reference it [2].
>
> Support was done (basically reusing our old internal redirection code)
> whenever that post was made by Ian.
>
>>
>> I just tested that in URI Burner [3] and it gave me broadly what I'd
>> expect, i.e. the resources mentioned in the resulting RDF. I didn't
>> see any visible breakage. Am I seeing fall-back behaviour?
>
>
> As per comment above its implemented. We have our own heuristic for handling
> self-describing resources. My concern is that what we've done isn't the norm
> i.e., I don't see others working that way, instinctively. You have to be
> over the Linked Data comprehension hump to be in a position emulate what
> we've done.

OK, I thought you might have done, so thanks for the confirmation. But
this further demonstrates that we don't necessarily need redirects.

>> 
>> Are people really testing status codes and changing subsequent
>> processing behaviour because of that? It looks like there's little or
>> no breakage in Sindice for example [3].
>>
Based on Tim's comments he has been doing that; are other people doing
the same? And if we're not, then who is this ruling benefiting?
>
> We do the same, but we also go beyond (i.e., what you call a fall-back).

Would you care to elaborate on that? i.e: what inferences are you
deriving from the protocol interaction?

I can see that for a .txt document you are inferring that its a
foaf:Document [1].

I'm still also interested to hear from others.

[1]. 
http://linkeddata.uriburner.com/about/html/http/www.gutenberg.org/files/76/76.txt

Cheers,

L.



Re: What would break? Re: httpRange-14

2012-03-26 Thread Leigh Dodds
Hi Kingsley,

On Mon, Mar 26, 2012 at 6:38 PM, Kingsley Idehen  wrote:
> ...
> Leigh,
>
> Everything we've built in the Linked Data realm leverages the findings of
> HttpRange-14 re. Name/Address (Reference/Access) disambiguation. Our Linked
> Data clients adhere to these findings. Our Linked Data servers do the same.

By "we" I assume you mean OpenLink. Here's where I asked the original
question [1]. Handily Ian Davis published an example resource that
returns a 200 OK when you de-reference it [2].

I just tested that in URI Burner [3] and it gave me broadly what I'd
expect, i.e. the resources mentioned in the resulting RDF. I didn't
see any visible breakage. Am I seeing fall-back behaviour?

To answer your other question, I do understand the benefits that can
acrue from having separate URIs for a resource and its description. I
also see arguments for not always requiring both.

As a wider comment and question to the list, I'll freely admit that
what I've always done when fetching Linked Data is let my HTTP library
just follow redirects. Not to deal with 303s specifically, but because
that's just good user agent behaviour.

I've always assumed that everyone else does the same. But maybe I'm
wrong or in the minority.
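
To spell out the difference (a sketch with Python's requests library,
using Ian's toucan URI from [2]):

import requests

uri = "http://iandavis.com/2010/303/toucan"

# (a) what I do: let the library follow redirects, use what comes back
r = requests.get(uri, headers={"Accept": "text/turtle"}, timeout=10)
print(r.status_code, r.headers.get("Content-Type"))

# (b) what strict httpRange-14 processing requires: inspect the chain
# for a 303 before deciding what the original URI named
saw_303 = any(resp.status_code == 303 for resp in r.history)
print("303 seen along the way:", saw_303)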

Are people really testing status codes and changing subsequent
processing behaviour because of that? It looks like there's little or
no breakage in Sindice for example [3].

Based on Tim's comments he has been doing that; are other people doing
the same? And if we're not, then who is this ruling benefiting?

Tim, could you share more about what application behaviour your
inferences support? Are those there to support specific features for
users?

Cheers,

L.

[1]. http://www.mail-archive.com/public-lod@w3.org/msg06735.html
[2]. http://iandavis.com/2010/303/toucan
[3]. 
http://linkeddata.uriburner.com/about/html/http/iandavis.com/2010/303/toucan
[4]. http://www.mail-archive.com/public-lod@w3.org/msg06746.html



Re: Middle ground change proposal for httpRange-14

2012-03-26 Thread Leigh Dodds
Hi David,

On Sun, Mar 25, 2012 at 6:50 PM, David Wood  wrote:
> Hi David,
>
> *sigh*.  I said recently that I would rather chew my arm off than re-engage 
> with http-range-14.  Apparently I have very little self control.
>
> On Mar 25, 2012, at 11:54, David Booth wrote:
>> Jeni, Ian, Leigh, Nick, Hugh, Steve, Masahide, Gregg, Niklas, Jerry,
>> Dave, Bill, Andy, John, Ben, Damian, Thomas, Ed Summers and Davy,
>>
>> I have drafted what I think may represent a middle ground change
>> proposal and I am wondering if something along this line would also meet
>> your concerns:
>> http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol
>>
>
>> Highlights of this proposal:
>> - It enables a URI owner to unambiguously convey any URI definition to
>> an interested client.
>
> +1 to this.  I have long been a fan of unambiguous definition.  The summary 
> argument against is Leigh Dodds'
> "show what is actually broken" approach and the summary argument for is my 
> "we need to invent new ways to associate RDF
> with other Web resources in a discoverable manner to allow for 
> 'follow-your-nose' across islands of Linked Data."

I may be misreading you here, but I'm not against unambiguous
definition. My "show what is actually broken" comment (on twitter) was
essentially the same question as I've asked here before, and as Hugh
asked again recently: what applications currently rely on httprange-14
as it is written today. That's useful so we can get a sense of what
would break with a change. So far there's been 2 examples I think.

That's in contrast to a lot of publisher data (but granted, not yet
quantified as to how much) that breaks the rules of httprange-14. I'd
prefer to fix that even if at the cost of breaking a few apps. But we
all know there are very, very few apps that consume Linked Data today,
so changing client expectations isn't a massive problem.

Identifying a set of publishing patterns that identify how publishers
can reduce ambiguity, and advice for clients on how to tread carefully
in the face of ambiguity and inconsistency is a better starting point
IMHO. The goal there being to encourage more unambiguous publishing of
data, by demonstrating value at every step.

Cheers,

L.



Re: Change Proposal for HttpRange-14

2012-03-26 Thread Leigh Dodds
Hi Tim,

On Sun, Mar 25, 2012 at 8:26 PM, Tim Berners-Lee  wrote:
> ...
> For example, To take an arbitrary one of the trillions out there, what does
> http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108&pageno=11
>  identify, there being no RDF in it?
> What can I possibly do with that URI if the publisher has not explicitly
> allowed me to use it
> to refer to the online book, under your proposal?

You can do anything you want with it. You could use it to record statements
about your HTTP interactions, e.g. retrieval status & date. Or,
because RDF lets anyone, say anything, anywhere, you could just decide
to use that as the URI for the book and annotate it accordingly. The
obvious caveat and risk is that the publisher might subsequently
disagree with you if they do decide to publish some RDF. I can re-use
your data if I decide that risk is acceptable and we can still
usefully interact.

Even if Gutenberg.org did publish some RDF at that URI, you still have
the risk that they could change their mind at a later date.
httprange-14 doesn't help at all there. Lack of precision and
inconsistency is going to be rife whatever form the URIs or response
codes used.

Encouraging people to say what their URIs refer to is the very first
piece of best practice advice.

L.



Re: Where to put the knowledge you add

2011-10-28 Thread Leigh Dodds
Hi Hugh,

On 12 October 2011 12:55, Hugh Glaser  wrote:
>
> Hi.
>
> I have argued for a long time that the linkage data (in particular owl:sameAs 
> and similar links) should not usually be mixed with the
> knowledge being published.

As an experiment I've added some new dbpedia datasets to Kasabi:

Dbpedia Links:
http://kasabi.com/dataset/dbpedia-links

Which is just the external link datasets (which actually include some
type assertions too)

Dbpedia Core:
http://kasabi.com/dataset/dbpedia-core

Which is just the core english datasets

And then the dbpedia english dataset which layers together these two
into a single dataset:

http://kasabi.com/dataset/dbpedia

That gives some choice over whether you want external links.

I'm also considering some other subsets (e.g. places and people).

To help flag up linksets I've also added a "Linking" category in
Kasabi [1] to group together datasets that purely exist to link between
others.

Cheers,

L.

[1]. http://kasabi.com/browse/datasets/results/og_category%3A5603


-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-21 Thread Leigh Dodds
Hi,

On 21 October 2011 08:47, Dave Reynolds  wrote:
> ...
>> On 20 October 2011 10:34, Dave Reynolds  wrote:
>>>
>>> ...
>>> If you have two resources and later on it turns out you only needed one,
>>> no big deal just declare their equivalence. If you have one resource
>>> where later on it turns out you needed two then you are stuffed.
>>
>> Ed referred to "refactoring". So I'm curious about refactoring from a
>> single URI to two. Are developers necessarily stuffed, if they start
>> with one and later need two?
>>
>> For example, what if I later changed the way I'm serving data to add a
>> Content-Location header (something that Ian has raised in the past,
>> and Michael has mentioned again recently) which points to the source
>> of the data being returned.
>>
>> Within the returned data I can include statements about the document
>> at that URI referred to in the Content-Location header.
>>
>> Doesn't that kind of refactoring help?
>
> Helps yes, but I don't think it solves everything.
>
> Suppose you have been using http://example.com/lovelypictureofm31 to denote
> M31. Some data consumers use your URI to link their data on M31 to it. Some
> other consumers started linking to it in HTML as an IR (because they like
> the picture and the accompanying information, even though they don't care
> about the RDF). Now you have two groups of users treating the URI in
> different ways. This probably doesn't matter right now but if you decide
> later on you need to separate them then you can't introduce a new URI
> (whether via 303 or content-location header) without breaking one or other
> use. Not the end of the world but it's not a refactoring if the test cases
> break :)
>
> Does that make sense?

No, I'm still not clear.

If I retain the original URI as the identifier for the galaxy and add
either a redirect or a Content-Location, then I don't see how I break
those linking their data to it as their statements are still made
about the original URI.

But I don't see how I'm breaking people linking to it as if it were an
IR. That group of people are using my resource ambiguously in the
first place. Their links will also still resolve to the same content.

L.


-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-21 Thread Leigh Dodds
Hi,

On 20 October 2011 23:19, Kingsley Idehen  wrote:
> On 10/20/11 5:31 PM, Dave Reynolds wrote:
>>
>> What's more I really don't think the issues is about not understanding
>> about the distinction (at least in the clear cut cases). Most people I
>> talk to grok the distinction, the hard bit is understanding why 303
>> redirects is a sensible way of making it and caring about it enough to
>> put those in place.
>
> What about separating the concept of "indirection" from its actual
> mechanics? Thus, conversations about benefits will then have the freedom to
> blossom.
>
> Here's a short list of immediately obvious benefits re. Linked Data (at any
> scale):
>
> 1. access to data via data source names -- millions of developers world wide
> already do this with ODBC, JDBC, ADO.NET, OLE DB etc.. the only issue is
> that they are confined to relational database access and all its
> shortcomings
>
> 2. integration of heterogeneous data sources -- the ability to coherently
> source and merge disparately shaped data culled from a myriad of data
> sources (e.g. blogs, wikis, calendars, social media spaces and networks, and
> anything else that's accessible by name or address reference on a network)
>
> 3. crawling and indexing across heterogeneous data sources -- where the end
> product is persistence to a graph model database or store that supports
> declarative query language access via SPARQL (or even better a combination
> of SPARQL and SQL)
>
> 4. etc...
>
> Why is all of this important?
> Data access, integration, and management has been a problem that's straddled
> every stage of computer industry evolution. Managers and end-users always
> think about data conceptually, but continue to be forced to deal with
> access, integration, and management in application logic oriented ways. In a
> nutshell, applications have been silo vectors forever, and in doing so they
> stunt the true potential of computing which (IMHO) is ultimately about our
> collective quests for improved productivity.
>
> No matter what we do, there are only 24 hrs in a day. Most humans taper out
> at 5-6 hrs before physiological system faults kick in, hence our implicit
> dependency of computers for handling voluminous and repetitive tasks.
>
> Are we there yet?
> Much closer that most imagine. Our biggest hurdle (as a community of Linked
> Data oriented professionals) is a protracted struggle re. separating
> concepts from implementation details. We burn too much time fighting
> implementation details oriented battles at the expense of grasping core
> concepts.

Maybe I'm wrong, but I think people, especially on this list,
understand the overall benefits you itemize. The reason we talk
about implementation details is that they're important to help people adopt
the technology: we need specific examples.

We get the benefits you describe from inter-linked dereferenceable
URIs, regardless of what format or technology we use to achieve it.
Using the RDF model brings additional benefits.

What I'm trying to draw out in this particular thread is specific
benefits the #/303 additional abstraction brings. At the moment, they
seem pretty small in comparison to the fantastic benefits we get from
data integrated into the web.

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-21 Thread Leigh Dodds
Hi Dave,

Thanks for the response, there's some good examples in there. I'm glad
that this thread is bearing fruit :)

I had a question about one aspect, please excuse the clipping:

On 20 October 2011 10:34, Dave Reynolds  wrote:
> ...
> If you have two resources and later on it turns out you only needed one,
> no big deal just declare their equivalence. If you have one resource
> where later on it turns out you needed two then you are stuffed.

Ed referred to "refactoring". So I'm curious about refactoring from a
single URI to two. Are developers necessarily stuffed, if they start
with one and later need two?

For example, what if I later changed the way I'm serving data to add a
Content-Location header (something that Ian has raised in the past,
and Michael has mentioned again recently) which points to the source
of the data being returned.

Within the returned data I can include statements about the document
at that URI referred to in the Content-Location header.

Doesn't that kind of refactoring help?

Presumably I could also just drop in a redirect and adopt the current
303 pattern without breaking anything?
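
To be concrete about the Content-Location variant, a sketch
(Python/requests; the URIs are invented):

import requests

# the thing URI serves RDF directly, but names the describing document
r = requests.get("http://example.org/id/galaxy/m31",  # invented
                 headers={"Accept": "text/turtle"}, timeout=10)
doc = r.headers.get("Content-Location")
# e.g. doc == "http://example.org/doc/galaxy/m31"; the Turtle body can
# then make statements about that document URI (licence, modification
# date) without touching the thing URI
print(r.status_code, doc)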

Again, I'm probably missing something, but I'm happy to admit
ignorance if that draws out some useful discussion :)

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-20 Thread Leigh Dodds
Hi,

On 20 October 2011 13:25, Ed Summers  wrote:
> On Wed, Oct 19, 2011 at 12:59 PM, Leigh Dodds  wrote:
>> So, can we turn things on their head a little. Instead of starting out
>> from a position that we *must* have two different resources, can we
>> instead highlight to people the *benefits* of having different
>> identifiers? That makes it more of a best practice discussion and one
>> based on trade-offs: e.g. this class of software won't be able to
>> process your data correctly, or you'll be limited in how you can
>> publish additional data or metadata in the future.
>>
>> I don't think I've seen anyone approach things from that perspective,
>> but I can't help but think it'll be more compelling. And it also has
>> the benefits of not telling people that they're right or wrong, but
>> just illustrate what trade-offs they are making.
>
> I agree Leigh. The argument that you can't deliver an entity like a
> Galaxy to someone's browser sounds increasingly hollow to me. Nobody
> really expects that, and the concept of a Representation from
> WebArch/REST explains it away to most technical people. Plus, we now
> have examples in the wild like OpenGraphProtocol that seem to be
> delivering drinks, politicians, hotels, etc to machine agents at
> Facebook just fine.

It's the arrival of the OpenGraphProtocol which I think warrants a
more careful discussion. It seems to me that we no longer have to try
so hard to convince people to give things de-referenceable URIs
that return useful data. It's happening now, and there's immediate and
obvious benefit, i.e. integration with Facebook, better search
ranking, etc.

> But there does seem to be a valid design pattern, or even refactoring
> pattern, in httpRange-14 that is worth documenting.

Refactoring is how I've been thinking about it too. i.e. under what
situations might you want to have separate URIs for a resource and
its description? Dave Reynolds has given some good examples of that.

> Perhaps a good
> place would be http://patterns.dataincubator.org/book/? I think
> positioning httpRange-14 as a MUST instead of a SHOULD or MAY made a
> lot of sense to get the LOD experiment rolling. It got me personally
> thinking about the issue of identity in a practical way as I built web
> applications, that I probably wouldn't otherwise have otherwise done.
> But it would've been easier if grappling with it was optional, and
> there were practical examples of where it is useful, instead of having
> it be an issue of dogma.

My personal viewpoint is that it has to be optional, because there's
already a growing set of deployed examples of people not doing it (OGP
adoption). So how can we help those users understand the pitfalls
and/or the benefits of a slightly cleaner approach? We can also help
them understand how best to publish data to avoid mis-interpretation.

Simplifying ridiculously just to make a point, we seem to have the
following situation:

* Create de-referencable URIs for things. Describe them with OGP
and/or Schema.org
Benefit: Facebook integration, SEO

* Above plus additional # URIs or 303s.
Benefit: ability to make some finer-grained assertions in some
specific scenarios. Tabulator is happy.
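
If it helps to see those two options as server behaviour, here's a toy
sketch (Flask; all routes and URIs are invented):

from flask import Flask, redirect

app = Flask(__name__)

# Pattern 1: one URI, 200 OK, description embedded as OGP markup
@app.route("/galaxy/m31")
def galaxy_page():
    return ('<html><head><meta property="og:title" content="M31"/>'
            '</head><body>M31</body></html>')

# Pattern 2: a thing URI that 303s to a separate description document
@app.route("/id/galaxy/m31")
def galaxy_thing():
    return redirect("/doc/galaxy/m31", code=303)

@app.route("/doc/galaxy/m31")
def galaxy_doc():
    return ("</id/galaxy/m31> a </def/Galaxy> .",
            200, {"Content-Type": "text/turtle"})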

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-20 Thread Leigh Dodds
Hi,

On 19 October 2011 23:10, Jonathan Rees  wrote:
> On Wed, Oct 19, 2011 at 5:29 PM, Leigh Dodds  wrote:
>> Hi Jonathan
>>
>> I think what I'm interested in is what problems might surface and
>> approaches for mitigating them.
>
> I'm sorry, the writeup was designed to do exactly that. In the example
> in the "conflict" section, a miscommunication (unsurfaced
> disagreement) leads to copyright infringement. Isn't that a problem?

Yes it is, and these are the issues I think that are worth teasing out.

I'm afraid though that I'll have to admit to not understanding your
specific example. There's no doubt some subtlety that I'm missing (and
a rotten head cold isn't helping). Can you humour me and expand a
little? The bit I'm struggling with is:

[[[
<http://example/x> xhv:license
   <http://creativecommons.org/licenses/by/3.0/>.

According to D2, this says that document X is licensed. According to
S2, this says that document Y is licensed
]]]

Taking the RDF data at face value, I don't see how the D2 and S2
interpretations differ. Both say that <http://example/x> has a
specific license. How could an S2-assuming client assume that the
data is actually about another resource?

I looked at your specific examples, e.g. Flickr and Jamendo:

The RDFa extracted from the Flickr photo page does seem to be
ambiguous. I'm guessing the intent is to describe the license of the
photo and not the web page. But in that case, isn't the issue that
Flickr aren't being precise enough in the data they're returning?

The RDFa extracted from the Jamendo page includes type information
(from the Open Graph Protocol) that says that the resource is an
album, and has a specific Creative Commons license. I think that's
what's intended isn't it?

Why does a client have to assume a specific stance (D2/S2)? Why not
simply take the data returned at face value? It's then up to the
publisher to be sure that they're making clear assertions.

> There is no heuristic that will tell you which of the two works is
> licensed in the stated way, since both interpretations are perfectly
> meaningful and useful.
>
> For mitigation in this case you only have a few options
> 1. precoordinate (via a "disambiguating" rule of some kind, any kind)
> 2. avoid using the URI inside <...> altogether - come up with distinct
> wads of RDF for the 2 documents
> 3. say locally what you think <...> means, effectively treating these
> URIs as blank nodes

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-19 Thread Leigh Dodds
Hi,

On 19 October 2011 23:36, Nathan  wrote:
> Leigh Dodds wrote:
>>
>> On 19 October 2011 20:48, Kingsley Idehen  wrote:
>>>
>>> On 10/19/11 3:16 PM, Leigh Dodds wrote:
>>>>
>>>> RFC 3986:
>>>>
>>>> "A Uniform Resource Identifier (URI) is a compact sequence of
>>>> characters that identifies an abstract or physical resource."
>>>
>>> Yes, I agree with that.
>>>>
>>>> 2 URIs, therefore 2 resources.
>>>
>>> I disagree with your interpretation though.
>>
>> But I'm not interpreting anything there. The definition is a URI
>> identifies a resource. Ergo two different URIs identify two resources.
>
> Nonsense, and I'm surprised to hear it.
>
> Given two distinct URIs the most you can determine is that you have two
> distinct URIs.
>
> You do not know how many resources are identified, there may be no
> resources, one, two, or full sets of resources.
>
> Do see RFC3986, especially the section on equivalence.
>

OK, so maybe there is interpretation here :)

My reading is that, without additional knowledge, we should assume
that different URIs identify different resources. I think the wording
of RFC 3986 is fairly clear that a URI identifies a resource, so
assuming multiple resources for multiple URIs is fine - as a starting
position. I do understand that two URIs can be aliases.

The section on equivalence you refer to suggests ways to identify
equivalence ranging from syntactic comparisons up to network protocol
operations. The latter gives us additional information (status codes,
headers) that can determine equivalence.

To go back to Kingsley's original example, I don't see any equivalence
of those URIs at the syntactic or network level

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-19 Thread Leigh Dodds
Hi,

On 19 October 2011 20:48, Kingsley Idehen  wrote:
> On 10/19/11 3:16 PM, Leigh Dodds wrote:
> 
>>> But you don't have two different resources. Please correct me if I am
>>> reading you inaccurately here, but are you saying that:
>>>
>>> http://dbpedia.org/resource/Linked_Data and
>>> http://dbpedia.org/page/Linked_Data == two different resources?
>>>
>>> I see:
>>>
>>> 1. 2 URIs
>>> 2. a generic URI (serving as a Name) and a purpose specific URI called a
>>> URL
>>> that serves as a data access address -- still two identifiers albeit
>>> split
>>> by function .
>>
>> RFC 3986:
>>
>> "A Uniform Resource Identifier (URI) is a compact sequence of
>> characters that identifies an abstract or physical resource."
>
> Yes, I agree with that.
>>
>> 2 URIs, therefore 2 resources.
>
> I disagree with your interpretation though.

But I'm not interpreting anything there. The definition is a URI
identifies a resource. Ergo two different URIs identify two resources.

Whether those resources might be related to one another, or even
equivalent is an entirely different matter.

> Identifiers are names / handles. Thus, you have Names that resolve to actual
> data albeit via different levels of indirection.
>
> http://dbpedia.org/resource/Linked_Data and
> http://dbpedia.org/page/Linked_Data are routes to different representations
> of the same data. /resource/ (handle or name) is an indirect access route
> while /page/ is direct (address i.e., a location name) albeit with
> representation specificity i.e., HTML in the case of DBpedia.
>
> I am very happy that we've been able to narrow our differing views to
> something very concrete. Ultimately, we are going to arrive at clarity, and
> that's all that matters to me, fundamentally.

*That* all seems to be interpretation to me.

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-19 Thread Leigh Dodds
Hi Jonathan

On 19 October 2011 18:36, Jonathan Rees  wrote:
> On Wed, Oct 19, 2011 at 12:59 PM, Leigh Dodds  wrote:
>
>> So, can we turn things on their head a little. Instead of starting out
>> from a position that we *must* have two different resources, can we
>> instead highlight to people the *benefits* of having different
>> identifiers? That makes it more of a best practice discussion and one
>> based on trade-offs: e.g. this class of software won't be able to
>> process your data correctly, or you'll be limited in how you can
>> publish additional data or metadata in the future.
>>
>> I don't think I've seen anyone approach things from that perspective,
>> but I can't help but think it'll be more compelling. And it also has
>> the benefits of not telling people that they're right or wrong, but
>> just illustrate what trade-offs they are making.
>>
>> Is this not something we can do on this list? I suspect it'd be more
>> useful than attempting to categorise, yet again, the problems of hash
>> vs slash URIs. Although a canonical list of those might be useful to
>> compile once and for all.
>>
>> Anyone want to start things off?
>
> Sure.  http://www.w3.org/2001/tag/2011/09/referential-use.html

Thanks for the pointer. That's an interesting document. I've read it
once but need to digest it a bit further.

The crux of the issue, and what I was getting at in this thread is
what you refer to towards the end:

"It is possible that D2 and S2 can be used side by side by different
communities for quite a while before a collision of the sort described
above becomes a serious interoperability problem. On the other hand,
when the conflict does happen, it will be very painful."

I think what I'm interested in is what problems might surface and
approaches for mitigating them. I'm particularly curious whether
heuristics might be used to disambiguate or remove conflict.

>> As a leading question: does anyone know of any deployed semantic web
>> software that will reject or incorrectly process data that flagrantly
>> ignores httprange-14?
>
> Tabulator.

Yes. That's the only piece of software I've heard of that has problems.



-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-19 Thread Leigh Dodds
Hi,

On 19 October 2011 18:44, Kingsley Idehen  wrote:
>> 
>> So, can we turn things on their head a little. Instead of starting out
>> from a position that we *must* have two different resources, can we
>> instead highlight to people the *benefits* of having different
>> identifiers?
>
> But you don't have two different resources. Please correct me if I am
> reading you inaccurately here, but are you saying that:
>
> http://dbpedia.org/resource/Linked_Data and http://dbpedia.org/page/Linked_Data
> == two different resources?
>
> I see:
>
> 1. 2 URIs
> 2. a generic URI (serving as a Name) and a purpose specific URI called a URL
> that serves as a data access address -- still two identifiers albeit split
> by function .

RFC 3986:

"A Uniform Resource Identifier (URI) is a compact sequence of
characters that identifies an abstract or physical resource."

2 URIs, therefore 2 resources.

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs)

2011-10-19 Thread Leigh Dodds
Hi,

I tried it with this URI and got an error:

http://www.bbc.co.uk/programmes/b01102yg#programme

Cheers,

L.

On 17 October 2011 11:41, Yang Squared  wrote:
> Following the HTTP-range-14 discussion, we developed a Semantic Web URIs
> Validator named Hyperthing which helps to publish the Linked Data. We
> particularly investigated what happens when we temporarily and
> permanently redirect (e.g. 301 and 302 redirections) a Semantic Web URI (303
> and hash URI).
> http://www.hyperthing.org/
> Hyperthing mainly functions for three purposes:
> 1) It determines if the requested URI identifies a Real World Object or a
> Web document;
> 2) It checks whether the URIs publishing method follows the W3C hash URIs
> and 303 URI practice;
> 3) It can be used to check the validity of the chains of the redirection
> between the Real World Object URIs and Document URIs to prevent the data
> publisher mistakenly redirecting between these two kinds. (e.g. it checks
> against redirection which include 301, 302 and 307)
> For more information please read
>  Dereferencing Cool URI for the Semantic Web: What is 200 OK on the Semantic
> Web?
> http://dl.dropbox.com/u/4138729/paper/dereference_iswc2011.pdf
> Any suggestion is welcome.


-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )

2011-10-19 Thread Leigh Dodds
Hi,

[Aside: changing the subject line so we can have a clearer discussion]

On 17 October 2011 14:58, Norman Gray  wrote:
>...
> I've done far fewer talks of this type than Tom has, but I've never found 
> anyone having difficulty here, either.  Mind you, I never talk of 
> 'information resource' or httpRange-14.
>
> For what it's worth, I generally say something along the lines of "This URI, 
> X, is the name of a galaxy.  If you put that URI into your
> browser, you can't get the galaxy back, can you, because the galaxy is too 
> big to fit inside your computer.  So something different has to
> happen, doesn't it?"  A remark about Last-Modified generally seals the deal.

I've done the same, and people do quite often get it. At least for a
few minutes :) I think my experience echoes Rob's more than Tom's.
I've had more than one Linked Data talk/tutorial de-railed by debate
and discussion of the issue when there are much more interesting
aspects to explore.

While I've not used the galaxy example, I have taken similar
approaches. But I can also imagine saying, for example:

"This URI, X, is the name of a galaxy.  If you put that URI into your
browser, obviously you can't get the galaxy back, can you. So when you
request it, you get back a representation of it. You know, just like
when you request a file from a web server you don't download the
*actual* file, just a representation of it. Possibly in another
format".

And further, if someone asked about Last-Modified dates:

"Last-Modified? Well as it turns out the Last-Modified date isn't
defined to be the date that a resource last changed. It's up to the
origin server to decide what it means. So for something like a galaxy,
it can be the date of our last observation".

My point being that web architecture already has a good explanation as
to why real-world, or even digital things are passed around the
internet. That's why we have the Resource and Representation
abstractions in the first place.

So, can we turn things on their head a little. Instead of starting out
from a position that we *must* have two different resources, can we
instead highlight to people the *benefits* of having different
identifiers? That makes it more of a best practice discussion and one
based on trade-offs: e.g. this class of software won't be able to
process your data correctly, or you'll be limited in how you can
publish additional data or metadata in the future.

I don't think I've seen anyone approach things from that perspective,
but I can't help but think it'll be more compelling. And it also has
the benefits of not telling people that they're right or wrong, but
just illustrate what trade-offs they are making.

Is this not something we can do on this list? I suspect it'd be more
useful than attempting to categorise, yet again, the problems of hash
vs slash URIs. Although a canonical list of those might be useful to
compile once and for all.

Anyone want to start things off?

As a leading question: does anyone know of any deployed semantic web
software that will reject or incorrectly process data that flagrantly
ignores httprange-14?

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Beyond the Triple Count

2011-09-28 Thread Leigh Dodds
Hi,

I did a talk at semtech this week about some ideas for improving how
we document, publish and assess datasets. I've done a write-up which
might be of interest:

http://blog.kasabi.com/2011/09/28/beyond-the-triple-count/

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Question: Authoritative URIs for Geo locations? Multi-lingual labels?

2011-09-09 Thread Leigh Dodds
Hi Kingsley,

On 9 September 2011 15:20, Kingsley Idehen  wrote:
> On 9/9/11 8:58 AM, Leigh Dodds wrote:
>>
>> Hi,
>>
>> As well as the others already mentioned there's also Yahoo Geoplanet:
>>
>> http://beta.kasabi.com/dataset/yahoo-geoplanet
>>
>> This has multi-lingual labels and is cross-linked to the Ordnance
>> Survey data and Dbpedia, but the linking could be improved.
>>
>> As for a list, there are currently 34 geography related datasets
>> listed in Kasabi here:
>>
>> http://beta.kasabi.com/browse/datasets/results/og_category%3A147
>
> Leigh,
>
> Can anyone access these datasets or must they obtain a kasabi account en
> route to authenticated access?

As I've said (repeatedly!) there's no authentication around any of
Linked Data. That might be an option for publishers in future, but not
during the beta and not for any of the open datasets which we've
published currently.

API keys are only required for the APIs, e.g. SPARQL, search, etc. The
choice of authentication options will increase in future.

So I encourage you to actually go and have a look. There's a direct
link to the Linked Data views from every homepage.

Here's a pointer to the blog post I wrote and circulated after our
last discussion:

http://blog.kasabi.com/2011/08/12/linked-data-in-kasabi/

Cheers,

L.

-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Question: Authoritative URIs for Geo locations? Multi-lingual labels?

2011-09-09 Thread Leigh Dodds
Hi,

As well as the others already mentioned there's also Yahoo Geoplanet:

http://beta.kasabi.com/dataset/yahoo-geoplanet

This has multi-lingual labels and is cross-linked to the Ordnance
Survey data and Dbpedia, but the linking could be improved.

As for a list, there are currently 34 geography related datasets
listed in Kasabi here:

http://beta.kasabi.com/browse/datasets/results/og_category%3A147

Cheers,

L.

On 8 September 2011 15:38, M. Scott Marshall  wrote:
> It seems that dbpedia is a de facto source of URIs for geographical
> place names. I would expect to find a more specialized source. I think
> that I saw one mentioned here in the last few months. Are there
> alternatives that are possibly more fine-grained or designed
> specifically for geo data? With multi-lingual labels? Perhaps somebody
> has kept track of the options on a website?
>
> -Scott
>
> --
> M. Scott Marshall
> http://staff.science.uva.nl/~marshall
>
> On Thu, Sep 8, 2011 at 3:07 PM, Sarven Capadisli  wrote:
>> On Thu, 2011-09-08 at 14:01 +0100, Sarven Capadisli wrote:
>>> On Thu, 2011-09-08 at 14:07 +0200, Karl Dubost wrote:
>>> > # Using RDFa (not implemented in browsers)
>>> >
>>> >
>>> > <div xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" id="places-rdfa">
>>> >     <span about="http://www.dbpedia.org/resource/Montreal"
>>> >         geo:lat_long="45.5,-73.67">Montréal, Canada</span>
>>> >     <span about="http://www.dbpedia.org/resource/Paris"
>>> >         geo:lat_long="48.856578,2.351828">Paris, France</span>
>>> > </div>
>>> >
>>> > * Issue: Latitude and Longitude not separated
>>> >   (have to parse them with regex in JS)
>>> > * Issue: xmlns with <div>
>>> >
>>> >
>>> > # Question
>>> >
>>> > On RDFa vocabulary, I would really like a solution with geo:lat and 
>>> > geo:long, Ideas?
>>>
>>> Am I overlooking something obvious here? There is lat, long properties
>>> in wgs84 vocab. So,
>>>
>>> http://dbpedia.org/resource/Montreal";>
>>>     >>           content="45.5"
>>>           datatype="xsd:float">
>>>     >>           content="-73.67"
>>>           datatype="xsd:float">
>>>     Montreal
>>> 
>>>
>>> Tabbed for readability. You might need to get rid of whitespace.
>>>
>>> -Sarven
>>
>> Better yet:
>>
>> http://dbpedia.org/resource/Montreal";>
>>    > ...
>>
>>
>> -Sarven
>
>



-- 
Leigh Dodds
Product Lead, Kasabi
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: CAS, DUNS and LOD (was Re: Cost/Benefit Anyone? Re: Vote for my Semantic Web presentation at SXSW)

2011-08-24 Thread Leigh Dodds
Hi,

On 24 August 2011 15:40, David Wood  wrote:
> On Aug 24, 2011, at 2:44, Leigh Dodds  wrote:
>
>> Hi,
>>
>> On 23 August 2011 15:17, Gannon Dick  wrote:
>>> Either "Linked Data ecosystem" or "linked data Ecosystem" is a dangerously 
>>> flawed paradigm, IMHO.  You don't "improve" MeSH by
>>> flattening it, for example, it is what it is. Since CAS numbers are not a 
>>> directed graph, an algorithmic transform to a URI (which *is* a
>>> directed graph) risks the creation of a "new" irreconcilable taxonomy.
>>> For example, Nitrogen is ok to breathe and liquid Nitrogen is a
>>> not very practical way to chill wine.
>>
>> A URI isn't a directed graph. You can use them to build one by making
>> statements though.
>>
>> Setting aside any copyright issues, the CAS identifiers are useful
>> Natural Keys [1]. As they're well deployed, using them to create URIs
>> [2] is sensible
>
> Hi Leigh,
>
> Right.  Unfortunately it is also illegal :/

Yes, I read the first part of the thread! I was merely pointing out
the useful patterns for projecting identifiers into URIs.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: CAS, DUNS and LOD (was Re: Cost/Benefit Anyone? Re: Vote for my Semantic Web presentation at SXSW)

2011-08-24 Thread Leigh Dodds
Hi,

On 23 August 2011 15:17, Gannon Dick  wrote:
> Either "Linked Data ecosystem" or "linked data Ecosystem" is a dangerously 
> flawed paradigm, IMHO.  You don't "improve" MeSH by
> flattening it, for example, it is what it is. Since CAS numbers are not a 
> directed graph, an algorithmic transform to a URI (which *is* a
> directed graph) risks the creation of a "new" irreconcilable taxonomy.
> For example, Nitrogen is ok to breathe and liquid Nitrogen is a
> not very practical way to chill wine.

A URI isn't a directed graph. You can use them to build one by making
statements though.

Setting aside any copyright issues, the CAS identifiers are useful
Natural Keys [1]. As they're well deployed, using them to create URIs
[2] is sensible as it simplifies the process of linking between
datasets [3].

To answer Patrick's question: to help bridge between systems that
only use the original literal version, rather than the URIs, we
should ensure that the literal keys are included in the data [4].

These are well deployed patterns and, from my experience, make it
really simple and easy to bridge and link between different datasets
and systems.
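
As a sketch of those patterns together (Python/rdflib; the base URI
and property are invented, and this obviously side-steps the licensing
question):

from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/def/")  # invented vocabulary
cas = "7727-37-9"  # CAS registry number for nitrogen

# patterned URI built from the natural key
substance = URIRef("http://example.org/id/substance/" + cas)

g = Graph()
g.add((substance, EX.casNumber, Literal(cas)))  # keep the literal key too
print(g.serialize(format="turtle"))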

Cheers,

L.

[1]. http://patterns.dataincubator.org/book/natural-keys.html
[2]. http://patterns.dataincubator.org/book/patterned-uris.html
[3]. http://patterns.dataincubator.org/book/shared-keys.html
[4]. http://patterns.dataincubator.org/book/literal-keys.html

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: New draft of Linked Data Patterns book

2011-08-22 Thread Leigh Dodds
Hi,

On 20 August 2011 16:01, Giovanni Tummarello
 wrote:
> Seems pretty interesting, clearly out of practical experience !

Thanks Giovanni! Yes, I've been trying to apply practical experience
wherever possible. I'm very keen on collecting useful application
patterns that may help others build good RDF & Linked Data based apps.

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



New draft of Linked Data Patterns book

2011-08-19 Thread Leigh Dodds
Hi,

There's a new draft of the Linked Data patterns book available, with
12 new patterns, mainly in the application patterns section.

The latest version is available from here:

http://patterns.dataincubator.org/book/

There are PDF and EPUB versions linked from the homepage. The source
is also available in github at:

https://github.com/ldodds/ld-patterns

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Job: Data Engineer, Kasabi

2011-08-12 Thread Leigh Dodds
Hi,

Just a reminder to people that this job opening is still available. It
involves hands-on work with a wide range of different data types,
covering both free, open data & commercial
datasets. Over time we expect to be doing more data analysis using
Map-Reduce and Pregel, as well as interlinking and enrichment.

We're looking for someone who is enthusiastic about working with,
analysing, and demonstrating the value of data. If you want a hands-on
role working with data, then this should definitely be of interest.

More details at [1] or feel free to drop me an email with any
questions or applications.

[1] http://tbe.taleo.net/NA9/ats/careers/requisition.jsp?org=TALIS&cws=1&rid=41

Cheers,

L.

On 17 June 2011 16:22, Leigh Dodds  wrote:
> Hi,
>
> Short job advert: we're looking for someone to join the Kasabi team as
> a Data Engineer. The role will involve working with RDF and Linked
> Data so should be of interest to this community!
>
> More information at [1]. Feel free to get in touch with me personally
> if you want more information.
>
> Cheers,
>
> L.
>
> [1] 
> http://tbe.taleo.net/NA9/ats/careers/requisition.jsp?org=TALIS&cws=1&rid=41
>
> --
> Leigh Dodds
> Programme Manager, Talis Platform
> Mobile: 07850 928381
> http://kasabi.com
> http://talis.com
>
> Talis Systems Ltd
> 43 Temple Row
> Birmingham
> B2 5LS
>



-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: DBpedia: limit of triples

2011-08-09 Thread Leigh Dodds
Hi,

On 9 August 2011 11:26, Jörn Hees  wrote:
> ...
> I also guess it would be better to construct the given document first from 
> the outgoing triples, maybe preferring the ontology mapped triples, and then 
> incoming links up to a 2000 triples limit (if necessary to limit bandwidth).
> That would fit the description in the above mentioned section way better than 
> the current implementation.

You could also try a mirror to see if that provides better facilities, e.g. [1]

Cheers,

L.

[1]. http://beta.kasabi.com/dataset/dbpedia-36

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Leigh Dodds
Hi,

On 13 July 2011 14:30, Kingsley Idehen  wrote:
> Can you ping me or reply to this list with a list of missing SPARQL
> endpoints. Alternatively, you bookmark them on del.icio.us using tag:
> sparql_endpoint.
>
> Here is my collection: http://www.delicious.com/kidehen/sparql_endpoint .

The data is all in a machine-readable form. See:

http://data.kasabi.com/datasets

The URI supports conneg so you can follow rdfs:seeAlso links to all of
the VoiD descriptions and hence to the SPARQL endpoints, plus all of
the other APIs.
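
For example, a client might bootstrap that discovery along these
lines (a rough, untested Python sketch using the requests and rdflib
libraries; I'm assuming the directory will serve Turtle):

  import requests
  from rdflib import Graph
  from rdflib.namespace import RDFS

  # Negotiate for Turtle on the dataset directory URI
  resp = requests.get("http://data.kasabi.com/datasets",
                      headers={"Accept": "text/turtle"})
  g = Graph()
  g.parse(data=resp.text, format="turtle")

  # Follow rdfs:seeAlso out to the individual VoiD descriptions
  for dataset, void_doc in g.subject_objects(RDFS.seeAlso):
      print(dataset, "->", void_doc)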

It'd be nice if the LD cloud diagram used other machine-readable
sources where possible. I know CKAN is a good focal point for helping
curate activity, but it's also frustrating to have to copy data
around, whether manually or otherwise.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Leigh Dodds
Hi,

On 13 July 2011 13:05, Bernard Vatant  wrote:
> Re. availability, just a reminder of SPARQL Endpoints Status service
> http://labs.mondeca.com/sparqlEndpointsStatus/index.html
> As of today 80% (192/240) endpoints registered at CKAN are up and running.
> Monitor grey dots (still alive?) for candidate passed out datasets ...

Well, as Kingsley pointed out, SPARQL is only one metric. Whether the
URIs still resolve is arguably the most important one for the Linked Data
diagram, but service availability is a good thing to monitor.

However its also worth noting that there are mirrors of a number of
datasets. E.g. we have 70+ datasets in Kasabi, some new to the cloud,
some of which are mirrors. Not all (any?) of those SPARQL endpoints
are on your list.

Cheers,

L.
-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Get your dataset on the next LOD cloud diagram

2011-07-13 Thread Leigh Dodds
Hi,

On 12 July 2011 18:45, Pablo Mendes  wrote:
> Dear fellow Linked Open Data publishers and consumers,
> We are in the process of regenerating the next LOD cloud diagram and
> associated statistics [1].
> ...

This email prompted a discussion about how the data collection or
diagram could be improved or updated. As CKAN is an open platform and
anyone can add additional tags to datasets, why doesn't everyone who
is interested in seeing a particular improvement or alternate view of
the data just go ahead and do it? There's no need to require all this
to be done by one team on a fixed schedule.

Some light co-ordination between people doing similar analyses would
be worthwhile, but it wouldn't be hard to, e.g., tag datasets based on
whether their Linked Data or SPARQL endpoint is regularly available,
whether they're currently maintained, or (my current bugbear) whether
the data dumps they publish parse with more than one toolchain.

It'd be nice to see many different aspects of the cloud being explored.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: WebID vs. JSON (Was: Re: Think before you write Semantic Web crawlers)

2011-06-22 Thread Leigh Dodds
Hi,

On 22 June 2011 15:41, William Waites  wrote:
> What does WebID have to do with JSON? They're somehow representative
> of two competing trends.
>
> The RDF/JSON, JSON-LD, etc. work is supposed to be about making it
> easier to work with RDF for your average programmer, to remove the
> need for complex parsers, etc. and generally to lower the barriers.
>
> The WebID arrangement is about raising barriers. Not intended to be
> the same kind of barriers, certainly the intent isn't to make
> programmer's lives more difficult, rather to provide a good way to do
> distributed authentication without falling into the traps of PKI and
> such.
>
> While I like WebID, and I think it is very elegant, the fact is that I
> can use just about any HTTP client to retrieve a document whereas to
> get rdf processing clients, agents, whatever, to do it will require
> quite a lot of work [1]. This is one reason why, for example, 4store's
> arrangement of /sparql/ for read operations and /data/ and /update/
> for write operations is *so* much easier to work with than Virtuoso's
> OAuth and WebID arrangement - I can just restrict access using all of
> the normal tools like apache, nginx, squid, etc..
>
> So in the end we have some work being done to address the perception
> that RDF is difficult to work with and on the other hand a suggestion
> of widespread putting in place of authentication infrastructure which,
> whilst obviously filling a need, stands to make working with the data
> behind it more difficult.
>
> How do we balance these two tendencies?

By recognising that often we just need to use existing technologies
more effectively and more widely, rather than throw more technology at
a problem, thereby creating an even greater education and adoption
problem?

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Job: Data Engineer, Kasabi

2011-06-17 Thread Leigh Dodds
Hi,

Short job advert: we're looking for someone to join the Kasabi team as
a Data Engineer. The role will involve working with RDF and Linked
Data so should be of interest to this community!

More information at [1]. Feel free to get in touch with me personally
if you want more information.

Cheers,

L.

[1] http://tbe.taleo.net/NA9/ats/careers/requisition.jsp?org=TALIS&cws=1&rid=41

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Squaring the HTTP-range-14 circle

2011-06-17 Thread Leigh Dodds
Hi,

On 17 June 2011 15:32, Kingsley Idehen  wrote:
> On 6/17/11 3:11 PM, Leigh Dodds wrote:
>>
>> I just had to go and check whether Amazon reviews and Facebook
>> comments actually do have their own pages. That's because I've never
>> seen them presented as anything other than objects within another
>> container, either in a web page or a mobile app. So I think you could
>> argue that when people are "linking" and marking things as useful,
>> they're doing that on a more general abstraction, i.e. the "Work" (to
>> borrow FRBR terminology) not the particular web page.
>
> You have to apply context to your statement above. Is the context: WWW as an
> Information space or Data Space?

I can't answer that because I don't know what you mean by those terms.
It's just a web of resources as far as I'm concerned.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Squaring the HTTP-range-14 circle

2011-06-17 Thread Leigh Dodds
Hi,

On 17 June 2011 14:04, Tim Berners-Lee  wrote:
>
> On 2011-06 -17, at 08:51, Ian Davis wrote:
>> ...
>>
>> Quite. When a facebook user clicks the "Like" button on an IMDB page
>> they are expressing an opinion about the movie, not the page.
>
> BUT when they click a "Like" button on a blog they are expressing they like the
> blog, not the movie it is about.
>
> AND when they click "like" on a facebook comment they are
> saying they like the comment not the thing it is commenting on.
>
> And on Amazon people say "I found this review useful" to
> like the review on the product being reviewed, separately from
> rating the product.
> So there is a lot of use out there which involves people expressing
> stuff in general about the message not its subject.

Well even that's debatable.

I just had to go and check whether Amazon reviews and Facebook
comments actually do have their own pages. That's because I've never
seen them presented as anything other than objects within another
container, either in a web page or a mobile app. So I think you could
argue that when people are "linking" and marking things as useful,
they're doing that on a more general abstraction, i.e. the "Work" (to
borrow FRBR terminology) not the particular web page.

And that's presumably the way that Facebook and Amazon see it too
because that data is associated with the status or review in whichever
medium I look at it (page or app).

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: {Disarmed} Re: Squaring the HTTP-range-14 circle

2011-06-13 Thread Leigh Dodds
Hi,

On 13 June 2011 16:04, Christopher Gutteridge  wrote:
> <http://en.wikipedia.org/wiki/David_%28Michelangelo%29>
>  dc:creator<http://en.wikipedia.org/wiki/Michelangelo>  .
>
> Did he make the statue or the webpage?

Given that he died before the internet was invented, it'd probably be
the statue.

More data beats better algorithms :)

L.
-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Common RDF Vocabulary Labels Vocabulary

2011-06-06 Thread Leigh Dodds
Hi,

On 6 June 2011 10:00, Hugh Glaser  wrote:
> But hang on, is the web not about linking, rather than copying things around?

Isn't this annotation, rather than copying?

L.
-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Common RDF Vocabulary Labels Vocabulary

2011-06-06 Thread Leigh Dodds
Hi,

On 6 June 2011 02:42, Christopher Gutteridge  wrote:
> +1
>
> I would go further and suggest that you cut and paste in the property &
> class definitions to provide a single file which can be translated to enable
> core parts of the semweb in other languages.

That's the approach I took with getting translations of the
FOAF-a-Matic. I had a separate XML file with the text that
contributors could just update and send back. Worked really well. A
shared Google spreadsheet might work well as a lo-fi approach.

But that assumes people will do a whole translation set or whole
vocabulary in one go. Maybe it would be easier to do a few here and
there. Strikes me it'd be a good case for a super-simple service:
homepage shows a random property, prompts the user to fill in a
translation. Make it into a little game.
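
Something as small as this would do for a first cut (a Flask sketch;
the term list and storage here are obviously placeholders):

  import random
  from flask import Flask, redirect, request

  app = Flask(__name__)

  # Placeholder terms; in practice this would be the shared label set
  TERMS = {
      "http://xmlns.com/foaf/0.1/knows": "knows",
      "http://purl.org/dc/terms/title": "title",
  }
  translations = []  # (term URI, language, label)

  @app.route("/")
  def prompt():
      uri = random.choice(list(TERMS))
      return ('<p>Translate "%s" (%s)</p>'
              '<form method="post" action="/translate">'
              '<input type="hidden" name="uri" value="%s">'
              '<input name="lang" placeholder="language code">'
              '<input name="label" placeholder="translation">'
              '<button>Submit</button></form>') % (TERMS[uri], uri, uri)

  @app.route("/translate", methods=["POST"])
  def translate():
      translations.append((request.form["uri"], request.form["lang"],
                           request.form["label"]))
      return redirect("/")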

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Announce: Kasabi Public Beta, RDF data hosting and Linked Data publishing

2011-06-03 Thread Leigh Dodds
Hi,

If you'll forgive the product announcement, I wanted to let people
know that we've launched the Kasabi Public Beta today. The
announcement is here [1] and the beta site is accessible from [2].

The beta includes RDF data hosting and Linked Data publishing, the API
supports importing of RDFa data too. The dataset directory is
available as RDF [3] and there are VoiD descriptions of each dataset, e.g.
[4]. We're using a simple experimental vocabulary extension to VoiD
[5] to point to additional APIs relating to a dataset. This is to
allow clients to bootstrap discovery of services in a RESTful way.

The site allows users to create APIs either using the Linked Data API
specification, or something that we're calling SPARQL Stored
Procedures [6]. The goal is to support as broad a range of data access
options as possible, and enable new ways for developers to share their
skills. E.g. simplifying access to a dataset by creating a simpler API
over a SPARQL query, or just sharing SPARQL queries for a particular
dataset.

I'd welcome feedback on any of these features. We have a separate
developer list at [7] for more detailed discussion, but there are some
general features and services which I think are of interest to this
community :)

Cheers,

L.

[1]. http://blog.kasabi.com/2011/06/03/kasabi-public-beta/
[2]. http://beta.kasabi.com
[3]. http://data.kasabi.com/datasets
[4]. http://data.kasabi.com/dataset/bricklink
[5]. http://labs.kasabi.com/ns/services
[6]. http://beta.kasabi.com/doc/api/sparql-stored-procedure
[7]. http://groups.google.com/group/kasabi-dev

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: ANN: SKOS implementation of the ACM Communication Classification System

2011-06-02 Thread Leigh Dodds
Hi Christoph,

Very happy to see more data appearing, congrats :)

A brief aside on licensing...

On 2 June 2011 09:27, Christoph Lange  wrote:
> ...
> Note that we have not yet considered copyright issues -- but at least
> preserved the original ACM copyright statement, which permits "personal or
> classroom use".  That's probably not enough for reasonable Linked Data
> applications.  I would be glad if someone familiar with the subject could
> point out what to do.  What did previous publishers of RDF versions of the
> ACM CCS do?

I won't reproduce the text here, and I'm not a lawyer, but the wording
says "...to republish, to post on servers, or to redistribute to
lists, requires prior specific permission and/or a fee. Request
permission to republish from..[address+email]".

I think one reasonable thing that someone may want to do is mirror the
data, e.g. to provide a public SPARQL endpoint or other services.
Currently it doesn't look like I can do that without contacting the
ACM directly, which I assume you've also done.

It's not clear to me whether I could even copy parts of the data and
index it for use in an application, as that potentially falls outside
of personal and classroom use.

I fully support arguments to the effect of "use and seek forgiveness
later" when using data, but as we see more and more commercial usage
of Linked Data, I think we really need to see clearer licensing around
data. Otherwise it feels like we're building on uncertain ground.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: For our UK readers

2011-05-25 Thread Leigh Dodds
Let's hope that any fall-out doesn't come back to me as the person to
whom errors are reported!

Arguably the generatorAgent and errorReportsTo predicates ought to be
removed if you've made further hand edits/changes to the file, but I
doubt anyone does that in practice.
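
The clean-up itself is only a couple of lines; a sketch with rdflib,
assuming the admin (http://webns.net/mvcb/) namespace that
FOAF-a-Matic writes out:

  from rdflib import Graph, Namespace

  ADMIN = Namespace("http://webns.net/mvcb/")

  g = Graph()
  g.parse("foaf.rdf")
  # Drop the generator metadata once the file has been hand edited
  g.remove((None, ADMIN.generatorAgent, None))
  g.remove((None, ADMIN.errorReportsTo, None))
  g.serialize(destination="foaf.rdf", format="xml")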

Cheers,

L.

On 24 May 2011 15:07, Hugh Glaser  wrote:
> http://who.isthat.org/id/CTB
>
> Have I got the RDF right?
> Not sure foaf is the right thing for this.
> Should there be a blank node somewhere in there?
> Suggestions for improvements welcome.
>
> Hugh
>
>



-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: implied datasets

2011-05-23 Thread Leigh Dodds
Hi William,

On 23 May 2011 14:01, William Waites  wrote:
> ...
> Then for each dataset that I have that uses the links to this space, I
> count them up and make a linkset pointing at this imaginary dataset.
>
> Obviously the same strategy for anywhere there exist some kind of
> standard identifiers that are not URIs in HTTP.
>
> Does this make sense?

I'm not sure that the dataset is "imaginary", but what you're doing
seems eminently sensible to me. I've been working on a little project
that I hope to release shortly that aims to facilitate this kind of
linking, especially where those non-URI identifiers, or Literal Keys
[1] are
used to build patterned URIs.

> Can we sensibly talk about and even assert the existence of a dataset
> of infinite size? (whatever "existence" means).

I think so, we can assert what kinds of things it contains and
describe it in general terms, even if we can't enumerate all of its
elements.

It may be more natural to think of these as services rather than
datasets, i.e. a service that accepts some keys as input and returns a
set of assertions. In this case the assertions would be links to other
datasets.

> Is this an abuse of DCat/voiD?

Not in my view, I think the notion of dataset is already pretty broad.

> Are this class of datasets subsets of sameAs.org (assuming sameAs.org
> to be complete in principle?)

Subsets if they only asserted sameAs links, but I think you're
suggesting that this may be too strict. I think there's potentially a
whole set of related "predicate based services" [2] that provide
useful indexes of existing datasets, or expose additional annotations
of extra sources.

The project I've been working on facilitates not just sameAs links,
but any form of links that can be derived from shared URI patterns.
This would include topic/subject-based linking. ISBN was one of the
use cases I had in mind, but there are others.
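
To illustrate the general idea with made-up URI templates: given a
shared literal key, the links fall out mechanically (an untested
rdflib sketch):

  from rdflib import Graph, URIRef
  from rdflib.namespace import OWL

  # Two datasets that both mint patterned URIs from ISBNs
  TEMPLATE_A = "http://example.org/books/%s"
  TEMPLATE_B = "http://example.com/isbn/%s"

  def links_for(isbns):
      g = Graph()
      for isbn in isbns:
          g.add((URIRef(TEMPLATE_A % isbn), OWL.sameAs,
                 URIRef(TEMPLATE_B % isbn)))
      return g

  print(links_for(["0123456789"]).serialize(format="turtle"))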

Cheers,

L.

[1]. http://patterns.dataincubator.org/book/literal-keys.html
[2]. http://www.ldodds.com/blog/2010/03/predicate-based-services/

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Navigating Data (was Re: Take2: 15 Ways to Think About Data Quality (Just for a Start) )

2011-04-28 Thread Leigh Dodds
Hi,

Changed subject line to match topic:

On 15 April 2011 14:47, glenn mcdonald  wrote:
> This reminds me to come back to the point about what I initially
> called Directionality, and Dave improved to Modeling Consistency.

> ...
> - But even in RDF, directionality poses a significant discovery
> problem. In a minimal graph (let's say "minimal graph" means that each
> relationship is asserted in only one direction, so there's no
> relationship redundancy), you can't actually explore the data
> navigationally. You can't go to a single known point of interest, like
> a given president, and explore to find out everything the data holds
> and how it connects...

Doesn't this really depend on how the navigational interface is constructed?

If we're looking purely at Linked Data views created using a Concise
Bounded Description, then yes I agree, if there are no "back links" in
the data, then navigation is problematic.

But if we use different algorithms to describe the views, or
supplement it with SPARQL queries, then those navigational links can
be presented, e.g. "other resources that refer to this resource".
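
The query behind an "inbound links" view is trivial; a sketch using
the SPARQLWrapper library, with a placeholder endpoint and resource:

  from SPARQLWrapper import SPARQLWrapper, JSON

  sparql = SPARQLWrapper("http://example.org/sparql")
  sparql.setQuery("""
      SELECT DISTINCT ?s ?p
      WHERE { ?s ?p <http://example.org/thing/42> }
  """)
  sparql.setReturnFormat(JSON)
  for row in sparql.query().convert()["results"]["bindings"]:
      print(row["s"]["value"], row["p"]["value"])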

I think, as you noted elsewhere, inverse links could also be inferred
based on the schema. This simplifies the navigation UI, as the links
are part of the data.

> ...You can explore the *outward* relationships from
> any given point, but to find out about the *inward* relationships you
> have to keep doing new queries over the entire dataset.

Yes.

> ...The same basic
> issue applies to an XML representation of the data as a tree: you can
> squirrel your way down, but only in the direction the original modeler
> decided was "down". If you need a different direction, you have to
> hire a hypersquirrel.

Well an XML node typically has a reference to its parent (it does in
the DOM anyway) so moving back up the tree is easy.

> - Of course, most RDF-presenting systems recognize this as a usability
> problem, and address it by turning the minimal graph into a redundant
> graph for UI purposes. Thus in a data-browser UI you usually see, for
> a given node, lists of both outward and inward relationships. This is
> better, but if this abstraction is done at the UI layer, you still
> lose it once you drop down into the SPARQL realm. This makes the
> SPARQL queries harder to write, because you can't write them the way
> you logically think about the question, you have to write them the way
> the data thinks about the question. And this skew from real logic to
> directional logic can make them *much* harder to understand or
> maintain, because the directionality obscures the purpose and reduces
> the self-documenting nature of the query.

Assuming you don't materialize the inferences directly in the data,
then isn't the answer to have both the SPARQL endpoint and the
navigational UI use the same set of inferred data?

> All of this is *much* better, in usability terms, if the data is
> redundantly, bi-directionally connected all the way down to the level
> of abstraction at which you're working. Now you can explore to figure
> out what's there, and you can write your queries in the way that makes
> the most human sense. The artificicial skew between the logical
> structure and the representational structure has been removed. This is
> perfectly possible in an RDF-based system, of course, if the software
> either generates or infers the missing inverses. We incur extra
> machine overhead to reduce the human congnitive burden. I contend this
> should be considered a nearly-mandatory best-practice for linked data,
> and that propogating inverses around the LOD cloud ought to be one of
> things that makes the LOD cloud *a thing*, rather than just a
> collection of logical silos.

The same problem exists on the document web: it can be useful to know
what links to a specific page. There are various techniques to help
address that, e.g. centralized indexes that can expose more of the
graph (Google) or point-to-point mechanisms for notifying links (e.g.
Pingback, etc).

With RDF systems we may be able to infer some extra links, but with
Linked Data we can't infer all of them, so we have the same issue and
can deploy very similar infrastructure to solve the problem.

Currently we have SameAs.org, which is specialized for one type of
linking, but it'd be nice to see others [1]. And there have been
experiments with various pingback/notification services for Linked
Data. Are any of the latter being widely deployed/used?

Cheers,

L.

[1]. http://www.ldodds.com/blog/2010/03/predicate-based-services/

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Why does rdf-sparql-protocol say to return 500 when refusing a query?

2011-04-28 Thread Leigh Dodds
Hi,

On 27 April 2011 11:18, Alexander Dutton  wrote:
> On 17/04/11 21:07, Hugh Glaser wrote:
>>
>> As a consumer I would like to be able to distinguish a refusal to answer
>> from a failure of the web server to access the store, for example.
>
> In the general case, that was my concern, too. AFAICT from the spec, you
> aren't precluded from returning e.g. 504 if the store has disappeared.
>
> I've always (perhaps wrongly) equated a 500 with the web server encountering
> some exceptional and *unexpected* condition¹; specifically, an uncaught
> exception in the web application. As such I've always taken a 500 to be
> indicative of a bug which should be fixed to fail more gracefully, perhaps
> with a more appropriate code from the 4xx/5xx range².
>
> As a web developer I always try to 'fix' situations where my code returns a
> 500. As a consumer I will take a 500 to be an application error and attempt
> to inform the webmaster of the inferred 'bug'.
>
> I can think of the following situations where a SPARQL endpoint might not
> return a result:
>
> * Syntax error (400)
> * Accept range mismatch (406)
> * Query rejected off-hand as too resource-intensive (403?)
> * Store unreachable (504?)
> * Server overloaded (503?)
> * Query timed out (504?, 403?)

+1 to using the full range of HTTP status codes.

Personally I don't really see it as revisionist or retro-fitting to
use HTTP status codes to indicate these application-level semantics.
they're reasonably well defined for these broad scenarios, IMO.
Especially so when you use additional headers, e.g. Retry-After (as
David Booth noted) to communicate additional information at the
protocol level.
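
To make that concrete, here's a rough sketch (untested Python) of how
a client might act on those codes, and on Retry-After, against a
placeholder endpoint:

  import time
  import requests

  def run_query(endpoint, query, max_retries=3):
      for attempt in range(max_retries):
          resp = requests.get(endpoint, params={"query": query},
                              headers={"Accept": "application/sparql-results+json"})
          if resp.status_code == 200:
              return resp.json()
          if resp.status_code == 400:
              raise ValueError("Query syntax error: " + resp.text)
          if resp.status_code == 403:
              raise RuntimeError("Query refused by the endpoint")
          if resp.status_code in (503, 504):
              # Retry-After says when it's worth trying again
              # (assuming a delta-seconds value, not an HTTP date)
              time.sleep(int(resp.headers.get("Retry-After", "5")))
              continue
          resp.raise_for_status()
      raise RuntimeError("Endpoint still unavailable after retries")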

This is more about good web application engineering than anything to
do with the SPARQL protocol per se.

However it may be useful to define a standard response format and
potentially error messages to help client apps/users distinguish
between more fine-grained error states. I suggested this during
discussion of the original protocol specification but the WG decided
it wasn't warranted initially [1]. Based on this discussion I'm not
sure implementation experience has moved on enough, or converged
enough to feed this back as part of SPARQL 1.1.

Doesn't stop the community agreeing on some conventions/best practices though.

Cheers,

L.

[1]. 
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006Jan/0106.html

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Minting URIs: how to deal with unknown data structures

2011-04-18 Thread Leigh Dodds
Hi,

On 15 April 2011 13:48, Frans Knibbe  wrote:
> I have acquired the first part (authority) of my URIs, let's say it is
> lod.mycompany.com. Now I am faced with the question: How do I come up with a
> URI scheme that will stand the test of time?

You might be interested in the Identifier Patterns documented here:

http://patterns.dataincubator.org/book/identifier-patterns.html

There's also the "Designing URI Sets for the Public Sector" document,
which provides the guidance for creating URIs for UK government data:

http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS



Re: Possible Idea For a Sem Web Based Game?

2010-11-22 Thread Leigh Dodds
Hi,

On 20 November 2010 17:28, Melvin Carvalho  wrote:
> I was thinking about creating a simple game based on semantic web
> technologies and linked data.
>
> Some on this list may be too young to remember this, but there used to
> be game books where you would choose your own adventure.
>
> http://en.wikipedia.org/wiki/Choose_Your_Own_Adventure

Yes, I've thought this would make a really nice showcase too.

Liam Quinn built a nice little demo [1] of something like this. I was
also looking at the Inform interactive fiction engine [2] (again!)
recently. The engine is basically a set of core rules about how a game
world operates. The core rules can be extended, and the ability for a
user to interact with the world can be inferred from those rules, e.g.
whether you can climb onto or inside something. Struck me that it'd be
possible to (re-)build a lot of that using RDF, OWL and RIF.
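
A fragment of such a world model might look like this (all terms
invented for illustration; an OWL or RIF rule set would then propagate
affordances down the class hierarchy):

  from rdflib import Graph, Namespace
  from rdflib.namespace import RDFS

  GAME = Namespace("http://example.org/game#")

  g = Graph()
  # Core rules: which interactions a kind of object affords
  g.add((GAME.Chair, GAME.supportsAction, GAME.ClimbOnto))
  g.add((GAME.Box, GAME.supportsAction, GAME.ClimbInside))
  # Extensions just add new kinds of things to the hierarchy
  g.add((GAME.Crate, RDFS.subClassOf, GAME.Box))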

Cheers,

L.

[1]. http://dirk.holoweb.net/~liam/rdfg/rdfg.cgi
[2]. http://www.inform-fiction.org/

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Google Refine 2.0

2010-11-12 Thread Leigh Dodds
Hi Kingsley:

I recommend you take some time to work with Refine, watch the demos,
and perhaps read the paper that Richard et al published on how they
have used and extended Refine (or Gridworks as it was).

But to answer your question:

On 12 November 2010 13:23, Kingsley Idehen  wrote:
> How does the DERI effort differ from yours, if at all?

They have produced a plugin that complements the ability to map a
table structure to a Freebase schema and graph, by providing the same
functionality for RDF. So a simple way to define how RDF should be
generated from data in a Refine project, using either existing or
custom schemas.

The end result can then be exported using various serialisations.

My extension builds on that by providing the ability to POST the data
to a Talis Platform store. It'd be trivial to tweak that code to
support POSTing to another resource, or to wrap the data in a SPARUL
insert.
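
e.g. the SPARUL route might look something like this (a sketch; the
endpoint and payload are placeholders):

  import requests

  def post_as_insert(ntriples, update_endpoint):
      # Wrap the exported triples in an INSERT DATA request and POST it
      update = "INSERT DATA { %s }" % ntriples
      resp = requests.post(update_endpoint, data={"update": update})
      resp.raise_for_status()

  post_as_insert('<http://example.org/a> <http://example.org/b> "c" .',
                 "http://example.org/store/update")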

Ideally it'd be nice to roll the core of this into the DERI extension
for wider use.

Cheers,

L.
-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Google Refine 2.0

2010-11-12 Thread Leigh Dodds
Hi David,

Congratulations on getting the 2.0 release out. I'm looking forward to
working with it some more.

Kingsley asked about extensions. You've already mentioned the work
done at DERI, and I've previously pointed at the reconciliation API I
built over the Talis Platform [1].

I used Refine's excellent plugin architecture to create a simple
upload tool for loading Talis Platform stores. This hooks into both
core Gridworks and the DERI RDF extension to support POSTing of the
RDF to a service. Code is just a proof of concept [2] but I have a
more refined version that I parked briefly whilst awaiting the 2.0
release.

I think this nicely demonstrates how open Refine is as a tool.

Cheers,

L.

[1]. 
http://www.ldodds.com/blog/2010/08/gridworks-reconciliation-api-implementation/
[2]. https://github.com/ldodds/gridworks-talisplatform

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: RDB to RDF & ontology terms reuse

2010-11-05 Thread Leigh Dodds
Hi Christian,

On Friday, November 5, 2010, Christian Rivas  wrote:

> foaf:firstName => Domain: foaf:Person Range: Literal
> foaf:familyName => Domain: foaf:Person Range: Literal
> foaf:phone => Domain: NONE Range => NONE
> vcard:email => Domain: vcard:VCard Range => NONE

Personally I would use all FOAF terms; foaf:mbox can be used to
capture an email address as a mailto: URI.
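
i.e. something along these lines (a quick rdflib sketch, names made
up):

  from rdflib import Graph, Literal, Namespace, URIRef

  FOAF = Namespace("http://xmlns.com/foaf/0.1/")

  g = Graph()
  person = URIRef("http://example.org/people/alice#me")
  g.add((person, FOAF.firstName, Literal("Alice")))
  g.add((person, FOAF.familyName, Literal("Smith")))
  g.add((person, FOAF.phone, URIRef("tel:+441234567890")))
  g.add((person, FOAF.mbox, URIRef("mailto:alice@example.org")))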

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Is 303 really necessary - demo

2010-11-05 Thread Leigh Dodds
Hi,

On 5 November 2010 12:37, Nathan  wrote:
> Wrong question, correct question is "if I 200 OK will people think this
> is a document", to which the answer is yes. Your toucan is a :Document.

You keep reiterating this, but I'm still not clear on what you're saying.

1. It seems like you're saying that a status code licenses someone to
infer an rdf:type for a resource (in what vocab I'm not sure, but it
looks like you're saying that). Someone is obviously entitled to do
that. Not sure I can think of a use case, do you have one?

2. It also seems like you're suggesting someone is actually doing
that. Or maybe you're expecting that someone will start doing it?

3. It also seems like you're suggesting that if someone does do that,
then it breaks the (semantic) web for the rest of us. Which it won't,
unless you blithely trust all data everywhere or don't care to check
your facts.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Is 303 really necessary - demo

2010-11-05 Thread Leigh Dodds
Hi,

On 5 November 2010 13:57, Giovanni Tummarello
 wrote:
> I might be wrong but I dont like it much . Sindice would index it as 2
> documents.
>
> http://iandavis.com/2010/303/toucan
> http://iandavis.com/2010/303/toucan.rdf

Even though one returns a Content-Location?

Cheers,

L.
-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: isDefinedBy and isDescribedBy, Tale of two missing predicates

2010-11-05 Thread Leigh Dodds
Hi,

On 5 November 2010 12:43, Nathan  wrote:
> Dave Reynolds wrote:
>>
>> Clearly simply using # URIs solves this but people can be surprisingly
>> reluctant to go that route.
>
> Why? I still don't understand the reluctance, any info on the technical
> non-made-up-pedantic reasons would be great.

Dave provided a pointer to TimBL's discussion which had some comments,
there's also some brief discussion of the technical issues in the Cool
URIs paper, see [1]

[1]. http://www.w3.org/TR/cooluris/#choosing

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: isDefinedBy and isDescribedBy, Tale of two missing predicates

2010-11-05 Thread Leigh Dodds
Hi Dave

On 5 November 2010 12:35, Dave Reynolds  wrote:
> Yes but I don't think the proposal was to ban use of 303 but to add an
> alternative solution, a "third way" :)
>
> I have some sympathy with this. The situation I've faced several times
> of late is roughly this:
>
> ...
[snip]

Really nice summary Dave.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: What would break, a question for implementors? (was Re: Is 303 really necessary?)

2010-11-05 Thread Leigh Dodds
Hi Robert,

Thanks for the response, good to hear from an implementor.

On 5 November 2010 10:41, Robert Fuller  wrote:
> ...
> However... with regard to publishing ontologies, we could expect
> additional overhead if same content is delivered on retrieving different
> Resources for example http://example.com/schema/latitude and
> http://example.com/schema/longitude . In such a case ETag could be used
> to suggest the contents are identical, but not sure that is a practical
> solution. I expect that without 303 it will be more difficult in
> particular to publish and process ontologies.

This is useful to know, thanks. I don't think the ETag approach works,
as an ETag is intended to validate a specific resource, not to be
carried across resources.

One way to avoid the overhead is to strongly recommend # URIs for
vocabularies. This seems to be increasingly the norm. It also makes
them easier to work with (you often want the whole document).

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Inferring data from network interactions (was Re: Is 303 really necessary?)

2010-11-05 Thread Leigh Dodds
Hi,

On 5 November 2010 09:54, William Waites  wrote:
> On Fri, Nov 05, 2010 at 09:34:43AM +0000, Leigh Dodds wrote:
>>
>> Are you suggesting that Linked Data crawlers could/should look at the
>> status code and use that to infer new statements about the resources
>> returned? If so, I think that's the first time I've seen that
>> mentioned, and am curious as to why someone would do it. Surely all of
>> the useful information is in the data itself.
>
> Provenance and debugging. It would be quite possible to
> record the fact that this set of triples, G, were obtained
> by dereferencing this uri N, at a certain time, from a
> certain place, with a request that looked like this and a
> response that had these headers and response code. The
> class of information that is kept for [0]. If N appeared
> in G, that could lead directly to inferences involving the
> provenance information. If later reasoning is concerned at
> all with the trustworthiness or up-to-dateness of the
> data it could look at this as well.

Yes, I've done something similar to that in the past when I added
support for the ScutterVocab [1] to my crawler.

It was the suggestion of inferring information directly from a 200/303
that I was most curious about. I've argued for inferring data from 301
in the past [2], but wasn't sure of the merit of introducing data
based on the other interactions.

> Keeping this quantity of information around might quickly
> turn out to be too data-intensive to be practical, but
> that's more of an engineering question. I think it does
> make some sense to do this in principle at least.

That's what I found when crawling the BBC pages. Huge amounts of data
and overhead in storing it. Capturing just enough to gather statistics
on the crawl was sufficient.

Cheers,

L.

[1]. http://wiki.foaf-project.org/w/ScutterVocab
[2]. http://www.ldodds.com/blog/2007/03/the-semantics-of-301-moved-permanently/

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



What would break, a question for implementors? (was Re: Is 303 really necessary?)

2010-11-05 Thread Leigh Dodds
Hi Michael,

On 5 November 2010 09:29, Michael Hausenblas
 wrote:
> It occurs to me that one of the main features of the Linked Data community
> is that we *do* things rather than having endless conversations what would
> be the best for the world out there. Heck, this is how the whole thing
> started. A couple of people defining a set of good practices and providing
> data following these practices and tools for it.
>
> Concluding. If you are serious about this, please go ahead. You have a very
> popular and powerful platform at your hand. Implement it there (and in your
> libraries, such as Moriarty), document it, and others may/will follow.

Yes, actually doing things does help more than talking. I sometimes
wonder whether as a community we're doing all the right things, but
that's another discussion ;)

Your suggestion about forging ahead is a good one, but it also reminds
me of Ian's original question: what would break if we used this
pattern?

So here's a couple of questions for those of you on the list who have
implemented Linked Data tools, applications, services, etc:

* Do you rely on or require HTTP 303 redirects in your application, or
does your app just follow the redirect?
* Would your application/tool/service/etc. break, or generate
inaccurate data, if Ian's pattern was used to publish Linked Data?

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Is 303 really necessary?

2010-11-05 Thread Leigh Dodds
Hi,

On 4 November 2010 17:51, Nathan  wrote:
> But, for whatever reasons, we've made our choices, each has pro's and
> cons, and we have to live with them - different things have different
> name, and the giant global graph is usable. Please, keep it that way.

I think it's useful to continually assess the state of the art to see
whether we're on track. My experience, which seems to be confirmed by
comments from other people on this thread, is that we're seeing push
back from the wider web community -- who have already published way
more data than we have -- on the technical approach we've been
advocating, so looking for a middle ground seems useful.

Different things do have different names, but conflating IR/NIR is not
part of Ian's proposal which addresses the publishing mechanism only.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Is 303 really necessary?

2010-11-05 Thread Leigh Dodds
Hi David,

On 4 November 2010 19:57, David Wood  wrote:
> Some small number of people and organizations need to provide back-links on 
> the Web since the Web doesn't have them.
> 303s provide a generic mechanism for that to occur.  URL curation is a useful 
> and proper activity on the Web, again in my opinion.

I agree that URL curation is a useful and proper activity on the Web.
I'm not clear on your core concern though. It looks like you're
asserting that HTTP 303 status codes, in general, are useful and
should not be deprecated. Totally agree there. But Ian's proposal is
about whether 303 is a necessary part of publishing Linked Data. That
seems distinct from how services like PURLs and DOIs operate, and from
the value they provide. But perhaps I'm misunderstanding?

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com


