Enhancing open data with identifiers
I thought I'd share a link to this UKODI/Thomson Reuters white paper which was published today: http://theodi.org/guides/data-identifiers-white-paper Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Dbpedia is down?
Hi, Dbpedia has been down for maintenance since yesterday evening; does anyone know when it will be back up? All resource URIs return: "The web-site you are currently trying to access is under maintenance at this time. We are sorry for any inconvenience this has caused." I'd have reported this to the bug tracker listed on the dbpedia support page, but that link is also broken: http://sourceforge.net/tracker/?group_id=190976 Is there a location where planned maintenance is noted? Similarly, is there somewhere to go to check service status and updates on fault finding? Thanks, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: URIs within URIs
Hi, I documented all the variations of this form of URI construction I was aware of in the Rebased URI pattern: http://patterns.dataincubator.org/book/rebased-uri.html This covers generating one URI from another. What that new URI returns is a separate concern. Cheers, L. On Fri, Aug 22, 2014 at 4:56 PM, Bill Roberts wrote: > Hi Luca > > We certainly find a need for that kind of feature (as do many other linked > data publishers) and our choice in our PublishMyData platform has been the > URL pattern {domain}/resource?uri={url-encoded external URI} to expose info > in our databases about URIs in other domains. > > If there was a standard URL route for this scenario, we'd be glad to > implement it > > Best regards > > Bill > > On 22 Aug 2014, at 16:44, Luca Matteis wrote: > >> Dear LOD community, >> >> I'm wondering whether there has been any research regarding the idea >> of having URIs contain an actual URI, that would then resolve >> information about what the linked dataset states about the input URI. >> >> Example: >> >> http://foo.com/alice -> returns data about what foo.com has regarding alice >> >> http://bar.com/endpoint?uri=http%3A%2F%2Ffoo.com%2Falice -> doesn't >> just resolve the alice URI above, but returns what bar.com wants to >> say about the alice URI >> >> For that matter http://bar.com/?uri=http%3A%2F%2Ffoo.com%2Falice could >> return: >> >> <http://bar.com/?uri=http%3A%2F%2Ffoo.com%2Falice> a void:Dataset . >> <http://foo.com/alice> <#some> <#data> . >> >> I know SPARQL endpoints already have this functionality, but was >> wondering whether any formal research was done towards this direction >> rather than a full-blown SPARQL endpoint. >> >> The reason I'm looking for this sort of thing is because I simply need >> to ask certain third-party datasets whether they have data about a URI >> (inbound links). >> >> Best, >> Luca >> > > -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
ORCID as Linked Data
I discovered this today: curl -v -L -H "Accept: text/turtle" http://orcid.org/-0003-0837-2362 A fairly new addition to the ORCID service I think. With many DOIs already supporting Linked Data views, this makes a nice addition to the academic linked data landscape. Still lots of room for improvement, but definitely a step forwards. Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: rdf:HTML datatype in RDF 1.1
The value space is defined as being a DocumentFragment. I'm not clear on whether DOM4 has changed the meaning of that, but a fragment is a collection of nodes, which don't necessarily have a common root element. So I think either is valid. L. On Wed, Apr 2, 2014 at 11:54 AM, john.walker wrote: > Simple question on this which wasn't immediately obvious from the > recommendation [1]. > > Is it expected that the string has a single top-level element: > > "Hello world!" > > Or is it OK to include fragments like: > > "Hello world!" > "Hello world!" > "Hello world!" > "Hello world!" > > Regards, > > John > > [1] http://www.w3.org/TR/rdf11-concepts/#section-html -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
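For illustration, both of the following would be legal rdf:HTML literals in Turtle (the ex: terms are invented for the example):

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix ex:  <http://example.org/ns#> .

    ex:doc1 ex:note "<p>Hello world!</p>"^^rdf:HTML .                    # single root element
    ex:doc2 ex:note "Hello <b>world</b>! <p>And again.</p>"^^rdf:HTML .  # several top-level nodes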
Exchanging Links with LINK and UNLINK
Hi, The "HTTP Link and Unlink Methods" RFC [1] specifies how to use the LINK/UNLINK HTTP methods to support exchanging links between resources on the web. To explore these ideas I've created a Ruby implementation based on Rack middleware. This means that it can be easily integrated into any ruby based web framework [2]. There are a couple of link stores provides, including one based on a SPARQL 1.1 compliant endpoint. Supplemented with suitable authentication I think this provides an interesting way to exchange links between Linked Data publishers. No special mechanism is needed, just existing protocols. Its nicely aligned with existing web infrastructure. I thought I'd share this with the community as I don't feel we've settled on a common pattern for exchanging this kind of information between publishers. Cheers, L. [1]. http://tools.ietf.org/html/draft-snell-link-method-08 [2]. https://github.com/ldodds/link-middleware -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: How to publish SPARQL endpoint limits/metadata?
Hi, As others have suggested, extending service descriptions would be the best way to do this. This might make a nice little community project. It would be useful to itemise a list of the type of limits that might be faced, then look at how best to model them. Perhaps something we could do on the list? Cheers, L. On Tue, Oct 8, 2013 at 10:46 AM, Frans Knibbe | Geodan wrote: > Hello, > > I am experimenting with running SPARQL endpoints and I notice the need to > impose some limits to prevent overloading/abuse. The easiest and I believe > fairly common way to do that is to LIMIT the number of results that the > endpoint will return for a single query. > > I now wonder how I can publish the fact that my SPARQL endpoint has a LIMIT > and that is has a certain value. > > I have read the thread Public SPARQL endpoints:managing (mis)-use and > communicating limits to users, but that seemed to be about how to > communicate limits during querying. I would like to know if there is a way > to communicate limits before querying is started. > > It seems to me that a logical place to publish a limit would be in the > metadata of the SPARQL endpoint. Those metadata could contain all limits > imposed on the endpoint, and perhaps other things like a SLA or a > maintenance schedule... data that could help in the proper use of the > endpoint by both software agents and human users. > > So perhaps my enquiry really is about a standard for publishing SPARQL > endpoint metadata, and how to access them. > > Greetings, > Frans > > > -- > Geodan > President Kennedylaan 1 > 1079 MB Amsterdam (NL) > > T +31 (0)20 - 5711 347 > E frans.kni...@geodan.nl > www.geodan.nl | disclaimer > -- -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
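As a rough sketch of the sort of thing an extended service description could capture, using the SPARQL 1.1 Service Description vocabulary plus an invented limits vocabulary (the lim: terms below don't exist; they only illustrate what might need modelling):

    @prefix sd:  <http://www.w3.org/ns/sparql-service-description#> .
    @prefix lim: <http://example.org/ns/sparql-limits#> .

    <http://example.org/sparql> a sd:Service ;
        sd:endpoint <http://example.org/sparql> ;
        lim:maxResultRows 10000 ;
        lim:queryTimeoutSeconds 60 ;
        lim:maxRequestsPerHour 1000 .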
Re: ANN: DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links
Hi Hugh, Hasn't dbpedia always suffered from this? I've tended to do the same as you and have encountered similar inconsistencies. I've never really figured out whether it's down to inconsistent encoding in the data conversion or something else. Cheers, L. On Fri, Oct 4, 2013 at 1:42 PM, Hugh Glaser wrote: > Hi. > Chris has suggested I send the following to the LOD list, as it may be of > interest to several people: > > Hi Chris. > Great stuff! > > I have a question. > Or would you prefer I put it on the LOD list for discussion? > > It is about url encoding. > > Dbpedia: > http://dbpedia.org/page/Ashford_%28borough%29 is not found > http://dbpedia.org/page/Ashford_(borough) works, and redirects to > http://dbpedia.org/resource/Borough_of_Ashford > Wikipedia: > http://en.wikipedia.org/wiki/Ashford_%28borough%29 works > http://en.wikipedia.org/wiki/Ashford_(borough) works > Both go to the page with content of > http://en.wikipedia.org/wiki/Borough_of_Ashford although the URL in the > address bar doesn't change. > > So the problem: > I usually find things in wikipedia, and then use the last bit to construct > the dbpedia URI - I suspect lots of people do this. > But as you can see, the url encoded URI, which can often be found in the > wild, won't allow me to do this. > There are of course many wikipedia URLs with "(" and ")" in them - (artist), > (programmer), (borough) etc. > It is also the same with comma and single quote. > > I think this may be different from 3.8, but can't be sure - is it intended? > > Very best > Hugh -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: Minimizing data volume
Hi, Before using compression you might also make a decision about whether you need to represent all of this information as RDF in the first place. For example, rather than include the large geometries as literals, why not store them as separate documents and let clients fetch the geometries when needed, rather than as part of a SPARQL query? Geometries can be served using standard HTTP compression techniques and will benefit from caching. You can provide summary statistics (including size of the document, and properties of the described area, e.g. centroids) in the RDF to help address a few common requirements, allowing clients to only fetch the geometries they need, as they need them. This can greatly reduce the volume of data you have to store and provides clients with more flexibility. Cheers, L. On Mon, Sep 9, 2013 at 10:47 AM, Frans Knibbe | Geodan wrote: > Hello, > > In my line of work (geographical information) I often deal with high volume > data. The high volume is caused by single facts having a big size. A single > 2D or 3D geometry is often encoded as a single text string and can consist > of thousands of numbers (coordinates). It is easy to see that this can cause > performance issues with transferring and processing data. So I wonder about > the state of the art in minimizing data volume in Linked Data. I know that > careful publication of data will help a bit: multiple levels of detail could > be published, coordinates could use significant digits (they almost never > do), but it seems to me that some kind of compression is needed too. Is > there something like a common approach to data compression at the moment? > Something that is understood by both publishers and consumers of data? > > Regards, > Frans > > -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
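A sketch of that approach in Turtle (the ex: terms are invented; geo: is the W3C WGS84 vocabulary):

    @prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:  <http://example.org/ns#> .

    <http://example.org/id/area/42>
        geo:lat "52.205"^^xsd:decimal ;     # centroid as a cheap summary
        geo:long "0.119"^^xsd:decimal ;
        ex:geometry <http://example.org/geometry/42.geojson> ;   # full geometry, fetched only when needed
        ex:geometrySizeBytes 1548321 .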
Re: Open Data Rights Statements
Hi, Yes, I'm aware of L4LOD. It's essentially the same as LIMO, ccREL and, to a lesser extent, ODRL. All of these attempt to provide terms for describing the key facets of licenses. The benefit of ccREL is that all of the CC licences are already described using those terms, so the machine-readable metadata exists. Thanks for the pointer to the paper. However, from a quick skim I must admit to being confused by their Real World example in Section 3.6. While the logic might be correct in the derivation of the combined licence, it's not a great example because: * You can't create a new derived dataset using data published under a no-derivatives licence -- so the scenario isn't allowed * The legal terms of the ODbL licence indicate that any derivatives that are shared publicly must be licensed under the ODbL or a compatible licence designated by the publisher -- the derived licence is neither So while there may be some value in being able to automatically create summaries of the combined obligations/permissions of licences, I think this is at most useful for helping you understand your obligations, not the creation of new downstream licences. Partly because it glosses over important legal points in the terms, and partly because the community is not best served by a proliferation of licences. Convergence creates simplicity. Cheers, L. On Mon, Aug 12, 2013 at 5:53 PM, Ghislain Atemezing wrote: > Hi Leigh, > Nice work indeed! I confess I didn't go through all the guide. > >> This work looks at the implications of various open licences on the >> creation of derived datasets. There's a blog post with pointers here: >> >> http://theodi.org/blog/exploring-compatibility-between-data-licences >> >> If anyone has any comments then please let me know. > > I was wondering if there were connection with the work of Serena et al. at > INRIA (WIMIX team) on License composition...basically with this ontology > L4LOD (Licenses for Linked Open Data) [1], and this paper [2] explains all > the logic behind. > > > Cheers, > Ghislain > > > > [1] http://ns.inria.fr/l4lod/v2/l4lod_v2.html > [2] http://www-sop.inria.fr/members/Serena.Villata/Resources/icail2013.pdf > -- > Ghislain Atemezing > EURECOM, Multimedia Communications Department > Campus SophiaTech > 450, route des Chappes, 06410 Biot, France. > e-mail: auguste.atemez...@eurecom.fr & ghislain.atemez...@gmail.com > Tel: +33 (0)4 - 9300 8178 > Fax: +33 (0)4 - 9000 8200 > Web: http://www.eurecom.fr/~atemezin > -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: License LINK Headers and Linked Data
Hi Mike, On Mon, Aug 12, 2013 at 5:34 PM, mike amundsen wrote: > "A HEAD request can be made on a resource to check its licensing..." > > Since HEAD does not resolve the LINK URLs, agents can check for the > *existence* of licensing information, but not necessarily determine the > licensing context. > > If the LINK @href or one of the associated @rel values is a URI/IRI that the > agent recognizes (knows ahead of time) then that MAY provide sufficient > context for the agent to make a judgment on whether the representation is > marked with an acceptable license. > > Failing that, the agent will need to deref the LINK @href and parse/process > the response in order to make a judgment on the appropriateness of the > licensing of the initial response. Yes, that's exactly what I meant by "check its licensing". I didn't mean that the header itself communicated all of the necessary information. Thanks for spelling it out! :) L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
License LINK Headers and Linked Data
Hi, There's one aspect of my document on publishing machine-readable rights statements that I want to flag to this community. Specifically, it's the section on including references to licence and rights statements from LINK headers in HTTP responses: https://github.com/theodi/open-data-licensing/blob/master/guides/publisher-guide.md#linking-to-rights-statements-from-web-apis While that information can also be published in RDF, as part of the Linked Data response, I think adding LINK headers is very important too, for several reasons: Linked Data applications and browsers will commonly encounter new resources and the licensing information should be immediately clear. Having this be accessible outside of the response will allow user agents to clearly detect licences before they start retrieving data from a new source. This will allow users to place pre-conditions on what type of data they want to harvest/collect/process. A HEAD request can be made on a resource to check its licensing, before data is actually retrieved. Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
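For example, a HEAD request might surface the licence like this (host and licence choice are purely illustrative; "license" is a registered link relation):

    HEAD /data/stations HTTP/1.1
    Host: data.example.org

    HTTP/1.1 200 OK
    Content-Type: text/turtle
    Link: <http://opendatacommons.org/licenses/by/1.0/>; rel="license"

A client can inspect that header and decide whether to go on to fetch the data.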
Re: Open Data Rights Statements
Hi, A quick follow-up to my previous announcement. The schema and user guides have been updated based on feedback I've received from the wider community. I've also just published a follow-up piece of work that I think is also relevant to this community. This work looks at the implications of various open licences on the creation of derived datasets. There's a blog post with pointers here: http://theodi.org/blog/exploring-compatibility-between-data-licences If anyone has any comments then please let me know. Cheers, L. On Tue, Jul 2, 2013 at 9:23 AM, Leigh Dodds wrote: > Hi, > > At the UK Open Data Institute we've been working on some guidance and > a new vocabulary to help support the publication of machine-readable > rights statements for open data. The vocabulary builds on existing > work in this area (e.g. Dublin Core and Creative Commons) but > addresses a few issues that we felt were underspecified. > > The vocabulary is intended to work in a wide variety of contexts, from > simple JSON documents and data packaging formats through to Linked > Data and Web APIs. > > The work is now at a stage where we're keen to get wider feedback from > the community. > > You can read a background on the work in this introductory blog post > on the UK ODI blog: > > http://theodi.org/blog/machine-readable-rights-statements > > The draft schema can be found here: > > http://schema.theodi.org/odrs/ > > And there are publisher and re-user guides to accompany it: > > https://github.com/theodi/open-data-licensing/blob/master/guides/publisher-guide.md > https://github.com/theodi/open-data-licensing/blob/master/guides/reusers-guide.md > > We would love to hear your feedback on the work. If you do have issues > or comments, then can I ask that you submit them as an issue to our > github project: > > https://github.com/theodi/open-data-licensing/issues > > Thanks, > > L. > > -- > Leigh Dodds > Freelance Technologist > Open Data, Linked Data Geek > t: @ldodds > w: ldodds.com > e: le...@ldodds.com -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: Open Data Rights Statements
Hi Bernard, On Fri, Jul 5, 2013 at 7:12 PM, Bernard Vatant wrote: > Hello David > > Thanks for the ping, LOV lurking on public-lod anyway ... > But since we are in public, just a reminder that the simplest way to > suggest new vocabularies to LOV is through > http://lov.okfn.org/dataset/lov/suggest/ > > But we always of course appreciate direct conversation, and ORDS is > definitely on the queue. > > @Leigh do you think this preliminary version is worth including in LOV as > is (if nothing else for history) or do we wait for a more "mature" version? > I say go ahead and include it. I don't envisage any major changes to structure, although we may add some new properties in future. I'll also look at including alternate serializations to the existing Turtle file. Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: Open Data Rights Statements
Hi Andrea, On Tue, Jul 2, 2013 at 11:19 AM, Andrea Perego wrote: > That's very interesting, thank you, Leigh. > > I wonder whether you plan to consider work carried out in the framework of > the Open Data Rights Language (ODRL) CG of W3C [1]. Yes, I'm aware of that work. ODRL is a general purpose rights expression language that can describe re-use policies. This is similar to the existing Creative Commons ccREL vocabulary, which also captures the permissions, etc. that are described by a licence. The ODRS vocabulary doesn't attempt to describe licenses themselves. It's intended more as a way to annotate the relationship between a dataset and one or more licences. Those licenses could be given a machine-readable description using ccREL or ODRL. So I think the vocabularies are compatible. I've already added an issue to cover describing this relationship a little more. > Also, do you plan to support the notion of "licence type"? This is being > used, e.g., in vocabularies like ADMS.SW [2] and the DCAT-AP (DCAT > Application Profile for EU data portals) [3]. Looking at the DCAT profile, it seems that license type is a category of license, e.g. public domain, royalties required, etc. To me, this overlaps with what ccREL and ODRL already cover, but at a more coarse-grained level. I think for the purposes of the ODRS vocabulary we'll leave the description of licenses reasonably opaque and defer to other vocabularies to describe those in more detail. However, we do distinguish between separate licenses that relate to the data and copyrightable aspects of the dataset. Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Open Data Rights Statements
Hi, At the UK Open Data Institute we've been working on some guidance and a new vocabulary to help support the publication of machine-readable rights statements for open data. The vocabulary builds on existing work in this area (e.g. Dublin Core and Creative Commons) but addresses a few issues that we felt were underspecified. The vocabulary is intended to work in a wide variety of contexts, from simple JSON documents and data packaging formats through to Linked Data and Web APIs. The work is now at a stage where we're keen to get wider feedback from the community. You can read a background on the work in this introductory blog post on the UK ODI blog: http://theodi.org/blog/machine-readable-rights-statements The draft schema can be found here: http://schema.theodi.org/odrs/ And there are publisher and re-user guides to accompany it: https://github.com/theodi/open-data-licensing/blob/master/guides/publisher-guide.md https://github.com/theodi/open-data-licensing/blob/master/guides/reusers-guide.md We would love to hear your feedback on the work. If you do have issues or comments, then can I ask that you submit them as an issue to our github project: https://github.com/theodi/open-data-licensing/issues Thanks, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
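As a flavour of how the vocabulary is meant to be used, a dataset description might look roughly like the sketch below; the dataset URI is invented and the draft schema should be treated as the definitive source for term names:

    @prefix odrs: <http://schema.theodi.org/odrs#> .
    @prefix dct:  <http://purl.org/dc/terms/> .

    <http://data.example.org/dataset/stations>
        dct:rights <http://data.example.org/dataset/stations#rights> .

    <http://data.example.org/dataset/stations#rights> a odrs:RightsStatement ;
        odrs:dataLicense <http://opendatacommons.org/licenses/by/1.0/> ;
        odrs:contentLicense <http://creativecommons.org/licenses/by/3.0/> ;
        odrs:attributionText "Example Data, published by Example Org" ;
        odrs:attributionURL <http://data.example.org/> .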
Re: Business Models, Profitability, and Linked Data
Hi, On Mon, Jun 10, 2013 at 12:00 PM, Kingsley Idehen wrote: > On 6/10/13 4:18 AM, Leigh Dodds wrote: >> >> Hi, >> >> On Fri, Jun 7, 2013 at 5:52 PM, Kingsley Idehen >> wrote: >>> >>> There have been a few recent threads on the LOD and Semantic >>> Web mailing lists that boil down to the fundamental issues of >>> profitability, business models, and Linked Data. >>> >>> Situation Analysis >>> == >>> >>> Business Model Issue >>> >>> >>> The problem with "Data"-oriented business models is that you >>> ultimately have to deal with the issue of wholesale data copying >>> without attribution. That's the key issue; everything else is >>> a futile dance around this concern. >> >> Why do you think that attribution is the key issue with data oriented >> businesses? > > Its the key to provenance. It's the key making all contributors to the data > value chain visible. I don't disagree that attribution and provenance are important, especially for Open Data, but also whenever it becomes important to understand sources of data. > As I've already stated, the big problem here is wholesale copying and > reproduction without attribution. Every data publisher has to deal with this > problem, at some point, when crafting a data oriented business model. Every data publisher that aggregates or collects data from other sources certainly needs to understand -- for their own workflow -- where data originates. >> I've spoken with a number of firms who have business models based on >> data supply and have never once heard attribution being mentioned as >> an issue for themselves or their customers. So I'm curious why you >> think this is a problem. > > And are those data suppliers conforming to patterns such as those associated > with publicly available Linked Open Data? Can they provide open access to > data and actually have a functional business model based on the > aforementioned style of data publication? No, they weren't using Linked Open Data. No, they weren't publishing open data (it was commercially licensed for the most part). But they all had successful business models. But I understood you to be making a general statement about a key issue that is common to all data business models, one that Linked Data then solves. I agree that every data aggregator needs to understand their workflow, to manage their own processes. I agree that publishing details of data provenance and attribution is important, particularly for Open Data. And absolutely agree that Linked Data can help there. Maybe I'm misunderstanding your point, but I'm not seeing evidence that attribution is a key business issue that data businesses have to solve in order to be successful. You said that "everything else is a futile dance around this concern", which I found surprising, so I'm curious about the evidence. I'm curious about the general business drivers, regardless of whether the data is Linked or Open. Making the data Linked is a solution; making the data Open might also be a solution, but also presents its own challenges. Sometimes it's important to know how the sausage is made, sometimes it's not. Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: Business Models, Profitability, and Linked Data
Hi, On Mon, Jun 10, 2013 at 9:26 AM, Víctor Rodríguez Doncel wrote: > > While attribution may not be hindering any business, it would be nice being > able to specify in a machine readable form the way it should be made... Yes, there's definitely scope to do more there, and it's something I'm working on at the moment. Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: Business Models, Profitability, and Linked Data
Hi, On Fri, Jun 7, 2013 at 5:52 PM, Kingsley Idehen wrote: > There have been a few recent threads on the LOD and Semantic > Web mailing lists that boil down to the fundamental issues of > profitability, business models, and Linked Data. > > Situation Analysis > == > > Business Model Issue > > > The problem with "Data"-oriented business models is that you > ultimately have to deal with the issue of wholesale data copying > without attribution. That's the key issue; everything else is > a futile dance around this concern. Why do you think that attribution is the key issue with data oriented businesses? I've spoken with a number of firms who have business models based on data supply and have never once heard attribution being mentioned as an issue for themselves or their customers. So I'm curious why you think this is a problem. Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: There's No Money in Linked Data
Hi Pascal, It's good to draw attention to these issues. At ISWC 2009, Tom Heath, Kaitlin Thaney, Jordan Hatcher and I ran a workshop on legal and social issues for data sharing [1, 2]. Key themes from the workshop were around the importance of clear licensing, norms for attribution, and including machine-readable license data. At the time I did a survey of the current state of licensing of the Linked Data cloud; there's a write-up [3] and diagram [4]. Looking over your analysis, I don't think the picture has changed considerably since then. We need to work harder to ensure that data is clearly licensed. But this is a general problem for Open Data, not just Linked Open Data. You don't say in your paper how you did the analysis. Did you use the metadata from the LOD group in datahub [5]? At the time I had to do mine manually, but it wouldn't be hard to automate some of this now, perhaps to create a regularly updated set of indicators. One criterion that agents might apply when conducting "Follow Your Nose" consumption of Linked Data is the licensing of the target data, e.g. ignore links to datasets that are not licensed for your particular usage. Cheers, L. [1]. http://opendatacommons.org/events/iswc-2009-legal-social-sharing-data-web/ [2]. http://blog.okfn.org/2009/11/05/slides-from-open-data-session-at-iswc-2009/ [3]. http://blog.ldodds.com/2010/01/01/rights-statements-on-the-web-of-data/ [4]. http://www.flickr.com/photos/ldodds/4043803502/ [5]. http://datahub.io/group/lodcloud On Sat, May 18, 2013 at 3:15 AM, Pascal Hitzler wrote: > We just finished a piece indicating serious legal issues regarding the > commercialization of Linked Data - this may be of general interest, hence > the post. We hope to stimulate discussions on this issue (hence the > provokative title). > > Available from > http://knoesis.wright.edu/faculty/pascal/pub/nomoneylod.pdf > > Abstract. > Linked Data (LD) has been an active research area for more than 6 years and > many aspects about publishing, retrieving, linking, and cleaning Linked Data > have been investigated. There seems to be a broad and general agreement that > in principle LD datasets can be very useful for solving a wide variety of > problems ranging from practical industrial analytics to highly specific > research problems. Having these notions in mind, we started exploring the > use of notable LD datasets such as DBpedia, Freebase, Geonames and others > for a commercial application. However, it turns out that using these > datasets in realistic settings is not always easy. Surprisingly, in many > cases the underlying issues are not technical but legal barriers erected by > the LD data publishers. In this paper we argue that these barriers are often > not justified, detrimental to both data publishers and users, and are often > built without much consideration of their consequences. > > Authors: > Prateek Jain, Pascal Hitzler, Krzysztof Janowicz, Chitra Venkatramani > > -- > Prof. Dr. Pascal Hitzler > Kno.e.sis Center, Wright State University, Dayton, OH > pas...@pascal-hitzler.de http://www.knoesis.org/pascal/ > Semantic Web Textbook: http://www.semantic-web-book.org > Semantic Web Journal: http://www.semantic-web-journal.net > > -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
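As a sketch of that kind of check, an agent could run something like the following against a target dataset's VoID or catalogue description before crawling into it (the whitelist of licences is just an example):

    PREFIX dct: <http://purl.org/dc/terms/>
    ASK {
      <http://other.example.org/dataset/foo> dct:license ?licence .
      FILTER(?licence IN (
        <http://opendatacommons.org/licenses/by/1.0/>,
        <http://creativecommons.org/publicdomain/zero/1.0/>
      ))
    }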
Summarising dbpedia country coverage
Thought this might be interesting for people on here. I wrote a script to summarise the geographic coverage of dbpedia: http://blog.ldodds.com/2013/05/15/summarising-geographic-coverage-of-dbpedia-and-wikipedia/ Lots more potential here, both for creating proper Linked Data for the results, and for further analysis. What other Linked Data sets include a range of geographic locations? Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
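One way to get at this kind of summary with SPARQL, not necessarily how the script linked above works, is a simple aggregate over a property such as dbo:country:

    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?country (COUNT(?place) AS ?places)
    WHERE {
      ?place dbo:country ?country .
    }
    GROUP BY ?country
    ORDER BY DESC(?places)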
Re: Is science on sale this week?
>>>>>>>>> If we subscribe to science, free and open access to knowledge, what's the purpose of the arrangement between conferences and publishers? >>>>>>>>> -Sarven >>>>>>>>> http://csarven.ca/#i -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: Content negotiation negotiation
The first two indicate that responses vary based on the Accept header, as both have a Vary: Accept. The third doesn't, so it doesn't support negotiation. None of the URLs advertise what formats are available. That's not a requirement for content negotiation, although it'd be useful. Cheers, L. On Wed, Apr 24, 2013 at 2:17 PM, Phillip Lord wrote: > > Hmmm. > > So, taking a look at these three URLs, can you tell me > a) which of these support content negotiation, and b) what formats > they provide. > > http://dx.doi.org/10.3390/fi4041004 > http://dx.doi.org/10.1594/PANGAEA.527932 > http://dx.doi.org/10.1000/182 > > I tried vapor -- it seems to work by probing with application/rdf+xml, > but it appears to work by probing. I can't find any of the headers > mentioned either, although perhaps I am looking wrongly. > > Phil > > > > Hugh Glaser writes: > >> Ah of course - thanks Mark, silly me. >> So I look at the Link: header for something like >> curl -L -i http://dbpedia.org/resource/Luton >> Which gives me the information I want. >> >> Anyone got any offers for how I would use Linked Data to get this into my >> RDF store? >> >> So then I can do things something like: >> SELECT ?type ?source FROM { <http://dbpedia.org/resource/Luton> ?foo ?file . >> ?file ?type ?source . } >> (I think). >> >> I suppose it would need to actually be returned from a URI at the site - I >> can't get a header as URI resolution - right? >> And I would need an ontology? >> >> Cheers. >> >> On 23 Apr 2013, at 19:49, Mark Baker >> wrote: >> >>> On Tue, Apr 23, 2013 at 1:42 PM, Hugh Glaser wrote: >>>> >>>> On 22 Apr 2013, at 12:18, Phillip Lord >>>> wrote: >>>> >>>>> We need to check for content negotiation; I'm not clear, though, how we >>>>> are supposed to know what forms of content are available. Is there >>>>> anyway we can tell from your website that content negotiation is >>>>> possible? >>>> Ah, and interesting question. >>>> I don't know of any, but maybe someone else does? >>> >>> Client-side conneg, look for Link rel=alternate headers in response >>> >>> Server-side conneg, look for "Vary: Content-Type" in response >>> >>> Mark. >> >> >> > > -- > Phillip Lord, Phone: +44 (0) 191 222 7827 > Lecturer in Bioinformatics, Email: phillip.l...@newcastle.ac.uk > School of Computing Science, > http://homepages.cs.ncl.ac.uk/phillip.lord > Room 914 Claremont Tower, skype: russet_apples > Newcastle University, twitter: phillord > NE1 7RU > -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: SPARQL, philosophy n'stuff..
Hi Barry, On Mon, Apr 22, 2013 at 9:17 AM, Barry Norton wrote: > > I'm sorry, but you seem to have misunderstood the use of a graph URI > parameter in indirect graph addressing for GSP. > > I wish all GSP actions addressed graphs directly, Queries were all GETs, and > that Updates were all PATCH documents, but a degree of pragmatism has been > applied. I think Mark's point was that SPARQL 1.1/GSP specify fixed query parameters (query, graph) in the specification itself, requiring clients to construct URIs rather than follow hypermedia. Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: Fwd: Re: Public SPARQL endpoints:managing (mis)-use and communicating limits to users.
Hi, On Fri, Apr 19, 2013 at 11:55 AM, Kingsley Idehen wrote: > ... > If you have OFFSET and LIMIT in use, you can reflect the new state of > affairs when the next GET is performed i.e, lets say you have OFFSET 20 and > LIMIT 20, the URL with OFFSET 40 is the request for the next batch of > results from the solution and the one that would reflect the new state of > affairs. This requires the client to page from the outset. Ideally there would be a way for a server to force paging where it needed to. At the moment, though, there's no way for a server to indicate that it's done that, e.g. by including a "next page" link in the results. This also moves us towards a more hypermedia approach where clients don't need to construct URIs: the server provides them. The community could decide on some extension elements/keys that could be used in SPARQL XML/JSON results formats to achieve this. If the link element in the existing format were a little more flexible [1], then this option would be available. We could still use the Atom link element as an extension, though, with existing rel values (which addresses other use cases). [1]. http://www.w3.org/2009/sparql/wiki/Feature:Query_response_linking -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
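Purely to illustrate the kind of extension being discussed, a server that had forced paging could say so in the results, e.g. in the JSON format (the "next" key is not part of any spec, just a sketch):

    {
      "head": {
        "vars": [ "s", "label" ],
        "next": "http://example.org/sparql?query=...&offset=40&limit=20"
      },
      "results": { "bindings": [ ] }
    }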
Re: Re: Public SPARQL endpoints:managing (mis)-use and communicating limits to users.
Hi, On Fri, Apr 19, 2013 at 8:49 AM, Jerven Bolleman wrote: > Original Message > Subject: Re: Public SPARQL endpoints:managing (mis)-use and communicating > limits to users. > Date: Thu, 18 Apr 2013 23:21:46 +0200 > From: Jerven Bolleman > To: Rob Warren > > Hi Rob, > > There is a fundamental problem with HTTP status codes. > Lets say a user submits a complex but small sparql request. > > My server sees the syntax is good and starts to reply in good faith. > This means the server starts the http response and sends an 200 OK > Some results are being send > However, during the evaluation the server gets an exception. > What to do? I can't change the status code anymore... > > Waiting until server know the query can be answered is not feasible because > that would mean > the server can't start giving replies as soon as possible. Which likely > leads > to connection timeouts. Using HTTP status codes when responses are likely to > be larger > than 1 MB works badly in practice. That's not really true. I can download multi-gigabyte files over HTTP without any problem. The issue is more with servers sending a 200 OK response, when they can't actually guarantee that they can fulfil the request. There are always going to be things like hardware failures that might mean requests fail, e.g. leading to truncated or no responses, but servers shouldn't be sending 200 responses if there are expected failure conditions. For example, timing out a query after a 200 response is sent seems wrong to me. There are workarounds: * Response formats, particularly those intended for streaming, could support markup that indicates that results are terminated, perhaps with pointers to the next page. SPARQL XML & JSON could be extended in this way; it's difficult to do with RDF/XML, etc. This would allow the server to terminate streaming but still give a client a valid response with potentially a link to further results * Not responding directly at all: serve a 202 Accepted for (expensive) queries and route the user to another resource from which they can fetch the query results. Data can be prepared asynchronously and the service can respond correctly for a timed-out query. The latter wouldn't necessarily involve changes to SPARQL formats or the protocol. Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
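A sketch of the second option; the use of a Location header here is just one plausible convention, not something the SPARQL protocol defines:

    POST /sparql HTTP/1.1
    Host: example.org
    Content-Type: application/sparql-query

    SELECT * WHERE { ?s ?p ?o } LIMIT 10

    HTTP/1.1 202 Accepted
    Location: http://example.org/queries/1234

The client then polls http://example.org/queries/1234; once the query has finished, that resource can return a 200 with the results, or a proper error status if the query timed out or failed.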
Re: Restpark - Minimal RESTful API for querying RDF triples
Hi, On Thu, Apr 18, 2013 at 4:23 PM, Alan Ruttenberg wrote: > Luca, > > In the past I have suggested a simple way to create simple restful services > based on SPARQL. This could easily be implemented as an extension to your > beginning of restpark. > > The idea is to have the definition of a service be a sparql query with > blanks, and possibly some extra annotations. That's essentially what we called "SPARQL Stored Procedures" in Kasabi. SPARQL queries bound to URIs with parameters injected from the query string. We also had transformation of results using XSLT. Swirrl have implemented this as "named queries" [1], and I used their name when writing up the pattern [2]. One set of annotations I'm planning on adding to sparql-doc [3] is the parameters that need to be injected and, optionally, a path to bind the query to when mounted. The goal is to allow a package of queries to be mounted at a URL and used as named queries. [1]. http://blog.swirrl.com/articles/new-publishmydata-feature-named-queries [2]. http://patterns.dataincubator.org/book/named-query.html [3]. http://blog.ldodds.com/2013/01/30/sparql-doc/ Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
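A sketch of how such an annotated query might look; the @param and @path annotations are hypothetical, they're not part of sparql-doc yet:

    # Publications by a given author.
    #
    # @param author URI of the author; bound to ?author before the query runs
    # @path /authors/{author}/publications
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT ?pub ?title WHERE {
      ?pub dct:creator ?author ;
           dct:title ?title .
    }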
Re: SPARQL, philosophy n'stuff..
Hi, On Thu, Apr 18, 2013 at 12:54 PM, Kingsley Idehen wrote: > On 4/18/13 7:44 AM, Leigh Dodds wrote: >> >> But I bet you learnt it in stages using a pedagogical approach that >> guided you towards the basic building blocks first. And I expect there >> were other reasons -- network effects -- why learning English was >> worth up-front effort. We're not there with SPARQL. > > Do you have an example of any declarative query language that meets the goal > in question, assuming I am interpreting your comments accurately? I think you misunderstood me. I don't think any declarative query language (or technology) can meet the wider goal, because many of the issues are non-technical. My specific point in that comment was that SPARQL is still not widely deployed or in-use enough that someone might just sit down and learn it simply because it's a core skill or technology. That's changing, but it's still very far from, e.g., SQL in that regard. Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: SPARQL, philosophy n'stuff..
Hi, On Thu, Apr 18, 2013 at 12:21 PM, Jürgen Jakobitsch SWC wrote: > i think there's yet another point overlooked : > > what we are trying to do is to create barrier free means of > communication on data level in a globalized world. this effort requires > a common language. Did you mean a common *query* language? I'm not sure I agree. Mainly because no-one has yet created such a thing, so we might find out that the bigger challenges are elsewhere. I guess time will tell :) I used to think that there might be convergence around common query languages for APIs, but there's little evidence of that happening. > my personal view is that providing simplier subsets of such a language > (an api) only leads to the fact that nobody will learn the language (see > pocket calculators,...), although there's hardly anything easier than to > write a sparql query, it can be learned in a day. > > i do not really understand where this "the developer can't sparql, so > let's provide something similar (easier)" - idea comes from. Well, if our goal is to create barrier-free data sharing and re-use, then we should focus on achieving that regardless of technology, and should be open to a variety of approaches. We can't decide that SPARQL is the right solution and then just expect everyone to learn it. Maybe it only takes a day to learn SPARQL, but personally I find that usually I can get up to speed with a custom API in a few minutes, so that's even faster. And it turns out that often the issue isn't just learning SPARQL alone, it's also learning the data model [1]. > did anyone provide me with a wrapper for the english language? nope, had > to learn it. But I bet you learnt it in stages using a pedagogical approach that guided you towards the basic building blocks first. And I expect there were other reasons -- network effects -- why learning English was worth up-front effort. We're not there with SPARQL. Cheers, L. [1]. http://blog.ldodds.com/2011/06/16/giving-rdf-datasets-more-affordance/ -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: Restpark - Minimal RESTful API for querying RDF triples
Hi, On Thu, Apr 18, 2013 at 12:01 PM, Luca Matteis wrote: > Thanks Paul, > > That is exactly what my point was entirely about. Many service don't expose > their SQL interface, so why should Linked Data? > > Regarding this Linked Data API, it seems to still require a SPARQL endpoint. > In fact it states that it is a proxy for SPARQL. Would it simply be possible > to implement this API without SPARQL on top of a regular database that > contains triples? While the specification talks about mapping to a SPARQL endpoint, the processing model would potentially allow you to use different backends. Servicing a Linked Data API request involves several steps: 1. Mapping the request to a query (currently a SPARQL SELECT) to identify the list of resources of interest; 2. Mapping the request to a query (currently a SPARQL CONSTRUCT) to produce a description of each item on the list; 3. Serialising the results. Broadly speaking, you could swap out steps 1 & 2. For example, you could map the first step to a search query that produces a list of results from a search engine, or a SQL query that extracts the resources from a database. You could map the second step to requests to a document database that fetches pre-existing descriptions of each item. The API supports a number of filtering and sorting options, which will add some complexity to both stages, but I don't think there are any show-stoppers in there. Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
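A sketch of the two SPARQL stages for a request like /doc/schools?phase=Primary (URIs and properties are invented):

    # 1. select the page of matching resources
    PREFIX ex: <http://education.example.org/ns#>
    SELECT ?item WHERE {
      ?item a ex:School ;
            ex:phase "Primary" .
    }
    LIMIT 10 OFFSET 0

    # 2. describe each selected item, run once per item or with a VALUES
    #    clause listing the selected URIs:
    # CONSTRUCT { ?item ?p ?o } WHERE { ?item ?p ?o }

Either stage could instead be answered by a search index, a SQL database or a document store.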
Re: Restpark - Minimal RESTful API for querying RDF triples
Hi Paul, On Thu, Apr 18, 2013 at 11:54 AM, Paul Groth wrote: > Hi Leigh > > The problem is that it's really easy to write sparql queries that are > inefficient when you don't know the data [1] and even when you do the > flexibility of sparql means that people can easily end-up writing complex > hard to process queries. Totally agree with your assessment; I was just observing that there are a number of factors in play which result in a design trade-off, meaning there is no right answer or winning solution. My experience is much the same as yours, which is why I've been experimenting with APIs over SPARQL and worked with Jeni and Dave on the design of the Linked Data API. I think it's pretty good, but I don't think we've done a good job yet of documenting it. I also suspect there's an even simpler subset or profile in there, but I've not had the time yet to dig through and see what kinds of APIs people are building with it. L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: Restpark - Minimal RESTful API for querying RDF triples
Hi Hugh, On Thu, Apr 18, 2013 at 10:56 AM, Hugh Glaser wrote: > (Yes, Linked Data API is cool!, and thanks for getting back to the main > subject, although I somehow doubt anyone is expecting to read anything about > it in this thread now :-) ) I'm still hoping we might return to the original topic :) What this discussion, and in fact most related discussions about SPARQL as a web service, seems to overlook is that there are several different issues in play here: * Whether SPARQL is more accessible to developers than other forms of web API. For example, is the learning curve harder or easier? * Whether offering query languages like SPARQL, SQL, YQL, etc. is a sensible option when offering a public API and what kinds of quality of service can be wrapped around that. Or do other forms of API offer more options for providing quality of service by trading off power of query expression? * Techniques for making SPARQL endpoints scale in scenarios where the typical query patterns are unknown (which is true of most public endpoints). Scaling and quality of service considerations for a public web service and a private enterprise endpoint are different. Not all of the techniques that people use, e.g. query timeouts or partial results, are actually standardised, so there's plenty of scope for more exploration here. * Whether SPARQL is the only query language we need for RDF, or for more general graph databases, or whether there is room for other forms of graph query language. The Linked Data API was designed to provide a simplified read-only API that is less expressive than full SPARQL. The goals were to make something easier to use, but not preclude helping developers towards using full SPARQL if that's what they wanted. It also fills a shortfall in most Linked Data publishing approaches, i.e. that getting lists of things, possibly as a paged list, possibly with some simple filtering, is not easy. We don't need a full graph query language for that. The Linked Data Platform is looking at that area too, but it's also got a lot more requirements it's trying to address. Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: Coping with gaps in linked data (UK postcodes)?
Hi Stephen, Really your only option is to mint your own URIs, but then later build in links to the official URIs if/when they become available: http://patterns.dataincubator.org/book/proxy-uris.html The postcodes make good Natural Keys for building your URIs. This will help to automatically generate links not just to your data, but also to the official version when available. Cheers, L. On Fri, Apr 12, 2013 at 2:08 PM, Cresswell, Stephen wrote: > > Hello, > > In our application, we wish to publish linked data, including addresses > with postcode URIs. The postcode URIs provided by Ordnance Survey for > England, Wales and Scotland are really useful, with the postcode URIs > dereferencing to provide useful information including co-ordinates. > > However, the geographical extent of our data includes Northern Ireland, > which is outside the scope of (British) Ordnance Survey and not included > in their dataset. The equivalent postcode data for Northern Ireland is > available from the NI government body NISRA, but it is not on an open > license. > > This leaves us with a question about what URIs to use for Northern > Ireland postcodes, as we know of no existing URI scheme for Northern > Ireland postcodes. > > If we generate postcode URIs using the same pattern as the rest of the > UK, those URIs would be in the Ordnance Survey's domain, but NI > postcodes are not actually in their dataset and they won't dereference, > so that seems wrong. > > If we are to have dereferencable URIs, we would presumably have to host > them in our own domain, which is definitely not the most appropriate > place for them to be. If we buy a license to use the NI postcode data, > we still wouldn't be able to republish it as linked data. Presumably, > however, there is some geographical information that is open and could > be published, e.g. courser geographical information based on just the > postcode district. > > Does anyone have any advice on best practice, either for the specific > problem (NI postcodes) or for the general problem of how to cope with > URIs based on an existing coding scheme (e.g. postcodes), where the > published URIs don't cover all of the original codes? > > Stephen Cresswell > The Stationery Office > > > This email is confidential and may also be privileged and/or proprietary to > The Stationery Office Limited. It may be read, copied and used only by the > intended recipient(s). Any unauthorised use of this email is strictly > prohibited. If you have received this email in error please contact us > immediately and delete it and any copies you have made. Thank you for your > cooperation. > The Stationery Office Limited is registered in England under Company No. > 3049649 at 1-5 Poland Street, London, W1F 8PR > > > -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
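A sketch of the pattern for a Northern Ireland postcode, with invented hosts; the owl:sameAs link is only added once an official URI exists:

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .

    <http://data.example.org/id/postcodeunit/BT11AA>
        rdfs:label "BT1 1AA" .

    # later, when/if an official URI is published:
    <http://data.example.org/id/postcodeunit/BT11AA>
        owl:sameAs <http://postcodes.example.gov.uk/id/postcodeunit/BT11AA> .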
Re: Content negotiation for Turtle files
Hi, On Wed, Feb 6, 2013 at 9:54 AM, Bernard Vatant wrote: > ... > But what I still don't understand is the answer of Vapour when requesting > RDF/XML : > > 1st request while dereferencing resource URI without specifying the desired > content type (HTTP response code should be 303 (redirect)): Passed > 2nd request while dereferencing resource URI without specifying the desired > content type (Content type should be 'application/rdf+xml'): Failed > 2nd request while dereferencing resource URI without specifying the desired > content type (HTTP response code should be 200): Passed From a purely HTTP and Content Negotiation point of view, if a client doesn't specify an Accept header then it's perfectly legitimate for a server to return a default format of its choosing. I think it could also decide to serve a 300 status code and prompt the client to choose an option that's available. From an interoperability point of view, having a default format that clients can rely on is reasonable. Until now, RDF/XML has been the standardised format that we can all rely on, although shortly we may all collectively decide to prefer Turtle. So ensuring that RDF/XML is available seems like a reasonable thing for a validator to try and test for. But there are several ways that test could have been carried out. E.g. Vapour could have checked that there was an RDF/XML version and provided you with some reasons why that would be useful. Perhaps as a warning, rather than a fail. The explicit check for RDF/XML being available AND being the default preference of the server is raising the bar slightly, but it's still trying to aim for interop. Personally, I think I'd implement this kind of check as "ensure there is at least one valid RDF serialisation available, either RDF/XML or Turtle". I wouldn't force a default on a server, particularly as we know that many clients can consume multiple formats. This is where automated validation tools have to tread carefully: while they play an excellent role in encouraging consistency, the tests they perform and the feedback they give need to have some nuance. Cheers, L. -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: Linked Data Adoption Challenges Poll
Hi, You might need to clarify your questions; I can have a guess at what they mean, but my guesses may not be right. Presumably you are also targeting this poll at people trying (and failing) to adopt linked data. In that case, you might want to broaden the base of potential respondents. People on these lists may not reflect all of the issues. Cheers, L. On Thu, Sep 13, 2012 at 5:34 PM, Kingsley Idehen wrote: > All, > > I've created a poll oriented towards capturing data about issues that folks > find most challenging re., Linked Data Adoption. > > Please cast your vote as the results will be useful to all Linked Data > stakeholders. > > Link: http://poll.fm/3w0cb . > > -- > > Regards, > > Kingsley Idehen > Founder & CEO > OpenLink Software > Company Web: http://www.openlinksw.com > Personal Weblog: http://www.openlinksw.com/blog/~kidehen > Twitter/Identi.ca handle: @kidehen > Google+ Profile: https://plus.google.com/112399767740508618350/about > LinkedIn Profile: http://www.linkedin.com/in/kidehen > > > > > -- Leigh Dodds Freelance Technologist Open Data, Linked Data Geek t: @ldodds w: ldodds.com e: le...@ldodds.com
Re: Can we create better links by playing games?
On Wed, Jun 20, 2012 at 2:19 PM, Melvin Carvalho wrote: > > > On 20 June 2012 15:11, Kingsley Idehen wrote: >> >> On 6/19/12 3:23 PM, Martin Hepp wrote: >>> >>> [1] Games with a Purpose for the Semantic Web, IEEE Intelligent Systems, >>> Vol. 23, No. 3, pp. 50-60, May/June 2008. >> >> >> Do the games at: http://ontogame.sti2.at/games/, still work? The more data >> quality oriented games the better re. LOD and the Semantic Web in general. >> >> Others: Are there any other games out there? > > > iand is working on a game: > > http://blog.iandavis.com/2012/05/21/wolfie/ Is that relevant? :) L.
Re: Decommissioning a linked data site
Hi, On Fri, Jun 1, 2012 at 3:30 PM, Bradley Allen wrote: > Leigh- This is great. The question that comes up for me out of what you've > written for unpublishing brings me back to Antoine's question: is it > appropriate to use a relation other than owl:sameAs that is more specific to > the domain of the affected datasets being mapped, or is the nature of > unpublishing such that one would, as opposed to my reasoning earlier, be as > broad as possible in asserting equivalence, and use owl:sameAs in every such > case? Really interesting question, and this might prompt me to revise the pattern :) So, generally, I advocate using the appropriate equivalence relation that relates to a specific domain. As I wrote in [1] it's best to use the most appropriate equivalence link, as they have varying semantics. But for the unpublishing use case I think I'd personally lean towards *always* using owl:sameAs at least in the case where we are returning a 301 status code. I've previously come to the conclusion [2] that a 301 implies a sameAs statement. The intent seems very similar to a sameAs. Rewriting local links to use a new location is very similar to smushing descriptions in an RDF dataset such that statements only relate to the new URI. However I can see arguments to the effect that the new authority might have a slightly different definition of a resource than the original publisher, such that an owl:sameAs might be inappropriate. That's why I left the advice in the pattern slightly open-ended: I think it may need to be evaluated on a case by case basis, but owl:sameAs seems like a good workable default to me. Cheers, L. [1]. http://patterns.dataincubator.org/book/equivalence-links.html [2]. http://www.ldodds.com/blog/2007/03/the-semantics-of-301-moved-permanently/
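As a very rough illustration of reading a 301 as an implied sameAs (a sketch only, assuming Python with the requests and rdflib libraries; the URI is a placeholder):

    import requests
    from rdflib import Graph, URIRef
    from rdflib.namespace import OWL

    def sameas_from_301(old_uri):
        graph = Graph()
        # Look at the redirect itself rather than following it.
        resp = requests.head(old_uri, allow_redirects=False, timeout=10)
        if resp.status_code == 301 and "Location" in resp.headers:
            new_uri = resp.headers["Location"]
            graph.add((URIRef(old_uri), OWL.sameAs, URIRef(new_uri)))
        return graph

    print(sameas_from_301("http://old.example.org/id/thing").serialize(format="turtle"))

Swapping owl:sameAs for a more domain-specific equivalence property would be a one-line change here, which is perhaps an argument for keeping that decision case by case.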
Re: Decommissioning a linked data site
Hi, On Fri, Jun 1, 2012 at 7:34 AM, Antoine Isaac wrote: > @Tim: > >> For total extra kudos, provide query rewriting rules >> from yours site to LoC data, linked so that you can write a program >> to start with a sparql query which fails >> and figures out from metadata how to turn it into one which works! > > > Is the combination of 301 + owl:sameAs that we have used for RAMEAU, e.g, > http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11932889r > good enough? > Or would you recommend more/different? I've started to capture some advice here: http://patterns.dataincubator.org/book/unpublish.html Cheers, L
New draft of Linked Data Patterns book
Hi, There's a new draft of the Linked Data patterns book available: http://patterns.dataincubator.org/book/ There have been a number of revisions across the pattern catalogue, including addition of new introductory sections to each chapter. There are a total of 12 new patterns, many of which cover data management patterns relating to use of named graphs. Cheers, L.
Re: looking for skos vocabularies
Hi, There's a pretty comprehensive set of links available here: http://www.w3.org/2001/sw/wiki/SKOS/Datasets Cheers, L. On Thu, May 17, 2012 at 4:43 PM, Christian Morbidoni wrote: > Hi, > > I've been looking for same example of skos vocabulary to use as a real world > test case in a project. > Surprisingly I cannot find so much around...do someone know about an archive > of skos vocabularies or some good example of skos in use? > I'm starting to wonder...is people using skos out there? > > best, > > Christian >
Re: Layered Data
Hi Pablo, On Fri, May 4, 2012 at 10:37 AM, Pablo Mendes wrote: > > Interesting thoughts. It would be nice to have some "default" widely > accepted facets within an extensible model. Thanks. > I had a somewhat related discussion with Niko Popitsch last year on how > "database views" could look like in the LOD world. The discussion was a > follow up to his talk: > Keep Your Triples Together: Modeling a RESTtful, Layered Linked Data Store > http://cs.univie.ac.at/research/research-groups/multimedia-information-systems/publikation/infpub/2910/ Thanks for the pointer, I'll take a look :) Cheers, L.
Layered Data
Hi, I've written up some thoughts on considering datasets as "layers" that can be combined to create useful aggregations. The concept originated with Dan Brickley and I see the RDF WG are considering the term as an alternative to "named graph". My own usage is more general. I thought I'd share a link here to see what people thought. The paper is at: http://ldodds.com/papers/layered-data.html And a blog post with some commentary here: http://www.ldodds.com/blog/2012/05/layered-data-a-paper-some-commentary/ Cheers, L.
Re: Datatypes with no (cool) URI
(apologies if this is a re-post, I don't think it made it through y'day) Hi On Tue, Apr 3, 2012 at 6:29 PM, Dave Reynolds wrote: > On 03/04/12 16:38, Sarven Capadisli wrote: >> >> On 12-04-03 02:33 PM, Phil Archer wrote: >>> >>> I'm hoping for a bit of advice and rather than talk in the usual generic >>> terms I'll use the actual example I'm working on. >>> >>> I want to define the best way to record a person's sex (this is related >>> to the W3C GLD WG's forthcoming spec on describing a Person [1]). To >>> encourage interoperability, we want people to use a controlled >>> vocabulary and there are several that cover this topic. ... >> >> Perhaps I'm looking at your problem the wrong way, but have you looked >> at the SDMX Concepts: >> >> http://purl.org/linked-data/sdmx/2009/code#sex >> >> -Sarven >> > > I was going to suggest that :) +1. A custom datatype doesn't seem correct in this case. Treating gender as a category/classification captures both the essence that there's more than one category & that people may differ in how they would assign classifications. I wrote a bit about Custom Datatypes here: http://patterns.dataincubator.org/book/custom-datatype.html This use case aside, there ought to be more information to guide people towards how to do this correctly. See also: http://www.w3.org/TR/swbp-xsch-datatypes/ Cheers, L.
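To make the classification approach concrete, here's a minimal sketch using Python and rdflib; the person URI is a placeholder and the exact SDMX code URIs should be checked against the published code list:

    from rdflib import Graph, Namespace, URIRef

    SDMX_DIM = Namespace("http://purl.org/linked-data/sdmx/2009/dimension#")
    SDMX_CODE = Namespace("http://purl.org/linked-data/sdmx/2009/code#")

    g = Graph()
    person = URIRef("http://example.org/id/person/1")
    # The value is a resource from a code list, rather than a literal with a
    # custom datatype.
    g.add((person, SDMX_DIM.sex, SDMX_CODE["sex-F"]))

    print(g.serialize(format="turtle"))

Because the value is a resource, alternative classification schemes can be related to it later (e.g. via SKOS mapping properties) without touching the original data.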
Re: Document Action: 'The Hypertext Transfer Protocol (HTTP) Status Code 308 (Permanent Redirect)' to Experimental RFC (draft-reschke-http-status-308-07.txt)
Hi James, On Tue, Mar 27, 2012 at 2:15 AM, James Leigh wrote: > Could this 308 (Permanent Redirect) give us a way to cache a probe URI's > definition document location? > > An issue people have with httpRange-14 is that 303 redirects can't be > cached. If we could agree to use a 308 response as a cache-able > alternative to 303, we could reduce server load and speed client URI > processing (by caching the result of a probe URI). I'm missing how that would help; could you elaborate? The semantics of that response code are that the resource has permanently moved, which seems very different to a 303. A strict reading and application of the rules would suggest that the new URI should be considered a replacement of the original, so sameAs, rather than "a description of". L.
Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14
Hi, On Tue, Mar 27, 2012 at 2:02 PM, Jonathan A Rees wrote: > ... > There is a difference, since what is described could be an IR that > does not have the description as content. A prime example is any DOI, > e.g. > > http://dx.doi.org/10.1371/journal.pcbi.1000462 > > (try doing conneg for RDF). The identified resource is an IR as you > suggest, but the representation (after the 303 redirect) is not its > content. A couple of comments here: 1. It's not every DOI. I believe CrossRef are still the only registrar that supports this, but I might have missed an announcement. That's still 50m DOIs though. 2. Are you sure it's an Information Resource? The DOI handbook [1] notes that while typically used to identify intellectual property a DOI can be used to identify anything. The CrossRef guidelines [2] explain that "[a]s a matter of current policy, the CrossRef DOI identifies the work, not its various potential manifestations...". Is a FRBR work an Information Resource? Personally I'd say not, but others may disagree. But as Dan Brickley has noted elsewhere in the discussion, there are other nuances to take into account. [1]. http://www.doi.org/handbook_2000/intro.html#1.6 [2]. http://crossref.org/02publishers/15doi_guidelines.html Cheers, L.
Re: What would break? Re: httpRange-14
Hi, On Mon, Mar 26, 2012 at 7:59 PM, Kingsley Idehen wrote: > On 3/26/12 2:09 PM, Leigh Dodds wrote: >> >> Hi Kingsley, >> >> On Mon, Mar 26, 2012 at 6:38 PM, Kingsley Idehen >> wrote: >>> >>> ... >>> Leigh, >>> >>> Everything we've built in the Linked Data realm leverages the findings of >>> HttpRange-14 re. Name/Address (Reference/Access) disambiguation. Our >>> Linked >>> Data clients adhere to these findings. Our Linked Data servers do the >>> same. >> >> By "we" I assume you mean OpenLink. Here's where I asked the original >> question [1]. Handily Ian Davis published an example resource that >> returns a 200 OK when you de-reference it [2]. > > Support was done (basically reusing our old internal redirection code) > whenever that post was made by Ian. > >> >> I just tested that in URI Burner [3] and it gave me broadly what I'd >> expect, i.e. the resources mentioned in the resulting RDF. I didn't >> see any visible breakage. Am I seeing fall-back behaviour? > > > As per comment above its implemented. We have our own heuristic for handling > self-describing resources. My concern is that what we've done isn't the norm > i.e., I don't see others working that way, instinctively. You have to be > over the Linked Data comprehension hump to be in a position emulate what > we've done. OK, I thought you might have done, so thanks for the confirmation. But this further demonstrates that we don't necessarily need redirects. >> >> Are people really testing status codes and changing subsequent >> processing behaviour because of that? It looks like there's little or >> no breakage in Sindice for example [3]. >> >> Based on Tim's comments he has been doing that, are other people doing >> the same? And if you have to ask if we're not, then who is this ruling >> benefiting? > > We do the same, but we also go beyond (i.e., what you call a fall-back). Would you care to elaborate on that? i.e: what inferences are you deriving from the protocol interaction? I can see that for a .txt document you are inferring that its a foaf:Document [1]. I'm still also interested to hear from others. [1]. http://linkeddata.uriburner.com/about/html/http/www.gutenberg.org/files/76/76.txt Cheers, L.
Re: What would break? Re: httpRange-14
Hi Kingsley, On Mon, Mar 26, 2012 at 6:38 PM, Kingsley Idehen wrote: > ... > Leigh, > > Everything we've built in the Linked Data realm leverages the findings of > HttpRange-14 re. Name/Address (Reference/Access) disambiguation. Our Linked > Data clients adhere to these findings. Our Linked Data servers do the same. By "we" I assume you mean OpenLink. Here's where I asked the original question [1]. Handily Ian Davis published an example resource that returns a 200 OK when you de-reference it [2]. I just tested that in URI Burner [3] and it gave me broadly what I'd expect, i.e. the resources mentioned in the resulting RDF. I didn't see any visible breakage. Am I seeing fall-back behaviour? To answer your other question, I do understand the benefits that can accrue from having separate URIs for a resource and its description. I also see arguments for not always requiring both. As a wider comment and question to the list, I'll freely admit that what I've always done when fetching Linked Data is let my HTTP library just follow redirects. Not to deal with 303s specifically, but because that's just good user agent behaviour. I've always assumed that everyone else does the same. But maybe I'm wrong or in the minority. Are people really testing status codes and changing subsequent processing behaviour because of that? It looks like there's little or no breakage in Sindice for example [4]. Based on Tim's comments he has been doing that; are other people doing the same? And if you have to ask if we're not, then who is this ruling benefiting? Tim, could you share more about what application behaviour your inferences support? Are those there to support specific features for users? Cheers, L. [1]. http://www.mail-archive.com/public-lod@w3.org/msg06735.html [2]. http://iandavis.com/2010/303/toucan [3]. http://linkeddata.uriburner.com/about/html/http/iandavis.com/2010/303/toucan [4]. http://www.mail-archive.com/public-lod@w3.org/msg06746.html
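For what it's worth, the two styles of client behaviour being contrasted above look something like this (a sketch only, assuming Python and the requests library, and using Ian's toucan URI from [2]):

    import requests

    resp = requests.get("http://iandavis.com/2010/303/toucan",
                        headers={"Accept": "application/rdf+xml"}, timeout=10)

    # Style 1: just follow redirects and use whatever comes back.
    data = resp.text

    # Style 2: inspect the redirect chain and change behaviour if a 303 was seen.
    saw_303 = any(r.status_code == 303 for r in resp.history)
    print("303 seen en route:", saw_303)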
Re: Middle ground change proposal for httpRange-14
Hi David, On Sun, Mar 25, 2012 at 6:50 PM, David Wood wrote: > Hi David, > > *sigh*. I said recently that I would rather chew my arm off than re-engage > with http-range-14. Apparently I have very little self control. > > On Mar 25, 2012, at 11:54, David Booth wrote: >> Jeni, Ian, Leigh, Nick, Hugh, Steve, Masahide, Gregg, Niklas, Jerry, >> Dave, Bill, Andy, John, Ben, Damian, Thomas, Ed Summers and Davy, >> >> I have drafted what I think may represent a middle ground change >> proposal and I am wondering if something along this line would also meet >> your concerns: >> http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol >> > >> Highlights of this proposal: >> - It enables a URI owner to unambiguously convey any URI definition to >> an interested client. > > +1 to this. I have long been a fan of unambiguous definition. The summary > argument against is Leigh Dodd's > "show what is actually broken" approach and the summary argument for is my > "we need to invent new ways to associate RDF > with other Web resources in a discoverable manner to allow for > 'follow-your-nose' across islands of Linked Data." I may be misreading you here, but I'm not against unambiguous definition. My "show what is actually broken" comment (on twitter) was essentially the same question as I've asked here before, and as Hugh asked again recently: what applications currently rely on httprange-14 as it is written today. That's useful so we can get a sense of what would break with a change. So far there have been two examples, I think. That's in contrast to a lot of publisher data (but granted, not yet quantified as to how much) that breaks the rules of httprange-14. I'd prefer to fix that even at the cost of breaking a few apps. But we all know there are very, very few apps that consume Linked Data today, so changing client expectations isn't a massive problem. Identifying a set of publishing patterns that identify how publishers can reduce ambiguity, and advice for clients on how to tread carefully in the face of ambiguity and inconsistency is a better starting point IMHO. The goal there being to encourage more unambiguous publishing of data, by demonstrating value at every step. Cheers, L.
Re: Change Proposal for HttpRange-14
Hi Tim, On Sun, Mar 25, 2012 at 8:26 PM, Tim Berners-Lee wrote: > ... > For example, To take an arbitrary one of the trillions out there, what does > http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108&pageno=11 > identify, there being no RDF in it? > What can I possibly do with that URI if the publisher has not explicitly > allowed me to use it > to refer to the online book, under your proposal? You can do anything you want with it. You could use it to record statements about your HTTP interactions, e.g. retrieval status & date. Or, because RDF lets anyone say anything, anywhere, you could just decide to use that as the URI for the book and annotate it accordingly. The obvious caveat and risk is that the publisher might subsequently disagree with you if they do decide to publish some RDF. I can re-use your data if I decide that risk is acceptable and we can still usefully interact. Even if Gutenberg.org did publish some RDF at that URI, you still have the risk that they could change their mind at a later date. httprange-14 doesn't help at all there. Lack of precision and inconsistency is going to be rife whatever form the URIs or response codes take. Encouraging people to say what their URIs refer to is the very first piece of best practice advice. L.
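A sketch of that first option, recording statements about the HTTP interaction itself (assuming Python with requests and rdflib; the retrieval vocabulary below is invented purely for illustration, not an agreed vocabulary):

    from datetime import datetime, timezone
    import requests
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import XSD

    EX = Namespace("http://example.org/vocab/retrieval#")  # placeholder vocabulary

    uri = "http://www.gutenberg.org/catalog/world/readfile?fk_files=2372108&pageno=11"
    resp = requests.get(uri, timeout=10)

    g = Graph()
    # Record the status code and the time of retrieval against the URI.
    g.add((URIRef(uri), EX.retrievalStatus, Literal(resp.status_code)))
    g.add((URIRef(uri), EX.retrievedAt,
           Literal(datetime.now(timezone.utc).isoformat(), datatype=XSD.dateTime)))
    print(g.serialize(format="turtle"))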
Re: Where to put the knowledge you add
Hi Hugh, On 12 October 2011 12:55, Hugh Glaser wrote: > > Hi. > > I have argued for a long time that the linkage data (in particular owl:sameAs > and similar links) should not usually be mixed with the > knowledge being published. As an experiment I've added some new dbpedia datasets to Kasabi: Dbpedia Links: http://kasabi.com/dataset/dbpedia-links Which is just the external link datasets (which actually include some type assertions too) Dbpedia Core: http://kasabi.com/dataset/dbpedia-core Which is just the core english datasets And then the dbpedia english dataset which layers together these two into a single dataset: http://kasabi.com/dataset/dbpedia That gives some choice over whether you want external links. I'm also considering some other subsets (e.g. places and people). To help flag up linksets I've also added a "Linking" category in Kasabi to group together datasets that purely exist to link between others. Cheers, L. [1]. http://kasabi.com/browse/datasets/results/og_category%3A5603 -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi, On 21 October 2011 08:47, Dave Reynolds wrote: > ... >> On 20 October 2011 10:34, Dave Reynolds wrote: >>> >>> ... >>> If you have two resources and later on it turns out you only needed one, >>> no big deal just declare their equivalence. If you have one resource >>> where later on it turns out you needed two then you are stuffed. >> >> Ed referred to "refactoring". So I'm curious about refactoring from a >> single URI to two. Are developers necessarily stuffed, if they start >> with one and later need two? >> >> For example, what if I later changed the way I'm serving data to add a >> Content-Location header (something that Ian has raised in the past, >> and Michael has mentioned again recently) which points to the source >> of the data being returned. >> >> Within the returned data I can include statements about the document >> at that URI referred to in the Content-Location header. >> >> Doesn't that kind of refactoring help? > > Helps yes, but I don't think it solves everything. > > Suppose you have been using http://example.com/lovelypictureofm31 to denote > M31. Some data consumers use your URI to link their data on M31 to it. Some > other consumers started linking to it in HTML as an IR (because they like > the picture and the accompanying information, even though they don't care > about the RDF). Now you have two groups of users treating the URI in > different ways. This probably doesn't matter right now but if you decide > later on you need to separate them then you can't introduce a new URI > (whether via 303 or content-location header) without breaking one or other > use. Not the end of the world but it's not a refactoring if the test cases > break :) > > Does that make sense? No, I'm still not clear. If I retain the original URI as the identifier for the galaxy and add either a redirect or a Content-Location, then I don't see how I break those linking their data to it as their statements are still made about the original URI. But I don't see how I'm breaking people linking to it as if it were an IR. That group of people are using my resource ambiguously in the first place. Their links will also still resolve to the same content. L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi, On 20 October 2011 23:19, Kingsley Idehen wrote: > On 10/20/11 5:31 PM, Dave Reynolds wrote: >> >> What's more I really don't think the issues is about not understanding >> about the distinction (at least in the clear cut cases). Most people I >> talk to grok the distinction, the hard bit is understanding why 303 >> redirects is a sensible way of making it and caring about it enough to >> put those in place. > > What about separating the concept of "indirection" from its actual > mechanics? Thus, conversations about benefits will then have the freedom to > blossom. > > Here's a short list of immediately obvious benefits re. Linked Data (at any > scale): > > 1. access to data via data source names -- millions of developers world wide > already do this with ODBC, JDBC, ADO.NET, OLE DB etc.. the only issue is > that they are confined to relational database access and all its > shortcomings > > 2. integration of heterogeneous data sources -- the ability to coherently > source and merge disparately shaped data culled from a myriad of data > sources (e.g. blogs, wikis, calendars, social media spaces and networks, and > anything else that's accessible by name or address reference on a network) > > 3. crawling and indexing across heterogeneous data sources -- where the end > product is persistence to a graph model database or store that supports > declarative query language access via SPARQL (or even better a combination > of SPARQL and SQL) > > 4. etc... > > Why is all of this important? > Data access, integration, and management has been a problem that's straddled > every stage of computer industry evolution. Managers and end-users always > think about data conceptually, but continue to be forced to deal with > access, integration, and management in application logic oriented ways. In a > nutshell, applications have been silo vectors forever, and in doing so they > stunt the true potential of computing which (IMHO) is ultimately about our > collective quests for improved productivity. > > No matter what we do, there are only 24 hrs in a day. Most humans taper out > at 5-6 hrs before physiological system faults kick in, hence our implicit > dependency of computers for handling voluminous and repetitive tasks. > > Are we there yet? > Much closer that most imagine. Our biggest hurdle (as a community of Linked > Data oriented professionals) is a protracted struggle re. separating > concepts from implementation details. We burn too much time fighting > implementation details oriented battles at the expense of grasping core > concepts. Maybe I'm wrong but I think people, especially on this list, understand the overall benefits you itemize. The reason we talk about implementation details is they're important to help people adopt the technology: we need specific examples. We get the benefits you describe from inter-linked dereferenceable URIs, regardless of what format or technology we use to achieve it. Using the RDF model brings additional benefits. What I'm trying to draw out in this particular thread is the specific benefits that the #/303 additional abstraction brings. At the moment, they seem pretty small in comparison to the fantastic benefits we get from data integrated into the web. Cheers, L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi Dave, Thanks for the response, there are some good examples in there. I'm glad that this thread is bearing fruit :) I had a question about one aspect; please excuse the clipping: On 20 October 2011 10:34, Dave Reynolds wrote: > ... > If you have two resources and later on it turns out you only needed one, > no big deal just declare their equivalence. If you have one resource > where later on it turns out you needed two then you are stuffed. Ed referred to "refactoring". So I'm curious about refactoring from a single URI to two. Are developers necessarily stuffed if they start with one and later need two? For example, what if I later changed the way I'm serving data to add a Content-Location header (something that Ian has raised in the past, and Michael has mentioned again recently) which points to the source of the data being returned? Within the returned data I can include statements about the document at that URI referred to in the Content-Location header. Doesn't that kind of refactoring help? Presumably I could also just drop in a redirect and adopt the current 303 pattern without breaking anything? Again, I'm probably missing something, but I'm happy to admit ignorance if that draws out some useful discussion :) Cheers, L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
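To illustrate the Content-Location refactoring from the client's side (a sketch only, assuming Python and the requests library; the URI is a placeholder):

    import requests

    thing_uri = "http://example.org/id/galaxy/m31"
    resp = requests.get(thing_uri, headers={"Accept": "text/turtle"}, timeout=10)

    # If the server advertises a separate document URI, statements about the
    # document (licensing, modification dates, etc.) can be made against it.
    doc_uri = resp.headers.get("Content-Location", thing_uri)
    if doc_uri != thing_uri:
        print("Document URI:", doc_uri)
    else:
        print("No separate document URI advertised for", thing_uri)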
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi, On 20 October 2011 13:25, Ed Summers wrote: > On Wed, Oct 19, 2011 at 12:59 PM, Leigh Dodds wrote: >> So, can we turn things on their head a little. Instead of starting out >> from a position that we *must* have two different resources, can we >> instead highlight to people the *benefits* of having different >> identifiers? That makes it more of a best practice discussion and one >> based on trade-offs: e.g. this class of software won't be able to >> process your data correctly, or you'll be limited in how you can >> publish additional data or metadata in the future. >> >> I don't think I've seen anyone approach things from that perspective, >> but I can't help but think it'll be more compelling. And it also has >> the benefits of not telling people that they're right or wrong, but >> just illustrate what trade-offs they are making. > > I agree Leigh. The argument that you can't deliver an entity like a > Galaxy to someone's browser sounds increasingly hollow to me. Nobody > really expects that, and the concept of a Representation from > WebArch/REST explains it away to most technical people. Plus, we now > have examples in the wild like OpenGraphProtocol that seem to be > delivering drinks, politicians, hotels, etc to machine agents at > Facebook just fine. It's the arrival of the OpenGraphProtocol which I think warrants a more careful discussion. It seems to me that we no longer have to try so hard to convince people of the value of giving things de-referencable URIs that return useful data. It's happening now, and there's immediate and obvious benefit, i.e. integration with facebook, better search ranking, etc. > But there does seem to be a valid design pattern, or even refactoring > pattern, in httpRange-14 that is worth documenting. Refactoring is how I've been thinking about it too, i.e. under what situations might you want to have separate URIs for a resource and its description? Dave Reynolds has given some good examples of that. > Perhaps a good > place would be http://patterns.dataincubator.org/book/? I think > positioning httpRange-14 as a MUST instead of a SHOULD or MAY made a > lot of sense to get the LOD experiment rolling. It got me personally > thinking about the issue of identity in a practical way as I built web > applications, that I probably wouldn't otherwise have otherwise done. > But it would've been easier if grappling with it was optional, and > there were practical examples of where it is useful, instead of having > it be an issue of dogma. My personal viewpoint is that it has to be optional, because there's already a growing set of deployed examples of people not doing it (OGP adoption). So how can we help those users understand the pitfalls and/or the benefits of a slightly cleaner approach? We can also help them understand how best to publish data to avoid mis-interpretation. Simplifying ridiculously just to make a point, we seem to have the following situation: * Create de-referencable URIs for things. Describe them with OGP and/or Schema.org Benefit: Facebook integration, SEO * Above plus additional # URIs or 303s. Benefit: ability to make some finer-grained assertions in some specific scenarios. Tabulator is happy Cheers, L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi, On 19 October 2011 23:10, Jonathan Rees wrote: > On Wed, Oct 19, 2011 at 5:29 PM, Leigh Dodds wrote: >> Hi Jonathan >> >> I think what I'm interested in is what problems might surface and >> approaches for mitigating them. > > I'm sorry, the writeup was designed to do exactly that. In the example > in the "conflict" section, a miscommunication (unsurfaced > disagreement) leads to copyright infringement. Isn't that a problem? Yes it is, and these are the issues I think that are worth teasing out. I'm afraid though that I'll have to admit to not understanding your specific example. There's no doubt some subtlety that I'm missing (and a rotten head cold isn't helping). Can you humour me and expand a little? The bit I'm struggling with is: [[[ <http://example/x> xhv:license <http://creativecommons.org/licenses/by/3.0/>. According to D2, this says that document X is licensed. According to S2, this says that document Y is licensed ]]] Taking the RDF data at face value, I don't see how the D2 and S2 interpretations differ. Both say that <http://example/x> has a specific license. How could an S2-assuming client assume that the data is actually about another resource? I looked at your specific examples, e.g. Flickr and Jamendo: The RDFa extracted from the Flickr photo page does seem to be ambiguous. I'm guessing the intent is to describe the license of the photo and not the web page. But in that case, isn't the issue that Flickr aren't being precise enough in the data they're returning? The RDFa extracted from the Jamendo page includes type information (from the Open Graph Protocol) that says that the resource is an album, and has a specific Creative Commons license. I think that's what's intended, isn't it? Why does a client have to assume a specific stance (D2/S2)? Why not simply take the data returned at face value? It's then up to the publisher to be sure that they're making clear assertions. > There is no heuristic that will tell you which of the two works is > licensed in the stated way, since both interpretations are perfectly > meaningful and useful. > > For mitigation in this case you only have a few options > 1. precoordinate (via a "disambiguating" rule of some kind, any kind) > 2. avoid using the URI inside <...> altogether - come up with distinct > wads of RDF for the 2 documents > 3. say locally what you think <...> means, effectively treating these > URIs as blank nodes Cheers, L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi, On 19 October 2011 23:36, Nathan wrote: > Leigh Dodds wrote: >> >> On 19 October 2011 20:48, Kingsley Idehen wrote: >>> >>> On 10/19/11 3:16 PM, Leigh Dodds wrote: >>>> >>>> RFC3983: >>>> >>>> "A Uniform Resource Identifier (URI) is a compact sequence of >>>> characters that identifies an abstract or physical resource." >>> >>> Yes, I agree with that. >>>> >>>> 2 URIs, therefore 2 resources. >>> >>> I disagree with your interpretation though. >> >> But I'm not interpreting anything there. The definition is a URI >> identifies a resource. Ergo two different URIs identify two resources. > > Nonsense, and I'm surprised to hear it. > > Given two distinct URIs the most you can determine is that you have two > distinct URIs. > > You do not know how many resources are identified, there may be no > resources, one, two, or full sets of resources. > > Do see RFC3986, especially the section on equivalence. > OK, so maybe there is interpretation here :) My reading is that, without additional knowledge, we should assume that different URIs identify different resources. I think the wording of RFC 3986 is fairly clear that a URI identifies a resource, so assuming multiple resources for multiple URIs is fine - as a starting position. I do understand that two URIs can be aliases. The section on equivalence you refer to suggests ways to identify equivalence ranging from syntactic comparisons up to network protocol operations. The latter gives us additional information (status codes, headers) that can determine equivalence. To go back to Kingsley's original example, I don't see any equivalence of those URIs at the syntactic or network level L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
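For reference, the purely syntactic end of the RFC 3986 equivalence ladder can be sketched in a few lines of Python (standard library only; the URIs are just examples):

    from urllib.parse import urlsplit, urlunsplit

    def normalise(uri):
        # Case-normalise scheme and host, drop a default port, default the path.
        parts = urlsplit(uri)
        scheme = parts.scheme.lower()
        netloc = parts.netloc.lower()
        if scheme == "http" and netloc.endswith(":80"):
            netloc = netloc[: -len(":80")]
        path = parts.path or "/"
        return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))

    print(normalise("HTTP://Example.org:80/a") == normalise("http://example.org/a"))  # True
    print(normalise("http://dbpedia.org/resource/Linked_Data")
          == normalise("http://dbpedia.org/page/Linked_Data"))  # False

Anything beyond that (redirects, Content-Location, explicit sameAs statements) needs the network-level information mentioned above.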
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi, On 19 October 2011 20:48, Kingsley Idehen wrote: > On 10/19/11 3:16 PM, Leigh Dodds wrote: > >>> But you don't have two different resources. Please correct me if I am >>> reading you inaccurately here, but are you saying that: >>> >>> http://dbpedia.org/resource/Linked Data and >>> http://dbpedia.org/page/Linked >>> Data == two different resources? >>> >>> I see: >>> >>> 1. 2 URIs >>> 2. a generic URI (serving as a Name) and a purpose specific URI called a >>> URL >>> that serves as a data access address -- still two identifiers albeit >>> split >>> by function . >> >> RFC3983: >> >> "A Uniform Resource Identifier (URI) is a compact sequence of >> characters that identifies an abstract or physical resource." > > Yes, I agree with that. >> >> 2 URIs, therefore 2 resources. > > I disagree with your interpretation though. But I'm not interpreting anything there. The definition is a URI identifies a resource. Ergo two different URIs identify two resources. Whether those resources might be related to one another, or even equivalent is an entirely different matter. > Identifiers are names / handles. Thus, you have Names that resolve to actual > data albeit via different levels of indirection. > > http://dbpedia.org/resource/Linked_Data and > http://dbpedia.org/page/Linked_Data are routes to different representations > of the same data. /resource/ (handle or name) is an indirect access route > while /page/ is direct (address i.e., a location name) albeit with > representation specificity i.e., HTML in the case of DBpedia. > > I am very happy that we've been able to narrow our differing views to > something very concrete. Ultimately, we are going to arrive at clarity, and > that's all that matters to me, fundamentally. *That* all seems to be interpretation to me. Cheers, L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi Jonathan On 19 October 2011 18:36, Jonathan Rees wrote: > On Wed, Oct 19, 2011 at 12:59 PM, Leigh Dodds wrote: > >> So, can we turn things on their head a little. Instead of starting out >> from a position that we *must* have two different resources, can we >> instead highlight to people the *benefits* of having different >> identifiers? That makes it more of a best practice discussion and one >> based on trade-offs: e.g. this class of software won't be able to >> process your data correctly, or you'll be limited in how you can >> publish additional data or metadata in the future. >> >> I don't think I've seen anyone approach things from that perspective, >> but I can't help but think it'll be more compelling. And it also has >> the benefits of not telling people that they're right or wrong, but >> just illustrate what trade-offs they are making. >> >> Is this not something we can do on this list? I suspect it'd be more >> useful than attempting to categorise, yet again, the problems of hash >> vs slash URIs. Although a canonical list of those might be useful to >> compile once and for all. >> >> Anyone want to start things off? > > Sure. http://www.w3.org/2001/tag/2011/09/referential-use.html Thanks for the pointer. That's an interesting document. I've read it once but need to digest it a bit further. The crux of the issue, and what I was getting at in this thread is what you refer to towards the end: "It is possible that D2 and S2 can be used side by side by different communities for quite a while before a collision of the sort described above becomes a serious interoperability problem. On the other hand, when the conflict does happen, it will be very painful." I think what I'm interested in is what problems might surface and approaches for mitigating them. I'm particularly curious whether heuristics might be used to disambiguate or remove conflict. >> As a leading question: does anyone know of any deployed semantic web >> software that will reject or incorrectly process data that flagrantly >> ignores httprange-14? > > Tabulator. Yes. That's the only piece of software I've heard of that has problems. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi, On 19 October 2011 18:44, Kingsley Idehen wrote: >> >> So, can we turn things on their head a little. Instead of starting out >> from a position that we *must* have two different resources, can we >> instead highlight to people the *benefits* of having different >> identifiers? > But you don't have two different resources. Please correct me if I am > reading you inaccurately here, but are you saying that: > > http://dbpedia.org/resource/Linked Data and http://dbpedia.org/page/Linked > Data == two different resources? > > I see: > > 1. 2 URIs > 2. a generic URI (serving as a Name) and a purpose specific URI called a URL > that serves as a data access address -- still two identifiers albeit split > by function . RFC 3986: "A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource." 2 URIs, therefore 2 resources. Cheers, L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs)
Hi, I tried it with this URI and got an error: http://www.bbc.co.uk/programmes/b01102yg#programme Cheers, L. On 17 October 2011 11:41, Yang Squared wrote: > Following the HTTP-range-14 discussion, we developed a Semantic Web URIs > Validator named Hyperthing which helps to publish the Linked Data. We > particularly investigated what happens when we temporary and > permnent redirect (e.g. 301 and 302 redirections) of a Semantic Web URI (303 > and hash URI). > http://www.hyperthing.org/ > Hyperthing mainly functions for three purposes: > 1) It determines if the requested URI identifies a Real World Object or a > Web document; > 2) It checks whether the URIs publishing method follows the W3C hash URIs > and 303 URI practice; > 3) It can be used to check the validity of the chains of the redirection > between the Real World Object URIs and Document URIs to prevent the data > publisher mistakenly redirecting between these two kinds. (e.g. it checks > against redirection which include 301, 302 and 307) > For more information please read > Dereferencing Cool URI for the Semantic Web: What is 200 OK on the Semantic > Web? > http://dl.dropbox.com/u/4138729/paper/dereference_iswc2011.pdf > Any suggestion is welcome. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Explaining the benefits of http-range14 (was Re: [HTTP-range-14] Hyperthing: Semantic Web URI Validator (303, 301, 302, 307 and hash URIs) )
Hi, [Aside: changing the subject line so we can have a clearer discussion] On 17 October 2011 14:58, Norman Gray wrote: >... > I've done far fewer talks of this type than Tom has, but I've never found > anyone having difficulty here, either. Mind you, I never talk of > 'information resource' or httpRange-14. > > For what it's worth, I generally say something along the lines of "This URI, > X, is the name of a galaxy. If you put that URI into your > browser, you can't get the galaxy back, can you, because the galaxy is too > big to fit inside your computer. So something different has to > happen, doesn't it?" A remark about Last-Modified generally seals the deal. I've done the same, and people do quite often get it. At least for a few minutes :) I think my experience echoes Rob's more than Tom's. I've had more than one Linked Data talk/tutorial de-railed by debate and discussion of the issue when there are much more interesting aspects to explore. While I've not used the galaxy example, I have taken similar approaches. But I can also imagine saying, for example: "This URI, X, is the name of a galaxy. If you put that URI into your browser, obviously you can't get the galaxy back, can you. So when you request it, you get back a representation of it. You know, just like when you request a file from a web server you don't download the *actual* file, just a representation of it. Possibly in another format". And further, if someone asked about Last-Modified dates: "Last-Modified? Well as it turns out the Last-Modified date isn't defined to be the date that a resource last changed. It's up to the origin server to decide what it means. So for something like a galaxy, it can be the date of our last observation". My point being that web architecture already has a good explanation as to why real-world, or even digital things are passed around the internet. That's why we have the Resource and Representation abstractions in the first place. So, can we turn things on their head a little. Instead of starting out from a position that we *must* have two different resources, can we instead highlight to people the *benefits* of having different identifiers? That makes it more of a best practice discussion and one based on trade-offs: e.g. this class of software won't be able to process your data correctly, or you'll be limited in how you can publish additional data or metadata in the future. I don't think I've seen anyone approach things from that perspective, but I can't help but think it'll be more compelling. And it also has the benefits of not telling people that they're right or wrong, but just illustrate what trade-offs they are making. Is this not something we can do on this list? I suspect it'd be more useful than attempting to categorise, yet again, the problems of hash vs slash URIs. Although a canonical list of those might be useful to compile once and for all. Anyone want to start things off? As a leading question: does anyone know of any deployed semantic web software that will reject or incorrectly process data that flagrantly ignores httprange-14? Cheers, L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
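One concrete trade-off worth listing: with a hash URI the split between the thing and the document describing it comes for free, because the fragment is never sent to the server. A tiny sketch (Python standard library only; the BBC URI is the one from the Hyperthing message above):

    from urllib.parse import urldefrag

    thing_uri = "http://www.bbc.co.uk/programmes/b01102yg#programme"
    doc_uri, fragment = urldefrag(thing_uri)

    print(doc_uri)   # what actually gets requested over HTTP
    print(fragment)  # "programme" -- never sent to the server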
Beyond the Triple Count
Hi, I did a talk at semtech this week about some ideas for improving how we document, publish and assess datasets. I've done a write-up which might be of interest: http://blog.kasabi.com/2011/09/28/beyond-the-triple-count/ Cheers, L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Question: Authoritative URIs for Geo locations? Multi-lingual labels?
Hi Kingsley, On 9 September 2011 15:20, Kingsley Idehen wrote: > On 9/9/11 8:58 AM, Leigh Dodds wrote: >> >> Hi, >> >> As well as the others already mentioned there's also Yahoo Geoplanet: >> >> http://beta.kasabi.com/dataset/yahoo-geoplanet >> >> This has multi-lingual labels and is cross-linked to the Ordnance >> Survey data, Dbpedia, but that could be improved. >> >> As for a list, there are currently 34 geography related datasets >> listed in Kasabi here: >> >> http://beta.kasabi.com/browse/datasets/results/og_category%3A147 > > Leigh, > > Can anyone access these datasets or must they obtain a kasabi account en > route to authenticated access? As I've said (repeatedly!) there's no authentication around any of Linked Data. That might be an option for publishers in future, but not during the beta and not for any of the open datasets which we've published currently. API keys are only required for the APIs, e.g. SPARQL, search, etc. The choice of authentication options will increase in future. So I encourage you to actually go and have a look. There's a direct link to the Linked Data views from every homepage. Here's a pointer to the blog post I wrote and circulated after our last discussion: http://blog.kasabi.com/2011/08/12/linked-data-in-kasabi/ Cheers, L. -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Question: Authoritative URIs for Geo locations? Multi-lingual labels?
Hi, As well as the others already mentioned there's also Yahoo Geoplanet: http://beta.kasabi.com/dataset/yahoo-geoplanet This has multi-lingual labels and is cross-linked to the Ordnance Survey data, Dbpedia, but that could be improved. As for a list, there are currently 34 geography related datasets listed in Kasabi here: http://beta.kasabi.com/browse/datasets/results/og_category%3A147 Cheers, L. On 8 September 2011 15:38, M. Scott Marshall wrote: > It seems that dbpedia is a de facto source of URIs for geographical > place names. I would expect to find a more specialized source. I think > that I saw one mentioned here in the last few months. Are there > alternatives that are possible more fine-grained or designed > specifically for geo data? With multi-lingual labels? Perhaps somebody > has kept track of the options on a website? > > -Scott > > -- > M. Scott Marshall > http://staff.science.uva.nl/~marshall > > On Thu, Sep 8, 2011 at 3:07 PM, Sarven Capadisli wrote: >> On Thu, 2011-09-08 at 14:01 +0100, Sarven Capadisli wrote: >>> On Thu, 2011-09-08 at 14:07 +0200, Karl Dubost wrote: >>> > # Using RDFa (not implemented in browsers) >>> > >>> > >>> > http://www.w3.org/2003/01/geo/wgs84_pos#"; id="places-rdfa"> >>> > >> > about="http://www.dbpedia.org/resource/Montreal"; >>> > geo:lat_long="45.5,-73.67">Montréal, Canada >>> > >> > about="http://www.dbpedia.org/resource/Paris"; >>> > geo:lat_long="48.856578,2.351828">Paris, France >>> > >>> > >>> > * Issue: Latitude and Longitude not separated >>> > (have to parse them with regex in JS) >>> > * Issue: xmlns with >>> > >>> > >>> > # Question >>> > >>> > On RDFa vocabulary, I would really like a solution with geo:lat and >>> > geo:long, Ideas? >>> >>> Am I overlooking something obvious here? There is lat, long properties >>> in wgs84 vocab. So, >>> >>> http://dbpedia.org/resource/Montreal";> >>> >> content="45.5" >>> datatype="xsd:float"> >>> >> content="-73.67" >>> datatype="xsd:float"> >>> Montreal >>> >>> >>> Tabbed for readability. You might need to get rid of whitespace. >>> >>> -Sarven >> >> Better yet: >> >> http://dbpedia.org/resource/Montreal";> >> > ... >> >> >> -Sarven > > -- Leigh Dodds Product Lead, Kasabi Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: CAS, DUNS and LOD (was Re: Cost/Benefit Anyone? Re: Vote for my Semantic Web presentation at SXSW)
Hi, On 24 August 2011 15:40, David Wood wrote: > On Aug 24, 2011, at 2:44, Leigh Dodds wrote: > >> Hi, >> >> On 23 August 2011 15:17, Gannon Dick wrote: >>> Either "Linked Data ecosystem" or "linked data Ecosystem" is a dangerously >>> flawed paradigm, IMHO. You don't "improve" MeSH by >>> flattening it, for example, it is what it is. Since CAS numbers are not a >>> directed graph, an algorithmic transform to a URI (which *is* a >>> directed graph) is risks the creation of a "new" irreconcilable taxonomy. >>> For example, Nitrogen is ok to breathe and liquid Nitrogen is a >>> not very practical way to chill wine. >> >> A URI isn't a directed graph. You can use them to build one by making >> statements though. >> >> Setting aside any copyright issues, the CAS identifiers are useful >> Natural Keys [1]. As they're well deployed, using them to create URIs >> [2] is sensible > > Hi Leigh, > > Right. Unfortunately it is also illegal :/ Yes, I read the first part of the thread! I was merely pointing out the useful patterns for projecting identifiers into URIs. Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: CAS, DUNS and LOD (was Re: Cost/Benefit Anyone? Re: Vote for my Semantic Web presentation at SXSW)
Hi, On 23 August 2011 15:17, Gannon Dick wrote: > Either "Linked Data ecosystem" or "linked data Ecosystem" is a dangerously > flawed paradigm, IMHO. You don't "improve" MeSH by > flattening it, for example, it is what it is. Since CAS numbers are not a > directed graph, an algorithmic transform to a URI (which *is* a > directed graph) is risks the creation of a "new" irreconcilable taxonomy. > For example, Nitrogen is ok to breathe and liquid Nitrogen is a > not very practical way to chill wine. A URI isn't a directed graph. You can use them to build one by making statements though. Setting aside any copyright issues, the CAS identifiers are useful Natural Keys [1]. As they're well deployed, using them to create URIs [2] is sensible as it simplifies the process of linking between datasets [3]. To answer Patrick's question, to help bridging between systems that only use the original literal version, rather than the URIs, then we should ensure that the literal keys are included in the data [4]. These are well deployed patterns and, from my experience, make it really simple and easy to bridge and link between different datasets and systems. Cheers, L. [1]. http://patterns.dataincubator.org/book/natural-keys.html [2]. http://patterns.dataincubator.org/book/patterned-uris.html [3]. http://patterns.dataincubator.org/book/shared-keys.html [4]. http://patterns.dataincubator.org/book/literal-keys.html -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
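As a small sketch of those patterns together (assuming Python and rdflib; the base URI is a placeholder of my own, and the CAS number for nitrogen is used purely as an illustration):

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import SKOS

    BASE = "http://example.org/id/substance/"

    def substance_uri(cas_number):
        # Patterned URI built from the natural key.
        return URIRef(BASE + cas_number)

    g = Graph()
    nitrogen = substance_uri("7727-37-9")
    # Keep the literal key in the data so systems that only know the plain
    # CAS number can still join against it.
    g.add((nitrogen, SKOS.notation, Literal("7727-37-9")))

    print(g.serialize(format="turtle"))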
Re: New draft of Linked Data Patterns book
Hi, On 20 August 2011 16:01, Giovanni Tummarello wrote: > Seems pretty interesting, clearly out of practical experience ! Thanks Giovanni! Yes, I've been trying to apply practical experience wherever possible. I'm very keen on collecting useful application patterns that may help others build good RDF & Linked Data based apps. L. -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
New draft of Linked Data Patterns book
Hi, There's a new draft of the Linked Data patterns book available, with 12 new patterns, mainly in the application patterns section. The latest version is available from here: http://patterns.dataincubator.org/book/ There are PDF and EPUB versions linked from the homepage. The source is also available in github at: https://github.com/ldodds/ld-patterns Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Job: Data Engineer, Kasabi
Hi, Just a reminder to people that this job opening is still available. The role involves hands-on work with a wide range of different data types, covering both free, open data & commercial datasets. Over time we expect to be doing more data analysis using Map-Reduce and Pregel, as well as interlinking and enrichment. We're looking for someone who is enthusiastic about working with, analysing, and demonstrating the value of data. If you want a hands-on role working with data, then this should definitely be of interest. More details at [1] or feel free to drop me an email with any questions or applications. [1] http://tbe.taleo.net/NA9/ats/careers/requisition.jsp?org=TALIS&cws=1&rid=41 Cheers, L. On 17 June 2011 16:22, Leigh Dodds wrote: > Hi, > > Short job advert: we're looking for someone to join the Kasabi team as > a Data Engineer. The role will involve working with RDF and Linked > Data so should be of interest to this community! > > More information at [1]. Feel free to get in touch with me personally > if you want more information. > > Cheers, > > L. > > [1] > http://tbe.taleo.net/NA9/ats/careers/requisition.jsp?org=TALIS&cws=1&rid=41 > > -- > Leigh Dodds > Programme Manager, Talis Platform > Mobile: 07850 928381 > http://kasabi.com > http://talis.com > > Talis Systems Ltd > 43 Temple Row > Birmingham > B2 5LS > -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: DBpedia: limit of triples
Hi, On 9 August 2011 11:26, Jörn Hees wrote: > ... > I also guess it would be better to construct the given document first from > the outgoing triples, maybe preferring the ontology mapped triples, and then > incoming links up to a 2000 triples limit (if necessary to limit bandwidth). > That would fit the description in the above mentioned section way better than > the current implementation. You could also try a mirror to see if that provides better facilities, e.g. [1] Cheers, L. [1]. http://beta.kasabi.com/dataset/dbpedia-36 -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Get your dataset on the next LOD cloud diagram
Hi, On 13 July 2011 14:30, Kingsley Idehen wrote: > Can you ping me or reply to this list with a list of missing SPARQL > endpoints. Alternatively, you bookmark them on del.icio.us using tag: > sparql_endpoint. > > Here is my collection: http://www.delicious.com/kidehen/sparql_endpoint . The data is all in a machine-readable form. See: http://data.kasabi.com/datasets The URI supports conneg so you can follow rdfs:seeAlso links to all of the VoiD descriptions and hence to the sparql endpoints, plus all of the other APIs. It'd be nice if the LD cloud diagram used other machine-readable sources where possible. I know CKAN is a good focal point for helping curate activity, but also frustrating to have to copy data around whether manually or otherwise. Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Get your dataset on the next LOD cloud diagram
Hi, On 13 July 2011 13:05, Bernard Vatant wrote: > Re. availability, just a reminder of SPARQL Endpoints Status service > http://labs.mondeca.com/sparqlEndpointsStatus/index.html > As of today 80% (192/240) endpoints registered at CKAN are up and running. > Monitor grey dots (still alive?) for candidate passed out datasets ... Well as Kingsley pointed out SPARQL is only one metric. Whether the URIs still resolve is arguably most important for the Linked Data diagram, but service availability is a good thing to monitor. However its also worth noting that there are mirrors of a number of datasets. E.g. we have 70+ datasets in Kasabi, some new to the cloud, some of which are mirrors. Not all (any?) of those SPARQL endpoints are on your list. Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Get your dataset on the next LOD cloud diagram
Hi, On 12 July 2011 18:45, Pablo Mendes wrote: > Dear fellow Linked Open Data publishers and consumers, > We are in the process of regenerating the next LOD cloud diagram and > associated statistics [1]. > ... This email prompted a discussion about how the data collection or diagram could be improved or updated. As CKAN is an open platform and anyone can add additional tags to datasets, why doesn't everyone who is interested in seeing a particular improvement or alternate view of the data just go ahead and do it? There's no need to require all this to be done by one team on a fixed schedule. Some light co-ordination between people doing similar analyses would be worthwhile, but it wouldn't be hard to, e.g. tag datasets based on whether their Linked Data or SPARQL endpoint is available regularly, whether they're currently maintained, or (my current bugbear) whether the data dumps they publish parse with more than one tool chain. It'd be nice to see many different aspects of the cloud being explored. Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: WebID vs. JSON (Was: Re: Think before you write Semantic Web crawlers)
Hi, On 22 June 2011 15:41, William Waites wrote: > What does WebID have to do with JSON? They're somehow representative > of two competing trends. > > The RDF/JSON, JSON-LD, etc. work is supposed to be about making it > easier to work with RDF for your average programmer, to remove the > need for complex parsers, etc. and generally to lower the barriers. > > The WebID arrangement is about raising barriers. Not intended to be > the same kind of barriers, certainly the intent isn't to make > programmer's lives more difficult, rather to provide a good way to do > distributed authentication without falling into the traps of PKI and > such. > > While I like WebID, and I think it is very elegant, the fact is that I > can use just about any HTTP client to retrieve a document whereas to > get rdf processing clients, agents, whatever, to do it will require > quite a lot of work [1]. This is one reason why, for example, 4store's > arrangement of /sparql/ for read operations and /data/ and /update/ > for write operations is *so* much easier to work with than Virtuoso's > OAuth and WebID arrangement - I can just restrict access using all of > the normal tools like apache, nginx, squid, etc.. > > So in the end we have some work being done to address the perception > that RDF is difficult to work with and on the other hand a suggestion > of widespread putting in place of authentication infrastructure which, > whilst obviously filling a need, stands to make working with the data > behind it more difficult. > > How do we balance these two tendencies? By recognising that often we just need to use existing technologies more effectively and more widely, rather than throw more technology at a problem, thereby creating an even greater education and adoption problem? Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Job: Data Engineer, Kasabi
Hi, Short job advert: we're looking for someone to join the Kasabi team as a Data Engineer. The role will involve working with RDF and Linked Data so should be of interest to this community! More information at [1]. Feel free to get in touch with me personally if you want more information. Cheers, L. [1] http://tbe.taleo.net/NA9/ats/careers/requisition.jsp?org=TALIS&cws=1&rid=41 -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Squaring the HTTP-range-14 circle
Hi, On 17 June 2011 15:32, Kingsley Idehen wrote: > On 6/17/11 3:11 PM, Leigh Dodds wrote: >> >> I just had to go and check whether Amazon reviews and Facebook >> comments actually do have their own pages. That's because I've never >> seen them presented as anything other than objects within another >> container, either in a web page or a mobile app. So I think you could >> argue that when people are "linking" and marking things as useful, >> they're doing that on a more general abstraction, i.e. the "Work" (to >> borrow FRBR terminology) not the particular web page. > > You have to apply context to your statement above. Is the context: WWW as an > Information space or Data Space? I can't answer that because I don't know what you mean by those terms. It's just a web of resources as far as I'm concerned. Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Squaring the HTTP-range-14 circle
Hi, On 17 June 2011 14:04, Tim Berners-Lee wrote: > > On 2011-06 -17, at 08:51, Ian Davis wrote: >> ... >> >> Quite. When a facebook user clicks the "Like" button on an IMDB page >> they are expressing an opinion about the movie, not the page. > > BUT when the click a "Like" button on a blog they are expressing they like the > blog, not the movie it is about. > > AND when they click "like" on a facebook comment they are > saying they like the comment not the thing it is commenting on. > > And on Amazon people say "I found this review useful" to > like the review on the product being reviewed, separately from > rating the product. > So there is a lot of use out there which involves people expressing > stuff in general about the message not its subject. Well even that's debatable. I just had to go and check whether Amazon reviews and Facebook comments actually do have their own pages. That's because I've never seen them presented as anything other than objects within another container, either in a web page or a mobile app. So I think you could argue that when people are "linking" and marking things as useful, they're doing that on a more general abstraction, i.e. the "Work" (to borrow FRBR terminology) not the particular web page. And that's presumably the way that Facebook and Amazon see it too because that data is associated with the status or review in whichever medium I look at it (page or app). Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: {Disarmed} Re: Squaring the HTTP-range-14 circle
Hi, On 13 June 2011 16:04, Christopher Gutteridge wrote: > <http://en.wikipedia.org/wiki/David_%28Michelangelo%29> > dc:creator<http://en.wikipedia.org/wiki/Michelangelo> . > > Did he make the statue or the webpage? Given that he died before the internet was invented, it'd probably be the statue. More data beats better algorithms :) L. -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Common RDF Vocabulary Labels Vocabulary
Hi, On 6 June 2011 10:00, Hugh Glaser wrote: > But hang on, is the web not about linking, rather than copying things around? Isn't this annotation, rather than copying? L. -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: Common RDF Vocabulary Labels Vocabulary
Hi, On 6 June 2011 02:42, Christopher Gutteridge wrote: > +1 > > I would go further and suggest that you cut and paste in the property & > class definitions to provide a single file which can be translated to enable > core parts of the semweb in other languages. That's the approach I took with getting translations of the FOAF-a-Matic. I had a separate XML file with the text that contributors could just update and send back. Worked really well. A shared Google spreadsheet might work well as a lo-fi approach. But that assumes people will do a whole translation set or whole vocabulary in one go. Maybe it would be easier to do a few here and there. Strikes me it'd be a good case for a super-simple service: homepage shows a random property, prompts user to fill in a translation. Make it into a little game. Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
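For what it's worth, the per-term contribution involved is tiny; in Turtle it is just extra language-tagged labels. A minimal sketch (the terms and translations below are purely illustrative):

  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .

  foaf:knows rdfs:label "knows"@en , "connaît"@fr , "kennt"@de .
  foaf:name  rdfs:label "name"@en , "nom"@fr , "Name"@de .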
Announce: Kasabi Public Beta, RDF data hosting and Linked Data publishing
Hi, If you'll forgive the product announcement, I wanted to let people know that we've launched the Kasabi Public Beta today. The announcement is here [1] and the beta site is accessible from [2]. The beta includes RDF data hosting and Linked Data publishing, and the API supports importing of RDFa data too. The dataset directory is available as RDF [3] and there are VoiD descriptions of each dataset, e.g. [4]. We're using a simple experimental vocabulary extension to VoiD [5] to point to additional APIs relating to a dataset. This is to allow clients to boot-strap discovery of services in a RESTful way. The site allows users to create APIs either using the Linked Data API specification, or something that we're calling SPARQL Stored Procedures [6]. The goal is to support as broad a range of data access options as possible, and enable new ways for developers to share their skills. E.g. simplifying access to a dataset by creating a simpler API over a SPARQL query, or just sharing SPARQL queries for a particular dataset. I'd welcome feedback on any of these features. We have a separate developer list at [7] for more detailed discussion, but there are some general features and services which I think are of interest to this community :) Cheers, L. [1]. http://blog.kasabi.com/2011/06/03/kasabi-public-beta/ [2]. http://beta.kasabi.com [3]. http://data.kasabi.com/datasets [4]. http://data.kasabi.com/dataset/bricklink [5]. http://labs.kasabi.com/ns/services [6]. http://beta.kasabi.com/doc/api/sparql-stored-procedure [7]. http://groups.google.com/group/kasabi-dev -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
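To give a flavour of the VoiD descriptions, here is a rough sketch; the service property and the endpoint/API URLs below are made up for illustration, the real terms are the ones defined at [5]:

  @prefix void: <http://rdfs.org/ns/void#> .
  @prefix svc:  <http://labs.kasabi.com/ns/services#> .   # see [5]; svc:hasApi below is hypothetical

  <http://data.kasabi.com/dataset/bricklink>
      a void:Dataset ;
      void:sparqlEndpoint <http://example.org/bricklink/sparql> ;    # hypothetical URL
      svc:hasApi <http://example.org/bricklink/apis/search> .        # hypothetical URL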
Re: ANN: SKOS implementation of the ACM Communication Classification System
Hi Christoph, Very happy to see more data appearing, congrats :) A brief aside on licensing... On 2 June 2011 09:27, Christoph Lange wrote: > ... > Note that we have not yet considered copyright issues -- but at least > preserved the original ACM copyright statement, which permits "personal or > classroom use". That's probably not enough for reasonable Linked Data > applications. I would be glad if someone familiar with the subject could > point out what to do. What did previous publishers of RDF versions of the > ACM CCS do? I won't reproduce the text here, and I'm not a lawyer, but the wording says "...to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permission to republish from..[address+email]". I think one reasonable thing that someone may want to do is mirror the data, e.g. to provide a public SPARQL endpoint or other services. Currently it doesn't look like I can do that without contacting the ACM directly, which I assume you've also done. It's not clear to me whether I could even copy parts of the data and index it to use in an application, as that potentially falls outside of the personal and classroom use. I fully support arguments to the effect of "use and seek forgiveness later" when using data, but as we see more and more commercial usage of Linked Data, I think we really need to see clearer licensing around data. Otherwise it feels like we're building on uncertain ground. Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
Re: For our UK readers
Let's hope that any fall-out doesn't come back to me as the person to whom errors are reported! Arguably the generatorAgent and errorReportsTo predicates ought to be removed once you've made further hand edits/changes to the file, but I doubt anyone does that in practice. Cheers, L. On 24 May 2011 15:07, Hugh Glaser wrote: > http://who.isthat.org/id/CTB > > Have I got the RDF right? > Not sure foaf is the right thing for this. > Should there be a blank node somewhere in there? > Suggestions for improvements welcome. > > Hugh > > -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
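For anyone wondering, the metadata in question is the generator block that FOAF-a-Matic style tools emit, typically something like this (the addresses below are placeholders, not taken from Hugh's file):

  @prefix admin: <http://webns.net/mvcb/> .

  <> admin:generatorAgent <http://www.ldodds.com/foaf/foaf-a-matic> ;
     admin:errorReportsTo <mailto:webmaster@example.org> .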
Re: implied datasets
Hi William, On 23 May 2011 14:01, William Waites wrote: > ... > Then for each dataset that I have that uses the links to this space, I > count them up and make a linkset pointing at this imaginary dataset. > > Obviously the same strategy for anywhere there exist some kind of > standard identifiers that are not URIs in HTTP. > > Does this make sense? I'm not sure that the dataset is "imaginary", but what you're doing seems eminently sensible to me. I've been working on a little project that I hope to release shortly that aims to facilitate this kind of linking, especially where those non-URI identifiers, or Literal Keys [1], are used to build patterned URIs. > Can we sensibly talk about and even assert the existence of a dataset > of infinite size? (whatever "existence" means). I think so, we can assert what kinds of things it contains and describe it in general terms, even if we can't enumerate all of its elements. It may be more natural to think of these as services rather than datasets, i.e. a service that accepts some keys as input and returns a set of assertions. In this case the assertions would be links to other datasets. > Is this an abuse of DCat/voiD? Not in my view, I think the notion of dataset is already pretty broad. > Are this class of datasets subsets of sameAs.org (assuming sameAs.org > to be complete in principle?) Subsets if they only asserted sameAs links, but I think you're suggesting that this may be too strict. I think there's potentially a whole set of related "predicate based services" [2] that provide useful indexes of existing datasets, or expose additional annotations of extra sources. The project I've been working on facilitates not just sameAs links, but any form of links that can be derived from shared URI patterns. This would include topic/subject based linking. ISBN was one of the use cases I had in mind, but there are others. Cheers, L. [1]. http://patterns.dataincubator.org/book/literal-keys.html [2]. http://www.ldodds.com/blog/2010/03/predicate-based-services/ -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
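In VoiD terms, a rough sketch of the kind of linkset William describes might look like this (dataset names, the link predicate and the count are purely illustrative):

  @prefix void: <http://rdfs.org/ns/void#> .
  @prefix ex:   <http://example.org/ns#> .

  <#myData>    a void:Dataset .
  <#isbnSpace> a void:Dataset .   # the "imaginary" dataset of all ISBN-based URIs

  <#isbnLinks> a void:Linkset ;
      void:subjectsTarget <#myData> ;
      void:objectsTarget  <#isbnSpace> ;
      void:linkPredicate  ex:hasIsbnUri ;   # hypothetical predicate
      void:triples 12345 .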
Navigating Data (was Re: Take2: 15 Ways to Think About Data Quality (Just for a Start) )
Hi, Changed subject line to match topic: On 15 April 2011 14:47, glenn mcdonald wrote: > This reminds me to come back to the point about what I initially > called Directionality, and Dave improved to Modeling Consistency. > ... > - But even in RDF, directionality poses a significant discovery > problem. In a minimal graph (let's say "minimal graph" means that each > relationship is asserted in only one direction, so there's no > relationship redundancy), you can't actually explore the data > navigationally. You can't go to a single known point of interest, like > a given president, and explore to find out everything the data holds > and how it connects... Doesn't this really depend on how the navigational interface is constructed? If we're looking purely at Linked Data views created using a Concise Bounded Description, then yes I agree, if there are no "back links" in the data, then navigation is problematic. But if we use different algorithms to describe the views, or supplement it with SPARQL queries, then those navigational links can be presented, e.g. "other resources that refer to this resources". I think as you noted elsewhere inverse links could also be inferred based on the schema. This simplifies the navigation UI as the links are part of the data. > ...You can explore the *outward* relationships from > any given point, but to find out about the *inward* relationships you > have to keep doing new queries over the entire dataset. Yes. > ...The same basic > issue applies to an XML representation of the data as a tree: you can > squirrel your way down, but only in the direction the original modeler > decided was "down". If you need a different direction, you have to > hire a hypersquirrel. Well an XML node typically has a reference to its parent (it does in the DOM anyway) so moving back up the tree is easy. > - Of course, most RDF-presenting systems recognize this as a usability > problem, and address it by turning the minimal graph into a redundant > graph for UI purposes. Thus in a data-browser UI you usually see, for > a given node, lists of both outward and inward relationships. This is > better, but if this abstraction is done at the UI layer, you still > lose it once you drop down into the SPARQL realm. This makes the > SPARQL queries harder to write, because you can't write them the way > you logically think about the question, you have to write them the way > the data thinks about the question. And this skew from real logic to > directional logic can make them *much* harder to understand or > maintain, because the directionality obscures the purpose and reduces > the self-documenting nature of the query. Assuming you don't materialize the inferences directly in the data, then isn't the answer to have both the SPARQL endpoint and the navigational UI use the same set of inferred data? > All of this is *much* better, in usability terms, if the data is > redundantly, bi-directionally connected all the way down to the level > of abstraction at which you're working. Now you can explore to figure > out what's there, and you can write your queries in the way that makes > the most human sense. The artificicial skew between the logical > structure and the representational structure has been removed. This is > perfectly possible in an RDF-based system, of course, if the software > either generates or infers the missing inverses. We incur extra > machine overhead to reduce the human congnitive burden. 
> I contend this > should be considered a nearly-mandatory best-practice for linked data, > and that propogating inverses around the LOD cloud ought to be one of > things that makes the LOD cloud *a thing*, rather than just a > collection of logical silos. The same problem exists on the document web: it can be useful to know what links to a specific page. There are various techniques to help address that, e.g. centralized indexes that can expose more of the graph (Google) or point-to-point mechanisms for notifying links (e.g. Pingback, etc). With an RDF system we may be able to infer some extra links, but with Linked Data we can't infer all of them, so we have the same issue and can deploy very similar infrastructure to solve the problem. Currently we have SameAs.org, which is specialized for one type of linking, but it'd be nice to see others [1]. And there have been experiments with various pingback/notification services for Linked Data. Are any of the latter being widely deployed/used? Cheers, L. [1]. http://www.ldodds.com/blog/2010/03/predicate-based-services/ -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
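The "other resources that refer to this resource" view mentioned above is just an inverse lookup; over a SPARQL endpoint it is a one-line query (the resource URI below is only an example):

  SELECT DISTINCT ?s ?p
  WHERE { ?s ?p <http://example.org/id/president/42> . }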
Re: Why does rdf-sparql-protocol say to return 500 when refusing a query?
Hi, On 27 April 2011 11:18, Alexander Dutton wrote: > On 17/04/11 21:07, Hugh Glaser wrote: >> >> As a consumer I would like to be able to distinguish a refusal to answer >> from a failure of the web server to access the store, for example. > > In the general case, that was my concern, too. AFAICT from the spec, you > aren't precluded from returning e.g. 504 if the store has disappeared. > > I've always (perhaps wrongly) equated a 500 with the web server encountering > some exceptional and *unexpected* condition¹; specifically, an uncaught > exception in the web application. As such I've always taken a 500 to be > indicative of a bug which should be fixed to fail more gracefully, perhaps > with a more appropriate code from the 4xx/5xx range². > > As a web developer I always try to 'fix' situations where my code returns a > 500. As a consumer I will take a 500 to be an application error and attempt > to inform the webmaster of the inferred 'bug'. > > I can think of the following situations where a SPARQL endpoint might not > return a result: > > * Syntax error (400) > * Accept range mismatch (406) > * Query rejected off-hand as too resource-intensive (403?) > * Store unreachable (504?) > * Server overloaded (503?) > * Query timed out (504?, 403?) +1 to using the full range of HTTP status codes. Personally I don't really see it as revisionist or retro-fitting to use HTTP status codes to indicate these application-level semantics. There's a good range of status codes available and they're reasonably well defined for these broad scenarios, IMO. Especially so when you use additional headers, e.g. Retry-After (as David Booth noted), to communicate additional information at the protocol level. This is mainly about good web application engineering rather than anything to do with the SPARQL protocol per se. However it may be useful to define a standard response format and potentially error messages to help client apps/users distinguish between more fine-grained error states. I suggested this during discussion of the original protocol specification but the WG decided it wasn't warranted initially [1]. Based on this discussion I'm not sure implementation experience has moved on enough, or converged enough, to feed this back as part of SPARQL 1.1. Doesn't stop the community agreeing on some conventions/best practices though. Cheers, L. [1]. http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006Jan/0106.html -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
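For example, an overloaded endpoint might reasonably respond along these lines; the body is the part we currently have no conventions for (the message text below is just a placeholder):

  HTTP/1.1 503 Service Unavailable
  Retry-After: 120
  Content-Type: text/plain

  The endpoint is temporarily overloaded, please retry the query later.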
Re: Minting URIs: how to deal with unknown data structures
Hi, On 15 April 2011 13:48, Frans Knibbe wrote: > I have acquired the first part (authority) of my URIs, let's say it is > lod.mycompany.com. Now I am faced with the question: How do I come up with a > URI scheme that will stand the test of time? You might be interested in the Identifier Patterns documented here: http://patterns.dataincubator.org/book/identifier-patterns.html There's also the "Designing URI Sets for the Public Sector" document, which provides the guidance for creating URIs for UK government data: http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Mobile: 07850 928381 http://kasabi.com http://talis.com Talis Systems Ltd 43 Temple Row Birmingham B2 5LS
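As a purely illustrative example of the sort of patterns both documents describe (the paths below are made up, not a recommendation for your specific data):

  http://lod.mycompany.com/id/building/B123    # identifier for the thing itself
  http://lod.mycompany.com/doc/building/B123   # document describing the thing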
Re: Possible Idea For a Sem Web Based Game?
Hi, On 20 November 2010 17:28, Melvin Carvalho wrote: > I was thinking about creating a simple game based on semantic web > technologies and linked data. > > Some on this list may be too young to remember this, but there used to > be game books where you would choose your own adventure. > > http://en.wikipedia.org/wiki/Choose_Your_Own_Adventure Yes, I've thought this would make a really nice showcase too. Liam Quinn built a nice little demo [1] of something like this. I was also looking at the Inform interactive fiction engine [2] (again!) recently. The engine is basically a set of core rules about how a game world operates. The core rules can be extended, and the ways a user can interact with the world can be inferred from those rules. E.g. whether you can climb onto or inside something. Struck me that it'd be possible to (re-)build a lot of that using RDF, OWL, RIF. Cheers, L. [1]. http://dirk.holoweb.net/~liam/rdfg/rdfg.cgi [2]. http://www.inform-fiction.org/ -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
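Very roughly, the kind of world model I have in mind could be sketched in RDF like this (every term below is made up; a real version would lean on OWL/RIF for the actual rules):

  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix game: <http://example.org/game#> .

  game:Chair rdfs:subClassOf game:Supporter .   # things a player can climb onto
  game:Crate rdfs:subClassOf game:Container .   # things a player can climb inside
  game:chair1 a game:Chair ;
      game:locatedIn game:kitchen .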
Re: Google Refine 2.0
Hi Kingsley: I recommend you take some time to work with Refine, watch the demos, and perhaps read the paper that Richard et al published on how they have used and extended Refine (or Gridworks as it was). But to answer your question: On 12 November 2010 13:23, Kingsley Idehen wrote: > How does the DERI effort differ from yours, if at all? They have produced a plugin that complements the ability to map a table structure to a Freebase schema and graph, by providing the same functionality for RDF. So a simple way to define how RDF should be generated from data in a Refine project, using either existing or custom schemas. The end result can then be exported using various serialisations. My extension simply extends that further by providing the ability to POST the data to a Talis Platform store. It'd be trivial to tweak that code to support POSTing to another resource, or wrapping the data into a SPARUL insert. Ideally it'd be nice to roll the core of this into the DERI extension for wider use. Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
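By "wrapping the data into a SPARUL insert" I mean something along these lines (shown here in SPARQL 1.1 Update syntax; the graph URI and triple are placeholders):

  INSERT DATA {
    GRAPH <http://example.org/graphs/refine-export> {
      <http://example.org/thing/1> <http://xmlns.com/foaf/0.1/name> "Alice" .
    }
  }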
Re: Google Refine 2.0
Hi David, Congratulations on getting the 2.0 release out. I'm looking forward to working with it some more. Kingsley asked about extensions. You've already mentioned the work done at DERI, and I've previously pointed at the reconciliation API I built over the Talis Platform [1]. I used Refine's excellent plugin architecture to create a simple upload tool for loading Talis Platform stores. This hooks into both core Gridworks and the DERI RDF extension to support POSTing of the RDF to a service. The code is just a proof of concept [2], but I have a more refined version that I parked briefly whilst awaiting the 2.0 release. I think this nicely demonstrates how open Refine is as a tool. Cheers, L. [1]. http://www.ldodds.com/blog/2010/08/gridworks-reconciliation-api-implementation/ [2]. https://github.com/ldodds/gridworks-talisplatform -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
Re: RDB to RDF & ontology terms reuse
Hi Christian, On Friday, November 5, 2010, Christian Rivas wrote: > foaf:firstName => Domain: foaf:Person Range: Literal > foaf:familyName => Domain: foaf:Person Range: Literal > foaf:phone => Domain: NONE Range => NONE > vcard:email => Domain: vcard:VCard Range => NONE Personally I would use all foaf terms, foaf:mbox can be used to capture an email as a mailto: URI. Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
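A minimal sketch of the suggested mapping in Turtle (the person URI and values are placeholders):

  @prefix foaf: <http://xmlns.com/foaf/0.1/> .

  <http://example.org/person/1> a foaf:Person ;
      foaf:firstName  "Ada" ;
      foaf:familyName "Lovelace" ;
      foaf:phone      <tel:+15550100> ;
      foaf:mbox       <mailto:ada@example.org> .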
Re: Is 303 really necessary - demo
Hi, On 5 November 2010 12:37, Nathan wrote: > Wrong question, correct question is "if I 200 OK will people think this > is a document", to which the answer is yes. You're toucan is a :Document. You keep reiterating this, but I'm still not clear on what you're saying. 1. It seems like you're saying that a status code licenses someone to infer an rdf:type for a resource (in what vocab I'm not sure, but it looks like you're saying that). Someone is obviously entitled to do that. Not sure I can think of a use case, do you have one? 2. It also seems like you're suggesting someone is actually doing that. Or maybe that you're expecting someone will start doing it? 3. It also seems like you're suggesting that if someone does do that, then it breaks the (semantic) web for the rest of us. Which it won't, unless you blithely trust all data everywhere or don't care to check your facts. Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
Re: Is 303 really necessary - demo
Hi, On 5 November 2010 13:57, Giovanni Tummarello wrote: > I might be wrong but I dont like it much . Sindice would index it as 2 > documents. > > http://iandavis.com/2010/303/toucan > http://iandavis.com/2010/303/toucan.rdf Even though one returns a Content-Location? Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
Re: isDefinedBy and isDescribedBy, Tale of two missing predicates
Hi, On 5 November 2010 12:43, Nathan wrote: > Dave Reynolds wrote: >> >> Clearly simply using # URIs solves this but people can be surprisingly >> reluctant to go that route. > > Why? I still don't understand the reluctance, any info on the technical > non-made-up-pedantic reasons would be great. Dave provided a pointer to TimBL's discussion which had some comments, there's also some brief discussion of the technical issues in the Cool URIs paper, see [1] [1]. http://www.w3.org/TR/cooluris/#choosing Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
Re: isDefinedBy and isDescribedBy, Tale of two missing predicates
Hi Dave On 5 November 2010 12:35, Dave Reynolds wrote: > Yes but I don't think the proposal was to ban use of 303 but to add an > alternative solution, a "third way" :) > > I have some sympathy with this. The situation I've faced several times > of late is roughly this: > > ... [snip] Really nice summary Dave. Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
Re: What would break, a question for implementors? (was Re: Is 303 really necessary?)
Hi Robert, Thanks for the response, good to hear from an implementor. On 5 November 2010 10:41, Robert Fuller wrote: > ... > However... with regard to publishing ontologies, we could expect > additional overhead if same content is delivered on retrieving different > Resources for example http://example.com/schema/latitude and > http://example.com/schema/longitude . In such a case ETag could be used > to suggest the contents are identical, but not sure that is a practical > solution. I expect that without 303 it will be more difficult in > particular to publish and process ontologies. This is useful to know, thanks. I don't think the ETag approach works, as it's intended to version a specific resource, not to be carried across resources. One way to avoid the overhead is to strongly recommend # URIs for vocabularies. This seems to be increasingly the norm. It also makes them easier to work with (you often want the whole document). L. -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
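To spell out the hash-URI point with a made-up example: both terms below are served by a single fetch of one document, so there is no duplicated content to worry about:

  http://example.com/schema#latitude
  http://example.com/schema#longitude
      -> both dereference via GET http://example.com/schema (one document, one request)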
Inferring data from network interactions (was Re: Is 303 really necessary?)
Hi, On 5 November 2010 09:54, William Waites wrote: > On Fri, Nov 05, 2010 at 09:34:43AM +0000, Leigh Dodds wrote: >> >> Are you suggesting that Linked Data crawlers could/should look at the >> status code and use that to infer new statements about the resources >> returned? If so, I think that's the first time I've seen that >> mentioned, and am curious as to why someone would do it. Surely all of >> the useful information is in the data itself. > > Provenance and debugging. It would be quite possible to > record the fact that this set of triples, G, were obtained > by dereferencing this uri N, at a certain time, from a > certain place, with a request that looked like this and a > response that had these headers and response code. The > class of information that is kept for [0]. If N appeared > in G, that could lead directly to inferences involving the > provenance information. If later reasoning is concerned at > all with the trustworthiness or up-to-dateness of the > data it could look at this as well. Yes, I've done something similar to that in the past when I added support for the ScutterVocab [1] to my crawler. It was the suggestion of inferring information directly from a 200/303 that I was most curious about. I've argued for inferring data from 301 in the past [2], but wasn't sure of the merit of introducing data based on the other interactions. > Keeping this quantity of information around might quickly > turn out to be too data-intensive to be practical, but > that's more of an engineering question. I think it does > make some sense to do this in principle at least. That's what I found when crawling the BBC pages. Huge amounts of data and overhead in storing it. Capturing just enough to gather statistics on the crawl was sufficient. Cheers, L. [1]. http://wiki.foaf-project.org/w/ScutterVocab [2]. http://www.ldodds.com/blog/2007/03/the-semantics-of-301-moved-permanently/ -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
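The sort of per-fetch record being discussed looks roughly like this (the property names below are invented for illustration, not the actual ScutterVocab terms):

  @prefix ex: <http://example.org/crawl#> .

  <http://example.org/crawl/fetch/123>
      ex:requestedUri <http://dbpedia.org/resource/Berlin> ;
      ex:statusCode   200 ;
      ex:fetchedAt    "2010-11-05T10:30:00Z" ;
      ex:resultGraph  <http://example.org/graphs/123> .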
What would break, a question for implementors? (was Re: Is 303 really necessary?)
Hi Michael, On 5 November 2010 09:29, Michael Hausenblas wrote: > It occurs to me that one of the main features of the Linked Data community > is that we *do* things rather than having endless conversations what would > be the best for the world out there. Heck, this is how the whole thing > started. A couple of people defining a set of good practices and providing > data following these practices and tools for it. > > Concluding. If you are serious about this, please go ahead. You have a very > popular and powerful platform at your hand. Implement it there (and in your > libraries, such as Moriarty), document it, and others may/will follow. Yes, actually doing things does help more than talking. I sometimes wonder whether as a community we're doing all the right things, but that's another discussion ;) Your suggestion about forging ahead is a good one, but it also reminds me of Ian's original question: what would break if we used this pattern? So here are a couple of questions for those of you on the list who have implemented Linked Data tools, applications, services, etc: * Do you rely on or require HTTP 303 redirects in your application? Or does your app just follow the redirect? * Would your application/tool/service/etc. break or generate inaccurate data if Ian's pattern were used to publish Linked Data? Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
Re: Is 303 really necessary?
Hi, On 4 November 2010 17:51, Nathan wrote: > But, for whatever reasons, we've made our choices, each has pro's and > cons, and we have to live with them - different things have different > name, and the giant global graph is usable. Please, keep it that way. I think it's useful to continually assess the state of the art to see whether we're on track. My experience, which seems to be confirmed by comments from other people on this thread, is that we're seeing push back from the wider web community -- who have already published way more data than we have -- on the technical approach we've been advocating, so looking for a middle ground seems useful. Different things do have different names, but conflating IR/NIR is not part of Ian's proposal, which addresses the publishing mechanism only. Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com
Re: Is 303 really necessary?
Hi David, On 4 November 2010 19:57, David Wood wrote: > Some small number of people and organizations need to provide back-links on > the Web since the Web doesn't have them. > 303s provide a generic mechanism for that to occur. URL curation is a useful > and proper activity on the Web, again in my opinion. I agree that URL curation is a useful and proper activity on the Web. I'm not clear on your core concern though. It looks like you're asserting that HTTP 303 status codes, in general, are useful and should not be deprecated. Totally agree there. But Ian's proposal is about using 303 as a necessary part of publishing Linked Data. That seems distinct from how services like PURLs and DOIs operate, and from the value they provide. But perhaps I'm misunderstanding? Cheers, L. -- Leigh Dodds Programme Manager, Talis Platform Talis leigh.do...@talis.com http://www.talis.com