Re: New LOD ESW wikipage about Data Licensing
Hi Marc,

> I am not aware of any Google linked data services, but you can be sure the Google lawyers will not allow any usage of the data set you describe.

They process rich snippets[1] and display them in the search results. This includes RDFa.

> Your developer can consider himself lucky if he is only fired and not facing more serious problems after it is discovered that he puts unauthorized corporate data on a public web server.

Not sure which unauthorized corporate data you are referring to. I had the following scenario in mind: Acme Inc. have a nice HTML product listing, and task their developers with adding GoodRelations RDFa to that listing, to make it machine-readable. All they do is insert a bunch of attributes into the pages. The publicity level of the data remains the same; only the format of delivery changes.

> The license is important for the consumer of the data, so that their corporate lawyers will allow them to use the data.

Publishing and consuming data cannot be viewed separately. If data is published, it's for someone else to grab it and do useful things with it. If nobody can do any useful things with data unless it has an explicit license attached to it, then there is no point in publishing data without such a license, thus licensing becomes a must-have.

[1] http://www.google.com/support/webmasters/bin/answer.py?answer=99170

--
Vasiliy Faronov
Re: New LOD ESW wikipage about Data Licensing
Hi Marc,

> It is the right of the data provider to determine what the data may be used for.

To an extent allowed by the copyright / database right law.

> I would estimate for most datasets on the lod diagram you would not be allowed to do what you describe. Most have some 'by' restriction so you would at least have to give credit to the providers of the dataset if you want to use it. Luckily this is a restriction pretty easy to comply with.

If your estimate is correct, then basically Google would be illegal, I figure. They gather data from the Web (even store it, which my hypothetical app doesn't), process it, and display it to the user in some form. They don't give any special credit, save for the links themselves.

RSS aggregators would be illegal, for the same reasons.

I believe this issue needs clarification. Licensing is a serious issue, but I don't think we can make it a must have for proper LD serving.

Imagine a normal web developer cautiously trying to add a bit of RDFa to their company's web site. Now we come along and tell them that they must also indicate a license. But they don't have a special license for their web content, as they have never needed it. Their reaction? They just abandon LD altogether.

--
Vasiliy Faronov
Re: New LOD ESW wikipage about Data Licensing
On 9/27/10 4:37 PM, Vasiliy Faronov wrote:

> Hi Marc,
>
>> It is the right of the data provider to determine what the data may be used for.
>
> To an extent allowed by the copyright / database right law.
>
>> I would estimate for most datasets on the lod diagram you would not be allowed to do what you describe. Most have some 'by' restriction so you would at least have to give credit to the providers of the dataset if you want to use it. Luckily this is a restriction pretty easy to comply with.
>
> If your estimate is correct, then basically Google would be illegal, I figure. They gather data from the Web (even store it, which my hypothetical app doesn't), process it, and display it to the user in some form. They don't give any special credit, save for the links themselves.

Yep! And in these links lies attribution.

I think Google is a fantastic example of a strangely overlooked aspect of the LINK in Linked Data. URI abstraction provides powerful branding, imprint, and attribution all in one. Original source URIs keep data providers in the value chain forever. This is why literal attribution is no good; it's also why protection should focus primarily on preserving data source imprints.

Example (I've given this in the past). The following forms of attribution aren't equivalent:

1. A page that provides information about London with "Powered by DBpedia" displayed on a Web Page
2. http://dbpedia.org/resource/London -- Entity Name associated with a rich Structured Linked Data Source that provides access to data about London.

> RSS aggregators would be illegal, for the same reasons.

Yep!

> I believe this issue needs clarification. Licensing is a serious issue, but I don't think we can make it a must have for proper LD serving.

Correct!

Google has indexed and served information on the Web forever, ditto all the other search engines. The Web Resources in question happen to be HTML (semantically challenged at the data end but strong on the display and presentation side of things).

In the case of the Blogosphere, we are looking at the same thing with XML-based Web Resources (semantically challenged re. the data aspect, but strong re. semantics for content structure that enabled separation from formatting etc.).

> Imagine a normal web developer cautiously trying to add a bit of RDFa to their company's web site. Now we come along and tell them that they must also indicate a license. But they don't have a special license for their web content, as they have never needed it. Their reaction? They just abandon LD altogether.

A more important question: why did the organization in question publish content to the Web? If it wasn't to be accessed, then why bother?

The real issue publishers have is this: others taking their content and re-purposing it under new URIs without any reference to the actual origins of the content (and the data it carries).

If Google, Yahoo!, Bing, and all the others can aggregate and provide search services that cough up Web Resource URIs, the same applies to Linked Data.

The most important point (as I see it) is this: don't import data from a source and then rebrand it as yours by changing the URIs. That's simply wrong!

Always refer back to your sources. This is basic practice that has been established in the real world for a very long time.

--
Regards,
Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
New LOD ESW wikipage about Data Licensing
Hi all,

as the Web of Linked Data is moving towards more serious applications, putting published data under a proper license is becoming more and more important. If no license is specified, people cannot use published data within any serious applications.

Thus, I have started a new LOD ESW wiki page that collects information about

1. existing data licenses
2. best practices on how to annotate Linked Data with licensing meta-information

The page is found at

http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataLicensing#Data_Licensing

I have added the things I know about, but I'm sure there are many other important resources on the topic. So if you know about any, please add them to the wiki so that other people can find the relevant pointers.

Thank you very much in advance.

Best,
Chris
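As a rough illustration of point 2 (not taken from the wiki page itself): one common way to attach licensing metadata is a single triple that links a dataset to a license URI through the Dublin Core `dcterms:license` property. The Python sketch below builds that statement as an N-Triples line. The dataset URI is a hypothetical placeholder, and CC BY 3.0 is chosen only as an example license.

```python
# Dublin Core Terms property commonly used to point at a license document.
DCT_LICENSE = "http://purl.org/dc/terms/license"


def license_statement(dataset_uri: str, license_uri: str) -> str:
    """Return one N-Triples line stating that the dataset is under the license."""
    return f"<{dataset_uri}> <{DCT_LICENSE}> <{license_uri}> ."


# Hypothetical dataset URI; the license URI is Creative Commons Attribution 3.0.
print(license_statement(
    "http://example.org/dataset",
    "http://creativecommons.org/licenses/by/3.0/",
))
```

A consumer that finds such a triple can check the license URI before deciding what it may do with the data.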
Re: New LOD ESW wikipage about Data Licensing
An idle thought.

Suppose I take two datasets, licensed differently, and combine them. Maybe I do something clever to capture provenance information in how they are combined (a combination of opmv and evopat comes to mind). If the licenses are defined at a suitable granularity (is the cc vocabulary enough?) I can then derive the resulting terms by doing something like the intersection of rights granted in the source licenses. So (copyleft \cap public domain) = copyleft, etc. I wonder about constructing inference rules for this...

If the combination is done in a way that is reversible, simply selecting some triples from different sources, for example, then rather than putting provenance and license information on graphs [0], putting it on individual triples might be nice. But then we need some token for a triple, ideally in a global way where if the same triple occurs independently in two places, two people making tokens for it will end up with the same token... Hrmmm...

As I said, idle thoughts...

Cheers,
-w

[0] I'm not sure graph isn't a misnomer, or at least loose language. An RDF graph is a set, I think, and you can make a standard graph relative to a predicate by taking vertices from subject and object and edges from IEXT(predicate). Is this splitting hairs?

--
William Waites
w...@styx.org
Mob: +44 789 798 9965
Fax: +44 131 464 4948
CD70 0498 8AE4 36EA 1CD7 281C 427A 3F36 2130 E9F5
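A rough Python sketch of the two ideas above, under loud assumptions. The license table is a hypothetical simplification loosely modelled on the cc vocabulary's split between what a license permits and what it requires; the labels are illustrative, not real cc terms. The combination rule (intersect the permissions, union the obligations) is one possible reading of "something like the intersection", and it makes no attempt to detect licenses that are legally incompatible. The triple token is simply a SHA-256 hash of the triple written as an N-Triples line, which is an assumption about what a "global" token could be; it only works for triples without blank nodes, since blank node labels are not stable across sources.

```python
import hashlib

# Hypothetical, simplified license model: rights granted and obligations imposed.
LICENSES = {
    "public-domain": {
        "permits": {"reproduce", "distribute", "derive"},
        "requires": set(),
    },
    "copyleft": {
        "permits": {"reproduce", "distribute", "derive"},
        "requires": {"attribution", "share-alike"},
    },
    "by-nd": {
        "permits": {"reproduce", "distribute"},
        "requires": {"attribution"},
    },
}


def combine(a, b):
    """Terms for a merged dataset: rights both sources grant, obligations of either."""
    return {
        "permits": a["permits"] & b["permits"],
        "requires": a["requires"] | b["requires"],
    }


def triple_token(s, p, o):
    """Content-derived token; s, p, o must already be in N-Triples term syntax."""
    line = f"{s} {p} {o} ."
    return hashlib.sha256(line.encode("utf-8")).hexdigest()


# Under this toy model, copyleft combined with public domain is copyleft again.
print(combine(LICENSES["copyleft"], LICENSES["public-domain"]) == LICENSES["copyleft"])

# Two people hashing the same triple independently arrive at the same token.
print(triple_token(
    "<http://dbpedia.org/resource/London>",
    "<http://www.w3.org/2000/01/rdf-schema#label>",
    '"London"@en',
))
```

Because the token depends only on the triple's content, license or provenance statements made about it in different places would refer to the same identifier without any coordination.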
Re: New LOD ESW wikipage about Data Licensing
Hi Chris,

> If no license is specified, people cannot use published data within any serious applications.

Even if my application only processes this data, never stores it, never redistributes it?

I mean, can't I write an LD client that would go over a given website and (through DC in RDFa) compile a list of pages made by a given person? Or one that would compare several GoodRelations-annotated product pages to select an offering that better matches a user's criteria?

--
Vasiliy Faronov
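To make the two hypothetical clients concrete, here is a minimal Python sketch. It assumes the RDFa has already been extracted into plain (subject, predicate, object) tuples by some parser, which is not shown. The example.org and shop URIs, the names, and the prices are invented for illustration. The price comparison follows the GoodRelations pattern of an offering linked to a price specification via `gr:hasPriceSpecification`, with the amount in `gr:hasCurrencyValue`, and it ignores currency for brevity. Everything is processed in memory; nothing is stored or redistributed.

```python
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
GR = "http://purl.org/goodrelations/v1#"


def pages_by(triples, person):
    """Sorted list of subjects whose dc:creator equals the given person."""
    return sorted({s for s, p, o in triples if p == DC_CREATOR and o == person})


def cheapest_offering(triples):
    """Offering with the lowest gr:hasCurrencyValue, or None if none is priced."""
    spec_of = {s: o for s, p, o in triples if p == GR + "hasPriceSpecification"}
    value_of = {s: float(o) for s, p, o in triples if p == GR + "hasCurrencyValue"}
    priced = {offer: value_of[spec] for offer, spec in spec_of.items() if spec in value_of}
    return min(priced, key=priced.get) if priced else None


# Invented sample data standing in for triples extracted from RDFa pages.
triples = [
    ("http://example.org/a.html", DC_CREATOR, "Alice"),
    ("http://example.org/b.html", DC_CREATOR, "Bob"),
    ("http://example.org/c.html", DC_CREATOR, "Alice"),
    ("http://shop1.example/offer", GR + "hasPriceSpecification", "http://shop1.example/price"),
    ("http://shop1.example/price", GR + "hasCurrencyValue", "19.99"),
    ("http://shop2.example/offer", GR + "hasPriceSpecification", "http://shop2.example/price"),
    ("http://shop2.example/price", GR + "hasCurrencyValue", "17.50"),
]

print(pages_by(triples, "Alice"))
print(cheapest_offering(triples))
```

Both functions return source URIs rather than copies of the content, so the results still point back to the publishers' own pages.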