Re: New LOD ESW wikipage about Data Licensing

2010-09-28 Thread Vasiliy Faronov
Hi Marc,

 I am not aware of any google linked data services, but you can be sure 
 the google lawyers will not allow any usage of the data set you 
 describe.

They process rich snippets[1] and display them in the search results.
This includes RDFa.


 Your developer can consider himself lucky if he is only fired 
 and not facing more serious problems after it is discovered that he puts 
 unauthorized corporate data on a public web server.

Not sure which unauthorized corporate data you are referring to.

I had the following scenario in mind: Acme Inc. have a nice HTML product
listing, and task their developers with adding GoodRelations RDFa to
that listing, to make it machine-readable. All they do is insert a bunch
of attributes into the pages. The publicity level of data remains the
same, the format of delivery changes.


 The license is important for the consumer of the data, so that their
 corporate lawyers will allow to use the data.

Publishing and consuming data cannot be viewed separately. If data is
published, it's for someone else to grab it and do useful things with
it. If nobody can do any useful things with data unless it has an
explicit license attached to it, then there is no point in publishing
data without such a license, thus licensing becomes a must have.


[1] http://www.google.com/support/webmasters/bin/answer.py?answer=99170

-- 
Vasiliy Faronov




Re: New LOD ESW wikipage about Data Licensing

2010-09-27 Thread Vasiliy Faronov
Hi Marc,

 It is the right of the data provider to determine what the data may be
 used for.

To an extent allowed by the copyright / database right law.

 I would estimate for most datasets on the lod diagram you would not be
 allowed to do what you describe. Most have some 'by' restriction so
 you would at least have to give credit to the providers of the dataset
 if you want to use it. Luckily this is a restriction pretty easy to
 comply with.

If your estimate is correct, then basically Google would be illegal,
I figure. They gather data from the Web (even store it, which my
hypothetical app doesn't), process it, and display it to the user in
some form. They don't give any special credit, save for the links
themselves.

RSS aggregators would be illegal, for the same reasons.

I believe this issue needs clarification. Licensing is a serious issue,
but I don't think we can make it a must have for proper LD serving.
Imagine a normal web developer cautiously trying to add a bit of RDFa
to their company's web site. Now we come along and tell them that they
must also indicate a license. But they don't have a special license for
their web content, as they have never needed it. Their reaction? They
just abandon LD altogether.

-- 
Vasiliy Faronov




Re: New LOD ESW wikipage about Data Licensing

2010-09-27 Thread Kingsley Idehen

 On 9/27/10 4:37 PM, Vasiliy Faronov wrote:

Hi Marc,


It is the right of the data provider to determine what the data may be
used for.

To an extent allowed by the copyright / database right law.


I would estimate for most datasets on the lod diagram you would not be
allowed to do what you describe. Most have some 'by' restriction so
you would at least have to give credit to the providers of the dataset
if you want to use it. Luckily this is a restriction pretty easy to
comply with.

If your estimate is correct, then basically Google would be illegal,
I figure. They gather data from the Web (even store it, which my
hypothetical app doesn't), process it, and display it to the user in
some form. They don't give any special credit, save for the links
themselves.



Yep! And in these links lies attribution. I think Google is a 
fantastic example of a strangely overlooked aspect of the LINK in Linked 
Data.


URI abstraction provides powerful branding, imprint, and attribution all 
in one. Original source URIs keep data providers in the value chain 
forever. This is why literal attribution is no good; it's also why 
protection should focus primarily on preserving data source imprints.


Example (I've given this in the past). The following forms of 
attribution aren't equivalent:


1. A page that provides information about London with "Powered by 
DBpedia" displayed on the Web page
2. http://dbpedia.org/resource/London -- Entity Name associated with a 
rich Structured Linked Data Source that provides access to data about 
London.

RSS aggregators would be illegal, for the same reasons.


Yep!

I believe this issue needs clarification. Licensing is a serious issue,
but I don't think we can make it a must have for proper LD serving.

Correct!

Google has indexed and served information on the Web forever, ditto all 
the other search engines. The Web Resources in question happen to be 
HTML (semantically challenged at the data end but strong on the display 
and presentation side of things). In the case of the Blogosphere, we are 
looking at the same thing with XML based Web Resources (semantically 
challenged re. data aspect, but strong re. semantics for content 
structure that enabled separation from formatting etc).



Imagine a normal web developer cautiously trying to add a bit of RDFa
to their company's web site. Now we come along and tell them that they
must also indicate a license. But they don't have a special license for
their web content, as they have never needed it. Their reaction? They
just abandon LD altogether.


More important question, why did the organization in question publish 
content to the Web? If it wasn't to be accessed then why bother?


The real issue publishers have is this: taking their content and 
re-purposing it under new URIs without any reference to the actual 
origins of the content (and the data it carries).


If Google, Yahoo!, Bing!, and all the others can aggregate and provide 
search services that cough up Web Resource URIs, the same applies fine 
re. Linked Data.


The most important point (as I see it) is this: don't import data from a 
source, and then rebrand as yours by changing the URIs. That's simply 
wrong! Always refer back to your sources. Basic practice that's been 
established in the real world for a very long time.


--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen








New LOD ESW wikipage about Data Licensing

2010-09-19 Thread Chris Bizer
Hi all,

as the Web of Linked Data is moving towards more serious applications,
putting published data under a proper license is becoming more and more
important. If no license is specified, people cannot use published data
within any serious applications.

Thus, I have started a new LOD ESW wiki page that collects information about 

1. existing data licenses 
2. best practices on how to annotate Linked data with licensing
meta-information

The page is found at 

http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataLicensing#Data_Licensing

I have added the things I know about. But I'm sure there are many other
important resources about the topic.

So if you know about any, please add them to the wiki so that other people
can find the relevant pointers.

Thank you very much in advance.

Best,

Chris





Re: New LOD ESW wikipage about Data Licensing

2010-09-19 Thread William Waites
An idle thought. Suppose I take two datasets, licensed
differently, and combine them. Maybe I do something
clever to capture provenance information in how they
are combined (a combination of opmv and evopat comes
to mind). If the licenses are defined at a suitable
granularity (is the cc vocabulary enough?) I can then
derive the resulting terms by doing something like
the intersection of rights granted in the source
licenses.

So (copyleft \cap public domain) = copyleft, etc.

I wonder about constructing inference rules for this...
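
A minimal sketch of what such a rule could look like, assuming each
license is reduced to sets of cc vocabulary terms (what it permits,
requires, and prohibits). The License class and the combine function
are hypothetical names made up for illustration. The rule is the
intersection idea above: keep only the rights both sources grant, and
carry over every obligation from either one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    """A license modelled as three sets of cc vocabulary terms."""
    permits: frozenset
    requires: frozenset
    prohibits: frozenset = frozenset()


def combine(a: License, b: License) -> License:
    """Terms for data derived from both a and b (illustrative rule only)."""
    return License(
        permits=a.permits & b.permits,        # only rights granted by both
        requires=a.requires | b.requires,     # every obligation carries over
        prohibits=a.prohibits | b.prohibits,  # every prohibition carries over
    )


ALL_RIGHTS = frozenset(
    {"cc:Reproduction", "cc:Distribution", "cc:DerivativeWorks"}
)

public_domain = License(permits=ALL_RIGHTS, requires=frozenset())
copyleft = License(
    permits=ALL_RIGHTS,
    requires=frozenset({"cc:Attribution", "cc:ShareAlike"}),
)

# (copyleft \cap public domain) = copyleft
assert combine(copyleft, public_domain) == copyleft
```

This ignores real compatibility problems: two different share-alike
licenses may each demand that the result be released under their own
terms, which plain set operations cannot express.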

If the combination is done in a way that is reversible,
simply selecting some triples from different sources,
for example, rather than putting provenance and
license information on graphs [0], putting it on
individual triples might be nice. But then we need
some token for a triple, ideally in a global way where
if the same triple occurs independently in two places,
two people making tokens for it will end up with the
same token...
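
One way such a global token could be made, sketched under the
assumption that both parties serialize the triple as the same
N-Triples line and that it contains no blank nodes: hash the
serialized line. The triple_token function is a hypothetical name, and
SHA-256 is just one possible choice of hash.

```python
import hashlib


def triple_token(subject: str, predicate: str, obj: str) -> str:
    """Return a content-derived token for one triple.

    Each argument is assumed to be an already-serialized N-Triples term,
    such as '<http://example.org/a>' or '"London"@en'. Blank nodes are
    not handled: their labels are local to a document, so they would
    need to be canonicalized before hashing.
    """
    line = f"{subject} {predicate} {obj} .\n"
    return hashlib.sha256(line.encode("utf-8")).hexdigest()


# Two people tokenizing the same triple independently get the same token.
mine = triple_token(
    "<http://dbpedia.org/resource/London>",
    "<http://www.w3.org/2000/01/rdf-schema#label>",
    '"London"@en',
)
yours = triple_token(
    "<http://dbpedia.org/resource/London>",
    "<http://www.w3.org/2000/01/rdf-schema#label>",
    '"London"@en',
)
assert mine == yours
```

Provenance and license statements could then be attached to the token
instead of to a whole graph.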

Hrmmm... As I said, idle thoughts...

Cheers,
-w

[0] I'm not sure graph isn't a misnomer, or at least
loose language. An RDF graph is a set, I think, and
you can make a standard graph relative to a predicate
by taking vertices from subject and object and
edges from IEXT(predicate). Is this splitting hairs?

-- 
William Waites   w...@styx.org
Mob: +44 789 798 9965
Fax: +44 131 464 4948
CD70 0498 8AE4 36EA 1CD7  281C 427A 3F36 2130 E9F5





Re: New LOD ESW wikipage about Data Licensing

2010-09-19 Thread Vasiliy Faronov
Hi Chris,


 If no license is specified, people cannot use published data within any
 serious applications.

Even if my application only processes this data, never stores it, never
redistributes it?

I mean, can't I write an LD client that would go over a given website
and (through DC in RDFa) compile a list of pages made by a given person?

Or one that would compare several GoodRelations-annotated product pages
to select an offering that better matches a user's criteria?
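
To make the first scenario concrete, here is a rough sketch that
assumes the RDFa has already been parsed into (subject, predicate,
object) tuples by some parser not shown here. The pages_by_creator
function and the example.org URIs are hypothetical. It only filters
the triples in memory; nothing is stored or redistributed.

```python
DC_CREATOR = "http://purl.org/dc/terms/creator"


def pages_by_creator(triples, creator):
    """Return the sorted page URIs whose dc:creator equals `creator`."""
    return sorted(
        {s for s, p, o in triples if p == DC_CREATOR and o == creator}
    )


# Hypothetical triples, as they might come out of an RDFa parser.
triples = [
    ("http://example.org/a.html", DC_CREATOR, "http://example.org/people/ann"),
    ("http://example.org/b.html", DC_CREATOR, "http://example.org/people/bob"),
    ("http://example.org/c.html", DC_CREATOR, "http://example.org/people/ann"),
]

pages = pages_by_creator(triples, "http://example.org/people/ann")
assert pages == ["http://example.org/a.html", "http://example.org/c.html"]
```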

-- 
Vasiliy Faronov