subject:"LOD Data Sets, Licensing, and AWS"


Ian Davis wrote:
On Wed, Jun 24, 2009 at 9:56 PM, Kingsley Idehen <kide...@openlinksw.com> wrote:


The NYT, London Times, and others of this ilk, are more likely to
contribute their quality data to the LOD cloud if they know there
is a vehicle (e.g., a license scheme) that ensures their HTTP URIs
are protected i.e., always accessible to user agents at the data
representation (HTML, XML, N3, RDF/XML, Turtle etc..) level;
thereby ensuring citation and attribution requirements are honored.


I agree with that, but it only covers a small portion of what is 
needed. You fail to consider the situations where people publish data 
about other people's URIs, as reviews or annotation.

I am not, far from it.
The foaf:primaryTopic mechanism isn't strong enough if the publisher 
requires full attribution for use of their data. If I use SPARQL to 
extract a subset of reviews to display on my site then in all 
likelihood I have lost that linkage with the publishing document.
Only if you choose to construct your result document using literal
values, i.e., a SPARQL solution that has URIs filtered out. Anyway, if
that's what you end up doing, then you do have  and @rel at your
disposal for identifying your data sources, worst case.
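To make the lost-linkage point concrete, here is a minimal Python sketch (not taken from either poster) that treats a review store as quads of subject, predicate, object and source document, roughly what a SPARQL `GRAPH ?g` pattern exposes. The example.org URIs, the `rev:text` predicate string and the helper names are hypothetical. Projecting only the literal text drops the publisher; carrying the graph name along keeps it.

```python
from typing import List, Tuple

# (subject, predicate, object, source graph); every URI here is hypothetical.
Quad = Tuple[str, str, str, str]

QUADS: List[Quad] = [
    ("http://example.org/review/1", "rev:text", "Great book",
     "http://publisher-a.example/reviews.rdf"),
    ("http://example.org/review/2", "rev:text", "Too long",
     "http://publisher-b.example/reviews.rdf"),
]


def literals_only(quads: List[Quad]) -> List[str]:
    """Roughly SELECT ?text: only the literal values survive, so the source is gone."""
    return [o for (_s, p, o, _g) in quads if p == "rev:text"]


def with_source(quads: List[Quad]) -> List[Tuple[str, str, str]]:
    """Roughly SELECT ?review ?text ?g WHERE { GRAPH ?g { ?review rev:text ?text } }."""
    return [(s, o, g) for (s, p, o, g) in quads if p == "rev:text"]


for review, text, source in with_source(QUADS):
    print(f"{text!r}: {review} (published in {source})")
```

Keeping the review URI and the graph name in the result is what lets a consumer cite the publishing document later, whatever the license requires.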




 


Attribution is the kind of thing one gives as the result of a
license requirement in exchange for permission to copy. In the
academic world for journal articles this doesn't come into play at
all, since there is no copying (in the usual case). Instead people
cite articles because the norms of their community demand it.

Yes, and the HTTP URI ultimately delivers the kind of mechanism I
believe most traditional media companies seek (as stated above).
They ultimately want people to use their data with low cost
citation and attribution intrinsic to the medium of value exchange.


The BBC is a traditional media company. Its data is licensed only for 
personal, non-commercial use: http://www.bbc.co.uk/terms/#3
I used the New York Times and London Times for specific reasons: their
business models are different from that of the BBC; they are traditional
*commercial* media companies.
 


btw - how are you dealing with this matter re. the
neurocommons.org linked data space? How
do you ensure your valuable work is fully credited as it bubbles
up the value chain?


I found this linked from the RDF Distribution page on neurocommons.org:
http://svn.neurocommons.org/svn/trunk/product/bundles/frontend/nsparql/NOTICES.txt


Everyone should read it right now to appreciate the complexity of 
aggregating data from many sources when they all have idiosyncratic 
requirements of attribution.


Then read 
http://sciencecommons.org/projects/publishing/open-access-data-protocol/ 
to see how we should be approaching the licensing of data. It explains 
in detail the motivations for things like CC-0 and PDDL which seek to 
promote open access for all by removing restrictions:


"Thus, to facilitate data integration and open access data sharing, 
any implementation of this protocol MUST waive all rights necessary 
for data extraction and re-use (including copyright, sui generis 
database rights, claims of unfair competition, implied contracts, and 
other legal rights), and MUST NOT apply any obligations on the user of 
the data or database such as “copyleft” or “share alike”, or even the 
legal requirement to provide attribution. Any implementation SHOULD 
define a non-legally binding set of citation norms in clear, 
lay-readable language."


Science Commons have spent a lot of time and resources to come to this 
conclusion, and they tried all kinds of alternatives such as 
attribution and share alike licences (as did Talis). The final 
consensus was that the public domain was the only mechanism that could 
scale for the future. Without this kind of approach, aggregating, 
querying and reusing the web of data will become impossibly complex. 
This is a key motivation for Talis starting the Connected Commons 
programme ( http://www.talis.com/platform/cc/ ). We want to see more 
data that is unambiguously reusable because it has been placed in the 
public domain using CC-0 or the Open Data Commons PDDL.


So, I urge everyone publishing data onto the linked data web to 
consider waiving all rights over it using one of the licenses above.
I don't think "waiving all rights" is a practical option for the likes
of the New York Times or the Times of London, or for traditional
commercial media companies generally.
As Kingsley points out, you will always be attributed via the URIs you 
mint.

This part I totally agree with :-)



Ian

PS. This was the subject of my keynote at code4lib 2009 "If you love 
something, set it free", which you can view here 
http://www.slideshare.net/iandavis/code4lib2009-keynote-1073812



The thing about "Free" is that we'll always end up having to 
disambiguate: "Free Speech" an

Re: Contd LOD Data Sets, Licensing, and AWS

2009-06-24 Thread Danny Ayers

2009/6/25 Ian Davis :

> I think the onus is on the consumer to ensure they abide with the supplier's
> wishes, not the other way round. It's really a matter of respect and
> politeness to give people the credit they ask for.

Certainly in principle, but the supplier should know what they are
doing. It would be their loss after all.



-- 
http://danny.ayers.name

Re: Contd LOD Data Sets, Licensing, and AWS


Ian Davis wrote:
On Wed, Jun 24, 2009 at 7:40 PM, Kingsley Idehen <kide...@openlinksw.com> wrote:



I stand by my position, we are adhering to their terms.
What they seek is dereferenceable via their URIs, which remain in
scope at both the data presentation and representation layers.

I am sure Jamie and the folks at Freebase are party to this
conversation and would chime in should we be violating the terms
of their license etc..


I think the onus is on the consumer to ensure they abide with the 
supplier's wishes, not the other way round. It's really a matter of 
respect and politeness to give people the credit they ask for.
Sadly, therein lies the root of most problems in economies present and
past :-) We end up doing the wrong thing for a myriad of reasons and the
net result is a completely broken value chain.


I believe you can define terms of data use and enforce them at minimum 
cost, courtesy of HTTP URIs.


We've done it with software (eons ago re. our data access drivers) and 
it will also work fine for Linked Data, and on this statement I am ready 
to stake anything :-)


re: specific ODC license. I think the ODBL license does what you want.
Or PDDL with specified community norms.
 


ODBL license URI please.


http://www.opendatacommons.org/licenses/odbl/

I'll take a look.

Kingsley


Ian






--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com

The Public Domain (was Re: LOD Data Sets, Licensing, and AWS)

2009-06-24 Thread Ian Davis

On Wed, Jun 24, 2009 at 9:56 PM, Kingsley Idehen wrote:

> The NYT, London Times, and others of this ilk, are more likely to
> contribute their quality data to the LOD cloud if they know there is a
> vehicle (e.g., a license scheme) that ensures their HTTP URIs are protected
> i.e., always accessible to user agents at the data representation (HTML,
> XML, N3, RDF/XML, Turtle etc..) level; thereby ensuring citation and
> attribution requirements are honored.

I agree with that, but it only covers a small portion of what is needed. You
fail to consider the situations where people publish data about other
people's URIs, as reviews or annotation. The foaf:primaryTopic mechanism
isn't strong enough if the publisher requires full attribution for use of
their data. If I use SPARQL to extract a subset of reviews to display on my
site then in all likelihood I have lost that linkage with the publishing
document.

> Attribution is the kind of thing one gives as the result of a license
> requirement in exchange for permission to copy. In the academic world for
> journal articles this doesn't come into play at all, since there is no
> copying (in the usual case). Instead people cite articles because the norms
> of their community demand it.
>
> Yes, and the HTTP URI ultimately delivers the kind of mechanism I believe most
> traditional media companies seek (as stated above). They ultimately want
> people to use their data with low cost citation and attribution intrinsic to
> the medium of value exchange.
>

The BBC is a traditional media company. Its data is licensed only for
personal, non-commercial use: http://www.bbc.co.uk/terms/#3

> btw - how are you dealing with this matter re. the neurocommons.org linked
> data space? How do you ensure your valuable work is fully credited as it
> bubbles up the value chain?
>

I found this linked from the RDF Distribution page on neurocommons.org :
http://svn.neurocommons.org/svn/trunk/product/bundles/frontend/nsparql/NOTICES.txt

Everyone should read it right now to appreciate the complexity of
aggregating data from many sources when they all have idiosyncratic
requirements of attribution.

Then read
http://sciencecommons.org/projects/publishing/open-access-data-protocol/ to
see how we should be approaching the licensing of data. It explains in
detail the motivations for things like CC-0 and PDDL which seek to promote
open access for all by removing restrictions:

"Thus, to facilitate data integration and open access data sharing, any
implementation of this protocol MUST waive all rights necessary for data
extraction and re-use (including copyright, sui generis database rights,
claims of unfair competition, implied contracts, and other legal rights),
and MUST NOT apply any obligations on the user of the data or database such
as “copyleft” or “share alike”, or even the legal requirement to provide
attribution. Any implementation SHOULD define a non-legally binding set of
citation norms in clear, lay-readable language."

Science Commons have spent a lot of time and resources to come to this
conclusion, and they tried all kinds of alternatives such as attribution and
share alike licences (as did Talis). The final consensus was that the public
domain was the only mechanism that could scale for the future. Without this
kind of approach, aggregating, querying and reusing the web of data will
become impossibly complex. This is a key motivation for Talis starting the
Connected Commons programme ( http://www.talis.com/platform/cc/ ). We want
to see more data that is unambiguously reusable because it has been placed
in the public domain using CC-0 or the Open Data Commons PDDL.

So, I urge everyone publishing data onto the linked data web to consider
waiving all rights over it using one of the licenses above. As Kingsley
points out, you will always be attributed via the URIs you mint.

Ian

PS. This was the subject of my keynote at code4lib 2009 "If you love
something, set it free", which you can view here
http://www.slideshare.net/iandavis/code4lib2009-keynote-1073812

Re: Contd LOD Data Sets, Licensing, and AWS

2009-06-24 Thread Ian Davis

On Wed, Jun 24, 2009 at 7:40 PM, Kingsley Idehen wrote:

>
> I stand by my position, we are adhering to their terms.
> What they seek is dereferenceable via their URIs, which remain in scope at
> both the data presentation and representation layers.
>
> I am sure Jamie and the folks at Freebase are party to this conversation
> and would chime in should we be violating the terms of their license etc..
>

I think the onus is on the consumer to ensure they abide with the supplier's
wishes, not the other way round. It's really a matter of respect and
politeness to give people the credit they ask for.

> re: specific ODC license. I think the ODBL license does what you want.
> Or PDDL with specified community norms.
>
>
> ODBL license URI please.
>

http://www.opendatacommons.org/licenses/odbl/

Ian

Re: LOD Data Sets, Licensing, and AWS


Alan Ruttenberg wrote:

Kingsley,

Encouraging attribution by URI is a bad idea because it encourages 
people or organizations to create URIs where perfectly good ones 
exist, solely so that they can get their "attribution". Were this no 
cost, I wouldn't mind. But having more than one URI for a resource 
causes real trouble for data integration.
Let's try to look at this matter slightly differently, putting some of the
labels in this conversation to one side for a second.


Scenario:

I am the New York Times or the Times of London. I've decided to expose my
treasure troves to the Web (high quality data assembled since day one
of our existence) in line with the guidelines intrinsic to the Linked
Data meme. But I am wary of the fact that anyone can come along to my
newly unveiled Linked Data space, grab my data, and reconstitute it in a
new Linked Data space on the Web without any reference back to me.


Incidentally, there is a legal difference between attribution and 
citation. Virtually all of academic credit is based on citation, not 
attribution.
Hence my request (above) to put the labels aside. It might be that what
I am seeking via HTTP URIs is a citation/attribution hybrid (like the
Reference & Access duality inherent to HTTP URIs in the Linked Data meme)
that acknowledges "data sources" via their originating URIs, thereby
bringing citation and attribution together coherently.


Ultimately owners of high quality databases have to realize the 
following re. their data and publication on the Web:


1. Separation of "value" from "medium of value exchange";
2. HTTP URIs are effective mediums of value exchange on the Web.


The NYT, London Times, and others of this ilk, are more likely to 
contribute their quality data to the LOD cloud if they know there is a 
vehicle (e.g., a license scheme) that ensures their HTTP URIs are 
protected i.e., always accessible to user agents at the data 
representation (HTML, XML, N3, RDF/XML, Turtle etc..) level; thereby 
ensuring citation and attribution requirements are honored.

Attribution is the kind of thing one gives as the result of a license 
requirement in exchange for permission to copy. In the academic world for 
journal articles this doesn't come into play at all, since there is no copying 
(in the usual case). Instead people cite articles because the norms of their 
community demand it.
Yes, and the HTTP URI ultimately delivers the kind of mechanism I believe 
most traditional media companies seek (as stated above). They ultimately 
want people to use their data with low cost citation and attribution 
intrinsic to the medium of value exchange.


btw - how are you dealing with this matter re. the neurocommons.org 
linked data space? How do you ensure your valuable work is fully 
credited as it bubbles up the value chain?






-Alan



--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com

Re: LOD Data Sets, Licensing, and AWS

2009-06-24 Thread Alan Ruttenberg

Kingsley,
Encouraging attribution by URI is a bad idea because it encourages people or
organizations to create URIs where perfectly good ones exist, solely so that
they can get their "attribution". Were this no cost, I wouldn't mind. But
having more than one URI for a resource causes real trouble for data
integration.
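A minimal Python sketch of the integration cost described above, assuming two hypothetical publishers that mint different URIs for the same city. A naive merge keyed on subject URI sees two entities; only an explicit `owl:sameAs`-style mapping (applied here with a small union-find, which is an illustrative choice, not something the thread prescribes) brings the statements back together. All URIs and property names are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Two hypothetical publishers describe the same city under different URIs.
DATA_A: List[Triple] = [("http://a.example/id/paris", "label", "Paris")]
DATA_B: List[Triple] = [("http://b.example/city/Paris", "country", "France")]
SAME_AS: List[Tuple[str, str]] = [("http://b.example/city/Paris", "http://a.example/id/paris")]


def find(parent: Dict[str, str], x: str) -> str:
    """Union-find lookup with path halving; unseen URIs become their own root."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent: Dict[str, str], a: str, b: str) -> None:
    """Join two URIs; the lexicographically smaller root becomes canonical."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def merge(triples: List[Triple], same_as: List[Tuple[str, str]]) -> Dict[str, Dict[str, str]]:
    """Group property values by canonical subject URI."""
    parent: Dict[str, str] = {}
    for a, b in same_as:
        union(parent, a, b)
    entities: Dict[str, Dict[str, str]] = defaultdict(dict)
    for s, p, o in triples:
        entities[find(parent, s)][p] = o
    return dict(entities)


naive = merge(DATA_A + DATA_B, [])        # two "different" entities
linked = merge(DATA_A + DATA_B, SAME_AS)  # one entity carrying both facts
print(naive)
print(linked)
```

Every extra URI minted for the same thing needs a mapping like `SAME_AS` before the data lines up, which is the cost being pointed at.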

Incidentally, there is a legal difference between attribution and citation.
Virtually all of academic credit is based on citation, not attribution.
Attribution is the kind of thing one gives as the result of a license
requirement in exchange for permission to copy. In the academic world
for journal articles this doesn't come into play at all, since there
is no copying (in the usual case). Instead people cite articles
because the norms of their community demand it.

-Alan

Re: Contd LOD Data Sets, Licensing, and AWS

Leigh Dodds wrote:

Hi,

2009/6/24 Kingsley Idehen :

Kingsley Idehen wrote:

Leigh Dodds wrote:

Hi,

2009/6/24 Kingsley Idehen :

To save time etc..

What is the URI of a license that effectively enables data publishers to
express and enforce how they are attributed? Whatever that is I am happy
with. Whatever that is will be vital to attracting curators of high
quality data to the LOD fold.

If you have an example URI, even better.

You can choose from several at http://www.opendatacommons.org/

Take a look at Freebase, and how they are effectively doing what I espouse.
Google uses Freebase URIs, and they attribute by URI.

I have. I've read the licensing, terms and policies of a number of
different websites.

I see Freebase using CC-BY-SA to effectively propagate their URIs. I also
see all consumers of Freebase URIs honoring the terms without any
issues.

Really? I'm not trying to be unfair, but where on:

You're not being unfair. We are trying to get to the bottom of something
that is really important.

http://lod.openlinksw.com/

http://lod.openlinksw.com/describe/?url=http%3A%2F%2Ffreebase.com%2Fguid%2F9202a8c04000641f883d84dd

The URIs are in full view. Use a Linked Data Aware user agent against the
URIs and you end up in the originating Freebase Data Space. This is my
fundamental point re. preservation of original URIs.
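The describe page and the SPARQL DESCRIBE link quoted in this thread both carry the original Freebase URI, percent-encoded, in their query strings. A minimal Python sketch of how a user agent (or a skeptical reader) might recover it; the parameter names `url` and `query` come from the two links themselves, while the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# The two links quoted in the thread, copied verbatim.
DESCRIBE_PAGE = ("http://lod.openlinksw.com/describe/"
                 "?url=http%3A%2F%2Ffreebase.com%2Fguid%2F9202a8c04000641f883d84dd")
SPARQL_LINK = ("http://lod.openlinksw.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2F"
               "freebase.com%2Fguid%2F9202a8c04000641f883d84dd%3E")


def query_param(url: str, name: str) -> str:
    """Return the first percent-decoded value of a query-string parameter."""
    return parse_qs(urlsplit(url).query)[name][0]


def described_uri(sparql_url: str) -> str:
    """Pull the IRI out of a 'DESCRIBE <iri>' query carried in the link."""
    query = query_param(sparql_url, "query")
    return query[query.index("<") + 1 : query.index(">")]


if __name__ == "__main__":
    print(query_param(DESCRIBE_PAGE, "url"))
    print(described_uri(SPARQL_LINK))
```

Both calls should yield the same `freebase.com` URI, which is the sense in which the original identifier "remains in scope"; whether that satisfies a given license's attribution terms is a separate question.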

The fact that the URIs are in full view accords with your view of the URI
as the sole means of attribution, but it's irrelevant as far as the
Freebase terms go. Where's the text, logo, etc. that they're asking
for? That's how the rights holder is asking to be attributed.

I stand by my position, we are adhering to their terms.
What they seek is dereferenceable via their URIs, which remain in scope
at both the data presentation and representation layers.

I am sure Jamie and the folks at Freebase are party to this conversation
and would chime in should we be violating the terms of their license etc..

re: specific ODC license. I think the ODBL license does what you want.
Or PDDL with specified community norms.

ODBL license URI please.

Kingsley

Cheers,

Regards,

Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com

Re: Contd LOD Data Sets, Licensing, and AWS

Hi,

2009/6/24 Kingsley Idehen :
> Kingsley Idehen wrote:
>>
>> Leigh Dodds wrote:
>>>
>>> Hi,
>>>
>>> 2009/6/24 Kingsley Idehen :
>>>

 To save time etc..

 What is the URI of a license that effectively enables data publishers to
 express and enforce how they are attributed? Whatever that is I am happy
 with. Whatever that is will be vital to attracting curators of high
 quality data to the LOD fold.

 If you have an example URI, even better.

>>>
>>> You can choose from several at http://www.opendatacommons.org/
>>>
>>>

 Take a look at Freebase, and how they are effectively doing what I espouse.
 Google uses Freebase URIs, and they attribute by URI.

>>>
>>> I have. I've read the licensing, terms and policies of a number of
>>> different websites.
>>>
>>>

 I see Freebase using CC-BY-SA to effectively propagate their URIs. I also
 see all consumers of Freebase URIs honoring the terms without any
 issues.

>>>
>>> Really? I'm not trying to be unfair, but where on:
>>>
>>
>> You're not being unfair. We are trying to get to the bottom of something
>> that is really important.
>>>
>>> http://lod.openlinksw.com/
>>>
>>> Or
>>>
>>>
>>> http://lod.openlinksw.com/describe/?url=http%3A%2F%2Ffreebase.com%2Fguid%2F9202a8c04000641f883d84dd
>>>
>>
>> The URIs are in full view. Use a Linked Data Aware user agent against the
>> URIs and you end up in the originating Freebase Data Space. This is my
>> fundamental point re. preservation of original URIs.

The fact that the URIs are in full view accords with your view of the URI
as the sole means of attribution, but it's irrelevant as far as the
Freebase terms go. Where's the text, logo, etc. that they're asking
for? That's how the rights holder is asking to be attributed.

re: specific ODC license. I think the ODBL license does what you want.
Or PDDL with specified community norms.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com

Contd LOD Data Sets, Licensing, and AWS

Kingsley Idehen wrote:

Leigh Dodds wrote:

Hi,

2009/6/24 Kingsley Idehen :

To save time etc..

What is the URI of a license that effectively enables data publishers to
express and enforce how they are attributed? Whatever that is I am happy
with. Whatever that is will be vital to attracting curators of high
quality data to the LOD fold.

If you have an example URI, even better.

You can choose from several at http://www.opendatacommons.org/

Take a look at Freebase, and how they are effectively doing what I espouse.
Google uses Freebase URIs, and they attribute by URI.

I have. I've read the licensing, terms and policies of a number of
different websites.

I see Freebase using CC-BY-SA to effectively propagate their URIs. I also
see all consumers of Freebase URIs honoring the terms without any
issues.

Really? I'm not trying to be unfair, but where on:

You're not being unfair. We are trying to get to the bottom of
something that is really important.

http://lod.openlinksw.com/

http://lod.openlinksw.com/describe/?url=http%3A%2F%2Ffreebase.com%2Fguid%2F9202a8c04000641f883d84dd

The URIs are in full view. Use a Linked Data Aware user agent against
the URIs and you end up in the originating Freebase Data Space. This
is my fundamental point re. preservation of original URIs.

Are the attributions required in:

http://www.freebase.com/signin/licensing

A link to the topic page isn't enough based on the terms that Freebase
currently publish. I'm not saying I agree with them, as clearly they
don't scale well in the large. They've also not defined an attribution
policy for data linking. And if they don't care enough about the
attribution to follow-up, then why not publish it under a
non-attribution license in the first place?

The source of the data in our data space is crystal clear by virtue of
URI visibility. Just look at the Entity URI associated with the text
following "About:" (the @href value). We are saying: this page is
about a resource in the Freebase Data Space and we use the Freebase
URI in @href. In short, just click on it and see what happens :-)
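A rough sketch of reading that "About:" link programmatically with Python's standard `html.parser`. The markup in `SAMPLE_PAGE` is an assumption made for illustration; the real page structure may differ, and `AboutLinkFinder` is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class AboutLinkFinder(HTMLParser):
    """Records the href of the first <a> element that appears after the text 'About:'."""

    def __init__(self) -> None:
        super().__init__()
        self.seen_about = False
        self.about_href: Optional[str] = None

    def handle_data(self, data: str) -> None:
        if "About:" in data:
            self.seen_about = True

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a" and self.seen_about and self.about_href is None:
            self.about_href = dict(attrs).get("href")


# Hypothetical markup; the real page structure may differ.
SAMPLE_PAGE = ('<div class="page-title">About: '
               '<a href="http://freebase.com/guid/9202a8c04000641f883d84dd">'
               'Example topic</a></div>')

finder = AboutLinkFinder()
finder.feed(SAMPLE_PAGE)
finder.close()
print(finder.about_href)
```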

You can use ODE, DISCO, Zitgist Data Viewer, or Tabulator etc. to
explore the URIs, and you will end up in the originating data space of
a given entity.

Now for the problems re. our pages:

Our embedded RDFa produces:

rdf:about="http://lod.openlinksw.com/fct/rdfdesc/">
rdf:resource="http://lod.openlinksw.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Ffreebase.com%2Fguid%2F9202a8c04000641f883d84dd%3E"/>
rdf:resource="http://lod.openlinksw.com/fct/rdfdesc/styles/default.css"/>
rdf:resource="http://lod.openlinksw.com/fct/rdfdesc/styles/highlighter.css"/>
rdf:resource="http://creativecommons.org/licenses/by-sa/3.0/"/>

That's wrong, it should be:

rdf:about="http://freebase.com/guid/9202a8c04000641f883d84dd">
rdf:resource="http://lod.openlinksw.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Ffreebase.com%2Fguid%2F9202a8c04000641f883d84dd%3E"/>
rdf:resource="http://lod.openlinksw.com/fct/rdfdesc/styles/default.css"/>
rdf:resource="http://lod.openlinksw.com/fct/rdfdesc/styles/highlighter.css"/>
rdf:resource="http://creativecommons.org/licenses/by-sa/3.0/"/>
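As a rough sketch of the corrected shape, the Python below (standard library only) emits an RDF/XML description whose `rdf:about` is the Freebase entity URI rather than the viewer page. The element names were lost from the message, so the `xhv:license` property (the XHTML vocabulary term that RDFa 1.0 uses for `rel="license"`) is an assumption, and `describe` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XHV_NS = "http://www.w3.org/1999/xhtml/vocab#"  # assumed vocabulary for rel="license"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("xhv", XHV_NS)


def describe(subject_uri: str, license_uri: str) -> str:
    """Serialize one rdf:Description whose subject is the entity, not the page."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": subject_uri})
    ET.SubElement(desc, f"{{{XHV_NS}}}license",
                  {f"{{{RDF_NS}}}resource": license_uri})
    return ET.tostring(root, encoding="unicode")


xml_text = describe("http://freebase.com/guid/9202a8c04000641f883d84dd",
                    "http://creativecommons.org/licenses/by-sa/3.0/")
print(xml_text)
```

The design point is only the subject: statements about the Freebase entity hang off its own URI, so a consumer that reads the RDF lands on the originating data space.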

Kingsley

Cheers,

Also note:

1. In the footer sections of our pages (as we do re. DBpedia) you do
have links to alternative metadata (resource description)
representations (RDF/XML, N3 etc..);

2. Click on one of the items referred to above and you will see all
roads point to the originating data space courtesy of the implicit
attribution that HTTP URIs accord.

Just need to fix the little snafu in the embedded RDFa.

Others:
note, we have online now (it has much more
data, i.e., 5 billion triples and counting), and it will soon be what you
see at http://lod.openlinksw.com (which is old and contains less data).

Regards,

Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com

Re: LOD Data Sets, Licensing, and AWS

Leigh Dodds wrote:

Hi,

2009/6/24 Kingsley Idehen :

To save time etc..

If you have an example URI, even better.

You can choose from several at http://www.opendatacommons.org/

Take a look at Freebase, and how they are effectively doing what I espouse.
Google uses Freebase URIs, and they attribute by URI.

I have. I've read the licensing, terms and policies of a number of
different websites.

I see Freebase using CC-BY-SA to effectively propagate their URIs. I also
see all consumers of Freebase URIs honoring the terms without any issues.

Really? I'm not trying to be unfair, but where on:

You're not being unfair. We are trying to get to the bottom of something
that is really important.

http://lod.openlinksw.com/

http://lod.openlinksw.com/describe/?url=http%3A%2F%2Ffreebase.com%2Fguid%2F9202a8c04000641f883d84dd

The URIs are in full view. Use a Linked Data Aware user agent against
the URIs and you end up in the originating Freebase Data Space. This is
my fundamental point re. preservation of original URIs.

Are the attributions required in:

http://www.freebase.com/signin/licensing

The source of the data in our data space is crystal clear by virtue of
URI visibility. Just look at the Entity URI associated with the text
following "About:" (the @href value). We are saying: this page is about
a resource in the Freebase Data Space and we use the Freebase URI in
@href. In short, just click on it and see what happens :-)

You can use ODE, DISCO, Zitgist Data Viewer, or Tabulator etc. to
explore the URIs, and you will end up in the originating data space of a
given entity.

Now for the problems re. our pages:

Our embedded RDFa produces:

rdf:about="http://lod.openlinksw.com/fct/rdfdesc/">
rdf:resource="http://lod.openlinksw.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Ffreebase.com%2Fguid%2F9202a8c04000641f883d84dd%3E"/>
rdf:resource="http://lod.openlinksw.com/fct/rdfdesc/styles/default.css"/>
rdf:resource="http://lod.openlinksw.com/fct/rdfdesc/styles/highlighter.css"/>
rdf:resource="http://creativecommons.org/licenses/by-sa/3.0/"/>

That's wrong, it should be:

rdf:about="http://freebase.com/guid/9202a8c04000641f883d84dd">
rdf:resource="http://lod.openlinksw.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Ffreebase.com%2Fguid%2F9202a8c04000641f883d84dd%3E"/>
rdf:resource="http://lod.openlinksw.com/fct/rdfdesc/styles/default.css"/>
rdf:resource="http://lod.openlinksw.com/fct/rdfdesc/styles/highlighter.css"/>
rdf:resource="http://creativecommons.org/licenses/by-sa/3.0/"/>

Kingsley

Cheers,

Regards,

Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com

Re: LOD Data Sets, Licensing, and AWS


Ian Davis wrote:
On Wed, Jun 24, 2009 at 4:05 PM, Kingsley Idehen <kide...@openlinksw.com> wrote:


My comments are still fundamentally about my preference for
CC-BY-SA.  Hence the transcopyright reference :-)

I want Linked Data to have its GPL equivalent; a license scheme that:


Have you read the licenses at http://opendatacommons.org/ ?


 Ian

Just point me to the license (via a URI) that most closely addresses the 
attribution mechanism and/or specifics that I seek.


This will save time, since I've been unable to locate such a license 
so far. Of course, I might be wrong since I've only perused 
 .


Once the license in question is unveiled I think we're all set.

--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com

Re: LOD Data Sets, Licensing, and AWS

Hi,

2009/6/24 Kingsley Idehen :
> To save time etc..
>
> What is the URI of a license that effectively enables data publishers to
> express and enforce how they are attributed? Whatever that is I am happy
> with. Whatever that is will be vital to attracting curators of high quality
> data to the LOD fold.
>
> If you have an example URI, even better.

You can choose from several at http://www.opendatacommons.org/

> Take a look at Freebase, and how they are effectively doing what I espouse.
> Google uses Freebase URIs, and they attribute by URI.

I have. I've read the licensing, terms and policies of a number of
different websites.

> I see Freebase using CC-BY-SA to effectively propagate their URIs. I also
> see all consumers of Freebase URIs honoring the terms without any issues.

Really? I'm not trying to be unfair, but where on:

http://lod.openlinksw.com/

Or

http://lod.openlinksw.com/describe/?url=http%3A%2F%2Ffreebase.com%2Fguid%2F9202a8c04000641f883d84dd

Are the attributions required in:

http://www.freebase.com/signin/licensing

A link to the topic page isn't enough based on the terms that Freebase
currently publish. I'm not saying I agree with them, as clearly they
don't scale well in the large. They've also not defined an attribution
policy for data linking. And if they don't care enough about the
attribution to follow-up, then why not publish it under a
non-attribution license in the first place?

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com

Re: LOD Data Sets, Licensing, and AWS


Leigh Dodds wrote:

2009/6/24 Kingsley Idehen :
  

My comments are still fundamentally about my preference for CC-BY-SA.  Hence
the transcopyright reference :-)



Unfortunately your preference doesn't actually make it legally
applicable to data and databases. The problem, as I see it, at the
moment is that this is what the majority of people are doing: using a
CC license to capture their desire or intent with respect to
licensing, rights waivers, attribution, intended uses, etc. The
disconnect is between what people want to do with the license, and
what's actually supported in law.

  

I want Linked Data to have its GPL equivalent; a license scheme that:

1.  protects the rights of data contributors;
2.  is easy to express;
3.  is easy to adhere to;
4.  is easy to enforce.



Then the best way to do this is to engage with the communities that
are attempting to do exactly that: the open data commons and creative
commons. We shouldn't be encouraging people to do the wrong thing and
use licenses and waivers that don't actually do what they want them to
do. The science commons protocol is a good example of best practices
w.r.t data licensing that are being agreed to within a specific
community; one that has a long-standing culture of citation and
attribution.

IMHO much of the advice and reasoning that has gone into the
definition and publishing of the science commons protocol is
applicable to the web of data as a whole. Convergence on a commons
-- which can still support and encourage attribution through community
norms -- is a Good Thing.
  


To save time etc..

What is the URI of a license that effectively enables data publishers to 
express and enforce how they are attributed? Whatever that is I am happy 
with. Whatever that is will be vital to attracting curators of high 
quality data to the LOD fold.


If you have an example URI, even better.

  

As I stated during one of the Semtech 2009 sessions, HTTP URIs provide a
closed loop re. the above. When you visit my data space you leave your
fingerprints in my HTTP logs. I can follow the log back to your resources to
see if you are conforming with my terms. I can compare the data in your
resource against mine and sniff out whether you are attributing your data
sources (what you got from me) correctly.
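One way to read the "closed loop" described above, sketched in Python under explicit assumptions: the publisher's server writes the Apache/NCSA combined log format, entity URIs live under a hypothetical `/resource/` path, and consumers' user agents send a Referer header. The function only lists which sites referred requests to your URIs; comparing their data against yours would be a separate step. Referer is optional and client-supplied, so treat the result as a lead rather than proof.

```python
import re
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit

# Apache/NCSA "combined" format:
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referring_sites(lines: Iterable[str], path_prefix: str = "/resource/") -> Counter:
    """Count referring hosts for requests to URIs under path_prefix (hypothetical layout)."""
    counts: Counter = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if not match or not match.group("path").startswith(path_prefix):
            continue
        referer = match.group("referer")
        if referer and referer != "-":
            counts[urlsplit(referer).hostname] += 1
    return counts


# Made-up log lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [24/Jun/2009:16:05:00 -0400] "GET /resource/Article42 HTTP/1.1" '
    '200 512 "http://consumer.example/page" "Mozilla/5.0"',
    '192.0.2.11 - - [24/Jun/2009:16:06:00 -0400] "GET /resource/Article42 HTTP/1.1" '
    '303 0 "-" "curl/7.19"',
    '192.0.2.12 - - [24/Jun/2009:16:07:00 -0400] "GET /about.html HTTP/1.1" '
    '200 99 "http://other.example/" "Mozilla/5.0"',
]

print(referring_sites(SAMPLE_LOG))
```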

If all the major media companies grok the above, there will be far less
resistance to publishing linked data since they will actually have better
comprehension of its inherent virtues and positive impact on their bottom
line.



I'm not sure that understanding the value of a unique URI for every
resource, and the benefits of a larger surface area of their website,
is the primary barrier to entry for those companies. One might build
similar arguments around SEO and APIs. IMO, the understanding has to
come through the network effects created by opening up the data for
widest possible reuse. Clear and liberal licensing is a part of that.
  


Take a look at Freebase, and how they are effectively doing what I 
espouse. Google uses Freebase URIs, and they attribute by URI.


I see Freebase using CC-BY-SA to effectively propagate their URIs. I 
also see all consumers of Freebase URIs honoring the terms without any 
issues.


Kingsley

Cheers,

L.

  



--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com

Re: LOD Data Sets, Licensing, and AWS

2009/6/24 Kingsley Idehen :
> My comments are still fundamentally about my preference for CC-BY-SA.  Hence
> the transcopyright reference :-)

Unfortunately your preference doesn't actually make it legally
applicable to data and databases. The problem, as I see it, at the
moment is that this is what the majority of people are doing: using a
CC license to capture their desire or intent with respect to
licensing, rights waivers, attribution, intended uses, etc. The
disconnect is between what people want to do with the license, and
what's actually supported in law.

> I want Linked Data to have its GPL equivalent; a license scheme that:
>
> 1.  protects the rights of data contributors;
> 2.  is easy to express;
> 3.  is easy to adhere to;
> 4.  is easy to enforce.

Then the best way to do this is to engage with the communities that
are attempting to do exactly that: the open data commons and creative
commons. We shouldn't be encouraging people to do the wrong thing and
use licenses and waivers that don't actually do what they want them to
do. The science commons protocol is a good example of best practices
w.r.t data licensing that are being agreed to within a specific
community; one that has a long-standing culture of citation and
attribution.

IMHO much of the advice and reasoning that has gone into the
definition and publishing of the science commons protocol is
applicable to the web of data as a whole. Convergence on a commons
-- which can still support and encourage attribution through community
norms -- is a Good Thing.

> As I stated during one of the Semtech 2009 sessions. HTTP URIs provide a
> closed loop re. the above. When you visit my data space you leave your
> fingerprints in my HTTP logs. I can follow the log back to your resources to
> see if you are conforming with my terms. I can compare the data in your
> resource against mine and sniff out whether you are attributing your data sources
> (what you got from me) correctly.
>
> If all the major media companies grok the above, there will be far less
> resistance to publishing linked data since they will actually have better
> comprehension of its inherent virtues and positive impact on their bottom
> line.

I'm not sure that understanding the value of a unique URI for every
resource, and the benefits of a larger surface area of their website,
is the primary barrier to entry for those companies. One might build
similar arguments around SEO and APIs. IMO, the understanding has to
come through the network effects created by opening up the data for
widest possible reuse. Clear and liberal licensing is a part of that.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com

Re: LOD Data Sets, Licensing, and AWS

2009-06-24 Thread Ian Davis

On Wed, Jun 24, 2009 at 4:05 PM, Kingsley Idehen wrote:

> My comments are still fundamentally about my preference for CC-BY-SA.
>  Hence the transcopyright reference :-)
>
> I want Linked Data to have its GPL equivalent; a license scheme that:

Have you read the licenses at http://opendatacommons.org/ ?

 Ian

Re: LOD Data Sets, Licensing, and AWS


Leigh Dodds wrote:

Hi,

2009/6/24 Kingsley Idehen :
  

When you publish said data as Linked Data you will be using an HTTP URI, and
in doing so there is implicit attribution.
If you retain the URIs of the source, or make explicit claims (e.g.,
dc:source) that expose the original data sources then everything is fine,
nobody along the value chain gets dislocated.



Yes, with respect to linking back to the originating *dataset* I
basically agree with you. I'd read your original comments as
suggesting that simply reusing the core data was sufficient, and I
think we're agreeing that the source (i.e. the VoID dataset) needs to
be acknowledged.
  

Yes.

However this simply provides a means for citing sources, there are
other aspects to attribution that also need to be addressed, e.g. how
it's actually surfaced to a user: for example, which properties of
the VoID description of a dataset should be shown in a user
interface. There's also the protocol level issues, e.g. how do we
include links from SPARQL results?
  
If you cite sources using HTTP URIs, then the Linked Data meme's implicit 
association of Entity ID and Entity Metadata kicks in, i.e., you have a 
timeless and persistent pointer to the origins of any piece of data exposed 
by a given Data Space (which is the same thing as a *Dataspace*).


  

Ted Nelson referred to the above, in different terms, as Transcopyright.



AIUI Transcopyright is a default licensing scheme for content (and
presumably data) that encourages a share-alike behaviour rather than
the current default "all rights reserved" copyright situation we have
now. So related but not exactly the same.
  


My comments are still fundamentally about my preference for CC-BY-SA.  
Hence the transcopyright reference :-)


I want Linked Data to have its GPL equivalent; a license scheme that:

1.  protects the rights of data contributors;
2.  is easy to express;
3.  is easy to adhere to;
4.  is easy to enforce.

As I stated during one of the Semtech 2009 sessions. HTTP URIs provide a 
closed loop re. the above. When you visit my data space you leave your 
fingerprints in my HTTP logs. I can follow the log back to your 
resources to see if you are conforming with my terms. I can compare the 
data in your resource against mine and sniff out whether you are attributing 
your data sources (what you got from me) correctly.
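The "closed loop" above can be sketched in a few lines. This is a minimal sketch, assuming the publisher's server writes the Apache/NCSA Combined Log Format (where the Referer is the second-to-last quoted field); the sample log lines, host names and the `referring_hosts` helper are hypothetical. It only tallies which external hosts sent traffic to your URIs; checking whether those hosts actually attribute your data would remain a separate, largely manual step. Referer headers are optional and often absent for non-browser agents, so this is a partial signal at best.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Combined Log Format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def referring_hosts(lines, own_host):
    """Count external hosts that appear in the Referer field of requests to our URIs."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in Combined Log Format
        referer = match.group("referer")
        if referer in ("", "-"):
            continue  # no referer recorded (common for crawlers and scripts)
        ref_host = urlparse(referer).hostname
        if ref_host and ref_host != own_host:
            counts[ref_host] += 1
    return counts

if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [24/Jun/2009:10:00:00 +0000] "GET /resource/Moon HTTP/1.1" 200 512 '
        '"http://reviews.example.com/page1" "Mozilla/5.0"',
        '203.0.113.8 - - [24/Jun/2009:10:01:00 +0000] "GET /resource/Moon HTTP/1.1" 200 512 '
        '"-" "curl/7.19"',
        '203.0.113.9 - - [24/Jun/2009:10:02:00 +0000] "GET /resource/Sun HTTP/1.1" 200 512 '
        '"http://data.example.org/" "Mozilla/5.0"',
    ]
    print(referring_hosts(sample, own_host="dataspace.example.net"))
```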


If all the major media companies grok the above, there will be far less 
resistance to publishing linked data since they will actually have 
better comprehension of its inherent virtues and positive impact on 
their bottom line.


HTTP URIs are potent mediums of value exchange, and media companies 
will come to understand that their crown jewels (think "data 
wine") simply need to be served up in cyberspace via HTTP URIs instead 
of "paper cups" or electronic renditions of "paper cups" (e.g. the 
newspaper industry).
  

He also used the term Transclusion to describe what we commonly refer to
as mashups (Web 2.0 code hacks) and meshups (Linked Data emixes) today.



He'd probably argue differently; I've seen him speak and he's an
interesting character! :). But yes, the essence is the same.
  
It's always important to apply temporal context to Ted's comments. The 
guy grokked today's Linked Data meme circa 1965 (or slightly earlier). 
He always espoused intertwingularity and hyper-orthogonality of data 
based on the inherent irregularity of data structures across Data Spaces / 
Dataspaces. Xanadu and ZigZag are profound visionary insights that are all 
doable today via applications of Linked Data and RDFa.



Kingsley

L.

  



--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com

Re: LOD Data Sets, Licensing, and AWS

Hi,

2009/6/24 Kingsley Idehen :
> When you publish said data as Linked Data you will be using an HTTP URI, and
> in doing so there is implicit attribution.
> If you retain the URIs of the source, or make explicit claims (e.g.,
> dc:source) that expose the original data sources then everything is fine,
> nobody along the value chain gets dislocated.

Yes, with respect to linking back to the originating *dataset* I
basically agree with you. I'd read your original comments as
suggesting that simply reusing the core data was sufficient, and I
think we're agreeing that the source (i.e. the VoID dataset) needs to
be acknowledged.

However this simply provides a means for citing sources, there are
other aspects to attribution that also need to be addressed, e.g. how
it's actually surfaced to a user: for example, which properties of
the VoID description of a dataset should be shown in a user
interface. There's also the protocol level issues, e.g. how do we
include links from SPARQL results?
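One possible answer to the SPARQL question, sketched under assumptions: if the query projects the describing document alongside each value, a consumer can keep a link back to the publisher when rendering. The query, the `render_with_sources` helper and the sample response are hypothetical; the response follows the JSON shape used for SPARQL SELECT results (`head.vars`, `results.bindings`, each binding carrying `type` and `value`). The `rel="dc:source"` link is RDFa-style markup and assumes the page declares the `dc` prefix. This does not settle attribution at the protocol level; it only shows that provenance survives when the query asks for it.

```python
import html
import json

# Hypothetical query: project the describing document (?doc) next to each
# review text so the link back to the publisher survives the SELECT.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rev:  <http://purl.org/stuff/rev#>
SELECT ?text ?doc WHERE {
  ?doc   foaf:primaryTopic ?thing .
  ?thing rev:text          ?text .
}
"""

def render_with_sources(results_json):
    """Turn SPARQL JSON results into HTML fragments that keep a link to ?doc."""
    results = json.loads(results_json)
    fragments = []
    for row in results["results"]["bindings"]:
        text = html.escape(row["text"]["value"])
        doc = row.get("doc")
        if doc and doc["type"] == "uri":
            href = html.escape(doc["value"], quote=True)
            fragments.append(f'<p>{text} <a rel="dc:source" href="{href}">source</a></p>')
        else:
            fragments.append(f"<p>{text}</p>")  # no provenance available for this row
    return fragments

SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["text", "doc"]},
    "results": {"bindings": [{
        "text": {"type": "literal", "value": "Kingsley's blog, often containing pertinent lod postings"},
        "doc": {"type": "uri", "value": "http://example.com/reviews"},
    }]},
})

if __name__ == "__main__":
    for fragment in render_with_sources(SAMPLE_RESPONSE):
        print(fragment)
```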

> Ted Nelson referred to the above, in different terms, as Transcopyright.

AIUI Transcopyright is a default licensing scheme for content (and
presumably data) that encourages a share-alike behaviour rather than
the current default "all rights reserved" copyright situation we have
now. So related but not exactly the same.

> He also used the term Transclusion to describe what we commonly refer to
> as mashups (Web 2.0 code hacks) and meshups (Linked Data emixes) today.

He'd probably argue differently; I've seen him speak and he's an
interesting character! :). But yes, the essence is the same.

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com

Re: LOD Data Sets, Licensing, and AWS


Leigh Dodds wrote:

Hi,

2009/6/24 Ian Davis :
  

But your URIs convey your point of view. The important thing here is that
there is a route back to your data space; the place from which your point of
view originates.

If the pathways to the origins of data are obscured we are recreating
yesterday's economy (imho), one in which original creators of work are easily
dislocated by middlemen. An economy in which incentives for data publishing
are minimal for those who have invested time and money in quality data
curation and maintenance.
  

I'm not talking about obscuring any pathways. I'm talking about using
existing URIs and adding more information. If I publish the following RDF as
part of a set of reviews at http://example.com/reviews then how, in your
scheme, am I supposed to get attribution?

 a foaf:weblog ;
rev:text "Kingsley's blog, often containing pertinent lod postings" .



I think there are also some other circumstances in which the "URIs as
attribution" mechanism is not sufficient:

- Editorial. I may produce a subset of the LOD cloud which includes
data that I consider to be of high quality or is relevant (in some
sense) to a specific area of interest. I might reasonably want
attribution for the effort invested there, although I've not
contributed any additional data (indeed there's likely to be less).
The custom dataset should be attributable and citeable. AIUI, in EU
law I would have some rights to this database ("a database right")
which derives from the collection and editorial input.
  
When you publish said data as Linked Data you will be using an HTTP URI, 
and in doing so there is implicit attribution.
If you retain the URIs of the source, or make explicit claims (e.g., 
dc:source) that expose the original data sources then everything is fine,
nobody along the value chain gets dislocated.

- Derived Data. I may carry out some statistical analysis on LOD data,
covering millions of triples from dozens of different sources. The
derived data can be published as linked data, and the original
dataset owners may reasonably expect attribution of my sources, even
though I'm not republishing any of the original triples.
  
But the original data set was input for your work, and you've published 
your work using your HTTP URIs. As in the example above, dc:source 
statements exposing each data source URI will do, i.e., keep the value 
chain transparent.
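For the derived-data case, the dc:source approach amounts to one statement per input dataset in the description of the derived dataset. A minimal sketch, assuming the derived dataset is described with the VoID vocabulary and that each source dataset has its own HTTP URI; the `derived_dataset_turtle` helper and all example.org URIs are hypothetical. It uses `dc:source` to mirror the suggestion above; `dcterms:source` from the DCMI Terms vocabulary would be the newer equivalent.

```python
DC = "http://purl.org/dc/elements/1.1/"
VOID = "http://rdfs.org/ns/void#"

def derived_dataset_turtle(dataset_uri, source_uris):
    """Emit Turtle describing a derived dataset with one dc:source per input dataset."""
    if not source_uris:
        raise ValueError("a derived dataset should cite at least one source")
    # First predicate-object pair types the dataset; the rest cite its sources.
    pairs = ["a void:Dataset"] + [f"dc:source <{uri}>" for uri in source_uris]
    body = " ;\n    ".join(pairs)
    return (
        f"@prefix dc: <{DC}> .\n"
        f"@prefix void: <{VOID}> .\n\n"
        f"<{dataset_uri}> {body} ."
    )

if __name__ == "__main__":
    print(derived_dataset_turtle(
        "http://stats.example.org/derived",
        ["http://a.example.org/void", "http://b.example.org/void"],
    ))
```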

An "Attribution Economy" is one in which each unit contribution is 
dereferenceable to its originator.


Ted Nelson referred to the above, in different terms, as 
Transcopyright. He also used the term Transclusion to describe what 
we commonly refer to as mashups (Web 2.0 code hacks) and meshups 
(Linked Data emixes) today.




Cheers,

L.

  



--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com

Re: LOD Data Sets, Licensing, and AWS


Ian Davis wrote:
On Wed, Jun 24, 2009 at 2:00 AM, Kingsley Idehen 
mailto:kide...@openlinksw.com>> wrote:



There will be dozens or hundreds of other documents that use
the same URI and the owners of those datasets would like
attribution for their work. For example, I can make some
unique assertions about you that no-one else has and I would
like those attributed to me - using your URI would not provide
that attribution.


But your URIs convey your point of view. The important thing here
is that there is a route back to your data space; the place from
which your point of view originates.

If the pathways to the origins of data are obscured we are
recreating yesterday's economy (imho), one in which original
creators of work are easily dislocated by middlemen. An economy in
which incentives for data publishing are minimal for those who
have invested time and money in quality data curation and
maintenance.



I'm not talking about obscuring any pathways. I'm talking about using 
existing URIs and adding more information. If I publish the following 
RDF as part of a set of reviews at http://example.com/reviews then 
how, in your scheme, am I supposed to get attribution?


> a foaf:weblog ;

rev:text "Kingsley's blog, often containing pertinent lod postings" .


Ian


Ian,

Via the following statements in your own data space (assuming the review 
doesn't already include metadata that exposes author or creator URIs):



 a foaf:Document;
foaf:primaryTopic  > ;
dc:creator <#you>.
..
..

Also note, there is implicit data provider attribution to whoever owns 
or runs , assuming this is an HTTP URI 
that's implicitly bound to its metadata (as per Linked Data meme).
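On the consuming side, the pattern above lets a client go from a statement about a topic to the document(s) describing it and their creators. A minimal sketch over plain (subject, predicate, object) tuples; the blog URI in the archived message was lost, so `BLOG` is a hypothetical placeholder, and `<#you>` is resolved against http://example.com/reviews as a Turtle parser would. If several documents share the same primary topic, the lookup returns all of them: it says who published *something* about the topic, not which document a particular triple came from, which would need named graphs or similar provenance tracking.

```python
FOAF_DOCUMENT = "http://xmlns.com/foaf/0.1/Document"
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
REV_TEXT = "http://purl.org/stuff/rev#text"

REVIEWS_DOC = "http://example.com/reviews"  # from Ian's example
BLOG = "http://blog.example.org/"           # hypothetical stand-in for the lost URI
REVIEWER = REVIEWS_DOC + "#you"             # <#you> resolved against the document

TRIPLES = [
    (BLOG, REV_TEXT, "Kingsley's blog, often containing pertinent lod postings"),
    (REVIEWS_DOC, RDF_TYPE, FOAF_DOCUMENT),
    (REVIEWS_DOC, FOAF_PRIMARY_TOPIC, BLOG),
    (REVIEWS_DOC, DC_CREATOR, REVIEWER),
]

def attributions_for(topic, triples):
    """Return sorted (document, creator) pairs for documents whose primary topic is `topic`."""
    docs = {s for s, p, o in triples if p == FOAF_PRIMARY_TOPIC and o == topic}
    return sorted((s, o) for s, p, o in triples if s in docs and p == DC_CREATOR)

if __name__ == "__main__":
    print(attributions_for(BLOG, TRIPLES))
```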


--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com

Re: LOD Data Sets, Licensing, and AWS

2009-06-24 Thread Hilmar Lapp



On Jun 23, 2009, at 7:04 PM, Peter Ansell wrote:

Interestingly, there is a large economy involved with patenting gene  
sequences. Aren't they facts also? Why is patenting different to  
copyright in this respect?



It isn't. I don't know of any gene sequence patent that was just that  
and withstood being challenged in court.


The gene sequence patents that I'm aware of and are active aren't for  
the sequence, but for an application of the sequence, such as a  
diagnostic of a certain disease, or a drug target for a certain  
indication, or a biological therapeutic. Those kinds of discoveries  
aren't typically facts of nature, and hence eligible for intellectual  
property.


-hilmar
--
===
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===

Re: LOD Data Sets, Licensing, and AWS

Hi,

2009/6/24 Ian Davis :
>> But your URIs convey your point of view. The important thing here is that
>> there is a route back to your data space; the place from which your point of
>> view originates.
>>
>> If the pathways to the origins of data are obscured we are recreating
>> yesterday's economy (imho), one in which original creators of work are easily
>> dislocated by middlemen. An economy in which incentives for data publishing
>> are minimal for those who have invested time and money in quality data
>> curation and maintenance.
>
> I'm not talking about obscuring any pathways. I'm talking about using
> existing URIs and adding more information. If I publish the following RDF as
> part of a set of reviews at http://example.com/reviews then how, in your
> scheme, am I supposed to get attribution?
>
>  a foaf:weblog ;
> rev:text "Kingsley's blog, often containing pertinent lod postings" .

I think there are also some other circumstances in which the "URIs as
attribution" mechanism is not sufficient:

- Editorial. I may produce a subset of the LOD cloud which includes
data that I consider to be of high quality or is relevant (in some
sense) to a specific area of interest. I might reasonably want
attribution for the effort invested there, although I've not
contributed any additional data (indeed there's likely to be less).
The custom dataset should be attributable and citeable. AIUI, In EU
law I would have some rights to this database ("a database right")
which derives from the collection and editorial input.

- Derived Data. I may carry out some statistical analysis on LOD data,
covering millions of triples from dozens of different sources. The
derived data can be published as linked data, and the original
dataset owners may reasonably expect attribution of my sources, even
though I'm not republishing any of the original triples.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com

Re: LOD Data Sets, Licensing, and AWS

2009-06-24 Thread Rob Styles

On 24 Jun 2009, at 00:04, Peter Ansell wrote:

2009/6/24 Ian Davis
On Tue, Jun 23, 2009 at 11:11 PM, Kingsley Idehen > wrote:

Using licensing to ensure the data providers URIs are always
preserved delivers low cost and implicit attribution. This is what I
believe CC-BY-SA delivers. There is nothing wrong with granular
attribution if compliance is low cost. Personally, I think we are on
the verge of an "Attribution Economy", and said economy will
encourage contributions from a plethora of high quality data
providers (esp. from the traditional media realm).

Regardless of any attribution economy, CC-BY-SA is basically
unenforceable for data so is not appropriate. You can't copyright
the diameter of the moon.

Ian

Interestingly, there is a large economy involved with patenting gene
sequences. Aren't they facts also? Why is patenting different to
copyright in this respect?

#random_aside_about_copyright_and_patent

Patents and Copyright differ in many respects.

Firstly, Copyright protection is given to creative works automatically
with no need to register. Simply by authoring something that shows a
basic level of creative expression I am granted Copyright protection
over that work. This is fairly uniform throughout countries that trade
with the US as the US has pushed very hard to unify the protection of
its own IP globally. Copyright only applies to the work I've done
though, characters, ideas and many other aspects are not covered.

Patents on the other hand require a successful patent application and
(though this is debatable in many cases) have a rigorous set of rules
about the novelty of the invention applied. In the case of gene
sequences it is not the sequence alone that is patented, but inventive
description of the possible treatments, cures or other benefits of
manipulating the gene
(http://www.guardian.co.uk/science/2000/nov/15/genetics.theissuesexplained).
That is, Patent protection covers the idea where Copyright does not.

The other major difference is in how they can apply to what you do. If
you create something that is very similar to somebody else's work, but
can show that the original work was not referenced in any way, then
you have not infringed the copyright of that work (of course, that's
difficult to show). With a patent, however, the idea is protected
exclusively for the original inventor even if you came up with the
same idea completely independently.

rob

Cheers,

Peter

Rob Styles
tel: +44 (0)870 400 5000
fax: +44 (0)870 400 5001
mobile: +44 (0)7971 475 257
msn: m...@yahoo.com
irc: irc.freenode.net/mrob,isnick
web: http://www.talis.com/
blog: http://www.dynamicorange.com/blog/
blog: http://blogs.talis.com/panlibus/
blog: http://blogs.talis.com/nodalities/
blog: http://blogs.talis.com/n2/


Talis Information Ltd is a member of the Talis Group of companies and is
registered in England No 3638278 with its registered office at Knights Court,
Solihull Parkway, Birmingham Business Park, B37 7YB.

Re: LOD Data Sets, Licensing, and AWS

Hi,

2009/6/23 Kingsley Idehen :
> All,
>
> As you may have noticed, AWS still haven't made the LOD cloud data sets  --
> that I submitted eons ago -- public. Basically, the hold-up comes down to
> discomfort with the lack of license clarity re. some of the data sets.

Yes, this is an issue that Amazon mentioned when I discussed mirroring
data from the Connected Commons with them a few months ago. It's a
reasonable concern as, being a large organization, they are the
obvious target for any potential lawsuit w.r.t. licensing or copyright
infringement. Other organizations may have similar concerns and we
need to anticipate that.

I'm glad that this issue is starting to get more attention, and
there's been some useful discussion so far. Licensing and rights
waivers are topics that need to be addressed if we are to move
forward with building a sustainable infrastructure that can be
reliably and legally used for both commercial and non-commercial
usage.

As Ian mentioned, a tutorial proposal has been submitted to ISWC by
representatives of the Open Data Commons, Science Commons, and Talis
on precisely these topics, and will cover both legal and social
frameworks that relate to open data publishing. I hope that we'll also
be able to provide some clear advice on what is/isn't covered by
copyright and database licensing law to also ensure that people
scraping and converting facts from existing websites can have a
clearer understanding of what they legally can and can't do.

I think as the discussion proceeds we need to be clear about several
different issues: what mechanisms exist for waiving or granting
licenses to data and content and their applicability, and the social
norms that should underpin a community of "good data reusers";
attribution is one of these. At the moment many datasets are either
not explicitly licensed or incorrectly licensed, e.g. using a CC-BY-SA
license for data. The latter typically expresses the wishes or
intentions of the data publisher ("please acknowledge my efforts") but
is not legally enforceable.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com

Re: LOD Data Sets, Licensing, and AWS

On Wed, Jun 24, 2009 at 2:00 AM, Kingsley Idehen wrote:

>
>  There will be dozens or hundreds of other documents that use the same URI
>> and the owners of those datasets would like attribution for their work. For
>> example, I can make some unique assertions about you that no-one else has
>> and I would like those attributed to me - using your URI would not provide
>> that attribution.
>>
>
> But your URIs convey your point of view. The important thing here is that
> there is a route back to your data space; the place from which your point of
> view originates.
>
> If the pathways to the origins of data are obscured we are recreating
> yesterday's economy (imho), one in which original creators of work are easily
> dislocated by middlemen. An economy in which incentives for data publishing
> are minimal for those who have invested time and money in quality data
> curation and maintenance.
>

I'm not talking about obscuring any pathways. I'm talking about using
existing URIs and adding more information. If I publish the following RDF as
part of a set of reviews at http://example.com/reviews then how, in your
scheme, am I supposed to get attribution?

 a foaf:weblog ;
rev:text "Kingsley's blog, often containing pertinent lod postings" .

Ian

Re: LOD Data Sets, Licensing, and AWS


Ian Davis wrote:


On Tue, Jun 23, 2009 at 11:11 PM, Kingsley Idehen
mailto:kide...@openlinksw.com>> wrote:


Using licensing to ensure the data providers URIs are always
preserved delivers low cost and implicit attribution. This is
what I believe CC-BY-SA delivers. There is nothing wrong with
granular attribution if compliance is low cost. Personally, I
think we are on the verge of an "Attribution Economy", and
said economy will encourage contributions from a plethora of
high quality data providers (esp. from the traditional media realm).


Regardless of any attribution economy, CC-BY-SA is basically 
unenforceable for data so is not appropriate. You can't copyright the 
diameter of the moon.


Ian


 

I am not talking about copyrighting the diameter of the moon. I am 
talking about the origin of the diameter of the moon that is 
dereferenceable via an HTTP URI :-) I just want a pathway to the origin of 
the perspective that is being used in some other context and space.


--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com

Re: LOD Data Sets, Licensing, and AWS


Ian Davis wrote:
On Tue, Jun 23, 2009 at 11:11 PM, Kingsley Idehen 
mailto:kide...@openlinksw.com>> wrote:


Ian Davis wrote:

Hi all,


On Tue, Jun 23, 2009 at 9:36 PM, Kingsley Idehen
mailto:kide...@openlinksw.com>
>> wrote:

   All,

   As you may have noticed, AWS still haven't made the LOD cloud data
   sets  -- that I submitted eons ago -- public. Basically, the
   hold-up comes down to discomfort with the lack of license clarity
   re. some of the data sets.

   Action items for all data set publishers:

   1. Integrate your data set licensing into your data set (for LOD I
   would expect CC-BY-SA to be the norm)


Please do not use CC-BY-SA for LOD - it is not an appropriate
licence and it is making the problem worse. That licence uses
copyright which does not hold for factual information.

Please use an Open Data Commons license or CC-0

http://www.opendatacommons.org/licenses/

http://wiki.creativecommons.org/CC0

If your dataset contains copyrighted material too (e.g.
reviews) and you hold the rights over that content then you
should also apply a standard copyright licence. So for
completeness you need a licence for your data and one for your
content. If you use CC-0 you can apply it to both at the same
time. Obviously if you aren't the rightsholder (e.g. it is
scraped data/content from someone else) then you can't just
slap any licence you like on it - you have to abide by the
original rightsholder's wishes.
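Ian's split between a licence for the data and one for the content could be recorded in the dataset description itself, which is also what the first action item asks for. A minimal sketch, assuming the VoID vocabulary and the `dcterms:license` property; the dataset and review URIs and the `licence_description` helper are hypothetical, and CC0 for the data with CC BY 3.0 for the review text are used purely as an example of two different choices. As noted above, this only makes sense if you hold the rights in the first place.

```python
DCTERMS = "http://purl.org/dc/terms/"
VOID = "http://rdfs.org/ns/void#"
CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"
CC_BY = "http://creativecommons.org/licenses/by/3.0/"

def licence_description(dataset_uri, data_licence, content_docs=(), content_licence=None):
    """Emit Turtle stating a data licence for the dataset and, optionally,
    a separate licence for each content-bearing document (e.g. a review)."""
    lines = [
        f"@prefix dcterms: <{DCTERMS}> .",
        f"@prefix void: <{VOID}> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f"    dcterms:license <{data_licence}> .",
    ]
    if content_licence is not None:
        for doc in content_docs:
            lines.append(f"<{doc}> dcterms:license <{content_licence}> .")
    return "\n".join(lines)

if __name__ == "__main__":
    print(licence_description(
        "http://reviews.example.com/dataset", CC0,
        ["http://reviews.example.com/review/1"], CC_BY,
    ))
```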

Personally I would try and select a public domain waiver or
dedication, not one that requires attribution. The reason can
be seen at
http://en.wikipedia.org/wiki/BSD_license#UC_Berkeley_advertising_clause
where stacking of attributions becomes a huge burden. Having
datasets require attribution will negate one of the linked
data web's greatest strengths: the simplicity of remixing and
reusing data.
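To make the stacking concern concrete: if datasets in a remix chain require attribution, a consumer of the final dataset inherits the notices of everything upstream. A minimal sketch with a hypothetical dependency graph and a deliberately simple model (a dataset either requires attribution or it does not, ignoring licence-specific wording); `required_attributions` is an illustrative helper, not part of any licence framework.

```python
def required_attributions(dataset, inputs, requires_attribution):
    """Collect every dataset, including `dataset` itself and all of its upstream
    inputs, whose licence requires attribution.

    inputs maps a dataset to the datasets it was built from;
    requires_attribution is the set of datasets whose licence demands credit.
    """
    required, seen, stack = set(), set(), [dataset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # shared upstream datasets are only counted once
        seen.add(current)
        if current in requires_attribution:
            required.add(current)
        stack.extend(inputs.get(current, ()))
    return required

# Hypothetical remix chain: a mashup built from A and B, which reuse C, D and E.
INPUTS = {"mashup": ["A", "B"], "A": ["C", "D"], "B": ["D", "E"]}

if __name__ == "__main__":
    everything = {"mashup", "A", "B", "C", "D", "E"}
    print(len(required_attributions("mashup", INPUTS, everything)))  # all need credit
    print(required_attributions("mashup", INPUTS, set()))            # public domain waivers
```

With attribution required everywhere, the mashup's consumer must carry six notices; with public domain waivers throughout, none.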

Ian,

Using licensing to ensure the data providers URIs are always
preserved delivers low cost and implicit attribution. This is what
I believe CC-BY-SA delivers. There is nothing wrong with granular
attribution if compliance is low cost. Personally, I think we are
on the verge of an "Attribution Economy", and said economy will
encourage contributions from a plethora of high quality data
providers (esp. from the traditional media realm).


I don't think usage of a URI is enough for attribution because a URI 
is not information bearing.
Of course I could dereference it and perhaps obtain some triples that 
use it, but that URI does not denote those triples or that document.
An HTTP URI (as used re. Linked Data meme) carries implicit attribution 
prowess by implicitly binding the thing it identifies to its metadata 
(very data bearing). This is what makes this URI type so potent when 
dealing with data publishing and data access.
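The "binding" of an HTTP URI to its metadata is what a client sees when it dereferences the URI and asks for RDF via content negotiation; a server following the usual Linked Data recipes answers with data, or with a 303 redirect to a document describing the thing. A minimal standard-library sketch; the Accept preferences are an assumption, and the code only shows where the publisher's description is fetched from, not whether that description names an author.

```python
import urllib.request

# Assumed preference order: Turtle, then RDF/XML, then HTML as a fallback.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.5"

def build_dereference_request(uri):
    """Prepare a GET for a Linked Data URI, asking for RDF before HTML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT}, method="GET")

def dereference(uri, timeout=10):
    """Fetch the URI; urlopen follows redirects such as 303 See Other.
    Returns (final_url, content_type, body)."""
    with urllib.request.urlopen(build_dereference_request(uri), timeout=timeout) as resp:
        return resp.geturl(), resp.headers.get("Content-Type"), resp.read()
```

The final URL returned after redirects identifies the document, and hence the data space, that supplied the description.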
There will be dozens or hundreds of other documents that use the same 
URI and the owners of those datasets would like attribution for their 
work. For example, I can make some unique assertions about you that 
no-one else has and I would like those attributed to me - using your 
URI would not provide that attribution.


But your URIs convey your point of view. The important thing here is 
that there is a route back to your data space; the place from which your 
point of view originates.


If the pathways to the origins of data are obscured we are recreating 
yesterday's economy (imho), one in which original creators of work are 
easily dislocated by middlemen. An economy in which incentives for data 
publishing are minimal for those who have invested time and money in 
quality data curation and maintenance.







Anyway, each data set provider should pick the license that works
for them :-)


Yes I agree. The above paragraph was my personal preference, but I'd 
like to convince others to think like me :)


Ditto :-)


Ian




--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com

Re: LOD Data Sets, Licensing, and AWS

On Wednesday, June 24, 2009, Peter Ansell  wrote:
>
>
> 2009/6/24 Ian Davis 
>
> On Tue, Jun 23, 2009 at 11:11 PM, Kingsley Idehen  
> wrote:
>
>
> Using licensing to ensure the data providers URIs are always preserved 
> delivers low cost and implicit attribution. This is what I believe CC-BY-SA 
> delivers. There is nothing wrong with granular attribution if compliance is 
> low cost. Personally, I think we are on the verge of an "Attribution 
> Economy", and said economy will encourage contributions from a plethora of 
> high quality data providers (esp. from the traditional media realm).
>
>
> Regardless of any attribution economy, CC-BY-SA is basically unenforceable 
> for data so is not appropriate. You can't copyright the diameter of the moon.
>
> Ian
>
>
> Interestingly, there is a large economy involved with patenting gene 
> sequences. Aren't they facts also? Why is patenting different to copyright in 
> this respect?

I can't explain the technicalities (IANAL) but there are many
different types of property rights that are granted by governments
over information : copyright, database right, patent right, moral
right etc. Each of those has separate legislation that varies by
jurisdiction (WIPO is attempting to normalise some of them). It's
complicated which is why the efforts of creative commons, science
commons and open data commons are so valuable: they create simple ways
for people to declare the conditions under which their data and
content can be reused.

Ian

Re: LOD Data Sets, Licensing, and AWS

2009-06-23 Thread Peter Ansell

2009/6/24 Ian Davis 

> On Tue, Jun 23, 2009 at 11:11 PM, Kingsley Idehen 
> wrote:
>>
>>>
>>> Using licensing to ensure the data providers URIs are always preserved
>>> delivers low cost and implicit attribution. This is what I believe CC-BY-SA
>>> delivers. There is nothing wrong with granular attribution if compliance is
>>> low cost. Personally, I think we are on the verge of an "Attribution
>>> Economy", and said economy will encourage contributions from a plethora of
>>> high quality data providers (esp. from the traditional media realm).
>>
>>
> Regardless of any attribution economy, CC-BY-SA is basically unenforceable
> for data so is not appropriate. You can't copyright the diameter of the
> moon.
>
> Ian
>

Interestingly, there is a large economy involved with patenting gene
sequences. Aren't they facts also? Why is patenting different to copyright
in this respect?

Cheers,

Peter

Re: LOD Data Sets, Licensing, and AWS

>
> On Tue, Jun 23, 2009 at 11:11 PM, Kingsley Idehen 
> wrote:
>
>>
>> Using licensing to ensure the data providers' URIs are always preserved
>> delivers low-cost and implicit attribution. This is what I believe CC-BY-SA
>> delivers. There is nothing wrong with granular attribution if compliance is
>> low cost. Personally, I think we are on the verge of an "Attribution
>> Economy", and said economy will encourage contributions from a plethora of
>> high-quality data providers (esp. from the traditional media realm).
>
>
Regardless of any attribution economy, CC-BY-SA is basically unenforceable
for data so is not appropriate. You can't copyright the diameter of the
moon.

Ian

Re: LOD Data Sets, Licensing, and AWS

On Tue, Jun 23, 2009 at 11:11 PM, Kingsley Idehen wrote:

> Ian Davis wrote:
>
>> Hi all,
>>
>> On Tue, Jun 23, 2009 at 9:36 PM, Kingsley Idehen
>> <kide...@openlinksw.com> wrote:
>>
>>> All,
>>>
>>> As you may have noticed, AWS still haven't made the LOD cloud data
>>> sets -- that I submitted eons ago -- public. Basically, the
>>> hold-up comes down to discomfort with the lack of license clarity
>>> re. some of the data sets.
>>>
>>> Action items for all data set publishers:
>>>
>>> 1. Integrate your data set licensing into your data set (for LOD I
>>> would expect CC-BY-SA to be the norm)
>>
>>
>> Please do not use CC-BY-SA for LOD - it is not an appropriate licence and
>> it is making the problem worse. That licence uses copyright which does not
>> hold for factual information.
>>
>> Please use an Open Data Commons license or CC-0
>>
>> http://www.opendatacommons.org/licenses/
>>
>> http://wiki.creativecommons.org/CC0
>>
>> If your dataset contains copyrighted material too (e.g. reviews) and you
>> hold the rights over that content then you should also apply a standard
>> copyright licence. So for completeness you need a licence for your data and
>> one for your content. If you use CC-0 you can apply it to both at the same
>> time. Obviously if you aren't the rightsholder (e.g. it is scraped
>> data/content from someone else) then you can't just slap any licence you
>> like on it - you have to abide by the original rightsholder's wishes.
>>
>> Personally I would try and select a public domain waiver or dedication,
>> not one that requires attribution. The reason can be seen at
>> http://en.wikipedia.org/wiki/BSD_license#UC_Berkeley_advertising_clause where
>> stacking of attributions becomes a huge burden. Having datasets
>> require attribution will negate one of the linked data web's greatest
>> strengths: the simplicity of remixing and reusing data.
>>
> Ian,
>
> Using licensing to ensure the data providers' URIs are always preserved
> delivers low-cost and implicit attribution. This is what I believe CC-BY-SA
> delivers. There is nothing wrong with granular attribution if compliance is
> low cost. Personally, I think we are on the verge of an "Attribution
> Economy", and said economy will encourage contributions from a plethora of
> high-quality data providers (esp. from the traditional media realm).


I don't think usage of a URI is enough for attribution because a URI is not
information bearing. Of course I could dereference it and perhaps obtain
some triples that use it, but that URI does not denote those triples or that
document. There will be dozens or hundreds of other documents that use the
same URI and the owners of those datasets would like attribution for their
work. For example, I can make some unique assertions about you that no-one
else has and I would like those attributed to me - using your URI would not
provide that attribution.
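Ian's argument can be illustrated with a small sketch, assuming each triple is stored together with the URI of the document that asserted it (a "quad"). All URIs and the `reviewed` predicate are hypothetical placeholders; only `foaf:name` is a real FOAF property. The subject URI alone only tells you that several documents talk about this person; crediting the author of one particular assertion needs the source document carried alongside that triple.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical URIs: one person, an invented "reviewed" predicate, and three
# source documents that could each be published by a different party.
PERSON = "http://example.org/id/person"
REVIEWED = "http://example.org/vocab/reviewed"

# Each entry is (subject, predicate, object, source document).
QUADS = [
    (PERSON, FOAF_NAME, '"Example Person"', "http://example.org/doc/a"),
    (PERSON, REVIEWED, "http://example.org/book/1", "http://example.org/doc/b"),
    (PERSON, REVIEWED, "http://example.org/book/2", "http://example.org/doc/c"),
]


def documents_mentioning(quads, subject):
    """All source documents that make some assertion about `subject`."""
    return {g for s, _p, _o, g in quads if s == subject}


def asserted_by(quads, s, p, o):
    """The source document(s) behind one specific triple."""
    return {g for qs, qp, qo, g in quads if (qs, qp, qo) == (s, p, o)}


print(sorted(documents_mentioning(QUADS, PERSON)))
print(asserted_by(QUADS, PERSON, REVIEWED, "http://example.org/book/2"))
```

If the fourth element is dropped (for example, by a query that returns only subject, predicate and object), the second lookup can no longer be answered.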


>
> Anyway, each data set provider should pick the license that works for them
> :-)


Yes, I agree. The above paragraph was my personal preference, but I'd like to
convince others to think like me :)

Ian

Re: LOD Data Sets, Licensing, and AWS

Alan Ruttenberg wrote:

> On Tue, Jun 23, 2009 at 4:36 PM, Kingsley Idehen wrote:
>
>> All,
>>
>> As you may have noticed, AWS still haven't made the LOD cloud data sets --
>> that I submitted eons ago -- public. Basically, the hold-up comes down to
>> discomfort with the lack of license clarity re. some of the data sets.
>>
>> Action items for all data set publishers:
>>
>> 1. Integrate your data set licensing into your data set (for LOD I would expect
>> CC-BY-SA to be the norm)
>
> First off, I am not a lawyer, and neither I nor Science Commons gives
> legal advice. I can pass along the results of our research and policy
> work in this space, and connect you with others at Science Commons if
> need be.
>
> Data is tricky, since it's not always clear whether copyright licenses
> can be applied. Copyright law at its core applies when there is
> "creative expression" and does not protect facts, which most data
> arguably is. It's very difficult to discern where copyright protection
> ends and where the data is naturally in the public domain, and so we do
> not advocate applying a copyright license to data (CC-BY-SA being an
> example of such).
>
> Here are some links if you are interested in understanding more about
> the problem.
>
> http://sciencecommons.org/resources/faq/database-protocol/
> http://sciencecommons.org/projects/publishing/open-access-data-protocol/
> http://www.slideshare.net/kaythaney/sharing-scientific-data-legal-normative-and-social-issues
> http://sciencecommons.org/wp-content/uploads/freedom-to-research.pdf
>
> A further issue is that any *license* applied to data constrains the
> ability to integrate it on a large scale because any requirement on
> the licensee gets magnified as more and more data sources become
> available, each with a separate requirement. Instead it is suggested
> that providers effectively commit the data to the public domain. In
> order to do that, Science Commons defined a protocol for implementing
> open access data. It is intended that various licenses and public
> domain dedications might follow this protocol, and there are two thus
> far that we have certified as truly open.
>
> The Public Domain Dedication and License
> http://www.opendatacommons.org/licenses/pddl/
>
> and
>
> CC Zero
> http://creativecommons.org/publicdomain/zero/1.0/
>
> We recommend that you use one of these approaches when releasing your
> data, to ensure maximum freedom to integrate.

Alan,

Which license simply allows me to assert that I want to be attributed by
data source URI? Example (using DBpedia even though it isn't currently
CC-BY-SA):

I have the URI: . If you use
this URI as a data source in a Linked Data meshup or Web 2.0 mashup, I
would like any user agent to be able to discover
. Thus, "Data provided by
DBpedia" isn't good enough because the path to the actual data source
isn't reflected in the literal and generic attribution.

The point above is the crux of the matter for traditional media
companies (today) and smaller curators of high-quality data (in the near
future). Nobody wants to invest time in making high-quality data spaces
that are easily usurped by crawling and reconstitution via completely
different URIs that dislocate the originals or, even worse, repackaged as
pretty presentations that completely obscure the paths to the original data
provider (what you see in a lot of Ajax and RIA style apps today).
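A minimal sketch of the alternative Kingsley asks for, assuming the consumer has kept the source document URI for every value it displays: instead of one literal credit line such as "Data provided by DBpedia", the page lists each source URI as a link that a user agent can discover and follow. The function name, the `data-sources` class and the example.org URIs are hypothetical, and no particular license is claimed to require this markup.

```python
from html import escape


def attribution_footer(source_uris):
    """Render one link per distinct data source URI, keeping first-seen order,
    so a reader (or user agent) can follow each link back to its source."""
    items = "".join(
        f'<li><a href="{escape(uri, quote=True)}">{escape(uri)}</a></li>'
        for uri in dict.fromkeys(source_uris)  # de-duplicate, preserve order
    )
    return f'<ul class="data-sources">{items}</ul>'


# Hypothetical source documents behind the values shown on one page.
print(attribution_footer([
    "http://example.org/data/a",
    "http://example.org/data/b",
    "http://example.org/data/a",
]))
```

Escaping matters because source URIs can contain characters such as `&` that are special in HTML.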

Kingsley

> With regards,
> Alan Ruttenberg
> http://sciencecommons.org/about/whoweare/ruttenberg/
>
>> 2. Indicate license terms in the appropriate column at:
>> http://esw.w3.org/topic/DataSetRDFDumps
>>
>> If licenses aren't clear I will have to exclude offending data sets from the
>> AWS publication effort.
>>
>> Regards,
>>
>> Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
>> President & CEO OpenLink Software Web: http://www.openlinksw.com

Regards,

Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com

Re: LOD Data Sets, Licensing, and AWS


Ian Davis wrote:

> Hi all,
>
> On Tue, Jun 23, 2009 at 9:36 PM, Kingsley Idehen
> <kide...@openlinksw.com> wrote:
>
>> All,
>>
>> As you may have noticed, AWS still haven't made the LOD cloud data
>> sets -- that I submitted eons ago -- public. Basically, the
>> hold-up comes down to discomfort with the lack of license clarity
>> re. some of the data sets.
>>
>> Action items for all data set publishers:
>>
>> 1. Integrate your data set licensing into your data set (for LOD I
>> would expect CC-BY-SA to be the norm)
>
> Please do not use CC-BY-SA for LOD - it is not an appropriate licence
> and it is making the problem worse. That licence uses copyright which
> does not hold for factual information.
>
> Please use an Open Data Commons license or CC-0
>
> http://www.opendatacommons.org/licenses/
>
> http://wiki.creativecommons.org/CC0
>
> If your dataset contains copyrighted material too (e.g. reviews) and
> you hold the rights over that content then you should also apply a
> standard copyright licence. So for completeness you need a licence for
> your data and one for your content. If you use CC-0 you can apply it
> to both at the same time. Obviously if you aren't the rightsholder
> (e.g. it is scraped data/content from someone else) then you can't
> just slap any licence you like on it - you have to abide by the
> original rightsholder's wishes.
>
> Personally I would try and select a public domain waiver or
> dedication, not one that requires attribution. The reason can be seen
> at
> http://en.wikipedia.org/wiki/BSD_license#UC_Berkeley_advertising_clause
> where stacking of attributions becomes a huge burden. Having datasets
> require attribution will negate one of the linked data web's greatest
> strengths: the simplicity of remixing and reusing data.

Ian,

Using licensing to ensure the data providers' URIs are always preserved
delivers low-cost and implicit attribution. This is what I believe
CC-BY-SA delivers. There is nothing wrong with granular attribution if
compliance is low cost. Personally, I think we are on the verge of an
"Attribution Economy", and said economy will encourage contributions
from a plethora of high-quality data providers (esp. from the traditional
media realm).


Anyway, each data set provider should pick the license that works for
them :-)

> A group of us have submitted a tutorial on these issues for ISWC 2009;
> hopefully it will get accepted because this is a really important area
> of Linked Data that is poorly understood.
>
>> 2. Indicate license terms in the appropriate column at:
>> http://esw.w3.org/topic/DataSetRDFDumps
>>
>> If licenses aren't clear I will have to exclude offending data
>> sets from the AWS publication effort.
>
> I completely support declaring what rights are asserted or waived for
> a dataset, so please everyone help this effort.
>
> Ian



--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software Web: http://www.openlinksw.com

Re: LOD Data Sets, Licensing, and AWS

2009-06-23 Thread Alan Ruttenberg

On Tue, Jun 23, 2009 at 4:36 PM, Kingsley Idehen  wrote:
>
> All,
>
> As you may have noticed, AWS still haven't made the LOD cloud data sets  -- 
> that I submitted eons ago -- public. Basically, the hold-up comes down to 
> discomfort with the lack of license clarity re. some of the data sets.
>
> Action items for all data set publishers:
>
> 1. Integrate your data set licensing into your data set (for LOD I would 
> expect CC-BY-SA to be the norm)

First off, I am not a lawyer, and neither I nor Science Commons gives
legal advice. I can pass along the results of our research and policy
work in this space, and connect you with others at Science Commons if
need be.

Data is tricky, since it's not always clear whether copyright licenses
can be applied. Copyright law at its core applies when there is
"creative expression" and does not protect facts, which most data
arguably is. It's very difficult to discern where copyright protection
ends and where the data is naturally in the public domain, and so we do
not advocate applying a copyright license to data (CC-BY-SA being an
example of such).

Here are some links if you are interested in understanding more about
the problem.

http://sciencecommons.org/resources/faq/database-protocol/
http://sciencecommons.org/projects/publishing/open-access-data-protocol/
http://www.slideshare.net/kaythaney/sharing-scientific-data-legal-normative-and-social-issues
http://sciencecommons.org/wp-content/uploads/freedom-to-research.pdf

A further issue is that any *license* applied to data constrains the
ability to integrate it on a large scale because any requirement on
the licensee gets magnified as more and more data sources become
available, each with a separate requirement. Instead it is suggested
that providers effectively commit the data to the public domain. In
order to do that, Science Commons defined a protocol for implementing
open access data. It is intended that various licenses and public
domain dedications might follow this protocol, and there are two thus
far that we have certified as truly open.

The Public Domain Dedication and License
http://www.opendatacommons.org/licenses/pddl/

and

CC Zero
http://creativecommons.org/publicdomain/zero/1.0/

We recommend that you use one of these approaches when releasing your
data, to ensure maximum freedom to integrate.
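Alan's point that per-source requirements get "magnified" can be made concrete with a toy count. A minimal sketch, assuming each source either requires an attribution notice or waives its rights (real licences carry other conditions too, which this ignores); the `Source` type and the example.org URIs are hypothetical. The number of notices grows linearly with the number of attribution-requiring sources, while sources dedicated to the public domain (e.g. under the PDDL or CC Zero) add none.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A data source feeding a combined work (URIs here are hypothetical)."""
    uri: str
    requires_attribution: bool


def required_notices(sources):
    """Attribution notices a combined work must carry: one per source whose
    terms require attribution, in input order."""
    return [s.uri for s in sources if s.requires_attribution]


def mashup(n, requires_attribution):
    """Build n hypothetical sources that all share the same terms."""
    return [
        Source(f"http://example.org/dataset/{i}", requires_attribution)
        for i in range(n)
    ]


for n in (1, 10, 100):
    attributed = len(required_notices(mashup(n, True)))
    waived = len(required_notices(mashup(n, False)))
    print(f"{n:>3} sources: {attributed:>3} notices if attribution is required, "
          f"{waived} if rights are waived")
```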

With regards,
Alan Ruttenberg
http://sciencecommons.org/about/whoweare/ruttenberg/

>
> 2. Indicate license terms in the appropriate column at: 
> http://esw.w3.org/topic/DataSetRDFDumps
>
> If licenses aren't clear I will have to exclude offending data sets from the 
> AWS publication effort.
>
>
>
> --
>
>
> Regards,
>
> Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
> President & CEO OpenLink Software     Web: http://www.openlinksw.com

Re: LOD Data Sets, Licensing, and AWS

Hi all,

On Tue, Jun 23, 2009 at 9:36 PM, Kingsley Idehen wrote:

> All,
>
> As you may have noticed, AWS still haven't made the LOD cloud data sets  --
> that I submitted eons ago -- public. Basically, the hold-up comes down to
> discomfort with the lack of license clarity re. some of the data sets.
>
> Action items for all data set publishers:
>
> 1. Integrate your data set licensing into your data set (for LOD I would
> expect CC-BY-SA to be the norm)

Please do not use CC-BY-SA for LOD - it is not an appropriate licence and it
is making the problem worse. That licence uses copyright which does not hold
for factual information.

Please use an Open Data Commons license or CC-0

http://www.opendatacommons.org/licenses/

http://wiki.creativecommons.org/CC0

If your dataset contains copyrighted material too (e.g. reviews) and you
hold the rights over that content then you should also apply a standard
copyright licence. So for completeness you need a licence for your data and
one for your content. If you use CC-0 you can apply it to both at the same
time. Obviously if you aren't the rightsholder (e.g. it is scraped
data/content from someone else) then you can't just slap any licence you
like on it - you have to abide by the original rightsholder's wishes.
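One way to act on Kingsley's first action item together with the "a licence for your data and one for your content" advice is to state the licences inside the dataset itself. A minimal sketch, assuming the Dublin Core `dcterms:license` property as the vocabulary (the thread does not name one) and hypothetical example.org URIs for the dataset and one review; it only emits N-Triples lines and says nothing about what either licence permits.

```python
DCTERMS_LICENSE = "http://purl.org/dc/terms/license"

# Licence URIs quoted in the thread.
PDDL = "http://www.opendatacommons.org/licenses/pddl/"
CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"

# Hypothetical URIs for a dataset and a copyrighted review it contains.
DATASET = "http://example.org/dataset"
REVIEW = "http://example.org/dataset/reviews/42"


def licence_triple(resource, licence):
    """One N-Triples statement: <resource> dcterms:license <licence> ."""
    return f"<{resource}> <{DCTERMS_LICENSE}> <{licence}> ."


def licence_statements(dataset, content, data_licence, content_licence):
    """Declare one licence for the data and one for each piece of content."""
    lines = [licence_triple(dataset, data_licence)]
    lines += [licence_triple(item, content_licence) for item in content]
    return "\n".join(lines)


# CC0 can cover both the data and the content.
print(licence_statements(DATASET, [REVIEW], CC0, CC0))
```

Passing `PDDL` as `data_licence` and a separate copyright licence as `content_licence` gives the two-licence arrangement instead.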

Personally I would try and select a public domain waiver or dedication, not
one that requires attribution. The reason can be seen at
http://en.wikipedia.org/wiki/BSD_license#UC_Berkeley_advertising_clause where
stacking of attributions becomes a huge burden. Having datasets
require attribution will negate one of the linked data web's greatest
strengths: the simplicity of remixing and reusing data.

A group of us have submitted a tutorial on these issues for ISWC 2009;
hopefully it will get accepted because this is a really important area of
Linked Data that is poorly understood.

>
> 2. Indicate license terms in the appropriate column at:
> http://esw.w3.org/topic/DataSetRDFDumps
>
> If licenses aren't clear I will have to exclude offending data sets from
> the AWS publication effort.
>

I completely support declaring what rights are asserted or waived for a
dataset, so please everyone help this effort.

Ian

LOD Data Sets, Licensing, and AWS