Re: [Discuss] Best practices for licensing data?

Karin Lagesen Fri, 27 May 2016 10:07:32 -0700

Thanks Elizabeth, this gives me a place to start :)

Karin


On 25/05/16 21:16, E.W. wrote:

Karin,

This depends on the area that you are working in and the type of
questions that you have.  There tend to be issues around:

* data/software/code licensing problems
* open access for publications
* sensitive data: what is it really?
* anonymizing data and preparing sensitive/IRB/protected data for public
access
* IP issues, including data scraping & database copyright laws and tech
transfer issues
* digital and long term preservation for data, text, code, and other
digital scholarship products

Many of these things are still being sorted out by a variety of groups,
but most of those groups intersect or usually live within the librarian
research community.

I like to refer people to their university's library or iSchool
(http://ischools.org/members/directory/) in the hopes that they have one
or more of the following, or faculty with interests in these areas: a
copyright librarian, research data service units (or single humans),
digital repositories, digital preservation units, and/or scholarly
publishing offices/teams.  Depending on the size of the university, the
library human(s) who know about these topics may hold anywhere from 1 to
N of these titles.  Many of these issues are heavily influenced by locality
and country-specific laws and standard practices, so local answers are
often more valuable than any formally published article or book.

Two groups that can help you get started down the rabbit hole:

* THOR Project: https://project-thor.readme.io/
* DataCite: https://blog.datacite.org/

There's also the #datalibs hashtag on Twitter, which is sadly a formal
communication network for us.  Much of this is very US-centric, because
we're dealing with the tangled web of OSTP responses.

Elizabeth


On Wed, May 25, 2016 at 1:25 PM, Karin Lagesen <[email protected]> wrote:

    Elizabeth, thanks for a very informative answer!

    You wouldn't happen to have some links to sites where a "normal"
    person could read about stuff like this so that those of us who tend
    to accumulate data like this can actually make sense of these issues?

    Karin (who is frequently confused regarding things like this)



    On 25/05/16 18:43, E.W. wrote:

        Hi Matthias,

        A big part of my job is attempting to answer this question for
        researchers at my (US) university and as part of a team
        developing a data repository.

        I've done some analysis on the licenses that are used by
        datasets within DataCite records.  As of December 2015, when I
        scraped the data, 59% of the records had a rights statement and
        95% of those were in the Creative Commons family.  Yanking more
        out of my slides, when looking at the CC uses: 62% CC-BY-NC; 36%
        CC-BY; 1% CC0; <1% other.  These numbers are heavily biased
        towards specific repositories using stock licenses for all their
        records and having a high volume of records, so these should not
        be interpreted as data representing the self-deposit data world.
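
The percentages above are nested (CC share of records with a rights
statement, then license share within CC), so it can help to convert them
to shares of all DataCite records. A minimal sketch of that arithmetic,
using only the rounded figures quoted in the message; the variable and
function names are illustrative and not part of the original analysis:

```python
# Rounded percentages quoted in the message (December 2015 scrape).
with_rights = 0.59        # records carrying any rights statement
cc_among_rights = 0.95    # of those, share in the Creative Commons family
cc_breakdown = {"CC-BY-NC": 0.62, "CC-BY": 0.36, "CC0": 0.01}


def share_of_all_records(share_within_cc):
    """Convert a share within CC-licensed records to a share of all records."""
    return with_rights * cc_among_rights * share_within_cc


# Share of all records that carry any CC license.
cc_total = with_rights * cc_among_rights

# Share of all records carrying each specific CC license.
overall = {name: share_of_all_records(s) for name, s in cc_breakdown.items()}

if __name__ == "__main__":
    print(f"Any CC license: {cc_total:.1%} of all records")
    for name, frac in overall.items():
        print(f"{name}: {frac:.1%} of all records")
```

Because the inputs are rounded, the results are approximate, and the
caveat about a few high-volume repositories dominating the counts
applies to them just as much.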

        CC has a nice wizard to select a license from, but CC0 or CC-BY
        are usually the ones we (the data repository team I work in) try
        to recommend to people for open data.  I can provide
        unapologetically biased opinions about which to use, but I shall
        refrain unless prodded.

        There may be a domain repository that specializes in this kind
        of data and they likely have some recommendations.  As far as
        adding it goes, most repositories just have a declaration on the
        splash page for the dataset, within the metadata, and sometimes
        a copy of the license as part of the file set.
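
For a Git repository, the "copy of the license as part of the file set"
case roughly matches the LICENSE-plus-README approach asked about in the
original question. A minimal sketch of a check for that, assuming common
file-name conventions; the name lists and the function are hypothetical
and not taken from any repository's requirements:

```python
from pathlib import Path

# Assumed naming conventions; adjust to whatever the project actually uses.
LICENSE_NAMES = ("LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING")
README_NAMES = ("README", "README.md", "README.txt", "README.rst")


def check_license_declaration(repo_dir):
    """Return (license_file, readme_mentions_license) for a directory.

    license_file is a Path to the first license file found, or None.
    readme_mentions_license is True if a README exists and contains the
    word "license" (or "licence"), which is only a rough proxy for
    "the README points to the license".
    """
    repo = Path(repo_dir)
    license_file = next(
        (repo / name for name in LICENSE_NAMES if (repo / name).is_file()), None
    )
    readme = next(
        (repo / name for name in README_NAMES if (repo / name).is_file()), None
    )
    mentions = False
    if readme is not None:
        text = readme.read_text(encoding="utf-8", errors="replace").lower()
        mentions = "license" in text or "licence" in text
    return license_file, mentions


if __name__ == "__main__":
    found, mentioned = check_license_declaration(".")
    print("License file:", found if found else "none found")
    print("README mentions a license:", mentioned)
```

This only confirms that the files exist; it says nothing about whether
the chosen license suits the data or whether the repository metadata
carries a matching rights statement.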

        But to focus more on the third item, please do consider formally
        depositing this into a data repository of some sort (versus just
        having a public GitHub repo).  Zenodo has hooks to GitHub and
        issues DataCite metadata when it generates the DOI.  Figshare
        does this as well, but Zenodo has better editing capabilities
        for the metadata.  I'm happy to brain dump about this more
        offline for the curious, or if you're confused as to how to use
        the elements (this is an open offer to anyone out there
        wrangling with DataCite metadata).

        As far as other considerations about the question of making things
        public, it depends on the source and content of the data.

        1) Is work on the content creation and editing of these data
        files done?  You don't want to potentially be changing content
        under people's feet if they are working with the data.  There
        are ways to version the data and I can expand on this if it is
        an issue.

        2) Are there any data sensitivities?  For example: Is this human
        subject data?  Could this potentially have a harmful impact on
        any subjects?  Looks like these are just models, so likely not,
        but always consider this.

        3) Are there any contractual or licensing sensitivities for
        making this open?  For example, are these data files derived
        from a source with restrictions on such derivatives?  Any other
        contracts or IP issues with tools used or the University with
        regard to licensing?  University IP concerns are highly variable
        by local laws and policies, but something to consider if they
        would want to have a stake in this.

        Just some things to chew on.

        Elizabeth
        (Data Curation Specialist, Research Data Service, University of
        Illinois)

        On Wed, May 25, 2016 at 11:05 AM, Matthias Nilsson
        <[email protected]> wrote:

             Hi,

             I got a question at work today that I felt unable to answer,
             so I thought I'd pass the question on to more knowledgeable
             people.

             At my institution we have a set of metabolic models, which are
             basically descriptions of reactions and metabolites and so on,
             stored in an SBML[0] file. Internally, we have started to move
             them to private Git repositories, but would now like to make
             them public.

             As far as we can tell, there are no requirements from the
             institution or the university on which type of license to
             choose, apart from that the data should be "open".

             So what I'd like to know is this:

             1. What licenses are recommended for data? I've looked at
             Creative Commons and Open Data Commons, but I suspect that
             there may be more.

             2. How do we actually license things? Is it enough to add a
             file called LICENSE to the repository and point to it in the
             README?

             3. Is there anything else that we should consider when making
             the transition from private to public?


             Best regards,
             Matthias


             [0] A format based on XML.
             _______________________________________________
             Discuss mailing list
             [email protected]
             http://lists.software-carpentry.org/listinfo/discuss




