Many, probably even most (though I have not checked), of the datasets available in R packages have help files with a References section. That section should lead you to an original source, which may carry the copyright information and is what you should cite.
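As a rough sketch of how one might check a dataset's documentation before reusing it: the example below uses the built-in mtcars data purely for illustration (it is not one of the datasets discussed here), and the temporary output path is likewise just an assumption for the example.

```r
# List the datasets shipped with a package, with their one-line titles.
items <- data(package = "datasets")$results[, c("Item", "Title")]
head(items)

# Open the help page and look for its "Source" and "References" sections.
# Guarded so that the script also runs non-interactively.
if (interactive()) help("mtcars", package = "datasets")

# Ask how the distributing package itself wants to be cited.
citation("datasets")

# Export to CSV for use in other software (illustrative path in tempdir()).
out <- file.path(tempdir(), "mtcars.csv")
write.csv(mtcars, out, row.names = TRUE)
```

The exported file carries none of the documentation, so it is worth posting the source and reference taken from the help page alongside it.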
My understanding (but I am not a lawyer, do not play one on TV, and do not claim to be any type of legal expert) is that you cannot copyright facts, but you can copyright the layout and presentation of facts. So real data about the real world cannot be copyrighted, but the way it is laid out and presented can be. If you photocopy a page from a journal and post it, you may be in trouble for copying and distributing the layout and presentation of the data, but not for the data itself. If instead you transcribe the numbers into a file to be read by the computer, then you have copied only the facts, which are not copyrighted.
On the other hand, simulated or otherwise made-up datasets could be considered fiction and therefore able to be copyrighted. I remember hearing (but I don't remember where or when) that some textbook authors are encouraged to use simulated data instead of real data in textbooks (it can have the same mean, sd, etc. as a real dataset, so the interpretation is the same) so that the copyright of the textbook also applies to the data. It is not always clear whether a dataset is real or simulated, so it is best to obtain permission or check official statements from the source.
Beyond what is legal, you should consider what is right. Even if you don't have to cite a data source, you should try to give credit where it is due (and possibly blame if there is an error). At a minimum you should cite original sources when they can be found, and also mention where you obtained the data if not from the original source. Think of the effort that people went through to collect the data and make it available to you. How would you feel if you put that much effort into something and then someone else stole the credit or other rewards? Many data sources have statements on how the data can be used, and it is best to follow those instructions and requests. Is it really that hard to add a reference to where the data came from and how you obtained it?
In some educational cases it may be better to hide the source of the data at first. For example, the outliers dataset in the TeachingDemos package would be a lot less useful for its intended purpose if students were to read its help page before analyzing it. I therefore have no problem with teachers using it without telling students where it came from (and since it is simulated, I could possibly claim copyright), though I would appreciate a mention after the fact: once the lesson is learned, the teacher could say "by the way, this data came from ...". I expect that others would feel similarly (I should add a note to that effect to the documentation page). But you should check the sources to see whether this is specifically allowed or disallowed.
I probably have not fully answered your question, but hopefully this gives a little more guidance.

On Tue, Apr 22, 2014 at 11:29 AM, Soeren Groettrup <soeren.groett...@gmail.com> wrote:
> Hi everybody,
>
> I've been searching the web for quite a time now and haven't found a
> satisfying answer. I was wondering if the datasets provided within the R
> packages are open, and thus can be used in publications? Concretely, can the
> data, for example, be exported from R and uploaded in a different format
> (like csv) to a website to be accessible for students to work with the data
> in SPSS or Matlab? Is it enough to cite the source or paper or do I need a
> permission for every dataset?
>
> Thanks in advance for your replies,
> Sören Gröttrup
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com