Re: [R] Staging area for data before read into R
Stephen,

One of the big problems with spreadsheets (other than the column limit in some) is that the standard entry mode allows too much flexibility, which does nothing to help you avoid data entry errors. The web page http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html has some examples of this going wrong, including one that happened to my group: the column for dates was not preformatted, the dates were entered using European format, and Excel did 2 different wrong things with them, making it very difficult to do anything with the data without major extra work.

If you are going to stick with a spreadsheet, then at a minimum you should start by naming all your columns, then formatting each column based on the type of data you expect to be entered there.

Going the database route does not take that much learning to get started. You can use MS Access or the OpenOffice database: create a new table and enter the names of each column along with the data type. This is a big advantage in that it will not allow you to enter character data where numbers are expected, forces dates to look like dates, etc. It is not that much extra work to enter valid levels for what will become factors (e.g. Male and Female for sex, so that those are the only values allowed; my current record for datasets entered by others using spreadsheets is 9 sexes). Then you can pick up more as you go along, but setting up the first database to enter data should only take you an hour or so to learn the basics.

Another option is to just use R. The following code gives one approach that could get you started entering data:

    # two character columns followed by five numeric columns, all empty
    tmp <- rep( list(character(0), numeric(0)), c(2,5) )
    names(tmp) <- c( 'ID','Sex', paste('Stream', 1:5, sep='') )
    # stringsAsFactors = TRUE is needed in R >= 4.0.0 so that Sex becomes a factor
    tmp <- as.data.frame(tmp, stringsAsFactors = TRUE)
    levels(tmp$Sex) <- c('Female','Male')
    tmp$ID <- as.character(tmp$ID)
    # opens the spreadsheet-like data editor
    mydata <- edit(tmp)

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
801.408.8111

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] project.org] On Behalf Of stephen sefick
Sent: Monday, October 20, 2008 7:02 PM
To: R Help
Subject: Re: [R] Staging area for data before read into R

Well, I am going to type in every value because the data sheets are of counts of insects that I identified, so I should be okay with accuracy... I really just need something that allows for more than 256 columns, as I have encountered over 256 species of insects in even small streams. I think Calc with its 1000ish columns will do the trick... thanks everybody for your help.

On Mon, Oct 20, 2008 at 8:25 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote:
There is a list of free spreadsheets with their row and column limits at this link: http://en.wikipedia.org/wiki/OpenOffice.org_Calc

On Mon, Oct 20, 2008 at 3:13 PM, stephen sefick [EMAIL PROTECTED] wrote:
sorry excel 2003 with no immediate update in the future.

On Mon, Oct 20, 2008 at 3:12 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote:
You didn't say which version of Excel you are using but Excel 2007 allows 16,384 columns.

On Mon, Oct 20, 2008 at 2:27 PM, stephen sefick [EMAIL PROTECTED] wrote:
I am wondering if there is a better alternative than Excel for data storage that does not require database knowledge (I will eventually have to learn this, but it is not on my immediate todo list). I need something that is not limited to 256 columns... I don't need any of the built in functions in excel just a spreadsheet like program with cells that hold data in a data.frame format for a staging area before I get it into R. Any help would be greatly appreciated. This is not a direct r question, but all of you folks have more experience than I do and I am having a time finding what I need with google.

thanks in advance

--
Stephen Sefick
Research Scientist
Southeastern Natural Sciences Academy

Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
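
To illustrate the points about valid factor levels and date formats, here is a minimal R sketch. The data frame, its column names, and its values are made up for the example, and it assumes the dates were meant to be typed day/month/year. It is one possible way to flag suspect entries after the fact, not a replacement for validating at entry time.

```r
# Hypothetical raw entries, as they might arrive from a free-form spreadsheet.
raw <- data.frame(
  ID   = c("A1", "A2", "A3", "A4"),
  Sex  = c("Male", "female", "F", "Female"),
  Date = c("03/04/2008", "13/04/2008", "21/10/2008", "2008-10-21"),
  stringsAsFactors = FALSE
)

# Fixing the allowed levels turns every unexpected spelling into NA,
# so the offending rows can be listed and corrected.
sex <- factor(raw$Sex, levels = c("Female", "Male"))
bad_sex <- raw$ID[is.na(sex)]

# Parsing with an explicit day/month/year format avoids guessing;
# entries that do not match the format become NA.
dates <- as.Date(raw$Date, format = "%d/%m/%Y")
bad_date <- raw$ID[is.na(dates)]
```

Here `bad_sex` and `bad_date` hold the IDs of rows that need a second look before analysis.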
Re: [R] Staging area for data before read into R
This is also the sort of thing that EpiData does very well. That's what it was designed for: data entry with minimal errors. It also simplifies double data entry for error checking, if you need/want that.

--Chris

Christopher W. Ryan, MD
SUNY Upstate Medical University Clinical Campus at Binghamton
40 Arch Street, Johnson City, NY 13790
cryanatbinghamtondotedu
PGP public keys available at http://home.stny.rr.com/ryancw/

Greg Snow wrote:
Stephen, . . . . . Going the database route is not that much to learn to get started. You can use MSAccess, or the OpenOffice database, create a new table and enter the names of each column along with the data type [...]
Re: [R] Staging area for data before read into R
Excel has a data validation facility and also has data input forms to facilitate data entry.

On Tue, Oct 21, 2008 at 1:45 PM, Greg Snow [EMAIL PROTECTED] wrote:
Stephen, One of the big problems with spreadsheets (other than the column limit in some) is that the standard entry mode allows too much flexibility which does nothing to help you avoid data entry errors. [...]
Re: [R] Staging area for data before read into R
On 22/10/2008, at 8:18 AM, Ted Byers wrote:
snip ... even with all the power and utility of Excel ... snip

Is this some kind of joke?

cheers,
Rolf Turner
Re: [R] Staging area for data before read into R
On Tue, Oct 21, 2008 at 3:18 PM, Ted Byers [EMAIL PROTECTED] wrote:
There are tradeoffs no matter what route you take. You can do validation in Access as you can in Excel, but Excel is not designed to manage data where Access is, and both are crippled by their dependence on VB (a seriously broken language: fine for scripting

MS Excel can do validation without VB. For example, you can restrict data to a certain range of dates, limit choices by using a list, or make sure that only positive whole numbers are entered, all without any VB.
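
For data that has already been typed in, the same three kinds of rule (a date range, a list of allowed choices, positive whole numbers) can be checked in R. This is only a sketch under assumptions: the `staged` data frame, its column names, the allowed stream names, and the date range are all hypothetical.

```r
# Hypothetical staged records; column names are illustrative only.
staged <- data.frame(
  Stream = c("Upper", "Lower", "Middle", "Upper"),
  Date   = as.Date(c("2008-06-01", "2008-07-15", "2009-01-02", "2008-08-30")),
  Count  = c(12, 3.5, 7, -1),
  stringsAsFactors = FALSE
)

allowed_streams <- c("Upper", "Lower")
date_range <- as.Date(c("2008-01-01", "2008-12-31"))

# One logical vector per rule; TRUE means the row passes that rule.
stream_ok <- staged$Stream %in% allowed_streams
date_ok   <- staged$Date >= date_range[1] & staged$Date <= date_range[2]
count_ok  <- staged$Count > 0 & staged$Count == round(staged$Count)

# Row numbers failing at least one rule.
failed_rows <- which(!(stream_ok & date_ok & count_ok))
```

Keeping one logical vector per rule makes it easy to report which rule a given row broke, rather than only that it failed.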
Re: [R] Staging area for data before read into R
I wasn't suggesting that the validation requires VB. Creating forms and handling form events does (unless MS has introduced new utilities to hide all that since last I used it). Some of the most interesting things I have seen done with Excel did involve VB, and there are better tools to do most of those things.

Gabor Grothendieck wrote:
MS Excel can do validation without VB. [...]
Re: [R] Staging area for data before read into R
No. Excel, like most spreadsheets, does what it was designed for reasonably well. It is easy to find fault, but not so easy to satisfy all one's critics. There is no doubt that Excel has faults, but it provides significant modelling and analysis capability to users with no programming expertise or limited experience using IT.

I have used it as a teaching tool for very basic modelling with undergraduate students who would not have been able to do any modelling without it. In a one session course, there just isn't time to teach students enough programming in any language for them to have a hope of producing an interesting model. But they can produce an interesting model with some guidance using Excel. Similarly, they can do elementary data analysis by entering their data into Excel and using it to analyse them.

Excel was designed primarily for business people, and I have seen them use it effectively, doing things I don't fully understand (as I am not a businessman). But these same people would go into a catatonic state the moment a discussion becomes technical or mathematical. They describe Excel as powerful, and until I become an expert MBA type, I won't knock them for that. If they find it useful, why would I argue with them?

Don't get me wrong, I do not normally use it, and for 99% of the work I do, it provides no value to me, so I do not have it installed on my own systems. I am better served by C++, Java, and the related tools specific to my work. But that it isn't useful to me, or apparently you, is not sufficient grounds to question its utility for others (neither is the existence of bugs, as ALL software has bugs: MS makes for an easy target, but I try to be as fair to them as I am to an independent developer who works alone - let's not have this degenerate into an attack on MS, please).

As a software engineer myself, I won't knock the work of another just because what he's produced isn't particularly useful for me. I won't even knock him if I don't agree with the design decisions he's made. When that happens, it is likely I was not part of his intended market: nothing more can be implied.

Rolf Turner-3 wrote:
On 22/10/2008, at 8:18 AM, Ted Byers wrote:
snip ... even with all the power and utility of Excel ... snip
Is this some kind of joke?
cheers, Rolf Turner
Re: [R] Staging area for data before read into R
You can create data entry forms without VB in Excel too.

On Tue, Oct 21, 2008 at 5:09 PM, Ted Byers [EMAIL PROTECTED] wrote:
I wasn't suggesting that the validation requires VB. Creating forms and handling form events does (unless MS has introduced new utilities to hide all that since last I used it). [...]
Re: [R] Staging area for data before read into R
Ah, OK. That is new since I used Excel last. Thanks

On Tue, Oct 21, 2008 at 5:52 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote:
You can create data entry forms without VB in Excel too. [...]
Re: [R] Staging area for data before read into R
It was in Excel 2003 too so you must not have used Excel for years.

On Tue, Oct 21, 2008 at 6:36 PM, Ted Byers [EMAIL PROTECTED] wrote:
Ah, OK. That is new since I used Excel last. Thanks [...]
Re: [R] Staging area for data before read into R
I fully agree: with important or large data sets you cannot be paranoid enough.

Linux and the Mac allow you to easily write scripts that handle dumping, zipping, copying (locally and elsewhere) and verifying the data. Once written correctly and tested, they can run fully automatically with cron. Been doing this for 15 years.

And where you are advised to burn 2 DVDs, burn 5 each. Read the data on at least two different hardwares and operating systems. Send at least one of each by courier to a collaborating colleague on a different continent.

As they say, different hard disk, different power supply, different earthquake :-)-O

el

On 21 Oct 2008, at 21:18 , Ted Byers wrote:
[...]
Dr. Snow is right in recommending going the route of using an RDBMS and in saying that it isn't that hard to get started. I'd be recommending PostgreSQL, though, since it is relatively easy to use, and it has pl/r (which lets you run R code within stored procedures in the DB) which carries obvious advantages.
[...]
If I were in his place, I'd say my data is sacred, and can not be replaced (just as you can't step into the same stream twice); and therefore I'd use a RDBMS to manage it, and the very moment it is all entered, I'd make a backup of both the data (e.g. in MySQL I'd use mysqldump) AND the software, and copy both backups to two CDs or DVDs. And, if the data were originally recorded on paper, I'd be scanning the pages and copying those images onto a couple CDs or DVDs also: with two copies on optical media, one copy can be stored in a fireproof vault while the other is in the office ready to be used should a HDD fail, or some other disaster interrupt my work. OK, so I'm paranoid about my data, but I'd rather go the extra mile than risk losing it.

--
Dr. Eberhard W. Lisse
Obstetrician Gynaecologist (Saar)
[EMAIL PROTECTED] el108-ARIN
Telephone: +264 81 124 6733 (cell)
PO Box 8421, Bachbrecht, Namibia
Please send DNS/NA-NiC related e-mail to [EMAIL PROTECTED]
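
The copy-and-verify step of such a backup script could be sketched in R as below. This is only an illustration, not el's actual scripts: the function name `backup_and_verify`, the file name, and the use of a temporary directory in place of real backup storage are assumptions, and scheduling with cron, zipping, and off-site copies are left out.

```r
# Copy a data file to a backup directory and confirm the copy is identical.
# Paths are hypothetical; a temporary directory stands in for real storage.
backup_and_verify <- function(src, dest_dir) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(dest_dir, basename(src))
  copied <- file.copy(src, dest, overwrite = TRUE)
  # tools::md5sum returns one MD5 checksum per file (NA if unreadable).
  sums <- unname(tools::md5sum(c(src, dest)))
  copied && !anyNA(sums) && sums[1] == sums[2]
}

# Example: write a small file, back it up, and check the result.
src <- file.path(tempdir(), "insect_counts.csv")
write.csv(data.frame(Stream = "Upper", Count = 12), src, row.names = FALSE)
ok <- backup_and_verify(src, file.path(tempdir(), "backup"))
```

Comparing checksums rather than file sizes catches a copy that has the right length but different contents.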
[R] Staging area for data before read into R
I am wondering if there is a better alternative than Excel for data storage that does not require database knowledge (I will eventually have to learn this, but it is not on my immediate todo list). I need something that is not limited to 256 columns... I don't need any of the built-in functions in Excel, just a spreadsheet-like program with cells that hold data in a data.frame format, as a staging area before I get it into R. Any help would be greatly appreciated. This is not a direct R question, but all of you folks have more experience than I do and I am having a hard time finding what I need with Google.

thanks in advance

--
Stephen Sefick
Research Scientist
Southeastern Natural Sciences Academy
Re: [R] Staging area for data before read into R
Sorry, Excel 2003, with no immediate update in the future.

On Mon, Oct 20, 2008 at 3:12 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote:
You didn't say which version of Excel you are using but Excel 2007 allows 16,384 columns. [...]
Re: [R] Staging area for data before read into R
Define "better". Really, it depends on what you need to do (are all your data appropriately represented in a 2D array?) and what resources are available. If all your data can be represented using a 2D array, then Excel is probably your best bet for the near term. If not, you might as well bite the bullet and learn to use an RDBMS, as there are few other data management options that can cope with relational, hierarchical, or object-oriented data.

I use a number of different RDBMSs (ranging from MS SQL to PostgreSQL and MySQL). I also use Excel on occasion, and plain text editors (like Emacs), to create CSV files. Which one I use depends on the details of the particular problem I am facing.

While I have not yet explored them, I did notice that R includes a number of facilities for editing data (and the list of options is all the longer when I use help.search("edit")). It may be a bit quicker for you to study up on basic use of something like PostgreSQL, combined with PL/R (something I wish MySQL had), than it would be to diligently examine all the different options open to you using R. (I have a couple of books I could recommend that would likely be sufficient for you to figure out what you need to do with either PostgreSQL or MySQL in a matter of a week or two.)

HTH

Ted
Re: [R] Staging area for data before read into R
Why not just write your data to a CSV (comma-separated values) or a tab-separated text file? You didn't say what software and/or hardware was generating your data, but most gizmos these days let you dump data to CSV. No need for Excel at all.

I forget :-( how many rows/columns OpenOffice.org or KOffice can handle.

Carl
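As a rough sketch of the plain-text route suggested here, the following writes a small table with more than 256 columns to a CSV file and reads it back into R. The temporary file, the site IDs, and the column names (Species1 to Species300) are all made up for illustration; none of it comes from the original data.

```r
# Hypothetical example: 3 sites by 300 species counts (all names invented).
n_species <- 300
counts <- as.data.frame(matrix(0L, nrow = 3, ncol = n_species))
names(counts) <- paste0("Species", seq_len(n_species))
to_stage <- data.frame(SiteID = c("A", "B", "C"), counts,
                       stringsAsFactors = FALSE)

# Write to a CSV file; a plain text file has no 256-column limit.
csv_file <- tempfile(fileext = ".csv")
write.csv(to_stage, csv_file, row.names = FALSE)

# Read it back into a data frame and check its size.
staged <- read.csv(csv_file, stringsAsFactors = FALSE)
dim(staged)
```

Any text editor or spreadsheet that can save CSV could produce such a file; the point is only that R itself does not care how wide it is.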
Re: [R] Staging area for data before read into R
Hi Stephen,

You don't say what staging is - do you mean for data entry or loading a data file for review, or ... ?

In general, I keep away from Excel for data transfer purposes. It tends to make "intelligent" decisions on data types, leading to strange, bizarre results (unless you explicitly type each column - which most users don't do). Integers are interpreted as dates, leading zeros are stripped off of ZIP codes, and the like.

HTH,
Jim Porzak
TGN.com
San Francisco, CA
http://www.linkedin.com/in/jimporzak
useR Group SF: http://ia.meetup.com/67/
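This does not change what Excel does, but the same kind of loss can happen on the R side if a text file is read with default type guessing. A minimal sketch, using made-up column names and values, of declaring the column types explicitly with the colClasses argument of read.csv():

```r
# Hypothetical data: ZIP codes with leading zeros (values invented).
csv_text <- "ID,ZIP,Count\n1,02134,5\n2,00501,7\n"

# Default type guessing reads ZIP as a number and drops the leading zeros.
guessed <- read.csv(text = csv_text)

# Declaring the column types keeps ZIP as text.
typed <- read.csv(text = csv_text,
                  colClasses = c(ID = "character",
                                 ZIP = "character",
                                 Count = "integer"))

guessed$ZIP  # integer values, leading zeros lost
typed$ZIP    # character values, leading zeros kept
```

With a real file, the file path would replace the text argument; the colClasses part stays the same.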
Re: [R] Staging area for data before read into R
There is a list of free spreadsheets with their row and column limits at this link: http://en.wikipedia.org/wiki/OpenOffice.org_Calc
Re: [R] Staging area for data before read into R
Well, I am going to type in every value because the data sheets are of counts of insects that I identified, so I should be okay with accuracy... I really just need something that allows for more than 256 columns as I have encountered over 256 species of insects in even small streams. I think Calc with its 1000ish columns will do the trick... thanks everybody for your help.

-- Stephen Sefick
Re: [R] Staging area for data before read into R
How about simply a text editor, typing in your data, separated by commas or tabs or spaces? One row for each case/subject/observation? R can read that in easily.

A good, open-source, free data entry program is EpiData (www.epidata.dk). It is simple to use but probably more than you need for this task.

--Chris
Christopher W. Ryan, MD
SUNY Upstate Medical University Clinical Campus at Binghamton
40 Arch Street, Johnson City, NY 13790
cryanatbinghamtondotedu
PGP public keys available at http://home.stny.rr.com/ryancw/

If you want to build a ship, don't drum up the men to gather wood, divide the work and give orders. Instead, teach them to yearn for the vast and endless sea. [Antoine de St. Exupery]
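For what it is worth, reading hand-typed, whitespace-separated rows might look like the sketch below. The column names and values are invented, and the lines are held in a character vector only to keep the example self-contained; with a real file you would pass its path as the first argument of read.table() instead of using text.

```r
# Hypothetical hand-typed rows, one line per observation (values invented).
typed_lines <- c(
  "ID  Sex    Stream1 Stream2",
  "s01 Female 3       0",
  "s02 Male   1       4"
)

# read.table() splits on any run of whitespace by default,
# so spaces and tabs both work as separators.
mydata <- read.table(text = typed_lines, header = TRUE,
                     stringsAsFactors = FALSE)
str(mydata)
```

For comma-separated rows, read.csv() or read.table(..., sep = ",") would be used in the same way.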
Re: [R] Staging area for data before read into R
[EMAIL PROTECTED] wrote on 10/20/2008 06:01:43 PM: Well, I am going to type in every value because the data sheets are of counts of insects that I identified, so I should be okay with accuracy... I really just need something that allows for more than 256 columns as I have encountered over 256 species of insects in even small streams. ...

Oh, ugh, that sounds difficult and prone to entry errors. You might be better off organizing the data in the 'long' format with 4-5 columns:

siteID, subSiteID, species, count, comments

You can then use reshape(), or the reshape package, or even the 'pivot table' available in Excel and other spreadcheats.

Glad you have an answer. Enjoy your day.

cur
-- Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
[EMAIL PROTECTED] 541/754-4638
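The 'long' layout and the reshape() step mentioned above can be sketched roughly as follows. The site IDs, genus names, and counts are made-up illustration values, the subSiteID and comments columns are left out for brevity, and filling missing combinations with 0 is an assumption about what a missing count should mean.

```r
# Hypothetical long-format counts: one row per site/species combination.
long <- data.frame(
  siteID  = c("S1", "S1", "S2", "S2", "S2"),
  species = c("Baetis", "Hydropsyche", "Baetis", "Hydropsyche", "Simulium"),
  count   = c(12L, 3L, 7L, 0L, 5L),
  stringsAsFactors = FALSE
)

# reshape() spreads species into columns: one row per site,
# one "count.<species>" column per species seen anywhere in the data.
wide <- reshape(long, idvar = "siteID", timevar = "species",
                v.names = "count", direction = "wide")

# Site/species combinations that were never entered come out as NA;
# here they are treated as zero counts (an assumption, see above).
wide[is.na(wide)] <- 0L
wide
```

Entered this way, the sheet never needs more than a handful of columns no matter how many species turn up, and the wide table is built only when an analysis needs it.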