[R] Programcode and data in the same textfile
Here is a further improvement on sourcing code and data from the same file: the sourced file no longer needs to specify its own name and location. (Instead, my.stdin grabs this from the environment of the source command, which is one of its ancestors.) It also occurred to me that my.stdin() has one potential advantage over stdin(), even assuming that the problem with stdin() not working in sourced files is eventually fixed in R. When the data are lengthy, it may be better to place them at the end of the code so as not to break it up, and the data read by my.stdin() can be placed anywhere in the file. In the example below, the data for x are placed right after the statement that reads in x, but the data for y and z are placed at the end of the file. The name and path of the file are no longer explicitly specified.

  # source the following file from R

  my.stdin <- function( tag, this.file = eval.parent(quote(file), n=3) )
      textConnection( sub(tag, "", grep(tag, readLines(this.file), value=T)) )

  x <- read.table( my.stdin("^#x"), header=T )
  #x Sex    Response   # this data has a header
  #x Male   1
  #x Male   2
  #x Female 3
  #x Female 4

  y <- read.table( my.stdin("^#y") )
  z <- scan( my.stdin("^#z") )

  # -- data

  #y 3.4 4   # this is first line of y data
  #y 3   3
  #y 6   6

  #z 3 5 4 6 7
  #z 8

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
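Fetching `file` with eval.parent(quote(file), n=3) ties my.stdin() to the call depth and local variable names inside source(). If passing the path explicitly is acceptable, the same tag-prefix idea can be written without that dependency. A minimal sketch: `read_tagged()` and the temporary demo file are hypothetical names, not part of the post, and it assumes an R version whose read.table() and scan() accept a `text` argument:

```r
# Hypothetical helper: return the lines of `path` that start with `tag`,
# with the tag stripped. `tag` is used as a regular expression, so it
# should not contain regex metacharacters.
read_tagged <- function(path, tag) {
  pattern <- paste0("^", tag)
  sub(pattern, "", grep(pattern, readLines(path), value = TRUE))
}

# Demo file in the tagged layout (normally this would be the script itself).
demo <- tempfile(fileext = ".R")
writeLines(c(
  "#x Sex    Response   # this data has a header",
  "#x Male   1",
  "#x Male   2",
  "#x Female 3",
  "#x Female 4",
  "",
  "#z 3 5 4 6 7",
  "#z 8"
), demo)

# read.table() and scan() take the stripped lines through `text`;
# read.table()'s default comment.char = "#" drops the trailing remark.
x <- read.table(text = read_tagged(demo, "#x"), header = TRUE)
z <- scan(text = read_tagged(demo, "#z"), quiet = TRUE)
unlink(demo)

print(x)
print(z)
```

Because the tag is only matched at the start of a line, the tagged blocks can still sit anywhere in the file.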
[R] Programcode and data in the same textfile
My request for a way of having both data and R code in the same text file resulted in a considerable number of very good suggestions, which I will now summarize.

The boundary conditions for the problem were as follows: the data should be written in the text file in a format that is readable to the human eye. This ruled out the 'transposed' way of writing the data that is used in most help files, e.g. in ?model.matrix. As the purpose of the exercise is to make the text file easy to read, there is a limit to how complicated the extra code should be; otherwise it would make matters worse. I don't know if any of the solutions below qualify in this sense, but I surely learned a lot from them.

The most popular idea was using textConnection() in combination with read.table(). For instance, Thomas Hotz wrote it like

  # Solution by Thomas Hotz
  MyFrame <- read.table(textConnection(c(
    'Sex    Respons',
    'Male   1',
    'Male   2',
    'Female 3',
    'Female 4'
  )), header = T)

Gabor Grothendieck had a similar solution. James Holtman provided a nifty trick to get rid of the strategically placed commas and quotation marks, using escaped carriage returns:

  # Solution by James Holtman
  MyFrame <- read.table(textConnection('\
  Sex    Respons \
  Male   1 \
  Male   2 \
  Female 3 \
  Female 4 \
  '), header = T, skip = 1)

Duncan Temple Lang suggested that the entire text file should be wrapped up as XML and parsed via the XML package. For me and my students I think this would be overkill, and I also think it necessarily breaks the one-file boundary condition, but in a larger context it seems like excellent advice.

  # Solution by Duncan Temple Lang
  # Content of myFile.q
  <doc>
  <data>
  Sex    Response
  Male   1
  Male   2
  Female 3
  Female 4
  </data>
  <code>
  ..
  </code>
  </doc>

To read the data,

  library(XML)
  tr = xmlRoot(xmlTreeParse("myFile.q"))
  read.table(textConnection(xmlValue(tr[["data"]])), header=TRUE)

and to access the code text,

  xmlValue(tr[["code"]])

A number of approaches not based on textConnection() emerged, though.
Torsten Hothorn suggested that the data should be surrounded by some kind of print statement that writes them to a temporary file. Then read.table() can be used to retrieve the data:

  # Torsten Hothorn's solution
  tmpfilename <- tempfile()
  tmpfile <- file(tmpfilename, 'w')
  cat('Sex    Respons',
      'Male   1',
      'Male   2',
      'Female 3',
      'Female 4', file = tmpfile, sep = '\n')
  close(tmpfile)
  read.table(tmpfilename, header = TRUE)

Barry Rowlingson suggested that the data should be written as a character vector and then shaped by hand:

  # Barry Rowlingson's solution
  data <- c('Sex',    'Respons',
            'Male',   1,
            'Female', 2,
            'Male',   3,
            'Male',   2)
  ncol <- 2
  nrow <- length(data)/ncol
  heads <- data[1:ncol]; data <- data[-(1:ncol)]
  asDF <- data.frame(matrix(data, ncol=ncol, byrow=T))
  # as.character() first, in case data.frame() turned the column into a factor
  asDF[,2] <- as.numeric(as.character(asDF[,2]))
  names(asDF) <- heads

Finally, Thomas Blackwell and Greg Louis implemented a nice idea, where the data are commented out in the text file, but a call to read.table() from within the file makes it read exactly those lines, using a different convention for comments:

  # Greg Louis' solution
  MyFrame <- read.table('myFile.q', header = T, skip = 28, nrows = 4,
                        comment.char = "")[-1]
  # Sex    Respons
  # Male   1
  # Male   2
  # Female 3
  # Female 4

Exactly how many lines need to be skipped depends on the circumstances; nrows is the number of cases in the data frame.

The original request follows below. Thank you all for participating.

Ernst Hansen
Department of Statistics
University of Copenhagen
[R] Programcode and data in the same textfile
I have the following problem. It is not of earthshaking importance, but still I have spent a considerable amount of time thinking about it.

PROBLEM: Is there any way I can have a single text file that contains both
  a) data
  b) program code
The program should act on the data if the text file is source()'ed into R.

BOUNDARY CONDITION: I want the data written in the text file in exactly the same format as I would use if I had the data in a separate text file, to be read by read.table(). That is, with 'horizontal inhomogeneity' and 'vertical homogeneity' in the type of entries. I want to write something like

  Sex    Respons
  Male   1
  Male   2
  Female 3
  Female 4

In effect, I am asking if there is some way I can convince read.table() that the data are contained in the following n lines of text.

ILLEGAL SOLUTIONS: I know I can simulate the behaviour by reading the columns of the data frame one by one and using data.frame() to glue them together, as in

  data.frame(Sex = c('Male', 'Male', 'Female', 'Female'),
             Respons = c(1, 2, 3, 4))

I do not like this solution, because it represents the data in a transposed way in the text file, and this transposition makes the structure of the data frame less transparent, at least to me. It becomes even less comprehensible if the Sex factor above is written with the help of rep() or gl() or the like.

I know I can make read.table() read from stdin, so I could type the data frame at the prompt. That is against the spirit of the problem, as I describe below.

I know I can make read.table() do the job if I split the data and the program code into different files. But as the purpose of the exercise is to distribute the data and the code to other people, splitting into several files is a complication.

MOTIVATION: I frequently find myself distributing small chunks of code to my students, along with data on which the code can work. As an example, I might want to demonstrate how model.matrix() treats interactions in a certain setting. For that I need a data frame that is complex enough to exhibit the behaviour I want, but still so small that the model matrix is easily understood. So I make such a data frame. I am trying to distribute this data frame along with my code in a way that is as simple as possible to USE for the students (hence the one-file boundary condition) and to READ (hence the non-transposition boundary condition).

Does anybody have any ideas?

Ernst Hansen
Department of Statistics
University of Copenhagen
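Later R releases added a `text` argument to read.table(), which builds the text connection internally, so a table in exactly the row layout asked for here can be pasted into an ordinary multi-line string. A minimal sketch, assuming such an R version; `MyFrame` follows the naming used in the replies:

```r
# The table sits inside a plain multi-line string literal. With the default
# blank.lines.skip = TRUE, read.table() skips the empty first line, so the
# header row is "Sex Respons".
MyFrame <- read.table(header = TRUE, text = "
Sex    Respons
Male   1
Male   2
Female 3
Female 4
")
str(MyFrame)
```

The data stay in rows, one case per line, and the file remains a single sourceable script.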
RE: [R] Programcode and data in the same textfile
On 12-Jun-03 Ernst Hansen wrote:
> I have the following problem. It is not of earthshaking importance, but
> still I have spent a considerable amount of time thinking about it.
>
> PROBLEM: Is there any way I can have a single textfile that contains both
>   a) data
>   b) programcode
> The program should act on the data, if the textfile is source()'ed into R.
>
> BOUNDARY CONDITION: I want the data written in the textfile in exactly the
> same format as I would use, if I had data in a separate textfile, to be
> read by read.table(). That is, with 'horizontal inhomogeneity' and
> 'vertical homogeneity' in the type of entries. I want to write something
> like
>
>   Sex    Respons
>   Male   1
>   Male   2
>   Female 3
>   Female 4
>
> In effect, I am asking if there is some way I can convince read.table(),
> that the data is contained in the following n lines of text.

A thought which occurs to me, and which (as far as I can tell) is not already implemented (at any rate not in read.table(), which is where it could have a natural home), is that, in the same spirit as read.table(file=stdin), one could, if available, use

  read.table(file=<<EOT)

i.e. the "here document" style of redirection that has been a part of Unix since approximately forever (if you take the origin of time as 01/01/70 00:00). Then the above data could be read in from within the source file by

  X <- read.table(header=TRUE, file=<<EOT)
  Sex    Respons
  Male   1
  Male   2
  Female 3
  Female 4
  EOT

I.e. this form of the command would take input from the following lines until EOT is encountered on a line by itself. In the Unix setup, EOT could be anything so long as it does not occur on a line by itself within the data, and it is not included in the content which is read in.

Ted

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 167 1972
Date: 12-Jun-03  Time: 14:21:00
-- XFMail --
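R has no `<<EOT` form for read.table(), but raw string literals, introduced in R 4.0.0, come close to the here document described above: everything between `r"(` and `)"` is taken literally, so quotes and backslashes in the data need no escaping, and the closing `)"` plays the role of the EOT line. A sketch, assuming R 4.0.0 or later:

```r
# Requires R >= 4.0.0 for r"(...)" raw strings. The closing )" acts like the
# EOT line of a here document; nothing inside is treated as an escape.
X <- read.table(header = TRUE, text = r"(
Sex    Respons
Male   1
Male   2
Female 3
Female 4
)")
print(X)
```

As with Ted's EOT, the terminator only has to be a character sequence that does not occur in the data.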
Re: [R] Programcode and data in the same textfile
On Thu, 12 Jun 2003, Barry Rowlingson wrote:

> Eurgh! Does R clean up tempfiles by itself?

Yes. That's what they are for. It happens on normal exit.

	-thomas
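Torsten Hothorn's temporary-file approach can also delete its file as soon as the data have been read, rather than waiting for the end of the session. A sketch; `read_via_tempfile()` is a hypothetical name, and writeLines() stands in for the file()/cat()/close() sequence of the original suggestion:

```r
# Hypothetical helper: write `lines` to a temporary file, read it back with
# read.table(), and delete the file when the function exits (even on error).
read_via_tempfile <- function(lines, ...) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(lines, tmp)
  read.table(tmp, ...)
}

MyFrame <- read_via_tempfile(c("Sex    Respons",
                               "Male   1",
                               "Male   2",
                               "Female 3",
                               "Female 4"),
                             header = TRUE)
print(MyFrame)
```

Even without the explicit unlink(), files created by tempfile() live in the session's temporary directory, which R removes on normal exit.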
Re: [R] Programcode and data in the same textfile
Ernst -

Here's a solution which works for me, and seems to do what you want. It's a bit of a hack, since it requires you, the author, to know in advance what file path name the student will have saved the file as. In my example, this will be "./r.source.file", and this includes one blank line before the first assignment statement below. It also requires knowing how many lines of code precede the data lines. But it _is_ a one-file solution, as requested.

Put the following 9 or 10 lines into a file named "r.source.file", then source it.


data.01 <- read.table(file="r.source.file", header=T, skip=4,
                      comment.char="")[-1]

# junk Sex Response
# Male 1
# Male 2
# Female 3
# Female 4

I'm quite surprised no one else has suggested this already.

 - tom blackwell  -  u michigan medical school  -  ann arbor  -

On Thu, 12 Jun 2003, Ernst Hansen wrote:

> PROBLEM: Is there any way I can have a single textfile that contains both
>   a) data
>   b) programcode
> The program should act on the data, if the textfile is source()'ed into R.
>
> BOUNDARY CONDITION: I want the data written in the textfile in exactly the
> same format as I would use, if I had data in a separate textfile, to be
> read by read.table(). something like
>
>   Sex    Respons
>   Male   1
>   Male   2
>   Female 3
>   Female 4
>
> MOTIVATION: I frequently find myself distributing small chunks of code to
> my students, along with data on which the code can work. As an example, I
> might want to demonstrate how model.matrix() treats interactions, in a
> certain setting. For that I need a dataframe that is complex enough to
> exhibit the behaviour I want, but still so small that the model.matrix is
> easily understood. So I make such a dataframe. I am trying to distribute
> this dataframe along with my code, in a way that is as simple as possible
> to USE for the students (hence the one-file boundary condition) and to
> READ (hence the non-transposition boundary condition).
>
> Ernst Hansen
> Department of Statistics
> University of Copenhagen
Re: [R] Programcode and data in the same textfile
On 20030612 (Thu) at 1139:34 -0400, Thomas W Blackwell wrote:

> It also requires knowing how many lines of code precede the data lines.
> But it _is_ a one-file solution, as requested.
>
> Put the following 9 or 10 lines into a file named "r.source.file", then
> source it.
>
> data.01 <- read.table(file="r.source.file", header=T, skip=4,
>                       comment.char="")[-1]
>
> # junk Sex Response
> # Male 1
> # Male 2
> # Female 3
> # Female 4

The nrows parameter can help by letting you put the data early in the file:

data.01 <- read.table(file="r.source.file", header=T, skip=4, nrows=4,
                      comment.char="")[-1]

# Sex    Response
# Male   1
# Male   2
# Female 3
# Female 4

print(data.01)
(more code)

(I got the error "line 1 did not have 4 elements" when I left the junk header in place.)

> On Thu, 12 Jun 2003, Ernst Hansen wrote:
>
> PROBLEM: Is there any way I can have a single textfile that contains both
>   a) data
>   b) programcode
> The program should act on the data, if the textfile is source()'ed into R.
>
> BOUNDARY CONDITION: I want the data written in the textfile in exactly the
> same format as I would use, if I had data in a separate textfile, to be
> read by read.table(). something like
>
>   Sex    Respons
>   Male   1
>   Male   2
>   Female 3
>   Female 4

Obviously the above doesn't quite meet the requirement, since the data have to be commented out, but unless someone implements here documents, as another list member suggested, I don't think there's a perfect solution.

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   [EMAIL PROTECTED]        |
| http://wecanstopspam.org in signatures fights junk email |
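The line counting that both the Blackwell and the Louis versions depend on can be automated by searching for the header line instead of hard-coding skip. A sketch under stated assumptions: the demo script is written to a temporary path (a hypothetical stand-in for "r.source.file"), and the header is assumed to be the only line starting with "# Sex":

```r
# Demo script in the commented-data layout: code, then data hidden behind "#".
src <- tempfile(fileext = ".R")
writeLines(c(
  "x <- 1",
  "# Sex    Response",
  "# Male   1",
  "# Male   2",
  "# Female 3",
  "# Female 4",
  "print(x)"
), src)

# Find the header line rather than counting lines by hand.
hdr <- grep("^# Sex", readLines(src))[1]

# comment.char = "" makes "#" an ordinary first field, which [-1] then drops;
# nrows = 4 stops reading before the code that follows the data.
data.01 <- read.table(src, header = TRUE, skip = hdr - 1, nrows = 4,
                      comment.char = "")[-1]
unlink(src)
print(data.01)
```

Here the header and data rows both have three fields ("#" plus two columns), which avoids the column-count mismatch Greg ran into with the extra "junk" field.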
Re: [R] Programcode and data in the same textfile
Thomas W Blackwell writes:
> Ernst -
>
> Here's a solution which works for me, and seems to do what you want.
> It's a bit of a hack, since it requires you, the author, to know in
> advance what file path name the student will have saved the file as.
> In my example, this will be "./r.source.file", and this includes one
> blank line before the first assignment statement below. It also
> requires knowing how many lines of code precede the data lines. But it
> _is_ a one-file solution, as requested.
>
> Put the following 9 or 10 lines into a file named "r.source.file", then
> source it.
>
> data.01 <- read.table(file="r.source.file", header=T, skip=4,
>                       comment.char="")[-1]
>
> # junk Sex Response
> # Male 1
> # Male 2
> # Female 3
> # Female 4
>
> I'm quite surprised no one else has suggested this already.

Nice thinking, Thomas, and good fun indeed.

To take this slightly further, we can hack the history mechanism to read off the name of the file being sourced. If the following lines

  MyHistory <- function() {
    ## basically the first few lines of history()
    file1 <- tempfile("Rrawhist")
    savehistory(file1)
    rawhist <- scan(file1, what = "", quiet = TRUE, sep = "\n")
    unlink(file1)
    rawhist[length(rawhist)]
  }
  cat(strsplit(strsplit(MyHistory(), 'source\\(')[[1]][2], '\\)')[[1]][1], '\n')

are placed in the file foo.q, then the call

  source('foo.q')

will print 'foo.q' on the terminal. Instead of writing it out, it could be passed to read.table(), and with careful line counting it could be combined with your idea of reading lines that are commented out in the 'real reading' of the file. Then it indeed does what I wanted to do. Though my students would be horrified... :-)

And, of course, if it is allowed to write the history to a temporary file and read it again, we might as well write the data to a temporary file, as Torsten Hothorn has already suggested.

Ernst
Re: [R] Programcode and data in the same textfile
This is not a valid solution: R does not necessarily have a history mechanism operational. But if it did, you could use history() not savehistory(). Does no one ever read the help pages?

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595