[R] Programcode and data in the same textfile

2003-06-16 Thread Gabor Grothendieck
Here is a further improvement on sourcing code and 
data from the same file: the sourced file no longer 
needs to specify its own name and location.  (Instead, 
my.stdin() picks the name up from the environment of 
the source() call, which is one of its ancestor frames.)

It also occurred to me that the use of my.stdin() 
does have one potential advantage over stdin(), 
even assuming that the problem with stdin() not 
working in sourced files is ultimately addressed in R.
When the data are lengthy, it may be preferable to 
place them at the end of the file so that they do not 
break up the code.  The data read by my.stdin() can 
be placed anywhere in the file.

In the example below, the data for x are placed right 
after the statement that reads in x, but the data for 
y and z are placed at the end of the file.  The name 
and path of the file are no longer specified explicitly.


# source the following file from R

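# eval.parent(..., n = 3) is meant to reach the 'file' argument of source();
# the number of frames depends on source()'s internals and may differ by R version
my.stdin <- function( tag, this.file = eval.parent(quote(file), n = 3) )
  textConnection( sub(tag, "", grep(tag, readLines(this.file), value = TRUE)) )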

x <- read.table( my.stdin("^#x"), header = TRUE )
#x Sex Response # this data has a header
#x Male 1 
#x Male 2 
#x Female 3 
#x Female 4 

y <- read.table( my.stdin("^#y") )
z <- scan( my.stdin("^#z") )

# -- data

#y 3.4 4  # this is first line of y data
#y 3 3 
#y 6 6 

#z 3 5 4 6 7
#z 8
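
The tag-stripping step inside my.stdin() can be tried without source() by naming the file explicitly, which sidesteps the frame counting in eval.parent(). A minimal sketch, assuming a temporary file stands in for the sourced script; tagged.lines() is a hypothetical helper that performs the same grep()/sub() extraction as my.stdin():

```r
# Hypothetical helper: the same extraction my.stdin() performs, but with the
# file name passed in explicitly instead of looked up in source()'s frame
tagged.lines <- function(tag, file)
  sub(tag, "", grep(tag, readLines(file), value = TRUE))

# A small stand-in for the sourced script, written to a temporary file
f <- tempfile(fileext = ".R")
writeLines(c(
  "x <- 1  # ordinary code line: does not start with a tag",
  "#x Sex Response",
  "#x Male 1",
  "#x Female 3",
  "#z 3 5 4",
  "#z 8"
), f)

# Read each tagged block, closing the text connection afterwards
con <- textConnection(tagged.lines("^#x", f))
x <- read.table(con, header = TRUE)
close(con)

con <- textConnection(tagged.lines("^#z", f))
z <- scan(con, quiet = TRUE)
close(con)

unlink(f)
print(x)
print(z)
```

The same fallback exists in my.stdin() itself: its this.file argument can be given explicitly when the lookup in source()'s frame does not work.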

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] Programcode and data in the same textfile

2003-06-13 Thread Ernst Hansen
My request for a way of having both data and R-code in the same
textfile, resultet in a considerable number of very good suggestions,
that I will now summarize.

The boundary conditions for the problem were as follows: the data
should be written in the textfile in a format that is easy for the
human eye to read.  This ruled out the 'transposed' way of writing
the data that is used in most help files, e.g. in ?model.matrix.

As the purpose of the exercise is to make the textfile easy to read,
there is a limit to how complicated the extra code may be; otherwise
it would make matters worse.  I don't know whether any of the
solutions below qualify in this sense, but I certainly learned a lot
from them.






The most popular idea was to use textConnection() in combination with
read.table().  For instance, Thomas Hotz wrote it like this:


# Solution by Thomas Hotz

   MyFrame <- read.table(textConnection(c(

'Sex    Respons',
'Male   1',
'Male   2',
'Female 3',
'Female 4'

   )), header = TRUE)


Gabor Grothendieck had a similar solution. James Holtman provided a
nifty trick to get rid of the strategically placed commas and
quotation marks, using escaped line breaks:

# Solution by James Holtman

   MyFrame <- read.table(textConnection('\
Sex    Respons \
Male   1 \
Male   2 \
Female 3 \
Female 4 \
   '), header = TRUE, skip = 1)
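
The trick relies on textConnection() treating every newline inside a string as a line break, so one long string behaves like several lines of a file; the empty first line (the newline right after the opening quote) is what skip = 1 discards. A quick check, written here as a plain multi-line string without the trailing backslashes; the second call uses read.table()'s text argument, which later versions of R provide as a shortcut for the same thing:

```r
txt <- "
Sex    Respons
Male   1
Female 3
"

# textConnection() splits the string at its embedded newlines
con <- textConnection(txt)
txt.lines <- readLines(con)
close(con)
print(txt.lines)   # the first element is the empty line before 'Sex'

# Later R versions: read.table() accepts the string directly via 'text'
MyFrame <- read.table(text = txt, header = TRUE)
print(MyFrame)
```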
 

Duncan Temple Lang suggested that the entire textfile should be
wrapped up as XML and parsed via the XML package.  For me and my
students I think this would be overkill, and I also think it
necessarily breaks the one-file boundary condition, but in a
larger context it seems like excellent advice.

# Solution by Duncan Temple Lang

# Content of myFile.q
   <doc>
   <data>
Sex    Response
Male   1
Male   2
Female 3
Female 4
   </data>

   <code>
..
   </code>
   </doc>

To read the data,

 library(XML)
 tr = xmlRoot(xmlTreeParse("myFile.q"))
 read.table(textConnection(xmlValue(tr[["data"]])), header=TRUE)

and to access the code text

 xmlValue(tr[["code"]])




A number of approaches not based on textConnection() emerged, though.

Torsten Hothorn suggested that the data should be wrapped in some
kind of print statement that writes them to a temporary file.  Then
read.table() could be used to retrieve the data:

# Torsten Hothorn's solution:

  tmpfilename <- tempfile()
  tmpfile <- file(tmpfilename, 'w')
  cat(
  'Sex    Respons',
  'Male   1',
  'Male   2',
  'Female 3',
  'Female 4',
  file = tmpfile, sep='\n')
  close(tmpfile)
  read.table(tmpfilename, header = TRUE)



Barry Rowlingson suggested that the data should be written as a
character vector and then reshaped by hand:

# Barry Rowlingson's solution

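   data <- c(

 'Sex', 'Respons',
 'Male',   1,
 'Female', 2,
 'Male',   3,
 'Male',   2    # no trailing comma: c(..., ) fails with an empty argument

 )

   ncol <- 2
   nrow <- length(data)/ncol

   heads <- data[1:ncol]; data <- data[-(1:ncol)]
   asDF <- data.frame(matrix(data, ncol = ncol, byrow = TRUE))

   # as.character() first: if the column became a factor, as.numeric()
   # alone would return the level codes instead of the values
   asDF[,2] <- as.numeric(as.character(asDF[,2]))
   names(asDF) <- heads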
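
One pitfall with this reshaping: in R versions where data.frame() converts character columns to factors (the default before R 4.0.0), as.numeric() applied directly to such a column returns the internal level codes rather than the printed values; going through as.character() first avoids this. A small illustration with made-up values:

```r
resp <- factor(c("10", "5", "7"))   # levels sort as strings: "10" "5" "7"

codes  <- as.numeric(resp)                 # level codes: 1 2 3
values <- as.numeric(as.character(resp))   # printed values: 10 5 7
print(codes)
print(values)
```

With the single-digit responses 1, 2, 3 used above, the codes happen to coincide with the values, which makes the problem easy to miss.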


Finally, Thomas Blackwell and Greg Louis implemented a nice idea
in which the data are commented out in the textfile, but a call to
read.table() from within the file reads exactly those lines, using a
different convention for comments: with comment.char = "" the leading
'#' is read as an ordinary first column, which [-1] then drops:

# Greg Louis' solution

   MyFrame <- read.table('myFile.q', header = TRUE,
  skip = 28, nrows = 4, comment.char = "")[-1]
   # Sex    Respons
   # Male   1 
   # Male   2 
   # Female 3 
   # Female 4 

Exactly how many lines need to be skipped depends on the
circumstances; nrows is the number of cases in the data frame.
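
Counting the lines to skip by hand is fragile, since every edit above the data shifts the count. One way around it is to mark the start of the data with a line beginning with a distinctive marker and let the code find it. A minimal sketch; read.commented() and the '#BEGIN-DATA' marker are hypothetical names chosen for illustration, and a temporary file stands in for the sourced script:

```r
# Hypothetical helper: locate the marker line, then read the commented-out
# table that follows it (header line first), dropping the '#' column
read.commented <- function(file, marker, nrows) {
  start <- which(startsWith(readLines(file), marker))[1]
  if (is.na(start)) stop("marker not found: ", marker)
  read.table(file, header = TRUE, skip = start, nrows = nrows,
             comment.char = "")[-1]
}

# Stand-in script: the call line contains the marker, but not at the
# start of the line, so it is not mistaken for the marker line itself
f <- tempfile(fileext = ".R")
writeLines(c(
  "MyFrame <- read.commented(f, '#BEGIN-DATA', nrows = 2)",
  "#BEGIN-DATA",
  "# Sex    Respons",
  "# Male   1",
  "# Female 3",
  "print(MyFrame)"
), f)

MyFrame <- read.commented(f, "#BEGIN-DATA", nrows = 2)
unlink(f)
print(MyFrame)
```

nrows still has to be given, but lines of code can now be added above the data without breaking the call.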
 


The original request follows below.

Thank you all for participating.


Ernst Hansen
Department of Statistics
University of Copenhagen




Ernst Hansen writes:
  I have the following problem.  It is not of earthshaking importance,
  but still I have spent a considerable amount of time thinking about
  it. 
  
  PROBLEM: Is there any way I can have a single textfile that contains
  both
  
a) data
  
b) programcode
  
  The program should act on the data, if the textfile is source()'ed
  into R.
  
  
  BOUNDARY CONDITION: I want the data written in the textfile in exactly
  the same format as I would use, if I had data in a separate textfile,
  to be read by read.table().  That is, with 'horizontal inhomogeneity'
  and 'vertical homogeneity' in the type of entries.  I want to write
  something like 
  
Sex    Respons
Male   1
Male   2
Female 3
Female 4
  
  In effect, I am asking if there is some way I can convince
  read.table(), that the data is contained in the following n lines of
  text. 
  
  
  ILLEGAL SOLUTIONS:
  I know I can simulate the behaviour by reading the columns of the
  dataframe one by one, and using data.frame() to glue them together.
  Like in 
  
  data.frame(Sex = c('Male', 'Male', 

[R] Programcode and data in the same textfile

2003-06-12 Thread Ernst Hansen
I have the following problem.  It is not of earthshaking importance,
but still I have spent a considerable amount of time thinking about
it. 

PROBLEM: Is there any way I can have a single textfile that contains
both

  a) data

  b) programcode

The program should act on the data, if the textfile is source()'ed
into R.


BOUNDARY CONDITION: I want the data written in the textfile in exactly
the same format as I would use, if I had data in a separate textfile,
to be read by read.table().  That is, with 'horizontal inhomogeneity'
and 'vertical homogeneity' in the type of entries.  I want to write
something like 

  Sex    Respons
  Male   1
  Male   2
  Female 3
  Female 4

In effect, I am asking whether there is some way I can convince
read.table() that the data are contained in the following n lines of
text.


ILLEGAL SOLUTIONS:
I know I can simulate the behaviour by reading the columns of the
dataframe one by one and using data.frame() to glue them together,
as in

data.frame(Sex = c('Male', 'Male', 'Female', 'Female'),
   Respons = c(1, 2, 3, 4))

I do not like this solution, because it represents the data in a
transposed way in the textfile, and this transposition makes the
structure of the dataframe less transparent - at least to me. It
becomes even less comprehensible if the Sex-factor above is written
with the help of rep() or gl() or the like.

I know I can make read.table() read from stdin, so I could type the
dataframe at the prompt.  That is against the spirit of the problem,
as I describe below.


I know I can make read.table() do the job if I split the data and the
programcode into separate files.  But as the purpose of the exercise
is to distribute the data and the code to other people, splitting
them into several files is a complication.


MOTIVATION: I frequently find myself distributing small chunks of code
to my students, along with data on which the code can work.

As an example, I might want to demonstrate how model.matrix() treats
interactions, in a certain setting.  For that I need a dataframe that
is complex enough to exhibit the behaviour I want, but still so small
that the model.matrix is easily understood.  So I make such a
dataframe.

I am trying to distribute this dataframe along with my code, in a way
that is as simple as possible to USE for the students (hence the
one-file boundary condition) and to READ (hence the non-transposition
boundary condition).



Does anybody have any ideas?


Ernst Hansen
Department of Statistics
University of Copenhagen



RE: [R] Programcode and data in the same textfile

2003-06-12 Thread Ted Harding
On 12-Jun-03 Ernst Hansen wrote:
 I have the following problem.  It is not of earthshaking importance,
 but still I have spent a considerable amount of time thinking about
 it. 
 
 PROBLEM: Is there any way I can have a single textfile that contains
 both
 
   a) data
 
   b) programcode
 
 The program should act on the data, if the textfile is source()'ed
 into R.
 
 
 BOUNDARY CONDITION: I want the data written in the textfile in exactly
 the same format as I would use, if I had data in a separate textfile,
 to be read by read.table().  That is, with 'horizontal inhomogeneity'
 and 'vertical homogeneity' in the type of entries.  I want to write
 something like 
 
   Sex    Respons
   Male   1
   Male   2
   Female 3
   Female 4
 
 In effect, I am asking if there is some way I can convince
 read.table(), that the data is contained in the following n lines of
 text. 

A thought that occurs to me, and which as far as I can tell is not
already implemented (at any rate not in read.table(), which is where it
would have a natural home), is that, in the same spirit as

  read.table(file="stdin")

one could, if available, use

  read.table(file= <<EOT)

i.e. the here document style of redirection that has been a part
of Unix since approximately forever (if you take the origin of time
as 01/01/70 00:00). Then the above data could be read in from within
the source file by

  X <- read.table(header=TRUE, file= <<EOT)
  Sex    Respons
  Male   1
  Male   2
  Female 3
  Female 4
  EOT

I.e. this form of the command would take input from the following
lines until EOT is encountered on a line by itself. In the Unix
setup, EOT could be anything so long as it won't occur on a line by
itself within the data, and is not included in the content which is
read in.
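
Much later, R 4.0.0 added raw string literals, r"(...)", which come close to this idea: the delimiter can be lengthened with dashes, as in r"---(...)---", so that it plays the role of EOT and nothing inside needs escaping. Combined with read.table()'s text argument (also a later addition), a sketch of the example above:

```r
# Requires R >= 4.0.0 for the raw string syntax r"---( ... )---"
X <- read.table(header = TRUE, text = r"---(
Sex    Respons
Male   1
Male   2
Female 3
Female 4
)---")
print(X)
```

Unlike a true here document the data still sit inside an R string, so the only restriction is that the closing sequence )---" must not appear within them.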

Ted,



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 167 1972
Date: 12-Jun-03   Time: 14:21:00
-- XFMail --



Re: [R] Programcode and data in the same textfile

2003-06-12 Thread Thomas Lumley
On Thu, 12 Jun 2003, Barry Rowlingson wrote:


   Eurgh! Does R clean up tempfiles by itself?

Yes.  That's what they are for.  It happens on normal exit.

-thomas



Re: [R] Programcode and data in the same textfile

2003-06-12 Thread Thomas W Blackwell
Ernst  -

Here's a solution which works for me, and seems to do
what you want.  It's a bit of a hack, since it requires
you, the author, to know in advance what file path name
the student will have saved the file as.  In my example,
this will be ./r.source.file, and this includes one
blank line before the first assignment statement below.

It also requires knowing how many lines of code precede
the data lines.  But it _is_ a one-file solution, as
requested.  Put the following 9 or 10 lines into a
file named r.source.file, then source it.

data.01 <- read.table(file="r.source.file", header=TRUE,
skip=4, comment.char="")[-1]

 # junk Sex  Response
#   Male 1
#   Male 2
#   Female   3
#   Female   4


I'm quite surprised no one else has suggested this already.

-  tom blackwell  -  u michigan medical school  -  ann arbor  -

On Thu, 12 Jun 2003, Ernst Hansen wrote:

 PROBLEM: Is there any way I can have a single textfile that contains both
  a) data   b) programcode
 The program should act on the data, if the textfile is source()'ed
 into R.

 BOUNDARY CONDITION: I want the data written in the textfile in exactly
 the same format as I would use, if I had data in a separate textfile,
 to be read by read.table().   something like

   Sex    Respons
   Male   1
   Male   2
   Female 3
   Female 4

 MOTIVATION: I frequently find myself distributing small chunks of code
 to my students, along with data on which the code can work.

 As an example, I might want to demonstrate how model.matrix() treats
 interactions, in a certain setting.  For that I need a dataframe that
 is complex enough to exhibit the behaviour I want, but still so small
 that the model.matrix is easily understood.  So I make such a dataframe.

 I am trying to distribute this dataframe along with my code, in a way
 that is as simple as possible to USE for the students (hence the
 one-file boundary condition) and to READ (hence the non-transposition
 boundary condition).

 Ernst Hansen
 Department of Statistics
 University of Copenhagen



Re: [R] Programcode and data in the same textfile

2003-06-12 Thread Greg Louis
On 20030612 (Thu) at 1139:34 -0400, Thomas W Blackwell wrote:

 It also requires knowing how many lines of code precede
 the data lines.  But it _is_ a one-file solution, as
 requested.  Put the following 9 or 10 lines into a
 file named r.source.file, then source it.
 
 data.01 <- read.table(file="r.source.file", header=T,
   skip=4, comment.char="")[-1]
 
  # junk Sex  Response
 #   Male 1
 #   Male 2
 #   Female   3
 #   Female   4
 

The nrows parameter can help by letting you put the data early in the
file:

data.01 <- read.table(file="r.source.file", header=TRUE,
skip=4, nrows=4, comment.char="")[-1]

#   Sex Response
#   Male    1
#   Male    2
#   Female  3
#   Female  4

print(data.01)
(more code)

(I got an error, "line 1 did not have 4 elements", when I left the
"junk" header in place.)

 On Thu, 12 Jun 2003, Ernst Hansen wrote:
 
  PROBLEM: Is there any way I can have a single textfile that contains both
   a) data   b) programcode
  The program should act on the data, if the textfile is source()'ed
  into R.
 
  BOUNDARY CONDITION: I want the data written in the textfile in exactly
  the same format as I would use, if I had data in a separate textfile,
  to be read by read.table().   something like
 
Sex    Respons
Male   1
Male   2
Female 3
Female 4

Obviously the above doesn't quite meet the requirement, since the data
have to be commented out -- but unless someone implements here
documents, as another list member suggested, I don't think there's a
perfect solution.

-- 
| G r e g  L o u i s  | gpg public key: finger |
|   http://www.bgl.nu/~glouis |   [EMAIL PROTECTED] |
| http://wecanstopspam.org in signatures fights junk email |



Re: [R] Programcode and data in the same textfile

2003-06-12 Thread Ernst Hansen
Thomas W Blackwell writes:
  Ernst  -
  
  Here's a solution which works for me, and seems to do
  what you want.  It's a bit of a hack, since it requires
  you, the author, to know in advance what file path name
  the student will have saved the file as.  In my example,
  this will be ./r.source.file, and this includes one
  blank line before the first assignment statement below.
  
  It also requires knowing how many lines of code precede
  the data lines.  But it _is_ a one-file solution, as
  requested.  Put the following 9 or 10 lines into a
  file named r.source.file, then source it.
  
  data.01 <- read.table(file="r.source.file", header=T,
   skip=4, comment.char="")[-1]
  
   # junk Sex  Response
  #   Male 1
  #   Male 2
  #   Female   3
  #   Female   4
  
  
  I'm quite surprised no one else has suggested this already.
  


Nice thinking, Thomas, and good fun indeed.  To take this slightly
further, we can hack the history mechanism to read off the name of the
file being sourced.  If the following lines

  MyHistory <- function() {
## basically the first few lines of history()

file1 <- tempfile("Rrawhist")
savehistory(file1)
rawhist <- scan(file1, what = "", quiet = TRUE, sep = "\n")
unlink(file1)
rawhist[length(rawhist)]
}

  cat(strsplit(strsplit(MyHistory(),
  'source\\(')[[1]][2],'\\)')[[1]][1], '\n')

are placed in the file foo.q, then the call

 source('foo.q')

will produce as output

  'foo.q' 

on the terminal.  Instead of being written out, it could be passed to
read.table(), and with careful line counting it could be combined with
your idea of reading lines that are commented out in the 'real'
reading of the file.


Then it indeed does what I wanted to do.  Though my students would 
be horrified...:-)

And, of course, if it is allowed to write the history to a temporary
file and read it again, we might as well write the data to a temporary
file, as has already been suggested by Torsten Hothorn.

Ernst



Re: [R] Programcode and data in the same textfile

2003-06-12 Thread Prof Brian Ripley
This is not a valid solution: R does not necessarily have a history 
mechanism operational.  But if it did, you could use history() not
savehistory().

Does no one ever read the help pages?

On Thu, 12 Jun 2003, Ernst Hansen wrote:

 Thomas W Blackwell writes:
   Ernst  -
   
   Here's a solution which works for me, and seems to do
   what you want.  It's a bit of a hack, since it requires
   you, the author, to know in advance what file path name
   the student will have saved the file as.  In my example,
   this will be ./r.source.file, and this includes one
   blank line before the first assignment statement below.
   
   It also requires knowing how many lines of code precede
   the data lines.  But it _is_ a one-file solution, as
   requested.  Put the following 9 or 10 lines into a
   file named r.source.file, then source it.
   
   data.01 <- read.table(file="r.source.file", header=T,
  skip=4, comment.char="")[-1]
   
# junk Sex  Response
   #   Male 1
   #   Male 2
   #   Female   3
   #   Female   4
   
   
   I'm quite surprised no one else has suggested this already.
   
 
 
 Nice thinking , Thomas, and good fun indeed.  To take this slightly
 further, we can hack the history mechanism to read off the name of the
 file being sourced.  If the following lines
 
   MyHistory <- function() {
 ## basically the first few lines of history()
 
 file1 <- tempfile("Rrawhist")
 savehistory(file1)
 rawhist <- scan(file1, what = "", quiet = TRUE, sep = "\n")
 unlink(file1)
 rawhist[length(rawhist)]
 }
 
   cat(strsplit(strsplit(MyHistory(),
   'source\\(')[[1]][2],'\\)')[[1]][1], '\n')
 
 are placed in the file foo.q, then the call
 
  source('foo.q')
 
 will produce as output
 
   'foo.q' 
 
 on the terminal.  Instead of writing it out, it could be piped into
 read.table(), and by careful linecounting, it could be combined with
 your idea of reading lines, that are commented out in the 'real
 reading' of the file.  
 
 
 Then it indeed does what I wanted to do.  Though my students would 
 be horrified...:-)
 
 And, of course, if it is allowed to write the history to a temporary
 file and read it again, we might as well write the data to a temporary
 file, as has already been suggested by Torsten Hothorn.
 
 Ernst
 
 

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
