Re: [R] Help with read.csv2

2005-11-17 Thread Marc Schwartz (via MN)
On Thu, 2005-11-17 at 17:09 +0100, Matthieu Cornec wrote:
 Hello,
  I am importing the following file
   ;aa;bb;cc
 1988;12;12;12
 1989;78;78;12
 1990;78;78;12
 1991;78;78;12
 1992;78;78;12
 1993;78;78;12
 1994;78;78;12
 
 data-read.csv2(test.csv,header=T)
 --
 it gives
   X aa bb cc
 1 1988 12 12 12
 2 1989 78 78 12
 3 1990 78 78 12
 4 1991 78 78 12
 5 1992 78 78 12
 6 1993 78 78 12
 7 1994 78 78 12
  How do I get :
 
  aa bb cc
 1988 12 12 12
 1989 78 78 12
 1990 78 78 12
 1991 78 78 12
 1992 78 78 12
 1993 78 78 12
 1994 78 78 12
 

Are you indicating that you want the years to be the rownames and NOT a
column in the data frame?

If so, use the 'row.names' argument to indicate that the first column of
the incoming data file contains the rownames:

 dat - read.csv2(clipboard, row.names = 1)

 dat
 aa bb cc
1988 12 12 12
1989 78 78 12
1990 78 78 12
1991 78 78 12
1992 78 78 12
1993 78 78 12
1994 78 78 12

 str(dat)
`data.frame':   7 obs. of  3 variables:
 $ aa: int  12 78 78 78 78 78 78
 $ bb: int  12 78 78 78 78 78 78
 $ cc: int  12 12 12 12 12 12 12

 rownames(dat)
[1] 1988 1989 1990 1991 1992 1993 1994



Note that you are getting the X in your initial example above, since
you are missing the first column name in the text file. A better
approach here might be (depending upon your intent) to use the
'col.names' argument to specify the column names explicitly:

 dat - read.csv2(clipboard, header = TRUE, 
col.names = c(Years, aa, bb, cc))

 dat
  Years aa bb cc
1  1988 12 12 12
2  1989 78 78 12
3  1990 78 78 12
4  1991 78 78 12
5  1992 78 78 12
6  1993 78 78 12
7  1994 78 78 12

This way, you are defining the first column name during the import and
your 'Years' are available as a data column. It all depends upon what
you want to do with the data at this point.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] invert y-axis in barplot

2005-11-16 Thread Marc Schwartz (via MN)
On Wed, 2005-11-16 at 16:46 +, Jörg Schlingemann wrote:
 Hi!
 
  
 
 This is probably a very trivial question. Is there an easy way to invert the
 y-axis (low values on top) when using the function barplot()? 
 
  
 
 Thanks,
 
 Jrg

You mean something like this?:

  barplot(1:10, ylim = rev(c(0, 12)))

or, something like this?:

  barplot(1:10, yaxt = n) 
  axis(2, labels = rev(seq(0, 10, 2)), at = seq(0, 10, 2))


Note the use of rev() in each case.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] bug/feature with barplot?

2005-11-14 Thread Marc Schwartz (via MN)
On Mon, 2005-11-14 at 15:55 +0100, Karin Lagesen wrote:
 I have found a bug/feature with barplot that at least to me shows
 undesireable behaviour. When using barplot and plotting fewer
 groups/levels/factors(I am unsure what they are called) than the number
 of colors stated in a col statement, the colors wrap around such that
 the colors are not fixed to one group. This is mostly problematic when
 I make R figures using scripts, since I sometimes have empty input
 groups. I have in these cases experienced labeling the empty group as
 red, and then seeing a bar being red when that bar is actually from a
 different group.
 
 Reproducible example (I hope):
 
 barplot(VADeaths, beside=TRUE, col=c(red, green, blue, yellow, 
 black))
 barplot(VADeaths[1:4,], beside=TRUE, col=c(red, green, blue, yellow, 
 black))
 
 Now, I don't know if this is a bug or a feature, but it sure bugged me...:)
 
 Karin

Most definitely not a bug.

As with many vectorized function arguments, they will be recycled as
required to match the length of other appropriate arguments.

In this case, the number of colors (5) does not match the number of
groups (4). Thus, they are out of synch with each other and you get
the result you have.

Not unexpected behavior.

You should adjust your code and the function call so that the number of
groups matches the number of colors. Something along the lines of the
following:

col - c(red, green, blue, yellow, black)
no.groups - 4
barplot(VADeaths[1:no.groups, ], beside = TRUE, col = col[1:no.groups])


Now try:

 no.groups - 5
 barplot(VADeaths[1:no.groups, ], beside = TRUE, col = col[1:no.groups])

 no.groups - 3
 barplot(VADeaths[1:no.groups, ], beside = TRUE, col = col[1:no.groups])


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Coercion of percentages by as.numeric

2005-11-14 Thread Marc Schwartz (via MN)
On Mon, 2005-11-14 at 19:07 +0200, Brandt, T. (Tobias) wrote:
  
 -Original Message-
 From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
 Sent: 14 November 2005 06:21 PM
 
 On 11/14/05, Brandt, T. (Tobias) [EMAIL PROTECTED] wrote:
  Hi
 
  Given that things like the following work
 
a - c(-.1, 2.7 ,B)
   a
  [1] -.12.7  B
   as.numeric(a)
  [1] -0.1  2.7   NA
  Warning message:
  NAs introduced by coercion
  
 
  I naively expected that the following would behave differently.
 
b - c('10%', '-20%', '30.0%', '.40%')
   b
  [1] 10%   -20%  30.0% .40%
   as.numeric(b)
  [1] NA NA NA NA
  Warning message:
  NAs introduced by coercion
 
 Try this:
 
 as.numeric(sub(%, e-2, b))
 
 
 Thank you, that accomplishes what I had intended.
 
 I would have thought though that the expression 53% would be a fairly
 standard representation of the number 0.53 and might be handled as such.  Is
 there a specific reason for avoiding this behaviour?  

53% is a 'shorthand' character representation of a mathematical
concept. To wit, the specific representation of a fraction using 100 as
the denominator (ie. 53 / 100). The symbol '%' can be replaced by the
word percent, such as 53 percent, which is also a character
representation.

0.53, in context, is a numeric representation of a proportion in the
range of 0 - 1.0.

 I can imagine that it might add unnecessary overhead to routines like
 as.numeric which one would like to keep as fast as possible.
 
 Perhaps there are other areas though where it might be desirable?  For
 example I'm thinking of the read.table function for reading in csv files
 since I have many of these that have been saved from excel and now contain
 numbers in the % format.

In Excel, numbers displayed with a '%' are what you see visually.
However, the internal representation (how the value is actually stored
in the program) is still as a floating point value, without the '%'. 

For example:

 a - 53
 a
[1] 53

 sprintf(%.0f%%, a)
[1] 53%

 is.numeric(a)
[1] TRUE

 is.numeric(sprintf(%.0f%%, a))
[1] FALSE


Unfortunately (depending upon your perspective), Excel, and other
similar programs, tend to export the visually displayed values and not
the internal representations of them. Thus, as Gabor pointed out, you
will need to do some 'editing' of the values before using them in R. You
can either do this in Excel, by removing the % formatting, or
post-import in R as Gabor has described.

You need to keep separate the internal representation of a value and its
printed or displayed representation for human readable consumption.

as.numeric() does basically one thing and it does it well and properly.
It is up to the user to ensure that it is passed the proper values. When
that is not the case, it issues an appropriate warning message and
returns NA.

Of course, using Gabor's hint, you can also write your own variation of
as.numeric(), creating a function that takes percent formatted values
and converts them as you require. One of the many strengths of R, is
that you can extend it to meet your own specific requirements when the
base functions do not.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] as.integer with base other than ten.

2005-11-14 Thread Marc Schwartz (via MN)
On Mon, 2005-11-14 at 19:01 +, William Astle wrote:
 Is there an R function analogous to the C function strtol? 
 I would like to convert a binary string into an integer.
 
 Cheers for advice
 
 Will


There was some discussion in the past and you might want to search the
archive for a more generic solution for any base to any base, but for
binary to decimal specifically, something like the following will work:

bin2dec - function(x)
{
  b - as.numeric(unlist(strsplit(x, )))
  pow - 2 ^ ((length(b) - 1):0)
  sum(pow[b == 1])
}


The function takes the binary string and splits it up into individual
numbers ('b'). It then creates a vector of powers of 2 as long as 'b'
less one through 0 ('pow').  It then takes the sum of the values of pow,
indexed by 'b == 1'.


 bin2dec(101)
[1] 5

 bin2dec()
[1] 15

 bin2dec(101)
[1] 95


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Orientation of tickmarks labels in boxplot/plot

2005-11-02 Thread Marc Schwartz (via MN)
On Wed, 2005-11-02 at 16:06 -0600, Michal Lijowski wrote:
 Hi,
 
 I have been trying draw tickmark labels along
 the y - axis perpendicular to the y axis while
 labels along the x - axis parallel to x axis
 while making box plot.
 
 Here is my test dataset.
 
  TData
ID Ratio
 1   0 7.075
 2   0 7.414
 3   0 7.403
 4   0 7.168
 5   0 6.820
 6   0 7.294
 7   0 7.238
 8   0 7.938
 9   1 7.708
 10  1 8.691
 11  1 8.714
 12  1 8.066
 13  1 8.949
 14  1 8.590
 15  1 8.714
 16  1 8.601
 
  boxplot(Ratio ~ ID, data=TData)
 
 makes box plot with tickmark labels parallel to the y - axis.
 So I try 
 
 boxplot(Ratio ~ ID, data=TData, axes=FALSE)
 par(las=0)
 axis(1)
 and I get x - axis ranging from 0.5 to 2.5 (why?) and 
 boxes at 1 and 2.
 par(las=2)
 axis(2)
 box()
 So, if I set tickmark labels parallel to y - axis
 somehow the x - axis range is not what I expect even
 if I use xlim = c(0.0, 3.0)
  in boxplot(Ratio ~ Id, data=TData, axes=FALSE, xlim=c(0.0, 3.0))
  par(las=0)
  axis(1)
 
 Plots are in the attachments in pdf format.
 
 I appreciate any tips.
 
 I am using R 2.2.0 (2005-10-06) on FC4.
 
 Michal

I suspect that you want this:

  boxplot(Ratio ~ ID, data=TData, las = 1)

so that the x and y axis labels are horizontal. See ?par and review the
options for 'las'. '1' is for axis tick mark labels to be horizontal.

The x axis (horizontal axis if the boxes are vertical, the vertical axis
if the boxes are horizontal) is set by default to the number of boxes
(groups) +/- 0.5.  'xlim' is ignored in both cases.

The boxes themselves are placed at integer values from 1:N, where N is
the number of groups.

There is the 'at' argument and there is an example of its use
in ?boxplot. You can also search the archives for posts where variations
on the use of 'at' have been posted (recently, in fact...)

The actual plotting of the boxplots is done by bxp(), so review the help
for that function as well.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] multiple boxplots

2005-10-28 Thread Marc Schwartz (via MN)
On Fri, 2005-10-28 at 10:26 -0700, J.M. Breiwick wrote:
 Hello,
 
 I want to plot 3 boxplots [ par(mfrow=c(3,1)) ] but the first one has 8 
 groups, the 2nd has 7 and the third has 6. But I the groups to line up:
 
  1 2 3 4 5 6 7 8
2 3 4 5 6 7 8
   3 4 5 6 7 8
 
 where the numbers actually refer to years. I suspect I have to manipulate 
 the function bxp(). Is there a relatively simple way to do this? Thanks for 
 any help.
 
 Jeff Breiwick
 NMFS, Seattle


Here is one approach. It is based upon using 2 principal concepts:

1. Create the first plot, save the plot region ranges from par(usr)
and then use this information for the two subsequent plots, where we use
the 'add = TRUE' argument.

2. Use the 'at' argument to specify the placement of the boxes in the
second and third plots to line up with the first.


So:

# Create our data.
dat - rnorm(80)
years - rep(1991:1998, each = 10)

# MyDF will be a data frame with two columns and we will use 
# this in the formula method for boxplot.
# The second and third plots will use subset()s of MyDF
MyDF - cbind(dat, years)

# Set the plot matrix
par(mfrow = c(3, 1))

# Create the first boxplot and save par(usr)
boxplot(dat ~ years, MyDF)
usr - par(usr)

# Now open a new plot, setting it's par(usr) to 
# match the first plot.
# Then use boxplot and set the boxes 'at' x pos 2:8
# and add it to the open plot
plot.new()
par(usr = usr)
boxplot(dat ~ years, subset(MyDF, years %in% 1992:1998),
at = 2:8, add = TRUE)


# Rinse and repeat  ;-)
# Different subset and use 3:8 for 'at'
plot.new()
par(usr = usr)
boxplot(dat ~ years, subset(MyDF, years %in% 1993:1998),
at = 3:8, add = TRUE)



Replace MyDF in the above with your actual datasets of course.

This would of course be a bit easier if one could set an 'xlim' argument
in boxplot(), but this is ignored by bxp().

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] encrypted RData file?

2005-10-27 Thread Marc Schwartz (via MN)
On Thu, 2005-10-27 at 16:15 -0500, Na Li wrote:
 On 27 Oct 2005, Duncan Temple Lang wrote:
 
  Yes, it is of interest and was sitting on my todo list at
  some time.  If you want to go ahead and provide code to do it,
  that would be terrific.  There are other areas where encryption
  would be good to have, so a general mechanism would be nice.
  
  D.
  
  Na Li wrote:
   Hi, I wonder if there is interest/intention to allow for encrypted .RData
   files?  One can certainly do that outside R manually but that will leave a
   decrypted RData file somewhere which one has to remember to delete.
   
 
 I was hoping someone has already done it.  ;-(
 
 One possibility is to implement an interface package to gpgme library which
 itself is an interface to GnuPG.  
 
 But I'm not sure how the input of passphrase can be handled without using
 clear text.
 
 Michael

Seems to me that a better option would be to encrypt the full partition
such that (unless you write the files to a non-encrypted partition)
these issues are transparent. This would include the use of save(),
save.image() and write() type functions to save what was an encrypted
dataset/object to a unencrypted file.

Of course, you would also have to encrypt the swap and tmp partitions
(as appropriate) for similar reasons.

On Linuxen/Unixen, full encryption of partitions is available via
loopback devices and other mechanisms and some distros have this
available as a built-in option. I believe that the FC folks finally have
this on their list of functional additions for FC5. Windows of course
can do something similar.

The other consideration here, is that if R Core builds in some form of
encryption, there is the potential for import/export restrictions on
such technology since R is available via international CRAN mirrors. It
may be best to provide for a plug-in encryption black box of sorts, so
that folks can use a particular encryption schema that meets various
legal/regulatory requirements.

Of course, simply encrypting the file or even a complete partition has
to be considered within a larger security strategy (ie. network
security, physical access control, etc.) that meets a particular
functional requirement (such as HIPAA here in the U.S.)

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Graphics window always overlaps console window!

2005-10-25 Thread Marc Schwartz (via MN)
On Tue, 2005-10-25 at 11:55 -0600, [EMAIL PROTECTED] wrote:
 Does anyone know how I can set up R so that when I make a graphic, the
 graphics window remains behind the console window? It's annoying to
 have to reach for the mouse every time I want to type another line of
 code (e.g., to add another line to the plot). Thanks.

What operating system?

Default window focus behavior is highly OS and even window manager
specific and is not an R issue.

Depending upon your OS and window manager, you may need to check the
documentation and/or do a Google search on window focus for further
information.

Another alternative, if you are on Windows, is to review Windows FAQ 5.4
How do I move focus to a graphics window or the console?, but this is
a programmatic approach and not a means to affect default behavior.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Graphics window always overlaps console window!

2005-10-25 Thread Marc Schwartz (via MN)
On Tue, 2005-10-25 at 13:07 -0500, Marc Schwartz (via MN) wrote:
 On Tue, 2005-10-25 at 11:55 -0600, [EMAIL PROTECTED] wrote:
  Does anyone know how I can set up R so that when I make a graphic, the
  graphics window remains behind the console window? It's annoying to
  have to reach for the mouse every time I want to type another line of
  code (e.g., to add another line to the plot). Thanks.
 
 What operating system?
 
 Default window focus behavior is highly OS and even window manager
 specific and is not an R issue.
 
 Depending upon your OS and window manager, you may need to check the
 documentation and/or do a Google search on window focus for further
 information.
 
 Another alternative, if you are on Windows, is to review Windows FAQ 5.4
 How do I move focus to a graphics window or the console?, but this is
 a programmatic approach and not a means to affect default behavior.

Yet another approach which I just remembered is that (if on Windows) MS
offers a program called Tweak UI:

http://www.microsoft.com/windowsxp/downloads/powertoys/xppowertoys.mspx

as part of their Power Toys add-ons.

You might want to review that to see if there is a setting for the
handling of new window focus behavior.

HTH,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Basic: setting resolution and size of an R graphic

2005-10-24 Thread Marc Schwartz (via MN)
On Mon, 2005-10-24 at 22:32 +0200, Dr. med. Peter Robinson wrote:
 Dear List,
 
 I am sorry if this perhaps a too basic question, but I have not found an
 answer in Google or in the R help system. I am trying to use R to do a
 very simple analysis of some data (RT-PCR and Western analysis) with a
 T-test and
 to plot the results as a histogram with error bars. (I have pasted an
 example script at the bottom of this mail).
 In order to use this for publication, I would like to adjust the
 resolution and size of the final image. However, even using file types
 such as postscript or pdf that are vector based, I get rather bad-looking
 results with
 pdf(file=test.pdf)
 source(script at bottom of mail)
 dev.off()
 
 using either pdf or postscript or jpg devices.
 
 
 Therefore I would like to ask the list, how to best produce a graphic from
 the script below that would fit into one column of a published article and
 have a high resolution (as eps, or failing that tiff or png)?
 Thanks in advance for any advice,
 
 Peter

Snip of code

What OS are you on?

Running your example on FC4, I get the attached output for a pdf().

I suspect that on your OS, the height and width arguments are not
appropriate by default.

Thus, you may need to adjust your pdf (or postscript) function call to
explicitly specify larger height and width arguments.

Also note that to generate an EPS file, pay attention to the details
section of ?postscript, taking note of the 'onefile', 'horizontal' and
'paper' arguments and settings.

Also, check with your journal to see if they specify dimensions for such
graphics so that you can abide by their specs if provided. If they are
using LaTeX, there are means of specifying and/or adjusting the height
and/or width specs in the code based upon proportions of various
measures (ie. \includegraphics[width=0.9\textwidth]{GraphicsFile.eps} ).

HTH,

Marc Schwartz



test.pdf
Description: Adobe PDF document
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] make three plot to one plot

2005-10-21 Thread Marc Schwartz (via MN)
On Fri, 2005-10-21 at 21:13 +0200, Jan Sabee wrote:
 Dear all,
 I want to make three plot below to only one plot together with legend,
 how can I do that?
 I have tried with matplot function but I did not succeed.
 Thanks for your help.
 Sincerelly,
 Jan Sabee
 
 test.five.x - 
 c(0.02,0.05,0.07,0.09,0.10,0.12,0.13,0.14,0.16,0.17,0.20,0.21,0.34,0.40)
 test.five.y - c(18,12,17,12,3,15,1,5,1,1,3,10,15,10)
 plot(test.five.x, test.five.y, type=l,lty=1, lwd=3, col='red')
 legend(par('usr')[2], par('usr')[4], xjust=1,
   c('five'),
   lwd=3,
   lty=1,
   col=c('red'))
 test.six.x - 
 c(0.03,0.05,0.07,0.08,0.09,0.10,0.12,0.13,0.14,0.15,0.17,0.18,0.21,0.22,0.24,0.33,0.39,0.43)
 test.six.y - c(8,32,14,21,3,8,11,14,11,21,16,14,10,21,1,2,13,19)
 plot(test.six.x, test.six.y, type=l,lty=2, lwd=3, col='green')
 legend(par('usr')[2], par('usr')[4], xjust=1,
   c('six'),
   lwd=3,
   lty=2,
   col=c('green'))
 test.seven.x -
 c(0.04,0.05,0.08,0.09,0.10,0.12,0.13,0.14,0.16,0.17,0.19,0.21,0.22,0.24,0.29,0.32,0.38,0.40,0.46,0.50)
 test.seven.y -
 c(18,135,240,42,63,128,215,267,127,36,21,23,223,12,66,96,12,26,64,159)
 plot(test.seven.x, test.seven.y, type=l,lty=3, lwd=2, col='blue')
 legend(par('usr')[2], par('usr')[4], xjust=1,
   c('seven'),
   lwd=3,
   lty=3,
   col=c('blue'))

Jan,

matplot() won't work well here, because you do not have common x axis
values across the three sets of data. Thus, you need to use plot() and
then lines().

Try this:

# First set up your data

test.five.x - c(0.02,0.05,0.07,0.09,0.10,0.12,
 0.13,0.14,0.16,0.17,0.20,0.21,
 0.34,0.40)
test.five.y - c(18,12,17,12,3,15,1,5,1,1,3,10,
 15,10)

test.six.x - c(0.03,0.05,0.07,0.08,0.09,0.10,
0.12,0.13,0.14,0.15,0.17,0.18,
0.21,0.22,0.24,0.33,0.39,0.43)
test.six.y - c(8,32,14,21,3,8,11,14,11,21,16,
14,10,21,1,2,13,19)

test.seven.x - c(0.04,0.05,0.08,0.09,0.10,0.12,
  0.13,0.14,0.16,0.17,0.19,0.21,
  0.22,0.24,0.29,0.32,0.38,0.40,
  0.46,0.50)
test.seven.y - c(18,135,240,42,63,128,215,267,
  127,36,21,23,223,12,66,96,12,
  26,64,159)


# Now use plot to draw the first set of data
# The key here is to properly set to the x and y axis
# ranges ('xlim' and 'ylim') so that they are based
# upon all three sets of data.
# Note that I also set the x and y axis labels to . You
# can adjust these and the plot title as you require.

plot(test.five.x, test.five.y, type=l,lty=1, lwd=3, col=red,
 xlim = range(test.five.x, test.six.x, test.seven.x),
 ylim = range(test.five.y, test.six.y, test.seven.y),
 xlab = , ylab = )


# Now use lines() to add the second and third sets of data

lines(test.six.x, test.six.y, lty=2, lwd=3, col=green)
lines(test.seven.x, test.seven.y, lty=3, lwd=2, col=blue)


# Now use legend(). Place it at the upper right hand corner
# and set the line types and colors to reflect the above

legend(topright, xjust = 1,
   legend = c(five, six, seven),
   lwd = 3, lty = 1:3,
   col = c(red, green, blue))


See ?lines for more information.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] peculiar matrices

2005-10-21 Thread Marc Schwartz (via MN)
On Fri, 2005-10-21 at 21:32 +, Ben Bolker wrote:
 As far as I can tell from reading The Fine Documentation
 (R Language Definition and Intro to R), matrices are supposed
 to be of homogeneous types.  Yet giving matrix() an inhomogeneous
 list seems to work, although it produces a peculiar object:
 
 v = list(1:3,4,5,a)
 m = matrix(v,nrow=2)
 m
 
  [,1]  [,2]
 [1,] Integer,3 5
 [2,] 4 a
 
 
 m[1,]
 
 [[1]]
 [1] 1 2 3
 
 [[2]]
 [1] 3
 
  (this is R 2.1.1, running under Linux)

Ben,

If you review the structure of 'm' note:

 str(m)
List of 4
 $ : int [1:3] 1 2 3
 $ : num 4
 $ : num 5
 $ : chr a
 - attr(*, dim)= int [1:2] 2 2

that it is actually a list, even though:

 class(m)
[1] matrix


Also:

 mode(m)
[1] list

 typeof(m)
[1] list


If you remove the dim attributes, you get:

 dim(m) - NULL
 m
[[1]]
[1] 1 2 3

[[2]]
[1] 4

[[3]]
[1] 5

[[4]]
[1] a


So I would argue that it is consistent with the documentation in that,
while the printed output is that of a matrix, it is a list, which of
course can handle heterogeneous data types.

This is Version 2.2.0 Patched (2005-10-20 r35979).


   Should there be a check/error? Or is this just analogous to
 the joke about going to the doctor and saying it hurts when
 I do this, and the doctor saying well then, don't do that?


Maybe more like:

Doctor, my eye hurts when I drink my tea. and the doctor says, Well,
remove the spoon from the cup before you drink.

;-)

Regards,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Automatic rounding of values after factors , converted to numeric, are multipled by a real number

2005-10-20 Thread Marc Schwartz (via MN)
On Thu, 2005-10-20 at 16:20 +0200, Peter Dalgaard wrote:
 Nelson, Gary (FWE) [EMAIL PROTECTED] writes:
 
  Peter,
  
  Thank you for your response. I knew how close the values are to
  integers, but I still don't understand why I don't have control over
  how the numbers are displayed (rounded or not)?
 
 It's all in the conventions. The digits in print() and friends are
 significant digits, so we first round to that many significant digits,
 then discard trailing zeros, which is why
 
  12.51
 [1] 12.5
  12.50001
 [1] 12.50001
 
 The exception is that we do not discard significant digits to the left
 of the decimal point, unless we are using scientific notation
 
  print(12345678,digits=2)
 [1] 1.2e+07
  print(12345678,digits=5)
 [1] 12345678
 
 (the scipen options controls the logic for switching notation).
 
 For finer control we have the formatC function:
 
  format(1234.1,digits=9) # same thing as with print()
 [1] 1234.1
  format(1234.1,digits=8)
 [1] 1234
  formatC(1234.1,digits=5,format=f)
 [1] 1234.1
  formatC(1234.1,digits=4,format=f)
 [1] 1234.


Also, sprintf():

 sprintf(%.9f, 1234.1)
[1] 1234.1

 sprintf(%.4f, 1234.1)
[1] 1234.

 sprintf(%12.4f, 1234.1)
[1]1234.


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] having scaling problems with a histogram

2005-10-20 Thread Marc Schwartz (via MN)
On Thu, 2005-10-20 at 17:31 +0200, Christian Jones wrote:
 
 Hello,?xml:namespace prefix = o ns =
 urn:schemas-microsoft-com:office:office /o:p/o:p
 
 I would like to create a histogram from a data collumn consisting of 4
 classes (0; 0.05;0.5;25;75). Due to the difference in scale the
 classes 0;0.05 and 0.5 are displayed within one combined bin by
 default with the code:Hist(x, scale=percent, breaks=Sturges). How
 can I display them all, or as unique classes, leaving out the rest of
 the x_axes scale? There was no enlightenment in the help function and
 the command : breaks=5 did not do the trick.o:p/o:p
 
 Thanks in advance o:p/o:p
 
 Chriso:p/o:p

I may be mis-understanding what you are doing here, but it sounds like
you should be using barplot() rather than hist(), which is referenced in
the See Also in ?hist. This would be preferred if you want individual
counts or proportions (or percents) from specific groups without
binning.

Where does the 'scale = percent' come from?  I don't recognize that
argument from the standard histogram plotting functions. 

Oh, wait a minute, just found it doing a search. The Hist() function is
in John Fox's Rcmdr. Please be sure to note this if you are using a
function from a contributed package. It saves time and increases the
likelihood of your getting a reply.

Also, I am guessing that the extraneous content here is the result of
using MS Word as your e-mail editor? Please use plain text only, which
is the defined format for this list and from reviewing the archive, has
been pointed out to you previously by Frank Harrell and Ted Harding.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Boxplot labels

2005-10-20 Thread Marc Schwartz (via MN)
On Thu, 2005-10-20 at 14:46 -0400, Keith Sabol wrote:
 I am creating boxplots from a dataframe and would like to add to the 
 standard output a marker representing the value from a particular row in the 
 dataframe.
 
 .And I apologize if the solution is as trivial as it seems it should be.
 
 Thanks for any assistance.
 
 Keith

Some example code would be helpful here.

However, some hints that I suspect will be helpful:

1. Unless you use the 'at' argument in boxplot(), the box center
positions are at integer values from 1:n, where n is the number of
groups. 

2. You can thus use the points() function to add symbols or the text()
function to add labels to the existing plot as you may require.
See ?points and/or ?text for more information.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Boxplot labels

2005-10-20 Thread Marc Schwartz (via MN)
On Thu, 2005-10-20 at 15:41 -0500, Marc Schwartz (via MN) wrote:
 On Thu, 2005-10-20 at 14:46 -0400, Keith Sabol wrote:
  I am creating boxplots from a dataframe and would like to add to the 
  standard output a marker representing the value from a particular row in 
  the 
  dataframe.
  
  .And I apologize if the solution is as trivial as it seems it should be.
  
  Thanks for any assistance.
  
  Keith
 
 Some example code would be helpful here.
 
 However, some hints that I suspect will be helpful:
 
 1. Unless you use the 'at' argument in boxplot(), the box center
 positions are at integer values from 1:n, where n is the number of
 groups. 
 
 2. You can thus use the points() function to add symbols or the text()
 function to add labels to the existing plot as you may require.
 See ?points and/or ?text for more information.


One other note to add here is that boxplot() will return a list which
contains various stats about your grouped data. This requires using the
form:

  stats - boxplot(...)

Thus using the 'stats$group' and 'stats$out' components will give you
the x and y positions of any outliers. This will be helpful if these are
the values that you might want to identify

See the Value section in ?boxplot and ?boxplot.stats for more
information.

HTH,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] matching two plots

2005-10-19 Thread Marc Schwartz (via MN)
On Wed, 2005-10-19 at 18:27 +0100, Rui Cardoso wrote:
 Hi,
 
 I have a problem about graphics. I would like to plot two graphs: a barplot 
 and curve. Here is the code:
 
   barplot(dpois(0:45,20),xlim=c(0,45),names=0:45)
   curve(dnorm(x,20,sqrt(20)),from=0,to=45,add=T)
 
 Both graphs are drawn in the same figure, however the scale in both graphs 
 dooes not match. For some reason the second plot is shifted to left. I 
 think there is a problem concerning the axis scale.
 
 Thanks a lot.
 
 Rui


This came up this past summer and Gabor and I had a couple of different
approaches to the solution. You can see the posts here:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/57431.html

and

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/57432.html


Using my approach with barplot(), what you need to do is to set the
'space' argument to 0 so that there is no space between the bars. This
will then place the bar centers at increments of 0.5, in your case
seq(0.5, 45.5, 5).

Knowing this, you can then adjust the x axis position in curve() by +0.5
to coincide with the bars. So you end up with this:

barplot(dpois(0:45, 20), names = 0:45, space = 0, ylim = c(0, .1))
curve(dnorm(x, 20 + 0.5, sqrt(20)), add = TRUE)

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Efficient ways of finding functions and Breslow-Day test for homogeneity of the odds ratio

2005-10-18 Thread Marc Schwartz (via MN)
On Tue, 2005-10-18 at 14:27 +0100, MJ Price, Social Medicine wrote:
 Dear all,
 
 I have been trying to find a function to calculate the Breslow-Day test for 
 homogeneity of the odds ratio in R. I know the test can be preformed in SAS 
 but i was wondering if anyone could help me to perform this in r.
 
 In addition i have the fullrefman file to search for functions in the basic 
 R packages, does anyone have any suggestions of an efficient way of 
 searching for functions in the other add-on packages on the website?
 
 Thanks in advance
 
 Malcolm

RSiteSearch(Breslow), which will perform a search of the e-mail list
archives and other online documentation, reveals this post:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/50202.html

which in turn leads to this code:

http://www.math.montana.edu/~jimrc/classes/stat524/Rcode/breslowday.test.r


There is also code for the Woolf test in ?mantelhaen.test

Further searches can be conducted using help.search(Keyword), which
will search your installed set of packages.

See ?help.search and ?RSiteSearch for more help on these.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] p-value calculation

2005-10-18 Thread Marc Schwartz (via MN)
On Tue, 2005-10-18 at 17:13 +0200, richard mendes wrote:
 hello everybody
 
 i'm very new at using R so probably this is a very stupid question.
 I have a problem calculating a p-value. When i do this with excel i
 can use the method CHIDIST for 1.2654 with 1 freedom degree i get the
 answer 0.261
 
 i just want to do the same thing in R but i can't find a method.
 can somebody help me
 
 friendly regards
 
 richard


 pchisq(1.2654, 1, lower.tail = FALSE)
[1] 0.2606314

See ?pchisq for more information.

You might also want to read Chapter 8 Probability Distributions in An
Introduction To R, available with your R installation or from the
Documentation links on the main R web site.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Efficient ways of finding functions and Breslow-Day test for homogeneity of the odds ratio

2005-10-18 Thread Marc Schwartz (via MN)
On Wed, 2005-10-19 at 06:47 +1000, Tim Churches wrote:
 Marc Schwartz (via MN) wrote:
 
  There is also code for the Woolf test in ?mantelhaen.test
 
 Is there? How is it obtained? The documentation on mantelhaen.test in R
 2.2.0 contains a note: Currently, no inference on homogeneity of the
 odds ratios is performed. and a quick scan of the source code for the
 function didn't reveal any meantion of Woolf's test.
 
 Tim C


Review the code in the examples on the cited help page...

:-)

HTH,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] y axis in histograms

2005-10-17 Thread Marc Schwartz (via MN)
On Mon, 2005-10-17 at 16:51 +0200, Jorge Gaspar Sanz Salinas wrote:
 Hi all,
 
 This is my first post, I hope you will help me.
 
 I've some data to present with histograms. I have few values with almost 99% 
 of 
 the frequencies (thousands) and some other values with low frequencies (below 
 one hundred) that I want to emphasize. I think if I could present the 
 logarithms 
 of the frequencies, these could be presented in a more convenient way, but I 
 don't know how to deal with this.
 
 Thanks, and sorry for my English


If you are plotting the counts, you should probably look at barplot(),
which as of R version 2.2.0 supports log scale axes (ie. log = y).

See ?barplot for more information.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] apply and plot

2005-10-13 Thread Marc Schwartz (via MN)
On Thu, 2005-10-13 at 14:50 +0100, Luis Ridao Cruz wrote:
 R-help,
 
 I use the code below to plot some data by applying apply function.
 But I don't know how I can get the argument type or col on the
 plot function to distinguish the different lines
 in the graph:
 
 apply ( my.data, 2, function ( x ) lines ( dimnames ( my.data ) [[1]] ,
 x ) )
 
 
 Thank you in advance


Rather than trying the use the construct above, take a look at ?matlines
and/or ?matplot (both on the same help page with ?matpoints.)

I think that you will find these purposefully designed functions better
suited to what I believe you are trying to do here.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Removing and restoring factor levels (TYPO CORRECTED)

2005-10-13 Thread Marc Schwartz (via MN)
On Thu, 2005-10-13 at 10:02 -0400, Duncan Murdoch wrote:
 Sorry, a typo in my previous message (parens in the wrong place in the 
 conversion).
 
 Here it is corrected:
 
 I'm doing a big slow computation, and profiling shows that it is
 spending a lot of time in match(), apparently because I have code like
 
 x %in% listofxvals
 
 Both x and listofxvals are factors with the same levels, so I could
 probably speed this up by stripping off the levels and just treating
 them as integer vectors, then restoring the levels at the end.
 
 What is the safest way to do this?  I am worried that at some point x
 and listofxvals will *not* have the same levels, and the optimization
 will give the wrong answer.  So I need code that guarantees they have
 the same coding.
 
 I think this works, where master is a factor with the master list of
 levels (guaranteed to be a superset of the levels of x and listofxvals),
 but can anyone spot anything that might go wrong?
 
 # Strip the levels
 x - as.integer( factor(x, levels = levels(master) ) )
 
 # Restore the levels
 x - structure( x, levels = levels(master), class = factor )
 
 Thanks for any advice...
 
 Duncan Murdoch

Duncan,

With the predicate that 'master' has the full superset of all possible
factor levels defined, it would seem that this would be a reasonable way
to go.

This approach would also seem to eliminate whatever overhead is
encountered as a result of the coercion of 'x' as a factor to a
character vector, which is done by match().

One question I have is, what is the advantage of using structure()
versus:

   x - factor(x, levels = levels(master))

?

Thanks,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Removing and restoring factor levels (TYPO CORRECTED)

2005-10-13 Thread Marc Schwartz (via MN)
On Thu, 2005-10-13 at 14:31 -0400, Duncan Murdoch wrote:
 On 10/13/2005 1:07 PM, Marc Schwartz (via MN) wrote:
  On Thu, 2005-10-13 at 10:02 -0400, Duncan Murdoch wrote:
  Sorry, a typo in my previous message (parens in the wrong place in the 
  conversion).
  
  Here it is corrected:
  
  I'm doing a big slow computation, and profiling shows that it is
  spending a lot of time in match(), apparently because I have code like
  
  x %in% listofxvals
  
  Both x and listofxvals are factors with the same levels, so I could
  probably speed this up by stripping off the levels and just treating
  them as integer vectors, then restoring the levels at the end.
  
  What is the safest way to do this?  I am worried that at some point x
  and listofxvals will *not* have the same levels, and the optimization
  will give the wrong answer.  So I need code that guarantees they have
  the same coding.
  
  I think this works, where master is a factor with the master list of
  levels (guaranteed to be a superset of the levels of x and listofxvals),
  but can anyone spot anything that might go wrong?
  
  # Strip the levels
  x - as.integer( factor(x, levels = levels(master) ) )
  
  # Restore the levels
  x - structure( x, levels = levels(master), class = factor )
  
  Thanks for any advice...
  
  Duncan Murdoch
  
  Duncan,
  
  With the predicate that 'master' has the full superset of all possible
  factor levels defined, it would seem that this would be a reasonable way
  to go.
  
  This approach would also seem to eliminate whatever overhead is
  encountered as a result of the coercion of 'x' as a factor to a
  character vector, which is done by match().
  
  One question I have is, what is the advantage of using structure()
  versus:
  
 x - factor(x, levels = levels(master))
  
  ?
 
 That one doesn't work.  What factor(x, levels=levels(master)) says is 
 to convert x to a factor, coding the values in it according the levels 
 in master.  But at this point x has values which are integers, so  they 
 won't match the levels of master, which are probably character strings.
 
 For example:
 
   master - factor(letters)
   print(x - factor(letters[1:3]))
 [1] a b c
 Levels: a b c
   print(x - as.integer( factor(x, levels = levels(master) ) ) )
 [1] 1 2 3
   print(x - factor(x, levels = levels(master)))
 [1] NA NA NA
 Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
 
 I get NA's at the end because the values 1,2,3 aren't in the vector of 
 factor levels (which are the lowercase letters).

As opposed to:

 print(x - structure(x, levels = levels(master), class = factor ))
[1] a b c
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z


OK.  Makes sense. Thanks for the clarification.

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] How to generate for one vector matrix

2005-10-13 Thread Marc Schwartz (via MN)
On Thu, 2005-10-13 at 19:47 +0200, Jan Sabee wrote:
 Is there any routine to generate for one vector matrix.
 If I have X I want to generate start from zero to maximum value each vector.
 
 For example, I have a vector x = (4,2,3,1,4)
 I want to generate n=6 times, for 4, start 0 to 4, then 2 start 0 to 2, ect.
 
 The result something like this:
 
 generate(x,n=6)
 1,1,2,1,4
 1,2,3,0,3
 4,0,1,1,1
 3,1,0,1,4
 0,0,3,0,0
 4,1,3,0,4
 
 Could anyone help me. Thanks.
 Regards,
 Jan Sabee


If I am properly understanding what you are doing here, you have an
initial vector of values. You want to create a matrix, whose columns are
the result of random sampling with replacement from the initial vector
'n' times, where the sampling space for each column i is from 0:x[i]?

If correct, this should do it:

 x - c(4, 2, 3, 1, 4)

 x
[1] 4 2 3 1 4

 sapply(x, function(x) sample(0:x, 6, replace = TRUE))
 [,1] [,2] [,3] [,4] [,5]
[1,]42112
[2,]30000
[3,]12013
[4,]42111
[5,]31310
[6,]01110


Just replace '6' in the sample() arguments with the 'n' you require.

See ?sapply and ?sample.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] subsetting data frame using by() or tapply() or other

2005-10-13 Thread Marc Schwartz (via MN)
On Thu, 2005-10-13 at 14:28 -0600, Brian S Cade wrote:
 Ok so I see the problem that I'm having creating a new variable (LAG1DBC) 
 in the example data transformation below is that tapply() is creating a 
 list that is not dimensionally consistent with the data frame (data).  So 
 how do I go from the list output of tapply() to create a dimensionally 
 consistent vector that can create the new variable in my original data 
 frame?  I've been trying to use a function like
 data$LAG1DBC - tapply(data$DBC, data$LOCID, function(x) c(NA, 
 x[-length(x)]))
 which creates a list of dimension much smaller than the nrows in data. And 
 I've tried things like using as.data.frame.array() or as.data.frame.list() 
 in front of tapply() and still have the same problem.  I know this can't 
 be that unusual of a data manipulation and that someone has to have done 
 similar things before.
 
 I want to go from something like this:
 
LOCID  POPULATION  YEARDBC
 1  algb-1   A 1992 0.70451575
 2  algb-1   A 1993 0.59506851
 3  algb-1   A 1997 0.84837544
 4  algb-1   A 1998 0.50283182
 5  algb-1   A 2000 0.91242707
 6  algb-2   A 1992 0.09747155
 7  algb-2   A 1993 0.84772253
 8  algb-2   A 1997 0.43974081
 9  algb-2   A 1998 0.83108544
 10 algb-2   A 2000 0.22291192
 11 algb-3   A 1992 0.44234175
 12 algb-3   A 1993 0.54089534
 5680 taylr-73   B 2001 0.43918082
 5681 taylr-73   B 2002 0.34694427
 5682 taylr-73   B 2003 3.35619190
 5683 taylr-73   B 2004 0.71575815
 5684 taylr-73   B 2005 0.42038506
 5685 taylr-74   B 1992 3.88410354
 5686 taylr-74   B 1993 3.32472557
 5687 taylr-74   B 1994 3.29861501
 5688 taylr-74   B 1996 0.48153827
 5689 taylr-74   B 1997 3.63570636
 5690 taylr-74   B 1998 1.94630194
 
 to something like this:
 
LOCID  POPULATION  YEARDBC LAG1DBC
 1  algb-1   A 1992 0.70451575   NA 
 2  algb-1   A 1993 0.59506851 0.70451575
 3  algb-1   A 1997 0.84837544   0.59506851
 4  algb-1   A 1998 0.50283182 0.84837544
 5  algb-1   A 2000 0.91242707   0.50283182
 6  algb-2   A 1992 0.09747155   NA
 7  algb-2   A 1993 0.84772253 0.09747155
 8  algb-2   A 1997 0.43974081   0.84772253
 9  algb-2   A 1998 0.83108544   0.43974081
 10 algb-2   A 2000 0.22291192   0.83108544
 11 algb-3   A 1992 0.44234175   NA
 12 algb-3   A 1993 0.54089534   0.44234175
 5680 taylr-73   B 2001 0.43918082   NA
 5681 taylr-73   B 2002 0.34694427   0.43918082
 5682 taylr-73   B 2003 3.35619190   0.34694427
 5683 taylr-73   B 2004 0.71575815   3.35619190
 5684 taylr-73   B 2005 0.42038506   0.71575815
 5685 taylr-74   B 1992 3.88410354   NA
 5686 taylr-74   B 1993 3.32472557   3.88410354
 5687 taylr-74   B 1994 3.29861501   3.32472557
 5688 taylr-74   B 1996 0.48153827   3.29861501
 5689 taylr-74   B 1997 3.63570636   0.48153827
 5690 taylr-74   B 1998 1.94630194   3.63570636
 
 Brian

Brian,

Use unlist():

 data$LAG1DBC - unlist(tapply(data$DBC, data$LOCID, 
 function(x) c(NA, x[-length(x)])))

 data
LOCID POPULATION YEARDBCLAG1DBC
1  algb-1  A 1992 0.70451575 NA
2  algb-1  A 1993 0.59506851 0.70451575
3  algb-1  A 1997 0.84837544 0.59506851
4  algb-1  A 1998 0.50283182 0.84837544
5  algb-1  A 2000 0.91242707 0.50283182
6  algb-2  A 1992 0.09747155 NA
7  algb-2  A 1993 0.84772253 0.09747155
8  algb-2  A 1997 0.43974081 0.84772253
9  algb-2  A 1998 0.83108544 0.43974081
10 algb-2  A 2000 0.22291192 0.83108544
11 algb-3  A 1992 0.44234175 NA
12 algb-3  A 1993 0.54089534 0.44234175
5680 taylr-73  B 2001 0.43918082 NA
5681 taylr-73  B 2002 0.34694427 0.43918082
5682 taylr-73  B 2003 3.35619190 0.34694427
5683 taylr-73  B 2004 0.71575815 3.35619190
5684 taylr-73  B 2005 0.42038506 0.71575815
5685 taylr-74  B 1992 3.88410354 NA
5686 taylr-74  B 1993 3.32472557 3.88410354
5687 taylr-74  B 1994 3.29861501 3.32472557
5688 taylr-74  B 1996 0.48153827 3.29861501
5689 taylr-74  B 1997 3.63570636 0.48153827
5690 taylr-74  B 1998 1.94630194 3.63570636

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] shell scripts in R

2005-10-13 Thread Marc Schwartz (via MN)
On Fri, 2005-10-14 at 00:04 +0200, Benedykt P. Barszcz wrote:
 Dnia czwartek, 13 października 2005 23:25, Andrew Robinson napisał:
  Marco,
 
  use the system command.
 
  ?system
 
  I hope that this helps,
 
 system(ls, intern = FALSE, ignore.stderr = TRUE)
 Error in as.character(args[[i]]) : cannot coerce to vector
 
 In fact it ignores the stderr !

As per ?system:

command  the system command to be invoked, as a string.

Thus,

 system(ls, intern = FALSE, ignore.stderr = TRUE)


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] high resolution images for publication

2005-10-13 Thread Marc Schwartz (via MN)
On Thu, 2005-10-13 at 15:20 -0600, Chris Buddenhagen wrote:
 Dear all

 I am using R to produce ordinations library(vegan) and the plot function
 produced looks great on the screen but when I send it to jpg or pdf or eps
 the resolution is not so good. Can you tell me how to get high resolution
 images out of R for publication?

It would be helpful to see the actual code that you are using, as is
asked for in the Posting Guide.

For publication, it would be rare to want to use a bitmapped format such
as jpg/png.

pdf and eps are vector based formats and would be generally preferred
over the above.

I can only hazard a guess to consider that perhaps your height and width
arguments for pdf() and/or postscript() are not set properly, as there
are no other resolution based settings for those formats. By design, the
output resolution is target device dependent.

Please provide the code that you are using and we can respond with more
specific guidance.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Hmisc latex function

2005-10-12 Thread Marc Schwartz (via MN)
On Wed, 2005-10-12 at 08:33 -0500, Charles Dupont wrote:
 Marc Schwartz (via MN) wrote:
  On Tue, 2005-10-11 at 10:01 -0400, Rick Bilonick wrote:
  
 I'm using R 2.2.0 on an up-to-date version of Fedora Core 4 with the
 latest version of Hmisc. When I run an example from the latex function I
 get the following:
 
 
 x - matrix(1:6, nrow=2, dimnames=list(c('a','b'),c('c','d','enLine
 
 2')))
 
 x
 
   c d enLine 2
 a 1 35
 b 2 46
 
 latex(x)   # creates x.tex in working directory
 
 sh: line 0: cd: “/tmp/Rtmpl10983”: No such file or directory
 This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
 entering extended mode
 ! I can't find file `“/tmp/Rtmpl10983/file643c9869”'.
 * “/tmp/Rtmpl10983/file643c9869”
 
 Please type another input file name: q
 (/usr/share/texmf/tex/latex/tools/q.tex
 LaTeX2e 2003/12/01
 Babel v3.8d and hyphenation patterns for american, french, german,
 ngerman, b
 ahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch,
 esperanto, e
 stonian, finnish, greek, icelandic, irish, italian, latin, magyar,
 norsk, polis
 h, portuges, romanian, russian, serbian, slovak, slovene, spanish,
 swedish, tur
 kish, ukrainian, nohyphenation, loaded.
 File ignored
 xdvi-motif.bin: Fatal error: /tmp/Rtmpl10983/file643c9869.dvi: No such
 file.
 
 
 How can I fix this?
 
 Rick B.
  
  
  I get the same results, also on FC4 with R 2.2.0.
  
  I am cc:ing Frank here for his input, but a quick review of the code and
  created files suggests that there may be conflict between the locations
  of some of the resultant files during the latex system call. Some files
  appear in a temporary R directory, while others appear in the current R
  working directory.
  
  For example, if I enter the full filename:
   
/tmp/RtmpC12100/file643c9869.tex
  
  at the latex prompt, I get:
  
  
 latex(x)
  
  sh: line 0: cd: “/tmp/RtmpC12100”: No such file or directory
  This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
  entering extended mode
  ! I can't find file `“/tmp/RtmpC12100/file643c9869”'.
  * “/tmp/RtmpC12100/file643c9869”
  
  Please type another input file name: *** loading the extensions
  datasource
  /tmp/RtmpC12100/file643c9869.tex
  (/tmp/RtmpC12100/file643c9869.tex
  LaTeX2e 2003/12/01
  Babel v3.8d and hyphenation patterns for american, french, german,
  ngerman, b
  ahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch,
  esperanto, e
  stonian, finnish, greek, icelandic, irish, italian, latin, magyar,
  norsk, polis
  h, portuges, romanian, russian, serbian, slovak, slovene, spanish,
  swedish, tur
  kish, ukrainian, nohyphenation, loaded.
  (/usr/share/texmf/tex/latex/base/report.cls
  Document Class: report 2004/02/16 v1.4f Standard LaTeX document class
  (/usr/share/texmf/tex/latex/base/size10.clo))
  (/usr/share/texmf/tex/latex/geometry/geometry.sty
  (/usr/share/texmf/tex/latex/graphics/keyval.sty)
  (/usr/share/texmf/tex/latex/geometry/geometry.cfg))
  No file file643c9869.aux.
  [1] (./file643c9869.aux) )
  Output written on file643c9869.dvi (1 page, 368 bytes).
  Transcript written on file643c9869.log.
  xdvi-motif.bin: Fatal error: /tmp/RtmpC12100/file643c9869.dvi
 
 
 H,  It works for me.  Interesting.
 
 It almost looks like the temp dir is not being created, but thats not 
 possible because R does that.  It might be a Unicode issue with you 
 system shell.  Can you run this statement in R
 
 sys(paste('cd',dQuote(tempdir()),;,
 echo Hello BOB  test.test,
 ;,cat test.test))
 
 
 What version of Hmisc are you using?  What local are you using?
 
 Charles

Hmisc version 3.0-7, Dated 2005-09-15, which is the latest according to
CRAN.

 sys(paste('cd',dQuote(tempdir()),;,
+ echo Hello BOB  test.test,
+ ;,cat test.test))
sh: line 0: cd: “/tmp/RtmpGY5553”: No such file or directory
[1] Hello BOB


From a bash console:

$ cd /tmp/RtmpGY5553
$ pwd
/tmp/RtmpGY5553


$ locale
LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=


On the creation of the sys() call, it looks like the backquotes are
causing the problem:

 paste('cd',dQuote(tempdir()))
[1] cd “/tmp/RtmpGY5553”


From a bash shell:

$ cd “/tmp/RtmpGY5553”
bash: cd: “/tmp/RtmpGY5553”: No such file or directory
$ cd /tmp/RtmpGY5553
$ pwd
/tmp/RtmpGY5553


According to ?dQuote:

By default, sQuote and dQuote provide undirectional ASCII quotation
style. In a UTF-8 locale (see l10n_info), the Unicode directional quotes
are used.

The See Also points to shQuote for quoting OS commands.


HTH,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Correlation, by date, of two variables?

2005-10-12 Thread Marc Schwartz (via MN)
On Wed, 2005-10-12 at 12:01 -0700, t c wrote:
 I have a dataset with three variables: date, var1, var2

 How can I calculate the correlation, by date, between var1 and var2?
 
 e.g.
 
 
 datevar1var2
 1/1/200154
 1/1/200185
 1/1/200197
 
  
 
 2/1/200172
 2/1/200121
 2/1/200146
 
  
 
 3/1/200135
 3/1/200143
 3/1/200169
 3/1/20017-1
 
  
 
 the results I want:
 1/1/2001 0.891042111
 2/1/2001 0.075093926
 3/1/2001 -0.263117406

t c,

Given your series of posts here, I would highly recommend that before
you consider spending money on a tutor or other similar resource, you
take the time to read the freely available documentation that both R
Core and useRs have kindly provided to the Community. The R Core manuals
are all available with your downloaded installation of R and/or from the
main R web site under Documentation as are the Contributed documents.
You will find that your time will be well invested in that effort.

If you read the Posting Guide for the e-mail lists, a link to which is
provided at the bottom of every e-mail that comes through, you will find
an excellent guide to the resources that are available to assist you in
using R.

Trying to learn R (as with any technical endeavor) without reading at
least a basic set of the growing list of documents, books and
publications on R is going to be problematic.

A hint for you here:

See ?by, ?tapply, ?split and ?lapply for guidance on how to generate
summary statistics on subsets of data.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Two factor (or more) non-parametric comparison of means

2005-10-11 Thread Marc Schwartz (via MN)
I suspect that John may be looking for ?friedman.test, which I believe
will allow for a two-way non-parametric test.

kruskal.test() will perform a non-parametric test on two or more samples
(or factor levels) as a generalization of wilcox.test() for one or two
samples (or factor levels).

HTH,

Marc Schwartz

On Tue, 2005-10-11 at 18:22 +0200, Johan Sandblom wrote:
 ?kruskal.test
 
 2005/10/11, John Sorkin [EMAIL PROTECTED]:
  Can anyone suggest a good non-parametric test, and an R
 implementation of the test,  that allows for two or more factors?
 Wilcoxon signed rank allows for only one.
  Thanks,
  John

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Hmisc latex function

2005-10-11 Thread Marc Schwartz (via MN)
On Tue, 2005-10-11 at 10:01 -0400, Rick Bilonick wrote:
 I'm using R 2.2.0 on an up-to-date version of Fedora Core 4 with the
 latest version of Hmisc. When I run an example from the latex function I
 get the following:
 
  x - matrix(1:6, nrow=2, dimnames=list(c('a','b'),c('c','d','enLine
 2')))
  x
   c d enLine 2
 a 1 35
 b 2 46
  latex(x)   # creates x.tex in working directory
 sh: line 0: cd: “/tmp/Rtmpl10983”: No such file or directory
 This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
 entering extended mode
 ! I can't find file `“/tmp/Rtmpl10983/file643c9869”'.
 * “/tmp/Rtmpl10983/file643c9869”
 
 Please type another input file name: q
 (/usr/share/texmf/tex/latex/tools/q.tex
 LaTeX2e 2003/12/01
 Babel v3.8d and hyphenation patterns for american, french, german,
 ngerman, b
 ahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch,
 esperanto, e
 stonian, finnish, greek, icelandic, irish, italian, latin, magyar,
 norsk, polis
 h, portuges, romanian, russian, serbian, slovak, slovene, spanish,
 swedish, tur
 kish, ukrainian, nohyphenation, loaded.
 File ignored
 xdvi-motif.bin: Fatal error: /tmp/Rtmpl10983/file643c9869.dvi: No such
 file.
 
 
 How can I fix this?
 
 Rick B.

I get the same results, also on FC4 with R 2.2.0.

I am cc:ing Frank here for his input, but a quick review of the code and
created files suggests that there may be conflict between the locations
of some of the resultant files during the latex system call. Some files
appear in a temporary R directory, while others appear in the current R
working directory.

For example, if I enter the full filename:
 
  /tmp/RtmpC12100/file643c9869.tex

at the latex prompt, I get:

 latex(x)
sh: line 0: cd: “/tmp/RtmpC12100”: No such file or directory
This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
entering extended mode
! I can't find file `“/tmp/RtmpC12100/file643c9869”'.
* “/tmp/RtmpC12100/file643c9869”

Please type another input file name: *** loading the extensions
datasource
/tmp/RtmpC12100/file643c9869.tex
(/tmp/RtmpC12100/file643c9869.tex
LaTeX2e 2003/12/01
Babel v3.8d and hyphenation patterns for american, french, german,
ngerman, b
ahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch,
esperanto, e
stonian, finnish, greek, icelandic, irish, italian, latin, magyar,
norsk, polis
h, portuges, romanian, russian, serbian, slovak, slovene, spanish,
swedish, tur
kish, ukrainian, nohyphenation, loaded.
(/usr/share/texmf/tex/latex/base/report.cls
Document Class: report 2004/02/16 v1.4f Standard LaTeX document class
(/usr/share/texmf/tex/latex/base/size10.clo))
(/usr/share/texmf/tex/latex/geometry/geometry.sty
(/usr/share/texmf/tex/latex/graphics/keyval.sty)
(/usr/share/texmf/tex/latex/geometry/geometry.cfg))
No file file643c9869.aux.
[1] (./file643c9869.aux) )
Output written on file643c9869.dvi (1 page, 368 bytes).
Transcript written on file643c9869.log.
xdvi-motif.bin: Fatal error: /tmp/RtmpC12100/file643c9869.dvi: No such
file.


The temporary .tex file is present, but the .dvi, .aux and .log files
are created in the current working R directory.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Two factor (or more) non-parametric comparison of means

2005-10-11 Thread Marc Schwartz (via MN)
Peter,

Good points all around.

Hopefully John will provide further clarification.

Best regards,

Marc

On Tue, 2005-10-11 at 19:46 +0200, Peter Dalgaard wrote:
 Marc Schwartz (via MN) [EMAIL PROTECTED] writes:
 
  I suspect that John may be looking for ?friedman.test, which I believe
  will allow for a two-way non-parametric test.
 
 You could be right, but you could also be wrong... It depends quite a
 bit on what is meant by a two-factor  comparison of means. If he
 literally means means (!), then a permutation test (permuting within
 levels of the other factor) could be appropriate. It also makes a
 difference whether there's a single replication or more per cell in
 the cross-classification.
 
  kruskal.test() will perform a non-parametric test on two or more samples
  (or factor levels) as a generalization of wilcox.test() for one or two
  samples (or factor levels).
  
  HTH,
  
  Marc Schwartz
  
  On Tue, 2005-10-11 at 18:22 +0200, Johan Sandblom wrote:
   ?kruskal.test
   
   2005/10/11, John Sorkin [EMAIL PROTECTED]:
Can anyone suggest a good non-parametric test, and an R
   implementation of the test,  that allows for two or more factors?
   Wilcoxon signed rank allows for only one.
Thanks,
John
  
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide! 
  http://www.R-project.org/posting-guide.html
  


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] wildcards and removing variables

2005-10-10 Thread Marc Schwartz (via MN)
On Mon, 2005-10-10 at 10:37 -0400, Afshartous, David wrote:
   All,
 
   Is there are a wildcard in R for varible names as in unix?  For example,
 
   rm(results*)
 
   to remove all variable or function names that begin w/ results?  
 
   cheers,
   Dave
   ps - please respond directly to [EMAIL PROTECTED] 


See ?ls, which has a 'pattern' argument, enabling the use of Regex to
define the objects to be listed and subsequently removed using rm().

You can then use something like:

  rm(list = ls(pattern = \\bresults.))

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] wildcards and removing variables

2005-10-10 Thread Marc Schwartz (via MN)
On Mon, 2005-10-10 at 16:01 +0100, Prof Brian Ripley wrote:
 On Mon, 10 Oct 2005, Marc Schwartz (via MN) wrote:
 
  On Mon, 2005-10-10 at 10:37 -0400, Afshartous, David wrote:
 All,
 
 Is there are a wildcard in R for varible names as in unix?  For example,
 
 rm(results*)
 
 to remove all variable or function names that begin w/ results?
 
 cheers,
 Dave
 ps - please respond directly to [EMAIL PROTECTED]
 
 
  See ?ls, which has a 'pattern' argument, enabling the use of Regex to
  define the objects to be listed and subsequently removed using rm().
 
  You can then use something like:
 
   rm(list = ls(pattern = \\bresults.))
 
 One new feature of R-2.2.0 is glob2rx, which converts wildcards to regexps 
 for you. E.g.
 
 rm(list = ls(pattern = glob2rc(results*)))
 
 (I think if does it a little better, as that trailing dot is not I think 
 correct: result* matches result.)

Thanks to both Gabor and Prof. Ripley for pointing out the use of
glob2rc(). One of the new features I had forgotten about since reading
NEWS.

On the trailing dot, I was just in the process of drafting a follow up
after realizing my error, since as you point out, 'results' would not be
matched in that case.

Thanks,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] spline.des

2005-10-05 Thread Marc Schwartz (via MN)
On Mon, 2005-10-03 at 15:28 -0500, lforzani wrote:
 Hello, I am using library fda and I can not run a lot of functions because
 I receive the error:
 
 Error in bsplineS(evalarg, breaks, norder, nderiv) : 
 couldn't find function spline.des
 
 
 do you know how I can fix that? Thnaks. Liliana


spline.des() is in the 'splines' package, installed with the basic R
distribution. It appears that bsplineS() has a dependency on this not
otherwise referenced in the fda package documentation. Looks like fda
has not been updated in some time as well.

I have cc'd Jim Ramsay on this reply as an FYI.

You probably need a:

  library(splines)

before calling your code above.


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Getting eps into Word documents.

2005-10-04 Thread Marc Schwartz (via MN)
On Tue, 2005-10-04 at 08:38 -0500, Robert Baer wrote:
   On 03-Oct-05 Marc Schwartz (via MN) wrote:
On Mon, 2005-10-03 at 16:31 -0300, Rolf Turner wrote:
A student in one of my courses has asked me about getting R graphics
output (under Linux) into a Word document.  I.e. she wants to do her
R thing under Linux, but then do her word processing using Word.
 --snip--
So use something like the following:
   
   
postscript(RPlot.eps, height = 4, width = 4,
   horizontal = FALSE, onefile = FALSE,
   paper = special)
   
plot(1:5)
   
dev.off()
   
   
You can then import the .eps file into Word or most other such
applications that can import encapsulated postscript files.
 
 -snip
More information is available from MS here:
   
http://support.microsoft.com/?kbid=290362
   
HTH,
   
Marc Schwartz
 --snip---
   b) It won't work anyway if printed to a non-PostScript printer.
 
  True, which is the case irrespective of Word/Windows. If you don't have
  a PS printer locally or accessible via network, you can always install a
  PS printer driver and print to a file, which can then be printed by a
  third party if required.
 
 Well, as a lowly Windows and Office user, I most often right click on R
 grahics, cut to clipboard, and paste into Word.   So one possiblility is for
 the student to install R on her own machine (Windows or Mac?).
 
 But I just tried Marc's suggestion, and it looks VERY VIABLE to me.  I
 generated the graph from his code snippet and used Insert picture from
 file in Word 2003 to place the graphic in a Word document.  I then tried
 printing on both an HP 4100 TN laserjet and an HP 960c deskjet.  The image
 printed perfectly on both printers with crisp lines and text that apprear to
 be vector-based not degraded bitmapped representations.  Certainly worth the
 student trying.


Glad to hear that worked for you.

I do think that the majority of problems with importing EPS files from R
into other applications (Word, Powerpoint, Writer, Impress, etc.) are
due to not properly configuring the postscript() arguments that are on
the help page. 

I should also note that beyond plots, I use this same approach for
putting LaTeX tables into these documents as well. There are times where
I need create one or more nicely formatted tables for inclusion in
another document, to then be used by someone else.

I generate the LaTeX preamble, document and table code using R, then
run:

  latex FileName.tex
  dvips -E FileName -o FileName.ps

where the '-E' option to dvips attempts to create an EPS output file
with a tight bounding box. The result is a single EPS file with the
table (much like the plot example) ready for import.

I had been using 'epstool' to create the image preview, but now that
OO.org 2.0 has included this automatically upon import (as do the latest
versions of Word), this step is no longer required.

In this way, I can use R to create reproducible and publication ready
tables for use by others in their documents, in situations where they
are not (or cannot) use LaTeX for the entire document.

HTH,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] gnomeGUI installation

2005-10-03 Thread Marc Schwartz (via MN)
On Mon, 2005-10-03 at 18:01 +0100, Prof Brian Ripley wrote:
 On Mon, 3 Oct 2005, Daniel Pick wrote:
 
I have successfully downloaded the sources and built R as a shared
  library on a Red Hat Enterprise Level 3 box.  I am now trying to build the
  GNOME GUI, but configure is barfing on glade.  According to the system
  logs, the RPM for libglade2-devel-2.0.1-3.x86_64 is installed, but in
  /usr/bin, where gnomeGUI configure is looking, what's there is
  libglade-convert.  How do I fix this?
 
 By reading the manual.  In the R-admin manual it says
 
 `Please check you have all the requirements.  You need at least the
 following packages or later installed:
 
 audiofile-0.2.1
 esound-0.2.23
 glib-1.2.10
 gtk+-1.2.10
 imlib-1.9.10
 ORBit-0.5.12
 gnome-libs-1.4.1.2
 libxml-1.8.16
 libglade-0.17
 
 ...
 
 Remember that some package management systems (such as @acronym{RPM} and
 deb) make a distinction between the user version of a package and the
 developer version.  The latter usually has the same name but with the
 extension @samp{-devel} or @samp{-dev}.  If you use a pre-packaged
 version of @acronym{GNOME} then you must have the developer versions of
 the above packages in order to compile the R-GNOME console.'
 
 My FC3 box has
 
 gannet% rpm -qa | grep ^libglade
 libglade2-devel-2.4.0-5
 libglade-devel-0.17-15
 libglade2-2.4.0-5
 libglade-0.17-15
 
 and note that libglade2[-devel] is NOT a later version of libglade (in the 
 same way that gtk2 is different from gtk).
 
 AFAICS it is likely that /usr/bin/libglade-config is provided by 
 libglade-devel-0.17-15.


That is correct, subject of course to RPM version differences between
FC3/4 and RHEL3/4.  Using RPM, this can be confirmed with:

$ rpm -q --whatprovides /usr/bin/libglade-config
libglade-devel-0.17-16

which is the output on FC4.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Getting eps into Word documents.

2005-10-03 Thread Marc Schwartz (via MN)
On Mon, 2005-10-03 at 16:31 -0300, Rolf Turner wrote:
 A student in one of my courses has asked me about getting R graphics
 output (under Linux) into a Word document.  I.e. she wants to do her
 R thing under Linux, but then do her word processing using Word.
 
 Scanning around the r-help archives I encountered an inquiry about
 this topic --- eps into Word documents --- from Paul Johnson but
 found no replies to it.  I tried contacting him but the email address
 in the archives appeared not to be valid.  Does anyone know a
 satisfactory solution to the problem of including a graphic which
 exists in the form of a *.eps (encapsulated postscript) file into a
 Word document.  If so, would you be willing to share it with me and
 my student?
 
 If so, please be gentle in your explanation.  I am not myself (repeat
 ***NOT***) a user of Word!

Hehe...  :-)

Rolf, just use the guidance provided in ?postscript. In the details
section it indicates:

 The postscript produced by R is EPS (_Encapsulated PostScript_)
 compatible, and can be included into other documents, e.g., into
 LaTeX, using '\includegraphics{filename}'.  For use in this way
 you will probably want to set 'horizontal = FALSE, onefile =
 FALSE, paper = special'.

So use something like the following:


postscript(RPlot.eps, height = 4, width = 4, 
   horizontal = FALSE, onefile = FALSE,
   paper = special)

plot(1:5)

dev.off()


You can then import the .eps file into Word or most other such
applications that can import encapsulated postscript files.

The recent versions of Word will also automatically generate a bitmapped
preview of the plot upon import.  BTW, OO.org 2.0, which is in late beta
testing now, also generates EPS preview images upon import.

The key to doing this successfully is using the arguments to
postscript() as defined above. I have never had a problem with this.

More information is available from MS here:

http://support.microsoft.com/?kbid=290362

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Getting eps into Word documents.

2005-10-03 Thread Marc Schwartz (via MN)
On Mon, 2005-10-03 at 22:00 +0100, Ted Harding wrote:
 Rolf ( Marc)
 
 On 03-Oct-05 Marc Schwartz (via MN) wrote:
  On Mon, 2005-10-03 at 16:31 -0300, Rolf Turner wrote:
  A student in one of my courses has asked me about getting R graphics
  output (under Linux) into a Word document.  I.e. she wants to do her
  R thing under Linux, but then do her word processing using Word.
  
  Scanning around the r-help archives I encountered an inquiry about
  this topic --- eps into Word documents --- from Paul Johnson but
  found no replies to it.  I tried contacting him but the email address
  in the archives appeared not to be valid.  Does anyone know a
  satisfactory solution to the problem of including a graphic which
  exists in the form of a *.eps (encapsulated postscript) file into a
  Word document.  If so, would you be willing to share it with me and
  my student?
  
  If so, please be gentle in your explanation.  I am not myself (repeat
  ***NOT***) a user of Word!
  
  Hehe...  :-)
  
  Rolf, just use the guidance provided in ?postscript. In the details
  section it indicates:
  
   The postscript produced by R is EPS (_Encapsulated PostScript_)
   compatible, and can be included into other documents, e.g., into
   LaTeX, using '\includegraphics{filename}'.  For use in this way
   you will probably want to set 'horizontal = FALSE, onefile =
   FALSE, paper = special'.
  
  So use something like the following:
  
  
  postscript(RPlot.eps, height = 4, width = 4, 
 horizontal = FALSE, onefile = FALSE,
 paper = special)
  
  plot(1:5)
  
  dev.off()
  
  
  You can then import the .eps file into Word or most other such
  applications that can import encapsulated postscript files.
  
  The recent versions of Word will also automatically generate a
  bitmapped preview of the plot upon import.  BTW, OO.org 2.0,
  which is in late beta testing now, also generates EPS preview
  images upon import.
  
  The key to doing this successfully is using the arguments to
  postscript() as defined above. I have never had a problem with this.
  
  More information is available from MS here:
  
  http://support.microsoft.com/?kbid=290362
  
  HTH,
  
  Marc Schwartz
 
 This suggestion could be problematic in that

Ted,

 a) According to the MS web site above, it applies to recent Word
(Office 2002/2003) or possibly earlier depending on installed
graphics filters.

The Word EPS filter has been around for some time. I recall using it
years ago. Note that it is listed on that site in both categories of
filters for some reason.  In the older versions of Word, there was no
preview image generated, hence you ended up with a box/frame place
holder of sorts, unless you added a preview image before importing.

That being said, one does need to install the import filters from the
Office CD, which may not be part of the default settings on all prior
versions. It may require going back into the Office set up program to
install them.

 b) It won't work anyway if printed to a non-PostScript printer.

True, which is the case irrespective of Word/Windows. If you don't have
a PS printer locally or accessible via network, you can always install a
PS printer driver and print to a file, which can then be printed by a
third party if required.  

If one has ps2pdf/Ghostscript available, you can also convert the PS
file to PDF and then print it via Acrobat or other PDF viewers.  

 If either of these applies to Rolf's student, she could have problems.
 
 [Just to add my own disclaimer: the only version of Word I'm in
 any position to ever touch, and then only if driven to, belongs
 to Office 98; and I'm sure that this doesn't know a thing about
 PostScript!]
 
 Another option to consider, since she's doing her R work on Linux,
 is that recent versions of the ImageMagick program 'convert' have
 the capability to convert EPS into WMF (Windows Metafile; use
 file extension .wmf for 'convert') or EMF (Enhanced Metafile);
 use file extension .emf. The gubbins is built in to a file
 wmf.so in the lib/ImageMagick tree.
 
 Likewise, the program 'pstoedit' can do it (to .wmf or .emf),
 using library /usr/local/lib/pstoedit/libp2edrvwmf.so (on my machine).
 
 Most Linux distributions these days come with ImageMagick and
 pstoedit. If not already installed in the machine she's using it
 should be straightforward to get this dome.
 
 Hoping this helps,
 Ted.

Ted, the critical issue with doing the WMF/EMF conversion on Linux is
that the libEMF stuff generates lousy quality (and visually bitmapped)
output. I have tried it in the past and gave up, staying with EPS/PDF
formats for vector graphics. If one wants to convert the EPS file to
WMF/EMF, it is best done on Windows using native Windows drivers, rather
than on Linux.

When SVG graphics go mainstream and cross-platform, that should help
significantly with this whole issue.

HTH,

Marc

__
R-help

Re: [R] boxplot and xlim confusion?

2005-09-29 Thread Marc Schwartz (via MN)
On Thu, 2005-09-29 at 15:28 +0200, Karin Lagesen wrote:
 Petr Pikal [EMAIL PROTECTED] writes:
 
  Hi Karin
 
  I did not have seen any answer for your question yet so here is a 
  try.
 
  I gues you want the horizotal layout or your boxplot.
 
  boxplot(split(rnorm(30), rep(1:3, each=10)), horizontal =T, 
  names=letters[1:3])
 
  boxplot(split(rnorm(30), rep(1:3, each=10)), horizontal =T, 
  names=c(NA,b,NA))
 
 These look like the plots I want, yes. However, is there a way of
 locking the x-axis? When I do these several times in a row, the range
 of the x axis moves with the data being plotted. If I set xlim in the
 boxplot command, it makes no difference what so ever.

That's because when you use 'horizontal = TRUE' in boxplot(), the x and
y axes are rotated. Thus you need to set 'ylim' and not 'xlim', where
ylim is the range of values in your data irrespective of the axis
orientation.

This is done in bxp(), which is the actual function doing the plotting
for boxplot(). In bxp(), there is the follow code snippet:

if (horizontal) 
plot.window(ylim = c(0.5, n + 0.5), xlim = ylim, 
log = log)
else plot.window(xlim = c(0.5, n + 0.5), ylim = ylim, 
 log = log)

Note in the first case, that the xlim value for plot.window() is set to
the ylim value passed from boxplot() or from bxp(), if called directly.

So, the above examples by Petr should be something like:

boxplot(split(rnorm(30), rep(1:3, each=10)), horizontal = TRUE, 
names=letters[1:3], ylim = c(-4, 4))

boxplot(split(rnorm(30), rep(1:3, each=10)), horizontal =TRUE, 
names=c(NA,b,NA), ylim = c(-4, 4))


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Display values in piechart/barplot

2005-09-29 Thread Marc Schwartz (via MN)
On Thu, 2005-09-29 at 14:34 +0200, Volker Rehbock wrote:
 Is it possible to automatically display the underlying values of a 
 piechart/barplot in the graphic? If so, which package/function/argument do I 
 need for it?
 Thanks,
 Volker


Using pie charts are not a particularly good way of displaying data,
even though the pie() function is available in R. See ?pie for more
information on this.

For barplots, the following provide two approaches:


# Place the bar values above the bars
# Note that I set 'ylim' to make room for 
# the text labels above the bars

vals - 1:5
names(vals) - LETTERS[1:5]
mp - barplot(vals, ylim = c(0, 6))
text(mp, vals, labels = vals, pos = 3)



# Place the bar values below the x axis

vals - 1:5
names(vals) - LETTERS[1:5]
mp - barplot(vals)
mtext(side = 1, at = mp, text = vals, line = 3)


Note that barplot() returns the bar midpoints, so you can use these
values for annotation placement.

See ?barplot, ?text and ?mtext for more information.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] How to add frame (frame.plot=T) to Plot.Design?

2005-09-29 Thread Marc Schwartz (via MN)
On Thu, 2005-09-29 at 18:36 +0200, Jan Verbesselt wrote:
 Hi R-help,
 
  
 
 When using the package Design and the plot.Design function all the graphs of
 an lrm.fit (lrm()) are plotted without a frame (only with axes). How can the
 frame be added to these plots?
 
  
 
 In the plot.default function the frame can be added/or removed via the
 frame.plot=T/F, respectively.
 
 e.g., plot(1,axes=T,frame.plot=T). How can this be done with
 plot.Design(lrm.fit)
 
  
 
 Thanks,
 
 Jan


See ?box

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Binary Logit Regression with R

2005-09-29 Thread Marc Schwartz (via MN)
On Thu, 2005-09-29 at 18:08 -0400, Johann Park wrote:
 Hi to all, 
 
 I am a PH.D Student doing statistical analysis.
 I am totally new to R. I previously use Stata and am changing into R. I 
 ususally do with logit regression with binary dependent variable (war 
 occurence:1 or 0). 
 
 I just want to know command to do that. More sepcifically, 
 
 Let say, my Y is war occurence (occur=1, otherwise 0). And my independent 
 variables (Xs) are trade, democracy, military poweretc. 
 
 In Stata, I do like what follows: 
 
 logit war trade democracy militarypower... 
 
 Then I will get results. 
 
 What are the equivalent command in R? 
 
 Many thanks, 
 
 JP

See ?glm in the base stats package or ?lrm in Frank Harrell's Design
package on CRAN.

BTW, doing:

  help.search(logit)

or 

  RSiteSearch(logit)

would provide you with the ability to do keyword searches of your
current R installation or the online e-mail list and documentation
archives, respectively.

You should also review Chapter 11 - Statistical Models in R in An
Introduction to R, which is installed with R or available online under
the Documentation/Manuals link on the main R web site.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] correct syntax

2005-09-28 Thread Marc Schwartz (via MN)
On Thu, 2005-09-29 at 06:57 +1000, Stephen Choularton wrote:
 Hi 
  
 I am trying to produce a little table like this:
  
 0 1
 0  601   408
 1  290   2655
  
 but I cannot get the syntax right.  
  
 Can anyone help.
  
 Stephen


Do you have the raw data or only the tabulated counts?

If you have the raw data, see ?table for ways to create n-dimensional
tables.

If you only have the tabulated counts, then you can use matrix():

 mat - matrix(c(601, 290, 408, 2655), ncol = 2)
 dimnames(mat) - list(c(0, 1), c(0, 1))

 mat
01
0 601  408
1 290 2655


See ?matrix for more information.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Using unsplit - unsplit does not seem to reverse the effect of split

2005-09-27 Thread Marc Schwartz (via MN)
On Tue, 2005-09-27 at 19:12 +0200, Søren Højsgaard wrote:
 In data OME in MASS I would like to extract the first 5 observations
 per subject (=ID). So I do
  
 library(MASS)
 OMEsub - split(OME, OME$ID)
 OMEsub - lapply(OMEsub,function(x)x[1:5,])
 unsplit(OMEsub, OME$ID)
 
 - which results in 
  
 [[1]]
 [1] 1 1 1 1 1
 [[2]]
 [1] 30 30 30 30 30
 [[3]]
 [1] low low low low low
 Levels: N/A high low
 [[4]]
 [1] 35 35 40 40 45
 [[5]]
 [1] coherent   incoherent coherent   incoherent coherent  
 Levels: coherent incoherent
 [[6]]
 [1] 1 4 0 1 2
 
 
  
 [[1094]]
 [1] 4 5 5 5 2
 [[1095]]
 [1] 100 100 100 100 100
 [[1096]]
 [1] 18 18 18 18 18
 [[1097]]
 [1] N/A N/A N/A N/A N/A
 Levels: N/A high low
 There were 50 or more warnings (use warnings() to see the first 50)
 
 warnings()
 Warning messages:
 1: number of items to replace is not a multiple of replacement length
 2: number of items to replace is not a multiple of replacement length
 3: number of items to replace is not a multiple of replacement length
 
  
 According to documentation unsplit is the reverse of split, but I must
 be missing a point somewhere... Can anyone help? Thanks in advance.
 Søren


If you read the documentation for split/unsplit, you will also note that
in the Details section it says:

 'unsplit' works only with lists of vectors

as opposed to lists of data frames, which is the result of your split()
operation.

Also note that in the Value section, it indicates:

'unsplit' returns a vector for which 'split(x, f)' equals 'value'

as opposed to unsplit returning a data frame.


Thus, use:

  OME1 - do.call(rbind, OMEsub)

where OME1 will be the result of rbind()'ing the data frames in the
OMEsub list.

See ?do.call for more information.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Help: x11 position in the Unix environment

2005-09-26 Thread Marc Schwartz (via MN)
On Mon, 2005-09-26 at 17:45 +0200, Shengzhe Wu wrote:
 Hello,
 
 In the Unix environment, I open a window by x11(). May I specify the
 position of this window by specifying the position of the top left of
 the window as in Windows environment? Or some other parameters can be
 used to do that?
 
 Thank you,
 Shengzhe


I don't believe so. In general, under Unix/Linux, the Window Manager
determines window positioning upon startup unless the application
overrides this behavior. Some applications let you specify application
window positioning via command line 'geometry' arguments or via the use
of an .Xresources file. 

Some WM's provide more or less functionality for this behavior relative
to user customization. For example, Sawfish provides quite a bit,
whereas Metacity hides much of it.

You may want to check the documentation for the WM that you are using.

There is also an application called Devil's Pie:

  http://www.burtonini.com/blog/computers/devilspie

which provides additional Sawfish-like customization for window
positioning, etc. However, this is global for a given window, not on a
per instance basis.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Fligner-Policello robust rank test

2005-09-22 Thread Marc Schwartz (via MN)
On Thu, 2005-09-22 at 10:44 -0400, Elisabetta Manduchi wrote:
 Can anybody tell me if there is an R implementation of the 
 Fligner-Policello robust rank test?
 Thanks,
 Elisabetta


I don't know about the Fligner-Policello test, but the Fligner-Killeen
test has been implemented in fligner.test(). From ?fligner.test:


The Fligner-Killeen (median) test has been determined in a simulation
study as one of the many tests for homogeneity of variances which is
most robust against departures from normality, see Conover, Johnson 
Johnson (1981). It is a k-sample simple linear rank which uses the ranks
of the absolute values of the centered samples and weights a(i) =
qnorm((1 + i/(n+1))/2). The version implemented here uses median
centering in each of the samples (F-K:med X^2 in the reference).


FWIW, in a Google search, I located the following SAS macro by Paul von
Hippel, which includes a comment that does not lead to optimism
regarding software implementations:

http://www.sociology.ohio-state.edu/people/ptv/macros/fligner_policello.htm

The above should not be overly difficult to convert to R, presuming you
know SAS macros...


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] how to extract the column name or value from the numerical value of the matrix

2005-09-19 Thread Marc Schwartz (via MN)
On Mon, 2005-09-19 at 12:30 -0700, shanmuha boopathy wrote:
 Dear sir,
  i have a matrix like
  x-c(1:20)
 A-matrix(x,4,5)
  A
  [,1] [,2] [,3] [,4] [,5]
 [1,]159   13   17
 [2,]26   10   14   18
 [3,]37   11   15   19
 [4,]48   12   16   20
 
 I want to extract the column value for the matrix
 value 11...
 
 or the row value for 14..
 how it is possible?
 
 thanks for your help
 
 with regards,
 boopathy.


Why did you copy r-devel? r-help is the appropriate forum.


See ?which and take note of the 'arr.ind' argument:

 which(A == 11, arr.ind = TRUE)
 row col
[1,]   3   3


 which(A == 14, arr.ind = TRUE)
 row col
[1,]   2   4


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] How do I get the row indices?

2005-09-16 Thread Marc Schwartz (via MN)
On Fri, 2005-09-16 at 10:34 -0700, Martin Lam wrote:
 Hi,
 
 I was wondering if it's possible to get the row
 numbers from a filtering. Here's an example:
 
 # give me the rows with sepal.length == 6.2
 iris[(iris[,1]==6.2),]
 
 # output
 Sepal.Length Sepal.Width Petal.Length Petal.Width 
   Species
 69   6.2 2.2  4.5 1.5
 versicolor
 98   6.2 2.9  4.3 1.3
 versicolor
 127  6.2 2.8  4.8 1.8 
 virginica
 149  6.2 3.4  5.4 2.3 
 virginica
 
 What I really want is that it return the row numbers:
 69, 98, 127, 149.
 
 Thanks in advance,
 
 Martin

Martin,

First: Be very, very careful when performing exact equalities on
floating point numbers. They won't always result in the answer you
expect. For more information see R FAQ 7.31: Why doesn't R think these
numbers are equal?

Second:

See ?all.equal, ?sapply and ?which. Here is a possible vectorized
solution:

 which(sapply(iris[, 1], function(x) isTRUE(all.equal(x, 6.2
[1]  69  98 127 149


The above applies isTRUE(all.equal(x, 6.2)) for each element 'x' in 
iris[, 1], returning the indices of the TRUE results for the near
equality comparison, based upon the tolerance argument in all.equal().

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Graphics 'snapshots' in Linux?

2005-09-15 Thread Marc Schwartz (via MN)
On Thu, 2005-09-15 at 10:43 -0400, Tyler Smith wrote:
 Hi,
 
 I'm working on a MEPIS (Debian-based Linux) computer, using the 
 emacs/ESS package to do my R work. I've got some plots that I label 
 interactively using the locate function. With the Windows GUI there is 
 an option to take a snapshot of the graphics output, saving it as an 
 image file. Is there a way to do this with emacs/ESS?
 
 Thanks,
 
 Tyler

Tyler,

Take a look at ?dev.copy2eps and on the same page dev.copy(), which
enable you to copy the current X11 plot supported output devices.

You could do something like the following for an EPS file:

plot(1:5)
text(locator(1), Place Text Here)
dev.copy2eps(file = MyPlot.eps)


or the following for a PNG file:

par(bg = white)
plot(1:5)
text(locator(1), Place Text Here)
dev.copy(device = png, file = MyPlot.png)
dev.off()


Note that in the first example, dev.off() is not required, as the EPS
output device is closed after the call.

Also, note in the second example, you will need to set the background to
white (unless already specified for whatever color you may be using), as
the default output file will have a transparent background, even though
the png() function shows the default as white. If my memory is correct
this is because the X11 device itself has a transparent background by
default and this is what is copied.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Forcing hist()

2005-09-14 Thread Marc Schwartz (via MN)
On Wed, 2005-09-14 at 20:17 +0200, Par Leijonhufvud wrote:
 I'm trying to create histogram (using hist()) that fullfill the following
 criteria:
 
 * data is on a ordinal scale (1, 2, 3, 4, 5)
 * I want bars centered over the number on the x-axis
 * I want 5 bars of equal width
 
 I have tried various versions of the hist() command, with no luck. what
 am I missing?
 
 /Par


More than likely, you want to use barplot() and not hist():

Try:

  barplot(1:5, names.arg = 1:5)

See ?barplot for more information.

HTH,

Marc Schwartz

P.S. To R Core: It probably makes sense to add barplot to the See Also
section of ?hist, since hist is listed in the See Also for ?barplot.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] axis of plot

2005-09-01 Thread Marc Schwartz (via MN)
On Thu, 2005-09-01 at 16:58 +0100, [EMAIL PROTECTED] wrote:
 hy,
 
 I need to have the 0 on the bottom left corner of the graph being
 joined , not with this little hole between x axis and y axis...
 
 I've saw this question with the answer one time but i'm unable to find
 it again..
 
 
 thks.
 guillaume

See 'xaxs' and 'yaxs' in ?par:

plot(1:5, xaxs = i, yaxs = i, xlim = c(0, 5), ylim = c(0, 5))

By default, both are set to r, which adds +/- 4% to the range of each
axis.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] png scaling problem

2005-09-01 Thread Marc Schwartz (via MN)
On Thu, 2005-09-01 at 21:51 +0200, Knut Krueger wrote:
 scaling-4
 xywidth-480
 resolution-150
 png(filename = c:/r/anschluss/plots/4.png, width = xywidth*scaling, 
 height = xywidth*scaling,pointsize = 12, bg = white, res = 
 resolution*scaling)
 ..
 
 barplot(xrow,col = barcolors,cex.axis=scaling, ylab=mean time till 
 attachment in sec,cex.lab=1.2*scaling) 
 
 I tried to scale the barplot but there is one strange result:
 scaling=1
 http://biostatistic.de/temp/1.png--- the ylab is ok
 
 scaling=2
 http://biostatistic.de/temp/2.png--- the ylab is not ok
 
 scaling=4
 http://biostatistic.de/temp/4.png--- the ylab is terrible
 
 is there any better solution to scale the resolution and the width/height?
 
 
 with regards
 Knut

Probably a better first question is, why are you using a bitmapped
graphics format if you need the image to be re-scaled? In general,
bitmapped graphics do not resize well, though if you have a specific
need and know a target image size, you can adjust various parameters to
look decent. Are you going to view these images in a web browser, where
you are concerned with display size and resolution?

From your e-mail headers it appears you are on Windows. If you need a
re-sizable graphic, use a vector based format such as wmf/emf,
especially if you need the graphics embedded in a Windows application
such as Word or Powerpoint. This is the default format under Windows
when copying and pasting a graphic between applications. You can then,
fairly freely, resize the image in the target application as you may
require.

If you are going to print the graphic directly or include it in a
document for printing (as opposed to just viewing it), then use PDF or
Postscript. The latter in EPS format, can be imported into many Windows
applications like Word, including the generation of a preview image.
However, they don't look good for direct use in presentations (unless
you print to a PS file and then convert to PDF for viewing).

See ?Devices for more information.

With a better idea of how you plan to use the graphic(s), we can offer
more specific recommendations on how to proceed.

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] crosstab for n-way contingency tables

2005-08-30 Thread Marc Schwartz (via MN)
On Tue, 2005-08-30 at 12:28 -0700, Isotta Felli wrote:
 Dear list.
  
 New to R, I'm looking for a way of using crosstab to output
 low-dimensional (higher than 2) contingency tables (frequencies,
 per-cents by rows, % by columns, mean, quantiles) I'm looking for
 something of the following sort
  
 dataframe: singers, 
 categorical variates: voice category (soprano,mezzo-soprano, ...) ,
 voice type( drammatic, spinto, lirico-spinto, lirico, leggero), school
 (german, italian, french, russian, anglo-saxon, other);repertory
 (opera, Lieder, oratorio, operetta)
 continuous variate: age
  
 I would like to tabulate the frequencies (relative percentages)  say
 in the following way
 columns: school  repertory
 rows :  voice category  voice type
  
 or to output in the cells of the above table, the statistics
 (mean/median/quantiles) for age
  
 
 I've seen that the function bwplot(age~school | repertory, data=
 singers, layout=c(4,2)) would do graphically something similar to what
 I want, but I desire the output also in tabular form
  
 Thanks
  
 Isotta


I don't know that you will find a single function that will do all of
what you desire, but you can look at the ctab() function, which is in
the 'catspec' package on CRAN by John Hendrickx. This will do multi-way
tables with summary row/col statistics.

There is also the CrossTable() function in the 'gmodels' package on
CRAN, though this will only do one way and two way tables, with summary
row/col statistics.

Neither of the above provide typical summary output for continuous data.
They are more for categorical variables.

For simple count multi-way output, you can also look at the ftable()
function which is in the base package. Also look at the summary()
function in the base package which will provide range, mean, quantile
data for continuous variables. It may just be a matter of formatting the
output in a fashion that you desire on a post analysis basis.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] ylim for graphic

2005-08-29 Thread Marc Schwartz (via MN)
On Mon, 2005-08-29 at 12:58 -0400, Doran, Harold wrote:
 Dear list:
 
 I have some data for which I am generating a series of barplots for
 percentages. One issue that I am dealing with is that I am trying to get
 the legend to print in a fixed location for each chart generated by the
 data. Because these charts are being created in a loop, with different
 data, my code searches the data to identify the maximum value in the
 data and then print the data values 5 points above the tallest bar. 
 
 Now, in situations where the largest value is 100, I needed to create
 the y-axis high enough to accommodate the legend w/o crowding up the
 data or the bars. So, I have ylim =c(0,200) and then place the legend at
 c(2,150). Visually, this places things exactly where I want them. But,
 it seems silly to have a y-axis labeled as high as 200%. I'm certain
 there is a smarter technique. Is it possible to place the legend at a
 location higher than 100 to avoid the crowding of the bars and the data
 and then also have the labels for the y-axis not print after the value
 100%? 
 
 In looking at ?legend I didn't see an option that would address this.
 Below is some code that you can use to create a similar chart.
 
 Thanks,
 Harold
 
 
 math.bar - c(53,31,55,28,55,100)
 apmxpmeet - c(47, 50, 49, 50, 49, 46)
 par(ps=10)
 math.bar - rbind(math.bar, apmxpmeet)
 math.barplot - barplot(math.bar, beside=T, col=c('blue','orange'),
 names=c('Grade \n 3','Grade \n 4','Grade \n 5','Grade \n 6','Grade \n
 7','Grade \n 8'), 
 ylim=c(0,200), ylab=Percentage, xlab=Grade Level)
 tot - round(math.bar,digits=0)
 graph.max - max(math.bar, apmxpmeet, na.rm=T)
 text(math.barplot, graph.max+5, tot, xpd = TRUE, col = c(blue,
 orange) )
 legend(2,150,legend=(c(Label A, Average)), fill=c(blue,orange))

Harold,

A few thoughts:

1. Instead of fixing the y axis max value at 200, simply set ylim to
c(0, max(math.bar * 1.2)) or a similar constant. In this case, you get
an extra 20% above the max(y) value for the legend placement.


2. In the legend call, use:

  legend(topleft, legend=(c(Label A, Average)),
  fill = c(blue,orange))

This will place the legend at the topleft of the plot region, rather
than you having to calculate the x and y coords. If you want it moved in
from the upper left hand corner, you can use the 'inset' argument as
well, which moves the legend by a proportion of the plot region limits
(0 - 1):

legend(topleft, legend=(c(Label A, Average)),
  fill = c(blue,orange), inset = .1)


3. You can use barplot(..., yaxt = n) to have the y axis not drawn and
then use axis() to place the labels at locations of your choosing, which
do not need to run the full length of the axis range:

 barplot(1:5, yaxt = n, ylim = c(0, 10))
 axis(2, at = 0:5)


4. You can place the legend outside the plot region, enabling you to
keep the y axis range to =100. This would need some tweaking, but the
idea is the same:

 # Increase the size of the top margin
 par(mar = c(5, 4, 8, 2) + 0.1)

 # Draw a barplot
 barplot(1:5)

 # Disable clipping outside the plot region
 par(xpd = TRUE)

 # Now draw the legend, but move it up by 30% from the top left
 legend(topleft, legend = LETTERS[1:5], inset = c(0, -.3))


You could also place the legend to the right or left of the plot region
if you prefer, adjusting the above accordingly.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Matrix oriented computing

2005-08-26 Thread Marc Schwartz (via MN)
On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote:
 Marc Schwartz [EMAIL PROTECTED] writes:
 
  x - c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 
 0.95, 0.975, 0.99, 0.995)
  
  df - c(1:100)
  
  mat - sapply(x, qchisq, df)
 
   dim(mat)
  [1] 100  11
   
   str(mat)
   num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ...
 
 outer() is perhaps a more natural first try... It does give the
 transpose of the sapply approach, though. 
 
 round(t(outer(x,df,qchisq)),2)
 
 should be close. You should likely add dimnames.



What I find interesting, is that I would have intuitively expected
outer() to be faster than sapply().  However:


  system.time(mat - sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.01 0.00 0.01 0.00 0.00
 
  system.time(mat1 - round(t(outer(x, df, qchisq)), 2), 
   gcFirst = TRUE)
[1] 0.01 0.00 0.01 0.00 0.00
 
# No round() or t() to test for overhead
  system.time(mat2 - outer(x, df, qchisq), gcFirst = TRUE)
[1] 0.01 0.00 0.02 0.00 0.00


# Bear in mind the round() on mat1 above
 all.equal(mat, mat1)
[1] Mean relative  difference: 4.905485e-05

 all.equal(mat, t(mat2))
[1] TRUE


Even when increasing the size of 'df' to 1:1000:


  system.time(mat - sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.16 0.01 0.16 0.00 0.00

  system.time(mat1 - round(t(outer(x, df, qchisq)), 2), gcFirst =
TRUE)
[1] 0.16 0.00 0.18 0.00 0.00
 
  # No round() or t() to test for overhead
  system.time(mat2 - outer(x, df, qchisq), gcFirst = TRUE)
[1] 0.16 0.01 0.17 0.00 0.00



It also seems that, at least in this case, t() and round() do not add
much overhead.

Best regards,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Matrix oriented computing

2005-08-26 Thread Marc Schwartz (via MN)
Prof. Ripley,

Excellent point. Neither sapply() nor outer() are the elephant in the
room in this situation.



On Fri, 2005-08-26 at 16:55 +0100, Prof Brian Ripley wrote:
 Try profiling.  Doing this many times to get an overview, e.g. for sapply 
 with df=1:1000:
 
 %   self%   total
   self secondstotalsecondsname
   98.26  6.78 98.26  6.78 FUN
0.58  0.04  0.58  0.04 unlist
0.29  0.02  0.87  0.06 as.vector
0.29  0.02  0.58  0.04 names-
0.29  0.02  0.29  0.02 names-.default
0.29  0.02  0.29  0.02 names
 
 so almost all the time is in qchisq.
 
 
 On Fri, 26 Aug 2005, Marc Schwartz (via MN) wrote:
 
  On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote:
  Marc Schwartz [EMAIL PROTECTED] writes:
 
  x - c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9,
 0.95, 0.975, 0.99, 0.995)
 
  df - c(1:100)
 
  mat - sapply(x, qchisq, df)
 
  dim(mat)
  [1] 100  11
 
  str(mat)
   num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ...
 
  outer() is perhaps a more natural first try... It does give the
  transpose of the sapply approach, though.
 
  round(t(outer(x,df,qchisq)),2)
 
  should be close. You should likely add dimnames.
 
 
 
  What I find interesting, is that I would have intuitively expected
  outer() to be faster than sapply().  However:
 
 
   system.time(mat - sapply(x, qchisq, df), gcFirst = TRUE)
  [1] 0.01 0.00 0.01 0.00 0.00
 
   system.time(mat1 - round(t(outer(x, df, qchisq)), 2),
gcFirst = TRUE)
  [1] 0.01 0.00 0.01 0.00 0.00
 
  # No round() or t() to test for overhead
   system.time(mat2 - outer(x, df, qchisq), gcFirst = TRUE)
  [1] 0.01 0.00 0.02 0.00 0.00
 
 
  # Bear in mind the round() on mat1 above
  all.equal(mat, mat1)
  [1] Mean relative  difference: 4.905485e-05
 
  all.equal(mat, t(mat2))
  [1] TRUE
 
 
  Even when increasing the size of 'df' to 1:1000:
 
 
   system.time(mat - sapply(x, qchisq, df), gcFirst = TRUE)
  [1] 0.16 0.01 0.16 0.00 0.00
 
   system.time(mat1 - round(t(outer(x, df, qchisq)), 2), gcFirst =
  TRUE)
  [1] 0.16 0.00 0.18 0.00 0.00
 
   # No round() or t() to test for overhead
   system.time(mat2 - outer(x, df, qchisq), gcFirst = TRUE)
  [1] 0.16 0.01 0.17 0.00 0.00
 
 
 
  It also seems that, at least in this case, t() and round() do not add
  much overhead.
 
 Definitely not for such small matrices.


True and both are C functions, which of course helps as well.

Best regards,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Unpaste Problem

2005-08-26 Thread Marc Schwartz (via MN)
On Fri, 2005-08-26 at 22:39 +0530, A Mani wrote:
 Hello,
 Easy ways to unpaste?
  xp - paste(x2, x3) # x2, x3 are two non-numeric columns.
 .
 .
 xfg - data.frame(xp,sc1, sc2, sc3) # sc1,sc2, sc3 are numeric cols.
 
 I want xp to be split up to form a new dataframe of the form (x3, sc1,
 sc2, sc3).
 IMPORTANT info : elements of xp have the form abcspaceefg, with abc
 in x2 and efg in x3.
 
 Thanks in advance, 


I think I understand what you are trying to do. Hopefully the below may
be helpful:

# Create the data frame with 3 rows
x2 - letters[1:3]
x3 - LETTERS[1:3]
xp - paste(x2, x3)

sc1 - rnorm(3)
sc2 - rnorm(3)
sc3 - rnorm(3)

xfg - data.frame(xp, sc1, sc2, sc3)

 xfg
   xpsc1sc2sc3
1 a A  1.3479123 -1.0642578  0.2479218
2 b B -0.1586587  1.1237456 -1.3952176
3 c C  2.7807484 -0.9778066 -1.9322279


# Use strsplit() here to break apart 'xp' using   as the split
# Use [ in sapply() to get the second (2) element from each
# 'xp' list pair. Note that I use as.character() here, since xfg$xp is 
# a factor.
# See the output of: strsplit(as.character(xfg$xp),  )
# for some insight into this approach

xp.split - sapply(strsplit(as.character(xfg$xp),  ), [, 2)


# show post split values
 xp.split
[1] A B C


# Now cbind it all together into a new data frame
# don't include column 1 from xfg (xp)
xfg.new - cbind(xp.split, xfg[, -1])


 xfg.new
  xp.splitsc1sc2sc3
1A  1.3479123 -1.0642578  0.2479218
2B -0.1586587  1.1237456 -1.3952176
3C  2.7807484 -0.9778066 -1.9322279


See ?strsplit for more information.
 
HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] help on retrieving output from by( ) for regression

2005-08-25 Thread Marc Schwartz (via MN)
That works also (using the example in ?lmList)

library(lme4)

?lmList

fm1 - lmList(breaks ~ wool | tension, warpbreaks)


However, one still would need to use either sapply() or lapply() as
below to get the details that Krishna is looking for. 

'fm1' above is a list of models (S4 class 'lmList') not overly different
from 'tmp' below, which is a list of models (S3 class 'by').

If you review str(fm1) and str(tmp), you would note that they are
virtually identical, save the use of slots, etc.

HTH,

Marc Schwartz

On Thu, 2005-08-25 at 12:17 -0400, Randy Johnson wrote:
 What about using lmList from the lme4 package?
 
 Randy
 
 
 On 8/25/05 9:44 AM, Marc Schwartz [EMAIL PROTECTED] wrote:
 
  Also, looking at the last example in ?by would be helpful:
  
  attach(warpbreaks)
  tmp - by(warpbreaks, tension, function(x) lm(breaks ~ wool, data = x))
  
  # To get coefficients:
  sapply(tmp, coef)
  
  # To get residuals:
  sapply(tmp, resid)
  
  # To get the model matrix:
  sapply(tmp, model.matrix)
  
  
  
  To get the summary() output, I suspect that using:
  
lapply(tmp, summary)
  
  would yield more familiar output as compared to using:
  
sapply(tmp, summary)
  
  The output from the latter might require a bit more navigation through
  the resultant matrix, depending upon how the output is to be ultimately
  used.
  
  HTH,
  
  Marc Schwartz
  
  
  
  On Thu, 2005-08-25 at 14:57 +0200, TEMPL Matthias wrote:
  Look more carefully at
  ?lm 
  at the See Also section ...
  
  X - rnorm(30)
  Y - rnorm(30)
  lm(Y~X)
  summary(lm(Y~X))
  
  Best,
  Matthias
  
  
  Hi all 
  
  I used a function
  qtrregr - by(AB, AB$qtr, function(AB) lm(AB$X~AB$Y))
  
  objective is to run a regression on quartery subsets in the
  data set AB, having variables X and Y, grouped by variable qtr.
  
  Now i retrieved the output using qtrregr, however it only
  showed the coefficients (intercept and B) with out
  significant levels and residuals for each qtr. Can some on
  help me on how can retrieve the detailed regression output.
  
  rgds
  
  snvk
  
  
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide! 
  http://www.R-project.org/posting-guide.html
  
 
 ~~
 Randy Johnson
 Laboratory of Genomic Diversity
 NCI-Frederick
 Bldg 560, Rm 11-85
 Frederick, MD 21702
 (301)846-1304
 ~~
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] help on retrieving output from by( ) for regression

2005-08-25 Thread Marc Schwartz (via MN)
Sorry to reply to my own post, but in reviewing the NAMESPACE file for
lme4, it looks like Doug is perhaps planning to add additional model
object accessor methods for the lmList class, including resid() and
summary(), which are commented out now. coef() is available presently. 

So when in place, it would appear that the use of lmList would be
advantageous over the use of by().

Marc

On Thu, 2005-08-25 at 11:52 -0500, Marc Schwartz (via MN) wrote:
 That works also (using the example in ?lmList)
 
 library(lme4)
 
 ?lmList
 
 fm1 - lmList(breaks ~ wool | tension, warpbreaks)
 
 
 However, one still would need to use either sapply() or lapply() as
 below to get the details that Krishna is looking for. 
 
 'fm1' above is a list of models (S4 class 'lmList') not overly different
 from 'tmp' below, which is a list of models (S3 class 'by').
 
 If you review str(fm1) and str(tmp), you would note that they are
 virtually identical, save the use of slots, etc.
 
 HTH,
 
 Marc Schwartz
 
 On Thu, 2005-08-25 at 12:17 -0400, Randy Johnson wrote:
  What about using lmList from the lme4 package?
  
  Randy
  
  
  On 8/25/05 9:44 AM, Marc Schwartz [EMAIL PROTECTED] wrote:
  
   Also, looking at the last example in ?by would be helpful:
   
   attach(warpbreaks)
   tmp - by(warpbreaks, tension, function(x) lm(breaks ~ wool, data = x))
   
   # To get coefficients:
   sapply(tmp, coef)
   
   # To get residuals:
   sapply(tmp, resid)
   
   # To get the model matrix:
   sapply(tmp, model.matrix)
   
   
   
   To get the summary() output, I suspect that using:
   
 lapply(tmp, summary)
   
   would yield more familiar output as compared to using:
   
 sapply(tmp, summary)
   
   The output from the latter might require a bit more navigation through
   the resultant matrix, depending upon how the output is to be ultimately
   used.
   
   HTH,
   
   Marc Schwartz
   
   
   
   On Thu, 2005-08-25 at 14:57 +0200, TEMPL Matthias wrote:
   Look more carefully at
   ?lm 
   at the See Also section ...
   
   X - rnorm(30)
   Y - rnorm(30)
   lm(Y~X)
   summary(lm(Y~X))
   
   Best,
   Matthias
   
   
   Hi all 
   
   I used a function
   qtrregr - by(AB, AB$qtr, function(AB) lm(AB$X~AB$Y))
   
   objective is to run a regression on quartery subsets in the
   data set AB, having variables X and Y, grouped by variable qtr.
   
   Now i retrieved the output using qtrregr, however it only
   showed the coefficients (intercept and B) with out
   significant levels and residuals for each qtr. Can some on
   help me on how can retrieve the detailed regression output.
   
   rgds
   
   snvk
   
   
   __
   R-help@stat.math.ethz.ch mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide! 
   http://www.R-project.org/posting-guide.html
   
  
  ~~
  Randy Johnson
  Laboratory of Genomic Diversity
  NCI-Frederick
  Bldg 560, Rm 11-85
  Frederick, MD 21702
  (301)846-1304
  ~~
  
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide! 
  http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Remove NAs from Barplot

2005-08-24 Thread Marc Schwartz (via MN)
On Wed, 2005-08-24 at 13:06 -0400, Doran, Harold wrote:
 Dear List:
 
 I'm creating a series of barplots using Sweave that must assume a
 standard format. This is student achievement data and the x-axis must
 include all grades 3 to 8. In some cases, the data for a grade (or more
 than one grade) are missing in the vector math.bar, but are never
 missing for the vector apmxpmeet. The following sample code illustrates
 the issue.
 
 Using the code below to create the plot works fine. However, the
 following command is designed to place the data onto the plot:
 
 text(math.barplot, graph.max+5, format(tot), xpd = TRUE, col = c(blue,
 orange) )
 
 This does work, but, it also places 'NA's on the plot. Is there a way to
 place the data onto the plot such that the numbers appear, but for the
 NAs not to appear?  I've searched through ?text but was not able to find
 a solution. 
 
 math.bar - c(NA,NA,NA,NA,55,48)
 apmxpmeet - c(47, 50, 49, 50, 49, 46)
 
 par(ps=10)
 math.bar - rbind(math.bar, apmxpmeet)
 math.barplot - barplot(math.bar, beside=T, col=c('blue','orange'),
 names=c('Grade \n 3','Grade \n 4','Grade \n 5','Grade \n 6','Grade \n
 7','Grade \n 8'), 
 ylim=c(0,120), ylab=Percentage, xlab=Grade Level)
 tot - round(math.bar,digits=0)
 graph.max - max(math.bar, apmxpmeet, na.rm=T)
 text(math.barplot, graph.max+5, format(tot), xpd = TRUE, col = c(blue,
 orange) )
 tmp$sch_nameS05 - as.character(tmp$sch_nameS05)
 legend(2,120,legend=(c(tmp$sch_nameS05, Average)),
 fill=c(blue,orange))
 
 Thanks,
 Harold


Harold,

Get rid of the 'format(tot)' in your text() call:

text(math.barplot, graph.max+5, tot, xpd = TRUE, 
 col = c(blue, orange))


By using format(tot), you are coercing the values to text. Thus:

 format(tot)
  [,1] [,2] [,3] [,4] [,5] [,6]
math.bar  NA NA NA NA 55 48
apmxpmeet 47 50 49 50 49 46

versus:

 tot
  [,1] [,2] [,3] [,4] [,5] [,6]
math.barNA   NA   NA   NA   55   48
apmxpmeet   47   50   49   50   49   46


In the first case, text() will draw NA, whereas in the second, the NA
is ignored.

Also, your code above is not entirely reproducible, since:

 tmp$sch_nameS05 - as.character(tmp$sch_nameS05)
Error: Object tmp not found


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Seeking help with an apparently simple recoding problem

2005-08-23 Thread Marc Schwartz (via MN)
On Tue, 2005-08-23 at 10:12 -0500, Greg Blevins wrote:
 Hello,
 
 I have struggled, for longer than I care to admit, with this seemingly
 simple problem, but I cannot find a solution other than the use of
 long drawn out ifelse statements.  I know there has to be a better
 way.  Here is stripped down version of the situation:
 
 I start with:
 a - c(1,0,1,0,0,0,0)
 b - c(1,1,1,1,0,0,0)
 c - c(1,1,0,1,0,0,0)
 
 rbind(a,b,c)
   [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 a1010000
 b1111000
 c1101000
 
 I refer to column 3 as the target column, which at the end of the day
 will be NA in all instances.
 
 The logic involved:
 
 1) If columns 2, 4 thru 7 do NOT include at least one '1', then recode
 columns 2 thru 7 to NA and recode column 1 to code 2.
 
 2) If columns 2, 4 thru 7 contain at least one '1', then recode column
 3 to NA.
 
 Desired recoding of the above three rows:
   [,1][,2][,3][,4][,5][,6][,7]
 a2NA  NA  NA  NA  NA  NA
 b11   NA  1   0   0   0
 c11   NA  1   0   0   0
 
 Thanks you.


You left out one key detail in the explanation, which is that the
recoding appears to be done on a row by row basis, not overall.

The following gets the job done, though there may be a more efficient
approach:

 a - c(1,0,1,0,0,0,0)
 b - c(1,1,1,1,0,0,0)
 c - c(1,1,0,1,0,0,0)
 
 d - rbind(a, b, c)
 
 d
  [,1] [,2] [,3] [,4] [,5] [,6] [,7]
a1010000
b1111000
c1101000
 
 
 mod.row - function(x)
 {
   if (all(x[c(2, 4:7)] == 0))
   {
 x[2:7] - NA
 x[1] - 2
   } else {
   x[3] - NA
   }
 
   x
 }
 
 y - t(apply(d, 1, mod.row))

 y
  [,1] [,2] [,3] [,4] [,5] [,6] [,7]
a2   NA   NA   NA   NA   NA   NA
b11   NA1000
c11   NA1000


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] plotting with same axes

2005-08-22 Thread Marc Schwartz (via MN)
On Mon, 2005-08-22 at 12:58 -0400, [EMAIL PROTECTED] wrote:
 I have used the 'par' command to 
 overlay one plot on another.  But how 
 do I overlay it with the x-values 
 plotted at the same points on the 
 x-axis?
 
 Thank you,
 Steven

The specific answer depends to an extent on the graphics you are
plotting and whether there is a need to actually use par(new).

In some cases there is an 'add = TRUE' option to plotting functions that
allows for this. 

In others, you can create the initial plot and then use lines(),
points() or the like to simply add further plotting components to the
existing plot.

If all else fails and you need to use par(new), then you will want to
explicitly define the x and y axes to the same ranges by using the same
'xlim' and 'ylim' arguments in _both_ plotting functions. 

A reproducible example of what it is you are doing would enable us to
give you more specific guidance, if the above is not helpful.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Copying rows from a matrix using a vector of indices

2005-08-17 Thread Marc Schwartz (via MN)
On Wed, 2005-08-17 at 12:02 -0700, Martin Lam wrote:
 Hi,
 
 I am trying to use a vector of indices to select some
 rows from a matrix. But before I can do that I somehow
 need to convert 'combinations' into a list, since
 'mode(combinations)' says it's 'numerical'. Any idea
 how I can do that?
 
 library(combinat)
 
 combinations - t(combn(8,2))
 
 indices - c(sample(1:length(combinations),10))
 
 # convert
 ???
 
 subset - combinations[indices]
 
 Thanks in advance,
 
 Martin


Your goal is not entirely clear.  Given your code above, you end up
with:

 library(combinat)
 
 combinations - t(combn(8,2))
 combinations
  [,1] [,2]
 [1,]12
 [2,]13
 [3,]14
 [4,]15
 [5,]16
 [6,]17
 [7,]18
 [8,]23
 [9,]24
[10,]25
[11,]26
[12,]27
[13,]28
[14,]34
[15,]35
[16,]36
[17,]37
[18,]38
[19,]45
[20,]46
[21,]47
[22,]48
[23,]56
[24,]57
[25,]58
[26,]67
[27,]68
[28,]78

combinations is a 28 x 2 matrix (or a 1d vector of length 56).

Your use of sample() with '1:length(combinations)' is the same as 1:56.

 length(combinations)
[1] 56

Hence, you end up with:

 set.seed(128)
 indices - sample(1:length(combinations), 10)

 indices
 [1] 41 53 38 46 50 44 25 11 43  7


I suspect that you really want to use '1:nrow(combinations)', which
yields 1:28.

 nrow(combinations)
[1] 28


Thus,

 set.seed(128)
 indices - sample(1:nrow(combinations), 10)

 indices
 [1] 21 26 18 22 23 20 11  5 27  3


Now, you can use:

 subset - combinations[indices, ]
 subset
  [,1] [,2]
 [1,]47
 [2,]67
 [3,]38
 [4,]48
 [5,]56
 [6,]46
 [7,]26
 [8,]16
 [9,]68
[10,]14

which yields the rows from combinations, defined by 'indices'.

If correct, please review An Introduction to R and ?[ for more
information on subsetting objects in R.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] recategorizing a vector into discrete states

2005-08-16 Thread Marc Schwartz (via MN)
On Tue, 2005-08-16 at 13:18 -0400, Allan Strand wrote:
 Hi All,
 
 I'm trying to take a numerical vector  and  produce a new vector of the 
 same length where each element in the first is placed into a category 
 given by a 'breaks-like' vector. The values in the result should equal 
 the lower bounds of each category as defined in the breaks vector.
 
   I suspect that a vectorized solution  is pretty simple, but I can't 
 seem to figure it out today.  Here is an example of my problem:
 
 Vector 'a' is the original vector.  Vector 'b' gives the lower bounds 
 of the categories.  Vector 'c' is the result I am seeking.
 
 a - c(0.9, 11, 1.2, 2.4, 4.0, 5.0, 7.3, 8.1, 3.3, 4.5)
 b - c(0, 2, 4, 6, 8)
 
 c - c(0, 8, 0, 2, 4, 4, 6, 8, 2, 4)
 
 Any suggestions would be greatly appreciated.
 
 cheers,
 Allan Strand

See ?cut

 a - c(0.9, 11, 1.2, 2.4, 4.0, 5.0, 7.3, 8.1, 3.3, 4.5)
 b - c(0, 2, 4, 6, 8)

 cut(a, c(b, Inf), labels = b, right = FALSE)
 [1] 0 8 0 2 4 4 6 8 2 4
Levels: 0 2 4 6 8

Note that cut() returns a factor.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] use different symbols for frequency in a plot

2005-08-08 Thread Marc Schwartz (via MN)
On Mon, 2005-08-08 at 11:57 -0700, Kerry Bush wrote:
 suppose I have the following data
 
 x-c(rep(.1,5),rep(.2,6),rep(.4,10),rep(.5,20))
 y-c(rep(.5,3),rep(.6,8),rep(1.2,8),rep(2.5,18),rep(3,4))
 
 If I plot(x,y) in R, I will only get seven distinct
 points. What I want to do is to use different symbols
 to show the frequency at each point.
 
 e.g. if the frequncey is between 1 and 5, then I plot
 the point as a circle; if the frequency is between 6
 and 10, then I plot the point as a square; if the
 frequency is above 10, then I plot the point as a
 triangle.
 
 I am not sure how to do this in R. Can anybody help me?

You might want to review this recent post by Deepayan Sarkar:

https://stat.ethz.ch/pipermail/r-help/2005-July/074042.html

with modest modification you can replace his example, which plots the
frequencies with:

x - c(rep(.1,5),rep(.2,6),rep(.4,10),rep(.5,20))
y - c(rep(.5,3),rep(.6,8),rep(1.2,8),rep(2.5,18),rep(3,4))

temp - data.frame(x, y)

foo - subset(as.data.frame(table(temp)), Freq  0)

 foo
 x   y Freq
1  0.1 0.53
5  0.1 0.62
6  0.2 0.66
11 0.4 1.28
15 0.4 2.52
16 0.5 2.5   16
20 0.5   34

# Use cut() to create the bins and specify the plotting symbols
# for each bin, which are the 'label' values
foo$sym - with(foo, cut(Freq, c(0, 5, 10, Inf), 
 labels = c(21, 22, 24))) 


# convert 'foo' to all numeric from factors above for plotting
foo - apply(foo, 2, function(x) as.numeric(as.character(x)))


 foo
 x   y Freq sym
1  0.1 0.53  21
5  0.1 0.62  21
6  0.2 0.66  22
11 0.4 1.28  22
15 0.4 2.52  21
16 0.5 2.5   16  24
20 0.5   34  21


# Now do the plot. Keep in mind that 'foo' is now
# a matrix, rather than a data frame
plot(foo[, x], foo[, y], pch = foo[, sym])


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] plotting horizontally

2005-07-26 Thread Marc Schwartz (via MN)
On Tue, 2005-07-26 at 08:58 +0700, [EMAIL PROTECTED] wrote:
 Hello,
 
 Is there any way to use plot() horizontally similar to
 boxplot(., horiz=TRUE)?  I want to use to illustrate
 the distribution of y-values on an adjacent plot using
 layout().
 
 Thanks in advance for any help.
 
 Regards,
 
 --Dan

Have you looked at the last example in ?layout, which has a scatterplot
with marginal histograms for the x and y axes?

Alternatively, you can generally reverse the x and y values for most
plots. For example, compare:

 d - density(rnorm(100))

 # Vertical density plot
 plot(d)

 # Horizontal density plot 
 plot(d$y, d$x, type = l, xlab = Density, 
  main = density(y = rnorm(100)),
  ylab = paste(N =, d$n,   Bandwidth =, formatC(d$bw)))


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] elegant solution to transform vector into percentages?

2005-07-26 Thread Marc Schwartz (via MN)
On Tue, 2005-07-26 at 15:48 -0400, [EMAIL PROTECTED] wrote:
 Hi,
 
 I am looking for an elegant way to transform a vector into percentages of 
 values
 that meet certain criteria.
 
 store-c(1,1.4,3,1.1,0.3,0.6,4,5)
 
 # now I want to get the precentages of values
 # that fall into the categories =M , M  =N , N
 # let
 M -.8
 N - 1.2
 # In my real example I have many more of these cutoff-points
 
 # What I did is:
 
 out - matrix(NA,1,3)
 
   out[1,1] - ( (sum(store=M))  /length(store) )*100
   out[1,2] - ( (sum(store M   store= N   ))  /length(store) )*100
   out[1,3] - ( (sum(store N))  /length(store) )*100
 
 colnames(out)-c(percent=M,percentM  =N,percentN)
 out
 
 But this gets very tedious if I have many cutoff-points. Does anybody know a
 more elegant way to do this task?
 
 Thanks so much.
 
 Cheers,
 Jens

Something alone the lines of:

store - c(1, 1.4, 3, 1.1, 0.3, 0.6, 4, 5)
M - 0.8
N - 1.2

x - hist(store, br = c(min(store), M, N, max(store)), 
  plot = FALSE)$counts  

pct.x - prop.table(x) * 100

names(pct.x) - c(percent = M,percent  M  = N,percent  N)

 pct.x
  percent = M percent  M  = Npercent  N 
25 25 50 


I think that should do it. See ?hist for more information and take note
of the 'include.lowest' and 'right' arguments relative to whether or not
values are or are not included in the specified intervals.

See ?prop.table as well.

Also be acutely aware of potential problems with exact equality
comparisons with floating point numbers and the break points...if you
have a float value equal to a breakpoint in your vector.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] problems with submitting an eps-file created in R

2005-07-22 Thread Marc Schwartz (via MN)
On Fri, 2005-07-22 at 16:26 +0200, Christian Bieli wrote:
 Dear all
 
 I've got some problems submitting a manuscript, because I can't
 manage 
 creating the favourable eps-file of a graph created in R. The
 journal's 
 graphic requirements are as followed:
 format: eps
 width: max. 6 inches
 resolution: min. 1000 dpi
 supported fonts: Arial, Courier, Helvetica, Symbol, Times, Charcoal, 
 Chicago, Geneva, Georgia, Monaco, Zapf, New York
 
 Itried to ways of getting appropriate file:
 
 1.Creating eps-file in R by drawing into a x11-device and then:
 /dev.copy2eps(file = file.eps, onefile = TRUE, paper = a4, family
 = 
 Helvetica,  pointsize=1, print.it = FALSE,  fonts = Helvetica)
 
 /2. Generating a postscript-file in R with /
 //postscript(file = file.ps, onefile = TRUE, paper = a4, family = 
 Helvetica, width = 6, height = 2.2, pointsize=1, print.it = FALSE, 
 fonts = Helvetica)/
 an trying to convert it in ghostview.
 
 Neither approach brought the favoured result. The error message I got 
 from the quality checking program was :
 
 Warning: Document is Missing Non-Standard Font 
 
 One or more non-standard fonts used in this image is not embedded.
 Standard fonts are: Arial, Courier, Helvetica, Symbol, Times,
 Charcoal, Chicago, Geneva, Georgia, Monaco, Zapf, New York.
 
 In order to repair this problem, save your document with fonts
 embedded.
 
 Error: Missing Fonts 
 
 Challenge 
 One or more linked or used fonts cannot be found. This is caused by
 DigitalExpert not being able to locate fonts on your system. All fonts
 used in the document must be active or have a defined and valid path
 so that DigitalExpert can find them.
  
 Solution 
  The only way to repair this problem is to make the fonts available to
 DigitalExpert, and then reprocess the files. In order to make fonts
 active, either activate them using a font management program, or move
 them into the system:fonts folder.
 
 
 
 1. Obviously there's a font problem. Helvetica IS installed in my OS 
 (windows 2000). Why DigitalExpert does not recognize the
 Helvetica 
 font as a standard font? I thought the fonts option would embed the 
 specified font in the file. Am I right or is there another way to
 embed 
 fonts?
 2. Is there a way to set up the resolution of the created ps/eps-file
 in 
 R (until now I did the settings in ghostview)?
 3. I did not find particulars about the pointsize option. What is the 
 effect of changing the pointsize value?
 
 With best regards
 Christian


You need to read the help for postscript(), which specifically tells you
in the Details section to use:

postscript(..., onefile = FALSE, horizontal = FALSE, paper = special)

to generate EPS files.

There is no resolution setting for a postscript file per se, since PS is
a device independent vector based format and the resolution of the
output is dependent upon the target device/viewer. One exception to this
would be the embedding of a bitmapped object in a PS file, but that is
not applicable here.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Question about 'text' (add lm summary to a plot)

2005-07-22 Thread Marc Schwartz (via MN)
Ok guys,

So I played around with this a bit, going back to Dan's original
requirements and using Thomas' do.call() approach with legend(). Gabor's
approach using sapply() will also work here. I have the following:

# Note the leading spaces here for alignment in the table
# This could be automated with formatC() or sprintf()
my.slope.1 -3.22
my.slope.2 -  0.13
my.inter.1 -  -10.66
my.inter.2 -  1.96
my.Rsqua -  0.97

plot(1:5)

L - list(Intercept:,
  Slope:,
  bquote(paste(R^2, :)),
  bquote(.(my.inter.1) %+-% .(my.inter.2)),
  bquote(.(my.slope.1) %+-% .(my.slope.2)),
  bquote(.(my.Rsqua)))

par(family = mono)

legend(topleft, legend = do.call(expression, L), ncol = 2)



Note however, that while using the mono font helps with vertical
alignment of numbers, the +/- sign still comes out in the default font,
which is bold[er] than the text.

If one uses the default font, which is variable spaced, it is
problematic to get the proper alignment for the numbers. I even tried
using phantom(), but that didn't quite get it, since the spacing is
variable, as opposed to LaTeX's mono numeric spacing with default fonts.

Also, note that I am only using two columns, rather than three, since
trying to place the : as a middle column results in spacing that is
too wide, given that the text.width argument is a scalar and is set to
the maximum width of the character vectors.

Note also that even with mono spaced fonts, the exponent in R^2 is still
horizontally smaller than the other characters. Thus, spacing on that
line may also be affected depending upon what else one might attempt.

Not sure where else to go from here.

HTH,

Marc Schwartz

On Fri, 2005-07-22 at 14:01 -0400, Gabor Grothendieck wrote:
 You are right.   One would have to use do.call as you did
 or the sapply method of one of my previous posts:
 
 a - 7
 plot(1)
 L - list(bquote(alpha==.(a)),bquote(alpha^2+1==.(a^2+1)))
 legend(topleft,legend=sapply(L, as.expression))
 
 
 On 7/22/05, Thomas Lumley [EMAIL PROTECTED] wrote:
  On Fri, 22 Jul 2005, Gabor Grothendieck wrote:
  
   I think legend accepts a list argument directly so that could be
   simplified to just:
  
a-7
plot(1)
L - list(bquote(alpha==.(a)),bquote(alpha^2+1==.(a^2+1)))
legend(topleft,legend=L)
  
  Except that it wouldn't then work: the mathematical stuff comes out as
  text.
  
   The same comment seems to apply to my prior suggestion about
   as.expression(bquote(...)), namely that one can just write the
   following as text also supports a list argument:
  
  And this doesn't work either: you end up with %+-% rather than the
  plus-or-minus symbol.
  
  The reason I gave the do.call() version is that I had tried these simpler
  versions and they didn't work.
  
 -thomas
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R:plot and dots

2005-07-21 Thread Marc Schwartz (via MN)
On Thu, 2005-07-21 at 16:18 +0200, Clark Allan wrote:
 hi all
 
 a very simple question.
 
 i have plot(x,y)
 
 but i would like to add in on the plot the observation number associated
 with each point.
 
 how can this be done?
 
 /
 allan

If you mean the unique observation number associated with each x,y pair,
you can use:

  text(x, y, labels = ObsNumberVector, pos = 3)

after the plot(x, y) call:

 df - data.frame(x = rnorm(10), y = rnorm(10), ID = 1:10)
 with(df, plot(x, y))
 with(df, text(x, y, labels = ID, pos = 3))

See ?text for more information.  Note that I used pos = 3 which places
the label above the data point. There are other positioning parameters
available, which are noted in the help file.

Note also that you might have to adjust the plot axis limits depending
upon where you place the text and your extreme points.

If you mean the frequency of each x,y pair (if there is more than one
observation per x,y pair), you might want to review Deepayan's recent
post here:

https://stat.ethz.ch/pipermail/r-help/2005-July/074042.html

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Question about 'text' (add lm summary to a plot)

2005-07-21 Thread Marc Schwartz (via MN)
[Note: the initial posts have been re-arranged to attempt to maintain
the flow from top to bottom]

 Dan Bolser writes:
   
   I would like to annotate my plot with a little box containing the slope,
   intercept and R^2 of a lm on the data.
   
   I would like it to look like...
   
++
| Slope :   3.45 +- 0.34 |
| Intercept : -10.43 +- 1.42 |
| R^2   :   0.78 |
++
   
   However I can't make anything this neat, and I can't find out how to
   combine this with symbols for R^2 / +- (plus minus).
   
   Below is my best attempt (which is franky quite pour). Can anyone
   improve on the below?
   
   Specifically, 
   
   aligned text and numbers, 
   aligned decimal places, 
   symbol for R^2 in the text (expression(R^2) seems to fail with
   'paste') and +- 
   
   
   
   Cheers,
   Dan.
   
   
   dat.lm - lm(dat$AVG_CH_PAIRS ~ dat$CHAINS)
   
   abline(coef(dat.lm),lty=2,lwd=1.5)
   
   
   dat.lm.sum - summary(dat.lm)
   dat.lm.sum
   
   attributes(dat.lm.sum)
   
   my.text.1 -
 paste(Slope : , round(dat.lm.sum$coefficients[2],2),
   +/-,  round(dat.lm.sum$coefficients[4],2))
   
   my.text.2 -
 paste(Intercept : , round(dat.lm.sum$coefficients[1],2),
   +/-,  round(dat.lm.sum$coefficients[3],2))
   
   my.text.3 -
 paste(R^2 : ,   round(dat.lm.sum$r.squared,2))
   
   my.text.1
   my.text.2
   my.text.3
   
   
   ## Add legend
   text(x=3,
y=300,
paste(my.text.1,
  my.text.2,
  my.text.3,
  sep=\n),
adj=c(0,0),
cex=1


 On Thu, 21 Jul 2005, Christoph Buser wrote:
 
 Dear Dan
 
 I can only help you with your third problem, expression and
 paste. You can use:
 
 plot(1:5,1:5, type = n)
 text(2,4,expression(paste(Slope : , 3.45%+-%0.34, sep = )), pos = 4)
 text(2,3.8,expression(paste(Intercept : , -10.43%+-%1.42)), pos = 4)
 text(2,3.6,expression(paste(R^2,: , 0.78, sep = )), pos = 4)

 I do not have an elegant solution for the alignment.


On Thu, 2005-07-21 at 19:55 +0100, Dan Bolser wrote:
 Cheers for this.
 
 I was trying to get it to work, but the problem is that I need to replace
 the values above with variables, from the following code...
 
 
 dat.lm - lm(dat$AVG_CH_PAIRS ~ dat$CHAINS)
 dat.lm.sum - summary(dat.lm)
 
 my.slope.1 - round(dat.lm.sum$coefficients[2],2)
 my.slope.2 - round(dat.lm.sum$coefficients[4],2)
 
 my.inter.1 - round(dat.lm.sum$coefficients[1],2)
 my.inter.2 - round(dat.lm.sum$coefficients[3],2)
 
 my.Rsqua.1 - round(dat.lm.sum$r.squared,2)
 
 
 Anything I try results in either the words 'paste(Slope:, my.slope.1,
 %+-%my.slope.2,sep=)' being written to the plot, or just
 'my.slope.1+-my.slope2' (where the +- is correctly written).
 
 I want to script it up and write all three lines to the plot with
 'sep=\n', rather than deciding three different heights.


 Thanks very much for what you gave, its a good start for me to figure out 
 how I am supposed to be telling R what to do!
 
 Any way to just get fixed width fonts with text? (for the alignment
 problem)


Dan,

Here is one approach. It may not be the best, but it gets the job done.
You can certainly take this and encapsulate it in a function to automate
the text/box placement and to pass values as arguments.

A couple of quick concepts:

1. As far as I know, plotmath cannot do multiple lines, so each line in
your box needs to be done separately.

2. The horizontal alignment is a bit problematic when using expression()
or bquote() since I don't believe that multiple spaces are honored as
such after parsing. Thus I break up each component (label, : and
values) into separate text() calls. The labels are left justified.

3. The alignment for the numeric values are done with right
justification. So, as long as you use a consistent number of decimals in
the value outputs (2 here), you should be OK. This means you might need
to use formatC() or sprintf() to control the numeric output values on
either side of the +/- sign.

4. In the variable replacement, note the use of substitute() and the
list of x and y arguments as replacement values in the expressions.



# Set your values
my.slope.1 - 3.45
my.slope.2 - 0.34

my.inter.1 - -10.43
my.inter.2 - 1.42

my.Rsqua - 0.78


# Create the initial plot as per Christoph's post
plot(1:5, 1:5, type = n)


#-
# Do the Slope
#-

text(1, 4.5,  Slope, pos = 4)
text(2, 4.5, :)
text(3, 4.5, substitute(x %+-% y, 
list(x = my.slope.1, 
 y = my.slope.2)),
 pos = 2)


#-
# Do the Intercept
#-

text(1, 4.25, Intercept, pos = 4)
text(2, 4.25, :)
text(3, 4.25, substitute(x %+-% y, 
 list(x = my.inter.1, 
  y = my.inter.2)),
 pos = 2)



Re: [R] using argument names (of indeterminate number) within a function

2005-07-19 Thread Marc Schwartz (via MN)
On Tue, 2005-07-19 at 17:29 +0200, Dirk Enzmann wrote:
 Although I tried to find an answer in the manuals and archives, I cannot 
 solve this (please excuse that my English and/or R programming skills 
 are not good enough to state my problem more clearly):
 
 I want to write a function with an indeterminate (not pre-defined) 
 number of arguments and think that I should use the ... construct and 
 the match.call() function. The goal is to write a function that (among 
 other things) uses cbind() to combine a not pre-defined number of 
 vectors specified in the function call. For example, if my vectors are 
 x1, x2, x3, ... xn, within the function I want to use cbind(x1, x2) or 
 cbind(x1, x3, x5) or ... depending on the vector names I use in the 
 funcion call. Additionally, the function has other arguments.
 
 In the archives I found the following thread (followed by Marc Schwartz)
 
 http://finzi.psych.upenn.edu/R/Rhelp02a/archive/15186.html
 [R] returning argument names from Peter Dalgaard BSA on 2003-04-10 (stdin)
 
 that seems to contain the solution to my problem, but I am stuck because 
   sapply(match.call()[-1], deparse) gives me a vector of strings and I 
 don't know how to use the names in this vector in the cbind() function.
 
 Up to now my (clearly deficit) function looks like:
 
 test - function(..., mvalid=1)
 {
args = sapply(match.call()[-1], deparse)
 # and here, I don't know how the vector names in args
 # can be used in the cbind() function to follow:
 #
 # temp - cbind( ???
if (mvalid  1)
{
 #  here it goes on
}
 }
 
 Ultimately, I want that the function can be called like
 test(x1,x2,mvalid=1)
 or
 test(x1,x3,x5,mavlid=2)
 and that within the function
 cbind(x1,x2)
 or cbind(x1,x3,x5)
 will be used.
 
 Can someone give and explain an example / a solution on how to proceed?

Hi Dirk,

How about this:

my.cbind - function(...)
{
  do.call(cbind, list(...))
}

 a - 1:10
 b - 11:20
 c - 21:30
 d - 31:40

 my.cbind(a, b)
  [,1] [,2]
 [1,]1   11
 [2,]2   12
 [3,]3   13
 [4,]4   14
 [5,]5   15
 [6,]6   16
 [7,]7   17
 [8,]8   18
 [9,]9   19
[10,]   10   20

 my.cbind(b, c, d)
  [,1] [,2] [,3]
 [1,]   11   21   31
 [2,]   12   22   32
 [3,]   13   23   33
 [4,]   14   24   34
 [5,]   15   25   35
 [6,]   16   26   36
 [7,]   17   27   37
 [8,]   18   28   38
 [9,]   19   29   39
[10,]   20   30   40

 my.cbind(a, b, c, d)
  [,1] [,2] [,3] [,4]
 [1,]1   11   21   31
 [2,]2   12   22   32
 [3,]3   13   23   33
 [4,]4   14   24   34
 [5,]5   15   25   35
 [6,]6   16   26   36
 [7,]7   17   27   37
 [8,]8   18   28   38
 [9,]9   19   29   39
[10,]   10   20   30   40


The use of list(...) in the function allows you to use list based
functions such as do.call() or lapply() against the argument objects
directly without having to deparse and re-parse the character names of
the arguments, which is the approach that Peter used in his response in
the thread you referenced. 

The OP in that thread wanted the argument names as character vectors, as
opposed to the argument objects themselves, which is what you need here.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] .gct file

2005-07-19 Thread Marc Schwartz (via MN)
On Tue, 2005-07-19 at 12:28 -0400, Duncan Murdoch wrote:
 On 7/19/2005 12:10 PM, mark salsburg wrote:
  I have two files to compare, one is a regular txt file that I can read
  in no prob.
  
  The other is a .gct file (How do I read in this one?)
  
  I tried a simple
  
  read.table(data.gct, header = T)
  
  How do you suggest reading in this file??
  
 
 .gct is not a standard filename extension.  You need to know what is in 
 that file.  Where did you get it?  What program created it?
 
 Chances are the easiest thing to do is to get the program that created 
 it to export in a well known format, e.g. .csv.
 
 Duncan Murdoch

A quick Google search would suggest Gene Cluster Text file:

http://www.broad.mit.edu/cancer/software/genepattern/tutorial/gp_tutorial_fileformats.html#gct

produced by Gene Pattern:

http://www.broad.mit.edu/cancer/software/genepattern/

If correct, I would point Mark to the Bioconductor folks for more
information and assistance:

http://www.bioconductor.org/

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] .gct file

2005-07-19 Thread Marc Schwartz (via MN)
On Tue, 2005-07-19 at 13:16 -0400, mark salsburg wrote:
 ok so the gct file looks like this:
 
 #1.2  (version number)
 7283 19   (matrix size)
 Name Description Values
   ...  ..
 
 How can I tell R to disregard the first two lines and start reading
 the 3rd line in this gct file. I would just delete them, but I do not
 know how to open a gct. file
 
 thank you
 
 On 7/19/05, Duncan Murdoch [EMAIL PROTECTED] wrote:
  On 7/19/2005 12:10 PM, mark salsburg wrote:
   I have two files to compare, one is a regular txt file that I can read
   in no prob.
  
   The other is a .gct file (How do I read in this one?)
  
   I tried a simple
  
   read.table(data.gct, header = T)
  
   How do you suggest reading in this file??
  
  
  .gct is not a standard filename extension.  You need to know what is in
  that file.  Where did you get it?  What program created it?
  
  Chances are the easiest thing to do is to get the program that created
  it to export in a well known format, e.g. .csv.
  
  Duncan Murdoch


The above would be consistent with the info in my reply.

I guess if the format is consistent, as per Mark's example above, you
can use:

read.table(data.gct, skip = 2, header = TRUE)

which will start by skipping the first two lines and then reading in the
header row and then the data.

See ?read.table

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] .gct file

2005-07-19 Thread Marc Schwartz (via MN)
For the TAB delimited columns, adjust the 'sep' argument to:

read.table(data.gct, skip = 2, header = TRUE, sep = \t)

The 'quote' argument is by default:

quote = \'

which should take care of the quoted strings and bring them in as a
single value.

The above presumes that the header row is also TAB delimited. If not,
you may have to set 'skip = 3' to skip over the header row and manually
set the column names.

HTH,

Marc Schwartz


On Tue, 2005-07-19 at 13:52 -0400, mark salsburg wrote:
 This is all extremely helpful.
 
 The data turns out is a little atypical, the columns are tab-delemited
 except for the description columns
 
 
 DATA1.gct looks like this
 
 #1.2
 23 3423
 NAME DESCRIPTION VALUE
 gene1 a protein inducer 1123
 .  . ..
 
 How do I get R to read the data as tab delemited, but read in the 2nd
 coloumn as one value based on the quotation marks..
 
 thanks..
 
 On 7/19/05, Marc Schwartz (via MN) [EMAIL PROTECTED] wrote:
  On Tue, 2005-07-19 at 13:16 -0400, mark salsburg wrote:
   ok so the gct file looks like this:
  
   #1.2  (version number)
   7283 19   (matrix size)
   Name Description Values
     ...  ..
  
   How can I tell R to disregard the first two lines and start reading
   the 3rd line in this gct file. I would just delete them, but I do not
   know how to open a gct. file
  
   thank you
  
   On 7/19/05, Duncan Murdoch [EMAIL PROTECTED] wrote:
On 7/19/2005 12:10 PM, mark salsburg wrote:
 I have two files to compare, one is a regular txt file that I can read
 in no prob.

 The other is a .gct file (How do I read in this one?)

 I tried a simple

 read.table(data.gct, header = T)

 How do you suggest reading in this file??

   
.gct is not a standard filename extension.  You need to know what is in
that file.  Where did you get it?  What program created it?
   
Chances are the easiest thing to do is to get the program that created
it to export in a well known format, e.g. .csv.
   
Duncan Murdoch
  
  
  The above would be consistent with the info in my reply.
  
  I guess if the format is consistent, as per Mark's example above, you
  can use:
  
  read.table(data.gct, skip = 2, header = TRUE)
  
  which will start by skipping the first two lines and then reading in the
  header row and then the data.
  
  See ?read.table
  
  HTH,
  
  Marc Schwartz
  
  
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] .gct file

2005-07-19 Thread Marc Schwartz (via MN)
On Tue, 2005-07-19 at 13:08 -0500, Marc Schwartz (via MN) wrote:
 For the TAB delimited columns, adjust the 'sep' argument to:
 
 read.table(data.gct, skip = 2, header = TRUE, sep = \t)
 
 The 'quote' argument is by default:
 
 quote = \'
 
 which should take care of the quoted strings and bring them in as a
 single value.
 
 The above presumes that the header row is also TAB delimited. If not,
 you may have to set 'skip = 3' to skip over the header row and manually
 set the column names.

One correction. If the final para applies and you need to use 'skip =
3', you would also need to leave out the 'header = TRUE' argument, which
defaults to FALSE.

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Can't get sample function from An Introduction to R to work

2005-07-15 Thread Marc Schwartz (via MN)
On Fri, 2005-07-15 at 15:40 -0500, David Groos wrote:
 I'm trying to figure out R, a piece at a time, hours at a time...  I 
 was trying to copy the sample function in, An Introduction to R  (for 
 version 2.1.0) by W. N. Venables, D. M. Smith, page 42.  Section 10.1 
 Simple examples provides a sample function which I tried to duplicate 
 (I'm using Mac OS X 10.3.9, and R for Mac OS X Aqua GUI v1.11).  The 
 following is what I typed and the last line is R's response when I hit 
 the return key after the penultimate line.  I've re-checked and 
 re-typed the code many times to no avail.  I wasn't able to find this 
 issue using search options, either.  Any help is GREATLY appreciated!
 
   twosam-function(y1, y2) {
 + n1-length(y1);n2 -length(y2)
 + yb1-mean(y1); yb2-mean(y2)
 + s1-var(y1);s2-var(y2)
 + s-((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)
 + tst-(yb1-yb2)/sqrt(s*(1/n1+1/n2))
 Error: syntax error
 
 David

The code as you have above (without the + on each line) works for me
both in ESS and in the R console under Linux. There should be another
+ on the next line, in anticipation of the remaining two lines:

 tst
}


Try to copy and paste the following into the console as is:

twosam-function(y1, y2) {
 n1 - length(y1);n2 - length(y2)
 yb1 - mean(y1); yb2 - mean(y2)
 s1 - var(y1);s2 - var(y2)
 s - ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)
 tst - (yb1-yb2)/sqrt(s*(1/n1+1/n2))

and see what happens. You should be left at a:

+

on a new line again.

It is possible that there is a bug in the Aqua GUI, but not using a Mac,
I cannot replicate it.

You might want to consider subscribing and posting to the R-SIG-Mac
e-mail list, which is focused on Mac users of R. More information is
here:

https://stat.ethz.ch/mailman/listinfo/r-sig-mac

HTH,

Marc Schwartz

P.S. Greetings from Eden Prairie

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] plot the number of replicates at the same point

2005-07-14 Thread Marc Schwartz (via MN)
On Thu, 2005-07-14 at 12:30 -0500, Deepayan Sarkar wrote:
 On 7/14/05, Kerry Bush [EMAIL PROTECTED] wrote:
  Thank you for thinking about the problem for me.
  However, I have found that your method doesn't work at
  all.
  
  You may test the following example:
  
  x1=c(0.6,0.4,.4,.4,.2,.2,.2,0,0)
  x2=c(0.4,.2,.4,.6,0,.2,.4,0,.2)
  x1=rep(x1,4)
  x2=rep(x2,4)
  temp=data.frame(x1,x2)
  temp1=table(temp)
  plot(temp$x1,temp$x2,cex=0)
  text(as.numeric(rownames(temp1)),
  as.numeric(colnames(temp1)), temp1)
  
  what I got here is not what I wanted. You may compare
  with
  plot(x1,x2)
  
  I actually want some plots similar to what SAS proc
  plot produced.
  
  Does anybody have a clue of how to do this easily in
  R?
 
 Try this:
 
 
 x1=c(0.6,0.4,.4,.4,.2,.2,.2,0,0)
 x2=c(0.4,.2,.4,.6,0,.2,.4,0,.2)
 x1=rep(x1,4)
 x2=rep(x2,4)
 temp=data.frame(x1,x2)
 
 foo - subset(as.data.frame(table(temp)), Freq  0)
 foo$x1 - as.numeric(as.character(foo$x1))
 foo$x2 - as.numeric(as.character(foo$x2))
 
 with(foo, plot(x1, x2, type = n))
 with(foo, text(x1, x2, lab = Freq))
 
 
 -Deepayan


Great solution Deepayan. I was just in the process of working on
something similar.

One quick modification is that the two plot()/text() functions can be
combined into:

  with(foo, plot(x1, x2, pch = as.character(Freq)))

Best regards,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] plot the number of replicates at the same point

2005-07-14 Thread Marc Schwartz (via MN)
On Thu, 2005-07-14 at 15:08 -0500, Deepayan Sarkar wrote:
 On 7/14/05, Marc Schwartz (via MN) [EMAIL PROTECTED] wrote:
  On Thu, 2005-07-14 at 12:30 -0500, Deepayan Sarkar wrote:
   On 7/14/05, Kerry Bush [EMAIL PROTECTED] wrote:
Thank you for thinking about the problem for me.
However, I have found that your method doesn't work at
all.
   
You may test the following example:
   
x1=c(0.6,0.4,.4,.4,.2,.2,.2,0,0)
x2=c(0.4,.2,.4,.6,0,.2,.4,0,.2)
x1=rep(x1,4)
x2=rep(x2,4)
temp=data.frame(x1,x2)
temp1=table(temp)
plot(temp$x1,temp$x2,cex=0)
text(as.numeric(rownames(temp1)),
as.numeric(colnames(temp1)), temp1)
   
what I got here is not what I wanted. You may compare
with
plot(x1,x2)
   
I actually want some plots similar to what SAS proc
plot produced.
   
Does anybody have a clue of how to do this easily in
R?
  
   Try this:
  
  
   x1=c(0.6,0.4,.4,.4,.2,.2,.2,0,0)
   x2=c(0.4,.2,.4,.6,0,.2,.4,0,.2)
   x1=rep(x1,4)
   x2=rep(x2,4)
   temp=data.frame(x1,x2)
  
   foo - subset(as.data.frame(table(temp)), Freq  0)
   foo$x1 - as.numeric(as.character(foo$x1))
   foo$x2 - as.numeric(as.character(foo$x2))
  
   with(foo, plot(x1, x2, type = n))
   with(foo, text(x1, x2, lab = Freq))
  
  
   -Deepayan
  
  
  Great solution Deepayan. I was just in the process of working on
  something similar.
  
  One quick modification is that the two plot()/text() functions can be
  combined into:
  
with(foo, plot(x1, x2, pch = as.character(Freq)))
 
 Only as long as all(Freq  10) (I've been bitten by this before. :-) ).
 
 Deepayan


D'oh!

Indeed you are correct, since 'pch' is a single character:

  plot(1:20, pch = as.character(1:20))

as opposed to:

  plot(1:20, type = n)
  text(1:20, lab = 1:20)

Thanks for the correction.

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] question for IF ELSE usage

2005-07-12 Thread Marc Schwartz (via MN)
On Tue, 2005-07-12 at 16:22 +0200, ecoinfo wrote:
 Hi R users,
  Maybe the question is too simple.
  In a IF ... ELSE ... statement if(cond) cons.expr else alt.expr, IF and 
 ELSE should be at the same line? 
 For example,
  if (x1==12)
 {
 y1 - 5 
 }else
 {
 y1 - 3
 }
  is right, while
  if (x1==12)
 {
 y1 - 5 
 }
 else # Error: syntax error
 {
 y1 - 3
 }
  is wrong? 
  Thanks

Note the following from the Details section of ?if

Note that it is a common mistake to forget to put braces ({ .. })
around your statements, e.g., after if(..) or for(). In particular,
you should not have a newline between } and else to avoid a syntax error
in entering a if ... else construct at the keyboard or via source. For
that reason, one (somewhat extreme) attitude of defensive programming is
to always use braces, e.g., for if clauses.


One other approach is the following:

if (x1 == 12) 
{
  y1 - 5 
} else {
  y1 - 3
}

Note the presence of both braces on the 'else' line.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] simple question: reading lower triangular matrix

2005-07-12 Thread Marc Schwartz (via MN)
On Tue, 2005-07-12 at 14:37 -0300, Rogério Rosa da Silva wrote:
 Dear list,
 
 I will like to learn how to read a lower triangular matrix in R. The
 input file *.txt have the following format:
 
  A   B   C   D   E
 A   0   
 B   10
 C   250   
 D   3680  
 E   479   100
 
 
 How this can be done?
 
 Thanks in advance for your help
 
 Rogério


I don't know that this is the easiest way of doing it, but here is one
approach:

# I saved your data above in a file called test.txt

# Read the first line of test.txt to get the colnames as chars
col.names - unlist(read.table(test.txt, nrow = 1, as.is = TRUE))

# now read the rest of the file using 'fill = TRUE' to pad lines
# with NAs 
# skip the first line
# set the row.names as the first column in the text file
# coerce to a matrix
df - as.matrix(read.table(test.txt, fill = TRUE, skip = 1, 
row.names = 1))

# Now set the colnames of df
colnames(df) - col.names

 df
  A  B  C  D  E
A 0 NA NA NA NA
B 1  0 NA NA NA
C 2  5  0 NA NA
D 3  6  8  0 NA
E 4  7  9 10  0

If you should further want to set the diagonal to NA:

 diag(df) - NA

 df
   A  B  C  D  E
A NA NA NA NA NA
B  1 NA NA NA NA
C  2  5 NA NA NA
D  3  6  8 NA NA
E  4  7  9 10 NA


See ?read.table for more information on the file reading part.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Plotting Character Variable

2005-07-07 Thread Marc Schwartz (via MN)
On Thu, 2005-07-07 at 11:47 -0400, [EMAIL PROTECTED] wrote:
 Any ideas about the following problem:
 
 I have a matrix (A) that looks like this:
 
 gene_names values
 hsa-mir-124  0.3
 hsa-mir-234  0.1
 hsa-mir-344  0.4
 hsa-mir-333  0.7
 .  ...
 
 (This is a 2 by 22283 matrix: quite large)

To split hairs, it would be a 22283 by 2 object in R's [row, column]
approach to indexing.

I am also presuming that A is a data frame, since you seem to have two
different data types above, with gene_names being a factor?

 I would like to plot the values, but output the gene_names as the plotting
 symbol. I have tried regular x,y plots, but since the gene_names are quite
 large and there are 22283 of them, it's impossible to fit them on the x-axis.
 
 Basically, can I plot the above matrix
 
 plot(gene_names, value) where the gene_names are used as the plotting symbol.
 
 thank you,

Well...

I would defer to those with more experience in plotting genetic data,
but from a practical standpoint, it seems to me to be highly problematic
to plot 20,000 data points with labels and have them be human readable
without an STMunless you have _very_ wide paper on a large format
plotter  ;-)

That being said, one approach is to rotate the x axis labels vertically,
to make more room, while using points for the plotting symbols:

# Adjust bottom margin to make room for vertical labels
par(mar = c(7, 4, 4, 2))

plot(1:nrow(A), A$values, xaxt = n, ann = FALSE, las = 2)

# use 'las = 3' to rotate the labels
axis(1, at = 1:nrow(A), 
 labels = as.character(A$gene_names), las = 3)


You might want to review some of the tools available at the Bioconductor
site to see if there are specialized plotting functions for this type of
data:

http://www.bioconductor.org/

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] xmat[1, 2:3] - NULL

2005-07-07 Thread Marc Schwartz (via MN)
On Thu, 2005-07-07 at 10:20 -0700, Mikkel Grum wrote:
 I have a situation where I'm filling out a dataframe
 from a database. Sometimes the database query doesn't
 get anything, so I end up trying to place NULL in the
 dataframe like below.
 
  temp - NULL
  xmat - as.data.frame(matrix(NA, 2, 3))
  xmat[1, 2:3] - temp
 Error in if (m  n * p  (n * p)%%m)
 stop(gettextf(replacement has %d items, need %d,  : 
 missing value where TRUE/FALSE needed
 
 I can't get the programme to accept that sometimes
 what the query looks for just doesn't exist, and I
 just want to move on to the next calculation leaving
 the dataframe with a missing value in the given cell.
 It's a real show stopper and I haven't found a way
 round it.
 
 Best wishes,
 Mikkel
 
 PS. I'm using dbGetQuery to query an SQLite database.

NULL represents a zero length object in R.

Thus, trying to set only the first row in a data frame to NULL makes no
sense, since you cannot have a 0 length object that also has a single
row (as you seem to be trying to do above).

Since a data frame is a series of lists, you could do the following:

 temp - NULL
 xmat - as.data.frame(matrix(NA, 2, 3))

 xmat
  V1 V2 V3
1 NA NA NA
2 NA NA NA

 xmat[, 1] - temp

 xmat
  V2 V3
1 NA NA
2 NA NA

which removes the first column in the data frame. This is the same as:

 xmat[, -1]
  V2 V3
1 NA NA
2 NA NA


You could also set the entire xmat to NULL as follows:

 xmat
  V1 V2 V3
1 NA NA NA
2 NA NA NA

 xmat - NULL

 xmat
NULL


You can then test to see if 'xmat' is a NULL:

 is.null(xmat)
[1] TRUE

and base a boolean expression and resultant action on that result:

if (!is.null(xmat))
{
  do_calculations...
}

If your calculations are on a row by row basis, where NA's represent
missing data, you can also use one of several functions to eliminate
those rows. See ?na.action, ?na.omit and ?complete.cases for more
information and examples.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] the format of the result

2005-07-01 Thread Marc Schwartz (via MN)
On Fri, 2005-07-01 at 19:40 +0800, ronggui wrote:
 I write a function to get the frequency and prop of a variable.
 
 freq-function(x,digits=3)
 {naa-is.na(x)
 nas-sum(naa)
 if (any(naa))
 x-x[!naa]
 n-length(x)
 ta-table(x)
 prop-prop.table(ta)*100
 res-rbind(ta,prop)
 rownames(res)-c(Freq,Prop)
 cat(Missing value(s) are,nas,.\n)
 cat(Valid case(s) are,n,.\n)
 cat(Total case(s) are,(n+nas),.\n\n)
 print(res,digits=(digits+2))
 cat(\n)
 }
 
  freq(sample(letters[1:3],48,T),2)
 Missing value(s) are 0 .
 Valid case(s) are 48 .
 Total case(s) are 48 .
 
  a b c
 Freq 11.00 20.00 17.00
 Prop 22.92 41.67 35.42
 
 and i want the result to be like
  a  b c
 Freq 11.00  20.00  17.00
 Prop 22.92% 41.67% 35.42%
 
 how should i change my function to get what i want?


Here is a modification of the function that I think should work. Note
that part of the output formatting process has to take into account the
a priori unknowns involving your 'digits' argument, the lengths of the
dimnames resulting from the table and the lengths of the frequency
counts in the table. Thus, a fair amount of the code is establishing the
'width' argument, which is then used in formatC() so that the output can
be column aligned properly.

Note that by default, table() will exclude NA, so you do not need to
subset 'x' before using table().

Also, note that I change Prop to Pct.


freq - function(x, digits = 3)
{
  n - length(x)
  missing - sum(is.na(x))
  ta - table(x)
  pct - prop.table(ta) * 100

  width - max(nchar(unlist(dimnames(ta))) + 1,
   nchar(ta) + digits + 1,
   5 + digits)
  
  Vals - paste(formatC(unlist(dimnames(ta)), format = s,
width = width),
collapse =   )

  Freq - paste(formatC(ta, format = f, digits = digits,
width = width),
collapse =   )

  Pct - paste(formatC(pct, format = f, digits = digits,
   width = width),
   %, sep = , collapse =  )

  cat(Missing value(s) are, missing, .\n)
  cat(Valid case(s) are, n - missing,.\n)
  cat(Total case(s) are, n, .\n\n)

  cat(, Vals, \n)
  cat(Freq, Freq, \n)
  cat(Pct , Pct, \n)
  cat(\n)
}


Thus:


 freq(sample(letters[1:3], 48, TRUE), 2)
Missing value(s) are 0 .
Valid case(s) are 48 .
Total case(s) are 48 .

   abc 
Freq   28.00 8.0012.00 
Pct58.33%   16.67%   25.00% 


 freq(sample(c(letters[1:3], NA), 1000, TRUE), 2)
Missing value(s) are 257 .
Valid case(s) are 743 .
Total case(s) are 1000 .

   abc 
Freq  250.00   218.00   275.00 
Pct33.65%   29.34%   37.01% 


 freq(iris$Species)
Missing value(s) are 0 .
Valid case(s) are 150 .
Total case(s) are 150 .

  setosa   versicolorvirginica 
Freq  50.000   50.000   50.000 
Pct   33.333%  33.333%  33.333% 


 freq(iris$Species, 0)
Missing value(s) are 0 .
Valid case(s) are 150 .
Total case(s) are 150 .

  setosa   versicolorvirginica 
Freq  50   50   50 
Pct   33%  33%  33% 


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] How to rotate the axisnames in a BARPLOT

2005-06-30 Thread Marc Schwartz (via MN)
On Thu, 2005-06-30 at 23:05 +0200, [EMAIL PROTECTED] wrote:
 Hi all,
 
 - how can I do a barplot with rotated axis labels? I've seen the example for
 just a plot in the FAQ, but I'll missing the coordinates to plot my text at
 the right position beneath the bars. 
 Is there any (easy?) solution?
 
 - how can I set the y-axis in a barplot to logarithmic scale?
 
 Many thanks in advance!
 
 Best Regards
  Tom

The same concept for rotating axis labels in a regular plot per the FAQ
can be used in a barplot, as long as you know that barplot() returns the
bar midpoints. So:

 ## Increase bottom margin to make room for rotated labels
 par(mar = c(7, 4, 4, 2) + 0.1)

 ## Create plot and get bar midpoints in 'mp'
 mp - barplot(1:10)

 ## Set up x axis with tick marks alone
 axis(1, at = mp, labels = FALSE)

 ## Create some text labels
 labels - paste(Label, 1:10, sep =  )

 ## Plot x axis labels at mp
 text(mp, par(usr)[3] - 0.5, srt = 45, adj = 1,
  labels = labels, xpd = TRUE)

 ## Plot x axis label at line 4
 mtext(1, text = X Axis Label, line = 4)

See ?barplot for more information.

With respect to the log scales, you would need to use barplot2(), which
is in the 'gplots' package on CRAN. barplot2() supports log scales using
the same 'log' argument as plot().

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] faster algorithm for Kendall's tau

2005-06-28 Thread Marc Schwartz (via MN)
On Tue, 2005-06-28 at 13:03 -0400, ferdinand principia wrote:
 Hi,
 
 I need to calculate Kendall's tau for large data
 vectors (length  100'000). 
 Is somebody aware of a faster algorithm or package
 function than cor(, method=kendall)? 
 There are ties in the data to be considered (Kendall's
 tau-b).
 
 Any suggestions?
 
 Regards
 Ferdinand


The time intensive part of the process is typically the ranking/ordering
of the vector pairs to calculate the numbers of concordant and
discordant pairs.

If the number of _unique pairs_ in your data is substantially less than
the number of total pairs (in other words, creating a smaller 2d
contingency table from a pair of your vectors makes sense), then the
following may be of help.

# Calculate CONcordant Pairs in a table
# cycle through x[r, c] and multiply by
# sum(x elements below and to the right of x[r, c])
# x = table
concordant - function(x)
{
  x - matrix(as.numeric(x), dim(x))
  
  # get sum(matrix values  r AND  c)
  # for each matrix[r, c]
  mat.lr - function(r, c)
  { 
lr - x[(r.x  r)  (c.x  c)]
sum(lr)
  }

  # get row and column index for each
  # matrix element
  r.x - row(x)
  c.x - col(x)

  # return the sum of each matrix[r, c] * sums
  # using mapply to sequence thru each matrix[r, c]
  sum(x * mapply(mat.lr, r = r.x, c = c.x))
}

# Calculate DIScordant Pairs in a table
# cycle through x[r, c] and multiply by
# sum(x elements below and to the left of x[r, c])
# x = table
discordant - function(x)
{
  x - matrix(as.numeric(x), dim(x))
  
  # get sum(matrix values  r AND  c)
  # for each matrix[r, c]
  mat.ll - function(r, c)
  { 
ll - x[(r.x  r)  (c.x  c)]
sum(ll)
  }

  # get row and column index for each
  # matrix element
  r.x - row(x)
  c.x - col(x)

  # return the sum of each matrix[r, c] * sums
  # using mapply to sequence thru each matrix[r, c]
  sum(x * mapply(mat.ll, r = r.x, c = c.x))
}


# Calculate Kendall's Tau-b
# x = table
calc.KTb - function(x)
{
  x - matrix(as.numeric(x), dim(x))
  
  c - concordant(x)
  d - discordant(x)

  n - sum(x)
  SumR - rowSums(x)
  SumC - colSums(x)

  KTb - (2 * (c - d)) / sqrt(((n ^ 2) -
 (sum(SumR ^ 2))) * ((n ^ 2) -
 (sum(SumC ^ 2

  KTb
}


Note that I made some modifications of the above, relative to prior
versions that I have posted to handle large numbers of pairs to avoid
integer overflows in summations. Hence the:

  x - matrix(as.numeric(x), dim(x))

conversion in each function.

Now, create some random test data, with 100,000 elements in each vector,
sampling from 'letters', which would yield a 26 x 26 table:

 a - sample(letters, 10, replace = TRUE)
 b - sample(letters, 10, replace = TRUE)
 
  dim(table(a, b))
 [1] 26 26

  system.time(print(calc.KTb(table(a, b
[1] 0.0006906088
[1] 0.77 0.02 0.83 0.00 0.00

Note that in the above, the initial table takes most of the time:

 system.time(table(a, b))
[1] 0.55 0.00 0.56 0.00 0.00

Hence:

 tab.ab - table(a, b)
 system.time(print(calc.KTb(tab.ab)))
[1] 0.0006906088
[1] 0.25 0.01 0.27 0.00 0.00


I should note that I also ran:

 system.time(print(cor(a, b, method = kendall)))
[1] 0.0006906088 
[1] 694.80   7.72 931.89   0.00   0.00 

Nice to know the results work out at least...  :-)


I have not tested with substantially larger 2d matrices, but would
envision that as the dimensions of the resultant tabulation increases,
my method probably approaches and may even become less efficient than
the approach implemented in cor(). Some testing would validate this and
perhaps point to coding the concordant() and discordant() functions in C
for improvement in timing.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] conversion

2005-06-28 Thread Marc Schwartz (via MN)
On Tue, 2005-06-28 at 17:23 -0400, Anna Oganyan wrote:
 Dear List,
 How can I convert a list with elements being character strings, like: 
 c(1,2,3,4), “c(1,3,4,2) … to a list with elements as numerical 
 vectors: c(1,2,3,4), c(1,3,4,2)…?
 Thanks!
 Anna

 l - list(c(1,2,3,4), c(1,3,4,2))

 l
[[1]]
[1] c(1,2,3,4)

[[2]]
[1] c(1,3,4,2)

Now use lapply() over each list element in 'l', converting the character
vectors to R expressions and then evaluating them:

 lapply(l, function(x) eval(parse(text = x)))
[[1]]
[1] 1 2 3 4

[[2]]
[1] 1 3 4 2


See ?lapply, ?eval and ?parse.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

<    1   2   3