Re: [R] Comparing rows of matrices with different dimensions

2005-11-21 Thread Marc Schwartz (via MN)
On Mon, 2005-11-21 at 16:57 +0100, Antje Döring wrote:
>  
> Hi there,
> 
>  
> 
> I have a question, which I thought is very easy to solve, but somehow
> I can't find a solution. Probably someone could help me quickly?
> 
>  
> 
> Here it is:
> 
>  
> 
> I have two matrices:
> 
>  
> 
> a
> 
>  [,1] [,2] [,3]
> 
> [1,]149
> 
> [2,]26   10
> 
> [3,]36   11
> 
> [4,]48   12
> 
>  
> 
> 
> 
> b
> 
>  [,1] [,2]
> 
> [1,]14
> 
> [2,]25
> 
> [3,]36
> 
>  
> 
> Now I want to find out which rows of b can also be found in a (without
> its last column). So the solution must be something like either "TRUE
> FALSE TRUE" or the rows where their is a match (rows 1 and 3)
> 
>  
> 
> Till now I have tried things like b %in% a[,1:2] or so but that
> doesn't work because I want to compare the WHOLE row of b with the
> whole row of a without column 3.
> 
>  
> 
> Thank you very much for any help.
> 
>  
> 
> Regards, Antje


Here is one possible approach, though not tested beyond this example:

> which(apply(matrix(b %in% a, dim(b)), 1, all))
[1] 1 3


The steps are:

# which elements of b are in a
> b %in% a
[1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE


# Turn that into a matrix the same shape as b
> matrix(b %in% a, dim(b))
 [,1]  [,2]
[1,] TRUE  TRUE
[2,] TRUE FALSE
[3,] TRUE  TRUE


# get T/F for which rows are all TRUE
> apply(matrix(b %in% a, dim(b)), 1, all)
[1]  TRUE FALSE  TRUE


# Now get the indices
> which(apply(matrix(b %in% a, dim(b)), 1, all))
[1] 1 3


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Help with read.csv2

2005-11-17 Thread Marc Schwartz (via MN)
On Thu, 2005-11-17 at 17:09 +0100, Matthieu Cornec wrote:
> Hello,
>  I am importing the following file
>   ;aa;bb;cc
> 1988;12;12;12
> 1989;78;78;12
> 1990;78;78;12
> 1991;78;78;12
> 1992;78;78;12
> 1993;78;78;12
> 1994;78;78;12
> 
> data<-read.csv2("test.csv",header=T)
> --
> it gives
>   X aa bb cc
> 1 1988 12 12 12
> 2 1989 78 78 12
> 3 1990 78 78 12
> 4 1991 78 78 12
> 5 1992 78 78 12
> 6 1993 78 78 12
> 7 1994 78 78 12
>  How do I get :
> 
>  aa bb cc
> 1988 12 12 12
> 1989 78 78 12
> 1990 78 78 12
> 1991 78 78 12
> 1992 78 78 12
> 1993 78 78 12
> 1994 78 78 12
> 

Are you indicating that you want the years to be the rownames and NOT a
column in the data frame?

If so, use the 'row.names' argument to indicate that the first column of
the incoming data file contains the rownames:

> dat <- read.csv2("clipboard", row.names = 1)

> dat
 aa bb cc
1988 12 12 12
1989 78 78 12
1990 78 78 12
1991 78 78 12
1992 78 78 12
1993 78 78 12
1994 78 78 12

> str(dat)
`data.frame':   7 obs. of  3 variables:
 $ aa: int  12 78 78 78 78 78 78
 $ bb: int  12 78 78 78 78 78 78
 $ cc: int  12 12 12 12 12 12 12

> rownames(dat)
[1] "1988" "1989" "1990" "1991" "1992" "1993" "1994"



Note that you are getting the "X" in your initial example above, since
you are missing the first column name in the text file. A better
approach here might be (depending upon your intent) to use the
'col.names' argument to specify the column names explicitly:

> dat <- read.csv2("clipboard", header = TRUE, 
col.names = c("Years", "aa", "bb", "cc"))

> dat
  Years aa bb cc
1  1988 12 12 12
2  1989 78 78 12
3  1990 78 78 12
4  1991 78 78 12
5  1992 78 78 12
6  1993 78 78 12
7  1994 78 78 12

This way, you are defining the first column name during the import and
your 'Years' are available as a data column. It all depends upon what
you want to do with the data at this point.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] dev.copy legend problem

2005-11-17 Thread Marc Schwartz (via MN)
On Thu, 2005-11-17 at 17:03 +0100, Florence Combes wrote:
> Dear all,
> 
> We are facing this problem for long, and so ask for your help.
> 
> We are plotting 2 graphs in a postscript device (left part -layout
> function-), and the common legend for these graphs on the right part.
> The legend in the postscript device looks ok: this is color lines with
> numbers on the right (6 columns) , see the code below:
> 
> > nblock<-c(1:48)
> > leg<-paste(c(1:npin)," ",sep=" ")
> > legend(0,19,legend = leg, col=rainbow(nblock), lty=1,
> merge=TRUE,ncol=6,bty="n",cex=0.6)
> 
> The problem we are facing is that we dev.copy to a pdf device and then, the
> legend doesn't look the same: numbers overlap a little lines.

The problem with using dev.copy() is that it can result in slightly (or
notably) different plotting characteristics, which are device dependent.

A plot that is viewed on the screen, for example, may or may not (most
likely not) look the same when created using the same code to a
postscript device or a pdf device. This can be further exacerbated if
the plot device dimensions and pointsizes are not explicitly defined, as
"device shrinkage" may occur. I suspect that this is what you are
experiencing.

If you want to end up with a PDF plot, use pdf() and set the various
plotting parameters (ie. font size, etc.) based upon what you see there,
not what you see on the screen or in another device.

One other possibility, just to throw it out there since you are on
Linux, is to use ps2pdf from a console to convert the PS file to PDF.
This uses Ghostscript to do the conversion and will generally work well,
but testing with your particular plot, given possible issues with fonts,
etc. would still be a good idea. See 'man ps2pdf' for more information.

> Has someone already encountered such a thing ?
> 
> Any help apreciated, thanks a lot.
> 
> Florence.
> 
> 
> R 2.1.0 on a Linux Debian.

Have not been able to get your SysAdmin to upgrade you yet?

:-)

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] [Rd] Scan data from a .txt file

2005-11-17 Thread Marc Schwartz (via MN)
I have a feeling that Vasu wants (mistakenly) this:

dat <- read.table("clipboard", header = FALSE)

> dat
  V1 V2 V3 V4
1   Name Weight Height Gender
2   Anne150 65  F
3Rob160 68  M
4 George180 65  M
5   Greg205 69  M

> str(dat)
`data.frame':   5 obs. of  4 variables:
 $ V1: Factor w/ 5 levels "Anne","George",..: 4 1 5 2 3
 $ V2: Factor w/ 5 levels "150","160","180",..: 5 1 2 3 4
 $ V3: Factor w/ 4 levels "65","68","69",..: 4 1 2 1 3
 $ V4: Factor w/ 3 levels "F","Gender","M": 2 1 3 3 3

> dat$V1
[1] Name   Anne   RobGeorge Greg
Levels: Anne George Greg Name Rob

> dat$V2
[1] Weight 150160180205
Levels: 150 160 180 205 Weight

> dat$V3
[1] Height 65 68 65 69
Levels: 65 68 69 Height

> dat$V4
[1] Gender F  M  M  M
Levels: F Gender M


So that the colnames are actually part of the data frame columns.

Vasu, note however that all values become factors or you can convert to
character, for example:

> as.character(dat$V1)
[1] "Name"   "Anne"   "Rob""George" "Greg"

neither of which I suspect is what you really want.


You can access the column names of the data frame using colnames():

> dat <- read.table("clipboard", header = TRUE)

> dat
Name Weight Height Gender
1   Anne150 65  F
2Rob160 68  M
3 George180 65  M
4   Greg205 69  M

> colnames(dat)
[1] "Name"   "Weight" "Height" "Gender"


This keeps the column names separate from the actual data, which unless
we are missing something here, is the proper way to do this. Think of a
data frame as a rectangular data set, which can contain more than one
data type across the columns, much like a spreadsheet.  The difference
here (unlike a spreadsheet) is that the first row does not contain the
column names/labels. These are separate from the data itself, which in a
typical spreadsheet would start on row 2.

Note as Andy pointed out, that in this case, you should use
read.table(), not scan().

Review "An Introduction To R" and the "R Data Import/Export" manuals for
more information. Both are available with your installation and/or from
the main R web site under Documentation.

HTH,

Marc Schwartz


On Thu, 2005-11-17 at 10:41 -0500, Liaw, Andy wrote:
> [Re-directing to R-help, as this is more appropriate there.]
> 
> I tried copying the snippet of data into the windows clipboard and tried it:
> 
> > dat <- read.table("clipboard", header=T)
> > dat
> Name Weight Height Gender
> 1   Anne150 65  F
> 2Rob160 68  M
> 3 George180 65  M
> 4   Greg205 69  M
> > str(dat)
> `data.frame':   4 obs. of  4 variables:
>  $ Name  : Factor w/ 4 levels "Anne","George",..: 1 4 2 3
>  $ Weight: int  150 160 180 205
>  $ Height: int  65 68 65 69
>  $ Gender: Factor w/ 2 levels "F","M": 1 2 2 2
> > dat <- read.table("clipboard", header=T, row=1)
> > str(dat)
> `data.frame':   4 obs. of  3 variables:
>  $ Weight: int  150 160 180 205
>  $ Height: int  65 68 65 69
>  $ Gender: Factor w/ 2 levels "F","M": 1 2 2 2
> > dat
>Weight Height Gender
> Anne  150 65  F
> Rob   160 68  M
> George180 65  M
> Greg  205 69  M
> 
> Don't see how it "doesn't work".  Please give more detail on what "doesn't
> work" means.
> 
> Andy
> 
> > From: Vasundhara Akkineni
> > 
> > Hi all,
> > Am trying to read data from a .txt file in such a way that i 
> > can access the
> > column names too. For example, the data in the table.txt file 
> > is as below:
> >  Name Weight Height Gender
> > Anne 150 65 F
> > Rob 160 68 M
> > George 180 65 M
> > Greg 205 69 M
> >  i used the following commands:
> >  data<-scan("table.txt",list("",0,0,0),sep="")
> > a<-data[[1]]
> > b<-data[[2]]
> > c<-data[[3]]
> > d<-data[[4]]
> >  But this doesn't work because of type mismatch. I want to 
> > pull the col
> > names also into the respective lists. For example i want 'b' to have
> > (weight,150,160,180,205) so that i can access the col name 
> > and also the
> > induvidual weights. I tried using the read.table method too, 
> > but couldn't
> > get this working. Can someone suggest a way to do this.
> > Thanks,
> > Vasu.
> >

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] invert y-axis in barplot

2005-11-16 Thread Marc Schwartz (via MN)
On Wed, 2005-11-16 at 16:46 +, Jörg Schlingemann wrote:
> Hi!
> 
>  
> 
> This is probably a very trivial question. Is there an easy way to invert the
> y-axis (low values on top) when using the function barplot()? 
> 
>  
> 
> Thanks,
> 
> Jrg

You mean something like this?:

  barplot(1:10, ylim = rev(c(0, 12)))

or, something like this?:

  barplot(1:10, yaxt = "n") 
  axis(2, labels = rev(seq(0, 10, 2)), at = seq(0, 10, 2))


Note the use of rev() in each case.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] as.integer with base other than ten.

2005-11-14 Thread Marc Schwartz (via MN)
On Mon, 2005-11-14 at 19:01 +, William Astle wrote:
> Is there an R function analogous to the C function strtol? 
> I would like to convert a binary string into an integer.
> 
> Cheers for advice
> 
> Will


There was some discussion in the past and you might want to search the
archive for a more generic solution for any base to any base, but for
binary to decimal specifically, something like the following will work:

bin2dec <- function(x)
{
  b <- as.numeric(unlist(strsplit(x, "")))
  pow <- 2 ^ ((length(b) - 1):0)
  sum(pow[b == 1])
}


The function takes the binary string and splits it up into individual
numbers ('b'). It then creates a vector of powers of 2 as long as 'b'
less one through 0 ('pow').  It then takes the sum of the values of pow,
indexed by 'b == 1'.


> bin2dec("101")
[1] 5

> bin2dec("")
[1] 15

> bin2dec("101")
[1] 95


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Coercion of percentages by as.numeric

2005-11-14 Thread Marc Schwartz (via MN)
On Mon, 2005-11-14 at 19:07 +0200, Brandt, T. (Tobias) wrote:
>  
> >-Original Message-
> >From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
> >Sent: 14 November 2005 06:21 PM
> >
> >On 11/14/05, Brandt, T. (Tobias) <[EMAIL PROTECTED]> wrote:
> >> Hi
> >>
> >> Given that things like the following work
> >>
> >>  > a <- c("-.1"," 2.7 ","B")
> >> > a
> >> [1] "-.1"   " 2.7 " "B"
> >> > as.numeric(a)
> >> [1] -0.1  2.7   NA
> >> Warning message:
> >> NAs introduced by coercion
> >> >
> >>
> >> I naively expected that the following would behave differently.
> >>
> >>  > b <- c('10%', '-20%', '30.0%', '.40%')
> >> > b
> >> [1] "10%"   "-20%"  "30.0%" ".40%"
> >> > as.numeric(b)
> >> [1] NA NA NA NA
> >> Warning message:
> >> NAs introduced by coercion
> >
> >Try this:
> >
> >as.numeric(sub("%", "e-2", b))
> >
> 
> Thank you, that accomplishes what I had intended.
> 
> I would have thought though that the expression "53%" would be a fairly
> standard representation of the number 0.53 and might be handled as such.  Is
> there a specific reason for avoiding this behaviour?  

"53%" is a 'shorthand' character representation of a mathematical
concept. To wit, the specific representation of a fraction using 100 as
the denominator (ie. 53 / 100). The symbol '%' can be replaced by the
word "percent", such as "53 percent", which is also a character
representation.

0.53, in context, is a numeric representation of a proportion in the
range of 0 - 1.0.

> I can imagine that it might add unnecessary overhead to routines like
> "as.numeric" which one would like to keep as fast as possible.
> 
> Perhaps there are other areas though where it might be desirable?  For
> example I'm thinking of the read.table function for reading in csv files
> since I have many of these that have been saved from excel and now contain
> numbers in the "%" format.

In Excel, numbers displayed with a '%' are what you see visually.
However, the internal representation (how the value is actually stored
in the program) is still as a floating point value, without the '%'. 

For example:

> a <- 53
> a
[1] 53

> sprintf("%.0f%%", a)
[1] "53%"

> is.numeric(a)
[1] TRUE

> is.numeric(sprintf("%.0f%%", a))
[1] FALSE


Unfortunately (depending upon your perspective), Excel, and other
similar programs, tend to export the visually displayed values and not
the internal representations of them. Thus, as Gabor pointed out, you
will need to do some 'editing' of the values before using them in R. You
can either do this in Excel, by removing the "%" formatting, or
post-import in R as Gabor has described.

You need to keep separate the internal representation of a value and its
printed or displayed representation for human readable consumption.

as.numeric() does basically one thing and it does it well and properly.
It is up to the user to ensure that it is passed the proper values. When
that is not the case, it issues an appropriate warning message and
returns NA.

Of course, using Gabor's hint, you can also write your own variation of
as.numeric(), creating a function that takes percent formatted values
and converts them as you require. One of the many strengths of R, is
that you can extend it to meet your own specific requirements when the
base functions do not.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] bug/feature with barplot?

2005-11-14 Thread Marc Schwartz (via MN)
On Mon, 2005-11-14 at 15:55 +0100, Karin Lagesen wrote:
> I have found a bug/feature with barplot that at least to me shows
> undesireable behaviour. When using barplot and plotting fewer
> groups/levels/factors(I am unsure what they are called) than the number
> of colors stated in a col statement, the colors wrap around such that
> the colors are not fixed to one group. This is mostly problematic when
> I make R figures using scripts, since I sometimes have empty input
> groups. I have in these cases experienced labeling the empty group as
> red, and then seeing a bar being red when that bar is actually from a
> different group.
> 
> Reproducible example (I hope):
> 
> barplot(VADeaths, beside=TRUE, col=c("red", "green", "blue", "yellow", 
> "black"))
> barplot(VADeaths[1:4,], beside=TRUE, col=c("red", "green", "blue", "yellow", 
> "black"))
> 
> Now, I don't know if this is a bug or a feature, but it sure bugged me...:)
> 
> Karin

Most definitely not a bug.

As with many vectorized function arguments, they will be recycled as
required to match the length of other appropriate arguments.

In this case, the number of colors (5) does not match the number of
groups (4). Thus, they are "out of synch" with each other and you get
the result you have.

Not unexpected behavior.

You should adjust your code and the function call so that the number of
groups matches the number of colors. Something along the lines of the
following:

col <- c("red", "green", "blue", "yellow", "black")
no.groups <- 4
barplot(VADeaths[1:no.groups, ], beside = TRUE, col = col[1:no.groups])


Now try:

 no.groups <- 5
 barplot(VADeaths[1:no.groups, ], beside = TRUE, col = col[1:no.groups])

 no.groups <- 3
 barplot(VADeaths[1:no.groups, ], beside = TRUE, col = col[1:no.groups])


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Orientation of tickmarks labels in boxplot/plot

2005-11-02 Thread Marc Schwartz (via MN)
On Wed, 2005-11-02 at 16:06 -0600, Michal Lijowski wrote:
> Hi,
> 
> I have been trying draw tickmark labels along
> the y - axis perpendicular to the y axis while
> labels along the x - axis parallel to x axis
> while making box plot.
> 
> Here is my test dataset.
> 
>  TData
>ID Ratio
> 1   0 7.075
> 2   0 7.414
> 3   0 7.403
> 4   0 7.168
> 5   0 6.820
> 6   0 7.294
> 7   0 7.238
> 8   0 7.938
> 9   1 7.708
> 10  1 8.691
> 11  1 8.714
> 12  1 8.066
> 13  1 8.949
> 14  1 8.590
> 15  1 8.714
> 16  1 8.601
> 
>  boxplot(Ratio ~ ID, data=TData)
> 
> makes box plot with tickmark labels parallel to the y - axis.
> So I try 
> 
> boxplot(Ratio ~ ID, data=TData, axes=FALSE)
> par(las=0)
> axis(1)
> and I get x - axis ranging from 0.5 to 2.5 (why?) and 
> boxes at 1 and 2.
> par(las=2)
> axis(2)
> box()
> So, if I set tickmark labels parallel to y - axis
> somehow the x - axis range is not what I expect even
> if I use xlim = c(0.0, 3.0)
>  in boxplot(Ratio ~ Id, data=TData, axes=FALSE, xlim=c(0.0, 3.0))
>  par(las=0)
>  axis(1)
> 
> Plots are in the attachments in pdf format.
> 
> I appreciate any tips.
> 
> I am using R 2.2.0 (2005-10-06) on FC4.
> 
> Michal

I suspect that you want this:

  boxplot(Ratio ~ ID, data=TData, las = 1)

so that the x and y axis labels are horizontal. See ?par and review the
options for 'las'. '1' is for axis tick mark labels to be horizontal.

The x axis (horizontal axis if the boxes are vertical, the vertical axis
if the boxes are horizontal) is set by default to the number of boxes
(groups) +/- 0.5.  'xlim' is ignored in both cases.

The boxes themselves are placed at integer values from 1:N, where N is
the number of groups.

There is the 'at' argument and there is an example of its use
in ?boxplot. You can also search the archives for posts where variations
on the use of 'at' have been posted (recently, in fact...)

The actual plotting of the boxplots is done by bxp(), so review the help
for that function as well.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R Graphs in Powerpoint

2005-11-01 Thread Marc Schwartz (via MN)
One other option, just to throw it out there, though it involves a few
more steps.

1. Generate the R plots as EPS files.

2. Import them into Powerpoint onto the required slides. Resize and/or
place as required. Recent versions of Powerpoint will auto-generate a
bitmapped preview image upon import.

3. Print the full Powerpoint presentation to a PS file, using a PS
printer driver. This will result in high quality images.

4. Convert the PS file to PDF, using Ghostscript (ps2pdf) or similar.

5. Display the presentation using Acrobat Reader in full screen mode to
your audience.


This works well, as long as you are not using complex object/slide
transitions, animations and the like in Powerpoint and takes advantage
of the higher quality vector format of EPS graphics as opposed to the
bitmapped graphic formats.

HTH,

Marc Schwartz


On Tue, 2005-11-01 at 13:43 -0800, Smith, Daniel (DHS-DEODC-EHIB) wrote:
> I've tried several methods in OS X, and here's what works best for me.
> Save the R graphic as a PDF file.  Open it with Apple's "Preview"
> application, and save it as a PNG file.  The resulting .png file can
> be inserted into MS Word or PowerPoint, can be resized, and looks good
> on either OS X or Windows.  There are other programs available for
> translating the pdf file to png (like the shareware application
> Graphic Converter), but I've found that Preview produces the best
> results.
> 
> Daniel Smith
> Environmental Health Investigations Branch
> California Dept of Health Services
> 
> 
> -Original Message-
> Date: Mon, 31 Oct 2005 15:14:06 -0800
> From: Jarrett Byrnes <[EMAIL PROTECTED]>
> Subject: [R] R Graphs in Powerpoint
> To: r-help@stat.math.ethz.ch
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=US-ASCII; format=flowed
> 
> Hey, all.  Quick question.  I'm attempting to use some of the great 
> graphs generated in R for an upcoming talk that I'm writing in 
> Powerpoint.  Copying and pasting (I'm using OSX) yields graphs that 
> look great in Powerpoint - until I resize them.  Then fonts, points, 
> and lines all become quite pixelated and blurry.  Even if I size the 
> window properly first, and then copy and paste in the graph, when I 
> then view the slideshow, the graphs come out pixelated and blurry.
> 
> Is there any good solution to this, or is this some fundamental 
> incompatibility that I can't get around?
> 
> -Jarrett

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] multiple boxplots

2005-10-28 Thread Marc Schwartz (via MN)
On Fri, 2005-10-28 at 10:26 -0700, J.M. Breiwick wrote:
> Hello,
> 
> I want to plot 3 boxplots [ par(mfrow=c(3,1)) ] but the first one has 8 
> groups, the 2nd has 7 and the third has 6. But I the groups to line up:
> 
>  1 2 3 4 5 6 7 8
>2 3 4 5 6 7 8
>   3 4 5 6 7 8
> 
> where the numbers actually refer to years. I suspect I have to manipulate 
> the function bxp(). Is there a relatively simple way to do this? Thanks for 
> any help.
> 
> Jeff Breiwick
> NMFS, Seattle


Here is one approach. It is based upon using 2 principal concepts:

1. Create the first plot, save the plot region ranges from par("usr")
and then use this information for the two subsequent plots, where we use
the 'add = TRUE' argument.

2. Use the 'at' argument to specify the placement of the boxes in the
second and third plots to line up with the first.


So:

# Create our data.
dat <- rnorm(80)
years <- rep(1991:1998, each = 10)

# MyDF will be a data frame with two columns and we will use 
# this in the formula method for boxplot.
# The second and third plots will use subset()s of MyDF
MyDF <- cbind(dat, years)

# Set the plot matrix
par(mfrow = c(3, 1))

# Create the first boxplot and save par("usr")
boxplot(dat ~ years, MyDF)
usr <- par("usr")

# Now open a new plot, setting it's par("usr") to 
# match the first plot.
# Then use boxplot and set the boxes 'at' x pos 2:8
# and add it to the open plot
plot.new()
par(usr = usr)
boxplot(dat ~ years, subset(MyDF, years %in% 1992:1998),
at = 2:8, add = TRUE)


# Rinse and repeat  ;-)
# Different subset and use 3:8 for 'at'
plot.new()
par(usr = usr)
boxplot(dat ~ years, subset(MyDF, years %in% 1993:1998),
at = 3:8, add = TRUE)



Replace MyDF in the above with your actual datasets of course.

This would of course be a bit easier if one could set an 'xlim' argument
in boxplot(), but this is ignored by bxp().

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] encrypted RData file?

2005-10-27 Thread Marc Schwartz (via MN)
On Thu, 2005-10-27 at 16:15 -0500, Na Li wrote:
> On 27 Oct 2005, Duncan Temple Lang wrote:
> 
> > Yes, it is of interest and was sitting on my todo list at
> > some time.  If you want to go ahead and provide code to do it,
> > that would be terrific.  There are other areas where encryption
> > would be good to have, so a general mechanism would be nice.
> > 
> > D.
> > 
> > Na Li wrote:
> > > Hi, I wonder if there is interest/intention to allow for encrypted .RData
> > > files?  One can certainly do that outside R manually but that will leave a
> > > decrypted RData file somewhere which one has to remember to delete.
> > > 
> 
> I was hoping someone has already done it.  ;-(
> 
> One possibility is to implement an interface package to gpgme library which
> itself is an interface to GnuPG.  
> 
> But I'm not sure how the input of passphrase can be handled without using
> clear text.
> 
> Michael

Seems to me that a better option would be to encrypt the full partition
such that (unless you write the files to a non-encrypted partition)
these issues are transparent. This would include the use of save(),
save.image() and write() type functions to save what was an encrypted
dataset/object to a unencrypted file.

Of course, you would also have to encrypt the swap and tmp partitions
(as appropriate) for similar reasons.

On Linuxen/Unixen, full encryption of partitions is available via
loopback devices and other mechanisms and some distros have this
available as a built-in option. I believe that the FC folks finally have
this on their list of functional additions for FC5. Windows of course
can do something similar.

The other consideration here, is that if R Core builds in some form of
encryption, there is the potential for import/export restrictions on
such technology since R is available via international CRAN mirrors. It
may be best to provide for a plug-in "encryption black box" of sorts, so
that folks can use a particular encryption schema that meets various
legal/regulatory requirements.

Of course, simply encrypting the file or even a complete partition has
to be considered within a larger security strategy (ie. network
security, physical access control, etc.) that meets a particular
functional requirement (such as HIPAA here in the U.S.)

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Graphics window always overlaps console window!

2005-10-25 Thread Marc Schwartz (via MN)
On Tue, 2005-10-25 at 13:07 -0500, Marc Schwartz (via MN) wrote:
> On Tue, 2005-10-25 at 11:55 -0600, [EMAIL PROTECTED] wrote:
> > Does anyone know how I can set up R so that when I make a graphic, the
> > graphics window remains behind the console window? It's annoying to
> > have to reach for the mouse every time I want to type another line of
> > code (e.g., to add another line to the plot). Thanks.
> 
> What operating system?
> 
> Default window focus behavior is highly OS and even window manager
> specific and is not an R issue.
> 
> Depending upon your OS and window manager, you may need to check the
> documentation and/or do a Google search on "window focus" for further
> information.
> 
> Another alternative, if you are on Windows, is to review Windows FAQ 5.4
> "How do I move focus to a graphics window or the console?", but this is
> a programmatic approach and not a means to affect default behavior.

Yet another approach which I just remembered is that (if on Windows) MS
offers a program called Tweak UI:

http://www.microsoft.com/windowsxp/downloads/powertoys/xppowertoys.mspx

as part of their Power Toys add-ons.

You might want to review that to see if there is a setting for the
handling of new window focus behavior.

HTH,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Graphics window always overlaps console window!

2005-10-25 Thread Marc Schwartz (via MN)
On Tue, 2005-10-25 at 11:55 -0600, [EMAIL PROTECTED] wrote:
> Does anyone know how I can set up R so that when I make a graphic, the
> graphics window remains behind the console window? It's annoying to
> have to reach for the mouse every time I want to type another line of
> code (e.g., to add another line to the plot). Thanks.

What operating system?

Default window focus behavior is highly OS and even window manager
specific and is not an R issue.

Depending upon your OS and window manager, you may need to check the
documentation and/or do a Google search on "window focus" for further
information.

Another alternative, if you are on Windows, is to review Windows FAQ 5.4
"How do I move focus to a graphics window or the console?", but this is
a programmatic approach and not a means to affect default behavior.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Basic: setting resolution and size of an R graphic

2005-10-24 Thread Marc Schwartz (via MN)
On Mon, 2005-10-24 at 22:32 +0200, Dr. med. Peter Robinson wrote:
> Dear List,
> 
> I am sorry if this perhaps a too basic question, but I have not found an
> answer in Google or in the R help system. I am trying to use R to do a
> very simple analysis of some data (RT-PCR and Western analysis) with a
> T-test and
> to plot the results as a histogram with error bars. (I have pasted an
> example script at the bottom of this mail).
> In order to use this for publication, I would like to adjust the
> resolution and size of the final image. However, even using file types
> such as postscript or pdf that are vector based, I get rather bad-looking
> results with
> >pdf(file="test.pdf")
> >source("script at bottom of mail")
> >dev.off()
> 
> using either pdf or postscript or jpg devices.
> 
> 
> Therefore I would like to ask the list, how to best produce a graphic from
> the script below that would fit into one column of a published article and
> have a high resolution (as eps, or failing that tiff or png)?
> Thanks in advance for any advice,
> 
> Peter



What OS are you on?

Running your example on FC4, I get the attached output for a pdf().

I suspect that on your OS, the height and width arguments are not
appropriate by default.

Thus, you may need to adjust your pdf (or postscript) function call to
explicitly specify larger height and width arguments.

Also note that to generate an EPS file, pay attention to the details
section of ?postscript, taking note of the 'onefile', 'horizontal' and
'paper' arguments and settings.

Also, check with your journal to see if they specify dimensions for such
graphics so that you can abide by their specs if provided. If they are
using LaTeX, there are means of specifying and/or adjusting the height
and/or width specs in the code based upon proportions of various
measures (ie. \includegraphics[width=0.9\textwidth]{GraphicsFile.eps} ).

HTH,

Marc Schwartz



test.pdf
Description: Adobe PDF document
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] peculiar matrices

2005-10-21 Thread Marc Schwartz (via MN)
On Fri, 2005-10-21 at 21:32 +, Ben Bolker wrote:
> As far as I can tell from reading The Fine Documentation
> (R Language Definition and Intro to R), matrices are supposed
> to be of homogeneous types.  Yet giving matrix() an inhomogeneous
> list seems to work, although it produces a peculiar object:
> 
> v = list(1:3,4,5,"a")
> m = matrix(v,nrow=2)
> m
> 
>  [,1]  [,2]
> [1,] Integer,3 5
> [2,] 4 "a"
> 
> 
> m[1,]
> 
> [[1]]
> [1] 1 2 3
> 
> [[2]]
> [1] 3
> 
>  (this is R 2.1.1, running under Linux)

Ben,

If you review the structure of 'm' note:

> str(m)
List of 4
 $ : int [1:3] 1 2 3
 $ : num 4
 $ : num 5
 $ : chr "a"
 - attr(*, "dim")= int [1:2] 2 2

that it is actually a list, even though:

> class(m)
[1] "matrix"


Also:

> mode(m)
[1] "list"

> typeof(m)
[1] "list"


If you remove the dim attributes, you get:

> dim(m) <- NULL
> m
[[1]]
[1] 1 2 3

[[2]]
[1] 4

[[3]]
[1] 5

[[4]]
[1] "a"


So I would argue that it is consistent with the documentation in that,
while the printed output is that of a matrix, it is a list, which of
course can handle heterogeneous data types.

This is Version 2.2.0 Patched (2005-10-20 r35979).


>   Should there be a check/error? Or is this just analogous to
> the joke about going to the doctor and saying "it hurts when
> I do this", and the doctor saying "well then, don't do that"?


Maybe more like:

"Doctor, my eye hurts when I drink my tea." and the doctor says, "Well,
remove the spoon from the cup before you drink."

;-)

Regards,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] make three plot to one plot

2005-10-21 Thread Marc Schwartz (via MN)
On Fri, 2005-10-21 at 21:13 +0200, Jan Sabee wrote:
> Dear all,
> I want to make three plot below to only one plot together with legend,
> how can I do that?
> I have tried with matplot function but I did not succeed.
> Thanks for your help.
> Sincerelly,
> Jan Sabee
> 
> test.five.x <- 
> c(0.02,0.05,0.07,0.09,0.10,0.12,0.13,0.14,0.16,0.17,0.20,0.21,0.34,0.40)
> test.five.y <- c(18,12,17,12,3,15,1,5,1,1,3,10,15,10)
> plot(test.five.x, test.five.y, type="l",lty=1, lwd=3, col='red')
> legend(par('usr')[2], par('usr')[4], xjust=1,
>   c('five'),
>   lwd=3,
>   lty=1,
>   col=c('red'))
> test.six.x <- 
> c(0.03,0.05,0.07,0.08,0.09,0.10,0.12,0.13,0.14,0.15,0.17,0.18,0.21,0.22,0.24,0.33,0.39,0.43)
> test.six.y <- c(8,32,14,21,3,8,11,14,11,21,16,14,10,21,1,2,13,19)
> plot(test.six.x, test.six.y, type="l",lty=2, lwd=3, col='green')
> legend(par('usr')[2], par('usr')[4], xjust=1,
>   c('six'),
>   lwd=3,
>   lty=2,
>   col=c('green'))
> test.seven.x <-
> c(0.04,0.05,0.08,0.09,0.10,0.12,0.13,0.14,0.16,0.17,0.19,0.21,0.22,0.24,0.29,0.32,0.38,0.40,0.46,0.50)
> test.seven.y <-
> c(18,135,240,42,63,128,215,267,127,36,21,23,223,12,66,96,12,26,64,159)
> plot(test.seven.x, test.seven.y, type="l",lty=3, lwd=2, col='blue')
> legend(par('usr')[2], par('usr')[4], xjust=1,
>   c('seven'),
>   lwd=3,
>   lty=3,
>   col=c('blue'))

Jan,

matplot() won't work well here, because you do not have common x axis
values across the three sets of data. Thus, you need to use plot() and
then lines().

Try this:

# First set up your data

test.five.x <- c(0.02,0.05,0.07,0.09,0.10,0.12,
 0.13,0.14,0.16,0.17,0.20,0.21,
 0.34,0.40)
test.five.y <- c(18,12,17,12,3,15,1,5,1,1,3,10,
 15,10)

test.six.x <- c(0.03,0.05,0.07,0.08,0.09,0.10,
0.12,0.13,0.14,0.15,0.17,0.18,
0.21,0.22,0.24,0.33,0.39,0.43)
test.six.y <- c(8,32,14,21,3,8,11,14,11,21,16,
14,10,21,1,2,13,19)

test.seven.x <- c(0.04,0.05,0.08,0.09,0.10,0.12,
  0.13,0.14,0.16,0.17,0.19,0.21,
  0.22,0.24,0.29,0.32,0.38,0.40,
  0.46,0.50)
test.seven.y <- c(18,135,240,42,63,128,215,267,
  127,36,21,23,223,12,66,96,12,
  26,64,159)


# Now use plot to draw the first set of data
# The key here is to properly set to the x and y axis
# ranges ('xlim' and 'ylim') so that they are based
# upon all three sets of data.
# Note that I also set the x and y axis labels to "". You
# can adjust these and the plot title as you require.

plot(test.five.x, test.five.y, type="l",lty=1, lwd=3, col="red",
 xlim = range(test.five.x, test.six.x, test.seven.x),
 ylim = range(test.five.y, test.six.y, test.seven.y),
 xlab = "", ylab = "")


# Now use lines() to add the second and third sets of data

lines(test.six.x, test.six.y, lty=2, lwd=3, col="green")
lines(test.seven.x, test.seven.y, lty=3, lwd=2, col="blue")


# Now use legend(). Place it at the upper right hand corner
# and set the line types and colors to reflect the above

legend("topright", xjust = 1,
   legend = c("five", "six", "seven"),
   lwd = 3, lty = 1:3,
   col = c("red", "green", "blue"))


See ?lines for more information.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Boxplot labels

2005-10-20 Thread Marc Schwartz (via MN)
On Thu, 2005-10-20 at 15:41 -0500, Marc Schwartz (via MN) wrote:
> On Thu, 2005-10-20 at 14:46 -0400, Keith Sabol wrote:
> > I am creating boxplots from a dataframe and would like to add to the 
> > standard output a marker representing the value from a particular row in 
> > the 
> > dataframe.
> > 
> > .And I apologize if the solution is as trivial as it seems it should be.
> > 
> > Thanks for any assistance.
> > 
> > Keith
> 
> Some example code would be helpful here.
> 
> However, some hints that I suspect will be helpful:
> 
> 1. Unless you use the 'at' argument in boxplot(), the box center
> positions are at integer values from 1:n, where n is the number of
> groups. 
> 
> 2. You can thus use the points() function to add symbols or the text()
> function to add labels to the existing plot as you may require.
> See ?points and/or ?text for more information.


One other note to add here is that boxplot() will return a list which
contains various stats about your grouped data. This requires using the
form:

  stats <- boxplot(...)

Thus using the 'stats$group' and 'stats$out' components will give you
the x and y positions of any outliers. This will be helpful if these are
the values that you might want to identify

See the "Value" section in ?boxplot and ?boxplot.stats for more
information.

HTH,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Boxplot labels

2005-10-20 Thread Marc Schwartz (via MN)
On Thu, 2005-10-20 at 14:46 -0400, Keith Sabol wrote:
> I am creating boxplots from a dataframe and would like to add to the 
> standard output a marker representing the value from a particular row in the 
> dataframe.
> 
> .And I apologize if the solution is as trivial as it seems it should be.
> 
> Thanks for any assistance.
> 
> Keith

Some example code would be helpful here.

However, some hints that I suspect will be helpful:

1. Unless you use the 'at' argument in boxplot(), the box center
positions are at integer values from 1:n, where n is the number of
groups. 

2. You can thus use the points() function to add symbols or the text()
function to add labels to the existing plot as you may require.
See ?points and/or ?text for more information.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] having scaling problems with a histogram

2005-10-20 Thread Marc Schwartz (via MN)
On Thu, 2005-10-20 at 17:31 +0200, Christian Jones wrote:
> 
> Hello, "urn:schemas-microsoft-com:office:office" />
> 
> I would like to create a histogram from a data collumn consisting of 4
> classes (0; 0.05;0.5;25;75). Due to the difference in scale the
> classes 0;0.05 and 0.5 are displayed within one combined bin by
> default with the code:Hist(x, scale="percent", breaks="Sturges"). How
> can I display them all, or as unique classes, leaving out the rest of
> the x_axes scale? There was no enlightenment in the help function and
> the command : breaks=5 did not do the trick.
> 
> Thanks in advance 
> 
> Chris

I may be mis-understanding what you are doing here, but it sounds like
you should be using barplot() rather than hist(), which is referenced in
the "See Also" in ?hist. This would be preferred if you want individual
counts or proportions (or percents) from specific groups without
binning.

Where does the 'scale = "percent"' come from?  I don't recognize that
argument from the standard histogram plotting functions. 

Oh, wait a minute, just found it doing a search. The Hist() function is
in John Fox's Rcmdr. Please be sure to note this if you are using a
function from a contributed package. It saves time and increases the
likelihood of your getting a reply.

Also, I am guessing that the extraneous content here is the result of
using MS Word as your e-mail editor? Please use plain text only, which
is the defined format for this list and from reviewing the archive, has
been pointed out to you previously by Frank Harrell and Ted Harding.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Automatic rounding of values after factors , converted to numeric, are multipled by a real number

2005-10-20 Thread Marc Schwartz (via MN)
On Thu, 2005-10-20 at 16:20 +0200, Peter Dalgaard wrote:
> "Nelson, Gary (FWE)" <[EMAIL PROTECTED]> writes:
> 
> > Peter,
> > 
> > Thank you for your response. I knew how close the values are to
> > integers, but I still don't understand why I don't have control over
> > how the numbers are displayed (rounded or not)?
> 
> It's all in the conventions. The digits in print() and friends are
> significant digits, so we first round to that many significant digits,
> then discard trailing zeros, which is why
> 
> > 12.51
> [1] 12.5
> > 12.50001
> [1] 12.50001
> 
> The exception is that we do not discard significant digits to the left
> of the decimal point, unless we are using scientific notation
> 
> > print(12345678,digits=2)
> [1] 1.2e+07
> > print(12345678,digits=5)
> [1] 12345678
> 
> (the "scipen" options controls the logic for switching notation).
> 
> For finer control we have the formatC function:
> 
> > format(1234.1,digits=9) # same thing as with print()
> [1] "1234.1"
> > format(1234.1,digits=8)
> [1] "1234"
> > formatC(1234.1,digits=5,format="f")
> [1] "1234.1"
> > formatC(1234.1,digits=4,format="f")
> [1] "1234."


Also, sprintf():

> sprintf("%.9f", 1234.1)
[1] "1234.1"

> sprintf("%.4f", 1234.1)
[1] "1234."

> sprintf("%12.4f", 1234.1)
[1] "   1234."


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] matching two plots

2005-10-19 Thread Marc Schwartz (via MN)
On Wed, 2005-10-19 at 18:27 +0100, Rui Cardoso wrote:
> Hi,
> 
> I have a problem about graphics. I would like to plot two graphs: a barplot 
> and curve. Here is the code:
> 
>  > barplot(dpois(0:45,20),xlim=c(0,45),names=0:45)
>  > curve(dnorm(x,20,sqrt(20)),from=0,to=45,add=T)
> 
> Both graphs are drawn in the same figure, however the scale in both graphs 
> dooes not match. For some reason the second plot is shifted to left. I 
> think there is a problem concerning the axis scale.
> 
> Thanks a lot.
> 
> Rui


This came up this past summer and Gabor and I had a couple of different
approaches to the solution. You can see the posts here:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/57431.html

and

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/57432.html


Using my approach with barplot(), what you need to do is to set the
'space' argument to 0 so that there is no space between the bars. This
will then place the bar centers at increments of 0.5, in your case
seq(0.5, 45.5, 5).

Knowing this, you can then adjust the x axis position in curve() by +0.5
to coincide with the bars. So you end up with this:

barplot(dpois(0:45, 20), names = 0:45, space = 0, ylim = c(0, .1))
curve(dnorm(x, 20 + 0.5, sqrt(20)), add = TRUE)

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Efficient ways of finding functions and Breslow-Day test for homogeneity of the odds ratio

2005-10-18 Thread Marc Schwartz (via MN)
On Wed, 2005-10-19 at 06:47 +1000, Tim Churches wrote:
> Marc Schwartz (via MN) wrote:
> 
> > There is also code for the Woolf test in ?mantelhaen.test
> 
> Is there? How is it obtained? The documentation on mantelhaen.test in R
> 2.2.0 contains a note: "Currently, no inference on homogeneity of the
> odds ratios is performed." and a quick scan of the source code for the
> function didn't reveal any meantion of Woolf's test.
> 
> Tim C


Review the code in the examples on the cited help page...

:-)

HTH,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] p-value calculation

2005-10-18 Thread Marc Schwartz (via MN)
On Tue, 2005-10-18 at 17:13 +0200, richard mendes wrote:
> hello everybody
> 
> i'm very new at using R so probably this is a very stupid question.
> I have a problem calculating a p-value. When i do this with excel i
> can use the method CHIDIST for 1.2654 with 1 freedom degree i get the
> answer 0.261
> 
> i just want to do the same thing in R but i can't find a method.
> can somebody help me
> 
> friendly regards
> 
> richard


> pchisq(1.2654, 1, lower.tail = FALSE)
[1] 0.2606314

See ?pchisq for more information.

You might also want to read Chapter 8 "Probability Distributions" in "An
Introduction To R", available with your R installation or from the
Documentation links on the main R web site.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Efficient ways of finding functions and Breslow-Day test for homogeneity of the odds ratio

2005-10-18 Thread Marc Schwartz (via MN)
On Tue, 2005-10-18 at 14:27 +0100, MJ Price, Social Medicine wrote:
> Dear all,
> 
> I have been trying to find a function to calculate the Breslow-Day test for 
> homogeneity of the odds ratio in R. I know the test can be preformed in SAS 
> but i was wondering if anyone could help me to perform this in r.
> 
> In addition i have the fullrefman file to search for functions in the basic 
> R packages, does anyone have any suggestions of an efficient way of 
> searching for functions in the other add-on packages on the website?
> 
> Thanks in advance
> 
> Malcolm

RSiteSearch("Breslow"), which will perform a search of the e-mail list
archives and other online documentation, reveals this post:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/50202.html

which in turn leads to this code:

http://www.math.montana.edu/~jimrc/classes/stat524/Rcode/breslowday.test.r


There is also code for the Woolf test in ?mantelhaen.test

Further searches can be conducted using help.search("Keyword"), which
will search your installed set of packages.

See ?help.search and ?RSiteSearch for more help on these.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] y axis in histograms

2005-10-17 Thread Marc Schwartz (via MN)
On Mon, 2005-10-17 at 16:51 +0200, Jorge Gaspar Sanz Salinas wrote:
> Hi all,
> 
> This is my first post, I hope you will help me.
> 
> I've some data to present with histograms. I have few values with almost 99% 
> of 
> the frequencies (thousands) and some other values with low frequencies (below 
> one hundred) that I want to emphasize. I think if I could present the 
> logarithms 
> of the frequencies, these could be presented in a more convenient way, but I 
> don't know how to deal with this.
> 
> Thanks, and sorry for my English


If you are plotting the counts, you should probably look at barplot(),
which as of R version 2.2.0 supports log scale axes (ie. log = "y").

See ?barplot for more information.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] high resolution images for publication

2005-10-13 Thread Marc Schwartz (via MN)
On Thu, 2005-10-13 at 15:20 -0600, Chris Buddenhagen wrote:
> Dear all

> I am using R to produce ordinations library(vegan) and the plot function
> produced looks great on the screen but when I send it to jpg or pdf or eps
> the resolution is not so good. Can you tell me how to get high resolution
> images out of R for publication?

It would be helpful to see the actual code that you are using, as is
asked for in the Posting Guide.

For publication, it would be rare to want to use a bitmapped format such
as jpg/png.

pdf and eps are vector based formats and would be generally preferred
over the above.

I can only hazard a guess to consider that perhaps your height and width
arguments for pdf() and/or postscript() are not set properly, as there
are no other resolution based settings for those formats. By design, the
output resolution is target device dependent.

Please provide the code that you are using and we can respond with more
specific guidance.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] shell scripts in R

2005-10-13 Thread Marc Schwartz (via MN)
On Fri, 2005-10-14 at 00:04 +0200, Benedykt P. Barszcz wrote:
> Dnia czwartek, 13 października 2005 23:25, Andrew Robinson napisał:
> > Marco,
> >
> > use the system command.
> >
> > ?system
> >
> > I hope that this helps,
> 
> system(ls, intern = FALSE, ignore.stderr = TRUE)
> Error in as.character(args[[i]]) : cannot coerce to vector
> 
> In fact it ignores the stderr !

As per ?system:

command  the system command to be invoked, as a string.

Thus,

 system("ls", intern = FALSE, ignore.stderr = TRUE)


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] subsetting data frame using by() or tapply() or other

2005-10-13 Thread Marc Schwartz (via MN)
On Thu, 2005-10-13 at 14:28 -0600, Brian S Cade wrote:
> Ok so I see the problem that I'm having creating a new variable (LAG1DBC) 
> in the example data transformation below is that tapply() is creating a 
> list that is not dimensionally consistent with the data frame (data).  So 
> how do I go from the list output of tapply() to create a dimensionally 
> consistent vector that can create the new variable in my original data 
> frame?  I've been trying to use a function like
> data$LAG1DBC <- tapply(data$DBC, data$LOCID, function(x) c(NA, 
> x[-length(x)]))
> which creates a list of dimension much smaller than the nrows in data. And 
> I've tried things like using as.data.frame.array() or as.data.frame.list() 
> in front of tapply() and still have the same problem.  I know this can't 
> be that unusual of a data manipulation and that someone has to have done 
> similar things before.
> 
> I want to go from something like this:
> 
>LOCID  POPULATION  YEARDBC
> 1  algb-1   A 1992 0.70451575
> 2  algb-1   A 1993 0.59506851
> 3  algb-1   A 1997 0.84837544
> 4  algb-1   A 1998 0.50283182
> 5  algb-1   A 2000 0.91242707
> 6  algb-2   A 1992 0.09747155
> 7  algb-2   A 1993 0.84772253
> 8  algb-2   A 1997 0.43974081
> 9  algb-2   A 1998 0.83108544
> 10 algb-2   A 2000 0.22291192
> 11 algb-3   A 1992 0.44234175
> 12 algb-3   A 1993 0.54089534
> 5680 taylr-73   B 2001 0.43918082
> 5681 taylr-73   B 2002 0.34694427
> 5682 taylr-73   B 2003 3.35619190
> 5683 taylr-73   B 2004 0.71575815
> 5684 taylr-73   B 2005 0.42038506
> 5685 taylr-74   B 1992 3.88410354
> 5686 taylr-74   B 1993 3.32472557
> 5687 taylr-74   B 1994 3.29861501
> 5688 taylr-74   B 1996 0.48153827
> 5689 taylr-74   B 1997 3.63570636
> 5690 taylr-74   B 1998 1.94630194
> 
> to something like this:
> 
>LOCID  POPULATION  YEARDBC LAG1DBC
> 1  algb-1   A 1992 0.70451575   NA 
> 2  algb-1   A 1993 0.59506851 0.70451575
> 3  algb-1   A 1997 0.84837544   0.59506851
> 4  algb-1   A 1998 0.50283182 0.84837544
> 5  algb-1   A 2000 0.91242707   0.50283182
> 6  algb-2   A 1992 0.09747155   NA
> 7  algb-2   A 1993 0.84772253 0.09747155
> 8  algb-2   A 1997 0.43974081   0.84772253
> 9  algb-2   A 1998 0.83108544   0.43974081
> 10 algb-2   A 2000 0.22291192   0.83108544
> 11 algb-3   A 1992 0.44234175   NA
> 12 algb-3   A 1993 0.54089534   0.44234175
> 5680 taylr-73   B 2001 0.43918082   NA
> 5681 taylr-73   B 2002 0.34694427   0.43918082
> 5682 taylr-73   B 2003 3.35619190   0.34694427
> 5683 taylr-73   B 2004 0.71575815   3.35619190
> 5684 taylr-73   B 2005 0.42038506   0.71575815
> 5685 taylr-74   B 1992 3.88410354   NA
> 5686 taylr-74   B 1993 3.32472557   3.88410354
> 5687 taylr-74   B 1994 3.29861501   3.32472557
> 5688 taylr-74   B 1996 0.48153827   3.29861501
> 5689 taylr-74   B 1997 3.63570636   0.48153827
> 5690 taylr-74   B 1998 1.94630194   3.63570636
> 
> Brian

Brian,

Use unlist():

> data$LAG1DBC <- unlist(tapply(data$DBC, data$LOCID, 
 function(x) c(NA, x[-length(x)])))

> data
LOCID POPULATION YEARDBCLAG1DBC
1  algb-1  A 1992 0.70451575 NA
2  algb-1  A 1993 0.59506851 0.70451575
3  algb-1  A 1997 0.84837544 0.59506851
4  algb-1  A 1998 0.50283182 0.84837544
5  algb-1  A 2000 0.91242707 0.50283182
6  algb-2  A 1992 0.09747155 NA
7  algb-2  A 1993 0.84772253 0.09747155
8  algb-2  A 1997 0.43974081 0.84772253
9  algb-2  A 1998 0.83108544 0.43974081
10 algb-2  A 2000 0.22291192 0.83108544
11 algb-3  A 1992 0.44234175 NA
12 algb-3  A 1993 0.54089534 0.44234175
5680 taylr-73  B 2001 0.43918082 NA
5681 taylr-73  B 2002 0.34694427 0.43918082
5682 taylr-73  B 2003 3.35619190 0.34694427
5683 taylr-73  B 2004 0.71575815 3.35619190
5684 taylr-73  B 2005 0.42038506 0.71575815
5685 taylr-74  B 1992 3.88410354 NA
5686 taylr-74  B 1993 3.32472557 3.88410354
5687 taylr-74  B 1994 3.29861501 3.32472557
5688 taylr-74  B 1996 0.48153827 3.29861501
5689 taylr-74  B 1997 3.63570636 0.48153827
5690 taylr-74  B 1998 1.94630194 3.63570636

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do

Re: [R] How to generate for one vector matrix

2005-10-13 Thread Marc Schwartz (via MN)
On Thu, 2005-10-13 at 19:47 +0200, Jan Sabee wrote:
> Is there any routine to generate for one vector matrix.
> If I have X I want to generate start from zero to maximum value each vector.
> 
> For example, I have a vector x = (4,2,3,1,4)
> I want to generate n=6 times, for 4, start 0 to 4, then 2 start 0 to 2, ect.
> 
> The result something like this:
> 
> generate(x,n=6)
> 1,1,2,1,4
> 1,2,3,0,3
> 4,0,1,1,1
> 3,1,0,1,4
> 0,0,3,0,0
> 4,1,3,0,4
> 
> Could anyone help me. Thanks.
> Regards,
> Jan Sabee


If I am properly understanding what you are doing here, you have an
initial vector of values. You want to create a matrix, whose columns are
the result of random sampling with replacement from the initial vector
'n' times, where the sampling space for each column "i" is from 0:x[i]?

If correct, this should do it:

> x <- c(4, 2, 3, 1, 4)

> x
[1] 4 2 3 1 4

> sapply(x, function(x) sample(0:x, 6, replace = TRUE))
 [,1] [,2] [,3] [,4] [,5]
[1,]42112
[2,]30000
[3,]12013
[4,]42111
[5,]31310
[6,]01110


Just replace '6' in the sample() arguments with the 'n' you require.

See ?sapply and ?sample.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Removing and restoring factor levels (TYPO CORRECTED)

2005-10-13 Thread Marc Schwartz (via MN)
On Thu, 2005-10-13 at 14:31 -0400, Duncan Murdoch wrote:
> On 10/13/2005 1:07 PM, Marc Schwartz (via MN) wrote:
> > On Thu, 2005-10-13 at 10:02 -0400, Duncan Murdoch wrote:
> >> Sorry, a typo in my previous message (parens in the wrong place in the 
> >> conversion).
> >> 
> >> Here it is corrected:
> >> 
> >> I'm doing a big slow computation, and profiling shows that it is
> >> spending a lot of time in match(), apparently because I have code like
> >> 
> >> x %in% listofxvals
> >> 
> >> Both x and listofxvals are factors with the same levels, so I could
> >> probably speed this up by stripping off the levels and just treating
> >> them as integer vectors, then restoring the levels at the end.
> >> 
> >> What is the safest way to do this?  I am worried that at some point x
> >> and listofxvals will *not* have the same levels, and the optimization
> >> will give the wrong answer.  So I need code that guarantees they have
> >> the same coding.
> >> 
> >> I think this works, where "master" is a factor with the master list of
> >> levels (guaranteed to be a superset of the levels of x and listofxvals),
> >> but can anyone spot anything that might go wrong?
> >> 
> >> # Strip the levels
> >> x <- as.integer( factor(x, levels = levels(master) ) )
> >> 
> >> # Restore the levels
> >> x <- structure( x, levels = levels(master), class = "factor" )
> >> 
> >> Thanks for any advice...
> >> 
> >> Duncan Murdoch
> > 
> > Duncan,
> > 
> > With the predicate that 'master' has the full superset of all possible
> > factor levels defined, it would seem that this would be a reasonable way
> > to go.
> > 
> > This approach would also seem to eliminate whatever overhead is
> > encountered as a result of the coercion of 'x' as a factor to a
> > character vector, which is done by match().
> > 
> > One question I have is, what is the advantage of using structure()
> > versus:
> > 
> >x <- factor(x, levels = levels(master))
> > 
> > ?
> 
> That one doesn't work.  What "factor(x, levels=levels(master))" says is 
> to convert x to a factor, coding the values in it according the levels 
> in master.  But at this point x has values which are integers, so  they 
> won't match the levels of master, which are probably character strings.
> 
> For example:
> 
>  > master <- factor(letters)
>  > print(x <- factor(letters[1:3]))
> [1] a b c
> Levels: a b c
>  > print(x <- as.integer( factor(x, levels = levels(master) ) ) )
> [1] 1 2 3
>  > print(x <- factor(x, levels = levels(master)))
> [1]   
> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> 
> I get NA's at the end because the values 1,2,3 aren't in the vector of 
> factor levels (which are the lowercase letters).

As opposed to:

> print(x <- structure(x, levels = levels(master), class = "factor" ))
[1] a b c
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z


OK.  Makes sense. Thanks for the clarification.

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Removing and restoring factor levels (TYPO CORRECTED)

2005-10-13 Thread Marc Schwartz (via MN)
On Thu, 2005-10-13 at 10:02 -0400, Duncan Murdoch wrote:
> Sorry, a typo in my previous message (parens in the wrong place in the 
> conversion).
> 
> Here it is corrected:
> 
> I'm doing a big slow computation, and profiling shows that it is
> spending a lot of time in match(), apparently because I have code like
> 
> x %in% listofxvals
> 
> Both x and listofxvals are factors with the same levels, so I could
> probably speed this up by stripping off the levels and just treating
> them as integer vectors, then restoring the levels at the end.
> 
> What is the safest way to do this?  I am worried that at some point x
> and listofxvals will *not* have the same levels, and the optimization
> will give the wrong answer.  So I need code that guarantees they have
> the same coding.
> 
> I think this works, where "master" is a factor with the master list of
> levels (guaranteed to be a superset of the levels of x and listofxvals),
> but can anyone spot anything that might go wrong?
> 
> # Strip the levels
> x <- as.integer( factor(x, levels = levels(master) ) )
> 
> # Restore the levels
> x <- structure( x, levels = levels(master), class = "factor" )
> 
> Thanks for any advice...
> 
> Duncan Murdoch

Duncan,

With the predicate that 'master' has the full superset of all possible
factor levels defined, it would seem that this would be a reasonable way
to go.

This approach would also seem to eliminate whatever overhead is
encountered as a result of the coercion of 'x' as a factor to a
character vector, which is done by match().

One question I have is, what is the advantage of using structure()
versus:

   x <- factor(x, levels = levels(master))

?

Thanks,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] apply and plot

2005-10-13 Thread Marc Schwartz (via MN)
On Thu, 2005-10-13 at 14:50 +0100, Luis Ridao Cruz wrote:
> R-help,
> 
> I use the code below to plot some data by applying "apply" function.
> But I don't know how I can get the argument "type" or "col" on the
> "plot" function to distinguish the different lines
> in the graph:
> 
> apply ( my.data, 2, function ( x ) lines ( dimnames ( my.data ) [[1]] ,
> x ) )
> 
> 
> Thank you in advance


Rather than trying the use the construct above, take a look at ?matlines
and/or ?matplot (both on the same help page with ?matpoints.)

I think that you will find these purposefully designed functions better
suited to what I believe you are trying to do here.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Correlation, by date, of two variables?

2005-10-12 Thread Marc Schwartz (via MN)
On Wed, 2005-10-12 at 12:01 -0700, t c wrote:
> I have a dataset with three variables: date, var1, var2

> How can I calculate the correlation, by date, between var1 and var2?
> 
> e.g.
> 
> 
> datevar1var2
> 1/1/200154
> 1/1/200185
> 1/1/200197
> 
>  
> 
> 2/1/200172
> 2/1/200121
> 2/1/200146
> 
>  
> 
> 3/1/200135
> 3/1/200143
> 3/1/200169
> 3/1/20017-1
> 
>  
> 
> the results I want:
> 1/1/2001 0.891042111
> 2/1/2001 0.075093926
> 3/1/2001 -0.263117406

t c,

Given your series of posts here, I would highly recommend that before
you consider spending money on a tutor or other similar resource, you
take the time to read the freely available documentation that both R
Core and useRs have kindly provided to the Community. The R Core manuals
are all available with your downloaded installation of R and/or from the
main R web site under Documentation as are the Contributed documents.
You will find that your time will be well invested in that effort.

If you read the Posting Guide for the e-mail lists, a link to which is
provided at the bottom of every e-mail that comes through, you will find
an excellent guide to the resources that are available to assist you in
using R.

Trying to learn R (as with any technical endeavor) without reading at
least a basic set of the growing list of documents, books and
publications on R is going to be problematic.

A hint for you here:

See ?by, ?tapply, ?split and ?lapply for guidance on how to generate
summary statistics on subsets of data.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Hmisc latex function

2005-10-12 Thread Marc Schwartz (via MN)
On Wed, 2005-10-12 at 08:33 -0500, Charles Dupont wrote:
> Marc Schwartz (via MN) wrote:
> > On Tue, 2005-10-11 at 10:01 -0400, Rick Bilonick wrote:
> > 
> >>I'm using R 2.2.0 on an up-to-date version of Fedora Core 4 with the
> >>latest version of Hmisc. When I run an example from the latex function I
> >>get the following:
> >>
> >>
> >>>x <- matrix(1:6, nrow=2, dimnames=list(c('a','b'),c('c','d','enLine
> >>
> >>2')))
> >>
> >>>x
> >>
> >>  c d enLine 2
> >>a 1 35
> >>b 2 46
> >>
> >>>latex(x)   # creates x.tex in working directory
> >>
> >>sh: line 0: cd: “/tmp/Rtmpl10983”: No such file or directory
> >>This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
> >>entering extended mode
> >>! I can't find file `“/tmp/Rtmpl10983/file643c9869”'.
> >><*> “/tmp/Rtmpl10983/file643c9869”
> >>
> >>Please type another input file name: q
> >>(/usr/share/texmf/tex/latex/tools/q.tex
> >>LaTeX2e <2003/12/01>
> >>Babel  and hyphenation patterns for american, french, german,
> >>ngerman, b
> >>ahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch,
> >>esperanto, e
> >>stonian, finnish, greek, icelandic, irish, italian, latin, magyar,
> >>norsk, polis
> >>h, portuges, romanian, russian, serbian, slovak, slovene, spanish,
> >>swedish, tur
> >>kish, ukrainian, nohyphenation, loaded.
> >>File ignored
> >>xdvi-motif.bin: Fatal error: /tmp/Rtmpl10983/file643c9869.dvi: No such
> >>file.
> >>
> >>
> >>How can I fix this?
> >>
> >>Rick B.
> > 
> > 
> > I get the same results, also on FC4 with R 2.2.0.
> > 
> > I am cc:ing Frank here for his input, but a quick review of the code and
> > created files suggests that there may be conflict between the locations
> > of some of the resultant files during the latex system call. Some files
> > appear in a temporary R directory, while others appear in the current R
> > working directory.
> > 
> > For example, if I enter the full filename:
> >  
> >   /tmp/RtmpC12100/file643c9869.tex
> > 
> > at the latex prompt, I get:
> > 
> > 
> >>latex(x)
> > 
> > sh: line 0: cd: “/tmp/RtmpC12100”: No such file or directory
> > This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
> > entering extended mode
> > ! I can't find file `“/tmp/RtmpC12100/file643c9869”'.
> > <*> “/tmp/RtmpC12100/file643c9869”
> > 
> > Please type another input file name: *** loading the extensions
> > datasource
> > /tmp/RtmpC12100/file643c9869.tex
> > (/tmp/RtmpC12100/file643c9869.tex
> > LaTeX2e <2003/12/01>
> > Babel  and hyphenation patterns for american, french, german,
> > ngerman, b
> > ahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch,
> > esperanto, e
> > stonian, finnish, greek, icelandic, irish, italian, latin, magyar,
> > norsk, polis
> > h, portuges, romanian, russian, serbian, slovak, slovene, spanish,
> > swedish, tur
> > kish, ukrainian, nohyphenation, loaded.
> > (/usr/share/texmf/tex/latex/base/report.cls
> > Document Class: report 2004/02/16 v1.4f Standard LaTeX document class
> > (/usr/share/texmf/tex/latex/base/size10.clo))
> > (/usr/share/texmf/tex/latex/geometry/geometry.sty
> > (/usr/share/texmf/tex/latex/graphics/keyval.sty)
> > (/usr/share/texmf/tex/latex/geometry/geometry.cfg))
> > No file file643c9869.aux.
> > [1] (./file643c9869.aux) )
> > Output written on file643c9869.dvi (1 page, 368 bytes).
> > Transcript written on file643c9869.log.
> > xdvi-motif.bin: Fatal error: /tmp/RtmpC12100/file643c9869.dvi
> 
> 
> H,  It works for me.  Interesting.
> 
> It almost looks like the temp dir is not being created, but thats not 
> possible because R does that.  It might be a Unicode issue with you 
> system shell.  Can you run this statement in R
> 
> sys(paste('cd',dQuote(tempdir()),";",
> "echo Hello BOB > test.test",
> ";","cat test.test"))
> 
> 
> What version of Hmisc are you using?  What local are you using?
> 
> Charles

Hmisc version 3.0-7, Dated 2005-09-15, which is the latest according to
CRAN.

> sys(paste('cd',dQuote(tempdir()),";",
+ "echo Hello BOB > test.test",
+ ";","cat test.test&qu

Re: [R] Two factor (or more) non-parametric comparison of means

2005-10-11 Thread Marc Schwartz (via MN)
Peter,

Good points all around.

Hopefully John will provide further clarification.

Best regards,

Marc

On Tue, 2005-10-11 at 19:46 +0200, Peter Dalgaard wrote:
> "Marc Schwartz (via MN)" <[EMAIL PROTECTED]> writes:
> 
> > I suspect that John may be looking for ?friedman.test, which I believe
> > will allow for a two-way non-parametric test.
> 
> You could be right, but you could also be wrong... It depends quite a
> bit on what is meant by a "two-factor  comparison of means". If he
> literally means means (!), then a permutation test (permuting within
> levels of the other factor) could be appropriate. It also makes a
> difference whether there's a single replication or more per cell in
> the cross-classification.
> 
> > kruskal.test() will perform a non-parametric test on two or more samples
> > (or factor levels) as a generalization of wilcox.test() for one or two
> > samples (or factor levels).
> > 
> > HTH,
> > 
> > Marc Schwartz
> > 
> > On Tue, 2005-10-11 at 18:22 +0200, Johan Sandblom wrote:
> > > ?kruskal.test
> > > 
> > > 2005/10/11, John Sorkin <[EMAIL PROTECTED]>:
> > > > Can anyone suggest a good non-parametric test, and an R
> > > implementation of the test,  that allows for two or more factors?
> > > Wilcoxon signed rank allows for only one.
> > > > Thanks,
> > > > John
> > 
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html
> > 
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Hmisc latex function

2005-10-11 Thread Marc Schwartz (via MN)
On Tue, 2005-10-11 at 10:01 -0400, Rick Bilonick wrote:
> I'm using R 2.2.0 on an up-to-date version of Fedora Core 4 with the
> latest version of Hmisc. When I run an example from the latex function I
> get the following:
> 
> > x <- matrix(1:6, nrow=2, dimnames=list(c('a','b'),c('c','d','enLine
> 2')))
> > x
>   c d enLine 2
> a 1 35
> b 2 46
> > latex(x)   # creates x.tex in working directory
> sh: line 0: cd: “/tmp/Rtmpl10983”: No such file or directory
> This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
> entering extended mode
> ! I can't find file `“/tmp/Rtmpl10983/file643c9869”'.
> <*> “/tmp/Rtmpl10983/file643c9869”
> 
> Please type another input file name: q
> (/usr/share/texmf/tex/latex/tools/q.tex
> LaTeX2e <2003/12/01>
> Babel  and hyphenation patterns for american, french, german,
> ngerman, b
> ahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch,
> esperanto, e
> stonian, finnish, greek, icelandic, irish, italian, latin, magyar,
> norsk, polis
> h, portuges, romanian, russian, serbian, slovak, slovene, spanish,
> swedish, tur
> kish, ukrainian, nohyphenation, loaded.
> File ignored
> xdvi-motif.bin: Fatal error: /tmp/Rtmpl10983/file643c9869.dvi: No such
> file.
> 
> 
> How can I fix this?
> 
> Rick B.

I get the same results, also on FC4 with R 2.2.0.

I am cc:ing Frank here for his input, but a quick review of the code and
created files suggests that there may be conflict between the locations
of some of the resultant files during the latex system call. Some files
appear in a temporary R directory, while others appear in the current R
working directory.

For example, if I enter the full filename:
 
  /tmp/RtmpC12100/file643c9869.tex

at the latex prompt, I get:

> latex(x)
sh: line 0: cd: “/tmp/RtmpC12100”: No such file or directory
This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
entering extended mode
! I can't find file `“/tmp/RtmpC12100/file643c9869”'.
<*> “/tmp/RtmpC12100/file643c9869”

Please type another input file name: *** loading the extensions
datasource
/tmp/RtmpC12100/file643c9869.tex
(/tmp/RtmpC12100/file643c9869.tex
LaTeX2e <2003/12/01>
Babel  and hyphenation patterns for american, french, german,
ngerman, b
ahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch,
esperanto, e
stonian, finnish, greek, icelandic, irish, italian, latin, magyar,
norsk, polis
h, portuges, romanian, russian, serbian, slovak, slovene, spanish,
swedish, tur
kish, ukrainian, nohyphenation, loaded.
(/usr/share/texmf/tex/latex/base/report.cls
Document Class: report 2004/02/16 v1.4f Standard LaTeX document class
(/usr/share/texmf/tex/latex/base/size10.clo))
(/usr/share/texmf/tex/latex/geometry/geometry.sty
(/usr/share/texmf/tex/latex/graphics/keyval.sty)
(/usr/share/texmf/tex/latex/geometry/geometry.cfg))
No file file643c9869.aux.
[1] (./file643c9869.aux) )
Output written on file643c9869.dvi (1 page, 368 bytes).
Transcript written on file643c9869.log.
xdvi-motif.bin: Fatal error: /tmp/RtmpC12100/file643c9869.dvi: No such
file.


The temporary .tex file is present, but the .dvi, .aux and .log files
are created in the current working R directory.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Two factor (or more) non-parametric comparison of means

2005-10-11 Thread Marc Schwartz (via MN)
I suspect that John may be looking for ?friedman.test, which I believe
will allow for a two-way non-parametric test.

kruskal.test() will perform a non-parametric test on two or more samples
(or factor levels) as a generalization of wilcox.test() for one or two
samples (or factor levels).

HTH,

Marc Schwartz

On Tue, 2005-10-11 at 18:22 +0200, Johan Sandblom wrote:
> ?kruskal.test
> 
> 2005/10/11, John Sorkin <[EMAIL PROTECTED]>:
> > Can anyone suggest a good non-parametric test, and an R
> implementation of the test,  that allows for two or more factors?
> Wilcoxon signed rank allows for only one.
> > Thanks,
> > John

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] wildcards and removing variables

2005-10-10 Thread Marc Schwartz (via MN)
On Mon, 2005-10-10 at 16:01 +0100, Prof Brian Ripley wrote:
> On Mon, 10 Oct 2005, Marc Schwartz (via MN) wrote:
> 
> > On Mon, 2005-10-10 at 10:37 -0400, Afshartous, David wrote:
> >>All,
> >>
> >>Is there are a wildcard in R for varible names as in unix?  For example,
> >>
> >>rm(results*)
> >>
> >>to remove all variable or function names that begin w/ "results"?
> >>
> >>cheers,
> >>Dave
> >>ps - please respond directly to [EMAIL PROTECTED]
> >
> >
> > See ?ls, which has a 'pattern' argument, enabling the use of Regex to
> > define the objects to be listed and subsequently removed using rm().
> >
> > You can then use something like:
> >
> >  rm(list = ls(pattern = "\\bresults."))
> 
> One new feature of R-2.2.0 is glob2rx, which converts wildcards to regexps 
> for you. E.g.
> 
> rm(list = ls(pattern = glob2rc("results*")))
> 
> (I think if does it a little better, as that trailing dot is not I think 
> correct: result* matches result.)

Thanks to both Gabor and Prof. Ripley for pointing out the use of
glob2rc(). One of the new features I had forgotten about since reading
NEWS.

On the trailing dot, I was just in the process of drafting a follow up
after realizing my error, since as you point out, 'results' would not be
matched in that case.

Thanks,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] wildcards and removing variables

2005-10-10 Thread Marc Schwartz (via MN)
On Mon, 2005-10-10 at 10:37 -0400, Afshartous, David wrote:
>   All,
> 
>   Is there are a wildcard in R for varible names as in unix?  For example,
> 
>   rm(results*)
> 
>   to remove all variable or function names that begin w/ "results"?  
> 
>   cheers,
>   Dave
>   ps - please respond directly to [EMAIL PROTECTED] 


See ?ls, which has a 'pattern' argument, enabling the use of Regex to
define the objects to be listed and subsequently removed using rm().

You can then use something like:

  rm(list = ls(pattern = "\\bresults."))

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] spline.des

2005-10-05 Thread Marc Schwartz (via MN)
On Mon, 2005-10-03 at 15:28 -0500, lforzani wrote:
> Hello, I am using library fda and I can not run a lot of functions because
> I receive the error:
> 
> Error in bsplineS(evalarg, breaks, norder, nderiv) : 
> couldn't find function "spline.des"
> 
> 
> do you know how I can fix that? Thnaks. Liliana


spline.des() is in the 'splines' package, installed with the basic R
distribution. It appears that bsplineS() has a dependency on this not
otherwise referenced in the fda package documentation. Looks like fda
has not been updated in some time as well.

I have cc'd Jim Ramsay on this reply as an FYI.

You probably need a:

  library(splines)

before calling your code above.


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Getting eps into Word documents.

2005-10-04 Thread Marc Schwartz (via MN)
On Tue, 2005-10-04 at 08:38 -0500, Robert Baer wrote:
> > > On 03-Oct-05 Marc Schwartz (via MN) wrote:
> > > > On Mon, 2005-10-03 at 16:31 -0300, Rolf Turner wrote:
> > > >> A student in one of my courses has asked me about getting R graphics
> > > >> output (under Linux) into a Word document.  I.e. she wants to do her
> > > >> R thing under Linux, but then do her word processing using Word.
> --snip--
> > > > So use something like the following:
> > > >
> > > >
> > > > postscript("RPlot.eps", height = 4, width = 4,
> > > >horizontal = FALSE, onefile = FALSE,
> > > >paper = "special")
> > > >
> > > > plot(1:5)
> > > >
> > > > dev.off()
> > > >
> > > >
> > > > You can then import the .eps file into Word or most other such
> > > > applications that can import encapsulated postscript files.
> 
> -snip
> > > > More information is available from MS here:
> > > >
> > > > http://support.microsoft.com/?kbid=290362
> > > >
> > > > HTH,
> > > >
> > > > Marc Schwartz
> --snip---
> > > b) It won't work anyway if printed to a non-PostScript printer.
> >
> > True, which is the case irrespective of Word/Windows. If you don't have
> > a PS printer locally or accessible via network, you can always install a
> > PS printer driver and print to a file, which can then be printed by a
> > third party if required.
> >
> Well, as a lowly Windows and Office user, I most often right click on R
> grahics, cut to clipboard, and paste into Word.   So one possiblility is for
> the student to install R on her own machine (Windows or Mac?).
> 
> But I just tried Marc's suggestion, and it looks VERY VIABLE to me.  I
> generated the graph from his code snippet and used "Insert picture from
> file" in Word 2003 to place the graphic in a Word document.  I then tried
> printing on both an HP 4100 TN laserjet and an HP 960c deskjet.  The image
> printed perfectly on both printers with crisp lines and text that apprear to
> be vector-based not degraded bitmapped representations.  Certainly worth the
> student trying.


Glad to hear that worked for you.

I do think that the majority of problems with importing EPS files from R
into other applications (Word, Powerpoint, Writer, Impress, etc.) are
due to not properly configuring the postscript() arguments that are on
the help page. 

I should also note that beyond plots, I use this same approach for
putting LaTeX tables into these documents as well. There are times where
I need create one or more nicely formatted tables for inclusion in
another document, to then be used by someone else.

I generate the LaTeX preamble, document and table code using R, then
run:

  latex FileName.tex
  dvips -E FileName -o FileName.ps

where the '-E' option to dvips attempts to create an EPS output file
with a tight bounding box. The result is a single EPS file with the
table (much like the plot example) ready for import.

I had been using 'epstool' to create the image preview, but now that
OO.org 2.0 has included this automatically upon import (as do the latest
versions of Word), this step is no longer required.

In this way, I can use R to create reproducible and publication ready
tables for use by others in their documents, in situations where they
are not (or cannot) use LaTeX for the entire document.

HTH,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Getting eps into Word documents.

2005-10-03 Thread Marc Schwartz (via MN)
On Mon, 2005-10-03 at 22:00 +0100, Ted Harding wrote:
> Rolf (& Marc)
> 
> On 03-Oct-05 Marc Schwartz (via MN) wrote:
> > On Mon, 2005-10-03 at 16:31 -0300, Rolf Turner wrote:
> >> A student in one of my courses has asked me about getting R graphics
> >> output (under Linux) into a Word document.  I.e. she wants to do her
> >> R thing under Linux, but then do her word processing using Word.
> >> 
> >> Scanning around the r-help archives I encountered an inquiry about
> >> this topic --- eps into Word documents --- from Paul Johnson but
> >> found no replies to it.  I tried contacting him but the email address
> >> in the archives appeared not to be valid.  Does anyone know a
> >> satisfactory solution to the problem of including a graphic which
> >> exists in the form of a *.eps (encapsulated postscript) file into a
> >> Word document.  If so, would you be willing to share it with me and
> >> my student?
> >> 
> >> If so, please be gentle in your explanation.  I am not myself (repeat
> >> ***NOT***) a user of Word!
> > 
> > Hehe...  :-)
> > 
> > Rolf, just use the guidance provided in ?postscript. In the details
> > section it indicates:
> > 
> >  The postscript produced by R is EPS (_Encapsulated PostScript_)
> >  compatible, and can be included into other documents, e.g., into
> >  LaTeX, using '\includegraphics{}'.  For use in this way
> >  you will probably want to set 'horizontal = FALSE, onefile =
> >  FALSE, paper = "special"'.
> > 
> > So use something like the following:
> > 
> > 
> > postscript("RPlot.eps", height = 4, width = 4, 
> >horizontal = FALSE, onefile = FALSE,
> >paper = "special")
> > 
> > plot(1:5)
> > 
> > dev.off()
> > 
> > 
> > You can then import the .eps file into Word or most other such
> > applications that can import encapsulated postscript files.
> > 
> > The recent versions of Word will also automatically generate a
> > bitmapped preview of the plot upon import.  BTW, OO.org 2.0,
> > which is in late beta testing now, also generates EPS preview
> > images upon import.
> > 
> > The key to doing this successfully is using the arguments to
> > postscript() as defined above. I have never had a problem with this.
> > 
> > More information is available from MS here:
> > 
> > http://support.microsoft.com/?kbid=290362
> > 
> > HTH,
> > 
> > Marc Schwartz
> 
> This suggestion could be problematic in that

Ted,

> a) According to the MS web site above, it applies to recent Word
>(Office 2002/2003) or possibly earlier "depending on installed
>graphics filters".

The Word EPS filter has been around for some time. I recall using it
years ago. Note that it is listed on that site in both categories of
filters for some reason.  In the older versions of Word, there was no
preview image generated, hence you ended up with a box/frame place
holder of sorts, unless you added a preview image before importing.

That being said, one does need to install the import filters from the
Office CD, which may not be part of the default settings on all prior
versions. It may require going back into the Office set up program to
install them.

> b) It won't work anyway if printed to a non-PostScript printer.

True, which is the case irrespective of Word/Windows. If you don't have
a PS printer locally or accessible via network, you can always install a
PS printer driver and print to a file, which can then be printed by a
third party if required.  

If one has ps2pdf/Ghostscript available, you can also convert the PS
file to PDF and then print it via Acrobat or other PDF viewers.  

> If either of these applies to Rolf's student, she could have problems.
> 
> [Just to add my own "disclaimer": the only version of Word I'm in
> any position to ever touch, and then only if driven to, belongs
> to Office 98; and I'm sure that this doesn't know a thing about
> PostScript!]
> 
> Another option to consider, since she's doing her R work on Linux,
> is that recent versions of the ImageMagick program 'convert' have
> the capability to convert EPS into WMF (Windows Metafile; use
> file extension ".wmf" for 'convert') or EMF (Enhanced Metafile);
> use file extension ".emf". The gubbins is built in to a file
> "wmf.so" in the lib/ImageMagick tree.
> 
> Likewise, the program 'pstoedit' can do it (to ".wmf" or ".emf"),
> usi

Re: [R] Getting eps into Word documents.

2005-10-03 Thread Marc Schwartz (via MN)
On Mon, 2005-10-03 at 16:31 -0300, Rolf Turner wrote:
> A student in one of my courses has asked me about getting R graphics
> output (under Linux) into a Word document.  I.e. she wants to do her
> R thing under Linux, but then do her word processing using Word.
> 
> Scanning around the r-help archives I encountered an inquiry about
> this topic --- eps into Word documents --- from Paul Johnson but
> found no replies to it.  I tried contacting him but the email address
> in the archives appeared not to be valid.  Does anyone know a
> satisfactory solution to the problem of including a graphic which
> exists in the form of a *.eps (encapsulated postscript) file into a
> Word document.  If so, would you be willing to share it with me and
> my student?
> 
> If so, please be gentle in your explanation.  I am not myself (repeat
> ***NOT***) a user of Word!

Hehe...  :-)

Rolf, just use the guidance provided in ?postscript. In the details
section it indicates:

 The postscript produced by R is EPS (_Encapsulated PostScript_)
 compatible, and can be included into other documents, e.g., into
 LaTeX, using '\includegraphics{}'.  For use in this way
 you will probably want to set 'horizontal = FALSE, onefile =
 FALSE, paper = "special"'.

So use something like the following:


postscript("RPlot.eps", height = 4, width = 4, 
   horizontal = FALSE, onefile = FALSE,
   paper = "special")

plot(1:5)

dev.off()


You can then import the .eps file into Word or most other such
applications that can import encapsulated postscript files.

The recent versions of Word will also automatically generate a bitmapped
preview of the plot upon import.  BTW, OO.org 2.0, which is in late beta
testing now, also generates EPS preview images upon import.

The key to doing this successfully is using the arguments to
postscript() as defined above. I have never had a problem with this.

More information is available from MS here:

http://support.microsoft.com/?kbid=290362

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] gnomeGUI installation

2005-10-03 Thread Marc Schwartz (via MN)
On Mon, 2005-10-03 at 18:01 +0100, Prof Brian Ripley wrote:
> On Mon, 3 Oct 2005, Daniel Pick wrote:
> 
> >   I have successfully downloaded the sources and built R as a shared
> > library on a Red Hat Enterprise Level 3 box.  I am now trying to build the
> > GNOME GUI, but configure is barfing on glade.  According to the system
> > logs, the RPM for libglade2-devel-2.0.1-3.x86_64 is installed, but in
> > /usr/bin, where gnomeGUI configure is looking, what's there is
> > libglade-convert.  How do I fix this?
> 
> By reading the manual.  In the R-admin manual it says
> 
> `Please check you have all the requirements.  You need at least the
> following packages or later installed:
> 
> audiofile-0.2.1
> esound-0.2.23
> glib-1.2.10
> gtk+-1.2.10
> imlib-1.9.10
> ORBit-0.5.12
> gnome-libs-1.4.1.2
> libxml-1.8.16
> libglade-0.17
> 
> ...
> 
> Remember that some package management systems (such as @acronym{RPM} and
> deb) make a distinction between the user version of a package and the
> developer version.  The latter usually has the same name but with the
> extension @samp{-devel} or @samp{-dev}.  If you use a pre-packaged
> version of @acronym{GNOME} then you must have the developer versions of
> the above packages in order to compile the R-GNOME console.'
> 
> My FC3 box has
> 
> gannet% rpm -qa | grep ^libglade
> libglade2-devel-2.4.0-5
> libglade-devel-0.17-15
> libglade2-2.4.0-5
> libglade-0.17-15
> 
> and note that libglade2[-devel] is NOT a later version of libglade (in the 
> same way that gtk2 is different from gtk).
> 
> AFAICS it is likely that /usr/bin/libglade-config is provided by 
> libglade-devel-0.17-15.


That is correct, subject of course to RPM version differences between
FC3/4 and RHEL3/4.  Using RPM, this can be confirmed with:

$ rpm -q --whatprovides /usr/bin/libglade-config
libglade-devel-0.17-16

which is the output on FC4.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Binary Logit Regression with R

2005-09-29 Thread Marc Schwartz (via MN)
On Thu, 2005-09-29 at 18:08 -0400, Johann Park wrote:
> Hi to all, 
> 
> I am a PH.D Student doing statistical analysis.
> I am totally new to R. I previously use Stata and am changing into R. I 
> ususally do with logit regression with binary dependent variable (war 
> occurence:1 or 0). 
> 
> I just want to know command to do that. More sepcifically, 
> 
> Let say, my Y is war occurence (occur=1, otherwise 0). And my independent 
> variables (Xs) are trade, democracy, military poweretc. 
> 
> In Stata, I do like what follows: 
> 
> logit war trade democracy militarypower... 
> 
> Then I will get results. 
> 
> What are the equivalent command in R? 
> 
> Many thanks, 
> 
> JP

See ?glm in the base stats package or ?lrm in Frank Harrell's Design
package on CRAN.

BTW, doing:

  help.search("logit")

or 

  RSiteSearch("logit")

would provide you with the ability to do keyword searches of your
current R installation or the online e-mail list and documentation
archives, respectively.

You should also review Chapter 11 - Statistical Models in R in "An
Introduction to R", which is installed with R or available online under
the Documentation/Manuals link on the main R web site.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] How to add frame (frame.plot=T) to Plot.Design?

2005-09-29 Thread Marc Schwartz (via MN)
On Thu, 2005-09-29 at 18:36 +0200, Jan Verbesselt wrote:
> Hi R-help,
> 
>  
> 
> When using the package Design and the plot.Design function all the graphs of
> an lrm.fit (lrm()) are plotted without a frame (only with axes). How can the
> frame be added to these plots?
> 
>  
> 
> In the plot.default function the frame can be added/or removed via the
> frame.plot=T/F, respectively.
> 
> e.g., plot(1,axes=T,frame.plot=T). How can this be done with
> plot.Design(lrm.fit)
> 
>  
> 
> Thanks,
> 
> Jan


See ?box

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Display values in piechart/barplot

2005-09-29 Thread Marc Schwartz (via MN)
On Thu, 2005-09-29 at 14:34 +0200, Volker Rehbock wrote:
> Is it possible to automatically display the underlying values of a 
> piechart/barplot in the graphic? If so, which package/function/argument do I 
> need for it?
> Thanks,
> Volker


Using pie charts are not a particularly good way of displaying data,
even though the pie() function is available in R. See ?pie for more
information on this.

For barplots, the following provide two approaches:


# Place the bar values above the bars
# Note that I set 'ylim' to make room for 
# the text labels above the bars

vals <- 1:5
names(vals) <- LETTERS[1:5]
mp <- barplot(vals, ylim = c(0, 6))
text(mp, vals, labels = vals, pos = 3)



# Place the bar values below the x axis

vals <- 1:5
names(vals) <- LETTERS[1:5]
mp <- barplot(vals)
mtext(side = 1, at = mp, text = vals, line = 3)


Note that barplot() returns the bar midpoints, so you can use these
values for annotation placement.

See ?barplot, ?text and ?mtext for more information.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] boxplot and xlim confusion?

2005-09-29 Thread Marc Schwartz (via MN)
On Thu, 2005-09-29 at 15:28 +0200, Karin Lagesen wrote:
> "Petr Pikal" <[EMAIL PROTECTED]> writes:
> 
> > Hi Karin
> >
> > I did not have seen any answer for your question yet so here is a 
> > try.
> >
> > I gues you want the horizotal layout or your boxplot.
> >
> > boxplot(split(rnorm(30), rep(1:3, each=10)), horizontal =T, 
> > names=letters[1:3])
> >
> > boxplot(split(rnorm(30), rep(1:3, each=10)), horizontal =T, 
> > names=c(NA,"b",NA))
> 
> These look like the plots I want, yes. However, is there a way of
> locking the x-axis? When I do these several times in a row, the range
> of the x axis moves with the data being plotted. If I set xlim in the
> boxplot command, it makes no difference what so ever.

That's because when you use 'horizontal = TRUE' in boxplot(), the x and
y axes are rotated. Thus you need to set 'ylim' and not 'xlim', where
ylim is the range of values in your data irrespective of the axis
orientation.

This is done in bxp(), which is the actual function doing the plotting
for boxplot(). In bxp(), there is the follow code snippet:

if (horizontal) 
plot.window(ylim = c(0.5, n + 0.5), xlim = ylim, 
log = log)
else plot.window(xlim = c(0.5, n + 0.5), ylim = ylim, 
 log = log)

Note in the first case, that the xlim value for plot.window() is set to
the ylim value passed from boxplot() or from bxp(), if called directly.

So, the above examples by Petr should be something like:

boxplot(split(rnorm(30), rep(1:3, each=10)), horizontal = TRUE, 
names=letters[1:3], ylim = c(-4, 4))

boxplot(split(rnorm(30), rep(1:3, each=10)), horizontal =TRUE, 
names=c(NA,"b",NA), ylim = c(-4, 4))


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] correct syntax

2005-09-28 Thread Marc Schwartz (via MN)
On Thu, 2005-09-29 at 06:57 +1000, Stephen Choularton wrote:
> Hi 
>  
> I am trying to produce a little table like this:
>  
> 0 1
> 0  601   408
> 1  290   2655
>  
> but I cannot get the syntax right.  
>  
> Can anyone help.
>  
> Stephen


Do you have the raw data or only the tabulated counts?

If you have the raw data, see ?table for ways to create n-dimensional
tables.

If you only have the tabulated counts, then you can use matrix():

> mat <- matrix(c(601, 290, 408, 2655), ncol = 2)
> dimnames(mat) <- list(c(0, 1), c(0, 1))

> mat
01
0 601  408
1 290 2655


See ?matrix for more information.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Using unsplit - unsplit does not seem to reverse the effect of split

2005-09-27 Thread Marc Schwartz (via MN)
On Tue, 2005-09-27 at 19:12 +0200, Søren Højsgaard wrote:
> In data OME in MASS I would like to extract the first 5 observations
> per subject (=ID). So I do
>  
> library(MASS)
> OMEsub <- split(OME, OME$ID)
> OMEsub <- lapply(OMEsub,function(x)x[1:5,])
> unsplit(OMEsub, OME$ID)
> 
> - which results in 
>  
> [[1]]
> [1] 1 1 1 1 1
> [[2]]
> [1] 30 30 30 30 30
> [[3]]
> [1] low low low low low
> Levels: N/A high low
> [[4]]
> [1] 35 35 40 40 45
> [[5]]
> [1] coherent   incoherent coherent   incoherent coherent  
> Levels: coherent incoherent
> [[6]]
> [1] 1 4 0 1 2
> 
> 
>  
> [[1094]]
> [1] 4 5 5 5 2
> [[1095]]
> [1] 100 100 100 100 100
> [[1096]]
> [1] 18 18 18 18 18
> [[1097]]
> [1] N/A N/A N/A N/A N/A
> Levels: N/A high low
> There were 50 or more warnings (use warnings() to see the first 50)
> 
> warnings()
> Warning messages:
> 1: number of items to replace is not a multiple of replacement length
> 2: number of items to replace is not a multiple of replacement length
> 3: number of items to replace is not a multiple of replacement length
> 
>  
> According to documentation unsplit is the reverse of split, but I must
> be missing a point somewhere... Can anyone help? Thanks in advance.
> Søren


If you read the documentation for split/unsplit, you will also note that
in the Details section it says:

 'unsplit' works only with lists of vectors

as opposed to lists of data frames, which is the result of your split()
operation.

Also note that in the Value section, it indicates:

'unsplit' returns a vector for which 'split(x, f)' equals 'value'

as opposed to unsplit returning a data frame.


Thus, use:

  OME1 <- do.call("rbind", OMEsub)

where OME1 will be the result of rbind()'ing the data frames in the
OMEsub list.

See ?do.call for more information.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Help: x11 position in the Unix environment

2005-09-26 Thread Marc Schwartz (via MN)
On Mon, 2005-09-26 at 17:45 +0200, Shengzhe Wu wrote:
> Hello,
> 
> In the Unix environment, I open a window by x11(). May I specify the
> position of this window by specifying the position of the top left of
> the window as in Windows environment? Or some other parameters can be
> used to do that?
> 
> Thank you,
> Shengzhe


I don't believe so. In general, under Unix/Linux, the Window Manager
determines window positioning upon startup unless the application
overrides this behavior. Some applications let you specify application
window positioning via command line 'geometry' arguments or via the use
of an .Xresources file. 

Some WM's provide more or less functionality for this behavior relative
to user customization. For example, Sawfish provides quite a bit,
whereas Metacity hides much of it.

You may want to check the documentation for the WM that you are using.

There is also an application called Devil's Pie:

  http://www.burtonini.com/blog/computers/devilspie

which provides additional Sawfish-like customization for window
positioning, etc. However, this is global for a given window, not on a
per instance basis.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Fligner-Policello robust rank test

2005-09-22 Thread Marc Schwartz (via MN)
On Thu, 2005-09-22 at 10:44 -0400, Elisabetta Manduchi wrote:
> Can anybody tell me if there is an R implementation of the 
> Fligner-Policello robust rank test?
> Thanks,
> Elisabetta


I don't know about the Fligner-Policello test, but the Fligner-Killeen
test has been implemented in fligner.test(). From ?fligner.test:


The Fligner-Killeen (median) test has been determined in a simulation
study as one of the many tests for homogeneity of variances which is
most robust against departures from normality, see Conover, Johnson &
Johnson (1981). It is a k-sample simple linear rank which uses the ranks
of the absolute values of the centered samples and weights a(i) =
qnorm((1 + i/(n+1))/2). The version implemented here uses median
centering in each of the samples (F-K:med X^2 in the reference).


FWIW, in a Google search, I located the following SAS macro by Paul von
Hippel, which includes a comment that does not lead to optimism
regarding software implementations:

http://www.sociology.ohio-state.edu/people/ptv/macros/fligner_policello.htm

The above should not be overly difficult to convert to R, presuming you
know SAS macros...


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] how to extract the column name or value from the numerical value of the matrix

2005-09-19 Thread Marc Schwartz (via MN)
On Mon, 2005-09-19 at 12:30 -0700, shanmuha boopathy wrote:
> Dear sir,
>  i have a matrix like
>  x<-c(1:20)
> A<-matrix(x,4,5)
> > A
>  [,1] [,2] [,3] [,4] [,5]
> [1,]159   13   17
> [2,]26   10   14   18
> [3,]37   11   15   19
> [4,]48   12   16   20
> 
> I want to extract the column value for the matrix
> value 11...
> 
> or the row value for 14..
> how it is possible?
> 
> thanks for your help
> 
> with regards,
> boopathy.


Why did you copy r-devel? r-help is the appropriate forum.


See ?which and take note of the 'arr.ind' argument:

> which(A == 11, arr.ind = TRUE)
 row col
[1,]   3   3


> which(A == 14, arr.ind = TRUE)
 row col
[1,]   2   4


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] How do I get the row indices?

2005-09-16 Thread Marc Schwartz (via MN)
On Fri, 2005-09-16 at 10:34 -0700, Martin Lam wrote:
> Hi,
> 
> I was wondering if it's possible to get the row
> numbers from a filtering. Here's an example:
> 
> # give me the rows with sepal.length == 6.2
> iris[(iris[,1]==6.2),]
> 
> # output
> Sepal.Length Sepal.Width Petal.Length Petal.Width 
>   Species
> 69   6.2 2.2  4.5 1.5
> versicolor
> 98   6.2 2.9  4.3 1.3
> versicolor
> 127  6.2 2.8  4.8 1.8 
> virginica
> 149  6.2 3.4  5.4 2.3 
> virginica
> 
> What I really want is that it return the row numbers:
> 69, 98, 127, 149.
> 
> Thanks in advance,
> 
> Martin

Martin,

First: Be very, very careful when performing exact equalities on
floating point numbers. They won't always result in the answer you
expect. For more information see R FAQ 7.31: Why doesn't R think these
numbers are equal?

Second:

See ?all.equal, ?sapply and ?which. Here is a possible vectorized
solution:

> which(sapply(iris[, 1], function(x) isTRUE(all.equal(x, 6.2
[1]  69  98 127 149


The above applies isTRUE(all.equal(x, 6.2)) for each element 'x' in 
iris[, 1], returning the indices of the TRUE results for the near
equality comparison, based upon the tolerance argument in all.equal().

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Graphics 'snapshots' in Linux?

2005-09-15 Thread Marc Schwartz (via MN)
On Thu, 2005-09-15 at 10:43 -0400, Tyler Smith wrote:
> Hi,
> 
> I'm working on a MEPIS (Debian-based Linux) computer, using the 
> emacs/ESS package to do my R work. I've got some plots that I label 
> interactively using the locate function. With the Windows GUI there is 
> an option to take a snapshot of the graphics output, saving it as an 
> image file. Is there a way to do this with emacs/ESS?
> 
> Thanks,
> 
> Tyler

Tyler,

Take a look at ?dev.copy2eps and on the same page dev.copy(), which
enable you to copy the current X11 plot supported output devices.

You could do something like the following for an EPS file:

plot(1:5)
text(locator(1), "Place Text Here")
dev.copy2eps(file = "MyPlot.eps")


or the following for a PNG file:

par(bg = "white")
plot(1:5)
text(locator(1), "Place Text Here")
dev.copy(device = png, file = "MyPlot.png")
dev.off()


Note that in the first example, dev.off() is not required, as the EPS
output device is closed after the call.

Also, note in the second example, you will need to set the background to
white (unless already specified for whatever color you may be using), as
the default output file will have a transparent background, even though
the png() function shows the default as white. If my memory is correct
this is because the X11 device itself has a transparent background by
default and this is what is copied.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Forcing hist()

2005-09-14 Thread Marc Schwartz (via MN)
On Wed, 2005-09-14 at 20:17 +0200, Par Leijonhufvud wrote:
> I'm trying to create histogram (using hist()) that fullfill the following
> criteria:
> 
> * data is on a ordinal scale (1, 2, 3, 4, 5)
> * I want bars centered over the number on the x-axis
> * I want 5 bars of equal width
> 
> I have tried various versions of the hist() command, with no luck. what
> am I missing?
> 
> /Par


More than likely, you want to use barplot() and not hist():

Try:

  barplot(1:5, names.arg = 1:5)

See ?barplot for more information.

HTH,

Marc Schwartz

P.S. To R Core: It probably makes sense to add barplot to the See Also
section of ?hist, since hist is listed in the See Also for ?barplot.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] png scaling problem

2005-09-01 Thread Marc Schwartz (via MN)
On Thu, 2005-09-01 at 21:51 +0200, Knut Krueger wrote:
> scaling<-4
> xywidth<-480
> resolution<-150
> png(filename = "c:/r/anschluss/plots/4.png", width = xywidth*scaling, 
> height = xywidth*scaling,pointsize = 12, bg = "white", res = 
> resolution*scaling)
> ..
> 
> barplot(xrow,col = barcolors,cex.axis=scaling, ylab="mean time till 
> attachment in sec",cex.lab=1.2*scaling) 
> 
> I tried to scale the barplot but there is one strange result:
> scaling=1
> http://biostatistic.de/temp/1.png--- the ylab is ok
> 
> scaling=2
> http://biostatistic.de/temp/2.png--- the ylab is not ok
> 
> scaling=4
> http://biostatistic.de/temp/4.png--- the ylab is terrible
> 
> is there any better solution to scale the resolution and the width/height?
> 
> 
> with regards
> Knut

Probably a better first question is, why are you using a bitmapped
graphics format if you need the image to be re-scaled? In general,
bitmapped graphics do not resize well, though if you have a specific
need and know a target image size, you can adjust various parameters to
look decent. Are you going to view these images in a web browser, where
you are concerned with display size and resolution?

>From your e-mail headers it appears you are on Windows. If you need a
re-sizable graphic, use a vector based format such as wmf/emf,
especially if you need the graphics embedded in a Windows application
such as Word or Powerpoint. This is the default format under Windows
when copying and pasting a graphic between applications. You can then,
fairly freely, resize the image in the target application as you may
require.

If you are going to print the graphic directly or include it in a
document for printing (as opposed to just viewing it), then use PDF or
Postscript. The latter in EPS format, can be imported into many Windows
applications like Word, including the generation of a preview image.
However, they don't look good for direct use in presentations (unless
you print to a PS file and then convert to PDF for viewing).

See ?Devices for more information.

With a better idea of how you plan to use the graphic(s), we can offer
more specific recommendations on how to proceed.

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] axis of plot

2005-09-01 Thread Marc Schwartz (via MN)
On Thu, 2005-09-01 at 16:58 +0100, [EMAIL PROTECTED] wrote:
> hy,
> 
> I need to have the 0 on the bottom left corner of the graph being
> joined , not with this little hole between x axis and y axis...
> 
> I've saw this question with the answer one time but i'm unable to find
> it again..
> 
> 
> thks.
> guillaume

See 'xaxs' and 'yaxs' in ?par:

plot(1:5, xaxs = "i", yaxs = "i", xlim = c(0, 5), ylim = c(0, 5))

By default, both are set to "r", which adds +/- 4% to the range of each
axis.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] crosstab for n-way contingency tables

2005-08-30 Thread Marc Schwartz (via MN)
On Tue, 2005-08-30 at 12:28 -0700, Isotta Felli wrote:
> Dear list.
>  
> New to R, I'm looking for a way of using crosstab to output
> low-dimensional (higher than 2) contingency tables (frequencies,
> per-cents by rows, % by columns, mean, quantiles) I'm looking for
> something of the following sort
>  
> dataframe: singers, 
> categorical variates: voice category (soprano,mezzo-soprano, ...) ,
> voice type( drammatic, spinto, lirico-spinto, lirico, leggero), school
> (german, italian, french, russian, anglo-saxon, other);repertory
> (opera, Lieder, oratorio, operetta)
> continuous variate: age
>  
> I would like to tabulate the frequencies (relative percentages)  say
> in the following way
> columns: school  repertory
> rows :  voice category  voice type
>  
> or to output in the cells of the above table, the statistics
> (mean/median/quantiles) for age
>  
> 
> I've seen that the function bwplot(age~school | repertory, data=
> singers, layout=c(4,2)) would do graphically something similar to what
> I want, but I desire the output also in tabular form
>  
> Thanks
>  
> Isotta


I don't know that you will find a single function that will do all of
what you desire, but you can look at the ctab() function, which is in
the 'catspec' package on CRAN by John Hendrickx. This will do multi-way
tables with summary row/col statistics.

There is also the CrossTable() function in the 'gmodels' package on
CRAN, though this will only do one way and two way tables, with summary
row/col statistics.

Neither of the above provide typical summary output for continuous data.
They are more for categorical variables.

For simple count multi-way output, you can also look at the ftable()
function which is in the base package. Also look at the summary()
function in the base package which will provide range, mean, quantile
data for continuous variables. It may just be a matter of formatting the
output in a fashion that you desire on a post analysis basis.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] ylim for graphic

2005-08-29 Thread Marc Schwartz (via MN)
On Mon, 2005-08-29 at 12:58 -0400, Doran, Harold wrote:
> Dear list:
> 
> I have some data for which I am generating a series of barplots for
> percentages. One issue that I am dealing with is that I am trying to get
> the legend to print in a fixed location for each chart generated by the
> data. Because these charts are being created in a loop, with different
> data, my code searches the data to identify the maximum value in the
> data and then print the data values 5 points above the tallest bar. 
> 
> Now, in situations where the largest value is 100, I needed to create
> the y-axis high enough to accommodate the legend w/o crowding up the
> data or the bars. So, I have ylim =c(0,200) and then place the legend at
> c(2,150). Visually, this places things exactly where I want them. But,
> it seems silly to have a y-axis labeled as high as 200%. I'm certain
> there is a smarter technique. Is it possible to place the legend at a
> location higher than 100 to avoid the crowding of the bars and the data
> and then also have the labels for the y-axis not print after the value
> 100%? 
> 
> In looking at ?legend I didn't see an option that would address this.
> Below is some code that you can use to create a similar chart.
> 
> Thanks,
> Harold
> 
> 
> math.bar <- c(53,31,55,28,55,100)
> apmxpmeet <- c(47, 50, 49, 50, 49, 46)
> par(ps=10)
> math.bar <- rbind(math.bar, apmxpmeet)
> math.barplot <- barplot(math.bar, beside=T, col=c('blue','orange'),
> names=c('Grade \n 3','Grade \n 4','Grade \n 5','Grade \n 6','Grade \n
> 7','Grade \n 8'), 
> ylim=c(0,200), ylab="Percentage", xlab="Grade Level")
> tot <- round(math.bar,digits=0)
> graph.max <- max(math.bar, apmxpmeet, na.rm=T)
> text(math.barplot, graph.max+5, tot, xpd = TRUE, col = c("blue",
> "orange") )
> legend(2,150,legend=(c("Label A", "Average")), fill=c("blue","orange"))

Harold,

A few thoughts:

1. Instead of fixing the y axis max value at 200, simply set ylim to
c(0, max(math.bar * 1.2)) or a similar constant. In this case, you get
an extra 20% above the max(y) value for the legend placement.


2. In the legend call, use:

  legend("topleft", legend=(c("Label A", "Average")),
  fill = c("blue","orange"))

This will place the legend at the topleft of the plot region, rather
than you having to calculate the x and y coords. If you want it moved in
from the upper left hand corner, you can use the 'inset' argument as
well, which moves the legend by a proportion of the plot region limits
(0 - 1):

legend("topleft", legend=(c("Label A", "Average")),
  fill = c("blue","orange"), inset = .1)


3. You can use barplot(..., yaxt = "n") to have the y axis not drawn and
then use axis() to place the labels at locations of your choosing, which
do not need to run the full length of the axis range:

 barplot(1:5, yaxt = "n", ylim = c(0, 10))
 axis(2, at = 0:5)


4. You can place the legend outside the plot region, enabling you to
keep the y axis range to <=100. This would need some tweaking, but the
idea is the same:

 # Increase the size of the top margin
 par(mar = c(5, 4, 8, 2) + 0.1)

 # Draw a barplot
 barplot(1:5)

 # Disable clipping outside the plot region
 par(xpd = TRUE)

 # Now draw the legend, but move it up by 30% from the top left
 legend("topleft", legend = LETTERS[1:5], inset = c(0, -.3))


You could also place the legend to the right or left of the plot region
if you prefer, adjusting the above accordingly.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Unpaste Problem

2005-08-26 Thread Marc Schwartz (via MN)
On Fri, 2005-08-26 at 22:39 +0530, A Mani wrote:
> Hello,
> Easy ways to "unpaste"?
>  xp <- paste(x2, x3) # x2, x3 are two non-numeric columns.
> .
> .
> xfg <- data.frame(xp,sc1, sc2, sc3) # sc1,sc2, sc3 are numeric cols.
> 
> I want xp to be split up to form a new dataframe of the form (x3, sc1,
> sc2, sc3).
> IMPORTANT info : elements of xp have the form abcefg, with abc
> in x2 and efg in x3.
> 
> Thanks in advance, 


I think I understand what you are trying to do. Hopefully the below may
be helpful:

# Create the data frame with 3 rows
x2 <- letters[1:3]
x3 <- LETTERS[1:3]
xp <- paste(x2, x3)

sc1 <- rnorm(3)
sc2 <- rnorm(3)
sc3 <- rnorm(3)

xfg <- data.frame(xp, sc1, sc2, sc3)

> xfg
   xpsc1sc2sc3
1 a A  1.3479123 -1.0642578  0.2479218
2 b B -0.1586587  1.1237456 -1.3952176
3 c C  2.7807484 -0.9778066 -1.9322279


# Use strsplit() here to break apart 'xp' using " " as the split
# Use "[" in sapply() to get the second (2) element from each
# 'xp' list pair. Note that I use as.character() here, since xfg$xp is 
# a factor.
# See the output of: strsplit(as.character(xfg$xp), " ")
# for some insight into this approach

xp.split <- sapply(strsplit(as.character(xfg$xp), " "), "[", 2)


# show post split values
> xp.split
[1] "A" "B" "C"


# Now cbind it all together into a new data frame
# don't include column 1 from xfg (xp)
xfg.new <- cbind(xp.split, xfg[, -1])


> xfg.new
  xp.splitsc1sc2sc3
1A  1.3479123 -1.0642578  0.2479218
2B -0.1586587  1.1237456 -1.3952176
3C  2.7807484 -0.9778066 -1.9322279


See ?strsplit for more information.
 
HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Matrix oriented computing

2005-08-26 Thread Marc Schwartz (via MN)
Prof. Ripley,

Excellent point. Neither sapply() nor outer() are the "elephant in the
room" in this situation.



On Fri, 2005-08-26 at 16:55 +0100, Prof Brian Ripley wrote:
> Try profiling.  Doing this many times to get an overview, e.g. for sapply 
> with df=1:1000:
> 
> %   self%   total
>   self secondstotalsecondsname
>   98.26  6.78 98.26  6.78 "FUN"
>0.58  0.04  0.58  0.04 "unlist"
>0.29  0.02  0.87  0.06 "as.vector"
>0.29  0.02  0.58  0.04 "names<-"
>0.29  0.02  0.29  0.02 "names<-.default"
>0.29  0.02  0.29  0.02     "names"
> 
> so almost all the time is in qchisq.
> 
> 
> On Fri, 26 Aug 2005, Marc Schwartz (via MN) wrote:
> 
> > On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote:
> >> Marc Schwartz <[EMAIL PROTECTED]> writes:
> >>
> >>> x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9,
> >>>0.95, 0.975, 0.99, 0.995)
> >>>
> >>> df <- c(1:100)
> >>>
> >>> mat <- sapply(x, qchisq, df)
> >>>
> >>>> dim(mat)
> >>> [1] 100  11
> >>>
> >>>> str(mat)
> >>>  num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ...
> >>
> >> outer() is perhaps a more natural first try... It does give the
> >> transpose of the sapply approach, though.
> >>
> >> round(t(outer(x,df,qchisq)),2)
> >>
> >> should be close. You should likely add dimnames.
> >
> >
> >
> > What I find interesting, is that I would have intuitively expected
> > outer() to be faster than sapply().  However:
> >
> >
> >>  system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
> > [1] 0.01 0.00 0.01 0.00 0.00
> >
> >>  system.time(mat1 <- round(t(outer(x, df, qchisq)), 2),
> >   gcFirst = TRUE)
> > [1] 0.01 0.00 0.01 0.00 0.00
> >
> > # No round() or t() to test for overhead
> >>  system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
> > [1] 0.01 0.00 0.02 0.00 0.00
> >
> >
> > # Bear in mind the round() on mat1 above
> >> all.equal(mat, mat1)
> > [1] "Mean relative  difference: 4.905485e-05"
> >
> >> all.equal(mat, t(mat2))
> > [1] TRUE
> >
> >
> > Even when increasing the size of 'df' to 1:1000:
> >
> >
> >>  system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
> > [1] 0.16 0.01 0.16 0.00 0.00
> >
> >>  system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst =
> > TRUE)
> > [1] 0.16 0.00 0.18 0.00 0.00
> >
> >>  # No round() or t() to test for overhead
> >>  system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
> > [1] 0.16 0.01 0.17 0.00 0.00
> >
> >
> >
> > It also seems that, at least in this case, t() and round() do not add
> > much overhead.
> 
> Definitely not for such small matrices.


True and both are C functions, which of course helps as well.

Best regards,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Matrix oriented computing

2005-08-26 Thread Marc Schwartz (via MN)
On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote:
> Marc Schwartz <[EMAIL PROTECTED]> writes:
> 
> > x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 
> >0.95, 0.975, 0.99, 0.995)
> > 
> > df <- c(1:100)
> > 
> > mat <- sapply(x, qchisq, df)
> >
> > > dim(mat)
> > [1] 100  11
> >  
> > > str(mat)
> >  num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ...
> 
> outer() is perhaps a more natural first try... It does give the
> transpose of the sapply approach, though. 
> 
> round(t(outer(x,df,qchisq)),2)
> 
> should be close. You should likely add dimnames.



What I find interesting, is that I would have intuitively expected
outer() to be faster than sapply().  However:


>  system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.01 0.00 0.01 0.00 0.00
 
>  system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), 
   gcFirst = TRUE)
[1] 0.01 0.00 0.01 0.00 0.00
 
# No round() or t() to test for overhead
>  system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
[1] 0.01 0.00 0.02 0.00 0.00


# Bear in mind the round() on mat1 above
> all.equal(mat, mat1)
[1] "Mean relative  difference: 4.905485e-05"

> all.equal(mat, t(mat2))
[1] TRUE


Even when increasing the size of 'df' to 1:1000:


>  system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.16 0.01 0.16 0.00 0.00

>  system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst =
TRUE)
[1] 0.16 0.00 0.18 0.00 0.00
 
>  # No round() or t() to test for overhead
>  system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
[1] 0.16 0.01 0.17 0.00 0.00



It also seems that, at least in this case, t() and round() do not add
much overhead.

Best regards,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] help on retrieving output from by( ) for regression

2005-08-25 Thread Marc Schwartz (via MN)
Sorry to reply to my own post, but in reviewing the NAMESPACE file for
lme4, it looks like Doug is perhaps planning to add additional model
object accessor methods for the lmList class, including resid() and
summary(), which are commented out now. coef() is available presently. 

So when in place, it would appear that the use of lmList would be
advantageous over the use of by().

Marc

On Thu, 2005-08-25 at 11:52 -0500, Marc Schwartz (via MN) wrote:
> That works also (using the example in ?lmList)
> 
> library(lme4)
> 
> ?lmList
> 
> fm1 <- lmList(breaks ~ wool | tension, warpbreaks)
> 
> 
> However, one still would need to use either sapply() or lapply() as
> below to get the details that Krishna is looking for. 
> 
> 'fm1' above is a list of models (S4 class 'lmList') not overly different
> from 'tmp' below, which is a list of models (S3 class 'by').
> 
> If you review str(fm1) and str(tmp), you would note that they are
> virtually identical, save the use of slots, etc.
> 
> HTH,
> 
> Marc Schwartz
> 
> On Thu, 2005-08-25 at 12:17 -0400, Randy Johnson wrote:
> > What about using lmList from the lme4 package?
> > 
> > Randy
> > 
> > 
> > On 8/25/05 9:44 AM, "Marc Schwartz" <[EMAIL PROTECTED]> wrote:
> > 
> > > Also, looking at the last example in ?by would be helpful:
> > > 
> > > attach(warpbreaks)
> > > tmp <- by(warpbreaks, tension, function(x) lm(breaks ~ wool, data = x))
> > > 
> > > # To get coefficients:
> > > sapply(tmp, coef)
> > > 
> > > # To get residuals:
> > > sapply(tmp, resid)
> > > 
> > > # To get the model matrix:
> > > sapply(tmp, model.matrix)
> > > 
> > > 
> > > 
> > > To get the summary() output, I suspect that using:
> > > 
> > >   lapply(tmp, summary)
> > > 
> > > would yield more familiar output as compared to using:
> > > 
> > >   sapply(tmp, summary)
> > > 
> > > The output from the latter might require a bit more "navigation" through
> > > the resultant matrix, depending upon how the output is to be ultimately
> > > used.
> > > 
> > > HTH,
> > > 
> > > Marc Schwartz
> > > 
> > > 
> > > 
> > > On Thu, 2005-08-25 at 14:57 +0200, TEMPL Matthias wrote:
> > >> Look more carefully at
> > >> ?lm 
> > >> at the See Also section ...
> > >> 
> > >> X <- rnorm(30)
> > >> Y <- rnorm(30)
> > >> lm(Y~X)
> > >> summary(lm(Y~X))
> > >> 
> > >> Best,
> > >> Matthias
> > >> 
> > >> 
> > >>> Hi all 
> > >>> 
> > >>> I used a function
> > >>>> qtrregr <- by(AB, AB$qtr, function(AB) lm(AB$X~AB$Y))
> > >>> 
> > >>> objective is to run a regression on quartery subsets in the
> > >>> data set AB, having variables X and Y, grouped by variable qtr.
> > >>> 
> > >>> Now i retrieved the output using qtrregr, however it only
> > >>> showed the coefficients (intercept and B) with out
> > >>> significant levels and residuals for each qtr. Can some on
> > >>> help me on how can retrieve the detailed regression output.
> > >>> 
> > >>> rgds
> > >>> 
> > >>> snvk
> > >>> 
> > > 
> > > __
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide! 
> > > http://www.R-project.org/posting-guide.html
> > > 
> > 
> > ~~
> > Randy Johnson
> > Laboratory of Genomic Diversity
> > NCI-Frederick
> > Bldg 560, Rm 11-85
> > Frederick, MD 21702
> > (301)846-1304
> > ~~
> > 
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] help on retrieving output from by( ) for regression

2005-08-25 Thread Marc Schwartz (via MN)
That works also (using the example in ?lmList)

library(lme4)

?lmList

fm1 <- lmList(breaks ~ wool | tension, warpbreaks)


However, one still would need to use either sapply() or lapply() as
below to get the details that Krishna is looking for. 

'fm1' above is a list of models (S4 class 'lmList') not overly different
from 'tmp' below, which is a list of models (S3 class 'by').

If you review str(fm1) and str(tmp), you would note that they are
virtually identical, save the use of slots, etc.

HTH,

Marc Schwartz

On Thu, 2005-08-25 at 12:17 -0400, Randy Johnson wrote:
> What about using lmList from the lme4 package?
> 
> Randy
> 
> 
> On 8/25/05 9:44 AM, "Marc Schwartz" <[EMAIL PROTECTED]> wrote:
> 
> > Also, looking at the last example in ?by would be helpful:
> > 
> > attach(warpbreaks)
> > tmp <- by(warpbreaks, tension, function(x) lm(breaks ~ wool, data = x))
> > 
> > # To get coefficients:
> > sapply(tmp, coef)
> > 
> > # To get residuals:
> > sapply(tmp, resid)
> > 
> > # To get the model matrix:
> > sapply(tmp, model.matrix)
> > 
> > 
> > 
> > To get the summary() output, I suspect that using:
> > 
> >   lapply(tmp, summary)
> > 
> > would yield more familiar output as compared to using:
> > 
> >   sapply(tmp, summary)
> > 
> > The output from the latter might require a bit more "navigation" through
> > the resultant matrix, depending upon how the output is to be ultimately
> > used.
> > 
> > HTH,
> > 
> > Marc Schwartz
> > 
> > 
> > 
> > On Thu, 2005-08-25 at 14:57 +0200, TEMPL Matthias wrote:
> >> Look more carefully at
> >> ?lm 
> >> at the See Also section ...
> >> 
> >> X <- rnorm(30)
> >> Y <- rnorm(30)
> >> lm(Y~X)
> >> summary(lm(Y~X))
> >> 
> >> Best,
> >> Matthias
> >> 
> >> 
> >>> Hi all 
> >>> 
> >>> I used a function
>  qtrregr <- by(AB, AB$qtr, function(AB) lm(AB$X~AB$Y))
> >>> 
> >>> objective is to run a regression on quartery subsets in the
> >>> data set AB, having variables X and Y, grouped by variable qtr.
> >>> 
> >>> Now i retrieved the output using qtrregr, however it only
> >>> showed the coefficients (intercept and B) with out
> >>> significant levels and residuals for each qtr. Can some on
> >>> help me on how can retrieve the detailed regression output.
> >>> 
> >>> rgds
> >>> 
> >>> snvk
> >>> 
> > 
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html
> > 
> 
> ~~
> Randy Johnson
> Laboratory of Genomic Diversity
> NCI-Frederick
> Bldg 560, Rm 11-85
> Frederick, MD 21702
> (301)846-1304
> ~~
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Remove NAs from Barplot

2005-08-24 Thread Marc Schwartz (via MN)
On Wed, 2005-08-24 at 13:06 -0400, Doran, Harold wrote:
> Dear List:
> 
> I'm creating a series of barplots using Sweave that must assume a
> standard format. This is student achievement data and the x-axis must
> include all grades 3 to 8. In some cases, the data for a grade (or more
> than one grade) are missing in the vector math.bar, but are never
> missing for the vector apmxpmeet. The following sample code illustrates
> the issue.
> 
> Using the code below to create the plot works fine. However, the
> following command is designed to place the data onto the plot:
> 
> text(math.barplot, graph.max+5, format(tot), xpd = TRUE, col = c("blue",
> "orange") )
> 
> This does work, but, it also places 'NA's on the plot. Is there a way to
> place the data onto the plot such that the numbers appear, but for the
> NAs not to appear?  I've searched through ?text but was not able to find
> a solution. 
> 
> math.bar <- c(NA,NA,NA,NA,55,48)
> apmxpmeet <- c(47, 50, 49, 50, 49, 46)
> 
> par(ps=10)
> math.bar <- rbind(math.bar, apmxpmeet)
> math.barplot <- barplot(math.bar, beside=T, col=c('blue','orange'),
> names=c('Grade \n 3','Grade \n 4','Grade \n 5','Grade \n 6','Grade \n
> 7','Grade \n 8'), 
> ylim=c(0,120), ylab="Percentage", xlab="Grade Level")
> tot <- round(math.bar,digits=0)
> graph.max <- max(math.bar, apmxpmeet, na.rm=T)
> text(math.barplot, graph.max+5, format(tot), xpd = TRUE, col = c("blue",
> "orange") )
> tmp$sch_nameS05 <- as.character(tmp$sch_nameS05)
> legend(2,120,legend=(c(tmp$sch_nameS05, "Average")),
> fill=c("blue","orange"))
> 
> Thanks,
> Harold


Harold,

Get rid of the 'format(tot)' in your text() call:

text(math.barplot, graph.max+5, tot, xpd = TRUE, 
 col = c("blue", "orange"))


By using format(tot), you are coercing the values to text. Thus:

> format(tot)
  [,1] [,2] [,3] [,4] [,5] [,6]
math.bar  "NA" "NA" "NA" "NA" "55" "48"
apmxpmeet "47" "50" "49" "50" "49" "46"

versus:

> tot
  [,1] [,2] [,3] [,4] [,5] [,6]
math.barNA   NA   NA   NA   55   48
apmxpmeet   47   50   49   50   49   46


In the first case, text() will draw "NA", whereas in the second, the NA
is ignored.

Also, your code above is not entirely reproducible, since:

> tmp$sch_nameS05 <- as.character(tmp$sch_nameS05)
Error: Object "tmp" not found


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Seeking help with an apparently simple recoding problem

2005-08-23 Thread Marc Schwartz (via MN)
On Tue, 2005-08-23 at 10:12 -0500, Greg Blevins wrote:
> Hello,
> 
> I have struggled, for longer than I care to admit, with this seemingly
> simple problem, but I cannot find a solution other than the use of
> long drawn out ifelse statements.  I know there has to be a better
> way.  Here is stripped down version of the situation:
> 
> I start with:
> a <- c(1,0,1,0,0,0,0)
> b <- c(1,1,1,1,0,0,0)
> c <- c(1,1,0,1,0,0,0)
> 
> rbind(a,b,c)
>   [,1] [,2] [,3] [,4] [,5] [,6] [,7]
> a1010000
> b1111000
> c1101000
> 
> I refer to column 3 as the target column, which at the end of the day
> will be NA in all instances.
> 
> The logic involved:
> 
> 1) If columns 2, 4 thru 7 do NOT include at least one '1', then recode
> columns 2 thru 7 to NA and recode column 1 to code 2.
> 
> 2) If columns 2, 4 thru 7 contain at least one '1', then recode column
> 3 to NA.
> 
> Desired recoding of the above three rows:
>   [,1][,2][,3][,4][,5][,6][,7]
> a2NA  NA  NA  NA  NA  NA
> b11   NA  1   0   0   0
> c11   NA  1   0   0   0
> 
> Thanks you.


You left out one key detail in the explanation, which is that the
recoding appears to be done on a row by row basis, not overall.

The following gets the job done, though there may be a more efficient
approach:

> a <- c(1,0,1,0,0,0,0)
> b <- c(1,1,1,1,0,0,0)
> c <- c(1,1,0,1,0,0,0)
 
> d <- rbind(a, b, c)
 
> d
  [,1] [,2] [,3] [,4] [,5] [,6] [,7]
a1010000
b1111000
c1101000
 
 
> mod.row <- function(x)
 {
   if (all(x[c(2, 4:7)] == 0))
   {
 x[2:7] <- NA
 x[1] <- 2
   } else {
   x[3] <- NA
   }
 
   x
 }
 
> y <- t(apply(d, 1, mod.row))

> y
  [,1] [,2] [,3] [,4] [,5] [,6] [,7]
a2   NA   NA   NA   NA   NA   NA
b11   NA1000
c11   NA1000


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] plotting with same axes

2005-08-22 Thread Marc Schwartz (via MN)
On Mon, 2005-08-22 at 12:58 -0400, [EMAIL PROTECTED] wrote:
> I have used the 'par' command to 
> overlay one plot on another.  But how 
> do I overlay it with the x-values 
> plotted at the same points on the 
> x-axis?
> 
> Thank you,
> Steven

The specific answer depends to an extent on the graphics you are
plotting and whether there is a need to actually use par("new").

In some cases there is an 'add = TRUE' option to plotting functions that
allows for this. 

In others, you can create the initial plot and then use lines(),
points() or the like to simply add further plotting components to the
existing plot.

If all else fails and you need to use par("new"), then you will want to
explicitly define the x and y axes to the same ranges by using the same
'xlim' and 'ylim' arguments in _both_ plotting functions. 

A reproducible example of what it is you are doing would enable us to
give you more specific guidance, if the above is not helpful.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Copying rows from a matrix using a vector of indices

2005-08-17 Thread Marc Schwartz (via MN)
On Wed, 2005-08-17 at 12:02 -0700, Martin Lam wrote:
> Hi,
> 
> I am trying to use a vector of indices to select some
> rows from a matrix. But before I can do that I somehow
> need to convert 'combinations' into a list, since
> 'mode(combinations)' says it's 'numerical'. Any idea
> how I can do that?
> 
> library("combinat")
> 
> combinations <- t(combn(8,2))
> 
> indices <- c(sample(1:length(combinations),10))
> 
> # convert
> ???
> 
> subset <- combinations[indices]
> 
> Thanks in advance,
> 
> Martin


Your goal is not entirely clear.  Given your code above, you end up
with:

> library("combinat")
> 
> combinations <- t(combn(8,2))
> combinations
  [,1] [,2]
 [1,]12
 [2,]13
 [3,]14
 [4,]15
 [5,]16
 [6,]17
 [7,]18
 [8,]23
 [9,]24
[10,]25
[11,]26
[12,]27
[13,]28
[14,]34
[15,]35
[16,]36
[17,]37
[18,]38
[19,]45
[20,]46
[21,]47
[22,]48
[23,]56
[24,]57
[25,]58
[26,]67
[27,]68
[28,]78

combinations is a 28 x 2 matrix (or a 1d vector of length 56).

Your use of sample() with '1:length(combinations)' is the same as 1:56.

> length(combinations)
[1] 56

Hence, you end up with:

> set.seed(128)
> indices <- sample(1:length(combinations), 10)

> indices
 [1] 41 53 38 46 50 44 25 11 43  7


I suspect that you really want to use '1:nrow(combinations)', which
yields 1:28.

> nrow(combinations)
[1] 28


Thus,

> set.seed(128)
> indices <- sample(1:nrow(combinations), 10)

> indices
 [1] 21 26 18 22 23 20 11  5 27  3


Now, you can use:

> subset <- combinations[indices, ]
> subset
  [,1] [,2]
 [1,]47
 [2,]67
 [3,]38
 [4,]48
 [5,]56
 [6,]46
 [7,]26
 [8,]16
 [9,]68
[10,]14

which yields the rows from combinations, defined by 'indices'.

If correct, please review "An Introduction to R" and ?"[" for more
information on subsetting objects in R.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] recategorizing a vector into discrete states

2005-08-16 Thread Marc Schwartz (via MN)
On Tue, 2005-08-16 at 13:18 -0400, Allan Strand wrote:
> Hi All,
> 
> I'm trying to take a numerical vector  and  produce a new vector of the 
> same length where each element in the first is placed into a category 
> given by a 'breaks-like' vector. The values in the result should equal 
> the lower bounds of each category as defined in the breaks vector.
> 
>   I suspect that a vectorized solution  is pretty simple, but I can't 
> seem to figure it out today.  Here is an example of my problem:
> 
> Vector 'a' is the original vector.  Vector 'b' gives the lower bounds 
> of the categories.  Vector 'c' is the result I am seeking.
> 
> a <- c(0.9, 11, 1.2, 2.4, 4.0, 5.0, 7.3, 8.1, 3.3, 4.5)
> b <- c(0, 2, 4, 6, 8)
> 
> c <- c(0, 8, 0, 2, 4, 4, 6, 8, 2, 4)
> 
> Any suggestions would be greatly appreciated.
> 
> cheers,
> Allan Strand

See ?cut

> a <- c(0.9, 11, 1.2, 2.4, 4.0, 5.0, 7.3, 8.1, 3.3, 4.5)
> b <- c(0, 2, 4, 6, 8)

> cut(a, c(b, Inf), labels = b, right = FALSE)
 [1] 0 8 0 2 4 4 6 8 2 4
Levels: 0 2 4 6 8

Note that cut() returns a factor.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] use different symbols for frequency in a plot

2005-08-08 Thread Marc Schwartz (via MN)
On Mon, 2005-08-08 at 11:57 -0700, Kerry Bush wrote:
> suppose I have the following data
> 
> x<-c(rep(.1,5),rep(.2,6),rep(.4,10),rep(.5,20))
> y<-c(rep(.5,3),rep(.6,8),rep(1.2,8),rep(2.5,18),rep(3,4))
> 
> If I plot(x,y) in R, I will only get seven distinct
> points. What I want to do is to use different symbols
> to show the frequency at each point.
> 
> e.g. if the frequncey is between 1 and 5, then I plot
> the point as a circle; if the frequency is between 6
> and 10, then I plot the point as a square; if the
> frequency is above 10, then I plot the point as a
> triangle.
> 
> I am not sure how to do this in R. Can anybody help me?

You might want to review this recent post by Deepayan Sarkar:

https://stat.ethz.ch/pipermail/r-help/2005-July/074042.html

with modest modification you can replace his example, which plots the
frequencies with:

x <- c(rep(.1,5),rep(.2,6),rep(.4,10),rep(.5,20))
y <- c(rep(.5,3),rep(.6,8),rep(1.2,8),rep(2.5,18),rep(3,4))

temp <- data.frame(x, y)

foo <- subset(as.data.frame(table(temp)), Freq > 0)

> foo
 x   y Freq
1  0.1 0.53
5  0.1 0.62
6  0.2 0.66
11 0.4 1.28
15 0.4 2.52
16 0.5 2.5   16
20 0.5   34

# Use cut() to create the bins and specify the plotting symbols
# for each bin, which are the 'label' values
foo$sym <- with(foo, cut(Freq, c(0, 5, 10, Inf), 
 labels = c(21, 22, 24))) 


# convert 'foo' to all numeric from factors above for plotting
foo <- apply(foo, 2, function(x) as.numeric(as.character(x)))


> foo
 x   y Freq sym
1  0.1 0.53  21
5  0.1 0.62  21
6  0.2 0.66  22
11 0.4 1.28  22
15 0.4 2.52  21
16 0.5 2.5   16  24
20 0.5   34  21


# Now do the plot. Keep in mind that 'foo' is now
# a matrix, rather than a data frame
plot(foo[, "x"], foo[, "y"], pch = foo[, "sym"])


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] elegant solution to transform vector into percentages?

2005-07-26 Thread Marc Schwartz (via MN)
On Tue, 2005-07-26 at 15:48 -0400, [EMAIL PROTECTED] wrote:
> Hi,
> 
> I am looking for an elegant way to transform a vector into percentages of 
> values
> that meet certain criteria.
> 
> store<-c(1,1.4,3,1.1,0.3,0.6,4,5)
> 
> # now I want to get the precentages of values
> # that fall into the categories <=M , >M & <=N , >N
> # let
> M <-.8
> N <- 1.2
> # In my real example I have many more of these cutoff-points
> 
> # What I did is:
> 
> out <- matrix(NA,1,3)
> 
>   out[1,1] <- ( (sum(store<=M))  /length(store) )*100
>   out[1,2] <- ( (sum(store> M  & store<= N   ))  /length(store) )*100
>   out[1,3] <- ( (sum(store> N))  /length(store) )*100
> 
> colnames(out)<-c("percent<=M","percent>M & <=N","percent>N")
> out
> 
> But this gets very tedious if I have many cutoff-points. Does anybody know a
> more elegant way to do this task?
> 
> Thanks so much.
> 
> Cheers,
> Jens

Something alone the lines of:

store <- c(1, 1.4, 3, 1.1, 0.3, 0.6, 4, 5)
M <- 0.8
N <- 1.2

x <- hist(store, br = c(min(store), M, N, max(store)), 
  plot = FALSE)$counts  

pct.x <- prop.table(x) * 100

names(pct.x) <- c("percent <= M","percent > M & <= N","percent > N")

> pct.x
  percent <= M percent > M & <= Npercent > N 
25 25 50 


I think that should do it. See ?hist for more information and take note
of the 'include.lowest' and 'right' arguments relative to whether or not
values are or are not included in the specified intervals.

See ?prop.table as well.

Also be acutely aware of potential problems with exact equality
comparisons with floating point numbers and the break points...if you
have a float value equal to a breakpoint in your vector.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] plotting horizontally

2005-07-26 Thread Marc Schwartz (via MN)
On Tue, 2005-07-26 at 08:58 +0700, [EMAIL PROTECTED] wrote:
> Hello,
> 
> Is there any way to use plot() horizontally similar to
> boxplot(., horiz=TRUE)?  I want to use to illustrate
> the distribution of y-values on an adjacent plot using
> layout().
> 
> Thanks in advance for any help.
> 
> Regards,
> 
> --Dan

Have you looked at the last example in ?layout, which has a scatterplot
with marginal histograms for the x and y axes?

Alternatively, you can generally reverse the x and y values for most
plots. For example, compare:

 d <- density(rnorm(100))

 # Vertical density plot
 plot(d)

 # Horizontal density plot 
 plot(d$y, d$x, type = "l", xlab = "Density", 
  main = "density(y = rnorm(100))",
  ylab = paste("N =", d$n, "  Bandwidth =", formatC(d$bw)))


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Question about 'text' (add lm summary to a plot)

2005-07-22 Thread Marc Schwartz (via MN)
Ok guys,

So I played around with this a bit, going back to Dan's original
requirements and using Thomas' do.call() approach with legend(). Gabor's
approach using sapply() will also work here. I have the following:

# Note the leading spaces here for alignment in the table
# This could be automated with formatC() or sprintf()
my.slope.1 <-  "  3.22"
my.slope.2 <-  "0.13"
my.inter.1 <-  "-10.66"
my.inter.2 <-  "1.96"
my.Rsqua <-"  0.97"

plot(1:5)

L <- list("Intercept:",
  "Slope:",
  bquote(paste(R^2, ":")),
  bquote(.(my.inter.1) %+-% .(my.inter.2)),
  bquote(.(my.slope.1) %+-% .(my.slope.2)),
  bquote(.(my.Rsqua)))

par(family = "mono")

legend("topleft", legend = do.call("expression", L), ncol = 2)



Note however, that while using the mono font helps with vertical
alignment of numbers, the +/- sign still comes out in the default font,
which is bold[er] than the text.

If one uses the default font, which is variable spaced, it is
problematic to get the proper alignment for the numbers. I even tried
using phantom(), but that didn't quite get it, since the spacing is
variable, as opposed to LaTeX's mono numeric spacing with default fonts.

Also, note that I am only using two columns, rather than three, since
trying to place the ":" as a middle column results in spacing that is
too wide, given that the text.width argument is a scalar and is set to
the maximum width of the character vectors.

Note also that even with mono spaced fonts, the exponent in R^2 is still
horizontally smaller than the other characters. Thus, spacing on that
line may also be affected depending upon what else one might attempt.

Not sure where else to go from here.

HTH,

Marc Schwartz

On Fri, 2005-07-22 at 14:01 -0400, Gabor Grothendieck wrote:
> You are right.   One would have to use do.call as you did
> or the sapply method of one of my previous posts:
> 
> a <- 7
> plot(1)
> L <- list(bquote(alpha==.(a)),bquote(alpha^2+1==.(a^2+1)))
> legend("topleft",legend=sapply(L, as.expression))
> 
> 
> On 7/22/05, Thomas Lumley <[EMAIL PROTECTED]> wrote:
> > On Fri, 22 Jul 2005, Gabor Grothendieck wrote:
> > >
> > > I think legend accepts a list argument directly so that could be
> > > simplified to just:
> > >
> > >  a<-7
> > >  plot(1)
> > >  L <- list(bquote(alpha==.(a)),bquote(alpha^2+1==.(a^2+1)))
> > >  legend("topleft",legend=L)
> > 
> > Except that it wouldn't then work: the mathematical stuff comes out as
> > text.
> > 
> > > The same comment seems to apply to my prior suggestion about
> > > as.expression(bquote(...)), namely that one can just write the
> > > following as text also supports a list argument:
> > 
> > And this doesn't work either: you end up with %+-% rather than the
> > plus-or-minus symbol.
> > 
> > The reason I gave the do.call() version is that I had tried these simpler
> > versions and they didn't work.
> > 
> >-thomas
> >

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] problems with submitting an eps-file created in R

2005-07-22 Thread Marc Schwartz (via MN)
On Fri, 2005-07-22 at 16:26 +0200, Christian Bieli wrote:
> Dear all
> 
> I've got some problems submitting a manuscript, because I can't
> manage 
> creating the favourable eps-file of a graph created in R. The
> journal's 
> graphic requirements are as followed:
> format: eps
> width: max. 6 inches
> resolution: min. 1000 dpi
> supported fonts: Arial, Courier, Helvetica, Symbol, Times, Charcoal, 
> Chicago, Geneva, Georgia, Monaco, Zapf, New York
> 
> Itried to ways of getting appropriate file:
> 
> 1.Creating eps-file in R by drawing into a x11-device and then:
> /dev.copy2eps(file = "file.eps", onefile = TRUE, paper = "a4", family
> = 
> "Helvetica",  pointsize=1, print.it = FALSE,  fonts = "Helvetica")
> 
> /2. Generating a postscript-file in R with /
> //postscript(file = "file.ps", onefile = TRUE, paper = "a4", family = 
> "Helvetica", width = 6, height = 2.2, pointsize=1, print.it = FALSE, 
> fonts = "Helvetica")/
> an trying to convert it in ghostview.
> 
> Neither approach brought the favoured result. The error message I got 
> from the quality checking program was :
> 
> Warning: Document is Missing Non-Standard Font 
> 
> One or more non-standard fonts used in this image is not embedded.
> Standard fonts are: Arial, Courier, Helvetica, Symbol, Times,
> Charcoal, Chicago, Geneva, Georgia, Monaco, Zapf, New York.
> 
> In order to repair this problem, save your document with fonts
> embedded.
> 
> Error: Missing Fonts 
> 
> Challenge 
> One or more linked or used fonts cannot be found. This is caused by
> DigitalExpert not being able to locate fonts on your system. All fonts
> used in the document must be active or have a defined and valid path
> so that DigitalExpert can find them.
>  
> Solution 
>  The only way to repair this problem is to make the fonts available to
> DigitalExpert, and then reprocess the files. In order to make fonts
> active, either activate them using a font management program, or move
> them into the system:fonts folder.
> 
> 
> 
> 1. Obviously there's a font problem. Helvetica IS installed in my OS 
> (windows 2000). Why "DigitalExpert" does not recognize the
> "Helvetica" 
> font as a standard font? I thought the fonts option would embed the 
> specified font in the file. Am I right or is there another way to
> embed 
> fonts?
> 2. Is there a way to set up the resolution of the created ps/eps-file
> in 
> R (until now I did the settings in ghostview)?
> 3. I did not find particulars about the pointsize option. What is the 
> effect of changing the pointsize value?
> 
> With best regards
> Christian


You need to read the help for postscript(), which specifically tells you
in the Details section to use:

postscript(..., onefile = FALSE, horizontal = FALSE, paper = "special")

to generate EPS files.

There is no resolution setting for a postscript file per se, since PS is
a device independent vector based format and the resolution of the
output is dependent upon the target device/viewer. One exception to this
would be the embedding of a bitmapped object in a PS file, but that is
not applicable here.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Question about 'text' (add lm summary to a plot)

2005-07-21 Thread Marc Schwartz (via MN)
[Note: the initial posts have been re-arranged to attempt to maintain
the flow from top to bottom]

> >Dan Bolser writes:
> > > 
> > > I would like to annotate my plot with a little box containing the slope,
> > > intercept and R^2 of a lm on the data.
> > > 
> > > I would like it to look like...
> > > 
> > >  ++
> > >  | Slope :   3.45 +- 0.34 |
> > >  | Intercept : -10.43 +- 1.42 |
> > >  | R^2   :   0.78 |
> > >  ++
> > > 
> > > However I can't make anything this neat, and I can't find out how to
> > > combine this with symbols for R^2 / +- (plus minus).
> > > 
> > > Below is my best attempt (which is franky quite pour). Can anyone
> > > improve on the below?
> > > 
> > > Specifically, 
> > > 
> > > aligned text and numbers, 
> > > aligned decimal places, 
> > > symbol for R^2 in the text (expression(R^2) seems to fail with
> > > 'paste') and +- 
> > > 
> > > 
> > > 
> > > Cheers,
> > > Dan.
> > > 
> > > 
> > > dat.lm <- lm(dat$AVG_CH_PAIRS ~ dat$CHAINS)
> > > 
> > > abline(coef(dat.lm),lty=2,lwd=1.5)
> > > 
> > > 
> > > dat.lm.sum <- summary(dat.lm)
> > > dat.lm.sum
> > > 
> > > attributes(dat.lm.sum)
> > > 
> > > my.text.1 <-
> > >   paste("Slope : ", round(dat.lm.sum$coefficients[2],2),
> > > "+/-",  round(dat.lm.sum$coefficients[4],2))
> > > 
> > > my.text.2 <-
> > >   paste("Intercept : ", round(dat.lm.sum$coefficients[1],2),
> > > "+/-",  round(dat.lm.sum$coefficients[3],2))
> > > 
> > > my.text.3 <-
> > >   paste("R^2 : ",   round(dat.lm.sum$r.squared,2))
> > > 
> > > my.text.1
> > > my.text.2
> > > my.text.3
> > > 
> > > 
> > > ## Add legend
> > > text(x=3,
> > >  y=300,
> > >  paste(my.text.1,
> > >my.text.2,
> > >my.text.3,
> > >sep="\n"),
> > >  adj=c(0,0),
> > >  cex=1


> On Thu, 21 Jul 2005, Christoph Buser wrote:
> 
> >Dear Dan
> >
> >I can only help you with your third problem, expression and
> >paste. You can use:
> >
> >plot(1:5,1:5, type = "n")
> >text(2,4,expression(paste("Slope : ", 3.45%+-%0.34, sep = "")), pos = 4)
> >text(2,3.8,expression(paste("Intercept : ", -10.43%+-%1.42)), pos = 4)
> >text(2,3.6,expression(paste(R^2,": ", "0.78", sep = "")), pos = 4)

> >I do not have an elegant solution for the alignment.


On Thu, 2005-07-21 at 19:55 +0100, Dan Bolser wrote:
> Cheers for this.
> 
> I was trying to get it to work, but the problem is that I need to replace
> the values above with variables, from the following code...
> 
> 
> dat.lm <- lm(dat$AVG_CH_PAIRS ~ dat$CHAINS)
> dat.lm.sum <- summary(dat.lm)
> 
> my.slope.1 <- round(dat.lm.sum$coefficients[2],2)
> my.slope.2 <- round(dat.lm.sum$coefficients[4],2)
> 
> my.inter.1 <- round(dat.lm.sum$coefficients[1],2)
> my.inter.2 <- round(dat.lm.sum$coefficients[3],2)
> 
> my.Rsqua.1 <- round(dat.lm.sum$r.squared,2)
> 
> 
> Anything I try results in either the words 'paste("Slope:", my.slope.1,
> %+-%my.slope.2,sep="")' being written to the plot, or just
> 'my.slope.1+-my.slope2' (where the +- is correctly written).
> 
> I want to script it up and write all three lines to the plot with
> 'sep="\n"', rather than deciding three different heights.


> Thanks very much for what you gave, its a good start for me to figure out 
> how I am supposed to be telling R what to do!
> 
> Any way to just get fixed width fonts with text? (for the alignment
> problem)


Dan,

Here is one approach. It may not be the best, but it gets the job done.
You can certainly take this and encapsulate it in a function to automate
the text/box placement and to pass values as arguments.

A couple of quick concepts:

1. As far as I know, plotmath cannot do multiple lines, so each line in
your box needs to be done separately.

2. The horizontal alignment is a bit problematic when using expression()
or bquote() since I don't believe that multiple spaces are honored as
such after parsing. Thus I break up each component (label, ":" and
values) into separate text() calls. The labels are left justified.

3. The alignment for the numeric values are done with right
justification. So, as long as you use a consistent number of decimals in
the value outputs (2 here), you should be OK. This means you might need
to use formatC() or sprintf() to control the numeric output values on
either side of the +/- sign.

4. In the variable replacement, note the use of substitute() and the
list of x and y arguments as replacement values in the expressions.



# Set your values
my.slope.1 <- 3.45
my.slope.2 <- 0.34

my.inter.1 <- -10.43
my.inter.2 <- 1.42

my.Rsqua <- 0.78


# Create the initial plot as per Christoph's post
plot(1:5, 1:5, type = "n")


#-
# Do the Slope
#-

text(1, 4.5,  "Slope", pos = 4)
text(2, 4.5, ":")
text(3, 4.5, substitute(x %+-% y, 
list(x = my.slope.1, 
 y = my.slope.2)),
 pos =

Re: [R] R:plot and dots

2005-07-21 Thread Marc Schwartz (via MN)
On Thu, 2005-07-21 at 16:18 +0200, Clark Allan wrote:
> hi all
> 
> a very simple question.
> 
> i have plot(x,y)
> 
> but i would like to add in on the plot the observation number associated
> with each point.
> 
> how can this be done?
> 
> /
> allan

If you mean the unique observation number associated with each x,y pair,
you can use:

  text(x, y, labels = ObsNumberVector, pos = 3)

after the plot(x, y) call:

 df <- data.frame(x = rnorm(10), y = rnorm(10), ID = 1:10)
 with(df, plot(x, y))
 with(df, text(x, y, labels = ID, pos = 3))

See ?text for more information.  Note that I used pos = 3 which places
the label above the data point. There are other positioning parameters
available, which are noted in the help file.

Note also that you might have to adjust the plot axis limits depending
upon where you place the text and your extreme points.

If you mean the frequency of each x,y pair (if there is more than one
observation per x,y pair), you might want to review Deepayan's recent
post here:

https://stat.ethz.ch/pipermail/r-help/2005-July/074042.html

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] .gct file

2005-07-19 Thread Marc Schwartz (via MN)
On Tue, 2005-07-19 at 13:08 -0500, Marc Schwartz (via MN) wrote:
> For the TAB delimited columns, adjust the 'sep' argument to:
> 
> read.table("data.gct", skip = 2, header = TRUE, sep = "\t")
> 
> The 'quote' argument is by default:
> 
> quote = "\"'"
> 
> which should take care of the quoted strings and bring them in as a
> single value.
> 
> The above presumes that the header row is also TAB delimited. If not,
> you may have to set 'skip = 3' to skip over the header row and manually
> set the column names.

One correction. If the final para applies and you need to use 'skip =
3', you would also need to leave out the 'header = TRUE' argument, which
defaults to FALSE.

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] .gct file

2005-07-19 Thread Marc Schwartz (via MN)
For the TAB delimited columns, adjust the 'sep' argument to:

read.table("data.gct", skip = 2, header = TRUE, sep = "\t")

The 'quote' argument is by default:

quote = "\"'"

which should take care of the quoted strings and bring them in as a
single value.

The above presumes that the header row is also TAB delimited. If not,
you may have to set 'skip = 3' to skip over the header row and manually
set the column names.

HTH,

Marc Schwartz


On Tue, 2005-07-19 at 13:52 -0400, mark salsburg wrote:
> This is all extremely helpful.
> 
> The data turns out is a little atypical, the columns are tab-delemited
> except for the description columns
> 
> 
> DATA1.gct looks like this
> 
> #1.2
> 23 3423
> NAME DESCRIPTION VALUE
> gene1 "a protein inducer" 1123
> .  . ..
> 
> How do I get R to read the data as tab delemited, but read in the 2nd
> coloumn as one value based on the quotation marks..
> 
> thanks..
> 
> On 7/19/05, Marc Schwartz (via MN) <[EMAIL PROTECTED]> wrote:
> > On Tue, 2005-07-19 at 13:16 -0400, mark salsburg wrote:
> > > ok so the gct file looks like this:
> > >
> > > #1.2  (version number)
> > > 7283 19   (matrix size)
> > > Name Description Values
> > >   ...  ..
> > >
> > > How can I tell R to disregard the first two lines and start reading
> > > the 3rd line in this gct file. I would just delete them, but I do not
> > > know how to open a gct. file
> > >
> > > thank you
> > >
> > > On 7/19/05, Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> > > > On 7/19/2005 12:10 PM, mark salsburg wrote:
> > > > > I have two files to compare, one is a regular txt file that I can read
> > > > > in no prob.
> > > > >
> > > > > The other is a .gct file (How do I read in this one?)
> > > > >
> > > > > I tried a simple
> > > > >
> > > > > read.table("data.gct", header = T)
> > > > >
> > > > > How do you suggest reading in this file??
> > > > >
> > > >
> > > > .gct is not a standard filename extension.  You need to know what is in
> > > > that file.  Where did you get it?  What program created it?
> > > >
> > > > Chances are the easiest thing to do is to get the program that created
> > > > it to export in a well known format, e.g. .csv.
> > > >
> > > > Duncan Murdoch
> > 
> > 
> > The above would be consistent with the info in my reply.
> > 
> > I guess if the format is consistent, as per Mark's example above, you
> > can use:
> > 
> > read.table("data.gct", skip = 2, header = TRUE)
> > 
> > which will start by skipping the first two lines and then reading in the
> > header row and then the data.
> > 
> > See ?read.table
> > 
> > HTH,
> > 
> > Marc Schwartz
> > 
> > 
> >

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] .gct file

2005-07-19 Thread Marc Schwartz (via MN)
On Tue, 2005-07-19 at 13:16 -0400, mark salsburg wrote:
> ok so the gct file looks like this:
> 
> #1.2  (version number)
> 7283 19   (matrix size)
> Name Description Values
>   ...  ..
> 
> How can I tell R to disregard the first two lines and start reading
> the 3rd line in this gct file. I would just delete them, but I do not
> know how to open a gct. file
> 
> thank you
> 
> On 7/19/05, Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> > On 7/19/2005 12:10 PM, mark salsburg wrote:
> > > I have two files to compare, one is a regular txt file that I can read
> > > in no prob.
> > >
> > > The other is a .gct file (How do I read in this one?)
> > >
> > > I tried a simple
> > >
> > > read.table("data.gct", header = T)
> > >
> > > How do you suggest reading in this file??
> > >
> > 
> > .gct is not a standard filename extension.  You need to know what is in
> > that file.  Where did you get it?  What program created it?
> > 
> > Chances are the easiest thing to do is to get the program that created
> > it to export in a well known format, e.g. .csv.
> > 
> > Duncan Murdoch


The above would be consistent with the info in my reply.

I guess if the format is consistent, as per Mark's example above, you
can use:

read.table("data.gct", skip = 2, header = TRUE)

which will start by skipping the first two lines and then reading in the
header row and then the data.

See ?read.table

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] .gct file

2005-07-19 Thread Marc Schwartz (via MN)
On Tue, 2005-07-19 at 12:28 -0400, Duncan Murdoch wrote:
> On 7/19/2005 12:10 PM, mark salsburg wrote:
> > I have two files to compare, one is a regular txt file that I can read
> > in no prob.
> > 
> > The other is a .gct file (How do I read in this one?)
> > 
> > I tried a simple
> > 
> > read.table("data.gct", header = T)
> > 
> > How do you suggest reading in this file??
> > 
> 
> .gct is not a standard filename extension.  You need to know what is in 
> that file.  Where did you get it?  What program created it?
> 
> Chances are the easiest thing to do is to get the program that created 
> it to export in a well known format, e.g. .csv.
> 
> Duncan Murdoch

A quick Google search would suggest "Gene Cluster Text" file:

http://www.broad.mit.edu/cancer/software/genepattern/tutorial/gp_tutorial_fileformats.html#gct

produced by Gene Pattern:

http://www.broad.mit.edu/cancer/software/genepattern/

If correct, I would point Mark to the Bioconductor folks for more
information and assistance:

http://www.bioconductor.org/

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] using argument names (of indeterminate number) within a function

2005-07-19 Thread Marc Schwartz (via MN)
On Tue, 2005-07-19 at 17:29 +0200, Dirk Enzmann wrote:
> Although I tried to find an answer in the manuals and archives, I cannot 
> solve this (please excuse that my English and/or R programming skills 
> are not good enough to state my problem more clearly):
> 
> I want to write a function with an indeterminate (not pre-defined) 
> number of arguments and think that I should use the "..." construct and 
> the match.call() function. The goal is to write a function that (among 
> other things) uses cbind() to combine a not pre-defined number of 
> vectors specified in the function call. For example, if my vectors are 
> x1, x2, x3, ... xn, within the function I want to use cbind(x1, x2) or 
> cbind(x1, x3, x5) or ... depending on the vector names I use in the 
> funcion call. Additionally, the function has other arguments.
> 
> In the archives I found the following thread (followed by Marc Schwartz)
> 
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/15186.html
> [R] returning argument names from Peter Dalgaard BSA on 2003-04-10 (stdin)
> 
> that seems to contain the solution to my problem, but I am stuck because 
>   sapply(match.call()[-1], deparse) gives me a vector of strings and I 
> don't know how to use the names in this vector in the cbind() function.
> 
> Up to now my (clearly deficit) function looks like:
> 
> test <- function(..., mvalid=1)
> {
>args = sapply(match.call()[-1], deparse)
> # and here, I don't know how the vector names in args
> # can be used in the cbind() function to follow:
> #
> # temp <- cbind( ???
>if (mvalid > 1)
>{
> #  here it goes on
>}
> }
> 
> Ultimately, I want that the function can be called like
> test(x1,x2,mvalid=1)
> or
> test(x1,x3,x5,mavlid=2)
> and that within the function
> cbind(x1,x2)
> or cbind(x1,x3,x5)
> will be used.
> 
> Can someone give and explain an example / a solution on how to proceed?

Hi Dirk,

How about this:

my.cbind <- function(...)
{
  do.call("cbind", list(...))
}

> a <- 1:10
> b <- 11:20
> c <- 21:30
> d <- 31:40

> my.cbind(a, b)
  [,1] [,2]
 [1,]1   11
 [2,]2   12
 [3,]3   13
 [4,]4   14
 [5,]5   15
 [6,]6   16
 [7,]7   17
 [8,]8   18
 [9,]9   19
[10,]   10   20

> my.cbind(b, c, d)
  [,1] [,2] [,3]
 [1,]   11   21   31
 [2,]   12   22   32
 [3,]   13   23   33
 [4,]   14   24   34
 [5,]   15   25   35
 [6,]   16   26   36
 [7,]   17   27   37
 [8,]   18   28   38
 [9,]   19   29   39
[10,]   20   30   40

> my.cbind(a, b, c, d)
  [,1] [,2] [,3] [,4]
 [1,]1   11   21   31
 [2,]2   12   22   32
 [3,]3   13   23   33
 [4,]4   14   24   34
 [5,]5   15   25   35
 [6,]6   16   26   36
 [7,]7   17   27   37
 [8,]8   18   28   38
 [9,]9   19   29   39
[10,]   10   20   30   40


The use of list(...) in the function allows you to use list based
functions such as do.call() or lapply() against the argument objects
directly without having to deparse and re-parse the character names of
the arguments, which is the approach that Peter used in his response in
the thread you referenced. 

The OP in that thread wanted the argument names as character vectors, as
opposed to the argument objects themselves, which is what you need here.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Can't get sample function from "An Introduction to R" to work

2005-07-15 Thread Marc Schwartz (via MN)
On Fri, 2005-07-15 at 15:40 -0500, David Groos wrote:
> I'm trying to figure out R, a piece at a time, hours at a time...  I 
> was trying to copy the sample function in, "An Introduction to R"  (for 
> version 2.1.0) by W. N. Venables, D. M. Smith, page 42.  Section 10.1 
> "Simple examples" provides a sample function which I tried to duplicate 
> (I'm using Mac OS X 10.3.9, and "R for Mac OS X Aqua GUI v1.11).  The 
> following is what I typed and the last line is R's response when I hit 
> the return key after the penultimate line.  I've re-checked and 
> re-typed the code many times to no avail.  I wasn't able to find this 
> issue using search options, either.  Any help is GREATLY appreciated!
> 
>  > twosam<-function(y1, y2) {
> + n1<-length(y1);n2 <-length(y2)
> + yb1<-mean(y1); yb2<-mean(y2)
> + s1<-var(y1);s2<-var(y2)
> + s<-((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)
> + tst<-(yb1-yb2)/sqrt(s*(1/n1+1/n2))
> Error: syntax error
> 
> David

The code as you have above (without the "+" on each line) works for me
both in ESS and in the R console under Linux. There should be another
"+" on the next line, in anticipation of the remaining two lines:

 tst
}


Try to copy and paste the following into the console as is:

twosam<-function(y1, y2) {
 n1 <- length(y1);n2 <- length(y2)
 yb1 <- mean(y1); yb2 <- mean(y2)
 s1 <- var(y1);s2 <- var(y2)
 s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)
 tst <- (yb1-yb2)/sqrt(s*(1/n1+1/n2))

and see what happens. You should be left at a:

+

on a new line again.

It is possible that there is a bug in the Aqua GUI, but not using a Mac,
I cannot replicate it.

You might want to consider subscribing and posting to the R-SIG-Mac
e-mail list, which is focused on Mac users of R. More information is
here:

https://stat.ethz.ch/mailman/listinfo/r-sig-mac

HTH,

Marc Schwartz

P.S. Greetings from Eden Prairie

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] plot the number of replicates at the same point

2005-07-14 Thread Marc Schwartz (via MN)
On Thu, 2005-07-14 at 15:08 -0500, Deepayan Sarkar wrote:
> On 7/14/05, Marc Schwartz (via MN) <[EMAIL PROTECTED]> wrote:
> > On Thu, 2005-07-14 at 12:30 -0500, Deepayan Sarkar wrote:
> > > On 7/14/05, Kerry Bush <[EMAIL PROTECTED]> wrote:
> > > > Thank you for thinking about the problem for me.
> > > > However, I have found that your method doesn't work at
> > > > all.
> > > >
> > > > You may test the following example:
> > > >
> > > > x1=c(0.6,0.4,.4,.4,.2,.2,.2,0,0)
> > > > x2=c(0.4,.2,.4,.6,0,.2,.4,0,.2)
> > > > x1=rep(x1,4)
> > > > x2=rep(x2,4)
> > > > temp=data.frame(x1,x2)
> > > > temp1=table(temp)
> > > > plot(temp$x1,temp$x2,cex=0)
> > > > text(as.numeric(rownames(temp1)),
> > > > as.numeric(colnames(temp1)), temp1)
> > > >
> > > > what I got here is not what I wanted. You may compare
> > > > with
> > > > plot(x1,x2)
> > > >
> > > > I actually want some plots similar to what SAS proc
> > > > plot produced.
> > > >
> > > > Does anybody have a clue of how to do this easily in
> > > > R?
> > >
> > > Try this:
> > >
> > >
> > > x1=c(0.6,0.4,.4,.4,.2,.2,.2,0,0)
> > > x2=c(0.4,.2,.4,.6,0,.2,.4,0,.2)
> > > x1=rep(x1,4)
> > > x2=rep(x2,4)
> > > temp=data.frame(x1,x2)
> > >
> > > foo <- subset(as.data.frame(table(temp)), Freq > 0)
> > > foo$x1 <- as.numeric(as.character(foo$x1))
> > > foo$x2 <- as.numeric(as.character(foo$x2))
> > >
> > > with(foo, plot(x1, x2, type = "n"))
> > > with(foo, text(x1, x2, lab = Freq))
> > >
> > >
> > > -Deepayan
> > 
> > 
> > Great solution Deepayan. I was just in the process of working on
> > something similar.
> > 
> > One quick modification is that the two plot()/text() functions can be
> > combined into:
> > 
> >   with(foo, plot(x1, x2, pch = as.character(Freq)))
> 
> Only as long as all(Freq < 10) (I've been bitten by this before. :-) ).
> 
> Deepayan


D'oh!

Indeed you are correct, since 'pch' is a single character:

  plot(1:20, pch = as.character(1:20))

as opposed to:

  plot(1:20, type = "n")
  text(1:20, lab = 1:20)

Thanks for the correction.

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] plot the number of replicates at the same point

2005-07-14 Thread Marc Schwartz (via MN)
On Thu, 2005-07-14 at 12:30 -0500, Deepayan Sarkar wrote:
> On 7/14/05, Kerry Bush <[EMAIL PROTECTED]> wrote:
> > Thank you for thinking about the problem for me.
> > However, I have found that your method doesn't work at
> > all.
> > 
> > You may test the following example:
> > 
> > x1=c(0.6,0.4,.4,.4,.2,.2,.2,0,0)
> > x2=c(0.4,.2,.4,.6,0,.2,.4,0,.2)
> > x1=rep(x1,4)
> > x2=rep(x2,4)
> > temp=data.frame(x1,x2)
> > temp1=table(temp)
> > plot(temp$x1,temp$x2,cex=0)
> > text(as.numeric(rownames(temp1)),
> > as.numeric(colnames(temp1)), temp1)
> > 
> > what I got here is not what I wanted. You may compare
> > with
> > plot(x1,x2)
> > 
> > I actually want some plots similar to what SAS proc
> > plot produced.
> > 
> > Does anybody have a clue of how to do this easily in
> > R?
> 
> Try this:
> 
> 
> x1=c(0.6,0.4,.4,.4,.2,.2,.2,0,0)
> x2=c(0.4,.2,.4,.6,0,.2,.4,0,.2)
> x1=rep(x1,4)
> x2=rep(x2,4)
> temp=data.frame(x1,x2)
> 
> foo <- subset(as.data.frame(table(temp)), Freq > 0)
> foo$x1 <- as.numeric(as.character(foo$x1))
> foo$x2 <- as.numeric(as.character(foo$x2))
> 
> with(foo, plot(x1, x2, type = "n"))
> with(foo, text(x1, x2, lab = Freq))
> 
> 
> -Deepayan


Great solution Deepayan. I was just in the process of working on
something similar.

One quick modification is that the two plot()/text() functions can be
combined into:

  with(foo, plot(x1, x2, pch = as.character(Freq)))

Best regards,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] simple question: reading lower triangular matrix

2005-07-12 Thread Marc Schwartz (via MN)
On Tue, 2005-07-12 at 14:37 -0300, Rogério Rosa da Silva wrote:
> Dear list,
> 
> I will like to learn how to read a lower triangular matrix in R. The
> input file *.txt have the following format:
> 
>  A   B   C   D   E
> A   0   
> B   10
> C   250   
> D   3680  
> E   479   100
> 
> 
> How this can be done?
> 
> Thanks in advance for your help
> 
> Rogério


I don't know that this is the easiest way of doing it, but here is one
approach:

# I saved your data above in a file called "test.txt"

# Read the first line of test.txt to get the colnames as chars
col.names <- unlist(read.table("test.txt", nrow = 1, as.is = TRUE))

# now read the rest of the file using 'fill = TRUE' to pad lines
# with NAs 
# skip the first line
# set the row.names as the first column in the text file
# coerce to a matrix
df <- as.matrix(read.table("test.txt", fill = TRUE, skip = 1, 
row.names = 1))

# Now set the colnames of df
colnames(df) <- col.names

> df
  A  B  C  D  E
A 0 NA NA NA NA
B 1  0 NA NA NA
C 2  5  0 NA NA
D 3  6  8  0 NA
E 4  7  9 10  0

If you should further want to set the diagonal to NA:

> diag(df) <- NA

> df
   A  B  C  D  E
A NA NA NA NA NA
B  1 NA NA NA NA
C  2  5 NA NA NA
D  3  6  8 NA NA
E  4  7  9 10 NA


See ?read.table for more information on the file reading part.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] question for IF ELSE usage

2005-07-12 Thread Marc Schwartz (via MN)
On Tue, 2005-07-12 at 16:22 +0200, ecoinfo wrote:
> Hi R users,
>  Maybe the question is too simple.
>  In a IF ... ELSE ... statement "if(cond) cons.expr else alt.expr", IF and 
> ELSE should be at the same line? 
> For example,
>  if (x1==12)
> {
> y1 <- 5 
> }else
> {
> y1 <- 3
> }
>  is right, while
>  if (x1==12)
> {
> y1 <- 5 
> }
> else # Error: syntax error
> {
> y1 <- 3
> }
>  is wrong? 
>  Thanks

Note the following from the Details section of ?"if"

"Note that it is a common mistake to forget to put braces ({ .. })
around your statements, e.g., after if(..) or for(). In particular,
you should not have a newline between } and else to avoid a syntax error
in entering a if ... else construct at the keyboard or via source. For
that reason, one (somewhat extreme) attitude of defensive programming is
to always use braces, e.g., for if clauses."


One other approach is the following:

if (x1 == 12) 
{
  y1 <- 5 
} else {
  y1 <- 3
}

Note the presence of both braces on the 'else' line.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] xmat[1, 2:3] <- NULL

2005-07-07 Thread Marc Schwartz (via MN)
On Thu, 2005-07-07 at 10:20 -0700, Mikkel Grum wrote:
> I have a situation where I'm filling out a dataframe
> from a database. Sometimes the database query doesn't
> get anything, so I end up trying to place NULL in the
> dataframe like below.
> 
> > temp <- NULL
> > xmat <- as.data.frame(matrix(NA, 2, 3))
> > xmat[1, 2:3] <- temp
> Error in if (m < n * p && (n * p)%%m)
> stop(gettextf("replacement has %d items, need %d",  : 
> missing value where TRUE/FALSE needed
> 
> I can't get the programme to accept that sometimes
> what the query looks for just doesn't exist, and I
> just want to move on to the next calculation leaving
> the dataframe with a missing value in the given cell.
> It's a real show stopper and I haven't found a way
> round it.
> 
> Best wishes,
> Mikkel
> 
> PS. I'm using dbGetQuery to query an SQLite database.

NULL represents a zero length object in R.

Thus, trying to set only the first row in a data frame to NULL makes no
sense, since you cannot have a 0 length object that also has a single
row (as you seem to be trying to do above).

Since a data frame is a series of lists, you could do the following:

> temp <- NULL
> xmat <- as.data.frame(matrix(NA, 2, 3))

> xmat
  V1 V2 V3
1 NA NA NA
2 NA NA NA

> xmat[, 1] <- temp

> xmat
  V2 V3
1 NA NA
2 NA NA

which removes the first column in the data frame. This is the same as:

> xmat[, -1]
  V2 V3
1 NA NA
2 NA NA


You could also set the entire xmat to NULL as follows:

> xmat
  V1 V2 V3
1 NA NA NA
2 NA NA NA

> xmat <- NULL

> xmat
NULL


You can then test to see if 'xmat' is a NULL:

> is.null(xmat)
[1] TRUE

and base a boolean expression and resultant action on that result:

if (!is.null(xmat))
{
  do_calculations...
}

If your calculations are on a row by row basis, where NA's represent
missing data, you can also use one of several functions to eliminate
those rows. See ?na.action, ?na.omit and ?complete.cases for more
information and examples.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Plotting Character Variable

2005-07-07 Thread Marc Schwartz (via MN)
On Thu, 2005-07-07 at 11:47 -0400, [EMAIL PROTECTED] wrote:
> Any ideas about the following problem:
> 
> I have a matrix (A) that looks like this:
> 
> gene_names values
> hsa-mir-124  0.3
> hsa-mir-234  0.1
> hsa-mir-344  0.4
> hsa-mir-333  0.7
> .  ...
> 
> (This is a 2 by 22283 matrix: quite large)

To split hairs, it would be a 22283 by 2 object in R's [row, column]
approach to indexing.

I am also presuming that "A" is a data frame, since you seem to have two
different data types above, with gene_names being a factor?

> I would like to plot the values, but output the gene_names as the plotting
> symbol. I have tried regular x,y plots, but since the gene_names are quite
> large and there are 22283 of them, it's impossible to fit them on the x-axis.
> 
> Basically, can I plot the above matrix
> 
> plot(gene_names, value) where the gene_names are used as the plotting symbol.
> 
> thank you,

Well...

I would defer to those with more experience in plotting genetic data,
but from a practical standpoint, it seems to me to be highly problematic
to plot >20,000 data points with labels and have them be human readable
without an STMunless you have _very_ wide paper on a large format
plotter  ;-)

That being said, one approach is to rotate the x axis labels vertically,
to make more room, while using points for the plotting symbols:

# Adjust bottom margin to make room for vertical labels
par(mar = c(7, 4, 4, 2))

plot(1:nrow(A), A$values, xaxt = "n", ann = FALSE, las = 2)

# use 'las = 3' to rotate the labels
axis(1, at = 1:nrow(A), 
 labels = as.character(A$gene_names), las = 3)


You might want to review some of the tools available at the Bioconductor
site to see if there are specialized plotting functions for this type of
data:

http://www.bioconductor.org/

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] the format of the result

2005-07-01 Thread Marc Schwartz (via MN)
On Fri, 2005-07-01 at 19:40 +0800, ronggui wrote:
> I write a function to get the frequency and prop of a variable.
> 
> freq<-function(x,digits=3)
> {naa<-is.na(x)
> nas<-sum(naa)
> if (any(naa))
> x<-x[!naa]
> n<-length(x)
> ta<-table(x)
> prop<-prop.table(ta)*100
> res<-rbind(ta,prop)
> rownames(res)<-c("Freq","Prop")
> cat("Missing value(s) are",nas,".\n")
> cat("Valid case(s) are",n,".\n")
> cat("Total case(s) are",(n+nas),".\n\n")
> print(res,digits=(digits+2))
> cat("\n")
> }
> 
> > freq(sample(letters[1:3],48,T),2)
> Missing value(s) are 0 .
> Valid case(s) are 48 .
> Total case(s) are 48 .
> 
>  a b c
> Freq 11.00 20.00 17.00
> Prop 22.92 41.67 35.42
> 
> and i want the result to be like
>  a  b c
> Freq 11.00  20.00  17.00
> Prop 22.92% 41.67% 35.42%
> 
> how should i change my function to get what i want?


Here is a modification of the function that I think should work. Note
that part of the output formatting process has to take into account the
a priori unknowns involving your 'digits' argument, the lengths of the
dimnames resulting from the table and the lengths of the frequency
counts in the table. Thus, a fair amount of the code is establishing the
'width' argument, which is then used in formatC() so that the output can
be column aligned properly.

Note that by default, table() will exclude "NA", so you do not need to
subset 'x' before using table().

Also, note that I change "Prop" to "Pct".


freq <- function(x, digits = 3)
{
  n <- length(x)
  missing <- sum(is.na(x))
  ta <- table(x)
  pct <- prop.table(ta) * 100

  width <- max(nchar(unlist(dimnames(ta))) + 1,
   nchar(ta) + digits + 1,
   5 + digits)
  
  Vals <- paste(formatC(unlist(dimnames(ta)), format = "s",
width = width),
collapse = "  ")

  Freq <- paste(formatC(ta, format = "f", digits = digits,
width = width),
collapse = "  ")

  Pct <- paste(formatC(pct, format = "f", digits = digits,
   width = width),
   "%", sep = "", collapse = " ")

  cat("Missing value(s) are", missing, ".\n")
  cat("Valid case(s) are", n - missing,".\n")
  cat("Total case(s) are", n, ".\n\n")

  cat("", Vals, "\n")
  cat("Freq", Freq, "\n")
  cat("Pct ", Pct, "\n")
  cat("\n")
}


Thus:


> freq(sample(letters[1:3], 48, TRUE), 2)
Missing value(s) are 0 .
Valid case(s) are 48 .
Total case(s) are 48 .

   abc 
Freq   28.00 8.0012.00 
Pct58.33%   16.67%   25.00% 


> freq(sample(c(letters[1:3], NA), 1000, TRUE), 2)
Missing value(s) are 257 .
Valid case(s) are 743 .
Total case(s) are 1000 .

   abc 
Freq  250.00   218.00   275.00 
Pct33.65%   29.34%   37.01% 


> freq(iris$Species)
Missing value(s) are 0 .
Valid case(s) are 150 .
Total case(s) are 150 .

  setosa   versicolorvirginica 
Freq  50.000   50.000   50.000 
Pct   33.333%  33.333%  33.333% 


> freq(iris$Species, 0)
Missing value(s) are 0 .
Valid case(s) are 150 .
Total case(s) are 150 .

  setosa   versicolorvirginica 
Freq  50   50   50 
Pct   33%  33%  33% 


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] How to rotate the axisnames in a BARPLOT

2005-06-30 Thread Marc Schwartz (via MN)
On Thu, 2005-06-30 at 23:05 +0200, [EMAIL PROTECTED] wrote:
> Hi all,
> 
> - how can I do a barplot with rotated axis labels? I've seen the example for
> just a plot in the FAQ, but I'll missing the coordinates to plot my text at
> the right position beneath the bars. 
> Is there any (easy?) solution?
> 
> - how can I set the y-axis in a barplot to logarithmic scale?
> 
> Many thanks in advance!
> 
> Best Regards
>  Tom

The same concept for rotating axis labels in a regular plot per the FAQ
can be used in a barplot, as long as you know that barplot() returns the
bar midpoints. So:

 ## Increase bottom margin to make room for rotated labels
 par(mar = c(7, 4, 4, 2) + 0.1)

 ## Create plot and get bar midpoints in 'mp'
 mp <- barplot(1:10)

 ## Set up x axis with tick marks alone
 axis(1, at = mp, labels = FALSE)

 ## Create some text labels
 labels <- paste("Label", 1:10, sep = " ")

 ## Plot x axis labels at mp
 text(mp, par("usr")[3] - 0.5, srt = 45, adj = 1,
  labels = labels, xpd = TRUE)

 ## Plot x axis label at line 4
 mtext(1, text = "X Axis Label", line = 4)

See ?barplot for more information.

With respect to the log scales, you would need to use barplot2(), which
is in the 'gplots' package on CRAN. barplot2() supports log scales using
the same 'log' argument as plot().

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] conversion

2005-06-28 Thread Marc Schwartz (via MN)
On Tue, 2005-06-28 at 17:23 -0400, Anna Oganyan wrote:
> Dear List,
> How can I convert a list with elements being character strings, like: 
> "c(1,2,3,4)", “c(1,3,4,2) … to a list with elements as numerical 
> vectors: c(1,2,3,4), c(1,3,4,2)…?
> Thanks!
> Anna

> l <- list("c(1,2,3,4)", "c(1,3,4,2)")

> l
[[1]]
[1] "c(1,2,3,4)"

[[2]]
[1] "c(1,3,4,2)"

Now use lapply() over each list element in 'l', converting the character
vectors to R expressions and then evaluating them:

> lapply(l, function(x) eval(parse(text = x)))
[[1]]
[1] 1 2 3 4

[[2]]
[1] 1 3 4 2


See ?lapply, ?eval and ?parse.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] faster algorithm for Kendall's tau

2005-06-28 Thread Marc Schwartz (via MN)
On Tue, 2005-06-28 at 13:03 -0400, ferdinand principia wrote:
> Hi,
> 
> I need to calculate Kendall's tau for large data
> vectors (length > 100'000). 
> Is somebody aware of a faster algorithm or package
> function than "cor(, method="kendall")"? 
> There are ties in the data to be considered (Kendall's
> tau-b).
> 
> Any suggestions?
> 
> Regards
> Ferdinand


The time intensive part of the process is typically the ranking/ordering
of the vector pairs to calculate the numbers of concordant and
discordant pairs.

If the number of _unique pairs_ in your data is substantially less than
the number of total pairs (in other words, creating a smaller 2d
contingency table from a pair of your vectors makes sense), then the
following may be of help.

# Calculate CONcordant Pairs in a table
# cycle through x[r, c] and multiply by
# sum(x elements below and to the right of x[r, c])
# x = table
concordant <- function(x)
{
  x <- matrix(as.numeric(x), dim(x))
  
  # get sum(matrix values > r AND > c)
  # for each matrix[r, c]
  mat.lr <- function(r, c)
  { 
lr <- x[(r.x > r) & (c.x > c)]
sum(lr)
  }

  # get row and column index for each
  # matrix element
  r.x <- row(x)
  c.x <- col(x)

  # return the sum of each matrix[r, c] * sums
  # using mapply to sequence thru each matrix[r, c]
  sum(x * mapply(mat.lr, r = r.x, c = c.x))
}

# Calculate DIScordant Pairs in a table
# cycle through x[r, c] and multiply by
# sum(x elements below and to the left of x[r, c])
# x = table
discordant <- function(x)
{
  x <- matrix(as.numeric(x), dim(x))
  
  # get sum(matrix values > r AND < c)
  # for each matrix[r, c]
  mat.ll <- function(r, c)
  { 
ll <- x[(r.x > r) & (c.x < c)]
sum(ll)
  }

  # get row and column index for each
  # matrix element
  r.x <- row(x)
  c.x <- col(x)

  # return the sum of each matrix[r, c] * sums
  # using mapply to sequence thru each matrix[r, c]
  sum(x * mapply(mat.ll, r = r.x, c = c.x))
}


# Calculate Kendall's Tau-b
# x = table
calc.KTb <- function(x)
{
  x <- matrix(as.numeric(x), dim(x))
  
  c <- concordant(x)
  d <- discordant(x)

  n <- sum(x)
  SumR <- rowSums(x)
  SumC <- colSums(x)

  KTb <- (2 * (c - d)) / sqrt(((n ^ 2) -
 (sum(SumR ^ 2))) * ((n ^ 2) -
 (sum(SumC ^ 2

  KTb
}


Note that I made some modifications of the above, relative to prior
versions that I have posted to handle large numbers of pairs to avoid
integer overflows in summations. Hence the:

  x <- matrix(as.numeric(x), dim(x))

conversion in each function.

Now, create some random test data, with 100,000 elements in each vector,
sampling from 'letters', which would yield a 26 x 26 table:

 a <- sample(letters, 10, replace = TRUE)
 b <- sample(letters, 10, replace = TRUE)
 
 > dim(table(a, b))
 [1] 26 26

 > system.time(print(calc.KTb(table(a, b
[1] 0.0006906088
[1] 0.77 0.02 0.83 0.00 0.00

Note that in the above, the initial table takes most of the time:

> system.time(table(a, b))
[1] 0.55 0.00 0.56 0.00 0.00

Hence:

> tab.ab <- table(a, b)
> system.time(print(calc.KTb(tab.ab)))
[1] 0.0006906088
[1] 0.25 0.01 0.27 0.00 0.00


I should note that I also ran:

> system.time(print(cor(a, b, method = "kendall")))
[1] 0.0006906088 
[1] 694.80   7.72 931.89   0.00   0.00 

Nice to know the results work out at least...  :-)


I have not tested with substantially larger 2d matrices, but would
envision that as the dimensions of the resultant tabulation increases,
my method probably approaches and may even become less efficient than
the approach implemented in cor(). Some testing would validate this and
perhaps point to coding the concordant() and discordant() functions in C
for improvement in timing.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


<    1   2   3