Re: [R] vectorize subscript ranges?

2010-06-14 Thread David Winsemius


On Jun 15, 2010, at 1:12 AM, James McCreight wrote:


Hi R'ers-

I'll believe this can be done and that I'm just not clever enough.

I'm trying to do this in 2D, but I think we have the same problem in 1D.


#Assume you have some 1D time series
d<-rnorm(100)

#Say you want to get the average over intervals of different lengths
#like from time 10 to time 24 and time 51 to time 86
starts <- c(10,51)
ends <- c(24,86)
results <-c(NA,NA)

#then you'd typically loop
for (i in 1:2) { results[i] <- mean(d[starts[i]:ends[i]]) }

can it be done without looping?


Depends on what you mean by "without looping":

> mapply(function(x, y, z) mean(z[x:y]), x = starts, y = ends, MoreArgs = list(d))

[1] -0.219429514 -0.008565724

Probably no more, and quite possibly less, efficient than a for-loop.
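For comparison, the same interval means can be written with vapply() as well as mapply(); this is my own self-contained sketch (seeded random data, so only the agreement of the two forms matters):

```r
# Two loop-free ways to compute means over index ranges.
set.seed(1)
d <- rnorm(100)
starts <- c(10, 51)
ends   <- c(24, 86)

res_mapply <- mapply(function(s, e) mean(d[s:e]), starts, ends)
res_vapply <- vapply(seq_along(starts),
                     function(i) mean(d[starts[i]:ends[i]]),
                     numeric(1))
stopifnot(isTRUE(all.equal(res_mapply, res_vapply)))
```

Both hide the loop inside an apply-family call; neither avoids iterating per interval.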



Thanks in advance.

James


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] shifted window of string

2010-06-14 Thread David Winsemius


On Jun 14, 2010, at 11:46 PM, david hilton shanabrook wrote:

basically I need to create a sliding window in a string.  a way to  
explain this is:


v <- c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y")

window <- 5
shift <- 2


I want a matrix of characters with "window" columns filled with "v"  
by filling a row, then shifting over "shift" and continuing to the  
next row until "v" is exhausted.  You can assume "v" will evenly fit  
"m"


so the result needs to look like this matrix where each row is  
shifted 2 (in this case):



m

 [,1] [,2] [,3] [,4] [,5]
[1,] "a"  "b"  "c"  "d"  "e"
[2,] "c"  "d"  "e"  "f"  "g"
[3,] "e"  "f"  "g"  "h"  "i"
[4,] "g"  "h"  "i"  "j"  "k"
[5,] "i"  "j"  "k"  "l"  "m"
[6,] "k"  "l"  "m"  "n"  "o"
[7,] "m"  "n"  "o"  "p"  "q"
[8,] "o"  "p"  "q"  "r"  "s"
[9,] "q"  "r"  "s"  "t"  "u"
[10,] "s"  "t"  "u"  "v"  "w"
[11,] "t"  "u"  "v"  "w"  "x"


I think you got the last row wrong:

> m <- matrix(v[sapply(1:window, function(x) seq(x, (length(v) - window + x),
                                                 by = shift))],
              ncol = window)
> m
  [,1] [,2] [,3] [,4] [,5]
 [1,] "a"  "b"  "c"  "d"  "e"
 [2,] "c"  "d"  "e"  "f"  "g"
 [3,] "e"  "f"  "g"  "h"  "i"
 [4,] "g"  "h"  "i"  "j"  "k"
 [5,] "i"  "j"  "k"  "l"  "m"
 [6,] "k"  "l"  "m"  "n"  "o"
 [7,] "m"  "n"  "o"  "p"  "q"
 [8,] "o"  "p"  "q"  "r"  "s"
 [9,] "q"  "r"  "s"  "t"  "u"
[10,] "s"  "t"  "u"  "v"  "w"
[11,] "u"  "v"  "w"  "x"  "y"


This needs to be very efficient as my data is large, loops would be  
too slow.  Any ideas?  It could also be done in a string and then  
put into the matrix but I don't think this would be easier.


I'm not sure what you mean, but I think I might have done it the way  
you thought was harder. There might be a more efficient route with  
modulo arithmetic. Some elaboration of

> v[((1:11)*4)%/%2 -1]
 [1] "a" "c" "e" "g" "i" "k" "m" "o" "q" "s" "u"
But I don't see it immediately.



I will want to put this in a function:

shiftedMatrix <- function(v, window=5, shift=2){...

return(m)}


Left as an exercise for the reader.
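One way to complete the exercise (my own sketch, not from the thread; it reuses the poster's requested function name) is to build the whole index matrix with outer(), so no row-by-row loop is needed:

```r
# Sliding-window matrix: one row of indices per window, built in one shot.
shiftedMatrix <- function(v, window = 5, shift = 2) {
  starts <- seq(1, length(v) - window + 1, by = shift)
  idx <- outer(starts, 0:(window - 1), "+")  # row i holds indices of window i
  matrix(v[idx], nrow = nrow(idx))
}

m <- shiftedMatrix(letters[1:25])
```

With letters a..y this reproduces the 11 x 5 matrix above, including the corrected last row u v w x y.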

-- David.



[R] vectorize subscript ranges?

2010-06-14 Thread James McCreight
Hi R'ers-

I'll believe this can be done and that I'm just not clever enough.

I'm trying to do this in 2D, but I think we have the same problem in 1D.

#Assume you have some 1D time series
d<-rnorm(100)

#Say you want to get the average over intervals of different lengths
#like from time 10 to time 24 and time 51 to time 86
starts <- c(10,51)
ends <- c(24,86)
results <-c(NA,NA)

#then you'd typically loop
for (i in 1:2) { results[i] <- mean(d[starts[i]:ends[i]]) }

can it be done without looping?

Thanks in advance.

James



Re: [R] shifted window of string

2010-06-14 Thread Peter Alspach
Tena koe David

Something like:

matrix(v[1:window + rep(seq(0, (length(v)-window), shift), each=window)],
ncol=window, byrow=TRUE)

should work (I haven't tested it fully).  Note it gives a different
answer to your m since I think the last line of your m is incorrect.
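The one-liner can be checked against the expected matrix when written out with explicit variable names (a self-contained sketch):

```r
# rep(..., each = window) supplies a window offset per element;
# 1:window recycles across it, giving indices 1:5, 3:7, ..., 21:25.
v <- letters[1:25]
window <- 5
shift  <- 2
m <- matrix(v[1:window + rep(seq(0, length(v) - window, shift), each = window)],
            ncol = window, byrow = TRUE)
```

The last row comes out as u v w x y, matching the corrected answer rather than the original post's final row.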

HTH 

Peter Alspach

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of david hilton shanabrook
> Sent: Tuesday, 15 June 2010 3:47 p.m.
> To: r-help@r-project.org
> Subject: [R] shifted window of string
> 
> basically I need to create a sliding window in a string.  a way to
> explain this is:
> 
> > v <- c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y")
> > window <- 5
> > shift <- 2
> 
> I want a matrix of characters with "window" columns filled with "v" by
> filling a row, then shifting over "shift" and continuing to the next
> row until "v" is exhausted.  You can assume "v" will evenly fit "m"
> 
> so the result needs to look like this matrix where each row is shifted
> 2 (in this case):
> 
> > m
>   [,1] [,2] [,3] [,4] [,5]
>  [1,] "a"  "b"  "c"  "d"  "e"
>  [2,] "c"  "d"  "e"  "f"  "g"
>  [3,] "e"  "f"  "g"  "h"  "i"
>  [4,] "g"  "h"  "i"  "j"  "k"
>  [5,] "i"  "j"  "k"  "l"  "m"
>  [6,] "k"  "l"  "m"  "n"  "o"
>  [7,] "m"  "n"  "o"  "p"  "q"
>  [8,] "o"  "p"  "q"  "r"  "s"
>  [9,] "q"  "r"  "s"  "t"  "u"
> [10,] "s"  "t"  "u"  "v"  "w"
> [11,] "t"  "u"  "v"  "w"  "x"
> 
> This needs to be very efficient as my data is large, loops would be
too
> slow.  Any ideas?  It could also be done in a string and then put into
> the matrix but I don't think this would be easier.
> 
> I will want to put this in a function:
> 
> shiftedMatrix <- function(v, window=5, shift=2){...
> 
> return(m)}
> 
> thanks
> 
> dhs



Re: [R] aggregate vs merge

2010-06-14 Thread David Winsemius


On Jun 14, 2010, at 10:06 PM, skan wrote:



Hello

Can someone explain to me the difference between aggregate and merge,
please?
I've read the help on both commands but I don't understand the
difference.




Merge adds data from one dataframe to another based on a matching  
process. Aggregate, which is a wrapper for tapply, works within one  
dataframe to apply a chosen function to "like" rows.


--
David.



Re: [R] Subtracting POSIXct data/times

2010-06-14 Thread William Dunlap
> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of James Rome
> Sent: Monday, June 14, 2010 2:27 PM
> To: Don MacQueen
> Cc: r-help@r-project.org
> Subject: Re: [R] Subtracting POSIXct data/times
> 
> That fixed it. Dumb me. I had assumed that the subtraction of the raw
> POSIXcts would always give the same results.

Subtracting any given two POSIXct values does
give just one value.  However, the units used
to print the values depend on the other values in the
same vector.  It looks like the print routine selects
units such that the smallest absolute value in the vector is
at least one of the chosen unit.
   > z <- as.POSIXct("2010-06-14 20:32:02") + 10^(0:7)
   > z-z[1]
   Time differences in secs
   [1]       0       9      99     999    9999   99999  999999
   [8] 9999999
   > z-mean(z[5:6])
   Time differences in hours
   [1]  -15.2775  -15.2750  -15.2500  -15.0000  -12.5000
   [6]   12.5000  262.5000 2762.5000
   > z-mean(z[7:8])
   Time differences in days
   [1] -63.65740 -63.65729 -63.65625 -63.64583 -63.54167
   [6] -62.50000 -52.08333  52.08333

This is analogous to the print routines for numbers
choosing whether or not to use scientific notation
based on the range of the data, or choosing the number
of digits to print based on the number of digits required to
approximate all values in the vector to the desired
accuracy.  Once the format is chosen it is used for
all the values in the vector.  E.g.,
   > 111^(0:5)
   [1]           1         111       12321     1367631
   [5]   151807041 16850581551
   > 111^(0:6)
   [1] 1.000000e+00 1.110000e+02 1.232100e+04 1.367631e+06
   [5] 1.518070e+08 1.685058e+10 1.870415e+12
or
   > 1/c(1,2,4,8)
   [1] 1.000 0.500 0.250 0.125
   > 1/c(1,2,3,4)
   [1] 1.000 0.500 0.333 0.250
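The unit-dependence above is exactly why the original poster should call difftime() directly; a sketch with fixed times (times taken from the example data earlier in the thread):

```r
# difftime() with an explicit units= argument gives a predictable result,
# independent of the other values in the vector.
t1 <- as.POSIXct("2010-05-16 02:19:56", tz = "UTC")
t2 <- as.POSIXct("2010-05-16 03:46:35", tz = "UTC")

d_secs <- difftime(t2, t1, units = "secs")   # always seconds
d_mins <- difftime(t2, t1, units = "mins")   # always minutes
```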

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> Thanks,
> Jim
> 
> On 6/14/10 5:22 PM, Don MacQueen wrote:
> > See the help page for the difftime() function, which will 
> tell you how
> > to specify the units of the differences.
> > (when you don't specify, it chooses the units according to 
> some rules)
> >
> > -Don
> >
> > At 4:24 PM -0400 6/14/10, James Rome wrote:
> >> I have two dataframe columns of POXIXct data/times that 
> include seconds.
> >> I got them into this format using for example
> >> zsort$ETA <- as.POSIXct(as.character(zsort$ETA), format="%m/%d/%Y
> >> %H:%M:%S")
> >>
> >> My problem is that when I subtract the two columns, sometimes the
> >> difference is given in seconds, and sometimes it is given 
> in minutes. I
> >> don't care which it is, but I need to know which one I will get.
> >>
> >> DateTimeETA
> >> 2010-05-16 02:19:562010-05-16 03:46:35
> >> ...
> >> Browse[1]> mins = zsort$ETA - zsort$DateTime
> >> Browse[1]> mins
> >> Time differences in hours
> >>  [1] 1.444167 2.685000 3.077222 3.210278 3.248056 3.281944 3.281944
> >> 3.360278 3.360278 3.582778 4.570000 5.506111 5.857778 6.150278 6.150278
> >> 6.243056 6.243889 6.248056 6.248611 6.248611 6.356667
> >> attr(,"tzone")
> >>
> >> But sometimes the answer is in seconds.
> >> # make a column with the minutes before landing
> >>> zsort$MinBeforeLand = zsort$ETA - zsort$DateTime
> >>>  zsort$MinBeforeLand
> >> Time differences in secs
> >>  [1]   -50   136   221   878  1192  2263  3296  3959  4968  5846  8709
> >> 11537 12198 12442 12642 15952 18273 19952 20538
> >>
> >> How do I specify the resultant units?
> >>
> >> Thanks,
> >> Jim Rome
> >>
> >
> >
> 
> 



[R] shifted window of string

2010-06-14 Thread david hilton shanabrook
basically I need to create a sliding window in a string.  a way to explain this 
is:

> v <- 
> c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y")
> window <- 5
> shift <- 2

I want a matrix of characters with "window" columns filled with "v" by filling 
a row, then shifting over "shift" and continuing to the next row until "v" is 
exhausted.  You can assume "v" will evenly fit "m"

so the result needs to look like this matrix where each row is shifted 2 (in 
this case):

> m
  [,1] [,2] [,3] [,4] [,5]
 [1,] "a"  "b"  "c"  "d"  "e" 
 [2,] "c"  "d"  "e"  "f"  "g" 
 [3,] "e"  "f"  "g"  "h"  "i" 
 [4,] "g"  "h"  "i"  "j"  "k" 
 [5,] "i"  "j"  "k"  "l"  "m" 
 [6,] "k"  "l"  "m"  "n"  "o" 
 [7,] "m"  "n"  "o"  "p"  "q" 
 [8,] "o"  "p"  "q"  "r"  "s" 
 [9,] "q"  "r"  "s"  "t"  "u" 
[10,] "s"  "t"  "u"  "v"  "w" 
[11,] "t"  "u"  "v"  "w"  "x" 

This needs to be very efficient as my data is large, loops would be too slow.  
Any ideas?  It could also be done in a string and then put into the matrix but 
I don't think this would be easier.

I will want to put this in a function:

shiftedMatrix <- function(v, window=5, shift=2){...

return(m)}

thanks

dhs


Re: [R] executable script

2010-06-14 Thread skan

Maybe he wants to compile it to an exe file in order to make it faster.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/executable-script-tp839859p2255307.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] executable script

2010-06-14 Thread Cliff Clive

True, but all he said was that he wanted to auto-launch his program by
double-clicking it.

I don't know of any ways to speed up R other than to write the slower
functions in C and then call them in your R programs.  But I'm not
sure that's what he had in mind.



On Mon, Jun 14, 2010 at 9:19 PM, skan [via R]
 wrote:
> Maybe he wants to compile it to an exe file in order to make it faster.
>
> 
> View message @
> http://r.789695.n4.nabble.com/executable-script-tp839859p2255307.html
>

-- 
View this message in context: 
http://r.789695.n4.nabble.com/executable-script-tp839859p2255329.html
Sent from the R help mailing list archive at Nabble.com.



[R] aggregate vs merge

2010-06-14 Thread skan

Hello

Can someone explain to me the difference between aggregate and merge, please?
I've read the help on both commands but I don't understand the difference.

thanks
-- 
View this message in context: 
http://r.789695.n4.nabble.com/aggregate-vs-merge-tp2255300p2255300.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] NPMC

2010-06-14 Thread Peter Ehlers


On 2010-06-14 13:25, Moimt wrote:


HI,
I am a new user of R and want to analyse some data using npmc. My data have
several levels of factor (Site, Year and Season) and several variable
(Percentages).
I have tried to use npmc but I always get an error message. My data are in a
table following this example:

Site  Year  Season  Var1  Var2
A     2009  Dry     10    56
B   

here is the error message I get: "Erreur dans as.vector(x, mode) : argument
'mode' incorrect" (in English: "Error in as.vector(x, mode) : invalid 'mode'
argument").

How should I use the npmc formula?

Thank you for your help


What do you mean by "data are in a table"?
Is this a data.frame (as required by npmc)?

I think that npmc can only handle one numerical and
one categorical variable and they must be named "var"
and "class".

  -Peter Ehlers



Re: [R] error: subscript out of bounds?

2010-06-14 Thread David Winsemius


On Jun 14, 2010, at 6:48 PM, Yesha Patel wrote:


Hi all,

I want to get results for cox proportional hazards on SNP data. I'm  
trying
to get HRs, CI's, & p-values for each individual SNP - this is  
represented
by cov[,i]. When I run my code, I get the following error: subscript  
out of

bounds. I don't know why I am getting this error.

I have looked through the R logs, but nothing that has been previously
suggested has helped so far. What am I doing wrong?

DS3 = time variable, S3 = censor/event variable, cov = my dataset,  
which is

in a data.frame format:

*
CODE:

test<-NULL
bb=names(cov)
col = dim(cov)[2]
for(i in 19:col)
{
mod=coxph(Surv(cov$DS3,cov$S3)~cov[,i]+AGE+sex,data=cov)
aa<-summary(mod)


# Consider:
test <-rbind( test, c( nrow(aa$coef), nrow(aa$conf.int) ) )

#test<-rbind(test,c(bb[i],aa$coef[1,1],aa$coef[1,3],aa$coef[1,5],
#  aa$conf.int[1,1],aa$conf.int[1,3],aa$conf.int[1,4],
#  aa$coef[2,5],aa$coef[3,5],aa$coef[4,5]))
}

#
c(min(test[ , 1] ), min(test[ ,2] ) )


#write.table(test,"result.txt",quote=F)


# Compare to your indices.




Thanks!
-CC



David Winsemius, MD
West Hartford, CT



Re: [R] recursively Merging a list a zoo objects

2010-06-14 Thread Gabor Grothendieck
On Mon, Jun 14, 2010 at 6:13 PM, steven mosher  wrote:
> The zoo package as a merge function which merges a set of zoo objects
> result<-merge(zoo1,zoo2,...)
>
> Assume your zoo objects are already collected in a list
>
> # make a phony list to illustrate the situation. ( hat tip to david W for
> constructing a list in a loop)
>
> ddat <- as.list(rep("", 20))
> ytd <- seq(3, 14)
> for(i in 1:20) {
+   ddat[[i]] <- zoo(rnorm(length(ytd)), ytd)
+ }
>
>

merge.zoo can handle multiple zoo objects (not just two) so:

   do.call("merge", ddat)

This also works but is a bit tedious:

merge(ddat[[1]], ddat[[2]], ddat[[3]], ddat[[4]], ddat[[5]],
   ddat[[6]], ddat[[7]], ddat[[8]], ddat[[9]], ddat[[10]],
   ddat[[11]], ddat[[12]], ddat[[13]], ddat[[14]], ddat[[15]],
   ddat[[16]], ddat[[17]], ddat[[18]], ddat[[19]], ddat[[20]])

Also note that there is an example of using do.call with merge.zoo in
the examples section of ?merge.zoo



Re: [R] Subtracting POSIXct data/times

2010-06-14 Thread James Rome
That fixed it. Dumb me. I had assumed that the subtraction of the raw
POSIXcts would always give the same results.

Thanks,
Jim

On 6/14/10 5:22 PM, Don MacQueen wrote:
> See the help page for the difftime() function, which will tell you how
> to specify the units of the differences.
> (when you don't specify, it chooses the units according to some rules)
>
> -Don
>
> At 4:24 PM -0400 6/14/10, James Rome wrote:
>> I have two dataframe columns of POXIXct data/times that include seconds.
>> I got them into this format using for example
>> zsort$ETA <- as.POSIXct(as.character(zsort$ETA), format="%m/%d/%Y
>> %H:%M:%S")
>>
>> My problem is that when I subtract the two columns, sometimes the
>> difference is given in seconds, and sometimes it is given in minutes. I
>> don't care which it is, but I need to know which one I will get.
>>
>> DateTimeETA
>> 2010-05-16 02:19:562010-05-16 03:46:35
>> ...
>> Browse[1]> mins = zsort$ETA - zsort$DateTime
>> Browse[1]> mins
>> Time differences in hours
>>  [1] 1.444167 2.685000 3.077222 3.210278 3.248056 3.281944 3.281944
>> 3.360278 3.360278 3.582778 4.570000 5.506111 5.857778 6.150278 6.150278
>> 6.243056 6.243889 6.248056 6.248611 6.248611 6.356667
>> attr(,"tzone")
>>
>> But sometimes the answer is in seconds.
>> # make a column with the minutes before landing
>>> zsort$MinBeforeLand = zsort$ETA - zsort$DateTime
>>>  zsort$MinBeforeLand
>> Time differences in secs
>>  [1]   -50   136   221   878  1192  2263  3296  3959  4968  5846  8709
>> 11537 12198 12442 12642 15952 18273 19952 20538
>>
>> How do I specify the resultant units?
>>
>> Thanks,
>> Jim Rome
>>
>
>



Re: [R] error: subscript out of bounds?

2010-06-14 Thread jim holtman
You need to learn how to debug your program.  When the error occurs,
look at the value of the indices in the offending statement -- you
will find that the problem is subscript out of bounds.  Put the
following statement in your code:

 options(error=utils::recover)

and then do ?browser to learn how to use the browser for debugging.
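A minimal reproduction of the failure mode, plus a guard of the kind the loop needs (my own example, independent of the poster's data):

```r
# Indexing past a matrix dimension raises "subscript out of bounds".
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns

err <- tryCatch(m[3, 1], error = function(e) conditionMessage(e))

# Check the dimension before indexing instead of assuming it.
safe <- if (3 <= nrow(m)) m[3, 1] else NA
```

In the poster's loop the analogous check is that summary(mod)$coef really has the rows (e.g. row 4) that the rbind() line indexes.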

On Mon, Jun 14, 2010 at 6:48 PM, Yesha Patel  wrote:
> Hi all,
>
> I want to get results for cox proportional hazards on SNP data. I'm trying
> to get HRs, CI's, & p-values for each individual SNP - this is represented
> by cov[,i]. When I run my code, I get the following error: subscript out of
> bounds. I don't know why I am getting this error.
>
> I have looked through the R logs, but nothing that has been previously
> suggested has helped so far. What am I doing wrong?
>
> DS3 = time variable, S3 = censor/event variable, cov = my dataset, which is
> in a data.frame format:
>
> *
> CODE:
>
> test<-NULL
> bb=names(cov)
> col = dim(cov)[2]
> for(i in 19:col)
>  {
> mod=coxph(Surv(cov$DS3,cov$S3)~cov[,i]+AGE+sex,data=cov)
> aa<-summary(mod)
>  test<-rbind(test,c(bb[i],aa$coef[1,1],aa$coef[1,3],aa$coef[1,5],
>    aa$conf.int[1,1],aa$conf.int[1,3],aa$conf.int[1,4],
>    aa$coef[2,5],aa$coef[3,5],aa$coef[4,5]))
> }
> write.table(test,"result.txt",quote=F)
>
>
> Thanks!
> -CC
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] avoid row/column indexing in capture.output

2010-06-14 Thread David Winsemius


On Jun 14, 2010, at 6:46 PM, David Winsemius wrote:



On Jun 14, 2010, at 6:19 PM, Subodh Acharya wrote:


Hi everyone,
This might be a very petty thing but it's not working for me.
I want to export an output to a txt file but without indexing.


Unfortunately we cannot see the structure of "output" but from the  
result you are getting it looks like both "Time" and "output" might  
be numeric vectors of the same length, although the [,1] access  
looks wrong for a vector. Why don't you give us the result of  
str(output)?



Here is what I  have tried to do

outfile<- function(Time, var.names, output) {
var.names = c(names(para))
for(i in 1: ncol(output)){
 cat(length(var.names), '\n')
 cat("A", "S", "C", "I", '\n')
 cat("time =", "yes", '\n')
 cat("RUN", i, '\n')
 cat(length(Time), '\n')
#   print(cbind(Time, output[,i]))


# Perhaps?

cat( paste(Time, inp[ ,1], "\n"))  # since paste is "vectorized".


(I used a single column dataframe named inp to test what I thought was  
the structure of "output".)


--
David.


}
}
This works fine and
I get the output like as follows

4
A S  C I
time = yes
RUN 1
111
 [,1]  [,2]
[1,] 0.00 1.7500000
[2,] 0.05 0.3983540
[3,] 0.10 0.6010150
[4,] 0.15 0.6759570
[5,] 0.20 0.7165215
[6,] 0.25 0.7423391
[7,] 0.30 0.7603594
[8,] 0.35 0.7737155
[9,] 0.40 0.7840432
[10,] 0.45 0.7922850
.
.
But I need to have an output like this
4
A S  C I
time = yes
RUN 1
111
0.00 1.7500000
0.05 0.3983540
0.10 0.6010150
...


Any kind of help will be highly appreciated.

Thank you in advance

--
Acharya, Subodh



David Winsemius, MD
West Hartford, CT



David Winsemius, MD
West Hartford, CT



[R] error: subscript out of bounds?

2010-06-14 Thread Yesha Patel
Hi all,

I want to get results for cox proportional hazards on SNP data. I'm trying
to get HRs, CI's, & p-values for each individual SNP - this is represented
by cov[,i]. When I run my code, I get the following error: subscript out of
bounds. I don't know why I am getting this error.

I have looked through the R logs, but nothing that has been previously
suggested has helped so far. What am I doing wrong?

DS3 = time variable, S3 = censor/event variable, cov = my dataset, which is
in a data.frame format:

*
CODE:

test<-NULL
bb=names(cov)
col = dim(cov)[2]
for(i in 19:col)
 {
mod=coxph(Surv(cov$DS3,cov$S3)~cov[,i]+AGE+sex,data=cov)
aa<-summary(mod)
 test<-rbind(test,c(bb[i],aa$coef[1,1],aa$coef[1,3],aa$coef[1,5],
   aa$conf.int[1,1],aa$conf.int[1,3],aa$conf.int[1,4],
   aa$coef[2,5],aa$coef[3,5],aa$coef[4,5]))
}
write.table(test,"result.txt",quote=F)


Thanks!
-CC



Re: [R] avoid row/column indexing in capture.output

2010-06-14 Thread David Winsemius


On Jun 14, 2010, at 6:19 PM, Subodh Acharya wrote:


Hi everyone,
This might be a very petty thing but it's not working for me.
I want to export an output to a txt file but without indexing.


Unfortunately we cannot see the structure of "output" but from the  
result you are getting it looks like both "Time" and "output" might be  
numeric vectors of the same length, although the [,1] access looks  
wrong for a vector. Why don't you give us the result of str(output)?



Here is what I  have tried to do

outfile<- function(Time, var.names, output) {
 var.names = c(names(para))
for(i in 1: ncol(output)){
  cat(length(var.names), '\n')
  cat("A", "S", "C", "I", '\n')
  cat("time =", "yes", '\n')
  cat("RUN", i, '\n')
  cat(length(Time), '\n')
#   print(cbind(Time, output[,i]))


# Perhaps?

cat( paste(Time, inp[ ,1], "\n"))  # since paste is "vectorized".

--
David.


}
}
This works fine and
I get the output like as follows

4
A S  C I
time = yes
RUN 1
111
  [,1]  [,2]
 [1,] 0.00 1.7500000
 [2,] 0.05 0.3983540
 [3,] 0.10 0.6010150
 [4,] 0.15 0.6759570
 [5,] 0.20 0.7165215
 [6,] 0.25 0.7423391
 [7,] 0.30 0.7603594
 [8,] 0.35 0.7737155
 [9,] 0.40 0.7840432
[10,] 0.45 0.7922850
.
.
But I need to have an output like this
4
A S  C I
time = yes
RUN 1
111
0.00 1.7500000
0.05 0.3983540
0.10 0.6010150
...


Any kind of help will be highly appreciated.

Thank you in advance

--
Acharya, Subodh



David Winsemius, MD
West Hartford, CT
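The cat(paste(...)) suggestion above can be sketched end-to-end with a few of the poster's values (variable names here are my own):

```r
# Write time/value pairs without the [i,] / [,j] labels that print() adds.
Time <- c(0.00, 0.05, 0.10)
vals <- c(1.7500000, 0.3983540, 0.6010150)

lines <- paste(Time, vals)   # paste() is vectorized over both arguments
cat(lines, sep = "\n")       # plain "time value" rows, no matrix indexing
```

Redirecting to a file is then just cat(lines, sep = "\n", file = "out.txt").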



Re: [R] logistic regression with 50 varaibales

2010-06-14 Thread Charles C. Berry

On Mon, 14 Jun 2010, array chip wrote:

Thanks Charles for the reproducible code. I started this question
because I was asked to take a look at such a dataset, but I have doubts
about whether it's meaningful to do a LR with 50 variables. I haven't
got the dataset yet, so I have not tried any code. But thanks again for
sharing the simulation code.


I have one question about your code. When you calculate common standard
errors for the coefficients using sd(coef(fit)), should you exclude the
intercept by doing sd(coef(fit)[-1])? Actually, after removing the
intercept, the standard error calculated this way is very similar to the
one reported by colMeans(coef(summary(fit))), for both scenarios in your
example (coef = 0.14 or 1.0).


Of course! I slipped up on that one!



Another question: the 50 simulated variables have very low
correlations (ranging from -0.1 to 0.08), which may contribute to the
stable model. If some (not all) of the 50 variables had considerable
correlation, like 0.7 or 0.8, how would the LR model behave?




Why not try it out and see?

The mvtnorm package has a function for generating correlated random 
variates.


HTH,

Chuck
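A sketch of the suggested experiment (assumes the mvtnorm package is installed; the dimensions and 0.7 correlation are my own choices, following the earlier simulation in this thread):

```r
# Correlated covariates via rmvnorm(), then compare the empirical spread
# of the slope estimates to the asymptotic standard errors glm() reports.
library(mvtnorm)

set.seed(54321)
p <- 10
Sigma <- matrix(0.7, p, p)
diag(Sigma) <- 1
X <- rmvnorm(1800, sigma = Sigma)

dat <- as.data.frame(X)
dat$y <- rbinom(1800, 1, plogis(rowSums(X) / 7))
fit <- glm(y ~ ., dat, family = binomial)

sd(coef(fit)[-1])                            # empirical spread of slopes
mean(coef(summary(fit))[-1, "Std. Error"])   # average reported std. error
```

If correlation inflates the variance of the estimates, the first number should noticeably exceed the second.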


Thanks

John



- Original Message 
From: Charles C. Berry 
To: Joris Meys 
Cc: r-help@r-project.org; Marc Schwartz 
Sent: Mon, June 14, 2010 8:32:02 AM
Subject: Re: [R] logistic regression with 50 varaibales

On Mon, 14 Jun 2010, Joris Meys wrote:


Hi,

Marcs explanation is valid to a certain extent, but I don't agree with
his conclusion. I'd like to point out "the curse of
dimensionality"(Hughes effect) which starts to play rather quickly.



Ahem!

... minimal, self-contained, reproducible code ...

The OPs situation may not be an impossible one:


set.seed(54321)
dat <- as.data.frame(matrix(rnorm(1800*50),nc=50))
dat$y <- rbinom(1800,1,plogis(rowSums(dat)/7))
fit <- glm(y~., dat, family=binomial)
1/7 # the true coef

[1] 0.1428571

sd(coef(fit)) # roughly, the common standard error

[1] 0.05605597

colMeans(coef(summary(fit))) # what glm() got

    Estimate  Std. Error     z value    Pr(>|z|)
  0.14590836  0.05380666  2.71067328  0.06354820

# a trickier case:
set.seed(54321)
dat <- as.data.frame(matrix(rnorm(1800*50),nc=50))
dat$y <- rbinom(1800,1,plogis(rowSums(dat))) # try again with coef==1
fit <- glm(y~., dat, family=binomial)
colMeans(coef(summary(fit)))

     Estimate   Std. Error      z value     Pr(>|z|)
  0.982944012  0.119063631  8.222138491  0.008458002




Finer examination of the latter fit will show some values that differ too
far from 1.0 to agree with the asymptotic std err.


sd(coef(fit)) # rather bigger than 0.119

[1] 0.1827462

range(coef(fit))

[1] -0.08128586  1.25797057

And near separability may be playing here:


cbind(

+ table(
+ cut(
+ plogis(abs(predict(fit))),
+ c( 0, 0.9, 0.99, 0.999, 0.9999, 0.99999, 1 ) ) ) )
                 [,1]
(0,0.9]           453
(0.9,0.99]        427
(0.99,0.999]      313
(0.999,0.9999]    251
(0.9999,0.99999]  173
(0.99999,1]       183




Recall that the observed information contains a factor of

   plogis( predict(fit) )* plogis( -predict(fit))


hist(plogis( predict(fit) )* plogis( -predict(fit)))


So the effective sample size here was much reduced.

But to the OP's question, whether what you get is reasonable depends on
what the setup is.

I wouldn't call the first of the above cases 'highly unstable'.

Which is not to say that one cannot generate difficult cases (esp. with
correlated covariates and/or one or more highly influential covariates)
and that the OP's case is not one of them.


HTH,

Chuck


The curse of dimensionality is easily demonstrated looking at the
proximity between your datapoints. Say we scale the interval in one
dimension to be 1 unit. If you have 20 evenly-spaced observations, the
distance between the observations is 0.05 units. To have a proximity
like that in a 2-dimensional space, you need 20^2=400 observations. in
a 10 dimensional space this becomes 20^10 ~ 10^13 datapoints. The
distance between your observations is important, as a sparse dataset
will definitely make your model misbehave.

Even with about 35 samples per variable, using 50 independent
variables will render a highly unstable model, as your dataspace is
about as sparse as it can get. On top of that, interpreting a model
with 50 variables is close to impossible, and then I didn't even start
on interactions. No point in trying I'd say. If you really need all
that information, you might want to take a look at some dimension
reduction methods first.

Cheers
Joris

On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz  wrote:

On Jun 13, 2010, at 10:20 PM, array chip wrote:


Hi, this is not an R technical question per se. I know there are many excellent 
statisticians in this list, so here my questions: I have dataset with ~1800 
observations and 50 independent variables, so there are about 35 samples per 
variable. Is it wise to build a stable multiple logistic model with 50 
independent variables? Any problem with this approach? Thanks

John



T

[R] avoid row/column indexing in capture.output

2010-06-14 Thread Subodh Acharya
Hi everyone,
This might be a very petty thing but it's not working for me.
 I want to export an output to a txt file but without the row/column indexing.
Here is what I have tried to do:

outfile<- function(Time, var.names, output) {
  var.names = c(names(para))
 for(i in 1: ncol(output)){
   cat(length(var.names), '\n')
   cat("A", "S", "C", "I", '\n')
   cat("time =", "yes", '\n')
   cat("RUN", i, '\n')
   cat(length(Time), '\n')
print(cbind(Time, output[,i]))
 }
 }
This works fine, and
I get output like the following:

4
A S  C I
time = yes
RUN 1
111
   [,1]  [,2]
  [1,] 0.00 1.750
  [2,] 0.05 0.3983540
  [3,] 0.10 0.6010150
  [4,] 0.15 0.6759570
  [5,] 0.20 0.7165215
  [6,] 0.25 0.7423391
  [7,] 0.30 0.7603594
  [8,] 0.35 0.7737155
  [9,] 0.40 0.7840432
 [10,] 0.45 0.7922850
.
.
But I need need to have an output like this
4
A S  C I
time = yes
RUN 1
111
0.00 1.750
0.05 0.3983540
0.10 0.6010150
...
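
One way to get that layout (a sketch, untested against the OP's full data): write.table() with row and column names suppressed prints a matrix without the [i,] / [,j] indexing that print() adds.

```r
outfile <- function(Time, var.names, output) {
  for (i in 1:ncol(output)) {
    cat(length(var.names), '\n')
    cat("A", "S", "C", "I", '\n')
    cat("time =", "yes", '\n')
    cat("RUN", i, '\n')
    cat(length(Time), '\n')
    # write.table() to stdout, without dimnames, instead of print(cbind(...))
    write.table(cbind(Time, output[, i]),
                row.names = FALSE, col.names = FALSE)
  }
}
```

Redirecting the whole call through sink() or capture.output() then writes the same indexing-free text to the txt file.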


Any kind of help will be highly appreciated.

Thank you in advance

-- 
Acharya, Subodh

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] recursively Merging a list a zoo objects

2010-06-14 Thread steven mosher
The zoo package as a merge function which merges a set of zoo objects
result<-merge(zoo1,zoo2,...)

Assume your zoo objects are already collected in a list

# make a phony list to illustrate the situation. ( hat tip to david W for
constructing a list in a loop)

ddat <- as.list(rep("", 20))
 ytd<-seq(3,14)
 for(i in 1:20) {
+ ddat[[i]] <- zoo(data,ytd )
+ }


ddat
[[1]]
 1  2  3  4  5  6  7  8  9 10 11 12
 3  4  5  6  7  8  9 10 11 12 13 14

[[2]]
 1  2  3  4  5  6  7  8  9 10 11 12
 3  4  5  6  7  8  9 10 11 12 13 14

[[3]]
 1  2  3  4  5  6  7  8  9 10 11 12
 3  4  5  6  7  8  9 10 11 12 13 14



So given a list of zoo objects, if you want to create a merged  zoo object
is there a way to do that.

For example, merging the first two gives you this

result<-merge(ddat[[1]],ddat[[2]])
> result
   ddat[[1]] ddat[[2]]
3          3         3
4          4         4
5          5         5
6          6         6
7          7         7
8          8         8
9          9         9
10        10        10
11        11        11
12        12        12
13        13        13
14        14        14

And..

result<-merge(result,ddat[[3]])
> result
   ddat[[1]] ddat[[2]] ddat[[3]]
3          3         3         3
4          4         4         4
5          5         5         5
6          6         6         6
7          7         7         7
8          8         8         8
9          9         9         9
10        10        10        10
11        11        11        11
12        12        12        12
13        13        13        13
14        14        14        14

I'm thinking that either do.call or lapply should do the trick, or a
combination of the two. It would appear to be a straightforward
recursion, applying merge to the result of merge; the looping solution is
straightforward.
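
Both ideas work here; merge.zoo() accepts any number of zoo objects, so the whole list can be passed in one do.call() (a sketch using the ddat list built above):

```r
library(zoo)

# one call: equivalent to merge(ddat[[1]], ddat[[2]], ..., ddat[[20]])
merged <- do.call(merge, ddat)

# or the explicit pairwise recursion described above
merged2 <- Reduce(merge, ddat)
```

do.call() is preferable to Reduce() here since merge.zoo aligns all the indexes in a single pass instead of building 19 intermediate objects.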

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Html help

2010-06-14 Thread Murray Jorgensen
Thanks very much, Duncan. I understand this better now. It takes a bit 
of getting used to but the prospect of some day getting graphic elements 
to help should make it worthwhile. (And I guess it saves on disk space 
by only generating the help pages as needed.)


Cheers,  Murray

On 15/06/2010 4:13 a.m., Duncan Murdoch wrote:

Murray Jorgensen wrote:

I have just installed R 2.11.1 on my XP laptop.

I like html help

[...]

How do I go about getting a local set of html help files?



Since 2.10.0, HTML help is generated on demand. It doesn't go off your
local computer, it works locally. This saves a bit of space (the HTML is
generated from the same source as the text is generated from), but the
main point is that it allows help pages to contain dynamic content. For
example, Romain Francois posted some demo code a while ago to allow the
display of graphics generated by R within help pages. (Unfortunately it
depended on a particular browser feature not supported by Internet
Explorer, so I'm going to need to put together something less elegant,
but that's life.)

Duncan Murdoch


--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: m...@waikato.ac.nz    Fax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441   Mobile 021 0200 8350

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Times series data file?

2010-06-14 Thread David Winsemius


On Jun 14, 2010, at 4:45 PM, Douglas M. Hultstrand wrote:


Hello,

I am currently splitting a file into individual files (one time series  
per file); the code I use skips the first four lines and extracts the  
data columns I need.  I was wondering if there is a way for R to  
automatically scan and separate the files based on the header  
information?


The header delineates the observation with a 254 space time space  
day space month space year.

254  0 26  NOV1995

I would like to then use the program I have written to do the  
analysis.  I have attached a small subset text file of data (two  
observations).  Any thoughts would be very helpful.


# Read in data file
data <- read.table("file.txt", skip=4, header=F)
temp <- data$V4
elev <- data$V3


Yes:

> ctxt <- textConnection("\n\n\n\n254  0 26  NOV1995\n\n\n")
# simulating a file with four lines to be skipped and a fifth line to
# be read

> dat <- read.table(file=ctxt, skip=4, header=F, nrows=1)
> str(dat)
'data.frame':   1 obs. of  5 variables:
 $ V1: int 254
 $ V2: int 0
 $ V3: int 26
 $ V4: Factor w/ 1 level "NOV": 1
 $ V5: int 1995

If you wanted the months to be character variables then use as.is=TRUE  
or stringsAsFactors=FALSE to control the default creation of factors.
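
To split the whole file on those 254 header lines rather than reading a fixed number of rows at a time, one could group lines by the running count of headers (a sketch; it assumes every observation starts with a line whose first field is 254, and the fill= argument papers over the ragged field counts in the sample):

```r
txt <- readLines("file.txt")
hdr <- grepl("^\\s*254\\s", txt)        # TRUE on each '254 <time> <day> ...' line
obs <- split(txt, cumsum(hdr))          # one character vector per observation
# parse each block; b[1] is the header line itself
dats <- lapply(obs, function(b)
  read.table(textConnection(b[-1]), fill = TRUE))
```

Each element of dats can then be fed to the existing analysis code, with the header fields recoverable from b[1] if the date/time is needed.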





Thank you,
Doug

--
-
Douglas M. Hultstrand, MS
Senior Hydrometeorologist
Metstat, Inc. Windsor, Colorado
voice: 720.771.5840
email: dmhul...@metstat.com
web: http://www.metstat.com
-

   254  0 26  NOV1995
 1  24232  72694  44.92N123.02W61   2302
 2100   1960   2680149  9  4
 3   SLE9 kt
 9  10071 61106 94180 10
 5  10037 89110 82183 12
 4  1120106 84186 12
 6   9780304  9  9205 19
 6   9500544 74 59221 19
   254 12 26  NOV1995
 1  24232  72694  44.92N123.02W61   2302
 2100   1960   2680149  9  4
 3   SLE9 kt
 9  10071 61106 94180 10
 5  10037 89110 82183 12
 4  1120106 84186 12
 6   9780304  9  9205 19
 6   9500544 74 59221 19
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subtracting POSIXct data/times

2010-06-14 Thread Don MacQueen
See the help page for the difftime() function, which will tell you 
how to specify the units of the differences.

(when you don't specify, it chooses the units according to some rules)
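
For example, using the OP's column names (a sketch), the unit can be fixed either at creation or afterwards:

```r
# ask for a specific unit up front ...
mins <- difftime(zsort$ETA, zsort$DateTime, units = "mins")

# ... or change the unit on an existing difftime object
d <- zsort$ETA - zsort$DateTime
units(d) <- "secs"
as.numeric(d)   # plain numbers once the unit is pinned down
```

as.numeric() at the end is useful whenever the values feed into arithmetic that should not carry the difftime class along.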

-Don

At 4:24 PM -0400 6/14/10, James Rome wrote:

I have two dataframe columns of POSIXct date/times that include seconds.
I got them into this format using for example
zsort$ETA <- as.POSIXct(as.character(zsort$ETA), format="%m/%d/%Y %H:%M:%S")

My problem is that when I subtract the two columns, sometimes the
difference is given in seconds, and sometimes it is given in minutes. I
don't care which it is, but I need to know which one I will get.

DateTimeETA
2010-05-16 02:19:56 2010-05-16 03:46:35
...
Browse[1]> mins = zsort$ETA - zsort$DateTime
Browse[1]> mins
Time differences in hours
 [1] 1.444167 2.685000 3.077222 3.210278 3.248056 3.281944 3.281944
3.360278 3.360278 3.582778 4.57 5.506111 5.857778 6.150278 6.150278
6.243056 6.243889 6.248056 6.248611 6.248611 6.356667
attr(,"tzone")

But sometimes the answer is in seconds.
# make a column with the minutes before landing

zsort$MinBeforeLand = zsort$ETA - zsort$DateTime
 zsort$MinBeforeLand

Time differences in secs
 [1]   -50   136   221   878  1192  2263  3296  3959  4968  5846  8709
11537 12198 12442 12642 15952 18273 19952 20538

How do I specify the resultant units?

Thanks,
Jim Rome

__
R-help@r-project.org mailing list
https://*stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://*www.*R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] NPMC

2010-06-14 Thread Moimt

HI,
I am a new user of R and want to analyse some data using npmc. My data have
several factors (Site, Year and Season) and several variables
(percentages).
I have tried to use npmc but I always get an error message. My data are in a
table following this example:

SiteYEarSeasonVar1 Var2   
A   2009Dry  1056
B   

here is the error message I get: "Erreur dans as.vector(x, mode) : argument
'mode' incorrect" (in English: "Error in as.vector(x, mode): invalid 'mode' argument").

How should I use the npmc formula?
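
If memory serves (an assumption worth checking against the package help page), npmc() expects a data frame with exactly two columns literally named 'var' and 'class', and it handles one grouping factor at a time, so Site/Year/Season would have to be combined or analysed separately. A sketch, with the column choices hypothetical:

```r
library(npmc)

# npmc() wants columns named exactly 'var' and 'class';
# interaction() crosses several factors into one grouping factor
dat <- data.frame(var   = mydata$Var1,
                  class = interaction(mydata$Site, mydata$Season))
summary(npmc(dat))
```

Passing the raw multi-factor table directly is the likely source of the as.vector() error, since npmc() cannot find the columns it expects.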

Thank you for your help
-- 
View this message in context: 
http://r.789695.n4.nabble.com/NPMC-tp2254913p2254913.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] 3D ncdf file

2010-06-14 Thread ecvetano

Hello,

I have an ncdf file with different variables for dimensions and dates,  
where the dimensions are as follows, with X=i and Y=j creating an 88  
by 188 set of cells. For each cell there are 12 readings of DO, taken  
at 2-hour intervals and recorded according to the Julian calendar  
under the ordinal dates variable.


DO: [2280x188x85]
DX: [188x85]
DY: [188x85]
X: [85x1]
Y: [188x1]
Ordinal_Dates: [2280x1]
T: [2280x1]


So far, I have set up each variable as follows, and I don't know how  
to work with the 3D DO variable, which contains 12 values per day


setwd ("C:/Documents and Settings/Desktop/R")
library(ncdf)
profile1 <- open.ncdf("2005r1_v3_Ct=500_lerie_out_bot.nc", header=FALSE)

name1<-get.var.ncdf( profile1, "DO")
name2<-get.var.ncdf( profile1, "DX")
name3<-get.var.ncdf( profile1, "DY")
name4<-get.var.ncdf( profile1, "X")
name5<-get.var.ncdf( profile1, "Y")
name6<-get.var.ncdf( profile1, "Ordinal_Dates")
name7<-get.var.ncdf( profile1, "T")

I want to set a threshold, and calculate how many days have an average  
of DO greater than my threshold, and also what the daily averages of  
DO are.
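
A sketch of one way to get daily means and threshold counts, assuming time is the first dimension of DO (name1) and each day's 12 readings are stored consecutively; the threshold value is hypothetical:

```r
nday <- dim(name1)[1] / 12                # 2280 readings / 12 per day = 190 days
day  <- rep(seq_len(nday), each = 12)

# daily mean DO for every cell: result is [nday x 188 x 85]
daily <- apply(name1, c(2, 3), function(ts) tapply(ts, day, mean))

thresh <- 5                               # hypothetical threshold
days.above <- apply(daily > thresh, c(2, 3), sum)   # per-cell count of days
```

If the readings are not in strict 12-per-day blocks, the day index should instead be derived from the Ordinal_Dates variable (name6), e.g. day <- floor(name6).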


Any help is appreciated!
Emilija

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to make a barplot similar to Excel’s “clustered column chart”.

2010-06-14 Thread Hrishi Mittal

Josef,

I think all you need to do is use the transpose of your data matrix. So if
your dataset is called mydata:

barplot(t(as.matrix(mydata)), beside=T)

-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-make-a-barplot-similar-to-Excel-s-clustered-column-chart-tp2254979p2255008.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Times series data file?

2010-06-14 Thread Douglas M. Hultstrand

Hello,

I am currently splitting a file into individual files (one time series 
per file); the code I use skips the first four lines and extracts the 
data columns I need.  I was wondering if there is a way for R to 
automatically scan and separate the files based on the header information?


The header delineates the observation with a 254 space time space day 
space month space year.

254  0 26  NOV1995

I would like to then use the program I have written to do the analysis.  
I have attached a small subset text file of data (two observations).  
Any thoughts would be very helpful.


# Read in data file
data <- read.table("file.txt", skip=4, header=F)
temp <- data$V4
elev <- data$V3

Thank you,
Doug

--
-
Douglas M. Hultstrand, MS
Senior Hydrometeorologist
Metstat, Inc. Windsor, Colorado
voice: 720.771.5840
email: dmhul...@metstat.com
web: http://www.metstat.com
-

254  0 26  NOV1995
  1  24232  72694  44.92N123.02W61   2302
  2100   1960   2680149  9  4
  3   SLE9 kt
  9  10071 61106 94180 10
  5  10037 89110 82183 12
  4  1120106 84186 12
  6   9780304  9  9205 19
  6   9500544 74 59221 19
254 12 26  NOV1995
  1  24232  72694  44.92N123.02W61   2302
  2100   1960   2680149  9  4
  3   SLE9 kt
  9  10071 61106 94180 10
  5  10037 89110 82183 12
  4  1120106 84186 12
  6   9780304  9  9205 19
  6   9500544 74 59221 19
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to make a barplot similar to Excel’s “clustered column chart”.

2010-06-14 Thread Peter Dalgaard
josef.kar...@phila.gov wrote:
> I have a matrix with 12 rows (one for each month), 2 columns (baseflow, 
> runoff). I would like to make a barplot similar to Excel’s “clustered 
> column chart”. 
>  Here is my matrix ‘x’
>  
> 8.258754    13.300710
> 10.180953   10.760465
> 11.012184   13.954887
> 10.910870   13.839839
> 9.023519    11.511129
> 7.189241    12.519830
> 5.925576    17.101491
> 5.211613    13.585175
> 5.039592    13.506304
> 4.462325     9.963006
> 5.586521    11.306202
> 7.739242    14.669374
> 
> If I use barplot(x, beside=T), column 1 appears on the left side of the 
> plot, and then column 2 appears on the right side of the plot.  I would 
> rather that they alternate, so that for month 1, I see 2 bars – column 1, 
> then column 2.  And so forth for months 2-12. Is there a simple way to do 
> this?  (In excel this is the chart option for “clustered column”)

Can't you just transpose x? barplot(t(x), beside=T)

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Zero counts lost with table() in functions

2010-06-14 Thread Erik Iverson



Ottar Kvindesland wrote:

Hi,

I am collecting replies from a survey and count the replies with the table()
function. The function below takes two
data frames and counts the observations of the findings in the first
parameter vector given the value of the second, as shown in the code below.

My trouble is that the vector kp_vec at the end of the inner loop seems to
ignore the value when the
table() counts zero occurrences of one of the outcomes. I will have

y n
34   0

This is not picked up in the matrix row after the loops with something like

 "y"   "n" "y"
"Funding"   "Survival"  12 5   34

where the last "n" value is missing. This causes my returned data frame to
fail and in all, rather miserable for the plot.

I see the point of this in a way, so I believe it is not a bug. I'd love to
get my zero back. Is it a subtle point in R I have missed?


kpi_test <- function ( paramvec, kpivec ) {
kp_vec <- c()
res_kpi_y <- c()
res_kpi_n <- c()

tmp_param <- c()
tmp_kpi <- c()

for(param_no in seq(from=1, to=length(paramvec), by = 1)) {
tmp_param <- paramvec[param_no]
for (kpi_no in seq(from=1, to=length(kpivec), by = 1)) {
tmp_kpi <- kpivec[kpi_no]
res_kpi_y <- table( tmp_param [ tmp_kpi == 'y' ] )
res_kpi_n <- table( tmp_param [ tmp_kpi == 'n' ] )
kp_vec <- c(kp_vec, names(tmp_param), names(tmp_kpi), res_kpi_y, res_kpi_n )
}
}
matrix_vector <- matrix(kp_vec, ncol=6, byrow=T)
fres <- data.frame(matrix_vector)
return( fres )
}


Is it possible to provide test data that shows how your function is not 
behaving as you'd like?  It's probable that your function can be reduced 
to a line or two of R code, while solving your problem, and become much 
more readable if we know what it's supposed to do.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Zero counts lost with table() in functions

2010-06-14 Thread Peter Dalgaard
Ottar Kvindesland wrote:
> Hi,
> 
> I am collecting replies from a survey and counts replies by the table()
> function. The function below carries two
> data frames and counts the observations of the findings in the first
> parameter vector given the value of the second as shown in the code below.
> 
> My trouble is that the vector kp_vec at the end of the inner loop seems to
> ignore the value when the
> table() counts zero occurrences of one of the outcomes. I will have
> 
> y n
> 34   0
> 
> This is not picked up in the matrix row after the loops with something like
> 
>  "y"   "n" "y"
> "Funding"   "Survival"  12 5   34
> 
> where the last "n" value is missing. This causes my returned data frame to
> fail and in all, rather miserable for the plot.
> 
> I see the point of this in a way, so I believe it is not a bug. I'd love to
> get my zero back. Is it a subtle point in R I have missed?

I don't know about subtle, but the general idea is that you can only
expect to get a zero count if the set of possible values is known in
advance to include that value. Otherwise there's just no end to the
number of zeros to include! This can be obtained by making the object
you tabulate into a factor with the relevant level set:

> table("y")

y
1
> table(factor("y",levels=c("y","n")))

y n
1 0


Incidentally, this is the main reason R doesn't by default drop unused
levels when subsetting.

I can't make heads or tails of your code, but I hope the above helps
solve your issue.

-pd

> 
> 
> kpi_test <- function ( paramvec, kpivec ) {
> kp_vec <- c()
> res_kpi_y <- c()
> res_kpi_n <- c()
> 
> tmp_param <- c()
> tmp_kpi <- c()
> 
> for(param_no in seq(from=1, to=length(paramvec), by = 1)) {
> tmp_param <- paramvec[param_no]
> for (kpi_no in seq(from=1, to=length(kpivec), by = 1)) {
> tmp_kpi <- kpivec[kpi_no]
> res_kpi_y <- table( tmp_param [ tmp_kpi == 'y' ] )
> res_kpi_n <- table( tmp_param [ tmp_kpi == 'n' ] )
> kp_vec <- c(kp_vec, names(tmp_param), names(tmp_kpi), res_kpi_y, res_kpi_n )
> }
> }
> matrix_vector <- matrix(kp_vec, ncol=6, byrow=T)
> fres <- data.frame(matrix_vector)
> return( fres )
> }
> 
> 
> ottar
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Subtracting POSIXct data/times

2010-06-14 Thread James Rome
I have two dataframe columns of POSIXct date/times that include seconds.
I got them into this format using for example
zsort$ETA <- as.POSIXct(as.character(zsort$ETA), format="%m/%d/%Y %H:%M:%S")

My problem is that when I subtract the two columns, sometimes the
difference is given in seconds, and sometimes it is given in minutes. I
don't care which it is, but I need to know which one I will get.

DateTimeETA 
2010-05-16 02:19:56 2010-05-16 03:46:35 
...
Browse[1]> mins = zsort$ETA - zsort$DateTime
Browse[1]> mins
Time differences in hours
 [1] 1.444167 2.685000 3.077222 3.210278 3.248056 3.281944 3.281944
3.360278 3.360278 3.582778 4.57 5.506111 5.857778 6.150278 6.150278
6.243056 6.243889 6.248056 6.248611 6.248611 6.356667
attr(,"tzone")

But sometimes the answer is in seconds.
# make a column with the minutes before landing
>zsort$MinBeforeLand = zsort$ETA - zsort$DateTime
> zsort$MinBeforeLand
Time differences in secs
 [1]   -50   136   221   878  1192  2263  3296  3959  4968  5846  8709
11537 12198 12442 12642 15952 18273 19952 20538

How do I specify the resultant units?

Thanks,
Jim Rome

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how to make a barplot similar to Excel’s “clustered column chart”.

2010-06-14 Thread Josef . Kardos
I have a matrix with 12 rows (one for each month), 2 columns (baseflow, 
runoff). I would like to make a barplot similar to Excel’s “clustered 
column chart”. 
 Here is my matrix ‘x’
 
8.258754    13.300710
10.180953   10.760465
11.012184   13.954887
10.910870   13.839839
9.023519    11.511129
7.189241    12.519830
5.925576    17.101491
5.211613    13.585175
5.039592    13.506304
4.462325     9.963006
5.586521    11.306202
7.739242    14.669374

If I use barplot(x, beside=T), column 1 appears on the left side of the 
plot, and then column 2 appears on the right side of the plot.  I would 
rather that they alternate, so that for month 1, I see 2 bars – column 1, 
then column 2.  And so forth for months 2-12. Is there a simple way to do 
this?  (In excel this is the chart option for “clustered column”)



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Zero counts lost with table() in functions

2010-06-14 Thread Ottar Kvindesland
Hi,

I am collecting replies from a survey and count the replies with the table()
function. The function below takes two
data frames and counts the observations of the findings in the first
parameter vector given the value of the second, as shown in the code below.

My trouble is that the vector kp_vec at the end of the inner loop seems to
ignore the value when the
table() counts zero occurrences of one of the outcomes. I will have

y n
34   0

This is not picked up in the matrix row after the loops with something like

 "y"   "n" "y"
"Funding"   "Survival"  12 5   34

where the last "n" value is missing. This causes my returned data frame to
fail and in all, rather miserable for the plot.

I see the point of this in a way, so I believe it is not a bug. I'd love to
get my zero back. Is it a subtle point in R I have missed?


kpi_test <- function ( paramvec, kpivec ) {
kp_vec <- c()
res_kpi_y <- c()
res_kpi_n <- c()

tmp_param <- c()
tmp_kpi <- c()

for(param_no in seq(from=1, to=length(paramvec), by = 1)) {
tmp_param <- paramvec[param_no]
for (kpi_no in seq(from=1, to=length(kpivec), by = 1)) {
tmp_kpi <- kpivec[kpi_no]
res_kpi_y <- table( tmp_param [ tmp_kpi == 'y' ] )
res_kpi_n <- table( tmp_param [ tmp_kpi == 'n' ] )
kp_vec <- c(kp_vec, names(tmp_param), names(tmp_kpi), res_kpi_y, res_kpi_n )
}
}
matrix_vector <- matrix(kp_vec, ncol=6, byrow=T)
fres <- data.frame(matrix_vector)
return( fres )
}


ottar

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] installing RExcel package

2010-06-14 Thread Richard M. Heiberger
You can normally get through the firewall by using
the internet2 option.
Use ??internet for the exact function name.  I am not at my computer now so I 
can't check for you.
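
On Windows the relevant call is setInternet2() from the utils package (a sketch; this makes R use Internet Explorer's internet routines, and whether the download then succeeds depends on the local proxy setup):

```r
setInternet2(TRUE)    # switch R to the IE/"internet2" download routines
# then retry the failing install, e.g.:
install.packages("RExcelInstaller")
```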

Rich
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] installing RExcel package

2010-06-14 Thread Cable, Samuel B Civ USAF AFMC AFRL/RVBXI
Was wondering if anyone has any experience installing the RExcel package
by hand.  I think I have all the files needed, but our firewall here
prevents RExcelInstaller from going through the internet to get them
like it "wants" to do, and it just gives up.  Any ideas?  Thanks.

--Sam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression with 50 variables

2010-06-14 Thread array chip
Thanks Charles for the reproducible code. I started this question because I 
was asked to take a look at such a dataset, but I doubt whether it's meaningful 
to do a LR with 50 variables. I haven't got the dataset yet, thus have not 
tried any code. But thanks again for sharing the simulation code.

I have one question for your code. When you calculate common standard errors for 
coefficients using sd(coef(fit)), should you exclude the intercept by doing 
sd(coef(fit)[-1])? Actually after removing the intercept, the standard error 
calculated this way is very similar to the one reported from 
colMeans(coef(summary(fit))), for both scenarios in your example (coef = 0.14 
or 1.0)

Another question is the 50 simulated variables have very low correlations 
(ranging from -0.1 to 0.08), that may contribute to the stable model. If some 
(not all) of the 50 variable have considerable correlation, like 0.7 or 0.8 
correlation, How would LR model behave?

Thanks

John



- Original Message 
From: Charles C. Berry 
To: Joris Meys 
Cc: r-help@r-project.org; Marc Schwartz 
Sent: Mon, June 14, 2010 8:32:02 AM
Subject: Re: [R] logistic regression with 50 variables

On Mon, 14 Jun 2010, Joris Meys wrote:

> Hi,
>
> Marc's explanation is valid to a certain extent, but I don't agree with
> his conclusion. I'd like to point out "the curse of
> dimensionality" (Hughes effect), which starts to play rather quickly.
>

Ahem!

... minimal, self-contained, reproducible code ...

The OP's situation may not be an impossible one:

> set.seed(54321)
> dat <- as.data.frame(matrix(rnorm(1800*50),nc=50))
> dat$y <- rbinom(1800,1,plogis(rowSums(dat)/7))
> fit <- glm(y~., dat, family=binomial)
> 1/7 # the true coef
[1] 0.1428571
> sd(coef(fit)) # roughly, the common standard error
[1] 0.05605597
> colMeans(coef(summary(fit))) # what glm() got
   Estimate Std. Error    z value   Pr(>|z|)
0.14590836 0.05380666 2.71067328 0.06354820
> # a trickier case:
> set.seed(54321)
> dat <- as.data.frame(matrix(rnorm(1800*50),nc=50))
> dat$y <- rbinom(1800,1,plogis(rowSums(dat))) # try again with coef==1
> fit <- glm(y~., dat, family=binomial)
> colMeans(coef(summary(fit)))
   Estimate   Std. Error      z value     Pr(>|z|)
0.982944012 0.119063631 8.222138491 0.008458002
>

Finer examination of the latter fit will show some values that differ too 
far from 1.0 to agree with the asymptotic std err.

> sd(coef(fit)) # rather bigger than 0.119
[1] 0.1827462
> range(coef(fit))
[1] -0.08128586  1.25797057

And near separability may be playing here:

> cbind(
+ table(
+ cut(
+ plogis(abs(predict(fit))),
+ c( 0, 0.9, 0.99, 0.999, 0.9999, 0.99999, 1 ) ) ) )
                 [,1]
(0,0.9]           453
(0.9,0.99]        427
(0.99,0.999]      313
(0.999,0.9999]    251
(0.9999,0.99999]  173
(0.99999,1]       183
>

Recall that the observed information contains a factor of

plogis( predict(fit) )* plogis( -predict(fit))

> hist(plogis( predict(fit) )* plogis( -predict(fit)))

So the effective sample size here was much reduced.

But to the OP's question, whether what you get is reasonable depends on 
what the setup is.

I wouldn't call the first of the above cases 'highly unstable'.

Which is not to say that one cannot generate difficult cases (esp. with 
correlated covariates and/or one or more highly influential covariates) 
and that the OP's case is not one of them.


HTH,

Chuck

> The curse of dimensionality is easily demonstrated looking at the
> proximity between your datapoints. Say we scale the interval in one
> dimension to be 1 unit. If you have 20 evenly-spaced observations, the
> distance between the observations is 0.05 units. To have a proximity
> like that in a 2-dimensional space, you need 20^2=400 observations. in
> a 10 dimensional space this becomes 20^10 ~ 10^13 datapoints. The
> distance between your observations is important, as a sparse dataset
> will definitely make your model misbehave.
>
> Even with about 35 samples per variable, using 50 independent
> variables will render a highly unstable model, as your dataspace is
> about as sparse as it can get. On top of that, interpreting a model
> with 50 variables is close to impossible, and then I didn't even start
> on interactions. No point in trying I'd say. If you really need all
> that information, you might want to take a look at some dimension
> reduction methods first.
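The spacing arithmetic in the quoted paragraph can be checked directly (a trivial sketch, using the 0.05-unit spacing from the example):

```r
## Number of evenly-spaced points needed to keep a 0.05-unit spacing
## in p dimensions: (1/0.05)^p = 20^p.
spacing <- 0.05
n_needed <- function(p) (1 / spacing)^p

n_needed(1)   # 20
n_needed(2)   # 400
n_needed(10)  # 1.024e+13, i.e. ~10^13 datapoints
```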
>
> Cheers
> Joris
>
> On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz  wrote:
>> On Jun 13, 2010, at 10:20 PM, array chip wrote:
>>
>>> Hi, this is not an R technical question per se. I know there are many
>>> excellent statisticians on this list, so here is my question: I have a dataset
>>> with ~1800 observations and 50 independent variables, so there are about 35
>>> samples per variable. Is it wise to build a stable multiple logistic model
>>> with 50 independent variables? Any problem with this approach? Thanks
>>>
>>> John
>>
>>
>> The general rule of thumb is to have 10-20 'events' per covariate degree of 
>> freedom. Frank has suggest

Re: [R] How we can open the results are saved

2010-06-14 Thread Prof Brian Ripley

On Mon, 14 Jun 2010, Patrick Burns wrote:


On 14/06/2010 17:50, jim holtman wrote:

load('adresse/filename.R')


Or:

attach('adresse/filename.R')

The difference between 'load' and 'attach'
is that 'load' puts the contents of the file
into your workspace (global environment, first


Not necessarily: see the envir argument.


location on the search list), while 'attach'
creates a new location on the search list.


location = environement here.


But calling this 'filename.R' is likely to lead
to trouble.  The '.R' suffix is usually used
for files containing R commands that can be
used by the 'source' function.

The usual convention for files created by 'save'
is to use a '.rda' suffix.


Or .RData or .Rdata for those not DOS-centric: see the examples in the 
help file.




Pat




On Mon, Jun 14, 2010 at 12:41 PM,  wrote:

Hi all
I saved the result of my code as a file, like

save(namefunction,file="adresse/filename.R").

  I want to open the filename. Could you please help me how I can open the
filename and see the result.

best
Khazaei



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] executable script

2010-06-14 Thread Cliff Clive

In Python, it is literally this easy:

import rpy2.robjects as robjects
robjects.r("""
source("C:/YOUR R FILE GOES HERE  ")
""")


Type the name of your R source code into this script and save it as a Python
script (add the suffix .py), and then you can run by double-clicking.  If
you want to see the results of your program, you'll have to have R print
them in a file.

Now, to get Python to run on your computer, you'll have to install the
following:

http://www.python.org/download/releases/2.6.5/ Python 2.6.5 
http://sourceforge.net/projects/numpy/files/ NumPy 1.4.1 
http://sourceforge.net/projects/rpy/files/rpy2/ Rpy2 2.0.8 

(Note that these are the versions I currently have on my computer, there may
be more recent ones out there.)

You will also have to add Python to your computer's Path.  To do this in
Windows:
1. Right-click on My Computer, select Properties
2. On the Advanced tab, click Environmental Variables
3. In the System Variables list, select Path and click Edit
4. In the Variable Value line, enter "C:\Python26\;" at the beginning of the
line (or "C:\Python30\" if you choose to install Python 3.0)

Now you can run Python scripts simply by double-clicking them.  This makes
it very easy to turn R code into an executable script; the only caveat is
that you need Python installed on any computer you want to run
your script on.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/executable-script-tp839859p2254813.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] recursive function

2010-06-14 Thread Henrique Dallazuanna
Try this:

transform(x, DELTA = NULL, value = rev(c(5, 5 - cumsum(rev(DELTA[-1])))))
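A small self-contained check of the one-liner (the data frame is retyped from the post below, so `x` and its column names are assumptions):

```r
## Hypothetical reconstruction of the poster's data frame
x <- data.frame(variable = "EC01",
                year     = 2006:2009,
                DELTA    = c(NA, 10, 5, 9))

## Start from the 2009 value (5) and back out the earlier years
out <- transform(x, DELTA = NULL,
                 value = rev(c(5, 5 - cumsum(rev(DELTA[-1])))))

out$value
## [1] -19  -9  -4   5
```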


On Mon, Jun 14, 2010 at 12:29 PM, n.via...@libero.it wrote:

>
> Dear list,
> I have the following problem, what I'm trying to do is to build a function
> which does the following calculation in a recursive way:
>
>
> I have a data frame more or less like this:
>
> variable    year    DELTA
>
> EC01        2006    /
> EC01        2007    10
> EC01        2008    5
> EC01        2009    9
>
>
> And then I have at time 2009  a variable called R_EC01(2009)=5
> What I have to do is to construct the R_EC01 time series by starting from
> the 2009 value:
> R_EC01(2008)=R_EC01(2009)-DELTA(2009)
> R_EC01(2007)=R_EC01(2008)-DELTA(2008)
> R_EC01(2006)=R_EC01(2007)-DELTA(2007)
>
>
> In terms of number, the results that i should get are:
> R_EC01(2008)=5-9=-4
>
> R_EC01(2007)=-4-5=-9
> R_EC01(2006)=-9-10=-19
> so my data frame should looks like this
> SERIES    YEAR    value
>
> R_EC01    2006    -19
>
> R_EC01    2007    -9
>
> R_EC01    2008    -4
>
> R_EC01    2009    5
> Anyone know how to do it?
> My dataframe is not set as a time series...
>
>
> Thanks a lot!!!
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How we can open the results are saved

2010-06-14 Thread Patrick Burns

On 14/06/2010 17:50, jim holtman wrote:

load('adresse/filename.R')


Or:

attach('adresse/filename.R')

The difference between 'load' and 'attach'
is that 'load' puts the contents of the file
into your workspace (global environment, first
location on the search list), while 'attach'
creates a new location on the search list.

But calling this 'filename.R' is likely to lead
to trouble.  The '.R' suffix is usually used
for files containing R commands that can be
used by the 'source' function.

The usual convention for files created by 'save'
is to use a '.rda' suffix.
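A quick self-contained illustration of the load/attach distinction (the file name is hypothetical):

```r
x <- 1:10
save(x, file = "example.rda")   # hypothetical file name
rm(x)

load("example.rda")             # 'x' reappears in the global environment
exists("x")                     # TRUE

rm(x)
attach("example.rda")           # new entry on the search path; workspace untouched
search()[2]                     # "file:example.rda"
x                               # found via the search path, not the workspace

detach("file:example.rda")
unlink("example.rda")           # clean up
```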

Pat




On Mon, Jun 14, 2010 at 12:41 PM,  wrote:

Hi all
I saved the result of my code as a file, like

save(namefunction,file="adresse/filename.R").

  I want to open the filename. Could you please help me how I can open the
filename and see the result.

best
Khazaei

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







--
Patrick Burns
pbu...@pburns.seanet.com
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] recursive function

2010-06-14 Thread K. Elo
Hi!

Do you mean something like this (df is your original data frame):

--- cut here ---

df1<-df
df1[[1]]<-paste("R",df[[1]],sep="_")
colnames(df1)<-c("SERIES","YEAR","value")
df1$value[ df1$YEAR==2009 ]<-5
for (i in c(2009:2007)) { df1$value[ df1$YEAR==(i-1) ]<-( df1$value[
df1$YEAR==i ]-df$DELTA[ df$year==i ] ) }

--- cut here ---

Now the output:

> df1
  SERIES YEAR value
1 R_EC01 2006   -19
2 R_EC01 2007-9
3 R_EC01 2008-4
4 R_EC01 2009 5

Please let me know if you were looking for a more general approach
suitable for larger data frames with e.g. several "variable" classes
(EC01, EC02 etc.)

Kind regards,
Kimmo


--
University of Turku, Finland
Dep. of Political Science and Contemporary history

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Prime Numbers Pkgs - Schoolmath is broken

2010-06-14 Thread David Smith
On Mon, Jun 14, 2010 at 7:16 AM, Red Roo  wrote:

> Looking for a recommended package that handles prime number computations.
>

The gmp package (http://crantastic.org/packages/gmp) has some good tools for
prime numbers. I've used the is.prime function before; it's stochastic (in
the sense that it gives the right answer with a high probability), but very
fast for large numbers. I'm pretty sure there are exact functions for
testing and generating primes too.
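For instance (a sketch assuming the gmp package is installed; the return codes of `isprime` — 2 for proven primes, 1 for probable primes, 0 for composites — are from memory):

```r
## Sketch: isprime() and nextprime() are gmp functions.
library(gmp)

isprime(2147483647)               # the Mersenne prime 2^31 - 1
isprime(12)                       # 0: composite
nextprime(as.bigz("1000000000"))  # smallest prime above 1e9: 1000000007
```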

# David Smith

-- 
David M Smith 
VP of Marketing, Revolution Analytics  http://blog.revolutionanalytics.com
Tel: +1 (650) 330-0553 x205 (Palo Alto, CA, USA)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Overlay of barchart and xyplot

2010-06-14 Thread Chen, Huapeng FOR:EX
Thanks a lot!

Huapeng

-Original Message-
From: foolish.andr...@gmail.com [mailto:foolish.andr...@gmail.com] On Behalf Of 
Felix Andrews
Sent: Friday, June 11, 2010 8:23 PM
To: Chen, Huapeng FOR:EX
Cc: r-help@r-project.org
Subject: Re: [R] Overlay of barchart and xyplot

Hi,

I have an example below of adding a key to the merged plot.

You can not have the key on the right hand side because that viewport
is used by the second ylab (ylab2 from doubleYScale). Well, if you
really wanted to, you could do it with the grid package, using
frameGrob or somesuch.


NTLST_Dispersal_VAR_00_08$Year <-
factor(NTLST_Dispersal_VAR_00_08$Year, levels =
c("1999","2000","2001","2002","2003","2004","2005","2006","2007"),
ordered = TRUE)

dispersal<-
barchart(LDP_PER*100 + SPP_PER*100 + SPG_PER*100 ~
Year | District,
data=NTLST_Dispersal_VAR_00_08,
stack=TRUE,
layout=c(5,5),
scales = list(x = list(rot = 90)),
xlab="Year", ylab="%",
strip = strip.custom( bg="light gray"),
par.settings = simpleTheme(col = c("dark gray",
"light gray", "white")),
auto.key = list(points = FALSE, rectangles = TRUE)
)


vars <-
xyplot(sqrt(Infestation_NUM) + AI  ~  Year | District,
data=NTLST_Dispersal_VAR_00_08,
layout=c(5,5),
type="b",
ylab="Square roots of number of infested cells/Landscape
aggregation index",
auto.key = list(lines = TRUE)
)

dblplot <-
doubleYScale(dispersal, vars, use.style=FALSE, add.ylab2 = TRUE
)

dblplot <-
update(dblplot,
par.settings = simpleTheme(fill = c("white", "dark gray",
"black"), border="black",col.line="black",
   col.points="black",pch=c(16,17),lty=c(1,1,1,2,1))
)

## include second key at the bottom
update(dblplot, legend = list(bottom = vars$legend$top))

## Otherwise you could just include a "key" argument in the first plot
which includes all the items explicitly.

## Or merge the two 'auto.key's at the top:

mergeLegends <- function(a, b, ...) {
g <- frameGrob()
agrob <- a
if (!inherits(a, "grob")) {
a <- eval(as.call(c(as.symbol(a$fun), a$args)), getNamespace("lattice"))
}
if (!inherits(b, "grob")) {
b <- eval(as.call(c(as.symbol(b$fun), b$args)), getNamespace("lattice"))
}
g <- packGrob(g, a, side = "left")
packGrob(g, b, side = "right")
}

update(dblplot, legend = list(top = list(fun = "mergeLegends",
    args = list(a = dispersal$legend$top, b = vars$legend$top))))





On 5 June 2010 04:49, Chen, Huapeng FOR:EX  wrote:
> Hi Felix,
>
> Thanks for your help and advice. The following code is close to what I want 
> but still have problems of failure to custom lines and add a key in any way. 
> Par.settings with the final plot seems not working somehow except pch and lty 
> but they overwrite par.setting with barchart. I also attached data I used by 
> using "dput". I appreciate your further helps.
>
> Thanks,
>
> Huapeng
>
>
> # code #
> NTLST_Dispersal_VAR_00_08$Year <- factor(NTLST_Dispersal_VAR_00_08$Year, 
> levels = c("1999","2000","2001","2002","2003","2004","2005","2006","2007"), 
> ordered = TRUE)
>
> dispersal<-barchart(NTLST_Dispersal_VAR_00_08$LDP_PER*100 +
>                     NTLST_Dispersal_VAR_00_08$SPP_PER*100 +
>                     NTLST_Dispersal_VAR_00_08$SPG_PER*100 ~
>                     NTLST_Dispersal_VAR_00_08$Year | 
> NTLST_Dispersal_VAR_00_08$District,
>                     data=NTLST_Dispersal_VAR_00_08,
>                     horizontal=FALSE,
>                     stack=TRUE,
>                     layout=c(5,5),
>                     xlab="Year",
>                     ylab="%",
>                     strip = strip.custom( bg="light gray"),
>                     par.settings = simpleTheme(col = c("dark gray", "light 
> gray", "white")),
>                     #key=list(space="right",size=10,
>                     # rectangles=list(size=1.7, border="black", col = 
> c("white", "dark gray", "black")),
>                      
> #lines=list(pch=c(16,17),lty=c(1,2),col="black",type="b"),
>                     # text=list(text=c("SPG","SPP","LDP")))
>
>                     #auto.key=TRUE
>                     )
>
>
>
> xyplot(sqrt(NTLST_Dispersal_VAR_00_08$Infestation_NUM) +
>             NTLST_Dispersal_VAR_00_08$AI  ~  NTLST_Dispersal_VAR_00_08$Year | 
> NTLST_Dispersal_VAR_00_08$District,
>             data=NTLST_Dispersal_VAR_00_08,
>             layout=c(5,5),
>             type="b",
>             ylab="Square roots of number of infested cells/Landscape 
> aggregation index",
>             #par.settings = simpleTheme(col = c("black", "black"), 
> pch=c(16,17)),
>             #key=list(space="right",size=10,
>                      #rectangles=list(siz

Re: [R] unqiue problem

2010-06-14 Thread David Winsemius


On Jun 14, 2010, at 1:10 PM, David Winsemius wrote:



On Jun 14, 2010, at 12:32 PM, Assa Yeroslaviz wrote:


I thought unique delete the whole line.
I don't really need the row names, but I thought of it as a way of  
getting

the unique items.

Is there a way of deleting whole lines completely according to their
identifiers?

What I really need are unique values on the first column.

Assa

On Mon, Jun 14, 2010 at 18:04, jim holtman   
wrote:



Your process does remove all the duplicate entries based on the
content of the two columns.  After you do this, there are still
duplicate entries in the first column that you are trying to use as
rownames and therefore the error.  Why do you want to use non-unique
entries as rownames?  Do you really need the row names, or should  
you

only be keeping unique values for the first column?

On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz  
 wrote:

Hello everybody,

I have a matrix of 2 columns and over 27k rows.
some of the rows are double , so I tried to remove them with the  
command

unique():


Workbook5 <- read.delim(file =  "Workbook5.txt")
dim(Workbook5)

[1] 27748 2

Workbook5 <- unique(Workbook5)


Jim already showed you one way in another thread and it is probably  
more intuitive than this way, but just so you know...


Workbook5 <- Workbook5[ unique(Workbook5[ ,1] ) , ]

... should have worked. Logical indexing on first column with return  
of both columns of qualifying rows.


Actually I was thinking a bit askew although that would have  
succeeded. That was not logical indexing, which would have been done  
with duplicated() ... or rather its negation through the use of the  
"!" unary operator:


> str(unique(Workbook5[ ,1] ) )
 Factor w/ 17209 levels "A_51_P100034",..: 1 2 3 4 5 6 7 8 9 10 ...
> str(!duplicated(Workbook5[ ,1] ) )
 logi [1:20101] TRUE TRUE TRUE TRUE TRUE TRUE ...

So this would have been the way to do it with logical indexing:

Workbook5 <- Workbook5[ !duplicated(Workbook5[ ,1] ) , ]
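A tiny illustration of the difference on a toy data frame (hypothetical data):

```r
## Toy example: keep the first row for each id
df <- data.frame(id = c("a", "b", "a", "c"), val = 1:4,
                 stringsAsFactors = FALSE)

unique(df$id)       # the distinct values: "a" "b" "c"
!duplicated(df$id)  # logical mask:  TRUE  TRUE FALSE  TRUE

df[!duplicated(df$id), ]  # rows 1, 2 and 4 survive
```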




--
David.

dim(Workbook5)

[1] 20101 2

it removed a lot of lines, but unfortunately not all of them. I
wanted to

add

the row names to the matrix and got this error message:

rownames(Workbook5) <- Workbook5[,1]
Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L,  
4L, 5L,

:

duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘A_51_P102339’,
‘A_51_P102518’, ‘A_51_P103435’, ‘A_51_P103465’,
‘A_51_P103594’, ‘A_51_P104409’, ‘A_51_P104718’,
‘A_51_P105869’, ‘A_51_P106428’, ‘A_51_P106799’,
‘A_51_P107176’, ‘A_51_P107959’, ‘A_51_P108767’,
‘A_51_P109258’, ‘A_51_P109708’, ‘A_51_P110341’,
‘A_51_P111757’, ‘A_51_P112427’, ‘A_51_P112662’,
‘A_51_P113672’, ‘A_51_P115018’, ‘A_51_P116496’,
‘A_51_P116636’, ‘A_51_P117666’, ‘A_51_P118132’,
‘A_51_P118168’, ‘A_51_P118400’, ‘A_51_P118506’,
‘A_51_P119315’, ‘A_51_P120093’, ‘A_51_P120305’,
‘A_51_P120738’, ‘A_51_P120785’, ‘A_51_P121134’,
‘A_51_P121359’, ‘A_51_P121412’, ‘A_51_P121652’,
‘A_51_P121724’, ‘A_51_P121829’, ‘A_51_P122141’,
‘A_51_P122964’, ‘A_51_P123422’, ‘A_51_P123895’,
‘A_51_P124008’, ‘A_51_P124719’, ‘A_51_P125648’,
‘A_51_P125679’, ‘A_51_P125779’ [... truncated]

Is there a better way to discard the duplicataions in the text file

(Excel

file is the origin).


R.version

_
platform   x86_64-apple-darwin9.8.0
arch   x86_64
os darwin9.8.0
system x86_64, darwin9.8.0
status Patched
major  2
minor  11.1
year   2010
month  06
day03
svn rev52201
language   R
version.string R version 2.11.1 Patched (2010-06-03 r52201)

THX

Assa

__
R-help@r-project.org mailing list


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unqiue problem

2010-06-14 Thread David Winsemius


On Jun 14, 2010, at 12:32 PM, Assa Yeroslaviz wrote:


I thought unique delete the whole line.
I don't really need the row names, but I thought of it as a way of  
getting

the unique items.

Is there a way of deleting whole lines completely according to their
identifiers?

What I really need are unique values on the first column.

Assa

On Mon, Jun 14, 2010 at 18:04, jim holtman  wrote:


Your process does remove all the duplicate entries based on the
content of the two columns.  After you do this, there are still
duplicate entries in the first column that you are trying to use as
rownames and therefore the error.  Why do you want to use non-unique
entries as rownames?  Do you really need the row names, or should you
only be keeping unique values for the first column?

On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz   
wrote:

Hello everybody,

I have a matrix of 2 columns and over 27k rows.
some of the rows are double , so I tried to remove them with the  
command

unique():


Workbook5 <- read.delim(file =  "Workbook5.txt")
dim(Workbook5)

[1] 27748 2

Workbook5 <- unique(Workbook5)


Jim already showed you one way in another thread and it is probably  
more intuitive than this way, but just so you know...


 Workbook5 <- Workbook5[ unique(Workbook5[ ,1] ) , ]

... should have worked. Logical indexing on first column with return  
of both columns of qualifying rows.


--
David.

dim(Workbook5)

[1] 20101 2

it removed a lot of lines, but unfortunately not all of them. I
wanted to

add

the row names to the matrix and got this error message:

rownames(Workbook5) <- Workbook5[,1]
Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L,  
4L, 5L,

:

duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘A_51_P102339’,
‘A_51_P102518’, ‘A_51_P103435’, ‘A_51_P103465’,
‘A_51_P103594’, ‘A_51_P104409’, ‘A_51_P104718’,
‘A_51_P105869’, ‘A_51_P106428’, ‘A_51_P106799’,
‘A_51_P107176’, ‘A_51_P107959’, ‘A_51_P108767’,
‘A_51_P109258’, ‘A_51_P109708’, ‘A_51_P110341’,
‘A_51_P111757’, ‘A_51_P112427’, ‘A_51_P112662’,
‘A_51_P113672’, ‘A_51_P115018’, ‘A_51_P116496’,
‘A_51_P116636’, ‘A_51_P117666’, ‘A_51_P118132’,
‘A_51_P118168’, ‘A_51_P118400’, ‘A_51_P118506’,
‘A_51_P119315’, ‘A_51_P120093’, ‘A_51_P120305’,
‘A_51_P120738’, ‘A_51_P120785’, ‘A_51_P121134’,
‘A_51_P121359’, ‘A_51_P121412’, ‘A_51_P121652’,
‘A_51_P121724’, ‘A_51_P121829’, ‘A_51_P122141’,
‘A_51_P122964’, ‘A_51_P123422’, ‘A_51_P123895’,
‘A_51_P124008’, ‘A_51_P124719’, ‘A_51_P125648’,
‘A_51_P125679’, ‘A_51_P125779’ [... truncated]

Is there a better way to discard the duplicataions in the text file

(Excel

file is the origin).


R.version

 _
platform   x86_64-apple-darwin9.8.0
arch   x86_64
os darwin9.8.0
system x86_64, darwin9.8.0
status Patched
major  2
minor  11.1
year   2010
month  06
day03
svn rev52201
language   R
version.string R version 2.11.1 Patched (2010-06-03 r52201)

THX

Assa

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.






--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How we can open the results are saved

2010-06-14 Thread Bert Gunter
No.

Binary "workspace" data are saved by default with the .Rdata extension and
are "opened" (actually have their contents added to the current workspace)
by load().

.R text files would need to be sourced:

source('adresse/filename.R')

Bert Gunter
Genentech Nonclinical Biostatistics
 
 

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of jim holtman
Sent: Monday, June 14, 2010 9:50 AM
To: khaz...@ceremade.dauphine.fr
Cc: r-help@r-project.org
Subject: Re: [R] How we can open the results are saved

load('adresse/filename.R')


On Mon, Jun 14, 2010 at 12:41 PM,   wrote:
> Hi all
> I saved the result of my code as a file, like
>> save(namefunction,file="adresse/filename.R").
>  I want to open the filename. Could you please help me how I can open the
> filename and see the result.
>
> best
> Khazaei
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How we can open the results are saved

2010-06-14 Thread jim holtman
load('adresse/filename.R')


On Mon, Jun 14, 2010 at 12:41 PM,   wrote:
> Hi all
> I saved the result of my code as a file, like
>> save(namefunction,file="adresse/filename.R").
>  I want to open the filename. Could you please help me how I can open the
> filename and see the result.
>
> best
> Khazaei
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How we can open the results are saved

2010-06-14 Thread khazaei
Hi all
I saved the result of my code as a file, like
> save(namefunction,file="adresse/filename.R").
 I want to open the filename. Could you please help me how I can open the
filename and see the result.

best
Khazaei

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Count of unique factors within another factor

2010-06-14 Thread Henrique Dallazuanna
Another possibility:

 rowSums(table(x) > 0)
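Both suggestions can be checked on data shaped like the original post (the values are retyped here, so treat them as illustrative):

```r
## Hypothetical reconstruction of the poster's data
data <- data.frame(
  unit    = c(123, 123, 123, 123, 123, 123, 345, 345, 345, 345),
  species = c("ACMA", "LIDE", "LIDE", "SESE", "SESE", "SESE",
              "HEAR", "LOHI", "QUAG", "TODI"),
  stringsAsFactors = FALSE)

## distinct species per unit, via tapply
counts <- tapply(data$species, data$unit, function(s) length(unique(s)))
counts  # 123 -> 3, 345 -> 4

## equivalent one-liner: non-zero cells per row of the contingency table
rowSums(table(data$unit, data$species) > 0)
```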


On Sun, Jun 13, 2010 at 3:08 PM, Erik Iverson  wrote:

> I think ?tapply will help here.  But *please* read the posting guide and
> provide minimal, reproducible examples!
>
>
> Birdnerd wrote:
>
>> I have a data frame with two factors (sampling 'unit', 'species'). I want
>> to
>> calculate the number of unique 'species' per 'unit.' I can calculate the
>> number of unique values for each variable separately, but can't get a
>> count
>> for each ‘unit’.
>>
>>  data=read.csv("C:/Desktop/sr_sort_practice.csv")
>>> attach(data)
>>>
>>
>>  data[1:10,]
>>>
>>   unit species
>> 1   123    ACMA
>> 2   123    LIDE
>> 3   123    LIDE
>> 4   123    SESE
>> 5   123    SESE
>> 6   123    SESE
>> 7   345    HEAR
>> 8   345    LOHI
>> 9   345    QUAG
>> 10  345    TODI …..
>>
>>  sr.unique<- lapply (data, unique)
>>>
>> $unit
>> [1] 123 345 216
>> $species
>>  [1] ACMA  LIDE  SESE  HEAR  LOHI  QUAG  TODI  UMCA  ARSP  LIDE
>>
>>  sapply (sr.unique,length)
>>>
   unit species
      3      10
>>
>> Then, I get stuck here because this unique species count is not given for
>> each ‘unit’.
>> What I'd like to get is:
>>
>> unit species
>> 123         3
>> 345         4
>> 216        --
>>
>> Thanks--
>>
>>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merging data frames

2010-06-14 Thread jim holtman
If you want to keep only the rows that are unique in the first column
then do the following:

workComb1 <- subset(workComb, !duplicated(ProbeID))

On Mon, Jun 14, 2010 at 11:20 AM, Assa Yeroslaviz  wrote:
> well, the problem is basically elsewhere. I have a data frame with
> expression data and doubled IDs in the first column (see example)
> when I want to put them into row names I get the message, that there are
> non-unique items in the data.
> So I tried with unique to delete such rows. The problem is unique doesn't
> delete all of them.
>
> I compare two data frames with their Probe IDs.
> I would like to delete all double lines with a certain probe ID independent
> from the rest of the line, as to say I would like a data frame with single
> unique idetifiers in the Probe Id column.
> merge doesn't give me that. It doesn't delete all similar lines; if the lines
> are not identical in the other columns it leaves them in the table.
>
> Is there a way of deleting the whole line with duplicated Probe IDs?
>
>> workbook <- read.delim(file = "workbook1.txt", quote = "", sep = "\t")
>> GeneID <- read.delim(file = "testTable.txt", quote = "", sep = "\t")
>> workComb <- merge(workbook, GeneID, by.x = "ProbeID", by.y = "Probe.Id")
>> workComb1 <- unique(workComb)
>> write.table(workComb, file = "workComb.txt" , sep = "\t", quote = FALSE,
>> row.names = FALSE)
>> write.table(workComb1, file = "workComb1.txt" , sep = "\t", quote = FALSE,
>> row.names = FALSE)
>
> look at lines 49 and 50 in the file workComb1.txt after using unique on the
> file. The lines are identical with the exception of the Transcript ID. I
> would like to take one of them out of the table.
>
> THX,
>
> Assa
>
> On Mon, Jun 14, 2010 at 15:33, jim holtman  wrote:
>>
>> Put the rownames as another column in your dataframe so that it
>> remains with the data.  After merging, you can then use it as the
>> "rownames"
>>
>> On Mon, Jun 14, 2010 at 9:25 AM, Assa Yeroslaviz  wrote:
>> > Hi,
>> >
>> > is it possible to merge two data frames while preserving the row names
>> > of
>> > the bigger data frame?
>> >
>> > I have two data frames which  i would like to combine. While doing so I
>> > always loose the row names. When I try to append this, I get the error
>> > message, that I have non-unique names. This although I used unique
>> > command
>> > on the data frame where the double inputs supposedly are
>> >
>> > thanks for the help
>> >
>> > Assa
>> >
>> >        [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unqiue problem

2010-06-14 Thread Assa Yeroslaviz
I thought unique delete the whole line.
I don't really need the row names, but I thought of it as a way of getting
the unique items.

Is there a way of deleting whole lines completely according to their
identifiers?

What I really need are unique values on the first column.

Assa

On Mon, Jun 14, 2010 at 18:04, jim holtman  wrote:

> Your process does remove all the duplicate entries based on the
> content of the two columns.  After you do this, there are still
> duplicate entries in the first column that you are trying to use as
> rownames and therefore the error.  Why do you want to use non-unique
> entries as rownames?  Do you really need the row names, or should you
> only be keeping unique values for the first column?
>
> On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz  wrote:
> > Hello everybody,
> >
> > I have a matrix of 2 columns and over 27k rows.
> > some of the rows are double , so I tried to remove them with the command
> > unique():
> >
> >> Workbook5 <- read.delim(file =  "Workbook5.txt")
> >> dim(Workbook5)
> > [1] 27748 2
> >> Workbook5 <- unique(Workbook5)
> >> dim(Workbook5)
> > [1] 20101 2
> >
> > it removed a lot of lines, but unfortunately not all of them. I wanted to
> add
> > the row names to the matrix and got this error message:
> >> rownames(Workbook5) <- Workbook5[,1]
> > Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L,
>  :
> >  duplicate 'row.names' are not allowed
> > In addition: Warning message:
> > non-unique values when setting 'row.names': ‘A_51_P102339’,
> > ‘A_51_P102518’, ‘A_51_P103435’, 
> > ‘A_51_P103465’,
> > ‘A_51_P103594’, ‘A_51_P104409’, 
> > ‘A_51_P104718’,
> > ‘A_51_P105869’, ‘A_51_P106428’, 
> > ‘A_51_P106799’,
> > ‘A_51_P107176’, ‘A_51_P107959’, 
> > ‘A_51_P108767’,
> > ‘A_51_P109258’, ‘A_51_P109708’, 
> > ‘A_51_P110341’,
> > ‘A_51_P111757’, ‘A_51_P112427’, 
> > ‘A_51_P112662’,
> > ‘A_51_P113672’, ‘A_51_P115018’, 
> > ‘A_51_P116496’,
> > ‘A_51_P116636’, ‘A_51_P117666’, 
> > ‘A_51_P118132’,
> > ‘A_51_P118168’, ‘A_51_P118400’, 
> > ‘A_51_P118506’,
> > ‘A_51_P119315’, ‘A_51_P120093’, 
> > ‘A_51_P120305’,
> > ‘A_51_P120738’, ‘A_51_P120785’, 
> > ‘A_51_P121134’,
> > ‘A_51_P121359’, ‘A_51_P121412’, 
> > ‘A_51_P121652’,
> > ‘A_51_P121724’, ‘A_51_P121829’, 
> > ‘A_51_P122141’,
> > ‘A_51_P122964’, ‘A_51_P123422’, 
> > ‘A_51_P123895’,
> > ‘A_51_P124008’, ‘A_51_P124719’, 
> > ‘A_51_P125648’,
> > ‘A_51_P125679’, ‘A_51_P125779’ [... truncated]
> >
> > Is there a better way to discard the duplicataions in the text file
> (Excel
> > file is the origin).
> >
> >> R.version
> >   _
> > platform   x86_64-apple-darwin9.8.0
> > arch   x86_64
> > os darwin9.8.0
> > system x86_64, darwin9.8.0
> > status Patched
> > major  2
> > minor  11.1
> > year   2010
> > month  06
> > day03
> > svn rev52201
> > language   R
> > version.string R version 2.11.1 Patched (2010-06-03 r52201)
> >
> > THX
> >
> > Assa
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] script development for Unconditional Density and Probability estimation

2010-06-14 Thread jim holtman
First start by putting it in a function so you can specify the
parameters you want to change.
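A hedged sketch of that suggestion: 'Marsh' and the column names come from the quoted script, while density_by_species() and its arguments are illustrative inventions.

```r
# Compute the base rate and the two conditional densities for any
# species/metric pair, so 8 species x 12 metrics reduce to two loops.
density_by_species <- function(data, species, metric) {
  present <- data[[species]] == 1    # quoted script compares to '0'/'1'
  x       <- data[[metric]]
  list(base.rate = mean(present),
       d0 = density(x[!present]),    # density where species is absent
       d1 = density(x[present]))     # density where species is present
}

# Demo with made-up data; in practice loop over the real names:
set.seed(1)
Marsh <- data.frame(cattail = rbinom(200, 1, 0.4),
                    meanAnnualDepthAve = rnorm(200, 50, 10))
res <- density_by_species(Marsh, "cattail", "meanAnnualDepthAve")
```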

On Mon, Jun 14, 2010 at 11:54 AM,   wrote:
>
> Hello,
>
> I'd like to automate this script a bit more and cycle several
> parameters(both the species and the metric).  For example where AnnualDepth
> occurs, I need to process about 12 metrics so instead of writing this
> entire script 12 times once for each metric  I'd like to be able to
> automatically get another metric.
>
> Any suggestion will be greatly appreciated.
>
> Currently running Windows XP, R 2.11.1
>
> ###
>
> Marsh <- cbind(SoilVegHydro, vegcode)
> AnnualDepth <- Marsh[,'meanAnnualDepthAve']
>
> cattail_0 <- Marsh[,'cattail'] == '0'  # no  (need to run for 8 species; automate if possible)
> cattail_1 <- Marsh[,'cattail'] == '1'  # yes (need to run for 8 species)
>
> spbase.rate.d1 <- sum(cattail_1)/(sum(cattail_1) + sum(cattail_0) )
> annualDepth.density <- density(AnnualDepth)  # this line needs to be either
> # interactively defined or automatically cycled through a number of metrics
>
> cattail.d0.density <- density(AnnualDepth[cattail_0])
> cattail.d1.density <- density(AnnualDepth[cattail_1])
>
> approxfun (cattail.d0.density$x, cattail.d0.density$y) -> cattail.d0.f
> approxfun (cattail.d1.density$x, cattail.d1.density$y) -> cattail.d1.f
>
> p.d.given.AnnualDepth <- function(AnnualDepth, spbase.rate.d1)
> {
>     p1 <- cattail.d1.f(AnnualDepth) * spbase.rate.d1
>     p0 <- cattail.d0.f(AnnualDepth) * (1 - spbase.rate.d1)
>     p1/(p1 + p0)   # posterior probability of cattail given AnnualDepth
> }
>
> x <- 1:1292
> y <- p.d.given.AnnualDepth(x, spbase.rate.d1)
> plot (x, y, type='l', col='red', xlab='Mean Annual Depth',
> main=c("Cattail"), ylab='estimated\nProbability(cattail|AnnualDepth)')
>
> plot (cattail.d0.density, col ='red', lty= 1, main = "")
>  lines(cattail.d1.density, col = 'blue', lty=1)
>  lines(annualDepth.density , col='black', lty = 1)
>  legend(2000, 0.0023, c("No Cattail", "Cattail", "Mean Annual Depth"),
>      col=c("red", "blue", "black"),lty=c(1))
>
>
>
>
> #
>
> Steve Friedman Ph. D.
> Spatial Statistical Analyst
> Everglades and Dry Tortugas National Park
> 950 N Krome Ave (3rd Floor)
> Homestead, Florida 33034
>
> steve_fried...@nps.gov
> Office (305) 224 - 4282
> Fax     (305) 224 - 4147
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] xtable with Sweave

2010-06-14 Thread James W. MacDonald

Hi Silvano,

Silvano wrote:

Hi,

I'm using Sweave to prepare a descriptive report.
Are at least 20 tables built with xtable command of kind:

<>=
q5 = factor(Q5, label=c("Não", "Sim"))
(q5.tab = cbind(table(q5)))
@

<>=
xtable(q5.tab, align="l|c", caption.placement = "top", table.placement='H')
@

I'm getting the following message:

Too many unprocessed floats

in Latex file.

How to avoid these messages appearing?


Not really an R-help question, and easily answered using our friend google.

Anyway, you need to cause the floats to be processed before you have 
'too many'. This is done by adding a \clearpage to the LaTeX portion of 
your document every so often.
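Sketched as an .Rnw fragment (the chunk options are illustrative; q5.tab is as in the original post):

```latex
% After every few table chunks, flush the pending floats:
<<echo=FALSE, results=tex>>=
print(xtable(q5.tab))
@
\clearpage  % forces LaTeX to place all accumulated floats here
```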


Best,

Jim




--
Silvano Cesar da Costa
Departamento de Estatística
Universidade Estadual de Londrina
Fone: 3371-4346

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Count of unique factors within another factor

2010-06-14 Thread Birdnerd

Hi Dennis,

Thanks for this suggestion (which I got to run!), as this code makes
intuitive sense, whereas not all the other suggestions were that
straightforward. I'm relatively new to programming in R and am very
appreciative that you and others take time to help out where you can.

Sincerely,
Sarah
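For reference, the per-unit count can also be computed directly with aggregate(); the rows here are made up to match the example above.

```r
# Count distinct species within each sampling unit:
data <- data.frame(
  unit    = c(123, 123, 123, 123, 123, 123, 345, 345, 345, 345),
  species = c("ACMA", "LIDE", "LIDE", "SESE", "SESE", "SESE",
              "HEAR", "LOHI", "QUAG", "TODI"))

aggregate(species ~ unit, data = data, FUN = function(x) length(unique(x)))
#   unit species
# 1  123       3
# 2  345       4
```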



On Sun, Jun 13, 2010 at 8:47 PM, Dennis Murphy [via R] <
ml-node+2253845-1393472685-291...@n4.nabble.com
> wrote:

> Hi:
>
> Another possibility:
>
> as.data.frame(with(data[!duplicated(data), ], table(unit)))
>   unit Freq
> 1  123    3
> 2  345    4
>
> HTH,
> Dennis
>
> On Sun, Jun 13, 2010 at 9:07 AM, Birdnerd <[hidden 
> email]>
> wrote:
>
> >
> > I have a data frame with two factors (sampling 'unit', 'species'). I want
>
> > to
> > calculate the number of unique 'species' per 'unit.' I can calculate the
> > number of unique values for each variable separately, but can't get a
> count
> > for each ‘unit’.
> >
> > > data=read.csv("C:/Desktop/sr_sort_practice.csv")
> > > attach(data)
> >
> > > data[1:10,]
> >   unit species
> > 1   123    ACMA
> > 2   123    LIDE
> > 3   123    LIDE
> > 4   123    SESE
> > 5   123    SESE
> > 6   123    SESE
> > 7   345    HEAR
> > 8   345    LOHI
> > 9   345    QUAG
> > 10  345    TODI ...
> >
> > > sr.unique<- lapply (data, unique)
> > $unit
> > [1] 123 345 216
> > $species
> >  [1] ACMA  LIDE  SESE  HEAR  LOHI  QUAG  TODI  UMCA  ARSP  LIDE
> >
> > > sapply (sr.unique,length)
> >unit species
> >  3  10
> >
> > Then, I get stuck here because this unique species count is not given for
>
> > each ‘unit’.
> > What I'd like to get is:
> >
> > unit species
> > 123     3
> > 345     4
> > 216     --
> >
> > Thanks--
> >
> > --
> > View this message in context:
> >
> http://r.789695.n4.nabble.com/Count-of-unique-factors-within-another-factor-tp2253545p2253545.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > __
> > [hidden email] 
> > mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> [[alternative HTML version deleted]]
>
>
> __
> [hidden email] mailing 
> list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> --
>  View message @
> http://r.789695.n4.nabble.com/Count-of-unique-factors-within-another-factor-tp2253545p2253845.html
> To unsubscribe from Count of unique factors within another factor, click
> here< (link removed) >.
>
>
>


-- 
Sarah E. Haas
haaszool...@gmail.com

Center for Applied Geographic Information Science (CAGIS)
Department of Geography and Earth Sciences
University of North Carolina at Charlotte
9201 University City Blvd.
Charlotte, NC 28223, USA
http://www.gis.uncc.edu/

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Count-of-unique-factors-within-another-factor-tp2253545p2254591.html
Sent from the R help mailing list archive at Nabble.com.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] script development for Unconditional Density and Probability estimation

2010-06-14 Thread Steve_Friedman

Hello,

I'd like to automate this script a bit more and cycle several
parameters(both the species and the metric).  For example where AnnualDepth
occurs, I need to process about 12 metrics so instead of writing this
entire script 12 times once for each metric  I'd like to be able to
automatically get another metric.

Any suggestion will be greatly appreciated.

Currently running Windows XP, R 2.11.1

###

Marsh <- cbind(SoilVegHydro, vegcode)
AnnualDepth <- Marsh[,'meanAnnualDepthAve']

cattail_0 <- Marsh[,'cattail'] == '0'  # no  (need to run for 8 species; automate if possible)
cattail_1 <- Marsh[,'cattail'] == '1'  # yes (need to run for 8 species)

spbase.rate.d1 <- sum(cattail_1)/(sum(cattail_1) + sum(cattail_0) )
annualDepth.density <- density(AnnualDepth)  # this line needs to be either
# interactively defined or automatically cycled through a number of metrics

cattail.d0.density <- density(AnnualDepth[cattail_0])
cattail.d1.density <- density(AnnualDepth[cattail_1])

approxfun (cattail.d0.density$x, cattail.d0.density$y) -> cattail.d0.f
approxfun (cattail.d1.density$x, cattail.d1.density$y) -> cattail.d1.f

p.d.given.AnnualDepth <- function(AnnualDepth, spbase.rate.d1)
{
    p1 <- cattail.d1.f(AnnualDepth) * spbase.rate.d1
    p0 <- cattail.d0.f(AnnualDepth) * (1 - spbase.rate.d1)
    p1/(p1 + p0)   # posterior probability of cattail given AnnualDepth
}

x <- 1:1292
y <- p.d.given.AnnualDepth(x, spbase.rate.d1)
plot (x, y, type='l', col='red', xlab='Mean Annual Depth',
main=c("Cattail"), ylab='estimated\nProbability(cattail|AnnualDepth)')

plot (cattail.d0.density, col ='red', lty= 1, main = "")
  lines(cattail.d1.density, col = 'blue', lty=1)
  lines(annualDepth.density , col='black', lty = 1)
  legend(2000, 0.0023, c("No Cattail", "Cattail", "Mean Annual Depth"),
  col=c("red", "blue", "black"),lty=c(1))




#

Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] xtable with Sweave

2010-06-14 Thread Marc Schwartz
On Jun 14, 2010, at 8:09 AM, Silvano wrote:

> Hi,
> 
> I'm using Sweave to prepare a descriptive report.
> Are at least 20 tables built with xtable command of kind:
> 
> <>=
> q5 = factor(Q5, label=c("Não", "Sim"))
> (q5.tab = cbind(table(q5)))
> @
> 
> <>=
> xtable(q5.tab, align="l|c", caption.placement = "top", table.placement='H')
> @
> 
> I'm getting the following message:
> 
> Too many unprocessed floats
> 
> in Latex file.
> 
> How to avoid these messages appearing?


Hi,

That is an error message from 'latex' indicating that you may have too many 
float tables without sufficient separation (eg. new pages, text in between, 
etc.) and/or conflicts in table placement.

You might have a look at the relevant TeX FAQ here:

  http://www.tex.ac.uk/cgi-bin/texfaq2html?label=tmupfl

Also, while I don't use xtable(), I do believe that you have to call it using 
the print method directly to specify non-default arguments:

  print(xtable(q5.tab, align = "l|c"), caption.placement = "top", 
table.placement = 'H')

See the help pages for ?xtable and ?print.xtable, including the last examples in 
the former.

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Html help

2010-06-14 Thread Duncan Murdoch

Murray Jorgensen wrote:

I have just installed R 2.11.1 on my XP laptop.

I like html help for browsing but text help for on-the-fly look-ups. I 
was a bit surprised when I was asked to choose between them during the 
installation. I chose text, thinking I could fix the html help later, 
which is what I am trying to do now.


Now when I ask for html help my browser goes to

'http://-ip-number-/doc/html/index.html'

instead of where I want on my computer:

C:\apps\R\R-2.11.1\doc\html\index.html

Now I can go where I want manually but then the package list on


C:\apps\R\R-2.11.1\doc\html\packages.html

does not include all the packages that I have installed and linked. I 
don't want to read my html help from the web because sometimes I am 
off-line or on a slow connection.


How do I go about getting a local set of html help files?

  


Since 2.10.0, HTML help is generated on demand.  It doesn't go off your 
local computer, it works locally.  This saves a bit of space (the HTML 
is generated from the same source as the text is generated from), but 
the main point is that it allows help pages to contain dynamic content.  
For example, Romain Francois posted some demo code a while ago to allow 
the display of graphics generated by R within help pages.  
(Unfortunately it depended on a particular browser feature not supported 
by Internet Explorer, so I'm going to need to put together something 
less elegant, but that's life.)
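For Murray's case, the help type chosen at install time can be switched per session (or in a startup file) without reinstalling; help_type is the documented option name since the 2.10.0 dynamic-help change.

```r
# Make HTML the default help renderer for this session:
options(help_type = "html")
# help("mean")  # would now render via R's local dynamic help server
```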


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Revolutions Blog: May Roundup

2010-06-14 Thread David Smith
I write about R every weekday at the Revolutions blog:
 http://blog.revolutionanalytics.com
and every month I post a summary of articles from the previous month
of particular interest to readers of r-help.

http://bit.ly/dn7DgR linked to 13 videos for learning R, from the
basics ("What is R?") to more advanced topics.

http://bit.ly/cJUhiY noted the release of R 2.11.1.

http://bit.ly/d53tvn announced that Revolution Analytics makes its
software available free of charge to the academic community.

http://bit.ly/9xCQ83 noted how statistical inference was used to
accurately estimate German tank forces in WWII, and linked to an R
simulation to verify the calculation.

http://bit.ly/b5puKD announced a new website for the R community,
www.inside-R.org, sponsored by Revolution Analytics.

http://bit.ly/9r85bd linked to a video of economist JD Long explaining
why he uses R.

http://bit.ly/cySMgE linked to Jeromy Anglim's explanation of the
abbreviated names of 150 R functions.

http://bit.ly/cEceU8 announced a webinar I gave, Introduction to
Revolution R. The live event has passed now, but you can download
slides and watch a replay at this link.

http://bit.ly/brs2s2 recapped some recent news articles mentioning R
and Revolution, in Forbes, The Register, PC World and elsewhere.

http://bit.ly/9UDgOL linked to an analysis in R on predicting the
outcome of a series of baseball games.

http://bit.ly/bJYW9v linked to some materials from the CloudAsia
conference on parallel computing in R for life sciences.

http://bit.ly/dfb4PA provided a tip on keeping the console window
active in the R Productivity Environment GUI.

http://bit.ly/aeHO7B announced a webinar on integrating R-based graphs
and computations with business intelligence dashboards. (The live
event has passed, but you can download slides and a replay at this
link.)

http://bit.ly/dqr1hc linked to another method of mapping your Twitter
social network with R.

http://bit.ly/bVI6e9 linked to a video by JD Long on using the Amazon
Elastic Map-Reduce system to run large-scale map-reduce calculations
in the cloud with R.

http://bit.ly/auZ7N8 announced Revolution Analytic's development
roadmap for 2010.

There are new R user groups in Boston (http://bit.ly/cHedm0) and
Atlanta (http://bit.ly/aVo1cI). Also, http://bit.ly/a82GAf noted the
list of local R User Groups worldwide at MeetUp.com, and how you can
request a new group in your area.

Other non-R-related stories in the past month included Google's new
BigQuery and Prediction API tools (http://bit.ly/bfEeLm), the effects
of volcanic ash on a modern city (http://bit.ly/9qWfQf) and (on a
lighter note) the dating equation (http://bit.ly/9LR28N) and a neat,
practical optical illusion (http://bit.ly/dsmyov).

The R Community Calendar has also been updated at:
http://blog.revolutionanalytics.com/calendar.html

If you're looking for more articles about R, you can find summaries
from previous months at http://blog.revolutionanalytics.com/roundups/.
Join the REvolution mailing list at
http://revolutionanalytics.com/newsletter to be alerted to new
articles on a monthly basis.

As always, thanks for the comments and please keep sending suggestions
to me at da...@revolutionanalytics.com . Don't forget you can also
follow the blog using an RSS reader like Google Reader, or by
following me on Twitter (I'm @revodavid).

Cheers,
# David

--
David M Smith 
VP of Marketing, Revolution Analytics  http://blog.revolutionanalytics.com
Tel: +1 (650) 330-0553 x205 (Palo Alto, CA, USA)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] points marking

2010-06-14 Thread Greg Snow
I don't think that I would use a barplot as the base, but rather just set up 
the graph and add the lines where I wanted them.  I still don't understand what 
you want your graph to look like, or what question you are trying to answer 
with it (part may be a language barrier).  If you can give us a better example 
of what you are trying to accomplish, or a better description of what your data 
is like and what you are trying to get from the graph, we will have a better 
chance of being able to help you.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

From: khush  [mailto:bioinfo.kh...@gmail.com]
Sent: Saturday, June 12, 2010 5:38 AM
To: Greg Snow
Cc: r-help@r-project.org; Petr PIKAL
Subject: Re: [R] points marking

Hi,

Well, thanks for letting me know that pch is of no use with segments, Petr. I am
using lend, as it suits me better, as Gregory suggested, but I am not getting it
quite right. I think I will also try to fix it with some other method, as I have
to deal more with the symbols in this case. But I want to know one thing from you
both: is the way I am using the code good enough to start with? I am not very
familiar with this stuff and wonder whether it is a dirty way to handle such a
task. Please let me know.

Thanks gregory and petr.

Thank you
Jeet


On Fri, Jun 11, 2010 at 9:07 PM, Greg Snow 
mailto:greg.s...@imail.org>> wrote:
Those graphs look like chromosome maps, if so, you may want to look into the 
bioconductor project, they may have some prewritten functions to do this.  If 
not, the lend argument (see ?par) may be something to look at.  If you really 
want points and segments you will need to plot the points with the points 
function and the segments separately.  Segments can take vectors, so you don't 
need to separate things into multiple calls.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

From: khush  
[mailto:bioinfo.kh...@gmail.com]
Sent: Friday, June 11, 2010 12:00 AM
To: Greg Snow
Cc: r-help@r-project.org
Subject: Re: [R] points marking

Dear Gregory ,

Thanks for your reply and help. I am explaining my problem again; below
is my script for the same.

Dom <-c (195,568,559)

fkbp <- barplot (Dom, col="black", xlab="", border = NA, space = 7, 
xlim=c(0,650), ylim =c(0, 87), las = 2, horiz = TRUE)

axis (1, at = seq(0,600,10), las =2)

1. ==Segments 1=

segments(164,7.8,192,7.8, col = "green", pch=23, cex="9", lty="solid", lwd=20)
segments(45,15.8,138,15.8, col = "green", pch=23, cex="9", lty="solid", lwd=20)
segments(160,15.8,255,15.8, col = "green", pch=23, cex="9", lty="solid", lwd=20)
segments(277,15.8,378,15.8, col = "green", pch=23, cex="9", lty="solid", lwd=20)
segments(51,23.8,145,23.8, col = "green", pch=23, cex="9", lty="solid", lwd=20)
segments(167,23.8,262,23.8, col = "green", pch=23, cex="9", lty="solid", lwd=20)
segments(284,23.8,381,23.8, col = "green", pch=23, cex="9", lty="solid", lwd=20)

2. ==Segments 2 ==
segments(399,15.8,432,15.8, col = "blue", pch=21, cex="9", lty="solid", lwd=20)
segments(448,15.8,475,15.8, col = "blue", pch=21, cex="9", lty="solid", lwd=20)
segments(486,15.8,515,15.8, col = "blue", pch=21, cex="9", lty="solid", lwd=20)
segments(401,23.8,434,23.8, col = "blue", pch=21, cex="9", lty="solid", lwd=20)
segments(450,23.8,475,23.8, col = "blue", pch=21, cex="9", lty="solid", lwd=20)
segments(486,23.8,517,23.8, col = "blue", pch=21, cex="9", lty="solid", lwd=20)

I solved one part of my query: marking points from one position to another is
OK and working fine. But I have another issue now. As I am using two segment
datasets (1 and 2), I want to draw different shapes for segments 2, and
although I am giving pch=21, it seems to give a solid line for both. Drawing a
different shape for every chunk of segments is the whole point.
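A minimal sketch of Greg's earlier advice, using the first few coordinates from the 'Segments 1' calls above: one vectorized segments() call per group, followed by points() for the shapes, since pch only applies to points, never to segments.

```r
# Coordinates taken from the quoted script (first four green segments):
x0 <- c(164,  45, 160, 277)
x1 <- c(192, 138, 255, 378)
y  <- c(7.8, 15.8, 15.8, 15.8)

plot(NA, xlim = c(0, 650), ylim = c(0, 30), xlab = "", ylab = "")
segments(x0, y, x1, y, col = "green", lwd = 10)       # one call, not four
points(c(x0, x1), rep(y, 2), pch = 23, bg = "green")  # diamonds at the ends
```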

I want to write a script that can generate such figures; below is a link to
one such tool.
http://www.expasy.ch/tools/mydomains/

Thank you

Jeet
On Thu, Jun 10, 2010 at 11:10 PM, Greg Snow 
mailto:greg.s...@imail.org>> wrote:
Your question is not really clear, do either of these examples do what you want?

 with(anscombe, plot(x1, y2, ylim=range(y2,y3)) )
 with(anscombe, points(x1, y3, col='blue', pch=2) )
 with(anscombe, segments(x1, y2, x1, y3, col=ifelse( y2>y3, 'green','red') ) )


 with(anscombe, plot(x1, y2, ylim=range(y2,y3), type='n') )
 with(anscombe[order(anscombe$x1),], polygon( c( x1,rev(x1) ), c(y2, rev(y3)), 
col='grey' ) )



--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org

Re: [R] logistic regression with 50 varaibales

2010-06-14 Thread Charles C. Berry

On Mon, 14 Jun 2010, Joris Meys wrote:


Hi,

Marc's explanation is valid to a certain extent, but I don't agree with
his conclusion. I'd like to point out "the curse of
dimensionality" (the Hughes effect), which starts to play rather quickly.



Ahem!

... minimal, self-contained, reproducible code ...

The OPs situation may not be an impossible one:


set.seed(54321)
dat <- as.data.frame(matrix(rnorm(1800*50),nc=50))
dat$y <- rbinom(1800,1,plogis(rowSums(dat)/7))
fit <- glm(y~., dat, family=binomial)
1/7 # the true coef

[1] 0.1428571

sd(coef(fit)) # roughly, the common standard error

[1] 0.05605597

colMeans(coef(summary(fit))) # what glm() got

    Estimate  Std. Error     z value    Pr(>|z|)
  0.14590836  0.05380666  2.71067328  0.06354820

# a trickier case:
set.seed(54321)
dat <- as.data.frame(matrix(rnorm(1800*50),nc=50))
dat$y <- rbinom(1800,1,plogis(rowSums(dat))) # try again with coef==1
fit <- glm(y~., dat, family=binomial)
colMeans(coef(summary(fit)))

   Estimate  Std. Error z valuePr(>|z|)
0.982944012 0.119063631 8.222138491 0.008458002




Finer examination of the latter fit will show some values that differ too 
far from 1.0 to agree with the asymptotic std err.



sd(coef(fit)) # rather bigger than 0.119

[1] 0.1827462

range(coef(fit))

[1] -0.08128586  1.25797057

And near separability may be playing here:


> cbind(
+ table(
+ cut(
+ plogis(abs(predict(fit))),
+ c( 0, 0.9, 0.99, 0.999, 0.9999, 0.99999, 1 ) ) ) )
                 [,1]
(0,0.9]           453
(0.9,0.99]        427
(0.99,0.999]      313
(0.999,0.9999]    251
(0.9999,0.99999]  173
(0.99999,1]       183




Recall that the observed information contains a factor of

plogis( predict(fit) )* plogis( -predict(fit))


hist(plogis( predict(fit) )* plogis( -predict(fit)))


So the effective sample size here was much reduced.

But to the OP's question, whether what you get is reasonable depends on 
what the setup is.


I wouldn't call the first of the above cases 'highly unstable'.

Which is not to say that one cannot generate difficult cases (esp. with 
correlated covariates and/or one or more highly influential covariates) 
and that the OPs case is not one of them.



HTH,

Chuck


The curse of dimensionality is easily demonstrated looking at the
proximity between your datapoints. Say we scale the interval in one
dimension to be 1 unit. If you have 20 evenly-spaced observations, the
distance between the observations is 0.05 units. To have a proximity
like that in a 2-dimensional space, you need 20^2=400 observations. in
a 10 dimensional space this becomes 20^10 ~ 10^13 datapoints. The
distance between your observations is important, as a sparse dataset
will definitely make your model misbehave.
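The arithmetic above, spelled out: the number of points needed to keep a 0.05-unit spacing as the dimension grows.

```r
# Observations required for equal spacing in d dimensions:
20^1    # 20 observations in 1 dimension
20^2    # 400 in 2 dimensions
20^10   # 1.024e+13 in 10 dimensions, i.e. ~10^13
```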

Even with about 35 samples per variable, using 50 independent
variables will render a highly unstable model, as your dataspace is
about as sparse as it can get. On top of that, interpreting a model
with 50 variables is close to impossible, and then I didn't even start
on interactions. No point in trying I'd say. If you really need all
that information, you might want to take a look at some dimension
reduction methods first.

Cheers
Joris

On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz  wrote:

On Jun 13, 2010, at 10:20 PM, array chip wrote:


Hi, this is not R technical question per se. I know there are many excellent 
statisticians in this list, so here my questions: I have dataset with ~1800 
observations and 50 independent variables, so there are about 35 samples per 
variable. Is it wise to build a stable multiple logistic model with 50 
independent variables? Any problem with this approach? Thanks

John



The general rule of thumb is to have 10-20 'events' per covariate degree of 
freedom. Frank has suggested that in some cases that number should be as high 
as 25.

The number of events is the smaller of the two possible outcomes for your 
binary dependent variable.

Covariate degrees of freedom refers to the number of columns in the model 
matrix. Continuous variables are 1, binary factors are 1, K-level factors are K 
- 1.

So if out of your 1800 records, you have at least 500 to 1000 events, depending 
upon how many of your 50 variables are K-level factors and whether or not you 
need to consider interactions, you may be OK. Better if towards the high end of 
that range, especially if the model is for prediction versus explanation.
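A quick back-of-envelope check of this rule for the numbers in question; the event count here is hypothetical, since the original post did not state it.

```r
# Suppose 600 of the 1800 records had the outcome of interest:
n <- 1800
events_yes <- 600
events <- min(events_yes, n - events_yes)  # the rarer outcome is what counts
events / 50                                # 12 events per covariate df
```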

Two excellent references would be Frank's book:

http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/

and Steyerberg's book:

http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/

to assist in providing guidance for model building/validation techniques.

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] unqiue problem

2010-06-14 Thread jim holtman
Your process does remove all the duplicate entries based on the
content of the two columns.  After you do this, there are still
duplicate entries in the first column that you are trying to use as
rownames and therefore the error.  Why to you want to use non-unique
entries as rownames?  Do you really need the row names, or should you
only be keeping unique values for the first column?

On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz  wrote:
> Hello everybody,
>
> I have a a matrix of 2 columns and over 27k rows.
> some of the rows are double , so I tried to remove them with the command
> unique():
>
>> Workbook5 <- read.delim(file =  "Workbook5.txt")
>> dim(Workbook5)
> [1] 27748     2
>> Workbook5 <- unique(Workbook5)
>> dim(Workbook5)
> [1] 20101     2
>
> it removed a lot of line, but unfortunately not all of them. I wanted to add
> the row names to the matrix and got this error message:
>> rownames(Workbook5) <- Workbook5[,1]
> Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L,  :
>  duplicate 'row.names' are not allowed
> In addition: Warning message:
> non-unique values when setting 'row.names': ‘A_51_P102339’,
> ‘A_51_P102518’, ‘A_51_P103435’, ‘A_51_P103465’,
> ‘A_51_P103594’, ‘A_51_P104409’, ‘A_51_P104718’,
> ‘A_51_P105869’, ‘A_51_P106428’, ‘A_51_P106799’,
> ‘A_51_P107176’, ‘A_51_P107959’, ‘A_51_P108767’,
> ‘A_51_P109258’, ‘A_51_P109708’, ‘A_51_P110341’,
> ‘A_51_P111757’, ‘A_51_P112427’, ‘A_51_P112662’,
> ‘A_51_P113672’, ‘A_51_P115018’, ‘A_51_P116496’,
> ‘A_51_P116636’, ‘A_51_P117666’, ‘A_51_P118132’,
> ‘A_51_P118168’, ‘A_51_P118400’, ‘A_51_P118506’,
> ‘A_51_P119315’, ‘A_51_P120093’, ‘A_51_P120305’,
> ‘A_51_P120738’, ‘A_51_P120785’, ‘A_51_P121134’,
> ‘A_51_P121359’, ‘A_51_P121412’, ‘A_51_P121652’,
> ‘A_51_P121724’, ‘A_51_P121829’, ‘A_51_P122141’,
> ‘A_51_P122964’, ‘A_51_P123422’, ‘A_51_P123895’,
> ‘A_51_P124008’, ‘A_51_P124719’, ‘A_51_P125648’,
> ‘A_51_P125679’, ‘A_51_P125779’ [... truncated]
>
> Is there a better way to discard the duplicataions in the text file (Excel
> file is the origin).
>
>> R.version
>               _
> platform       x86_64-apple-darwin9.8.0
> arch           x86_64
> os             darwin9.8.0
> system         x86_64, darwin9.8.0
> status         Patched
> major          2
> minor          11.1
> year           2010
> month          06
> day            03
> svn rev        52201
> language       R
> version.string R version 2.11.1 Patched (2010-06-03 r52201)
>
> THX
>
> Assa
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Prime Numbers Pkgs

2010-06-14 Thread Karl-Dieter Crisman
> Looking for a recommended package that handles prime number computations.

I'm not sure whether this would be helpful to you, but Sage
(http://www.sagemath.org) has excellent number theory support and
several ways to interface with R (which is included in the
distribution of Sage).  I use it myself to access several R packages I
need to do analysis on data I produce in Sage/Python, and then send
the data back to Sage for further processing.

Of course, this is not an R package, so depending on how much you need
such things from within R itself it may or may not help you out.

HTH,
Karl-Dieter Crisman

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lme command

2010-06-14 Thread Joshua Wiley
Hello Enrico,

One thing I notice between your two calls is that in the second you
specify data=dados, but you do not in the first.  When I try to do
something similar to your formulae using one of my longitudinal
datasets, I get the same results whether or not I put the formula for
random in a list.  Perhaps you could provide some sample data that
shows what is happening?

One other comment, rather than specifically adding each term and
interaction, you can use this shorthand:

Altura~(Idade+Idade2)*(sexo+status)

Best regards,

Josh



On Mon, Jun 14, 2010 at 6:37 AM, Enrico Colosimo  wrote:
> Hi,
>
> I am doing a longitudinal data set fit using lme.
> I used two forms of the lme command and I am
> getting two different outputs.
>
> FIRST
> out<-lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status,
> random=(list(ident=~Idade+Idade2)))
>
> SECOND
> out<-lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status,
> random= ~Idade+Idade2|ident,data=dados)
>
> I got weird results from the first one and could not understand the
> reason for it. All the results are
> exactly the same except the intercept, and the two main terms sexo
> (gender) and status (treatment).
> Those differences made a big difference in the final results.
>
> Can anybody tell me what the difference is between them?
> Thanks.
>
> Enrico.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student
Health Psychology
University of California, Los Angeles

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] recursive function

2010-06-14 Thread n.via...@libero.it

Dear list,
I have the following problem: what I'm trying to do is to build a function 
which does the following calculation in a recursive way:


I have a data frame more or less like this:

variable  year  DELTA

EC01      2006    /
EC01      2007   10
EC01      2008    5
EC01      2009    9


And then I have at time 2009  a variable called R_EC01(2009)=5
What I have to do is to construct the R_EC01 time series by starting from the 
2009 value:
R_EC01(2008)=R_EC01(2009)-DELTA(2009)
R_EC01(2007)=R_EC01(2008)-DELTA(2008)
R_EC01(2006)=R_EC01(2007)-DELTA(2007)


In terms of numbers, the results that I should get are:
R_EC01(2008)=5-9=-4

R_EC01(2007)=-4-5=-9
R_EC01(2006)=-9-10=-19
so my data frame should look like this:

SERIES   YEAR  value
R_EC01   2006   -19
R_EC01   2007    -9
R_EC01   2008    -4
R_EC01   2009     5

Does anyone know how to do it?
My dataframe is not set up as a time series...
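One non-recursive way to build the series, sketched under the assumption that the data look like the example above (the "/" for 2006 is treated as NA); the backward recursion R(y) = R(y+1) - DELTA(y+1) is just a reversed cumulative sum:

```r
delta  <- c(NA, 10, 5, 9)   # DELTA for 2006..2009 ("/" read as NA)
years  <- 2006:2009
r_2009 <- 5                 # the known R_EC01(2009) value

# R(y) = R(2009) - sum of DELTA over years y+1 .. 2009,
# computed in one pass with a reversed cumulative sum
value <- rev(r_2009 - c(0, cumsum(rev(delta[-1]))))

data.frame(SERIES = "R_EC01", YEAR = years, value = value)
# value is -19, -9, -4, 5 for 2006..2009, as in the example
```

The same idea works inside a data frame grouped by `variable` if there are several series.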


Thanks a lot!!!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] MICE Package and LMER

2010-06-14 Thread KDT

Hi R users,

I am estimating a multilevel model using lmer.  My dataset has missing
values and I am using the MICE package to make multiple imputations.
Everything works well until I reach the pooling stage using the pool()
function. I am able to get a summary of the pooled fixed effects but not the
random effects. No errors or warnings are given to me.
I checked the "help" file in R and the developers of MICE noted that "The
function pools also estimates obtained with lme() and lmer(), BUT only the
fixed part of the model."

Does anyone have any ideas on how I can get a summary of pooled random
effects? 

Below is my code
imp<-mice(mydata,m=3, imputationMethod =c("","","","logreg"),maxit=2, pri=F) 
model <- with(data=imp, lmer(miss~ sex + age + (1|id) + (1|sch),
family=binomial(link="logit")))
result<-pool(model)
summary(result)

Thanks
Trevor
-- 
View this message in context: 
http://r.789695.n4.nabble.com/MICE-Package-and-LMER-tp2254504p2254504.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R in Linux: problem with special characters

2010-06-14 Thread daniel fernandes

Hi,

 

First of all, thank you for your reply. It was very helpful.

 

I have another problem: I have changed the locale to pt_pt.iso885...@euro. Now 
the problem that I reported earlier doesn’t occur.

 

print("dúvida")
[1] "dúvida"

 

My system information now is the following:

 

Sys.getlocale()
[1] 
"lc_ctype=pt_pt.iso885...@euro;LC_NUMERIC=C;lc_time=pt_pt.iso885...@euro;lc_collate=pt_pt.iso885...@euro;
LC_MONETARY=C;lc_messages=pt_pt.iso885...@euro;lc_paper=pt_pt.iso885...@euro;LC_NAME=C;LC_ADDRESS=C;
LC_TELEPHONE=C;lc_measurement=pt_pt.iso885...@euro;LC_IDENTIFICATION=C"
 

 

[daniel.fernan...@pt-lnx13 ~]$ locale


 

lang=pt_pt.iso885...@euro
lc_ctype=pt_pt.iso885...@euro
lc_numeric=pt_pt.iso885...@euro
lc_time=pt_pt.iso885...@euro
lc_collate=pt_pt.iso885...@euro
lc_monetary=pt_pt.iso885...@euro
lc_messages=pt_pt.iso885...@euro
lc_paper=pt_pt.iso885...@euro
lc_name=pt_pt.iso885...@euro
lc_address=pt_pt.iso885...@euro
lc_telephone=pt_pt.iso885...@euro
lc_measurement=pt_pt.iso885...@euro
lc_identification=pt_pt.iso885...@euro
LC_ALL=
 
However, if I utilise the tcltk package and try to build a frame with the label 
“dúvida”, the string appears incorrectly. My guess is that this problem is also 
related to the locales. Does Tcl/Tk utilise a different locale?

 

 

Thanks in advance,

 

Daniel

 


 
> Date: Fri, 11 Jun 2010 17:54:34 -0400
> From: murdoch.dun...@gmail.com
> To: danielpas...@hotmail.com
> CC: r-help@r-project.org
> Subject: Re: [R] R in Linux: problem with special characters
> 
> daniel fernandes wrote:
> >
> >
> >
> > Hi,
> > 
> > I’m working with the 64 bit version of R 2.11.0 for Linux. My session info 
> > is:
> > 
> > R version 2.11.0 (2010-04-22)
> > x86_64-redhat-linux-gnu
> > 
> > locale:
> > [1] C
> > 
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base 
> > 
> > 
> > When I try to print words with special characters the result is that the 
> > expression printed has some kind of code substituting the special 
> > character. For example, if I run print(“dúvida”) the result is:
> > 
> > > print("dúvida")
> > [1] "d\372vida"
> > 
> > This as problem has something to do with the locale settings? If I run the 
> > locale command in the Linux server, I get:
> > 
> 
> Yes, it's your locale settings. The C locale doesn't support the "ú" 
> character in your string, and displays it in octal.
> 
> Duncan Murdoch
> > 
> > [daniel.fernan...@pt-lnx13 ~]$ locale
> > LANG=pt_PT.UTF-8
> > LC_CTYPE="C"
> > LC_NUMERIC="C"
> > LC_TIME="C"
> > LC_COLLATE="C"
> > LC_MONETARY="C"
> > LC_MESSAGES="C"
> > LC_PAPER="C"
> > LC_NAME="C"
> > LC_ADDRESS="C"
> > LC_TELEPHONE="C"
> > LC_MEASUREMENT="C"
> > LC_IDENTIFICATION="C"
> > LC_ALL=C
> > 
> > Thanks in advance for your help,
> > 
> > Daniel
> > 
> >
> >
> > [[alternative HTML version deleted]]
> >
> > 
> > 
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> > 
> 
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lme command

2010-06-14 Thread Enrico Colosimo
Hi,

I am doing a longitudinal data set fit using lme.
I used two forms of the lme command and I am
getting two different outputs.

FIRST
out<-lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status,
random=(list(ident=~Idade+Idade2)))

SECOND
out<-lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status,
random= ~Idade+Idade2|ident,data=dados)

I got weird results from the first one and could not understand the
reason for it. All the results are
exactly the same except the intercept, and the two main terms sexo
(gender) and status (treatment).
Those differences made a big difference in the final results.

Can anybody tell me what the difference is between them?
Thanks.

Enrico.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] xtable with Sweave

2010-06-14 Thread Silvano

Hi,

I'm using Sweave to prepare a descriptive report.
There are at least 20 tables built with the xtable command, of this kind:

<>=
q5 = factor(Q5, label=c("Não", "Sim"))
(q5.tab = cbind(table(q5)))
@

<>=
xtable(q5.tab, align="l|c", caption.placement = "top", 
table.placement='H')

@

I'm getting the following message:

Too many unprocessed floats

in the LaTeX file.

How can I avoid these messages?
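A common LaTeX-side workaround, sketched; this assumes the usual cause of "Too many unprocessed floats" (the float queue fills up before any table can be placed), which the thread does not confirm. The `placeins` package and `\FloatBarrier` are standard LaTeX, not part of Sweave or xtable:

```latex
% in the Sweave document preamble
\usepackage{placeins}   % provides \FloatBarrier

% then, every few tables in the document body:
\FloatBarrier           % or \clearpage, forces pending floats to be placed
```

Alternatively, printing the tables with a non-floating placement (e.g. the `H` specifier from the `float` package) avoids the queue entirely.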

--
Silvano Cesar da Costa
Departamento de Estatística
Universidade Estadual de Londrina
Fone: 3371-4346

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] do faster ANOVAS

2010-06-14 Thread melissa
Thank you: with a matrix for the responses (here my 101 timepoints), it takes 
less than 30 minutes for 1000 permutations, whereas before it took 2h30!

Best regards,

Mélissa
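The matrix-response trick Douglas suggested can be sketched like this; the factor layout and the column names t1, t2, ... are assumptions matching the data described in the quoted message below (with 5 timepoints standing in for 101):

```r
set.seed(42)
# toy design: 4 subjects x 3 products, 2 replicates per cell
dat <- data.frame(Subject = factor(rep(1:4, each = 6)),
                  Product = factor(rep(1:3, times = 8)))

# matrix response: one column per timepoint
resp <- matrix(rnorm(nrow(dat) * 5), ncol = 5,
               dimnames = list(NULL, paste0("t", 1:5)))

# a single call fits the two-factor ANOVA for every timepoint at once
fit <- aov(resp ~ Subject * Product, data = dat)
length(summary(fit))   # 5: one ANOVA table per response column
```

The sums of squares for each timepoint can then be pulled out of the per-response summaries instead of calling aov() once per column.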





> Message du 10/06/10 18:52
> De : "Douglas Bates" 
> A : "melissa" 
> Copie à : r-help@r-project.org
> Objet : Re: [R] do faster ANOVAS
> 
> 
> The lm and aov functions can take a matrix response allowing you to
> fit all of the responses for a single attribute simultaneously.
> 
> 
> On Thu, Jun 10, 2010 at 8:47 AM, melissa wrote:
> > Dear all R users,
> > I want to run 800,000 ANOVAs and store the sums of squares of the
> > effects. Here is an extract of my data table:
> > Product attribute subject rep t1 t2 t3 … t101
> > P1 A1 S1 R1 1 0 0 … 1
> > I want to run 1 ANOVA per timepoint and per attribute; there are 101
> > timepoints and 8 attributes, so I want to run 808 ANOVAs. This will be
> > an ANOVA with two factors.
> > Here is one example:
> > aov(t1 ~ Subject*Product, data[data$attribute == "A1", ])
> > I want to store for each ANOVA SSprod,SSsujet,SSerreur,SSinter and SStotal.
> > In fact I want the result in several matrices:
> > Ssprod matrice:
> > T1 t2 t3 t4 … t101
> > A1 ssprod(A1,T1)
> > A2
> > A3
> > …
> > A8
> > So I would like a matrice like that for ssprod, ssujet,sserreur,ssinter and 
> > sstotal.
> > And this is for one permutation, and I want to do 1000 permutations
> > Here is my code:
> > SSmatrixglobal<-function(k){
> >
> > daten.temp<-data
> > daten.temp$product=permutations[[k]]
> > listmat<-apply(daten.temp[,5:105],2,function(x,y){
> > tab2<-as.data.frame(cbind(x,y))
> > tab.class<-by(tab2[,1:3],tab2[,4],function(x){
> > f <- formula(paste(names(x)[1],"~",names(x)[2],"*",names(x)[3],sep=""))
> > anovas <- aov(f, data=x)
> > anovas$call$formula <-f
> > s1 <- summary(anovas)
> > qa <- s1[[1]][,2]
> > return(qa)
> > })
> > return(tab.class)
> > },y=daten.temp[,1:3]
> > )
> > ar <- 
> > array(unlist(listmat),dim=c(length(listmat[[1]][[1]]),length(listmat[[1]]),length(listmat)))
> > l=lapply(1:4,function(i) ar[i,,])
> > sssujet=l[[1]]
> > ssprod=l[[2]]
> > ssinter=l[[3]]
> > sserreur=l[[4]]
> > ss=rbind(sssujet,ssprod,ssinter,sserreur,sstotal)
> > ss=as.data.frame(ss)
> > sqlSave(channel,ss,"SS1000",append=T)
> > rm(ss,numperm,daten.temp)
> > }
> >
> > system.time(por <- lapply(c(1:1000), SSmatrixglobal))
> >
> > But it takes about 90 seconds per permutation, times 1000; how can I
> > make the ANOVAs faster?
> >
> > Many thanks
> > Best regards
> > Mélissa
> >
> > PS: I think that I can gain a lot of time in the aov function but I don't 
> > know how to do
> > [[alternative HTML version deleted]]
> >
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> 
>
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Prime Numbers Pkgs - Schoolmath is broken

2010-06-14 Thread Gavin Simpson
On Mon, 2010-06-14 at 07:16 -0700, Red Roo wrote:
> Looking for a recommended package that handles prime number computations.
> 
> Tried the following unsuccessfully:
> primeFactors() in the R.basic package failed to install.
> 
> primes() and primlist are broken in Schoolmath pkg on CRAN.
> My analysis can be found here http://j.mp/9BNI9q
> Not sure what the procedure is for getting things fixed, so I've
> cross-posted to r-dev as well.

Neither place is correct. This has *nothing* to do with R. Address bug
reports directly to the package maintainer, details of which can be
found here:

http://cran.r-project.org/web/packages/schoolmath/index.html

which is what is requested in the posting guide...

HTH

G

> --njg
> 
> TAKING THE PITH OUT OF PERFORMANCE http://perfdynamics.blogspot.com/
> Follow me on Twitter http://twitter.com/DrQz
> PERFORMANCE DYNAMICS COMPANY http://www.perfdynamics.com/
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression with 50 varaibales

2010-06-14 Thread Marc Schwartz
Joris,

There are two separate issues here:

1. Can you consider an LR model with 50 covariates?

2. Should you have 50 covariates in your LR model?


The answer to 1 is certainly yes, given what I noted below as a general working 
framework. I have personally been involved with the development and validation 
of LR models with ~35 covariates, albeit with notably larger datasets than 
discussed below, because the models are used for prediction. In fact, the 
current incarnations of those same models, now 15 years later, appear to have 
>40 covariates and are quite stable. The interpretation of the models by both 
statisticians and clinicians is relatively straightforward.

The answer to 2 gets into the subject matter that you raise, which is to 
consider other factors beyond the initial rules of thumb for minimum sample 
size. These get into reasonable data reduction methods, the consideration of 
collinearity, subject matter expertise, sparse data, etc.

The issues raised in number 2 are discussed in the two references that I noted.

Two additional references that might be helpful here on the first point are:

P. Peduzzi, J. Concato, E. Kemper, T. R. Holford, and A. R. Feinstein. A 
simulation study of the number of events per variable in logistic regression 
analysis. J Clin Epi, 49:1373–1379, 1996. 

E. Vittinghoff and C. E. McCulloch. Relaxing the rule of ten events per 
variable in logistic and Cox regression. Am J Epi, 165:710–718, 2006.


Regards,

Marc

On Jun 14, 2010, at 8:38 AM, Joris Meys wrote:

> Hi,
> 
> Marc's explanation is valid to a certain extent, but I don't agree with
> his conclusion. I'd like to point out "the curse of
> dimensionality"(Hughes effect) which starts to play rather quickly.
> 
> The curse of dimensionality is easily demonstrated looking at the
> proximity between your datapoints. Say we scale the interval in one
> dimension to be 1 unit. If you have 20 evenly-spaced observations, the
> distance between the observations is 0.05 units. To have a proximity
> like that in a 2-dimensional space, you need 20^2=400 observations. in
> a 10 dimensional space this becomes 20^10 ~ 10^13 datapoints. The
> distance between your observations is important, as a sparse dataset
> will definitely make your model misbehave.
> 
> Even with about 35 samples per variable, using 50 independent
> variables will render a highly unstable model, as your dataspace is
> about as sparse as it can get. On top of that, interpreting a model
> with 50 variables is close to impossible, and then I didn't even start
> on interactions. No point in trying I'd say. If you really need all
> that information, you might want to take a look at some dimension
> reduction methods first.
> 
> Cheers
> Joris
> 
> On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz  wrote:
>> On Jun 13, 2010, at 10:20 PM, array chip wrote:
>> 
>>> Hi, this is not an R technical question per se. I know there are many 
>>> excellent statisticians in this list, so here are my questions: I have a 
>>> dataset with ~1800 observations and 50 independent variables, so there are 
>>> about 35 samples per variable. Is it wise to build a stable multiple 
>>> logistic model with 50 independent variables? Any problem with this 
>>> approach? Thanks
>>> 
>>> John
>> 
>> 
>> The general rule of thumb is to have 10-20 'events' per covariate degree of 
>> freedom. Frank has suggested that in some cases that number should be as 
>> high as 25.
>> 
>> The number of events is the smaller of the two possible outcomes for your 
>> binary dependent variable.
>> 
>> Covariate degrees of freedom refers to the number of columns in the model 
>> matrix. Continuous variables are 1, binary factors are 1, K-level factors 
>> are K - 1.
>> 
>> So if out of your 1800 records, you have at least 500 to 1000 events, 
>> depending upon how many of your 50 variables are K-level factors and whether 
>> or not you need to consider interactions, you may be OK. Better if towards 
>> the high end of that range, especially if the model is for prediction versus 
>> explanation.
>> 
>> Two excellent references would be Frank's book:
>> 
>>  
>> http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/
>> 
>> and Steyerberg's book:
>> 
>>  
>> http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/
>> 
>> to assist in providing guidance for model building/validation techniques.
>> 
>> HTH,
>> 
>> Marc Schwartz
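The events-per-covariate rule of thumb quoted above is quick arithmetic; a sketch with hypothetical counts (the 600-event split is made up to match the ~1800-observation example):

```r
# hypothetical: 1800 observations, 600 in the rarer outcome class
y <- c(rep(0, 1200), rep(1, 600))

events   <- min(table(y))   # the smaller outcome count: 600
df_model <- 50              # covariate degrees of freedom in the model

events / df_model           # 12 events per df: near the 10-20 guideline
```

With many K-level factors the covariate degrees of freedom grow past 50, pushing the ratio down, which is why the factor structure matters as much as the raw variable count.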

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression with 50 varaibales

2010-06-14 Thread Robert A LaBudde
I think the real issue is why the fit is being done. If it is solely to
interpolate and condense the dataset, the number of variables is not an
important issue.

If the issue is developing a model that will capture causality, it is hard to
believe that can be accomplished with 50+ variables. With this many, some kind
of hunt would have to be done, and the resulting model would not be real
stable. It would be better perhaps to first reduce the variable set by, say,
principal components analysis, so that a reasonably sized set results.

If a stable and meaningful model is the goal, each term in the final model
should be plausibly causal.


At 10:36 AM 6/14/2010, Claudia Beleites wrote:

Dear all,

(this first part of the email I sent to John earlier today, but forgot to put
it to the list as well)

Dear John,

> Hi, this is not an R technical question per se. I know there are many
> excellent statisticians in this list, so here are my questions: I have a
> dataset with ~1800 observations and 50 independent variables, so there are
> about 35 samples per variable. Is it wise to build a stable multiple
> logistic model with 50 independent variables? Any problem with this
> approach? Thanks

First: I'm not a statistician, but a spectroscopist.
But I do build logistic regression models with far less than 1800 samples and
far more variates (e.g. 75 patients / 256 spectral measurement channels).
Though I have many measurements per sample: typically several hundred spectra
per sample.

Question: are the 1800 real, independent samples?

Model stability is something you can measure.
Do an honest validation of your model with really _independent_ test data and
measure the stability according to what your stability needs are (e.g. stable
parameters or stable predictions?).


(From here on, reply to Joris)

> Marc's explanation is valid to a certain extent, but I don't agree with
> his conclusion. I'd like to point out "the curse of
> dimensionality" (Hughes effect) which starts to play rather quickly.
No doubt.

> The curse of dimensionality is easily demonstrated looking at the
> proximity between your datapoints. Say we scale the interval in one
> dimension to be 1 unit. If you have 20 evenly-spaced observations, the
> distance between the observations is 0.05 units. To have a proximity
> like that in a 2-dimensional space, you need 20^2=400 observations. in
> a 10 dimensional space this becomes 20^10 ~ 10^13 datapoints. The
> distance between your observations is important, as a sparse dataset
> will definitely make your model misbehave.

But won't the distance between groups also grow?
No doubt, high-dimensional spaces are _very_ unintuitive.

However, the required sample size may grow substantially more slowly if the
model has appropriate restrictions. I remember the recommendation of "at
least 5 samples per class and variate" for linear classification models.
I.e. not to get a good model, but to have a reasonable chance of getting a
stable model.

> Even with about 35 samples per variable, using 50 independent
> variables will render a highly unstable model,
Am I wrong in thinking that there may be a substantial difference between
stability of predictions and stability of model parameters?

BTW: if the models are unstable, there's also aggregation.

At least for my spectra I can give toy examples with a physical-chemical
explanation that yield the same prediction with different parameters (of
course because of correlation).

> as your dataspace is
> about as sparse as it can get. On top of that, interpreting a model
> with 50 variables is close to impossible,
No, not necessarily. IMHO it depends very much on the meaning of the
variables. E.g. for the spectra, a set of model parameters may be interpreted
like spectra or difference spectra. Of course this has to do with the fact
that a parallel coordinate plot is the more "natural" view of spectra
compared to a point in so many dimensions.

> and then I didn't even start
> on interactions. No point in trying I'd say. If you really need all
> that information, you might want to take a look at some dimension
> reduction methods first.

Which puts to my mind a question I've had for a long time:
I assume that all variables that I know beforehand to be without information
are already discarded. The dimensionality is then further reduced in a
data-driven way (e.g. by PCA or PLS). The model is built in the reduced space.

How many fewer samples are actually needed, considering the fact that the
dimension reduction is a model estimated on the data?
...which of course also means that the honest validation embraces the
data-driven dimensionality reduction as well...

Are there recommendations about that?

The other curious question I have is:
I assume that it is impossible for him to obtain the 10^xy samples required
for comfortable model building.
So what is he to do?

Cheers,

Claudia



--
Claudia Beleites
Dipartimento dei Materiali e delle 

Re: [R] using R to draw over a distribution.

2010-06-14 Thread David Winsemius


On Jun 14, 2010, at 10:42 AM, SHANE MILLER, BLOOMBERG/ 731 LEXIN wrote:


Hi,

Suppose I analyze a log to create a histogram:

event E1 occurred N1 times
event E2 occurred N2 times
 ...
 ... for m total events
 ...
event Em occurred Nm times

The total number of occurrences is: T = sum_{j=1..m} N_j

I want to give this histogram to R and ask it to produce T random  
events such that approximately N1 instances of E1 are drawn, N2  
instances of E2 drawn, and so forth.


?table  # or perhaps use the fact that hist() will return a table of a  
particular type.

?sample  # from the events with prob = frequencies/T

(If you insist on constraining the total count to be the observed sum  
it will no longer be random with m degrees of freedom. And there are  
many postings on how to do this in the archives.)
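In code, the suggestion sketches out like this (the event names and counts are made up):

```r
counts <- c(E1 = 10, E2 = 30, E3 = 60)   # hypothetical histogram
T <- sum(counts)

set.seed(1)
# draw T events with probability proportional to the observed frequencies
draws <- sample(names(counts), size = T, replace = TRUE, prob = counts / T)

table(draws)   # roughly proportional to the original counts
```

Each event count in the result is then binomially distributed around its observed frequency; only conditioning on the totals (as the parenthetical above warns) would pin them down exactly.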


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] using R to draw over a distribution.

2010-06-14 Thread SHANE MILLER, BLOOMBERG/ 731 LEXIN
Hi,

Suppose I analyze a log to create a histogram:

event E1 occurred N1 times
event E2 occurred N2 times
  ...
  ... for m total events
  ...
event Em occurred Nm times

The total number of occurrences is: T = sum_{j=1..m} N_j

I want to give this histogram to R and ask it to produce T random events such 
that approximately N1 instances of E1 are drawn, N2 instances of E2 drawn, and 
so forth.

Regards,
Shane
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to ignore rows missing arguments of a function when creating a function?

2010-06-14 Thread Joris Meys
Ah, I overlooked that possibility.

You can do the following:

not <- attr(fm$model, "na.action")
# only drop the NA rows if any were actually left out of the model
if (!is.null(not)) {
    cluster <- cluster[-not]
    dat <- dat[-not, ]
}
with(dat, {
    # ... rest of the cl2 body as before ...


On Mon, Jun 14, 2010 at 4:30 PM, edmund jones  wrote:
> Thanks a lot!
>
> So, what you propose works perfectly (the "cluster" variable is indeed a
> vector), except in the case where I have no missing values in my regression.
> With the following function:
>
> cl2 <- function(dat,fm, cluster){
> attach(dat, warn.conflicts = F)
> require(sandwich)
> require(lmtest)
> not <- attr(fm$model,"na.action")
> cluster <- cluster[-not]
> with( dat[-not,] ,{
> M <- length(unique(cluster))
> N <- length(cluster)
> K <- fm$rank
> dfc <- (M/(M-1))*((N-1)/(N-K))
> uj <- apply(estfun(fm),2, function(x) tapply(x, cluster, sum));
> vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)
> coeftest(fm, vcovCL)
> }
> )
> }
>
> If I have no missing values in the arguments of fm, I get the message
> "error in -not: invalid argument to unary operator"
>
> In your example:
>
>
>> x <- rnorm(100)
>> y <- c(rnorm(90),NA,rnorm(9))
>> test <- lm(x~y)
>> str(test)
> List of 13
>  ... (tons of information)
>  $ model        :'data.frame':  99 obs. of  2 variables:
>  ... (more tons of information)
>  ..- attr(*, "na.action")=Class 'omit'  Named int 91
>  .. .. ..- attr(*, "names")= chr "91"
>  - attr(*, "class")= chr "lm"
>
> Now we know that we can do :
>> not <-attr(test$model,"na.action")
>> y[-not]
>
> If I have, for example, y <- rnorm(100), I also get the error:
> "error in -not: invalid argument to unary operator"
>
>
> In my database:
>
>  female    income transport lunch dist reg_f
> 4900  0 18.405990 0 0   75   750
> 4901  0    NA 0    NA   75   753
> 4902  1    NA 0 1   75   752
> 4903  1    NA 0 1   75   751
> 4904  1 69.678340 1 0   74   740
> 4905  0 57.953230 1 0   73   730
> 4906  1 85.835130 0 1   68   680
> 4907  0 81.952980 0 0   75   750
> 4908  1 46.837490 1 0   74   740
> 4909  0    NA 1 0    5    52
> 4910  1 65.041360 0 1   75   750
> 4911  0 77.451870 1 0   75   750
> 4912  0 96.148590 1 0   75   750
> 4913  0 64.510410 0 0   74   740
> 4914  0 69.391230 0 0   75   750
> 4915  0  4.804243 0 1   65   650
> 4916  0    NA 0 0   75   751
> 4917  1    NA 0 0   75   751
> 4918  1    NA 0 1   40   401
> 4919  1 49.920750 0 1   76   760
> 4920  0    NA 0 1   76   763
> 4921  0 10.187910 0 0   77   770
> 4922  0 14.839710 0 1   77   770
> 4923  1 32.041000 0 0   77   770
> 4924  0 85.639440 0 0   77   770
> 4925  1 86.308410 0 0   68   680
> 4926  0 79.223910 0 0    7    70
> 4927  0 81.825800 0 0   78   780
> 4928  0 31.931000 0 1   37   370
> 4929  0 53.282310 0 1   41   410
> 4930  1 31.312910 1 1   25   250
> 4931  1 50.478870 0 1   25   250
> 4932  0    NA 0 0   66   662
> 4933  1 58.156940 0 1   31   310
> 4934  0    NA 0 1    1    13
> 4935  1    NA 0 1    1    12
> 4936  1 59.149180 0 0    3    30
> 4937  1  5.400807 0 1    5    50
> 4938  1 76.828630 0 0    6    60
> 4939  1 73.488300 0 1   63   630
> 4940  0  6.529074 0 1    6    60
> 4941  0    NA 0 0    6    61
> 4942  1 70.128530 0 0    3    30
> 4943  0    NA 0 1   53   531
> 4944  1 75.715350 1 0    6    60
> 4945  0  8.623850 0 1   24   240
> 4946  1 79.062470 0 1   62   620
> 4947  0 83.863370 1 1   11   110
> 4948  0 58.904450 0 1   62   620
> 4949  0 88.500290 0 0    9    90
> 4950  0    NA 1 1    9    90
> 4951 NA    NA 0 1   15   151
>
> It works perfectly if I do
> cl2(testdata, lm(income ~ transport + reg_f ), female)
>
> but not for cl2(testdata, lm(dist ~ transport + reg_f ), female)
> or any other case where the arguments of the lm function have no
> missing values.
>
> How can I tell R to do this only if there is a "missing" problem?
>
> Thanks a lot for your help! Working on your previous replies has been very
> helpful in understanding how R works.
>
> Cheers,
> Edmund.
>
>
> 2010/6/13 Joris Meys 
>>
>>  Next to that, Off course you have to use the right  indices for
>> "cluster", but as I have no clue what the "dist"

Re: [R] logistic regression with 50 varaibales

2010-06-14 Thread Claudia Beleites

Dear all,

(this first part of the email I sent to John earlier today, but forgot to put it 
to the list as well)

Dear John,

> Hi, this is not an R technical question per se. I know there are many excellent
> statisticians on this list, so here are my questions: I have a dataset with ~1800
> observations and 50 independent variables, so there are about 35 samples per
> variable. Is it wise to build a stable multiple logistic model with 50
> independent variables? Any problem with this approach? Thanks

First: I'm not a statistician, but a spectroscopist.
But I do build logistic regression models with far fewer than 1800 samples and
far more variates (e.g. 75 patients / 256 spectral measurement channels), though
I have many measurements per sample: typically several hundred spectra per sample.


Question: are the 1800 real, independent samples?

Model stability is something you can measure.
Do an honest validation of your model with really _independent_ test data, and
measure the stability according to what your stability needs are (e.g. stable
parameters or stable predictions?).




(From here on reply to Joris)

> Marc's explanation is valid to a certain extent, but I don't agree with
> his conclusion. I'd like to point out "the curse of
> dimensionality" (Hughes effect), which starts to play rather quickly.
No doubt.

> The curse of dimensionality is easily demonstrated looking at the
> proximity between your datapoints. Say we scale the interval in one
> dimension to be 1 unit. If you have 20 evenly-spaced observations, the
> distance between the observations is 0.05 units. To have a proximity
> like that in a 2-dimensional space, you need 20^2 = 400 observations. In
> a 10-dimensional space this becomes 20^10 ~ 10^13 datapoints. The
> distance between your observations is important, as a sparse dataset
> will definitely make your model misbehave.

But won't the distance between groups also grow?
No doubt, that high-dimensional spaces are _very_ unintuitive.

However, the required sample size may grow substantially more slowly if the model 
has appropriate restrictions. I remember the recommendation of "at least 5 
samples per class and variate" for linear classification models, i.e. not to get 
a good model, but to have a reasonable chance of getting a stable model.


> Even with about 35 samples per variable, using 50 independent
> variables will render a highly unstable model,
Am I wrong in thinking that there may be a substantial difference between the 
stability of predictions and the stability of model parameters?


BTW: if the models are unstable, there's also aggregation.

At least for my spectra I can give toy examples with physical-chemical 
explanation that yield the same prediction with different parameters (of course 
because of correlation).


> as your dataspace is
> about as sparse as it can get. On top of that, interpreting a model
> with 50 variables is close to impossible,
No, not necessarily. IMHO it depends very much on the meaning of the variables. 
E.g. for the spectra a set of model parameters may be interpreted like spectra 
or difference spectra. Of course this has to do with the fact, that a parallel 
coordinate plot is the more "natural" view of spectra compared to a point in so 
many dimensions.


> and then I didn't even start
> on interactions. No point in trying I'd say. If you really need all
> that information, you might want to take a look at some dimension
> reduction methods first.

Which puts to my mind a question I've had since long:
I assume that all variables that I know beforehand to be without information are 
already discarded.
The dimensionality is then further reduced in a data-driven way (e.g. by PCA or 
PLS). The model is built in the reduced space.


How many fewer samples are actually needed, considering the fact that the 
dimension reduction is itself a model estimated from the data?
...which of course also means that the honest validation embraces the 
data-driven dimensionality reduction as well...


Are there recommendations about that?


The other curious question I have is:
I assume that it is impossible for him to obtain the 10^xy samples required for 
comfortable model building.

So what is he to do?


Cheers,

Claudia



--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lattice: How to color the data points in splom() according to the panel they are plotted?

2010-06-14 Thread Deepayan Sarkar
On Mon, Jun 14, 2010 at 6:24 PM, Martin Maechler
 wrote:
> Dear Deepayan,
>
> this is in reply to a message almost 6 months ago :
>
>> Deepayan Sarkar 

[...]

>    > Thanks, I was going to say the same thing, except that it would be
> (1)
>    > conceptually simpler just to add the 'i' and 'j' values as arguments
>    > to the "panel" function (the 'pargs' variable). The colors could then
>    > be passed through as part of the ... arguments, and the relevant entry
>    > extracted in the panel function.
> (2)
>    > The other option is to keep a global counter and increment it inside
>    > the panel function, and choosing colors based on that counter and
>    > knowledge of the order in which panels are drawn. Not very elegant,
>    > but the least intrusive solution I can think of.
>
>    > -Deepayan
>
> Against the R-forge version of lattice,
> the following very small patch to  panel.pairs
> would allow users to use '(1)' i.e., provide  panel functions
> with (i,j) arguments directly.
> I'm pretty sure that the change could not easily break existing
> splom() usage.
>
> --- R/splom.R   (revision 619)
> +++ R/splom.R   (working copy)
> @@ -291,7 +291,8 @@
>                                    y = z[subscripts, i]),
>  ##                                    panel.number = panel.number,
>  ##                                    packet.number = packet.number),
> -                              list(...))
> +                              list(...),
> +                              list(i = i, j = j))
>                         else
>                             c(list(x = z[subscripts, j],
>                                    y = z[subscripts, i],
> @@ -299,7 +300,8 @@
>                                    subscripts = subscripts),
>  ##                                    panel.number = panel.number,
>  ##                                    packet.number = packet.number),
> -                              list(...))
> +                              list(...),
> +                              list(i = i, j = j))
>
>                     if (!("..." %in% names(formals(panel))))
>                         pargs <- pargs[intersect(names(pargs), 
> names(formals(panel)))]

Done in r-forge svn.

-Deepayan

>
>
> With the above change,
> a user could use a panel function with (i,j) arguments,
> and e.g. say
>
> Cmat <- outer(1:6,1:6,
>              function(i,j) rainbow(11, start=.12, end=.5)[i+j-1])
>
> splom(~diag(6), ## for testing: superpanel=mypanel.pairs,
>      panel=function(x,y,i,j,...){
>               panel.fill(Cmat[i,j]); panel.splom(x,y,...)
>               panel.text(.5,.5, paste("(",i,",",j,")",sep=""))
>      })
>
>
> I think that would allow quite a bit more flexibility without
> the need to explicitly "hack"  panel.pairs
> (and having to maintain such a hack against the ever-enhancing
> lattice).
>
> Martin Maechler, ETH Zurich
>
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cforest and Random Forest memory use

2010-06-14 Thread Max Kuhn
The first thing that I would recommend is to avoid the "formula
interface" to models. The internals that R uses to create model matrices
from a formula + data set are not efficient. If you had a large number
of variables, I would have automatically pointed to that as a source
of issues. cforest and ctree only have formula interfaces though, so
you are stuck on that one. The randomForest package has both
interfaces, so that might be better.

Probably the issue is the depth of the trees. With that many
observations, you are likely to get extremely deep trees. You might
try limiting the depth of the tree and see if that has an effect on
performance.
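As a sketch of the depth-limiting idea (hypothetical stand-in data; `maxnodes` and `sampsize` are randomForest arguments that cap terminal nodes per tree and the per-tree sample size, since cforest lacks a non-formula interface):

```r
library(randomForest)  # assumed installed

set.seed(1)
x <- matrix(rnorm(2000 * 40), ncol = 40)   # stand-in for the real predictors
y <- factor(rbinom(2000, 1, 0.3))          # stand-in binary response

## The matrix interface avoids the formula machinery entirely;
## maxnodes caps terminal nodes per tree, sampsize subsamples rows.
fit <- randomForest(x = x, y = y, ntree = 100,
                    maxnodes = 32, sampsize = 500)
fit
```

Shallower trees trained on row subsamples usually cut both memory use and fitting time substantially, at some cost in per-tree accuracy that the ensemble tends to absorb.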

We run into these issues with large compound libraries; in those cases
we do whatever we can to avoid ensembles of trees or kernel methods.
If you want those, you might need to write your own code that is
hyper-efficient and tuned to your particular data structure (as we
did).

On another note... are this many observations really needed? You have
40ish variables; I suspect that >1M points are pretty densely packed
into 40-dimensional space. Do you lose much by sampling the data set
or allocating a large portion to a test set? If you have thousands of
predictors, I could see the need for so many observations, but I'm
wondering if many of the samples are redundant.

Max

On Mon, Jun 14, 2010 at 3:45 AM, Matthew OKane  wrote:
> Answers added below.
> Thanks again,
> Matt
>
> On 11 June 2010 14:28, Max Kuhn  wrote:
>>
>> Also, you have not said:
>>
>>  - your OS: Windows Server 2003 64-bit
>>  - your version of R: 2.11.1 64-bit
>>  - your version of party: 0.9-9995
>
>
>>
>>  - your code: test.cf <- cforest(formula = badflag ~ ., data = example,
>>      control = cforest_control(teststat = 'max', testtype = 'Teststatistic',
>>      replace = FALSE, ntree = 500, savesplitstats = FALSE, mtry = 10))
>
>>  - what "Large data set" means: > 1 million observations, 40+ variables,
>> around 200MB
>>  - what "very large model objects" means - anything which breaks
>>
>> So... how is anyone suppose to help you?
>>
>> Max
>
>



-- 

Max

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Prime Numbers Pkgs - Schoolmath is broken

2010-06-14 Thread Red Roo
Looking for a recommended package that handles prime number computations.

Tried the following unsuccessfully:
primeFactors() in the R.basic package failed to install.

primes() and primlist are broken in Schoolmath pkg on CRAN.
My analysis can be found here http://j.mp/9BNI9q
Not sure what the procedure is for getting things fixed, so I've cross-posted 
to r-dev as well.
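In the meantime, a minimal Sieve of Eratosthenes in base R works as a stop-gap (my own sketch, not code from either package):

```r
## Return all primes up to n using the Sieve of Eratosthenes.
sieve <- function(n) {
  if (n < 2) return(integer(0))
  is.prime <- rep(TRUE, n)
  is.prime[1] <- FALSE
  if (n >= 4) {
    for (i in 2:floor(sqrt(n))) {
      ## Mark multiples of each surviving prime, starting at i^2.
      if (is.prime[i]) is.prime[seq(i * i, n, by = i)] <- FALSE
    }
  }
  which(is.prime)
}

sieve(30)
# [1]  2  3  5  7 11 13 17 19 23 29
```

This is fine for moderate n; for serious number-theoretic work a dedicated package (once fixed) or GMP bindings would be more appropriate.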
--njg

TAKING THE PITH OUT OF PERFORMANCE http://perfdynamics.blogspot.com/
Follow me on Twitter http://twitter.com/DrQz
PERFORMANCE DYNAMICS COMPANY http://www.perfdynamics.com/



Re: [R] remove last char of a text string

2010-06-14 Thread Henrique Dallazuanna
Try:

 gsub(".$", "", c('01asap05a', '02ee04b'))
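Both the regex approach and the substr()/nchar() approach mentioned in this thread are vectorised, so either handles strings of differing lengths in one call:

```r
x <- c("01asap05a", "02ee04b")

gsub(".$", "", x)            # "." matches any single char, "$" anchors it at the end
# [1] "01asap05" "02ee04"

substr(x, 1, nchar(x) - 1)   # equivalent: keep everything but the last character
# [1] "01asap05" "02ee04"
```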

On Mon, Jun 14, 2010 at 10:47 AM, glaporta  wrote:

>
> Dear R experts,
> is there a simple way to remove the last char of a text string?
> substr() function use as parameter start end only... but my strings are of
> different length...
> 01asap05a -> 01asap05
> 02ee04b -> 02ee04
> Thank you all,
> Gianandrea
> --
> View this message in context:
> http://r.789695.n4.nabble.com/remove-last-char-of-a-text-string-tp2254377p2254377.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]



Re: [R] how to change default help settings from factory default html

2010-06-14 Thread Uwe Ligges



On 14.06.2010 15:39, Katya Mauff wrote:

Hi-I have tried that offline, -my browser opens and says: "Offline mode
Firefox is currently in offline mode and can't browse the Web.
Uncheck "Work Offline" in the File menu, then try again"



Well, go "online" with Firefox, which means Firefox can access the R help 
pages afterwards. You do not need to actually go online with your machine.


Uwe



I can access ?help that way, or pages I've been to before having gone offline, 
but nothing new.


Uwe Ligges  2010/06/14 03:29 PM>>>



On 14.06.2010 14:11, Katya Mauff wrote:

Hi all

Apologies if this is a trivial question- I have searched the lists and the 
online help files etc but have not managed to find anything. I recently 
downloaded the latest version of R, which has the help type set to htmlhelp as 
default (according to http://127.0.0.1:18380/library/utils/html/help.html)

I would very much like to be able to access the help files when I am offline by 
typing ?topic etc as I used to with the previous R version. Any suggestions?



Yes, just type ?topic and your web browser opens and displays the help
file. Note that the address beginning "http://127.0.0.1" is on your
local machine, hence you can stay offline.

Uwe Ligges



Thanks
Katya




###
UNIVERSITY OF CAPE TOWN

This e-mail is subject to the UCT ICT policies and e-mail disclaimer published 
on our website at http://www.uct.ac.za/about/policies/emaildisclaimer/ or 
obtainable from +27 21 650 4500. This e-mail is intended only for the person(s) 
to whom it is addressed. If the e-mail has reached you in error, please notify 
the author. If you are not the intended recipient of the e-mail you may not 
use, disclose, copy, redirect or print the content. If this e-mail is not 
related to the business of UCT it is sent by the sender in the sender's 
individual capacity.

###













Re: [R] remove last char of a text string

2010-06-14 Thread Sarah Goslee
Sure. You can use nchar() to find out how long the string is.

> teststring <- "01asap05a"
> substr(teststring, 1, nchar(teststring)-1)
[1] "01asap05"

On Mon, Jun 14, 2010 at 9:47 AM, glaporta  wrote:
>
> Dear R experts,
> is there a simple way to remove the last char of a text string?
> substr() function use as parameter start end only... but my strings are of
> different length...
> 01asap05a -> 01asap05
> 02ee04b -> 02ee04
> Thank you all,
> Gianandrea


-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] remove last char of a text string

2010-06-14 Thread Gustaf Rydevik
On Mon, Jun 14, 2010 at 3:47 PM, glaporta  wrote:
>
> Dear R experts,
> is there a simple way to remove the last char of a text string?
> substr() function use as parameter start end only... but my strings are of
> different length...
> 01asap05a -> 01asap05
> 02ee04b -> 02ee04
> Thank you all,
> Gianandrea
> --
>

It's not terribly elegant, but this works:

orig.text<-c("01asap05a","02ee04b")
substr(orig.text,1,nchar(orig.text)-1)

Regards,
Gustaf


-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik



[R] remove last char of a text string

2010-06-14 Thread glaporta

Dear R experts,
is there a simple way to remove the last char of a text string?
The substr() function takes only start and end as parameters... but my strings are of
different lengths...
01asap05a -> 01asap05
02ee04b -> 02ee04
Thank you all,
Gianandrea
-- 



Re: [R] Html help

2010-06-14 Thread Joris Meys
If the IP number is something like 127.0.0.1:x, then you are on
your local computer.
Cheers
Joris

On Mon, Jun 14, 2010 at 1:33 PM, Murray Jorgensen  wrote:
> I have just installed R 2.11.1 on my XP laptop.
>
> I like html help for browsing but text help for on-the-fly look-ups. I was a
> bit surprised when I was asked to choose between them during the
> installation. I chose text, thinking I could fix the html help later, which
> is what I am trying to do now.
>
> Now when I ask for html help my browser goes to
>
> 'http://-ip-number-/doc/html/index.html'
>
> instead of where I want on my computer:
>
> C:\apps\R\R-2.11.1\doc\html\index.html
>
> Now I can go where I want manually but then the package list on
>
>
> C:\apps\R\R-2.11.1\doc\html\packages.html
>
> does not include all the packages that I have installed and linked. I don't
> want to read my html help from the web because sometimes I am off-line or on
> a slow connection.
>
> How do I go about getting a local set of html help files?
>
> Cheers,   Murray Jorgensen
>
> --
> Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
> Department of Statistics, University of Waikato, Hamilton, New Zealand
> Email: m...@waikato.ac.nz  majorgen...@ihug.co.nz        Fax 7 838 4155
> Phone  +64 7 838 4773 wk    Home +64 7 825 0441   Mobile 021 0200 8350
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] how to change default help settings from factory default html

2010-06-14 Thread Katya Mauff
Hi-I have tried that offline, -my browser opens and says: "Offline mode
Firefox is currently in offline mode and can't browse the Web.
Uncheck "Work Offline" in the File menu, then try again"
 
I can access ?help that way, or pages I've been to before having gone offline, 
but nothing new.

>>> Uwe Ligges  2010/06/14 03:29 PM >>>


On 14.06.2010 14:11, Katya Mauff wrote:
> Hi all
>
> Apologies if this is a trivial question- I have searched the lists and the 
> online help files etc but have not managed to find anything. I recently 
> downloaded the latest version of R, which has the help type set to htmlhelp 
> as default (according to http://127.0.0.1:18380/library/utils/html/help.html)
>
> I would very much like to be able to access the help files when I am offline 
> by typing ?topic etc as I used to with the previous R version. Any 
> suggestions?


Yes, just type ?topic and your web browser opens and displays the help 
file. Note that the address beginning "http://127.0.0.1" is on your 
local machine, hence you can stay offline.

Uwe Ligges


> Thanks
> Katya
>
>
>
>
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


 

 

[[alternative HTML version deleted]]



Re: [R] logistic regression with 50 varaibales

2010-06-14 Thread Joris Meys
Hi,

Marc's explanation is valid to a certain extent, but I don't agree with
his conclusion. I'd like to point out "the curse of
dimensionality" (Hughes effect), which starts to play rather quickly.

The curse of dimensionality is easily demonstrated looking at the
proximity between your datapoints. Say we scale the interval in one
dimension to be 1 unit. If you have 20 evenly-spaced observations, the
distance between the observations is 0.05 units. To have a proximity
like that in a 2-dimensional space, you need 20^2 = 400 observations. In
a 10-dimensional space this becomes 20^10 ~ 10^13 datapoints. The
distance between your observations is important, as a sparse dataset
will definitely make your model misbehave.
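The spacing argument above can be reproduced numerically; a small base-R sketch (the 0.05 spacing and the chosen dimensions are illustrative):

```r
## Number of evenly spaced points needed to keep a fixed spacing of
## 0.05 units on the unit interval, as the dimensionality grows.
spacing <- 0.05
points.per.dim <- 1 / spacing              # 20 points per dimension
dims <- c(1, 2, 5, 10)
needed <- points.per.dim ^ dims
names(needed) <- paste(dims, "D", sep = "")
needed
## needed["10D"] is 20^10, i.e. roughly 10^13 datapoints
```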

Even with about 35 samples per variable, using 50 independent
variables will render a highly unstable model, as your dataspace is
about as sparse as it can get. On top of that, interpreting a model
with 50 variables is close to impossible, and then I didn't even start
on interactions. No point in trying I'd say. If you really need all
that information, you might want to take a look at some dimension
reduction methods first.

Cheers
Joris

On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz  wrote:
> On Jun 13, 2010, at 10:20 PM, array chip wrote:
>
>> Hi, this is not an R technical question per se. I know there are many excellent 
>> statisticians on this list, so here are my questions: I have a dataset with ~1800 
>> observations and 50 independent variables, so there are about 35 samples per 
>> variable. Is it wise to build a stable multiple logistic model with 50 
>> independent variables? Any problem with this approach? Thanks
>>
>> John
>
>
> The general rule of thumb is to have 10-20 'events' per covariate degree of 
> freedom. Frank has suggested that in some cases that number should be as high 
> as 25.
>
> The number of events is the smaller of the two possible outcomes for your 
> binary dependent variable.
>
> Covariate degrees of freedom refers to the number of columns in the model 
> matrix. Continuous variables are 1, binary factors are 1, K-level factors are 
> K - 1.
>
> So if out of your 1800 records, you have at least 500 to 1000 events, 
> depending upon how many of your 50 variables are K-level factors and whether 
> or not you need to consider interactions, you may be OK. Better if towards 
> the high end of that range, especially if the model is for prediction versus 
> explanation.
>
> Two excellent references would be Frank's book:
>
>  http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/
>
> and Steyerberg's book:
>
>  http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/
>
> to assist in providing guidance for model building/validation techniques.
>
> HTH,
>
> Marc Schwartz
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
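Marc's events-per-degree-of-freedom rule is easy to check numerically; a sketch with an assumed outcome split (John's actual event count is unknown):

```r
n.obs        <- 1800
n.events     <- 600   # assumed: the smaller of the two outcome counts
covariate.df <- 50    # 50 variables, assumed continuous/binary (1 df each)

events.per.df <- n.events / covariate.df
events.per.df
# [1] 12
```

With 12 events per covariate df this would sit inside the 10-20 range, but towards the low end, which argues for caution (or dimension reduction) rather than a free pass.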



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] how to change default help settings from factory default html

2010-06-14 Thread Prof Brian Ripley

On Mon, 14 Jun 2010, Katya Mauff wrote:


Hi all

Apologies if this is a trivial question- I have searched the lists 
and the online help files etc but have not managed to find anything. 
I recently downloaded the latest version of R, which has the help 
type set to htmlhelp as default (according to 
http://127.0.0.1:18380/library/utils/html/help.html)


Not so for the latest R (2.11.1) compiled with the default settings. 
The help(help) page in 2.11.1 actually says


 The ‘factory-fresh’ default is text help except from the Mac OS
 GUI, which uses HTML help displayed in its own browser window.

However, the Windows installer allows you to set the default type of 
help, and its default is HTML help in R >= 2.10.0.


I would very much like to be able to access the help files when I am 
offline by typing ?topic etc as I used to with the previous R 
version. Any suggestions?


You should be able to use HTML help offline: it is running locally on 
your machine (127.0.0.1 is the loopback interface).  (It is 
conceivable that your unstated OS disables the loopback interface when 
'offline', but we have not come across an instance of this.)


But ?help tells you how to select other forms of help by default via

  help_type = getOption("help_type"))

and ?'?' says

 This is a shortcut to ‘help’ and uses its default type of help.




Thanks
Katya






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] merging data frames

2010-06-14 Thread jim holtman
Put the rownames as another column in your dataframe so that it
remains with the data.  After merging, you can then use it as the
"rownames"
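A minimal sketch of this suggestion (toy data frames, not Assa's actual data):

```r
## Carry row names through merge() by making them a column first.
df1 <- data.frame(x = 1:3, row.names = c("r1", "r2", "r3"))
df2 <- data.frame(y = c(10, 20, 30), row.names = c("r1", "r2", "r3"))

df1$id <- rownames(df1)
df2$id <- rownames(df2)

m <- merge(df1, df2, by = "id")   # merge drops row names, but keeps the column
rownames(m) <- m$id               # restore the row names afterwards
m$id <- NULL
m
#    x  y
# r1 1 10
# r2 2 20
# r3 3 30
```

Note that merge() reorders rows (by the `by` column unless sort = FALSE), which is exactly why the row names must travel in a column rather than as an attribute.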

On Mon, Jun 14, 2010 at 9:25 AM, Assa Yeroslaviz  wrote:
> Hi,
>
> is it possible to merge two data frames while preserving the row names of
> the bigger data frame?
>
> I have two data frames which I would like to combine. While doing so I
> always lose the row names. When I try to append them, I get the error
> message that I have non-unique names, although I used the unique command
> on the data frame where the duplicate entries supposedly are.
>
> thanks for the help
>
> Assa
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] list matching

2010-06-14 Thread jim holtman
One thing you might do is to transform the data into a format that is
easier to combine; I like using 'merge':

> mynames=cbind(c('a','b'),c(11,22))
> lst=list(a=c(1,2), b=5)
> mynames
 [,1] [,2]
[1,] "a"  "11"
[2,] "b"  "22"
> lst
$a
[1] 1 2

$b
[1] 5

> mynames.df <- as.data.frame(mynames)
> mynames.df
  V1 V2
1  a 11
2  b 22
> lst.s <- stack(lst)
> lst.s
  values ind
1  1   a
2  2   a
3  5   b
> merge(mynames.df, lst.s, by.x="V1", by.y="ind")
  V1 V2 values
1  a 11  1
2  a 11  2
3  b 22  5
>
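For comparison, the same result can be built without stack(), by repeating each name according to the length of its list element (my own variant of the approach above):

```r
mynames <- cbind(c('a', 'b'), c(11, 22))   # note: cbind coerces 11, 22 to character
lst <- list(a = c(1, 2), b = 5)

len <- sapply(lst, length)                 # how many values each name carries
res <- data.frame(name  = rep(names(lst), len),
                  value = unlist(lst, use.names = FALSE),
                  extra = rep(mynames[match(names(lst), mynames[, 1]), 2], len))
res
#   name value extra
# 1    a     1    11
# 2    a     2    11
# 3    b     5    22
```

Because mynames is a character matrix, the "extra" column comes out as character; store the lookup table as a data frame if numeric values are needed.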


On Mon, Jun 14, 2010 at 8:06 AM, Yuan Jian  wrote:
> Hello,
>
> I could not find a clear solution for the following question. Please allow me 
> to ask. Thanks.
>
> mynames=cbind(c('a','b'),c(11,22))
> lst=list(a=c(1,2), b=5)
>
> now I try to combine mynames and lst:
> a  1  11
> a  2  11
> b  5  22
>
> thanks
> jian
>
>
>
>
>        [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to change default help settings from factory default html

2010-06-14 Thread Uwe Ligges



On 14.06.2010 14:11, Katya Mauff wrote:

Hi all

Apologies if this is a trivial question- I have searched the lists and the 
online help files etc but have not managed to find anything. I recently 
downloaded the latest version of R, which has the help type set to htmlhelp as 
default (according to http://127.0.0.1:18380/library/utils/html/help.html)

I would very much like to be able to access the help files when I am offline by 
typing ?topic etc as I used to with the previous R version. Any suggestions?



Yes, just type ?topic and your web browser opens and displays the help 
file. Note that the address beginning "http://127.0.0.1" is on your 
local machine, hence you can stay offline.
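A related session setting (a sketch, not part of Uwe's reply): if you prefer the plain-text console help of earlier R versions, the `help_type` option switches the renderer back.

```r
## Render help as console text instead of opening the browser.
## This is a per-session setting; add it to your .Rprofile to persist.
options(help_type = "text")
getOption("help_type")  # "text"
## ?mean  # would now display as text in the console
```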


Uwe Ligges



Thanks
Katya




###
UNIVERSITY OF CAPE TOWN

This e-mail is subject to the UCT ICT policies and e-mail disclaimer published 
on our website at http://www.uct.ac.za/about/policies/emaildisclaimer/ or 
obtainable from +27 21 650 4500. This e-mail is intended only for the person(s) 
to whom it is addressed. If the e-mail has reached you in error, please notify 
the author. If you are not the intended recipient of the e-mail you may not 
use, disclose, copy, redirect or print the content. If this e-mail is not 
related to the business of UCT it is sent by the sender in the sender's 
individual capacity.

###


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] merging data frames

2010-06-14 Thread Assa Yeroslaviz
Hi,

is it possible to merge two data frames while preserving the row names of
the bigger data frame?

I have two data frames which I would like to combine. While doing so I
always lose the row names. When I try to append them, I get the error
message that I have non-unique names, although I used the unique command
on the data frame where the duplicate entries supposedly are.

thanks for the help

Assa
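One way to do this (a sketch, not from the thread): merge on `"row.names"`, which `merge` accepts for its `by` argument, then restore the names from the `Row.names` column it generates.

```r
## Toy data frames keyed by row names (stand-ins for the real ones):
a <- data.frame(x = 1:3, row.names = c("g1", "g2", "g3"))
b <- data.frame(y = c(10, 30), row.names = c("g1", "g3"))

m <- merge(a, b, by = "row.names")  # row names become a 'Row.names' column
rownames(m) <- m$Row.names          # put them back as real row names
m$Row.names <- NULL
m
##    x  y
## g1 1 10
## g3 3 30
```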

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical ?

2010-06-14 Thread Andrea Bernasconi DG
I think I found the solution !

> cc<-factor(cars)
> dd<-factor(driver)
> MODEL<-y~cc+dd+additive
> summary(aov(MODEL,data=DATA))
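The same conversion can also be done in one pass without creating new variables (a sketch; the data frame below is just a stand-in for DATA):

```r
## Rebuild a small stand-in for the latin-square layout:
DATA <- data.frame(driver   = rep(1:4, times = 4),
                   cars     = rep(1:4, each  = 4),
                   additive = rep(c("A", "D", "B", "C"), 4),
                   y        = 19)
## Convert several numeric columns to factors in place:
DATA[c("driver", "cars")] <- lapply(DATA[c("driver", "cars")], factor)
str(DATA$driver)  # now a factor with 4 levels
```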

On 14 Jun, 2010, at 2:52 PM, Andrea Bernasconi DG wrote:

> Hi R help,
> 
> Which is the easiest (most elegant) way to force "aov" to treat numerical 
> variables as categorical ?
> 
> Sincerely, Andrea Bernasconi DG
> 
> PROBLEM EXAMPLE
> 
> I consider the latin squares example described at page 157 of the book:
> Statistics for Experimenters: Design, Innovation, and Discovery by George E. 
> P. Box, J. Stuart Hunter, William G. Hunter.
> 
> This example uses the data-file /BHH2-Data/tab0408.dat from 
> ftp://ftp.wiley.com/ in /sci_tech_med/statistics_experimenters/BHH2-Data.zip.
> 
> The file tab0408.dat contains the following DATA:
> > DATA
>driver cars additive  y
> 1       1    1        A 19
> 2       2    1        D 23
> 3       3    1        B 15
> 4       4    1        C 19
> 5       1    2        B 24
> 6       2    2        C 24
> 7       3    2        D 14
> 8       4    2        A 18
> 9       1    3        D 23
> 10      2    3        A 19
> 11      3    3        C 15
> 12      4    3        B 19
> 13      1    4        C 26
> 14      2    4        B 30
> 15      3    4        A 16
> 16      4    4        D 16
> 
> Now
> > summary( aov(MODEL, data=DATA) )
>             Df Sum Sq Mean Sq F value Pr(>F)  
> cars         1   12.8  12.800  0.8889 0.3680  
> driver       1  115.2 115.200  8.0000 0.0179 *
> additive     3   40.0  13.333  0.9259 0.4634  
> Residuals   10  144.0  14.400
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> These results differ from the book's result at p. 159, since "cars" and "driver" are 
> treated as numerical variables by "aov".
> 
> BRUTE FORCE SOLUTION
> 
> Manually transforming "cars" and "driver" into categorical variables, I 
> obtain the correct result:
> > DATA_AB
>driver cars additive  y
> 1      D1   C1        A 19
> 2      D2   C1        D 23
> 3      D3   C1        B 15
> 4      D4   C1        C 19
> 5      D1   C2        B 24
> 6      D2   C2        C 24
> 7      D3   C2        D 14
> 8      D4   C2        A 18
> 9      D1   C3        D 23
> 10     D2   C3        A 19
> 11     D3   C3        C 15
> 12     D4   C3        B 19
> 13     D1   C4        C 26
> 14     D2   C4        B 30
> 15     D3   C4        A 16
> 16     D4   C4        D 16
> > summary( aov(MODEL, data=DATA_AB) )
>             Df Sum Sq Mean Sq F value   Pr(>F)   
> cars         3     24   8.000     1.5 0.307174   
> driver       3    216  72.000    13.5 0.004466 **
> additive     3     40  13.333     2.5 0.156490   
> Residuals    6     32   5.333
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> 
> QUESTION
> 
> Which is the easiest (most elegant) way to force "driver" and "cars" from 
> DATA to be treated as categorical variables by "aov"?
> More generally, which is the easiest way to force "aov"  to treat numerical 
> variables as categorical ?
> 
> Sincerely, Andrea Bernasconi DG
> 

Re: [R] Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical ?

2010-06-14 Thread Ivan Calandra
Hi,

See ?factor
e.g.: DATA$driver <- factor(DATA$driver)
See also the level= argument if you want to change the order of your levels.

HTH,
Ivan

Le 6/14/2010 14:52, Andrea Bernasconi DG a écrit :
> Hi R help,
>
> Which is the easiest (most elegant) way to force "aov" to treat numerical 
> variables as categorical ?
>
> Sincerely, Andrea Bernasconi DG
>
> PROBLEM EXAMPLE
>
> I consider the latin squares example described at page 157 of the book:
> Statistics for Experimenters: Design, Innovation, and Discovery by George E. 
> P. Box, J. Stuart Hunter, William G. Hunter.
>
> This example uses the data-file /BHH2-Data/tab0408.dat from 
> ftp://ftp.wiley.com/ in /sci_tech_med/statistics_experimenters/BHH2-Data.zip.
>
> The file tab0408.dat contains the following DATA:
>
>> DATA
>>  
> driver cars additive  y
> 1       1    1        A 19
> 2       2    1        D 23
> 3       3    1        B 15
> 4       4    1        C 19
> 5       1    2        B 24
> 6       2    2        C 24
> 7       3    2        D 14
> 8       4    2        A 18
> 9       1    3        D 23
> 10      2    3        A 19
> 11      3    3        C 15
> 12      4    3        B 19
> 13      1    4        C 26
> 14      2    4        B 30
> 15      3    4        A 16
> 16      4    4        D 16
>
> Now
>
>> summary( aov(MODEL, data=DATA) )
>>  
>             Df Sum Sq Mean Sq F value Pr(>F)
> cars         1   12.8  12.800  0.8889 0.3680
> driver       1  115.2 115.200  8.0000 0.0179 *
> additive     3   40.0  13.333  0.9259 0.4634
> Residuals   10  144.0  14.400
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> These results differ from the book's result at p. 159, since "cars" and "driver" are 
> treated as numerical variables by "aov".
>
> BRUTE FORCE SOLUTION
>
> Manually transforming "cars" and "driver" into categorical variables, I 
> obtain the correct result:
>
>> DATA_AB
>>  
> driver cars additive  y
> 1      D1   C1        A 19
> 2      D2   C1        D 23
> 3      D3   C1        B 15
> 4      D4   C1        C 19
> 5      D1   C2        B 24
> 6      D2   C2        C 24
> 7      D3   C2        D 14
> 8      D4   C2        A 18
> 9      D1   C3        D 23
> 10     D2   C3        A 19
> 11     D3   C3        C 15
> 12     D4   C3        B 19
> 13     D1   C4        C 26
> 14     D2   C4        B 30
> 15     D3   C4        A 16
> 16     D4   C4        D 16
>
>> summary( aov(MODEL, data=DATA_AB) )
>>  
>             Df Sum Sq Mean Sq F value   Pr(>F)
> cars         3     24   8.000     1.5 0.307174
> driver       3    216  72.000    13.5 0.004466 **
> additive     3     40  13.333     2.5 0.156490
> Residuals    6     32   5.333
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> QUESTION
>
> Which is the easiest (most elegant) way to force "driver" and "cars" from 
> DATA to be treated as categorical variables by "aov"?
> More generally, which is the easiest way to force "aov"  to treat numerical 
> variables as categorical ?
>
> Sincerely, Andrea Bernasconi DG
>

Re: [R] logistic regression with 50 varaibales

2010-06-14 Thread Marc Schwartz
On Jun 13, 2010, at 10:20 PM, array chip wrote:

> Hi, this is not R technical question per se. I know there are many excellent 
> statisticians in this list, so here my questions: I have dataset with ~1800 
> observations and 50 independent variables, so there are about 35 samples per 
> variable. Is it wise to build a stable multiple logistic model with 50 
> independent variables? Any problem with this approach? Thanks
> 
> John


The general rule of thumb is to have 10-20 'events' per covariate degree of 
freedom. Frank has suggested that in some cases that number should be as high 
as 25. 

The number of events is the smaller of the two possible outcomes for your 
binary dependent variable.

Covariate degrees of freedom refers to the number of columns in the model 
matrix. Continuous variables are 1, binary factors are 1, K-level factors 
are K - 1.

So if out of your 1800 records, you have at least 500 to 1000 events, depending 
upon how many of your 50 variables are K-level factors and whether or not you 
need to consider interactions, you may be OK. Better if towards the high end of 
that range, especially if the model is for prediction versus explanation.
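As quick arithmetic on that rule of thumb (the event count below is assumed, not from John's data):

```r
## Events-per-degree-of-freedom check for a logistic model:
n_events      <- 500   # smaller of the two outcome counts (assumed)
events_per_df <- 15    # middle of the 10-20 rule of thumb
max_df <- floor(n_events / events_per_df)
max_df  # ~33 covariate degrees of freedom supportable
```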

Two excellent references would be Frank's book:

  
http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/

and Steyerberg's book:

  
http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/

to assist in providing guidance for model building/validation techniques.

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lattice: How to color the data points in splom() according to the panel they are plotted?

2010-06-14 Thread Martin Maechler
Dear Deepayan,

this is in reply to a message almost 6 months ago :

> Deepayan Sarkar 
> on Sun, 17 Jan 2010 01:39:21 -0800 writes:

> On Sat, Jan 16, 2010 at 11:56 PM, Peter Ehlers  wrote:
>> Marius Hofert wrote:
>>> 
>>> Dear ExpeRts,
>>> 
>>> I have the scatter plot matrix as given below. I would like the different
>>> "sub-plots" in the scatter plot matrix to be colored differently. How do I
>>> get all points shown in the upper-left plot (on position (1,1) in the
>>> scatter plot matrix) to be plotted in blue, and the points shown in the plot
>>> to the right (on position (1,2) in the scatter plot matrix) to be plotted in
>>> red? More generally, how can I provide a matrix of colors to be used by
>>> splom() such that all data points in the corresponding sub-plot of the
>>> scatter plot matrix are shown in the specified color?
>>> 
>>> Cheers,
>>> 
>>> Marius
>>> 
>>> Here is the code:
>>> 
>>> library(lattice)
>>> 
>>> entrymat=matrix(0,nrow=3,ncol=3)
>>> entrymat[1,2]="black"
>>> entrymat[1,3]="blue"
>>> entrymat[2,3]="red"
>>> entrymat=t(entrymat)
>>> 
>>> splom(~iris[,1:3],superpanel=function(z,...){
>>> 
>>>  
>>>        mymat.df=data.frame(rows=as.vector(row(entrymat)),cols=as.vector(col(entrymat)),entries=as.vector(entrymat))
>>>        mymat.df=subset(mymat.df,cols<rows)
>>>        with(mymat.df,{
>>>                        panel.text(x=rows,y=cols,labels=entries)
>>>                })
>>> 
>>>  panel.pairs(z,upper.panel=panel.splom,lower.panel=function(...){},...)
>>>        },varnames=c("1","2","3")
>>> )
>> 
>> I think that you will have to modify panel.pairs to get what
>> you want. But I must admit that I can't see why you would
>> want such a plot. What's achieved by having different
>> colours in different subpanels? And you would lose the
>> ability to colour groups differently (or things would become
>> really complicated and messy).

> Thanks, I was going to say the same thing, except that it would be
(1)
> conceptually simpler just to add the 'i' and 'j' values as arguments
> to the "panel" function (the 'pargs' variable). The colors could then
> be passed through as part of the ... arguments, and the relevant entry
> extracted in the panel function.
(2)
> The other option is to keep a global counter and increment it inside
> the panel function, and choosing colors based on that counter and
> knowledge of the order in which panels are drawn. Not very elegant,
> but the least intrusive solution I can think of.

> -Deepayan

Against the R-forge version of lattice,
the following very small patch to  panel.pairs
would allow users to use '(1)' i.e., provide  panel functions
with (i,j) arguments directly.
I'm pretty sure that the change could not easily break existing
splom() usage.

--- R/splom.R   (revision 619)
+++ R/splom.R   (working copy)
@@ -291,7 +291,8 @@
y = z[subscripts, i]),
 ##panel.number = panel.number,
 ##packet.number = packet.number),
-  list(...))
+  list(...),
+  list(i = i, j = j))
 else
 c(list(x = z[subscripts, j],
y = z[subscripts, i],
@@ -299,7 +300,8 @@
subscripts = subscripts),
 ##panel.number = panel.number,
 ##packet.number = packet.number),
-  list(...))
+  list(...),
+  list(i = i, j = j))
 
                if (!("..." %in% names(formals(panel))))
                    pargs <- pargs[intersect(names(pargs), 
                                             names(formals(panel)))]


With the above change, 
a user could use a panel function with (i,j) arguments, 
and e.g. say

Cmat <- outer(1:6,1:6, 
  function(i,j) rainbow(11, start=.12, end=.5)[i+j-1])

splom(~diag(6), ## for testing: superpanel=mypanel.pairs,
  panel=function(x,y,i,j,...){
   panel.fill(Cmat[i,j]); panel.splom(x,y,...)
   panel.text(.5,.5, paste("(",i,",",j,")",sep=""))
  })


I think that would allow quite a bit more flexibility without
the need to explicitly "hack"  panel.pairs
(and having to maintain such a hack against the ever-enhancing
lattice).

Martin Maechler, ETH Zurich

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Which is the easiest (most elegant) way to force "aov" to treat numerical variables as categorical ?

2010-06-14 Thread Andrea Bernasconi DG
Hi R help,

Which is the easiest (most elegant) way to force "aov" to treat numerical 
variables as categorical ?

Sincerely, Andrea Bernasconi DG

PROBLEM EXAMPLE

I consider the latin squares example described at page 157 of the book:
Statistics for Experimenters: Design, Innovation, and Discovery by George E. P. 
Box, J. Stuart Hunter, William G. Hunter.

This example uses the data-file /BHH2-Data/tab0408.dat from ftp://ftp.wiley.com/ 
in /sci_tech_med/statistics_experimenters/BHH2-Data.zip.

The file tab0408.dat contains the following DATA:
> DATA
   driver cars additive  y
1       1    1        A 19
2       2    1        D 23
3       3    1        B 15
4       4    1        C 19
5       1    2        B 24
6       2    2        C 24
7       3    2        D 14
8       4    2        A 18
9       1    3        D 23
10      2    3        A 19
11      3    3        C 15
12      4    3        B 19
13      1    4        C 26
14      2    4        B 30
15      3    4        A 16
16      4    4        D 16

Now
> summary( aov(MODEL, data=DATA) )
            Df Sum Sq Mean Sq F value Pr(>F)  
cars         1   12.8  12.800  0.8889 0.3680  
driver       1  115.2 115.200  8.0000 0.0179 *
additive     3   40.0  13.333  0.9259 0.4634  
Residuals   10  144.0  14.400
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

These results differ from the book's result at p. 159, since "cars" and "driver" are 
treated as numerical variables by "aov".

BRUTE FORCE SOLUTION

Manually transforming "cars" and "driver" into categorical variables, I obtain 
the correct result:
> DATA_AB
   driver cars additive  y
1      D1   C1        A 19
2      D2   C1        D 23
3      D3   C1        B 15
4      D4   C1        C 19
5      D1   C2        B 24
6      D2   C2        C 24
7      D3   C2        D 14
8      D4   C2        A 18
9      D1   C3        D 23
10     D2   C3        A 19
11     D3   C3        C 15
12     D4   C3        B 19
13     D1   C4        C 26
14     D2   C4        B 30
15     D3   C4        A 16
16     D4   C4        D 16
> summary( aov(MODEL, data=DATA_AB) )
            Df Sum Sq Mean Sq F value   Pr(>F)   
cars         3     24   8.000     1.5 0.307174   
driver       3    216  72.000    13.5 0.004466 **
additive     3     40  13.333     2.5 0.156490   
Residuals    6     32   5.333
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

QUESTION

Which is the easiest (most elegant) way to force "driver" and "cars" from DATA 
to be treated as categorical variables by "aov"?
More generally, which is the easiest way to force "aov"  to treat numerical 
variables as categorical ?

Sincerely, Andrea Bernasconi DG


Re: [R] list matching

2010-06-14 Thread Henrique Dallazuanna
Try this:

cbind(mynames[rep(seq(nrow(mynames)), sapply(lst, length)),], unlist(lst))
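Unpacked, the one-liner builds a row index that repeats each row of `mynames` once per element of the matching list component, then binds on the unlisted values:

```r
mynames <- cbind(c('a', 'b'), c(11, 22))
lst <- list(a = c(1, 2), b = 5)

## One repetition of each 'mynames' row per element of the matching list entry:
idx <- rep(seq(nrow(mynames)), sapply(lst, length))  # 1 1 2
res <- cbind(mynames[idx, ], unlist(lst))
res  # rows: a 11 1 / a 11 2 / b 22 5
```

Note this relies on the rows of `mynames` being in the same order as the components of `lst`.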

On Mon, Jun 14, 2010 at 9:06 AM, Yuan Jian  wrote:

> Hello,
>
> I could not find a clear solution for the follow question. please allow me
> to ask. thanks
>
> mynames=cbind(c('a','b'),c(11,22))
> lst=list(a=c(1,2), b=5)
>
> now I try to combine mynames and lst:
> a  1  11
> a  2  11
> b  5  22
>
> thanks
> jian
>
>
>
>
>[[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Large Data

2010-06-14 Thread Joris Meys
http://www.google.com/#hl=en&source=hp&q=R+big+data+sets&aq=f&aqi=g1&aql=&oq=&gs_rfai=&fp=686584f57664
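For reference, a common first remedy (a sketch; the tiny temp file below just stands in for the real 1.5 GB one): giving `read.csv` explicit `colClasses` and an `nrows` estimate lets it preallocate instead of re-guessing column types.

```r
## Write a small stand-in CSV, then read it back with declared types:
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, val = c(1.5, 2.5, 3.5)),
          tmp, row.names = FALSE)

## colClasses avoids type-sniffing; nrows helps preallocation;
## comment.char = "" disables comment scanning for a small speedup.
dat <- read.csv(tmp, colClasses = c("integer", "numeric"),
                nrows = 3, comment.char = "")
unlink(tmp)
str(dat)
```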

Cheers
Joris

On Mon, Jun 14, 2010 at 12:07 PM, Meenakshi
 wrote:
>
> Hi,
>
> I want to import a 1.5 GB CSV file into R,
> but I get the following error:
>
> 'Vector allocation 12.4 size'
>
> How can I read a large CSV file in R?
>
> Can anyone help me?
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Large-Data-tp2254130p2254130.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

