Re: [R] Advantages of using SQLite for data import in comparison to csv files

2010-01-13 Thread Juliet Jacobson
Thanks for your answer. I hadn't found this possibility by web search.
Since sqldf also allows the import of tables from csv files, complex
SELECT queries and even joins on tables, I have the impression that
there aren't any reasons for using a SQLite database to organise the
data for R.
But then why has the R driver for data import from a SQLite database
been written?

Gabor Grothendieck wrote:
> You could look at read.csv.sql in sqldf (http://sqldf.googlecode.com) as well.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Odp: Better way than an ifelse statement?

2010-01-13 Thread Petr PIKAL
Hi

r-help-boun...@r-project.org wrote on 14.01.2010 08:05:14:

> Hello All,
> 
> I am trying to create a column of weights based off of factor levels
> from another column.  I am using the weights to calculate L scores.
> Here is an example where the first column are scores, the second is my
> "factor" and the third I want to be a column of weights.  I can do
> what I want with an ifelse statement (see below), but I am wondering
> if anyone knows of a cleaner way to do this?
> 
> example <- data.frame(cbind(rnorm(4), rep(1:4, 1), c(0)))
> 
> example$X3 <- ifelse(example$X2==1, -3, (
> ifelse(example$X2==2, -1, (
> ifelse(example$X2==3, 1, (
> ifelse(example$X2==4, 3, NA))))))) ## this seems sloppy to me
> 
> > example
>            X1 X2 X3
> 1  1.75308880  1 -3
> 2 -0.49273616  2 -1
> 3 -0.12446648  3  1
> 4 -0.06417217  4  3

One way is with factor

as.numeric(as.character(factor(example$X2, labels=c(-3, -1, 1, 3))))
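
A runnable version of the factor approach, next to a plain lookup-vector variant, might look like the sketch below. The data frame mirrors the one in the question; the names `w_factor`, `w_lookup` and `weights` are introduced here only for illustration, and the lookup variant assumes the codes are the consecutive integers 1 to 4.

```r
# Rebuild the example data from the question (scores are random).
example <- data.frame(X1 = rnorm(4), X2 = 1:4)

# Factor approach: relabel the levels 1..4, then convert back to numbers.
# Going through as.character() matters; as.numeric() on a factor alone
# would return the internal level codes 1..4.
w_factor <- as.numeric(as.character(factor(example$X2, levels = 1:4,
                                           labels = c(-3, -1, 1, 3))))

# Lookup-vector approach: index a weight vector by the codes themselves.
weights <- c(-3, -1, 1, 3)   # hypothetical name; weights for codes 1..4
w_lookup <- weights[example$X2]

example$X3 <- w_lookup
```

Both variants avoid the nested ifelse() calls; the lookup vector is shorter but relies on the codes being usable as positions.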

Regards
Petr


> 
> 
> Thanks for your help,
> 
> Joshua
> 
> -- 
> Joshua Wiley
> Senior in Psychology
> University of California, Riverside
> http://www.joshuawiley.com/
> 


Re: [R] Multiple symbols per single line in a legend

2010-01-13 Thread Jim Lemon

On 01/13/2010 01:50 AM, Primoz PETERLIN wrote:

Hello everybody,

Is it possible to coax legend() into displaying more than one symbol per
line in a legend? I have a graph like the one attached to this mail; I would
like to reorganize the legend in such a way that the duplicate text would be
omitted, i.e., the first line would read "increasing
frequency" and the second one would read "decreasing
frequency". Before resorting to box() and text() I would like to check
whether some clever method already exists that would solve my problem. :)


Hi Primoz,
This request has been made before on the list, so I suppose it's time to 
get something together. Below is a first cut on a function to do this. 
If the user sends a list as the fill= argument, each one of the legend 
commands can have a few rectangles before it in different colors. The 
length of the fill= argument should be the same as the length of the 
legend= argument. If a list is passed as the pch= argument, the same 
thing happens with points. It is probably best to pass a corresponding 
list to the col= argument if the user wants different colored points. 
"legendg" stands for legend(grouped).


legendg<-function(x,y=NULL,legend,fill=NULL,col=par("col"),
 border="black",lty,lwd,pch=NULL,angle=45,density=NULL,
 bty="o",bg=par("bg"),box.lwd=par("lwd"),box.lty=par("lty"),
 box.col=par("fg"),pt.bg=NA,cex=1,pt.cex=cex,pt.lwd=lwd,
 xjust=0,yjust=1,x.intersp=1,y.intersp=1,adj=c(0,0.5),
 text.width=NULL,text.col=par("col"),merge=FALSE,
 trace=FALSE,plot=TRUE,ncol=1,horiz=FALSE,title=NULL,
 inset=0,xpd,title.col=text.col) {

 if(missing(legend) && !is.null(y)) {
  legend<-y
  y<-NULL
 }
 if(is.list(x)) {
  y<-x$y
  x<-x$x
 }
 if(!missing(xpd)) {
  oldxpd<-par("xpd")
  par(xpd=xpd)
 }
 legend.info<-legend(x=x,y=y,legend=legend,col=par("bg"),lty=1,
  bty=bty,bg=bg,box.lwd=box.lwd,box.lty=box.lty,
  box.col=par("fg"),pt.bg=NA,cex=1,pt.cex=pt.cex,pt.lwd=pt.lwd,
  xjust=xjust,yjust=yjust,x.intersp=x.intersp,y.intersp=y.intersp,
  adj=adj,text.width=text.width,text.col=text.col,merge=merge,
  trace=trace,plot=plot,ncol=ncol,horiz=horiz,title=title,
  inset=inset,title.col=title.col)
 if(!is.null(fill)) {
  rectheight<-strheight("Q")
  if(length(adj) > 1) yadj<-adj[2] else yadj<-0.5
  for(nel in 1:length(fill)) {
   nrect<-length(fill[[nel]])
   rectspace<-(legend.info$text$x[nel]-legend.info$rect$left)
   lefts<-cumsum(c(legend.info$rect$left+rectspace*0.1,
    rep(0.8*rectspace/nrect,nrect-1)))
   rights<-lefts+0.7*rectspace/nrect
   bottoms<-rep(legend.info$text$y[nel]-yadj*rectheight,nrect)
   rect(lefts,bottoms,rights,bottoms+rectheight,col=fill[[nel]])
  }
 }
 if(!is.null(pch)) {
  if(!is.list(col)) {
   mycol<-pch
   # recycle col so that every symbol in every legend entry gets a color
   if(length(col) < length(mycol[[1]]))
    col<-rep(col,length.out=length(mycol[[1]]))
   for(nel in 1:length(mycol))
    mycol[[nel]]<-rep(col,length.out=length(mycol[[nel]]))
  }
  else mycol<-col
  for(nel in 1:length(pch)) {
   midspace<-(legend.info$rect$left+legend.info$text$x[nel])/2
   npch<-length(pch[[nel]])
   pchwidth<-strwidth("O")
   xpos<-cumsum(c(midspace-npch*0.5*pchwidth,rep(pchwidth,npch-1)))
   ypos<-rep(legend.info$text$y[nel],npch)
   points(xpos,ypos,pch=pch[[nel]],col=mycol[[nel]])
  }
 }
 if(!missing(xpd)) par(xpd=oldxpd)
 invisible(legend.info)
}

legendg(locator(1),c("one","two","three"),fill=list(2:3,3:5,6:7))
legendg(locator(1),c("one","two","three"),pch=list(1:2,3:5,6:7),
 col=list(2:3,3:5,6:7))

Give it a whacking folks and I'll put the function into the next version 
of plotrix if it survives. Also send any requests for features that I 
have neglected.


Jim



[R] Odp: a question about deleting rows

2010-01-13 Thread Petr PIKAL
Hi

r-help-boun...@r-project.org wrote on 13.01.2010 23:15:05:

> 
> I have a file like this:
> id  n1  n2  n3  n4  n5  n6
> 1    3   4   7   8  10   2
> 2    4   1   2   4   3  10
> 3    7   0   0   0   0   8
> 4   10   1   0   0   2   3
> 5   11   1   0   0   0   5
> 
> what I want to do is: only if n2=0 and n3=0 and n4=0 and n5=0 then delete
> the row. how can I do that?

Why do you complicate things for yourself? A few days ago you wanted to put
zeroes in place of NA values. R has quite extensive capabilities for
handling NA values, so

your.na.data <- your.data
your.na.data[your.na.data == 0] <- NA

# replaces zero values with NA.

chosen.one <- complete.cases(your.na.data[,2:5])

# makes a logical vector that is TRUE only for rows of your.na.data that
# have no NA value in columns 2 to 5.

your.data[chosen.one,] or your.na.data[chosen.one,]

# selects rows without NA values.

Or you can use na.omit, na.rm or other NA handling facilities provided with
many functions.

What about reading a few pages of the R intro manual, where you can find how to
start with data manipulation?
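
Note that complete.cases() drops a row as soon as any one of the selected columns is NA (that is, was zero), whereas the question asks to delete a row only when n2, n3, n4 and n5 are all zero. A minimal sketch of that stricter condition follows; the data frame is an assumed reading of the flattened table in the question, and `cols`, `all.zero` and `kept` are names made up for the illustration.

```r
# Assumed reading of the table in the question.
your.data <- data.frame(
  id = 1:5,
  n1 = c(3, 4, 7, 10, 11),
  n2 = c(4, 1, 0, 1, 1),
  n3 = c(7, 2, 0, 0, 0),
  n4 = c(8, 4, 0, 0, 0),
  n5 = c(10, 3, 0, 2, 0),
  n6 = c(2, 10, 8, 3, 5)
)

cols <- c("n2", "n3", "n4", "n5")

# TRUE for rows in which every one of n2..n5 equals zero:
# the count of non-zero entries in those columns is 0.
all.zero <- rowSums(your.data[, cols] != 0) == 0

# Keep everything except the all-zero rows.
kept <- your.data[!all.zero, ]
```

With this reading of the data only the row with id 3 is removed; the row with id 5 stays because n2 is not zero.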

Regards
Petr



> 
> thank you,
> 
> karena 
> -- 
> View this message in context: 
http://n4.nabble.com/a-question-about-deleting-
> rows-tp1013403p1013403.html
> Sent from the R help mailing list archive at Nabble.com.
> 


Re: [R] Better way than an ifelse statement?

2010-01-13 Thread Ista Zahn
There is a recode function in the Hmisc package that might help:

library(Hmisc)
example$X3 <- recode(example$X2, 1:4, c(-3,-1,1,3))
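
If adding a package dependency is not wanted, the same from/to mapping can be sketched in base R with match(). The vector `x` below is made-up data that includes an NA and an unmapped code to show how those cases come out; `from`, `to` and `recoded` are names chosen for the illustration.

```r
x <- c(1, 2, 3, 4, 2, NA, 5)   # made-up codes, including NA and an unmapped 5

from <- 1:4
to   <- c(-3, -1, 1, 3)

# match() gives the position of each x in `from` (NA when absent),
# and that position selects the corresponding weight in `to`.
recoded <- to[match(x, from)]
```

Codes that do not appear in `from` come back as NA, which matches the final NA branch of the nested ifelse() in the question.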

-Ista

On Thu, Jan 14, 2010 at 2:05 AM, Joshua Wiley  wrote:
> Hello All,
>
> I am trying to create a column of weights based off of factor levels
> from another column.  I am using the weights to calculate L scores.
> Here is an example where the first column are scores, the second is my
> "factor" and the third I want to be a column of weights.  I can do
> what I want with an ifelse statement (see below), but I am wondering
> if anyone knows of a cleaner way to do this?
>
> example <- data.frame(cbind(rnorm(4), rep(1:4, 1), c(0)))
>
> example$X3 <- ifelse(example$X2==1, -3, (
> ifelse(example$X2==2, -1, (
> ifelse(example$X2==3, 1, (
> ifelse(example$X2==4, 3, NA))))))) ## this seems sloppy to me
>
>> example
>           X1 X2 X3
> 1  1.75308880  1 -3
> 2 -0.49273616  2 -1
> 3 -0.12446648  3  1
> 4 -0.06417217  4  3
>
>
> Thanks for your help,
>
> Joshua
>
> --
> Joshua Wiley
> Senior in Psychology
> University of California, Riverside
> http://www.joshuawiley.com/
>
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org



[R] Better way than an ifelse statement?

2010-01-13 Thread Joshua Wiley
Hello All,

I am trying to create a column of weights based off of factor levels
from another column.  I am using the weights to calculate L scores.
Here is an example where the first column are scores, the second is my
"factor" and the third I want to be a column of weights.  I can do
what I want with an ifelse statement (see below), but I am wondering
if anyone knows of a cleaner way to do this?

example <- data.frame(cbind(rnorm(4), rep(1:4, 1), c(0)))

example$X3 <- ifelse(example$X2==1, -3, (
ifelse(example$X2==2, -1, (
ifelse(example$X2==3, 1, (
ifelse(example$X2==4, 3, NA))))))) ## this seems sloppy to me

> example
           X1 X2 X3
1  1.75308880  1 -3
2 -0.49273616  2 -1
3 -0.12446648  3  1
4 -0.06417217  4  3


Thanks for your help,

Joshua

-- 
Joshua Wiley
Senior in Psychology
University of California, Riverside
http://www.joshuawiley.com/



Re: [R] HTML translation problem in R-2.10.1

2010-01-13 Thread Jim Lemon

Hi again,
It's the \samp (and thus the HTML span tag) that does it. I removed the 
\samp from one of the links and rebuilt, and the link is properly 
translated.


Jim



[R] HTML translation problem in R-2.10.1

2010-01-13 Thread Jim Lemon

Hi Core Team,
I received an email about a problem with the help on the plotrix 
package. Apparently the \link tags in the help pages were showing up as 
literal text. I couldn't see this problem, nor any problem with the Rd 
files. Since the plotrix package hasn't been built for a while, I 
rechecked, rebuilt and reinstalled it. Sure enough, the \link tags 
showed up as literal text in both text and HTML help. This may be 
peculiar to R-2.10.1 as I never installed 2.10.0. If it helps, the 
--no-latex tag wasn't recognized by the INSTALL command (although only 
the HTML help was apparently built).


As far as I can see, the previous behavior of translating \link{ into
<a href="... and the following } into </a> has been lost. The \samp{ string
is now translated to an HTML span tag, whereas I think it used to be
translated to a different tag, and this may be where the problem lies.


I think this is all done in Perl, so I can't help with the debugging.

Jim

R version 2.10.1 (2009-12-14)
i686-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] plotrix_2.7-2 prettyR_1.8

loaded via a namespace (and not attached):
[1] tools_2.10.1



Re: [R] How to install old randomForest?

2010-01-13 Thread David Scott

Julian Ramirez wrote:
> Hi Ted,
>
> You need to unzip and untar the files that are inside that file, and then
> build the package using R CMD build --binary PackageName. However, for
> compiling a package under a Windows environment you will need Rtools2.10
> from Duncan Murdoch, along with MiKTeX and HTML Help Workshop from Microsoft.
> All of that is free. I suggest you read tutorials on how to build packages
> from sources in Windows. This website might be a good starting point:
> http://www.biostat.wisc.edu/~kbroman/Rintro/Rwinpack.html.
>
> Hope this helps,
>
> Julian Ramirez
> Research Assistant
> International Centre for Tropical Agriculture, CIAT
>
> On Wed, Jan 13, 2010 at 11:29 PM, Chang, C-Y. wrote:
>
>> Hi all,
>>
>> I'm using Windows XP and R 2.10.0. I downloaded "randomForest 4.5-33.tar.gz"
>> from its archive, but how do I make it into an installation ZIP file?
>>
>> Thanks,
>> Ted


As an alternative guide to the build process, have a look at Rob 
Hyndman's instructions:


http://robjhyndman.com/researchtips/building-r-packages-for-windows/

David Scott
_
David Scott Department of Statistics
The University of Auckland, PB 92019
Auckland 1142,NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018

Director of Consulting, Department of Statistics



Re: [R] Mixtures of Discrete Uniforms

2010-01-13 Thread GlenB



Jim Silverton wrote:
> 
> I want to create the mixture formulation of a discrete uniform, i.e., say
> f(x) = 1/10 for x = 1, 2, ..., 10, and
> another discrete distribution which has the same values of x, but the
> probabilities can vary. Can this be done with any package in R? And if so,
> can the package estimate the 'probabilities' of each of the x values as well as
> the mixing proportion if I have the data?
> 

Without some further restrictions on the second component, you have an
identifiability problem.
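
The identifiability problem can be made concrete with a small numerical sketch: two different pairs (mixing weight, second component) that produce exactly the same mixture, so no amount of data can tell them apart. All numbers below are made up, and `mix`, `p1`, `q1`, `p2`, `q2` are names introduced for the illustration.

```r
x <- 1:10
u <- rep(1 / 10, 10)                       # discrete uniform component

# Mixture pmf with weight p on the uniform and (1 - p) on q.
mix <- function(p, q) p * u + (1 - p) * q

# First parameter set (made-up numbers).
p1 <- 0.5
q1 <- c(0.3, 0.2, rep(0.0625, 8))
m  <- mix(p1, q1)

# Second parameter set: a different weight, with q solved for so that the
# mixture stays the same. It is a valid pmf as long as p2 <= 10 * min(m).
p2 <- 0.2
q2 <- (m - p2 * u) / (1 - p2)

all.equal(mix(p1, q1), mix(p2, q2))   # same mixture from different (p, q)
```

Any restriction that pins down the second component (for example, forcing some of its probabilities to zero) would remove this freedom; which restriction is appropriate depends on the application.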


-- 
View this message in context: 
http://n4.nabble.com/Re-Mixtures-of-Discrete-Uniforms-tp1010567p1013604.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] How to install old randomForest?

2010-01-13 Thread Julian Ramirez
Hi Ted,

You need to unzip and untar the files that are inside that file, and then
build the package using R CMD build --binary PackageName. However, for
compiling a package under a Windows environment you will need Rtools2.10
from Duncan Murdoch, along with MiKTeX and HTML Help Workshop from Microsoft.
All of that is free. I suggest you read tutorials on how to build packages
from sources in Windows. This website might be a good starting point:
http://www.biostat.wisc.edu/~kbroman/Rintro/Rwinpack.html.
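
If the goal is only to get the old version installed rather than to produce a ZIP file, installing straight from the source tarball inside R may be enough once the build tools are in place. The sketch below is an assumption-laden illustration: `install_local_source` is a hypothetical helper, and the file name in the example call assumes CRAN's name_version.tar.gz pattern.

```r
# Hypothetical helper: install a package from a local source tarball.
install_local_source <- function(tarball) {
  if (!file.exists(tarball)) {
    stop("source tarball not found: ", tarball)
  }
  # repos = NULL tells install.packages() to treat `tarball` as a local file;
  # type = "source" requests compilation, which on Windows needs Rtools.
  install.packages(tarball, repos = NULL, type = "source")
}

# Example call (file name assumed, not taken verbatim from the post):
# install_local_source("randomForest_4.5-33.tar.gz")
```

This does not create a binary ZIP; it compiles and installs the package into the current library.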

Hope this helps,


Julian Ramirez
Research Assistant
International Centre for Tropical Agriculture, CIAT

On Wed, Jan 13, 2010 at 11:29 PM, Chang, C-Y. wrote:

> Hi all,
>
> I'm using windowsXP and R 2.10.0. I downloaded "randomForest 4.5-33.tar.gz"
> from its archive, but how do I make it into a installation ZIP file?
>
> Thanks,
> Ted
>


[R] How to install old randomForest?

2010-01-13 Thread Chang, C-Y.

Hi all,

I'm using Windows XP and R 2.10.0. I downloaded "randomForest 
4.5-33.tar.gz" from its archive, but how do I make it into an 
installation ZIP file?


Thanks,
Ted



[R] Bootstrap for correlation coefficient

2010-01-13 Thread Roslina Zakaria
I have the following code:
 
## to check correlation between the simulated uniform data
x2 <- uni[,1] ; x2[1:10]
y2 <- uni[,2] ; y2[1:10]
result2 <- boot(cbind(x2,y2), f, 20)
# get 95% confidence interval 
boot.ci(result2, type="bca")
cor.test(x2,y2, method="pearson", conf.level=0.95)
 
part of my data:
 
> x2 <- uni[,1] ; x2[1:10]
 [1] 0.63933145 0.71677785 0.02181925 0.15913391 0.61021930 0.72878176 
0.22237891 0.28178186 0.75503612 0.54928692
> y2 <- uni[,2] ; y2[1:10]
 [1] 0.65754240 0.49263876 0.01352257 0.19195681 0.65759797 0.89813660 
0.24582441 0.12900017 0.78982501 0.68676534

## Result
> result2 <- boot(cbind(x2,y2), f, 20)
> result2
ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = cbind(x2, y2), statistic = f, R = 20)

Bootstrap Statistics :
    original   bias    std. error
t1* 0.891797 -0.005272889  0.01198383
 
Not sure about this:
 
> boot.ci(result2, type="bca")
Error in bca.ci(boot.out, conf, index[1], L = L, t = t.o, t0 = t0.o, h = h,  : 
  estimated adjustment 'a' is NA

> cor.test(x2,y2, method="pearson", conf.level=0.95)
    Pearson's product-moment correlation
data:  x2 and y2 
t = 51.7391, df = 689, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval:
 0.8754420 0.9061121 
sample estimates:
 cor 
0.891797

My question is: when I try to find the confidence interval, why does it give me
this message?
How do I get the p-value from the bootstrap?
 
Thank you so much
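
A commonly reported cause of the "estimated adjustment 'a' is NA" error is a number of bootstrap replicates smaller than the number of data rows (here R = 20 against roughly 690 rows), so raising R is the first thing to try. The sketch below uses made-up data rather than the poster's `uni`, defines a statistic `f` as an assumption (the post does not show it), and uses a permutation test as one possible way to get a resampling p-value; that is a different technique from the bootstrap itself.

```r
library(boot)  # recommended package shipped with standard R installations

set.seed(1)
n  <- 200
x2 <- runif(n)
y2 <- x2 + rnorm(n, sd = 0.15)   # made-up data, not the poster's `uni`

# statistic(data, indices): Pearson correlation of the resampled rows
f <- function(d, i) cor(d[i, 1], d[i, 2])

# Use many more replicates than data rows so the BCa adjustment
# can be estimated.
result2 <- boot(cbind(x2, y2), f, R = 2000)
ci <- boot.ci(result2, type = "bca")
ci$bca[4:5]   # lower and upper 95% limits

# Permutation p-value for H0: no association (shuffling y2 breaks the pairing).
perm   <- replicate(2000, cor(x2, sample(y2)))
p_perm <- (sum(abs(perm) >= abs(cor(x2, y2))) + 1) / (length(perm) + 1)
```

The permutation p-value cannot be smaller than 1 / (number of permutations + 1), so a highly significant correlation simply shows up as that floor value.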


  


Re: [R] Help, How can I boxplot mse and mtry using 20 5-fold cross-validation?

2010-01-13 Thread bbslover

Thanks, Max.
   You are so helpful; every time, you give me a lot of help. On my 
learning road you are my guide, though we do not know each other.
 
best wishes
 
kevin



On 2010-01-14, "Max Kuhn [via R]" wrote:
-Original Message-
From: "Max Kuhn [via R]"
Sent: Thursday, January 14, 2010
To: bbslover
Subject: Re: [R] Help, How can I boxplot mse and mtry using 20 5-fold
cross-validation?

In caret, see ?trainControl. Use returnResamp = "all" 

Max 

On Wed, Jan 13, 2010 at 9:47 AM, bbslover <[hidden email]> wrote: 

> 
>  Hello, 
>   I am learning randomForest, and I want to make a boxplot of mse against mtry using 20 
> repeats of 5-fold cross-validation (using the median value), but I have not found a good way to 
> do it. 
> 
> The randomForest package itself does not contain a cross-validation method. The 
> caret package does, but how can I get all the values 
> of mtry together with the corresponding mse? 
> 
> 
> -- 
> View this message in 
> context:http://n4.nabble.com/Help-How-can-I-boxplot-mse-and-mtry-using-20-5-fold-cross-validation-tp1013058p1013058.html
> Sent from the R help mailing list archive at Nabble.com. 
> 
> 



-- 

Max 



-- 
View this message in context: 
http://n4.nabble.com/Help-How-can-I-boxplot-mse-and-mtry-using-20-5-fold-cross-validation-tp1013058p1013515.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] apply a function down each column

2010-01-13 Thread Laetitia Schmid
Thank you very much! It works now perfectly. I even extended it to be  
able to apply it to the whole dataset:


data<-read.delim("mhc_data.txt", stringsAsFactors=FALSE)

lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
              as.data.frame(table(strsplit(b, ""))), by="Var1")
  sum(apply(tb[-1], 1, min))
}

output<-matrix(ncol=(ncol(data)-1),nrow=nrow(data)/2)
sim<-rep(0, nrow(data)/2)

for (y in 2:(ncol(data))) {
  for (x in 1:(nrow(data)/2)) {
    a <- data[(2*x-1),y]  # odd rows
    b <- data[(2*x),y]    # even rows
    sim[x]<-(lettermatch(a,b))
  }
  output[,y-1]<-sim
}
colnames(output)<-c(names(data[2:length(names(data))]))
rownames(output)<-c(1:(nrow(data)/2))

output
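
The same odd-row against even-row comparison can also be sketched without explicit loops. The `lettermatch` function is the one from the thread; the small data frame `dat` and the names `odd`, `even` and `output` are made up for this illustration, and the example avoids empty strings, which the function may not handle.

```r
# Function from the thread: counts letters shared by two strings.
lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
              as.data.frame(table(strsplit(b, ""))), by = "Var1")
  sum(apply(tb[-1], 1, min))
}

# Made-up data in the same layout: one ID column, then sequence columns.
dat <- data.frame(Individuals = c("C1", "M1", "C2", "M2"),
                  Seq1 = c("AATT", "AGTT", "CCGG", "CCGG"),
                  Seq2 = c("ACGT", "TTTT", "GGGA", "GGAA"),
                  stringsAsFactors = FALSE)

odd  <- seq(1, nrow(dat), by = 2)   # rows C1, C2, ...
even <- seq(2, nrow(dat), by = 2)   # rows M1, M2, ...

# For every sequence column, compare each odd row with the row below it.
output <- sapply(dat[-1], function(col) {
  mapply(lettermatch, col[odd], col[even], USE.NAMES = FALSE)
})
rownames(output) <- seq_along(odd)
```

The result is a matrix with one row per pair and one column per sequence column, the same shape as the loop version produces.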

Laetitia



On 12.01.2010 at 18:31, Peter Ehlers wrote:


Laetitia,

I was just responding to your comment that "R complains
about a syntax error". But I realize now that "2x" would
probably cause an "unexpected symbol" error.

Here's what I get when I run your loop; what do you get?


> for (x in 1:(nrow(dat)-1)) {
+  a <- as.character(dat[(2x-1),1])
Error: unexpected symbol in:
"for (x in 1:(nrow(dat)-1)) {
 a <- as.character(dat[(2x"
> b <- as.character(dat[(2x),1])
Error: unexpected symbol in " b <- as.character(dat[(2x"
> lettermatch(a,b)
Error in strsplit(a, "") : object 'a' not found
> }
Error: unexpected '}' in "}"

and here's what I get when I fix the obvious syntax
error:

> for (x in 1:(nrow(dat)-1)) {
+  a <- as.character(dat[(2*x-1),1])
+  b <- as.character(dat[(2*x),1])
+  lettermatch(a,b)
+ }
Error in fix.by(by.x, x) : 'by' must specify valid column(s)



That leaves two problems:
1) you're looking at the wrong column in dat[,1]; that
   should be dat[,2], etc.
2) that error message indicates that your index variable (x)
   gets to invalid values.

Try this:

for (x in 1:(nrow(dat)/2)) {
 a <- dat[(2*x-1),2]  # odd rows
 b <- dat[(2*x),2]    # even rows
 print(lettermatch(a,b))
}

You don't need the as.character() if you have character data.
Always do a str(dat) before you do any analysis.

 -Peter Ehlers

Laetitia Schmid wrote:

Dear Peter,
thank you for the suggestion.
Unfortunately the star did not help. Did it work for you? For me it  
seems incomplete somehow.

Laetitia


From: Peter Ehlers [ehl...@ucalgary.ca]
Sent: Tuesday, January 12, 2010 09:54 AM
To: Laetitia Schmid
Cc: Steve Lianoglou; r-help@r-project.org
Subject: Re: [R] apply a function down each column

See inline below.

Laetitia Schmid wrote:

Dear Steve,
my solution looks like it would work, but it does not.
I attached a text file with an extract of my data. Maybe you can try it
yourself. I want to compare C1 with M1, C2 with M2, C3 with M3, ... for
each column.
I do not really know what the problem is. R complains about a syntax error.
The function I am applying counts the common strings between the two.
Greg Hirson helped me to write it.

lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
as.data.frame(table(strsplit(b, ""))), by="Var1")
  sum(apply(tb[-1], 1, min))
}

For example for the second column I tried:

for (x in 1:(nrow(dat)-1)) {
a <- as.character(dat[(2x-1),1])


Shouldn't that be 2*x-1??

 -Peter Ehlers


b <- as.character(dat[(2x),1])
lettermatch(a,b)
}

or

a <- as.character(dat[seq(1, nrow(dat), by=2),2])
b <- as.character(dat[seq(2, nrow(dat), by=2), 2])
all.results <- lettermatch(a,b)

With "dat<-read.delim("data_lgs.txt",stringsAsFactors=FALSE)" I can
leave the "as.character" away in the formula above.

Laetitia

IndividualsSeq1Seq2Seq3Seq4
C1AATTCCGGCTTT
M1
C2AATTCCGGCTTT
M2AGGGAACTCCGGCGTT
C3AGGGAACTCCGGCGTT
M3AGGGAACTCCGGCGTT
C4AATTCCGGCCTT
M4AAATCGGGCTTT
C5AGGGACTTCCCGCTTT
M5AGGGCTTTCCTT
C6AGGGCTTTCCTT
M6AAAGCCTTCTTT
C7AAAGACCCCCCGGTTT
M7AAGGAACCCCGG
C8AATTCCGGCCTT
M8AATTCCGGCCTT
C9
M9
C11AGGGAAACCGGGGGTT
M11AATTCCGGCCTT



On 11.01.2010 at 15:18, Steve Lianoglou wrote:


Hi,

On Mon, Jan 11, 2010 at 8:41 AM, Laetitia Schmid wrote:

Hello World,
I have a function that makes pairwise comparisons between two
strings. I would like to apply this function to my data (which
consists of columns with different strings) in the way that it
compares the first with the second entry, and then the third  
with the
fourth, and then the fifth with the sixth, and so on down each  
column...

So (2x-1) and (2x) would be the different entries to be compared!

dat= my data:

for the first co

[R] installing RCurl when libcurl is in non-standard location

2010-01-13 Thread Janet Young

Hi,

I'm struggling to install RCurl for 32-bit linux and am hoping for  
some suggestions.


I obtained RCurl_1.3-1.tar.gz from CRAN today, and am using a very  
recent version of R:

R version 2.10.1 Patched (2010-01-12 r50970).

I'm not the sysadmin for this system (disclaimer: my sysadmin skills  
are not very good, I'm afraid).  curl is available centrally on the  
system but it's a little old (7.12.3 - looks from some older r-help  
posts like this is too old for RCurl). Therefore I installed libcurl  
7.19.7 in a non-standard location (because I'm not the sysadmin), and  
I think I'm pointing R towards this new libcurl OK, but I'm not 100%  
sure about that. The output of locate (see below) makes me a little  
suspicious, but the output of the R CMD INSTALL makes it seem like the  
new libcurl I installed IS being used.


I've included various output below that I hope will help in figuring  
this out. Is there anything else that would be useful to know? I can  
also ask our sysadmin for help if that makes more sense than asking  
you all via r-help.


Thanks very much in advance for any ideas,

Janet Young

---

[2] zork20:/home/jayoung> uname -a
Linux zork20 2.6.12-1.1381_FC3smp #1 SMP Fri Oct 21 04:03:26 EDT 2005  
i686 athlon i386 GNU/Linux

[3] zork20:/home/jayoung> setenv MAKE gmake
[4] zork20:/home/jayoung> which gmake
/usr/bin/gmake
[5] zork20:/home/jayoung> gmake -version
GNU Make 3.80
Copyright (C) 2002  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
[6] zork20:/home/jayoung> which curl-config
/home/jayoung/traskdata/bin_linux/curl-config
[7] zork20:/home/jayoung> curl-config --version
libcurl 7.19.7
[8] zork20:/home/jayoung> locate curl-config
/usr/bin/curl-config
/usr/share/man/man1/curl-config.1.gz
[16] zork20:/home/jayoung> /usr/bin/curl-config --version
libcurl 7.12.3
[9] zork20:/home/jayoung> locate libcurl
/usr/lib/libcurl.so.3
/usr/lib/libcurl.so
/usr/lib/libcurl.a
/usr/lib/libcurl.so.3.0.0
/usr/share/man/man3/libcurl-multi.3.gz
/usr/share/man/man3/libcurl-easy.3.gz
/usr/share/man/man3/libcurl-errors.3.gz
/usr/share/man/man3/libcurl-share.3.gz
/usr/share/man/man3/libcurl-tutorial.3.gz
/usr/share/man/man3/libcurl.3.gz
[10] zork20:/home/jayoung> ls ~/traskdata/lib_linux/libcu*
/home/jayoung/traskdata/lib_linux/libcurl.a
/home/jayoung/traskdata/lib_linux/libcurl.la*
/home/jayoung/traskdata/lib_linux/libcurl.so@
/home/jayoung/traskdata/lib_linux/libcurl.so.3@
/home/jayoung/traskdata/lib_linux/libcurl.so.3.0.0*
/home/jayoung/traskdata/lib_linux/libcurl.so.4@
/home/jayoung/traskdata/lib_linux/libcurl.so.4.0.0*
/home/jayoung/traskdata/lib_linux/libcurl.so.4.1.1*
[11] zork20:/home/jayoung> printenv LD_LIBRARY_PATH
/home/btrask/traskdata/lib_linux:/home/jayoung/traskdata/bin_linux/qt/lib:/home/btrask/traskdata/lib_linux/R/library/RSPerl/libs:/home/btrask/traskdata/lib_linux/R/lib
[14] zork20:/home/jayoung/source_codes/R/other_packages> R CMD INSTALL RCurl_1.3-1.tar.gz --configure-args='--libdir=/home/btrask/traskdata/lib_linux --includedir=/home/btrask/traskdata/include'

* installing to library ‘/home/btrask/traskdata/lib_linux/R/library’
* installing *source* package ‘RCurl’ ...
checking for curl-config... /home/jayoung/traskdata/bin_linux/curl-config

checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ANSI C... none needed
checking how to run the C preprocessor... gcc -E
Version has a libidn field
Version has CURLOPT_URL
Version has CURLINFO_EFFECTIVE_URL
Version has CURLINFO_RESPONSE_CODE
Version has CURLINFO_TOTAL_TIME
Version has CURLINFO_NAMELOOKUP_TIME
Version has CURLINFO_CONNECT_TIME
Version has CURLINFO_PRETRANSFER_TIME
Version has CURLINFO_SIZE_UPLOAD
Version has CURLINFO_SIZE_DOWNLOAD
Version has CURLINFO_SPEED_DOWNLOAD
Version has CURLINFO_SPEED_UPLOAD
Version has CURLINFO_HEADER_SIZE
Version has CURLINFO_REQUEST_SIZE
Version has CURLINFO_SSL_VERIFYRESULT
Version has CURLINFO_FILETIME
Version has CURLINFO_CONTENT_LENGTH_DOWNLOAD
Version has CURLINFO_CONTENT_LENGTH_UPLOAD
Version has CURLINFO_STARTTRANSFER_TIME
Version has CURLINFO_CONTENT_TYPE
Version has CURLINFO_REDIRECT_TIME
Version has CURLINFO_REDIRECT_COUNT
Version has CURLINFO_PRIVATE
Version has CURLINFO_HTTP_CONNECTCODE
Version has CURLINFO_HTTPAUTH_AVAIL
Version has CURLINFO_PROXYAUTH_AVAIL
Version has CURLINFO_OS_ERRNO
Version has CURLINFO_NUM_CONNECTS
Version has CURLINFO_SSL_ENGINES
No CURLINFO_COOKIELIST enumeration value.
No CURLINFO_LASTSOCKET enumeration value.
No CURLINFO_FTP_ENTRY_PATH enumeration value.
No CURLINFO_REDIRECT_URL en

Re: [R] FW: Problems connecting with MySQL using odbcDriverConnect (RODBC package) on Linux

2010-01-13 Thread Orvalho Augusto
Thanks for solving it and sharing the solution with us.

But why don't you use RMySQL, which connects to MySQL without the
need for ODBC?

Caveman
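
The fix reported in the quoted message below (driver name written without braces) can be sketched as follows. `make_conn_string` is a hypothetical helper introduced here; the server, database and credential values are the placeholders from the quoted message, and the odbcDriverConnect() call is left commented out because it needs RODBC and a driver registered in odbcinst.ini.

```r
# Hypothetical helper that assembles a DSN-less ODBC connection string.
make_conn_string <- function(driver, server, database, uid, pwd) {
  paste0("Driver=", driver,
         ";Server=", server,
         ";Database=", database,
         ";Uid=", uid,
         ";Pwd=", pwd)
}

# Driver name written without braces, as in the reported fix.
conn_str <- make_conn_string("MySQL", "my-server", "my_database",
                             "my_username", "my_password")

# With RODBC installed and the driver registered:
# library(RODBC)
# conn <- odbcDriverConnect(connection = conn_str)
```

Whether braces are accepted appears to depend on the driver manager, so this reflects only what was reported to work in this thread.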


On Wed, Jan 13, 2010 at 1:48 AM, Marcus, Jeffrey
 wrote:
> I think I figured this out. I should not have put the Driver name in
> braces. Changing it from {MySQL} to MySQL seems to work.
>
> -Original Message-
> From: Marcus, Jeffrey
> Sent: Tuesday, January 12, 2010 6:09 PM
> To: 'r-help@r-project.org'
> Subject: Problems connecting with MySQL using odbcDriverConnect (RODBC
> package) on Linux
>
> I am sure I'm doing something wrong here but not sure what.
>
> Our system administrator recently installed UnixODBC and the MyODBC
> driver on a Linux box running Linux version 2.6 x86_64.
>
> I have an .odbc.ini file in my home directory with following lines:
>
> [mydb]
> Description = MySQL server on my-server
> Driver=/usr/lib64/libmyodbc3.so
> SERVER=my-server
>
> I can successfully do the following:
>
> library(RODBC)
> channel <- odbcConnect("mydb")
> sqlQuery(channel, "show databases")
>
> And in general, I have no problems using odbcConnect to connect to the
> mydb DSN.
>
> However, for various reasons I want to make a "DSN-less" connection
> using odbcDriverConnect. However, everything I've tried generated a
> "data source not found" message (see below for details)
>
>  After reading through various documents, I tried doing following.
>
> (1) Put an odbcinst.ini file in my home directory with following lines
> [MySQL]
> Description     = ODBC for MySQL
> Driver=/usr/lib64/libmyodbc3.so
> Setup           = /usr/lib/libodbcmyS.so
> FileUsage       = 1
>
> (2) Install it with odbcinst -i -f. This seems to work as when I type
> odbcinst -j I get
>
> DRIVERS: /home/jmarcus/odbcinst.ini
> SYSTEM DATA SOURCES: /home/jmarcus/odbc.ini
> USER DATA SOURCES..: /home/jmarcus/.odbc.ini
>
>
> (3) Set the environment variable to point to this file:
>
> bash-3.2$  ODBCSYSINI=/home/jmarcus
> bash-3.2$ export ODBCSYSINI
>
> (4) Start R
>
> Note that R has inherited the environment variable
>> Sys.getenv("ODBCSYSINI")
>
>     ODBCSYSINI
> "/home/jmarcus"
>
> (5) Try to connect to the MySQL server
>
> > conn <- odbcDriverConnect(connection="Driver={MySQL};Server=my-server;Database=my_database;Uid=my_username;Pwd=my_password")
>
> This generates following:
>
> Warning messages:
> 1: In odbcDriverConnect(connection = "Driver={MySQL};Server=my-server;Database=my_database;Uid=my_username;Pwd=my_password") :
>  [RODBC] ERROR: state IM002, code 0, message [unixODBC][Driver Manager]Data source name not found, and no default driver specified
> 2: In odbcDriverConnect(connection = "Driver={MySQL};Server=my-server;Database=my_database;Uid=my_username;Pwd=my_password") :
>  ODBC connection failed
>
>
> Can anyone see what I'm doing wrong? Thanks.
>
>  Jeff
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
OpenSource Software Consultant
CENFOSS (www.cenfoss.co.mz)
SP Tech (www.sptech.co.mz)
email: orvaq...@cenfoss.co.mz
cell: +258828810980

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Formula for normal distribution with know mean and standard error and n terms

2010-01-13 Thread GlenB



steve_fried...@nps.gov wrote:
> 
> I am searching for a method to calculate a normal distribution.
> 
> For example, this equation is used to calculate the normal curve when the
> mean and standard deviation are known:
> p(x) = 1/(σ*sqrt(2π)) * exp(-(x-μ)^2 / (2σ^2))
> 
> However, some of the literature I'm reading (I'm building an ecological
> niche model for vegetation along several ecological gradients) report the
> standard error instead and n sample size.  Is there an equivalent formula
> ?
> If so, how can I also normalize the p(x) term to be within the 0-1 range?
> 

What you have there (p) is a density rather than the distribution.

Note that p(x) is NOT a probability, so it doesn't have to lie between 0 and 1.

(Integrals of p(x) dx are probabilities and do lie between 0 and 1.)

The function to compute p is dnorm. Try ?dnorm in R.

If you're given the standard error of a mean (which I'll call "se") and n,
then sigma = sqrt(n)*se

(because se = sigma/sqrt(n)).

If it's the standard error of something other than the mean, you'll need to
give more details.
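
As a small illustration of the conversion above, here is a sketch with made-up numbers (mu, se and n are invented for the example, not taken from any paper). It recovers sigma from the standard error, evaluates the density with dnorm, and shows one possible way to rescale the curve so its peak is 1, in case a 0-1 "suitability" curve is what is wanted. The rescaled values are no longer a density.

```r
# Hypothetical reported values: mean, standard error of the mean, sample size.
mu <- 10
se <- 0.5
n  <- 25

# Recover the standard deviation from the standard error of the mean.
sigma <- sqrt(n) * se

# Evaluate the normal density on a grid around the mean.
x    <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 201)
dens <- dnorm(x, mean = mu, sd = sigma)

# The density integrates to 1 even though individual values are not probabilities.
area <- integrate(dnorm, -Inf, Inf, mean = mu, sd = sigma)$value

# One way to get a curve in the 0-1 range: divide by the height at the mean.
scaled <- dens / dnorm(mu, mean = mu, sd = sigma)
```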


-- 
View this message in context: 
http://n4.nabble.com/Formula-for-normal-distribution-with-know-mean-and-standard-error-and-n-terms-tp1013280p1013552.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] optimization challenge

2010-01-13 Thread Albyn Jones
Hi Aaron!  It's always nice to see a former student doing well.

Thanks for the notes and references, too!

albyn

On Wed, Jan 13, 2010 at 07:29:57PM -0500, Aaron Mackey wrote:
> FYI, in bioinformatics, we use dynamic programming algorithms in similar
> ways to solve similar problems of finding guaranteed-optimal partitions in
> streams of data (usually DNA or protein sequence, but sometimes numerical
> data from chip-arrays).  These "path optimization" algorithms are often
> called Viterbi algorithms, a web search for which should provide multiple
> references.
> 
> The solutions are not necessarily unique (there may be multiple
> paths/partitions with identical integer maxima in some systems) and there is
> much research on whether the optimal solution is actually the one you want
> to work with (for example, there may be a fair amount of probability mass
> within an area/ensemble of suboptimal solutions that overall have greater
> posterior probabilities than does the optimal solution "singleton").  See
> Chip Lawrence's PNAS paper for more erudite discussion, and references
> therein: www.pnas.org/content/105/9/3209.abstract
> 
> -Aaron
> 
> P.S. Good to see you here Albyn -- I enjoyed your stat. methods course at
> Reed back in 1993, which started me down a somewhat windy road to
> statistical genomics!
> 
> --
> Aaron J. Mackey, PhD
> Assistant Professor
> Center for Public Health Genomics
> University of Virginia
> amac...@virginia.edu
> 
> 
> On Wed, Jan 13, 2010 at 5:23 PM, Ravi Varadhan  wrote:
> 
> > Greg - thanks for posting this interesting problem.
> >
> > Albyn - thanks for posting a solution.  Now, I have some questions: (1) is
> > the algorithm guaranteed to find a "best" solution? (2) can there be
> > multiple solutions (it seems like there can be more than 1 solution
> > depending on the data)?, and (3) is there a good reference for this and
> > similar algorithms?
> >
> > Thanks & Best,
> > Ravi.
> >
> >
> > 
> > ---
> >
> > Ravi Varadhan, Ph.D.
> >
> > Assistant Professor, The Center on Aging and Health
> >
> > Division of Geriatric Medicine and Gerontology
> >
> > Johns Hopkins University
> >
> > Ph: (410) 502-2619
> >
> > Fax: (410) 614-9625
> >
> > Email: rvarad...@jhmi.edu
> >
> > Webpage:
> >
> > http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html
> >
> >
> >
> >
> > 
> > 
> >
> >
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> > On
> > Behalf Of Albyn Jones
> > Sent: Wednesday, January 13, 2010 1:19 PM
> > To: Greg Snow
> > Cc: r-help@r-project.org
> > Subject: Re: [R] optimization challenge
> >
> > The key idea is that you are building a matrix that contains the
> > solutions to smaller problems which are sub-problems of the big
> > problem.  The first row of the matrix SSQ contains the solution for no
> > splits, ie SSQ[1,j] is just the sum of squares about the overall mean
> > for reading chapters 1 through j in one day.  The iteration then uses
> > row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j
> > chapters in m-1 days) is part of the overall optimal solution, you
> > have already computed it, and so don't ever need to recompute it.
> >
> >   TS = SSQ[m-1,j]+(SSQ1[j+1])
> >
> > computes the vector of possible solutions for SSQ[m,n] (n chapters in m
> > days), breaking it into two pieces: chapters 1 to j in m-1 days, and
> > chapters j+1 to n in 1 day.  j is a vector in the function, and min(TS)
> > is the minimum over choices of j, ie SSQ[m,n].
> >
> > At the end, SSQ[128,239] is the optimal value for reading all 239
> > chapters in 128 days.  That's just the objective function, so the rest
> > involves constructing the list of optimal cuts, ie which chapters are
> > grouped together for each day's reading.  That code uses the same
> > idea... constructing a list of lists of cutpoints.
> >
> > statisticians should study a bit of data structures and algorithms!
> >
> > albyn
> >
> > On Wed, Jan 13, 2010 at 10:45:11AM -0700, Greg Snow wrote:
> > > WOW, your results give about half the variance of my best optim run
> > > (possibly due to my suboptimal use of optim).
> > >
> > > Can you describe a little what the algorithm is doing?
> > >
> > > --
> > > Gregory (Greg) L. Snow Ph.D.
> > > Statistical Data Center
> > > Intermountain Healthcare
> > > greg.s...@imail.org
> > > 801.408.8111
> > >
> > >
> > > > -Original Message-
> > > > From: Albyn Jones [mailto:jo...@reed.edu]
> > > > Sent: Tuesday, January 12, 2010 5:31 PM
> > > > To: Greg Snow
> > > > Cc: r-help@r-project.org
> > > > Subject: Re: [R] optimization challenge
> > > >
> > > > Greg
> > > >
> > > > Nice problem: I wasted my whole day on it :-)
> 

Re: [R] Error: object of type 'closure' is not subsettable

2010-01-13 Thread Gabor Grothendieck
See ?rep where it says that the argument must be a vector.  Try
   rep(list(sin), 3)
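
To spell that out with the function from the question: rep() works on vectors, and a list counts as a vector while a bare function does not, so wrapping the function in a list first is enough. A short sketch (fs and results are just illustrative names):

```r
example_function <- function() { return(TRUE) }

# Wrap the function in a list; rep() then repeats the list element.
fs <- rep(list(example_function), 3)

# Each element is still an ordinary function and can be called.
results <- sapply(fs, function(f) f())
```

This also explains why c(example_function, example_function) worked in the original post: c() builds a list out of the functions for you.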

On Wed, Jan 13, 2010 at 8:11 PM, Matthew Walker
 wrote:
> Hi everyone,
>
> Would somebody please explain (or point me to a reference that explains) the
> following error:
>
> "Error: object of type 'closure' is not subsettable"
>
> I was trying to use rep() to replicate a function:
>
>> example_function <- function() { return(TRUE) }
>> rep(example_function, 3)
> Error: object of type 'closure' is not subsettable
>
> But I just cannot understand this error.  I can combine functions using "c"
> without any problems:
>
>> c(example_function, example_function)
> [[1]]
> function ()
> {
>   return(TRUE)
> }
>
> [[2]]
> function ()
> {
>   return(TRUE)
> }
>
> What am I doing wrong when I use rep()?
>
> Thanks in advance,
>
> Matthew Walker
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] package spam for R64-devel

2010-01-13 Thread Julian Ramirez
Dear Uwe and all,

First of all, I want to congratulate you on your dedication in providing
and maintaining R for 64-bit operating systems. I tried the 64-bit version
of R on a Windows Server 2003 system. It seems to work properly, but I am
concerned because I need to use the package "fields", which depends on the
package "spam", which seems to have a check error. I know 64-bit versions
of R and its packages are just starting to roll out, but I wonder if there
is a possibility of making the "spam" package work on 64-bit R. From what I
saw in the log file (
http://www.statistik.tu-dortmund.de/~ligges/CRAN/bin/windows64/contrib/r-devel/check/spam-check.log)
it seems to be a problem with the tests.

Is it possible to run R CMD check for the "spam" package with the
--no-tests flag? By the way, the "fields" package was built using the
--no-tests flag.

Many thanks for any help you might be able to provide,


Julian Ramirez
Research Assistant
International Centre for Tropical Agriculture, CIAT
Colombia

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error: object of type 'closure' is not subsettable

2010-01-13 Thread Matthew Walker

Hi everyone,

Would somebody please explain (or point me to a reference that explains) 
the following error:


"Error: object of type 'closure' is not subsettable"

I was trying to use rep() to replicate a function:

> example_function <- function() { return(TRUE) }
> rep(example_function, 3)
Error: object of type 'closure' is not subsettable

But I just cannot understand this error.  I can combine functions using 
"c" without any problems:


> c(example_function, example_function)
[[1]]
function ()
{
   return(TRUE)
}

[[2]]
function ()
{
   return(TRUE)
}

What am I doing wrong when I use rep()?

Thanks in advance,

Matthew Walker

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Updated comparison table for SAS-SPSS Add-ons and R Functions

2010-01-13 Thread Muenchen, Robert A (Bob)
>From: b.rowling...@googlemail.com [mailto:b.rowling...@googlemail.com] On 
>Behalf Of Barry Rowlingson
>Sent: Wednesday, January 13, 2010 7:03 PM
>To: Muenchen, Robert A (Bob)
>Cc: r-help@r-project.org
>Subject: Re: [R] Updated comparison table for SAS-SPSS Add-ons and R Functions
>
>Maybe the first thing you should do is a global search and replace of 'SPSS' 
>with 'PASW'
>
> http://www.spss.com/software/product-name-guide/
>
>Barry

One of the things I updated was to *remove* the now-obsolete "PASW"! Since IBM 
bought the company, they did away with that and renamed things "IBM SPSS 
" See the list at:
http://spss.com/software/statistics/ 
They still have some "old" web pages to clean up as you point out.

Cheers, 
Bob

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] optimization challenge

2010-01-13 Thread Aaron Mackey
FYI, in bioinformatics, we use dynamic programming algorithms in similar
ways to solve similar problems of finding guaranteed-optimal partitions in
streams of data (usually DNA or protein sequence, but sometimes numerical
data from chip-arrays).  These "path optimization" algorithms are often
called Viterbi algorithms, a web search for which should provide multiple
references.

The solutions are not necessarily unique (there may be multiple
paths/partitions with identical integer maxima in some systems) and there is
much research on whether the optimal solution is actually the one you want
to work with (for example, there may be a fair amount of probability mass
within an area/ensemble of suboptimal solutions that overall have greater
posterior probabilities than does the optimal solution "singleton").  See
Chip Lawrence's PNAS paper for more erudite discussion, and references
therein: www.pnas.org/content/105/9/3209.abstract

-Aaron

P.S. Good to see you here Albyn -- I enjoyed your stat. methods course at
Reed back in 1993, which started me down a somewhat windy road to
statistical genomics!

--
Aaron J. Mackey, PhD
Assistant Professor
Center for Public Health Genomics
University of Virginia
amac...@virginia.edu


On Wed, Jan 13, 2010 at 5:23 PM, Ravi Varadhan  wrote:

> Greg - thanks for posting this interesting problem.
>
> Albyn - thanks for posting a solution.  Now, I have some questions: (1) is
> the algorithm guaranteed to find a "best" solution? (2) can there be
> multiple solutions (it seems like there can be more than 1 solution
> depending on the data)?, and (3) is there a good reference for this and
> similar algorithms?
>
> Thanks & Best,
> Ravi.
>
>
> 
> ---
>
> Ravi Varadhan, Ph.D.
>
> Assistant Professor, The Center on Aging and Health
>
> Division of Geriatric Medicine and Gerontology
>
> Johns Hopkins University
>
> Ph: (410) 502-2619
>
> Fax: (410) 614-9625
>
> Email: rvarad...@jhmi.edu
>
> Webpage:
>
> http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html
>
>
>
>
> 
> 
>
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On
> Behalf Of Albyn Jones
> Sent: Wednesday, January 13, 2010 1:19 PM
> To: Greg Snow
> Cc: r-help@r-project.org
> Subject: Re: [R] optimization challenge
>
> The key idea is that you are building a matrix that contains the
> solutions to smaller problems which are sub-problems of the big
> problem.  The first row of the matrix SSQ contains the solution for no
> splits, ie SSQ[1,j] is just the sum of squares about the overall mean
> for reading chapters 1 through j in one day.  The iteration then uses
> row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j
> chapters in m-1 days) is part of the overall optimal solution, you
> have already computed it, and so don't ever need to recompute it.
>
>   TS = SSQ[m-1,j]+(SSQ1[j+1])
>
> computes the vector of possible solutions for SSQ[m,n] (n chapters in m
> days), breaking it into two pieces: chapters 1 to j in m-1 days, and
> chapters j+1 to n in 1 day.  j is a vector in the function, and min(TS)
> is the minimum over choices of j, ie SSQ[m,n].
>
> At the end, SSQ[128,239] is the optimal value for reading all 239
> chapters in 128 days.  That's just the objective function, so the rest
> involves constructing the list of optimal cuts, ie which chapters are
> grouped together for each day's reading.  That code uses the same
> idea... constructing a list of lists of cutpoints.
>
> statisticians should study a bit of data structures and algorithms!
>
> albyn
>
> On Wed, Jan 13, 2010 at 10:45:11AM -0700, Greg Snow wrote:
> > WOW, your results give about half the variance of my best optim run
> > (possibly due to my suboptimal use of optim).
> >
> > Can you describe a little what the algorithm is doing?
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.s...@imail.org
> > 801.408.8111
> >
> >
> > > -Original Message-
> > > From: Albyn Jones [mailto:jo...@reed.edu]
> > > Sent: Tuesday, January 12, 2010 5:31 PM
> > > To: Greg Snow
> > > Cc: r-help@r-project.org
> > > Subject: Re: [R] optimization challenge
> > >
> > > Greg
> > >
> > > Nice problem: I wasted my whole day on it :-)
> > >
> > > I was explaining my plan for a solution to a colleague who is a
> > > computer scientist, he pointed out that I was trying to re-invent the
> > > wheel known as dynamic programming.  here is my code, apparently it is
> > > called "bottom up dynamic programming".  It runs pretty quickly, and
> > > returns (what I hope is :-) the optimal sum of squares and the
> > > cut-points.
> > >
> > > function(X=bom3$Verses,days=128){
> > > # fi

Re: [R] a question about deleting rows

2010-01-13 Thread jim holtman
Try this:

> x
  id n1 n2 n3 n4 n5 n6
1  1  3  4  7  8 10  2
2  2  4  1  2  4  3 10
3  3  7  0  0  0  0  8
4  4 10  1  0  0  2  3
5  5 11  1  0  0  0  5
> delete <- with(x, n2 == 0 & n3 == 0 & n4 == 0 & n5 == 0)
> delete
[1] FALSE FALSE  TRUE FALSE FALSE
> x[!delete,]
  id n1 n2 n3 n4 n5 n6
1  1  3  4  7  8 10  2
2  2  4  1  2  4  3 10
4  4 10  1  0  0  2  3
5  5 11  1  0  0  0  5
>


On Wed, Jan 13, 2010 at 5:15 PM, karena  wrote:

>
> I have a file like this:
> id  n1  n2  n3  n4  n5  n6
> 1    3   4   7   8  10   2
> 2    4   1   2   4   3  10
> 3    7   0   0   0   0   8
> 4   10   1   0   0   2   3
> 5   11   1   0   0   0   5
>
> what I want to do is: only if n2=0 and n3=0 and n4=0 and n5=0 then delete
> the row. how can I do that?
>
> thank you,
>
> karena
> --
> View this message in context:
> http://n4.nabble.com/a-question-about-deleting-rows-tp1013403p1013403.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a question about deleting rows

2010-01-13 Thread Steve Taylor
yourdataframe = subset(yourdataframe, !(n2==0 & n3==0 & n4==0 & n5==0))
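
A self-contained version of the same idea, using the data from the question, plus a variant that avoids typing every column. This is only a sketch: x, cols, kept_subset and kept_rowsums are illustrative names, and it assumes there are no NA values in the columns being tested.

```r
# Data copied from the question in this thread.
x <- data.frame(id = 1:5,
                n1 = c(3, 4, 7, 10, 11),
                n2 = c(4, 1, 0, 1, 1),
                n3 = c(7, 2, 0, 0, 0),
                n4 = c(8, 4, 0, 0, 0),
                n5 = c(10, 3, 0, 2, 0),
                n6 = c(2, 10, 8, 3, 5))

# Explicit condition: drop a row only when n2..n5 are all zero.
kept_subset <- subset(x, !(n2 == 0 & n3 == 0 & n4 == 0 & n5 == 0))

# Same filter by counting non-zero entries per row in the chosen columns.
cols <- c("n2", "n3", "n4", "n5")
kept_rowsums <- x[rowSums(x[, cols] != 0) > 0, ]
```

The rowSums form scales better when there are many columns to check, since only the cols vector has to change.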

>>> 
From: karena 
To:
Date: 14/Jan/2010 12:24 p.m.
Subject: [R]  a question about deleting rows

I have a file like this:
id  n1  n2  n3  n4  n5  n6
1    3   4   7   8  10   2
2    4   1   2   4   3  10
3    7   0   0   0   0   8
4   10   1   0   0   2   3
5   11   1   0   0   0   5

what I want to do is: only if n2=0 and n3=0 and n4=0 and n5=0 then delete
the row. how can I do that?

thank you,

karena 
-- 
View this message in context: 
http://n4.nabble.com/a-question-about-deleting-rows-tp1013403p1013403.html 
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Updated comparison table for SAS-SPSS Add-ons and R Functions

2010-01-13 Thread Barry Rowlingson
On Wed, Jan 13, 2010 at 11:53 PM, Muenchen, Robert A (Bob)  wrote:

> Hi All,
>
> I have substantially expanded the table that compares SAS and SPSS
> add-on modules to somewhat equivalent R packages. This new version is
> at:
> http://r4stats.com/add-on-modules
> and I would very much appreciate any feedback you might have on it.
>
> The site http://r4stats.com is the replacement to
> http://RforSASandSPSSusers.com and includes the support files for both
> "R for SAS and SPSS Users" and the new "R for Stata Users", due out in
> March from Springer. I'll phase the older site out eventually and change
> the URL to point to the new one.
>
>
Maybe the first thing you should do is a global search and replace of 'SPSS'
with 'PASW'

 http://www.spss.com/software/product-name-guide/

Barry

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Updated comparison table for SAS-SPSS Add-ons and R Functions

2010-01-13 Thread Muenchen, Robert A (Bob)
Hi All,

I have substantially expanded the table that compares SAS and SPSS
add-on modules to somewhat equivalent R packages. This new version is
at:
http://r4stats.com/add-on-modules 
and I would very much appreciate any feedback you might have on it.

The site http://r4stats.com is the replacement to
http://RforSASandSPSSusers.com and includes the support files for both
"R for SAS and SPSS Users" and the new "R for Stata Users", due out in
March from Springer. I'll phase the older site out eventually and change
the URL to point to the new one.

Thanks,
Bob

=
  Bob Muenchen (pronounced Min'-chen), Manager  
  Research Computing Support
  Voice: (865) 974-5230  
  Email: muenc...@utk.edu
  Web:   http://oit.utk.edu/research, 
  News:  http://oit.utk.edu/research/news.php
=

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] optimization challenge

2010-01-13 Thread Ravi Varadhan
Greg - thanks for posting this interesting problem.

Albyn - thanks for posting a solution.  Now, I have some questions: (1) is
the algorithm guaranteed to find a "best" solution? (2) can there be
multiple solutions (it seems like there can be more than 1 solution
depending on the data)?, and (3) is there a good reference for this and
similar algorithms?

Thanks & Best,
Ravi.


---

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: rvarad...@jhmi.edu

Webpage:
http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html

 





-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Albyn Jones
Sent: Wednesday, January 13, 2010 1:19 PM
To: Greg Snow
Cc: r-help@r-project.org
Subject: Re: [R] optimization challenge

The key idea is that you are building a matrix that contains the
solutions to smaller problems which are sub-problems of the big
problem.  The first row of the matrix SSQ contains the solution for no
splits, ie SSQ[1,j] is just the sum of squares about the overall mean
for reading chapters 1 through j in one day.  The iteration then uses
row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j
chapters in m-1 days) is part of the overall optimal solution, you
have already computed it, and so don't ever need to recompute it.

   TS = SSQ[m-1,j]+(SSQ1[j+1])

computes the vector of possible solutions for SSQ[m,n] (n chapters in m
days), breaking it into two pieces: chapters 1 to j in m-1 days, and
chapters j+1 to n in 1 day.  j is a vector in the function, and min(TS)
is the minimum over choices of j, ie SSQ[m,n].

At the end, SSQ[128,239] is the optimal value for reading all 239
chapters in 128 days.  That's just the objective function, so the rest
involves constructing the list of optimal cuts, ie which chapters are
grouped together for each day's reading.  That code uses the same
idea... constructing a list of lists of cutpoints.
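
The recursion described above can be tried on a toy problem. What follows is a condensed restatement of the function posted earlier in the thread, not a new algorithm; the name optimal_schedule and the toy chapter lengths are made up for illustration.

```r
# Bottom-up dynamic programme: split chapters 1..N into `days` contiguous
# groups, minimising the sum of squared deviations of each day's total
# from the average daily amount M.
optimal_schedule <- function(X, days) {
  N <- length(X)
  stopifnot(days >= 1, days <= N)
  M <- sum(X) / days
  SSQ  <- matrix(NA_real_, nrow = days, ncol = N)
  Cuts <- vector("list", days)
  SSQ[1, ]  <- (cumsum(X) - M)^2      # one day: chapters 1..j read together
  Cuts[[1]] <- as.list(1:N)
  for (m in seq_len(days)[-1]) {
    Cuts[[m]] <- list()
    for (n in m:N) {
      tail_sum <- rev(cumsum(rev(X[1:n])))   # tail_sum[i] = sum(X[i:n])
      j  <- (m - 1):(n - 1)                  # last chapter read on day m-1
      TS <- SSQ[m - 1, j] + (tail_sum[j + 1] - M)^2
      best <- which.min(TS)                  # first minimiser, as in the original
      SSQ[m, n] <- TS[best]
      Cuts[[m]][[n]] <- c(Cuts[[m - 1]][[j[best]]], n)
    }
  }
  list(SSQ = SSQ[days, N], Cuts = Cuts[[days]][[N]])
}

toy <- c(2, 2, 1, 3, 4)               # made-up chapter lengths
res <- optimal_schedule(toy, days = 3)
```

On this toy input the three daily totals can all equal 4, so the returned SSQ is 0 and the cut points (last chapter of each day) are 2, 4 and 5.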

statisticians should study a bit of data structures and algorithms!

albyn

On Wed, Jan 13, 2010 at 10:45:11AM -0700, Greg Snow wrote:
> WOW, your results give about half the variance of my best optim run
> (possibly due to my suboptimal use of optim).
> 
> Can you describe a little what the algorithm is doing?
> 
> -- 
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
> 
> 
> > -Original Message-
> > From: Albyn Jones [mailto:jo...@reed.edu]
> > Sent: Tuesday, January 12, 2010 5:31 PM
> > To: Greg Snow
> > Cc: r-help@r-project.org
> > Subject: Re: [R] optimization challenge
> > 
> > Greg
> > 
> > Nice problem: I wasted my whole day on it :-)
> > 
> > I was explaining my plan for a solution to a colleague who is a
> > computer scientist, he pointed out that I was trying to re-invent the
> > wheel known as dynamic programming.  here is my code, apparently it is
> > called "bottom up dynamic programming".  It runs pretty quickly, and
> > returns (what I hope is :-) the optimal sum of squares and the
> > cut-points.
> > 
> > function(X=bom3$Verses,days=128){
> > # find optimal BOM reading schedule for Greg Snow
> > # minimize variance of quantity to read per day over 128 days
> > #
> > N = length(X)
> > Nm1 = N-1
> > SSQ<- matrix(NA,nrow=days,ncol=N)
> > Cuts <- list()
> > #
> > #  SSQ[i,j]: the ssqs about the overall mean for the optimal partition
> > #   for i days on the chapters 1 to j
> > #
> > M = sum(X)/days
> > CS = cumsum(X)
> > SSQ[1,]= (CS-M)^2
> > Cuts[[1]]= as.list(1:N)
> > #
> > for(m in 2:days){
> > Cuts[[m]]=list()
> > #for(i in 1:(m-1)) Cuts[[m]][[i]] = Cuts[[m-1]][[i]]
> > for(n in m:N){
> >   CS = cumsum(X[n:1])[n:1]
> >   SSQ1 = (CS-M)^2
> >   j = (m-1):(n-1)
> >   TS = SSQ[m-1,j]+(SSQ1[j+1])
> >   SSQ[m,n] = min(TS)
> >   k = min(which((min(TS)== TS)))+m-1
> >   Cuts[[m]][[n]] = c(Cuts[[m-1]][[k-1]],n)
> > }
> > }
> > list(SSQ=SSQ[days,N],Cuts=Cuts[[days]][[N]])
> > }
> > 
> > $SSQ
> > [1] 11241.05
> > 
> > $Cuts
> >   [1]   2   4   7   9  11  13  15  16  17  19  21  23  25  27  30  31  34  37
> >  [19]  39  41  44  46  48  50  53  56  59  60  62  64  66  68  70  73  75  77
> >  [37]  78  80  82  84  86  88  89  91  92  94  95  96  97  99 100 103 105 106
> >  [55] 108 110 112 113 115 117 119 121 124 125 126 127 129 131 132 135 137 138
> >  [73] 140 141 142 144 145 146 148 150 151 152 154 156 157 160 162 163 164 166
> >  [91] 167 169 171 173 175 177 179 181 183 185 186 188 190 192 193 194 196 199
> > [109] 201 204 205 207 209 211 213 214

Re: [R] merging issue.........

2010-01-13 Thread karena

thank you very much!
-- 
View this message in context: 
http://n4.nabble.com/merging-issue-tp1013356p1013433.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] counting the number of times a string appears

2010-01-13 Thread Jesse Sinclair
This is great all.

It works perfectly. Thank-you.

Cheers,
Jesse

On Wed, Jan 13, 2010 at 14:27, Adrian Dusa  wrote:

> Hi Jesse,
>
> If your vector is called "aa", then how about:
>
> > table(aa)
> aa
>  spp1 spp10  spp2  spp3  spp4  spp5  spp6  spp7  spp8  spp9
>     7     2    16     8    15     9     9    10     9    15
>
> Hope this helps,
> Adrian
>
>
> On Thursday 14 January 2010, Jesse Sinclair wrote:
> > Hi all,
> >
> > I have a vector of strings and need to count the number of times a string
> > appears in the vector.
> >
> > eg:
> >
> >   [1] spp6  spp10 spp6  spp6  spp4  spp2  spp9  spp10 spp5  spp2  spp2  spp3
> >  [13] spp4  spp3  spp6  spp10 spp6  spp4  spp9  spp3  spp6  spp1  spp10 spp8
> >  [25] spp2  spp10 spp9  spp7  spp1  spp3  spp8  spp6  spp3  spp8  spp6  spp5
> >  [37] spp5  spp9  spp3  spp1  spp4  spp5  spp9  spp3  spp3  spp5  spp4  spp9
> >  [49] spp3  spp7  spp7  spp2  spp6  spp5  spp7  spp4  spp8  spp9  spp2  spp6
> >  [61] spp3  spp3  spp2  spp6  spp3  spp5  spp6  spp6  spp4  spp1  spp1  spp1
> >  [73] spp10 spp8  spp1  spp6  spp1  spp5  spp8  spp9  spp5  spp6  spp9  spp10
> >  [85] spp2  spp6  spp10 spp1  spp2  spp3  spp5  spp8  spp2  spp7  spp4  spp7
> >  [97] spp2  spp6  spp2  spp6
> >
> > Is it possible to create a vector of counts for each spp1-spp10?
> >
> > Any help or ideas would be appreciated.
> >
> > Cheers,
> > Jesse
> >
> >   [[alternative HTML version deleted]]
> >
>
>
> --
> Adrian Dusa
> Romanian Social Data Archive
> 1, Schitu Magureanu Bd.
> 050025 Bucharest sector 5
> Romania
> Tel.:+40 21 3126618 \
> +40 21 3120210 / int.101
> Fax: +40 21 3158391
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] a question about deleting rows

2010-01-13 Thread karena

I have a file like this:
id  n1  n2  n3  n4  n5  n6
1    3   4   7   8  10   2
2    4   1   2   4   3  10
3    7   0   0   0   0   8
4   10   1   0   0   2   3
5   11   1   0   0   0   5

what I want to do is: only if n2=0 and n3=0 and n4=0 and n5=0 then delete
the row. how can I do that?

thank you,

karena 
-- 
View this message in context: 
http://n4.nabble.com/a-question-about-deleting-rows-tp1013403p1013403.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Operating on each row of data frame

2010-01-13 Thread Stephan Kolassa

Hi,

does this do what you want?

d <- cbind(d,apply(d[,c(2,3,4)],1,mean),apply(d[,c(2,3,4)],1,sd))

HTH,
Stephan
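
Two side notes, offered as a sketch rather than a definitive fix. First, apply() on the whole data frame coerces every row to character because of the name column, which is one reason the function in the question fails. Second, sd() gives the standard deviation; for the standard error of the mean it still has to be divided by sqrt(n). The data frame below is invented, and it assumes columns 2 to 4 are numeric.

```r
# Hypothetical data frame: column 1 is a name, columns 2-4 are values
d <- data.frame(name = c("a", "b", "c"),
                v1 = c(1, 2, 3),
                v2 = c(2, 4, 6),
                v3 = c(3, 6, 9))

vals <- as.matrix(d[, 2:4])                       # numeric part only
d$mean <- rowMeans(vals)                          # row-wise mean
d$se   <- apply(vals, 1, sd) / sqrt(ncol(vals))   # standard error of the mean
print(d)
```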


Abhishek Pratap schrieb:

Hi All

I have a data frame in which there are 4 columns .

Column 1 : name

Column 2-4 : values

I would like to calculate mean/Standard error  of values in column 2-4 and
store them in column 5,6 respectively.



I have done the following but doesn't seem to work

mean_N_SE <-function(x)
{

name <- x[1]
vals <- c(x[2:4])
temp_mean <- mean(vals)
SE <-  sqrt(var(x)/length(x))

}

apply(d,1,mean_N_SE) where d = data frame.


Can someone help me with this.

Thanks!
-Abhi



Re: [R] counting the number of times a string appears

2010-01-13 Thread Adrian Dusa
Hi Jesse,

If your vector is called "aa", then how about:

> table(aa)
aa
 spp1 spp10  spp2  spp3  spp4  spp5  spp6  spp7  spp8  spp9
    7     2    16     8    15     9     9    10     9    15

Hope this helps,
Adrian
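
If a plain vector of counts in the order spp1, spp2, ..., spp10 is wanted, something along these lines might do. The short vector `aa` here is a made-up stand-in for the 100-element one in the question, and fixing the level order is an assumption about what is wanted.

```r
# Hypothetical short vector standing in for the one in the question
aa <- c("spp6", "spp10", "spp6", "spp4", "spp2", "spp10", "spp2", "spp6")

# Fix the level order so the result runs spp1, spp2, ..., spp10
# (plain table() sorts labels alphabetically, putting spp10 right after spp1)
lev <- paste0("spp", 1:10)
counts <- table(factor(aa, levels = lev))

# Convert to a plain named integer vector; absent species get a count of 0
count_vec <- setNames(as.vector(counts), names(counts))
print(count_vec)
```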


On Thursday 14 January 2010, Jesse Sinclair wrote:
> Hi all,
> 
> I have a vector of strings and need to count the number of times a string
> appears in the vector.
> 
> eg:
> 
>  [1] spp6  spp10 spp6  spp6  spp4  spp2  spp9  spp10 spp5  spp2  spp2  spp3
>  [13] spp4  spp3  spp6  spp10 spp6  spp4  spp9  spp3  spp6  spp1  spp10
>  spp8
> 
>  [25] spp2  spp10 spp9  spp7  spp1  spp3  spp8  spp6  spp3  spp8  spp6 
>  spp5
> 
>  [37] spp5  spp9  spp3  spp1  spp4  spp5  spp9  spp3  spp3  spp5  spp4 
>  spp9
> 
>  [49] spp3  spp7  spp7  spp2  spp6  spp5  spp7  spp4  spp8  spp9  spp2 
>  spp6
> 
>  [61] spp3  spp3  spp2  spp6  spp3  spp5  spp6  spp6  spp4  spp1  spp1 
>  spp1
> 
>  [73] spp10 spp8  spp1  spp6  spp1  spp5  spp8  spp9  spp5  spp6  spp9
> spp10
>  [85] spp2  spp6  spp10 spp1  spp2  spp3  spp5  spp8  spp2  spp7  spp4 
>  spp7
> 
>  [97] spp2  spp6  spp2  spp6
> 
> Is it possible to create a vector of counts for each spp1-spp10?
> 
> Any help or ideas would be appreciated.
> 
> Cheers,
> Jesse
> 
>   [[alternative HTML version deleted]]
> 


-- 
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
 +40 21 3120210 / int.101
Fax: +40 21 3158391



Re: [R] merging issue.........

2010-01-13 Thread Heinz Tuechler

Did you consider looking at the help page for merge?
h

At 22:01 13.01.2010, karena wrote:


hi, I have a question about merging two files.
For example, I have two files, the first file is like the following:

id   trait1
1    10.2
2    11.1
3    9.7
6    10.2
7    8.9
10   9.7
11   10.2

The second file is like the following:
id   trait2
1    9.8
2    10.8
4    7.8
5    9.8
6    10.1
12   10.2
13   10.1

now I want to merge the two files by the variable "id", I only want to keep
the "id"s which show up in the first file. Even the "id" does not show up in
the second file, it doesn't matter, I can keep the missing values. So my
question is: how can I merge the two files and keep only the rows whose "id"
show up in the first file?
I know how to do it is SAS, just use the following code:
merge data1(in=in1) data2(in=in2);
by id;
if in1;

but I really have no idea about how to do it in R.

thank you in advance,

karean
--
View this message in context: 
http://n4.nabble.com/merging-issue-tp1013356p1013356.html

Sent from the R help mailing list archive at Nabble.com.



Re: [R] merging issue.........

2010-01-13 Thread Adrian Dusa
Hi Karean,

If your first object is called obj1 and the second called obj2, then:

> merge(obj1, obj2, all.x=TRUE)
  id trait1 trait2
1  1   10.2    9.8
2  2   11.1   10.8
3  3    9.7     NA
4  6   10.2   10.1
5  7    8.9     NA
6 10    9.7     NA
7 11   10.2     NA

Hope this helps,
Adrian
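
For anyone wanting to reproduce this, here is a self-contained sketch with the values from the question typed in by hand. Naming the key with by = "id" is an addition for clarity; without it merge() joins on all common column names, which here is just id. all.x = TRUE plays roughly the role of `if in1;` in the SAS code.

```r
obj1 <- data.frame(id = c(1, 2, 3, 6, 7, 10, 11),
                   trait1 = c(10.2, 11.1, 9.7, 10.2, 8.9, 9.7, 10.2))
obj2 <- data.frame(id = c(1, 2, 4, 5, 6, 12, 13),
                   trait2 = c(9.8, 10.8, 7.8, 9.8, 10.1, 10.2, 10.1))

# Left join: keep every id from obj1, fill trait2 with NA where obj2 has no match
left <- merge(obj1, obj2, by = "id", all.x = TRUE)
print(left)

# Optional check: how many obj1 rows found no partner in obj2
n_unmatched <- sum(is.na(left$trait2))
print(n_unmatched)
```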

On Wednesday 13 January 2010, karena wrote:
> hi, I have a question about merging two files.
> For example, I have two files, the first file is like the following:
> 
> id   trait1
> 1    10.2
> 2    11.1
> 3    9.7
> 6    10.2
> 7    8.9
> 10   9.7
> 11   10.2
> 
> The second file is like the following:
> id   trait2
> 1    9.8
> 2    10.8
> 4    7.8
> 5    9.8
> 6    10.1
> 12   10.2
> 13   10.1
> 
> now I want to merge the two files by the variable "id", I only want to keep
> the "id"s which show up in the first file. Even the "id" does not show up
>  in the second file, it doesn't matter, I can keep the missing values. So
>  my question is: how can I merge the two files and keep only the rows whose
>  "id" show up in the first file?
> I know how to do it is SAS, just use the following code:
> merge data1(in=in1) data2(in=in2);
> by id;
> if in1;
> 
> but I really have no idea about how to do it in R.
> 
> thank you in advance,
> 
> karean
> 


-- 
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
 +40 21 3120210 / int.101
Fax: +40 21 3158391



Re: [R] Advantages of using SQLite for data import in comparison to csv files

2010-01-13 Thread Gabor Grothendieck
You could look at read.csv.sql in sqldf (http://sqldf.googlecode.com) as well.

On Wed, Jan 13, 2010 at 2:00 PM, Juliet Jacobson  wrote:
> Hello everybody out there using R,
>
> I'm using R for the analysis of biological data and write the results
> down using LaTeX, both on a notebook with linux installed.
> I've already tried two options for the import of my data:
> 1. Import from a SQLite database
> 2. Import from individual csv files edited with sed, awk and sort.
> Both methods actually work very well, since I don't need advanced
> features like multi-user network access to the data.
> My data sets are tables with up to 20 columns and 1000 rows, containing
> mostly numerical values and strings. Moreover,
> I might also have to handle microarray data, but I'm not so sure about
> that yet. Moreover, I need to organise tags for a collection of photos,
> but this data is of course not analysed with R.
> I'm now beginning to work on a larger project and have to decide,
> whether it is better to use SQLite or csv-files for handling my data.
> I fear, it might get difficult to switch between the two system after
> having accumulated the data, adapted software for backups and revision
> control, written makefiles etc.
> Could anyone of you give me a hint on the additional benefits of
> importing data from a SQLite database to R to the simpler way of
> organising the data in csv files? Is it for example possible to select
> values from a column within a certain range from a csv file using awk?
>
> Thanks in advance,
> Juliet Jacobson
>


Re: [R] counting the number of times a string appears

2010-01-13 Thread Rolf Turner


?table

On 14/01/2010, at 11:12 AM, Jesse Sinclair wrote:


Hi all,

I have a vector of strings and need to count the number of times a  
string

appears in the vector.

eg:

 [1] spp6  spp10 spp6  spp6  spp4  spp2  spp9  spp10 spp5  spp2   
spp2  spp3
 [13] spp4  spp3  spp6  spp10 spp6  spp4  spp9  spp3  spp6  spp1   
spp10 spp8


 [25] spp2  spp10 spp9  spp7  spp1  spp3  spp8  spp6  spp3  spp8   
spp6  spp5


 [37] spp5  spp9  spp3  spp1  spp4  spp5  spp9  spp3  spp3  spp5   
spp4  spp9


 [49] spp3  spp7  spp7  spp2  spp6  spp5  spp7  spp4  spp8  spp9   
spp2  spp6


 [61] spp3  spp3  spp2  spp6  spp3  spp5  spp6  spp6  spp4  spp1   
spp1  spp1


 [73] spp10 spp8  spp1  spp6  spp1  spp5  spp8  spp9  spp5  spp6  spp9
spp10
 [85] spp2  spp6  spp10 spp1  spp2  spp3  spp5  spp8  spp2  spp7   
spp4  spp7


 [97] spp2  spp6  spp2  spp6

Is it possible to create a vector of counts for each spp1-spp10?

Any help or ideas would be appreciated.

Cheers,
Jesse



Re: [R] counting the number of times a string appears

2010-01-13 Thread Greg Hirson

Jesse,

see ?table and try

table(stringVector)

Greg

On 1/13/10 2:12 PM, Jesse Sinclair wrote:

Hi all,

I have a vector of strings and need to count the number of times a string
appears in the vector.

eg:

  [1] spp6  spp10 spp6  spp6  spp4  spp2  spp9  spp10 spp5  spp2  spp2  spp3
  [13] spp4  spp3  spp6  spp10 spp6  spp4  spp9  spp3  spp6  spp1  spp10 spp8

  [25] spp2  spp10 spp9  spp7  spp1  spp3  spp8  spp6  spp3  spp8  spp6  spp5

  [37] spp5  spp9  spp3  spp1  spp4  spp5  spp9  spp3  spp3  spp5  spp4  spp9

  [49] spp3  spp7  spp7  spp2  spp6  spp5  spp7  spp4  spp8  spp9  spp2  spp6

  [61] spp3  spp3  spp2  spp6  spp3  spp5  spp6  spp6  spp4  spp1  spp1  spp1

  [73] spp10 spp8  spp1  spp6  spp1  spp5  spp8  spp9  spp5  spp6  spp9
spp10
  [85] spp2  spp6  spp10 spp1  spp2  spp3  spp5  spp8  spp2  spp7  spp4  spp7

  [97] spp2  spp6  spp2  spp6

Is it possible to create a vector of counts for each spp1-spp10?

Any help or ideas would be appreciated.

Cheers,
Jesse



--
Greg Hirson
ghir...@ucdavis.edu

Graduate Student
Agricultural and Environmental Chemistry

1106 Robert Mondavi Institute North
One Shields Avenue
Davis, CA 95616



Re: [R] Operating on each row of data frame

2010-01-13 Thread Abhishek Pratap
Thanks all for a very quick solution. It is actually good to know different
ways to do the same things. It expands my limited understanding of R :).

-A

On Wed, Jan 13, 2010 at 5:12 PM, Stephan Kolassa wrote:

> Hi,
>
> does this do what you want?
>
> d <- cbind(d,apply(d[,c(2,3,4)],1,mean),apply(d[,c(2,3,4)],1,sd))
>
> HTH,
> Stephan
>
>
> Abhishek Pratap schrieb:
>
>> Hi All
>>
>> I have a data frame in which there are 4 columns .
>>
>> Column 1 : name
>>
>> Column 2-4 : values
>>
>> I would like to calculate mean/Standard error  of values in column 2-4 and
>> store them in column 5,6 respectively.
>>
>>
>>
>> I have done the following but doesn't seem to work
>>
>> mean_N_SE <-function(x)
>> {
>>
>> name <- x[1]
>> vals <- c(x[2:4])
>> temp_mean <- mean(vals)
>> SE <-  sqrt(var(x)/length(x))
>>
>> }
>>
>> apply(d,1,mean_N_SE) where d = data frame.
>>
>>
>> Can someone help me with this.
>>
>> Thanks!
>> -Abhi
>>


[R] counting the number of times a string appears

2010-01-13 Thread Jesse Sinclair
Hi all,

I have a vector of strings and need to count the number of times a string
appears in the vector.

eg:

 [1] spp6  spp10 spp6  spp6  spp4  spp2  spp9  spp10 spp5  spp2  spp2  spp3
 [13] spp4  spp3  spp6  spp10 spp6  spp4  spp9  spp3  spp6  spp1  spp10 spp8

 [25] spp2  spp10 spp9  spp7  spp1  spp3  spp8  spp6  spp3  spp8  spp6  spp5

 [37] spp5  spp9  spp3  spp1  spp4  spp5  spp9  spp3  spp3  spp5  spp4  spp9

 [49] spp3  spp7  spp7  spp2  spp6  spp5  spp7  spp4  spp8  spp9  spp2  spp6

 [61] spp3  spp3  spp2  spp6  spp3  spp5  spp6  spp6  spp4  spp1  spp1  spp1

 [73] spp10 spp8  spp1  spp6  spp1  spp5  spp8  spp9  spp5  spp6  spp9
spp10
 [85] spp2  spp6  spp10 spp1  spp2  spp3  spp5  spp8  spp2  spp7  spp4  spp7

 [97] spp2  spp6  spp2  spp6

Is it possible to create a vector of counts for each spp1-spp10?

Any help or ideas would be appreciated.

Cheers,
Jesse



[R] merging issue.........

2010-01-13 Thread karena

hi, I have a question about merging two files.
For example, I have two files, the first file is like the following:

id   trait1
1    10.2
2    11.1
3    9.7
6    10.2
7    8.9
10   9.7
11   10.2

The second file is like the following:
id   trait2
1    9.8
2    10.8
4    7.8
5    9.8
6    10.1
12   10.2
13   10.1

Now I want to merge the two files by the variable "id", keeping only the
ids that show up in the first file. If an id does not show up in the second
file, that is fine; I can keep the missing values. So my question is: how can
I merge the two files and keep only the rows whose "id" shows up in the first
file?
I know how to do it in SAS, using the following code:
merge data1(in=in1) data2(in=in2);
by id;
if in1;

but I really have no idea about how to do it in R.

thank you in advance,

karean 
-- 
View this message in context: 
http://n4.nabble.com/merging-issue-tp1013356p1013356.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] R package dependencies

2010-01-13 Thread Gabor Grothendieck
See the dep function defined here:
http://tolstoy.newcastle.edu.au/R/e6/help/09/03/7159.html
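
As a rough sketch of the three lookups asked about, assuming a reasonably recent R version that provides tools::package_dependencies(). It uses the locally installed packages as the database, so the reverse dependencies it reports are limited to what is installed. The helper has_compiled_code() is a hypothetical name, and checking for a "libs" directory is only a heuristic for "contains a dll". "stats" is just an example package.

```r
# Use the locally installed packages as the database (no network access needed)
db <- installed.packages()
dep_fields <- c("Depends", "Imports", "LinkingTo")

# Forward dependencies of a package, followed recursively
deps <- tools::package_dependencies("stats", db = db,
                                    which = dep_fields, recursive = TRUE)

# Reverse dependencies: installed packages that need "stats"
rev_deps <- tools::package_dependencies("stats", db = db,
                                        which = dep_fields, reverse = TRUE)

# Heuristic: an installed package with a "libs" directory ships compiled code
has_compiled_code <- function(pkg) {
  nzchar(system.file("libs", package = pkg))
}

print(deps)
print(has_compiled_code("stats"))
```

To cover all of CRAN rather than the local library, the db argument could presumably be swapped for the result of available.packages().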

On Wed, Jan 13, 2010 at 11:39 AM, Colin Millar  wrote:
> Hi there,
>
> My question relates to getting information about R packages.  In particular i 
> would like to be able to find from within R:
>  what are a packages dependencies
>  what are a packages reverse dependencies
>  does a package contain a dll
>
> The reason i ask is:
>
> The organisation that i work for is introducing a secure intranet operating 
> on windows PCs and laptops, and this requires that all software / executables 
> / dlls are validated before they are combined to produce a generic PC build.
>
> I would like to maximise the packages available to our staff and so for the 
> packages that we have listed as buisness needs, i would like to include all 
> reverse dependencies of this collection that do not have dlls.
>
> I hope this makes sense (the question not the reason).
>
> Kind regards,
> Colin.
>
>        [[alternative HTML version deleted]]
>


Re: [R] Operating on each row of data frame

2010-01-13 Thread Pete B

Look at the apply function
?apply

x = data.frame(x1=c(1,2,3,4,5),x2=c(2,4,6,8,10),x3=c(1,3,5,7,9))
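vals = as.matrix(x)        # snapshot of the value columns before adding results
x$x5=apply(vals,1,mean)
x$x6=apply(vals,1,sd)      # sd of x1..x3 only, not of the new x5 column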

print(x)



Abhishek Pratap wrote:
> 
> Hi All
> 
> I have a data frame in which there are 4 columns .
> 
> Column 1 : name
> 
> Column 2-4 : values
> 
> I would like to calculate mean/Standard error  of values in column 2-4 and
> store them in column 5,6 respectively.
> 
> 
> 
> I have done the following but doesn't seem to work
> 
> mean_N_SE <-function(x)
> {
> 
> name <- x[1]
> vals <- c(x[2:4])
> temp_mean <- mean(vals)
> SE <-  sqrt(var(x)/length(x))
> 
> }
> 
> apply(d,1,mean_N_SE) where d = data frame.
> 
> 
> Can someone help me with this.
> 
> Thanks!
> -Abhi
> 
>   [[alternative HTML version deleted]]
> 
> 
> 

-- 
View this message in context: 
http://n4.nabble.com/Operating-on-each-row-of-data-frame-tp1013365p1013397.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Rollapply

2010-01-13 Thread Gabor Grothendieck
See:

http://tolstoy.newcastle.edu.au/R/help/04/03/1446.html
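
A likely cause of the 'newdata' warning in the question is that the formula dd[,1]~dd[,2] contains no variable called Xvar, so predict() cannot match newdata to the model and falls back to the three rows used for fitting. Returning c(l,c,p) also mixes a whole lm object into the result, which is probably why the prediction is hard to find afterwards. The sketch below uses named variables and returns a plain numeric vector. It is written in base R with sapply() over window end points rather than rollapply(), purely to stay self-contained, and it assumes the intent was to regress Yvar on Xvar (the original formula has them the other way round). fit_window is a hypothetical helper name.

```r
dat <- data.frame(Xvar = c(70.67, 70.54, 69.87, 69.51, 70.69, 72.66, 72.65, 73.36),
                  Yvar = c(78.01, 77.07, 77.35, 76.72, 77.49, 78.70, 77.78, 79.58))
Xpred <- 70.67
width <- 3

fit_window <- function(end) {
  win <- dat[(end - width + 1):end, ]
  # Named variables in the formula let predict() match newdata by name
  fit <- lm(Yvar ~ Xvar, data = win)
  p <- predict(fit, newdata = data.frame(Xvar = Xpred), se.fit = TRUE)
  # Return a plain numeric vector, not the whole lm object
  c(coef(fit), pred = unname(p$fit), se = unname(p$se.fit))
}

# One row per right-aligned window: rows 1-3, 2-4, ..., 6-8
res <- t(sapply(width:nrow(dat), fit_window))
print(res)
```

The same idea should carry over to rollapply(..., by.column = FALSE) if the function takes the window itself as its argument and converts it with as.data.frame() first.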

On Wed, Jan 13, 2010 at 3:45 PM, Pete B  wrote:
>
> Hi
>
> I would like to understand how to extend the function (FUN) I am using in
> rollapply below.
>
> ##
> With the following simplified data, test1 yields parameters for a rolling
> regression
>
> data = data.frame(Xvar=c(70.67,70.54,69.87,69.51,70.69,72.66,72.65,73.36),
>               Yvar =c(78.01,77.07,77.35,76.72,77.49,78.70,77.78,79.58))
> library(zoo)
> data.z = zoo(data)
>
> test1 = rollapply(data.z, width=3,
>          FUN = function(z) coef(lm(z[,1]~z[,2],
>          data=as.data.frame(z))), by.column = FALSE, align = "right")
>
> print(test1)
>
> ##
>
> Rewriting this to call myfn1 gives test2 (and is consistent with test1
> above)
>
> myfn1 = function(mydata){
>      dd = as.data.frame(mydata)
>      l = lm(dd[,1]~dd[,2], data=dd)
>      c = coef(l)
>    }
>
> test2 = rollapply(data.z, width=3,
>     FUN= myfn1, by.column = FALSE, align = "right")
>
> print(test2)
>
> ##
>
> I would like to be able to use the predict function to obtain a prediction
> (and its std error) from the rolling regression I have just calculated.
>
> My effort below issues a warning that 'newdata' had 1 row but variable(s)
> found have 3 rows.
> (if I run this outside of rollapply I don't get this warning)
>
> Also, I don't see the predicted value or its se with print(fm2[[1]]). Again,
> if I run this outside of rollapply I am able to extract the predicted value.
>
>
> Xpred=c(70.67)
>
> myfn2 = function(mydata){
>      dd = as.data.frame(mydata)
>      l = lm(dd[,1]~dd[,2], data=dd)
>      c = coef(l)
>      p = predict(l, data.frame(Xvar=Xpred),se=T)
>      ret=c(l,c,p)
>    }
>
> fm2 = rollapply(data.z, width=3,
>     FUN= myfn2, by.column = FALSE, align = "right")
>
> print(fm2[[1]])
>
>
> Any insights would be gratefully received.
>
> Best regards
>
> Pete
> --
> View this message in context: 
> http://n4.nabble.com/Rollapply-tp1013345p1013345.html
> Sent from the R help mailing list archive at Nabble.com.
>


Re: [R] merging issue.........

2010-01-13 Thread Pete B

Try the merge function
?merge

in1 = "id trait1
1 10.2
2 11.1
3 9.7
6 10.2
7 8.9
10 9.7
11 10.2
"

in2 = "id trait2
1 9.8
2 10.8
4 7.8
5 9.8
6 10.1
12 10.2
13 10.1
"

data1 = read.table(textConnection(in1), header=TRUE)
data2 = read.table(textConnection(in2), header=TRUE)

mymerge = merge(data1,data2,all.x=TRUE)
print(mymerge)



karena wrote:
> 
> hi, I have a question about merging two files.
> For example, I have two files, the first file is like the following:
> 
> id   trait1
> 1    10.2
> 2    11.1
> 3    9.7
> 6    10.2
> 7    8.9
> 10   9.7
> 11   10.2
> 
> The second file is like the following:
> id   trait2
> 1    9.8
> 2    10.8
> 4    7.8
> 5    9.8
> 6    10.1
> 12   10.2
> 13   10.1
> 
> now I want to merge the two files by the variable "id", I only want to
> keep the "id"s which show up in the first file. Even the "id" does not
> show up in the second file, it doesn't matter, I can keep the missing
> values. So my question is: how can I merge the two files and keep only the
> rows whose "id" show up in the first file?
> I know how to do it is SAS, just use the following code: 
> merge data1(in=in1) data2(in=in2);
> by id;
> if in1;
> 
> but I really have no idea about how to do it in R.
> 
> thank you in advance,
> 
> karean 
> 

-- 
View this message in context: 
http://n4.nabble.com/merging-issue-tp1013356p1013375.html
Sent from the R help mailing list archive at Nabble.com.



[R] Operating on each row of data frame

2010-01-13 Thread Abhishek Pratap
Hi All

I have a data frame in which there are 4 columns .

Column 1 : name

Column 2-4 : values

I would like to calculate mean/Standard error  of values in column 2-4 and
store them in column 5,6 respectively.



I have done the following but doesn't seem to work

mean_N_SE <-function(x)
{

name <- x[1]
vals <- c(x[2:4])
temp_mean <- mean(vals)
SE <-  sqrt(var(x)/length(x))

}

apply(d,1,mean_N_SE) where d = data frame.


Can someone help me with this.

Thanks!
-Abhi



Re: [R] Ask for histogram

2010-01-13 Thread Yi Du
Thanks all, I fixed it.

On Wed, Jan 13, 2010 at 2:47 PM, Don MacQueen  wrote:

> If I do
>
>   b <- rnorm(4332)
>   hist(b,probability=T,breaks=30,col='lightblue',ylim=c(0,1))
>   rug(b)
>
> The plot looks entirely reasonable.
>
> As far as being different from SAS, perhaps SAS and R use different
> breakpoints, that is, different boundaries between the histogram bars.
>
> -Don
>
> At 11:58 AM -0600 1/13/10, Yi Du wrote:
>
>> Hi,
>>
>>
>> I use a vector of data to draw the histogram, but it is different from the
>> graph by SAS. Can you check it for me please?
>>
>> b is a column vector of 4332
>>
>> hist(b,probability=T,breaks=30,col='lightblue',ylim=c(0,1))
>> rug(b)
>>
>> When I used rug, I find the records are smaller than 4332. I don't know
>> where I did wrong.
>>
>> Thanks.
>>
>> --
>> Yi Du
>>
>
>
> --
> --
> Don MacQueen
> Environmental Protection Department
> Lawrence Livermore National Laboratory
> Livermore, CA, USA
> 925-423-1062
> --
>



-- 
Yi Du
Ph. D student in Economics
University of Missouri
Department of Economics
118 Professional Building
Columbia MO  65211
1-573-239-6467



Re: [R] Ask for histogram

2010-01-13 Thread Don MacQueen

If I do

   b <- rnorm(4332)
   hist(b,probability=T,breaks=30,col='lightblue',ylim=c(0,1))
   rug(b)

The plot looks entirely reasonable.

As far as being different from SAS, perhaps SAS and R use different 
breakpoints, that is, different boundaries between the histogram bars.
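
To check the breakpoint explanation, it may help to look at the boundaries hist() actually picks. A single number given as breaks is only a suggestion, and R rounds to "pretty" values. The sketch below uses simulated data and an arbitrary bin width of 0.25; both are assumptions for illustration, not values from the original analysis.

```r
set.seed(1)
b <- rnorm(4332)   # simulated stand-in for the real data

# A single number for 'breaks' is only a suggestion; inspect what hist() chose
h_auto <- hist(b, breaks = 30, plot = FALSE)
print(h_auto$breaks)

# To force specific boundaries, pass them as a vector covering range(b)
brk <- seq(floor(min(b)), ceiling(max(b)), by = 0.25)
h_fixed <- hist(b, breaks = brk, plot = FALSE)

# Every observation should land in exactly one bin
print(sum(h_fixed$counts) == length(b))
```

Passing the same boundary vector that the other program uses should make the two histograms comparable bar for bar.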


-Don

At 11:58 AM -0600 1/13/10, Yi Du wrote:

Hi,


I use a vector of data to draw the histogram, but it is different from the
graph by SAS. Can you check it for me please?

b is a column vector of 4332

hist(b,probability=T,breaks=30,col='lightblue',ylim=c(0,1))
rug(b)

When I used rug, I find the records are smaller than 4332. I don't know
where I did wrong.

Thanks.

--
Yi Du




--
--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062



[R] Rollapply

2010-01-13 Thread Pete B

Hi 

I would like to understand how to extend the function (FUN) I am using in
rollapply below.

##
With the following simplified data, test1 yields parameters for a rolling
regression

data = data.frame(Xvar=c(70.67,70.54,69.87,69.51,70.69,72.66,72.65,73.36),
   Yvar =c(78.01,77.07,77.35,76.72,77.49,78.70,77.78,79.58))
library(zoo)
data.z = zoo(data)

test1 = rollapply(data.z, width=3, 
  FUN = function(z) coef(lm(z[,1]~z[,2], 
  data=as.data.frame(z))), by.column = FALSE, align = "right")

print(test1)

##

Rewriting this to call myfn1 gives test2 (and is consistent with test1
above)

myfn1 = function(mydata){
  dd = as.data.frame(mydata) 
  l = lm(dd[,1]~dd[,2], data=dd)
  c = coef(l)
}

test2 = rollapply(data.z, width=3, 
 FUN= myfn1, by.column = FALSE, align = "right")

print(test2)

##

I would like to be able to use the predict function to obtain a prediction
(and its std error) from the rolling regression I have just calculated.

My effort below issues a warning that 'newdata' had 1 row but variable(s)
found have 3 rows.
(if I run this outside of rollapply I don't get this warning) 

Also, I don't see the predicted value or its se with print(fm2[[1]]). Again,
if I run this outside of rollapply I am able to extract the predicted value.


Xpred=c(70.67)

myfn2 = function(mydata){
  dd = as.data.frame(mydata) 
  l = lm(dd[,1]~dd[,2], data=dd)
  c = coef(l)
  p = predict(l, data.frame(Xvar=Xpred),se=T)
  ret=c(l,c,p)
}

fm2 = rollapply(data.z, width=3, 
 FUN= myfn2, by.column = FALSE, align = "right")

print(fm2[[1]])


Any insights would be gratefully received.

Best regards

Pete
-- 
View this message in context: 
http://n4.nabble.com/Rollapply-tp1013345p1013345.html
Sent from the R help mailing list archive at Nabble.com.



[R] Advantages of using SQLite for data import in comparison to csv files

2010-01-13 Thread Juliet Jacobson
Hello everybody out there using R,

I'm using R for the analysis of biological data and write the results
down using LaTeX, both on a notebook with linux installed.
I've already tried two options for the import of my data:
1. Import from a SQLite database
2. Import from individual csv files edited with sed, awk and sort.
Both methods actually work very well, since I don't need advanced
features like multi-user network access to the data.
My data sets are tables with up to 20 columns and 1000 rows, containing
mostly numerical values and strings. I might also have to handle
microarray data, but I'm not so sure about that yet. In addition, I need
to organise tags for a collection of photos, but this data is of course
not analysed with R.
I'm now beginning to work on a larger project and have to decide
whether it is better to use SQLite or csv files for handling my data.
I fear it might get difficult to switch between the two systems after
having accumulated the data, adapted software for backups and revision
control, written makefiles etc.
Could any of you give me a hint on the additional benefits of
importing data from a SQLite database into R compared with the simpler
way of organising the data in csv files? Is it, for example, possible to
select values from a column within a certain range from a csv file using awk?

Thanks in advance,
Juliet Jacobson
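
For tables of the size described (up to 20 columns and 1000 rows), a range selection can also be done in R itself after reading the csv file, without awk or a database. A small sketch follows; the file contents and the column names sample and weight are invented for illustration.

```r
# Write a small made-up table to a temporary csv file
csv_file <- tempfile(fileext = ".csv")
write.csv(data.frame(sample = c("s1", "s2", "s3", "s4"),
                     weight = c(4.2, 7.9, 5.5, 9.1)),
          csv_file, row.names = FALSE)

# Read it back and keep rows whose 'weight' lies in [5, 8]
dat <- read.csv(csv_file)
in_range <- subset(dat, weight >= 5 & weight <= 8)
print(in_range)
```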



Re: [R] convert factor data to numeric

2010-01-13 Thread Rolf Turner


On 14/01/2010, at 6:00 AM, Nathalie Yauschew-Raguenes wrote:


Hello,

I find a way to convert data in factor type to numeric :
data_numeric <- as.numeric(as.character(data_factor)).
It's treaky but works.


Possibly even more ``treaky'' but more efficient is:

data_numeric <- as.numeric(levels(data_factor))[data_factor]

as has been pointed out quite a few times on this list.
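
A small illustration of why the detour is needed at all, using a made-up factor. Calling as.numeric() directly on a factor returns the internal level codes, not the values; converting the levels once and indexing by the factor avoids that.

```r
data_factor <- factor(c("10", "2.5", "10", "7"))   # made-up example

wrong <- as.numeric(data_factor)                    # internal level codes, not the values
via_character <- as.numeric(as.character(data_factor))
via_levels <- as.numeric(levels(data_factor))[data_factor]

print(wrong)
print(via_levels)
```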

cheers,

Rolf Turner



[R] [R-pkgs] New sp release

2010-01-13 Thread Roger Bivand
The sp package provides class definitions for spatial data, and utilities 
for spatial data handling and manipulation.


The release of sp version 0.9-56 introduces changes in the ways in which 
Polygon, Polygons, and SpatialPolygons objects are created, moving from R 
code to compiled C code. Because of these changes, it is possible that 
users will see changed output. The package maintainers have tested as far 
as possible, and a beta release has been checked by some users, without 
any problems coming to light.


Further details are given in:

https://stat.ethz.ch/pipermail/r-sig-geo/2010-January/007377.html

Should anyone see problems following this change, please contact me 
directly with a reproducible example.


--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: roger.biv...@nhh.no

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages



[R] Formula for normal distribution with know mean and standard error and n terms

2010-01-13 Thread Steve_Friedman

Hello,

I am searching for a method to calculate a normal distribution.

For example this equation is used to calculate the normal curve when the
mean and standard deviation are known.
p(x) = 1/(σ*sqrt(2π)) * exp(-(x-μ)^2 / (2σ^2))


or
[Embedded image: Normal Probability Distribution Formula]


However, some of the literature I'm reading (I'm building an ecological
niche model for vegetation along several ecological gradients) reports the
standard error and the sample size n instead.  Is there an equivalent formula?
If so, how can I also normalize the p(x) term to be within the 0-1 range?
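
A minimal sketch of one way to approach this, assuming the reported standard error is the standard error of the mean, so that SD = SE * sqrt(n). The values of mu, se and n are made up. Dividing the density by its value at the mean rescales the curve so that its peak is 1; this is one possible normalisation to the 0-1 range, not the only one.

```r
# Hypothetical reported values: mean, standard error of the mean, sample size
mu <- 10
se <- 0.5
n  <- 25

sigma <- se * sqrt(n)    # standard deviation recovered from SE = SD / sqrt(n)

x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 201)
p <- dnorm(x, mean = mu, sd = sigma)              # normal density p(x)
p_scaled <- p / dnorm(mu, mean = mu, sd = sigma)  # rescaled so the peak equals 1
```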


Thank you all
Steve


Steve Friedman Ph. D.
Spatial Ecological Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147


Re: [R] Help, How can I boxplot mse and mtry using 20 5-fold cross-validation?

2010-01-13 Thread Max Kuhn
In caret, see ?trainControl. Use returnResamp = "all"

Max

On Wed, Jan 13, 2010 at 9:47 AM, bbslover  wrote:
>
>  Hello,
>   I am learning randomForest. I want to draw boxplots of MSE against mtry
> using 20 repeats of 5-fold cross-validation (using the median value), but I
> have not found a good way to do it.
>
> The randomForest package itself does not contain a cross-validation method.
> The caret package does, but how can I get all the values of mtry together
> with the corresponding MSE?
>
>
> --
> View this message in context: 
> http://n4.nabble.com/Help-How-can-I-boxplot-mse-and-mtry-using-20-5-fold-cross-validation-tp1013058p1013058.html
> Sent from the R help mailing list archive at Nabble.com.
>
>



-- 

Max



Re: [R] Method for reduction of independent variables

2010-01-13 Thread Daniel Malter
Hi, please read the posting guide. You are not likely to get an extensive
answer to your question from this list. Your question is a "please
solve/explain my statistical problem for me" question. There are two things
problematic with that. First, "statistical", and second "please solve for
me."

First, the R-help list is mostly concerned with problems in implementing
analyses in R, not with the (choice of the) statistical approach per se
(there are few exceptions). Second, "please solve for me" questions are
generally frowned upon, unless you evidence a specific point at which you
are stuck and have to make a choice. That is, the list members want to see
that you have done your "homework" to the extent one can expect you to. To
ask the list to provide an introduction to data reduction methods without
having any background knowledge is, frankly, a waste of your and the list
members' time. There are books on the topic, which you can buy or borrow, and
certainly many online sources to give you a basic background. Or you can
start here: http://en.wikipedia.org/wiki/Dimension_reduction. If you want
your statistical questions answered and problems solved without reading
yourself into the matter, your question is more suitable for a local
statistician at your institution or a paid service rather than this list.

Best,
Daniel 

-
cuncta stricte discussurus
-
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of rubystallion
Sent: Wednesday, January 13, 2010 11:57 AM
To: r-help@r-project.org
Subject: [R] Method for reduction of independent variables


Hello

I am currently investigating software code metrics for a variety of software
projects of a company to determine the worst parts of software products
according to specified quality characteristics. 
As the gathering of metrics correlates with effort, I would like to find a
subset of the metrics preserving significant predictive power for the
"problem value" while using the least amount of code metrics. 

I have the results of 25 metrics for 6 software projects for a combined 9355
"individuals", i.e. software parts with metrics.
However, as many metrics only measure metric values above a predefined
limit, 58% of the responses for independent variables are 0.

Which method can I use to determine a reduced set of independent variables
with significant predictive power?
As I do not have a statistics background, I would also appreciate a simple
explanation of the chosen method and sensible choices for parameters, so
that I will be able to infer the reduced set of software metrics to keep.

Thank you in advance!

Johannes
-- 
View this message in context:
http://n4.nabble.com/Method-for-reduction-of-independent-variables-tp1013171
p1013171.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Simulation numbers from a probability table

2010-01-13 Thread Peter Ehlers

Try this:

dat <- data.frame(x=11:14, pa=1:4/10, pb=4:1/10)
f <- function(numreps, data){
  pmat <- as.matrix(data[-1])
  x <- data[,1]
  result <- matrix(0, nrow=numreps, ncol=ncol(pmat))
  colnames(result) <- c("A", "B")
  for(i in seq_len(numreps)){
    result[i,] <- apply(pmat, 2, function(p) sample(x, 1, prob=p))
  }
  result
}
f(5, dat)
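
As a possible alternative to the explicit loop, the same kind of draw can be sketched with sapply() over the probability columns. This is a variation added for illustration, not part of the original reply; it reuses the example data frame from above and the object name sim is an assumption.

```r
# Same example table as above: values in x, one probability column per variable
dat <- data.frame(x = 11:14, pa = 1:4/10, pb = 4:1/10)

# For each probability column, draw 5 values of x with those probabilities
sim <- sapply(dat[-1], function(p) sample(dat$x, size = 5, replace = TRUE, prob = p))
```

The result is a 5 x 2 matrix with one column per probability column of dat.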

 -Peter Ehlers

Kelvin wrote:

Dear friends,

If I have a table like this, first row A B C D ... are different
levels of the variable, first column 0 1 2 4 ... are the levels of the
"numbers", the numbers inside the table are the probabilities of the
"number" occurring.

     A     B     C     D   ...
0    0.2   0.3   0.1   0.05
1    0.1   0.1   0.2   0.2
2    0.02  0.2   0     0.1
4    0.3   0.01  0.01  0.4
...

How can I use R to do the simulation and get a table like this, first
row A B C D ... are different levels of the variable, the numbers
inside the table are the "numbers" simulated from the probabilities
table above?

A  B  C  D ...
0  4   2   0
2   2  0   1
0   1  4   1
2   2  0   0
...


Thanks for help!


Kelvin





--
Peter Ehlers
University of Calgary
403.202.3921



Re: [R] optimization challenge

2010-01-13 Thread Albyn Jones
The key idea is that you are building a matrix that contains the
solutions to smaller problems which are sub-problems of the big
problem.  The first row of the matrix SSQ contains the solution for no
splits, ie SSQ[1,j] is just the sum of squares about the overall mean
for reading chapters 1 through j in one day.  The iteration then uses
row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j
chapters in m-1 days) is part of the overall optimal solution, you
have already computed it, and so don't ever need to recompute it.

   TS = SSQ[m-1,j]+(SSQ1[j+1])

computes the vector of possible solutions for SSQ[m,n] (n chapters in m days),
breaking it into two pieces: chapters 1 to j in m-1 days, and chapters j+1 to
n in 1 day.  j is a vector in the function, and min(TS) is the minimum
over choices of j, ie SSQ[m,n].

At the end, SSQ[128,239] is the optimal value for reading all 239
chapters in 128 days.  That's just the objective function, so the rest
involves constructing the list of optimal cuts, ie which chapters are
grouped together for each day's reading.  That code uses the same
idea... constructing a list of lists of cutpoints.
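
The recursion described above can be sketched compactly as follows. This is an illustrative rewrite, not the code posted to the list: the function name optimal_schedule, the helper cost and the prev matrix used to recover the cut points are assumptions introduced here, and the five chapter lengths in the usage line are made up.

```r
# Split chapters (kept in order) into `days` contiguous groups, minimising the
# sum of squared deviations of each day's total from the mean daily load.
optimal_schedule <- function(X, days) {
  N  <- length(X)
  M  <- sum(X) / days
  CS <- c(0, cumsum(X))                              # CS[j + 1] = total of chapters 1..j
  cost <- function(i, j) (CS[j + 1] - CS[i] - M)^2   # chapters i..j read in one day

  SSQ  <- matrix(Inf, nrow = days, ncol = N)         # SSQ[m, n]: best value for n chapters in m days
  prev <- matrix(NA_integer_, nrow = days, ncol = N) # last chapter of day m - 1 in that optimum
  SSQ[1, ] <- cost(1, 1:N)

  for (m in seq_len(days)[-1]) {
    for (n in m:N) {
      j  <- (m - 1):(n - 1)                          # candidate last chapters for day m - 1
      TS <- SSQ[m - 1, j] + cost(j + 1, n)
      best <- which.min(TS)
      SSQ[m, n]  <- TS[best]
      prev[m, n] <- j[best]
    }
  }

  # Walk back through prev to recover the last chapter read on each day
  cuts <- integer(days)
  n <- N
  for (m in days:1) {
    cuts[m] <- n
    n <- prev[m, n]
  }
  list(SSQ = SSQ[days, N], Cuts = cuts)
}

res <- optimal_schedule(c(1, 1, 2, 4, 2), days = 3)  # res$Cuts is 3 4 5
```

Each cell of SSQ is computed once from the row above it, which is what makes the approach fast compared with searching over all partitions.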

statisticians should study a bit of data structures and algorithms!

albyn

On Wed, Jan 13, 2010 at 10:45:11AM -0700, Greg Snow wrote:
> WOW, your results give about half the variance of my best optim run (possibly 
> due to my suboptimal use of optim).
> 
> Can you describe a little what the algorithm is doing?
> 
> -- 
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
> 
> 
> > -Original Message-
> > From: Albyn Jones [mailto:jo...@reed.edu]
> > Sent: Tuesday, January 12, 2010 5:31 PM
> > To: Greg Snow
> > Cc: r-help@r-project.org
> > Subject: Re: [R] optimization challenge
> > 
> > Greg
> > 
> > Nice problem: I wasted my whole day on it :-)
> > 
> > I was explaining my plan for a solution to a colleague who is a
> > computer scientist, he pointed out that I was trying to re-invent the
> > wheel known as dynamic programming.  here is my code, apparently it is
> > called "bottom up dynamic programming".  It runs pretty quickly, and
> > returns (what I hope is :-) the optimal sum of squares and the
> > cut-points.
> > 
> > function(X=bom3$Verses,days=128){
> > # find optimal BOM reading schedule for Greg Snow
> > # minimize variance of quantity to read per day over 128 days
> > #
> > N = length(X)
> > Nm1 = N-1
> > SSQ<- matrix(NA,nrow=days,ncol=N)
> > Cuts <- list()
> > #
> > #  SSQ[i,j]: the ssqs about the overall mean for the optimal partition
> > #   for i days on the chapters 1 to j
> > #
> > M = sum(X)/days
> > CS = cumsum(X)
> > SSQ[1,]= (CS-M)^2
> > Cuts[[1]]= as.list(1:N)
> > #
> > for(m in 2:days){
> > Cuts[[m]]=list()
> > #for(i in 1:(m-1)) Cuts[[m]][[i]] = Cuts[[m-1]][[i]]
> > for(n in m:N){
> >   CS = cumsum(X[n:1])[n:1]
> >   SSQ1 = (CS-M)^2
> >   j = (m-1):(n-1)
> >   TS = SSQ[m-1,j]+(SSQ1[j+1])
> >   SSQ[m,n] = min(TS)
> >   k = min(which((min(TS)== TS)))+m-1
> >   Cuts[[m]][[n]] = c(Cuts[[m-1]][[k-1]],n)
> > }
> > }
> > list(SSQ=SSQ[days,N],Cuts=Cuts[[days]][[N]])
> > }
> > 
> > $SSQ
> > [1] 11241.05
> > 
> > $Cuts
> >   [1]   2   4   7   9  11  13  15  16  17  19  21  23  25  27  30  31
> > 34  37
> >  [19]  39  41  44  46  48  50  53  56  59  60  62  64  66  68  70  73
> > 75  77
> >  [37]  78  80  82  84  86  88  89  91  92  94  95  96  97  99 100 103
> > 105 106
> >  [55] 108 110 112 113 115 117 119 121 124 125 126 127 129 131 132 135
> > 137 138
> >  [73] 140 141 142 144 145 146 148 150 151 152 154 156 157 160 162 163
> > 164 166
> >  [91] 167 169 171 173 175 177 179 181 183 185 186 188 190 192 193 194
> > 196 199
> > [109] 201 204 205 207 209 211 213 214 215 217 220 222 223 225 226 228
> > 234 236
> > [127] 238 239
> > 
> > 
> > 
> > 
> > On Tue, Jan 12, 2010 at 11:33:36AM -0700, Greg Snow wrote:
> > > I have a challenge that I want to share with the group.
> > >
> > > This is not homework (but I may assign it as such if I teach the
> > appropriate class again) and I have found one solution, so don't need
> > anything urgent.  This is more for fun to see if others can find a
> > better solution than I did.
> > >
> > > The challenge:
> > >
> > > I want to read a book in a given number of days.  I want to read an
> > integer number of chapters each day (there are more chapters than
> > days), no stopping part way through a chapter, and at least 1 chapter
> > each day.  The chapters are very non uniform in length (some very
> > short, a few very long, many in between) so I would like to come up
> > with a reading schedule that minimizes the variance of the length of
> > the days readings (read multiple short chapters on the same day, long
> > chapters are the only one read that day).  I also want to read through
> > the book in order (no skippin

Re: [R] Ask for histogram

2010-01-13 Thread Steve Lianoglou
Hi,

On Wed, Jan 13, 2010 at 12:58 PM, Yi Du  wrote:
> Hi,
>
>
> I use a vector of data to draw the histogram, but it is different from the
> graph by SAS. Can you check it for me please?

How are we supposed to check something without data, pictures, etc?
What do you want checking, exactly?

> b is a column vector of 4332
>
> hist(b,probability=T,breaks=30,col='lightblue',ylim=c(0,1))
> rug(b)
>
> When I used rug, I find the records are smaller than 4332. I don't know
> where I did wrong.

What do you mean? Is the histogram that you're getting surprising? Is
the result of adding a "rug" surprising?

Are you actually trying to count 4332 tick marks at the bottom of your
plot? What records are smaller than 4332?

Try to see what rug returns, e.g.:

r <- rug(b)

length(r) should be as long as your `b` vector
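
One possible explanation, offered only as an assumption since the data were not posted: if `b` contains tied values, their rug ticks are drawn on top of each other, so fewer ticks are visible than there are observations. The vector below is simulated to show the effect; it is not the poster's data.

```r
set.seed(1)
b <- round(rnorm(4332), 1)         # hypothetical data with many tied values

n_values   <- length(b)            # number of observations
n_distinct <- length(unique(b))    # upper bound on distinguishable rug ticks
```

If ties are the cause, rug(jitter(b)) is a common way to spread the ticks apart.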

I'm not sure what you're asking, but hopefully some of the info I
threw at you is helpful. Please be a bit more specific with any follow
up if you still find anything confusing.

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] convert factor data to numeric

2010-01-13 Thread S Devriese
On 01/13/2010 05:41 PM, Peter Ehlers wrote:
> S Devriese wrote:
>> On 01/13/2010 10:47 AM, Ahmet Temiz wrote:
>>> hello
>>>
>>>  could you give me a hint to convert data in factor type to numeric
>>> (float) ?
>>>
>>>   regards
>>>
>>> -- 
>>> Open WebMail Project (http://openwebmail.org)
>>>
>>>
>> you could try as.numeric but without more details it is difficult to see
>> if this will work. How did you end up with a factor (e.g. through
>> import)?
>>
> No, don't use as.numeric(). Do follow Dimitris' advice.
> But the question of how you got the factor data is good; you
> can usually avoid getting factors to begin with.
> 
>  -Peter Ehlers
> 
>> Stephan
>>
>>
>>
> 

I know, slightly sloppy answer (see Dimitris' answer), but I hoped to
find out how he got the factor in the first place, because if it is an
import issue (e.g. the decimal character in the file differs from the locale
decimal character), the FAQ answer might not work as expected.
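
To make the import scenario concrete, here is a small sketch with made-up data in which the decimal character is a comma. Read with the default settings and stringsAsFactors = TRUE, the column arrives as a factor, and converting it through as.character() yields NA because "3,14" is not a valid number. Passing dec = "," at import avoids the factor altogether. The object names txt, wrong and right are assumptions for the example.

```r
# Hypothetical semicolon-separated data using a decimal comma
txt <- "id;value\n1;3,14\n2;2,72"

# Default decimal point: the column cannot be parsed as numeric and becomes a factor
wrong <- read.table(text = txt, header = TRUE, sep = ";", stringsAsFactors = TRUE)
converted <- suppressWarnings(as.numeric(as.character(wrong$value)))  # NA for both rows

# Telling read.table about the decimal comma gives a numeric column directly
right <- read.table(text = txt, header = TRUE, sep = ";", dec = ",")
```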

Stephan



[R] Ask for histogram

2010-01-13 Thread Yi Du
Hi,


I use a vector of data to draw the histogram, but it is different from the
graph by SAS. Can you check it for me please?

b is a column vector of 4332

hist(b,probability=T,breaks=30,col='lightblue',ylim=c(0,1))
rug(b)

When I used rug, I find the records are smaller than 4332. I don't know
where I did wrong.

Thanks.

-- 
Yi Du



Re: [R] Simulation numbers from a probability table

2010-01-13 Thread Tal Galili
If the trials are not connected then I would consider melting the table
using melt() from the reshape package,
and then using lapply() with the function

random.function <- function(my.prob, number.of.observations = 10)
{
  sum(rbinom(number.of.observations, 1, my.prob))
}


In case the trials are connected, by column,
then you could use
apply(the.data.table, 2, a.function)
on it, where "a.function" would draw from the multinomial distribution (for
which I don't remember the function at the moment, but it can be searched).
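
For the connected case, base R's rmultinom() draws multinomial counts, so the column-wise idea can be sketched as below. The probability matrix probs and the choice of 10 trials per column are made up for the example.

```r
# Hypothetical probability table: each column sums to 1
probs <- cbind(A = c(0.2, 0.1, 0.3, 0.4),
               B = c(0.4, 0.3, 0.2, 0.1))

# For each column, distribute 10 trials over the rows according to its probabilities
counts <- apply(probs, 2, function(p) rmultinom(1, size = 10, prob = p))
```

Each column of counts holds how often each row's outcome occurred in the 10 trials.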


Best,
Tal.




Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com/ (English)
--




On Wed, Jan 13, 2010 at 7:20 PM, Kelvin <6kelv...@gmail.com> wrote:

> Dear friends,
>
> If I have a table like this, first row A B C D ... are different
> levels of the variable, first column 0 1 2 4 ... are the levels of the
> "numbers", the numbers inside the table are the probabilities of the
> "number" occurring.
>
>      A     B     C     D   ...
> 0    0.2   0.3   0.1   0.05
> 1    0.1   0.1   0.2   0.2
> 2    0.02  0.2   0     0.1
> 4    0.3   0.01  0.01  0.4
> ...
>
> How can I use R to do the simulation and get a table like this, first
> row A B C D ... are different levels of the variable, the numbers
> inside the table are the "numbers" simulated from the probabilities
> table above?
>
>A  B  C  D ...
>0  4   2   0
>2   2  0   1
>0   1  4   1
>2   2  0   0
>...
>
>
>Thanks for help!
>
>
>Kelvin
>
>



Re: [R] Applying function to parts of a matrix based on a factor

2010-01-13 Thread Heinz Tuechler

If your matrix were a data.frame, it could work like this:

df <- data.frame(age=1:100, sex=rep(1:2, 50))
with(df, by(age, sex, mean))

without the "lapply, sapply etc. family".

h

At 18:16 13.01.2010, Doran, Harold wrote:

with(yourdataframe, tapply(age,sex,mean))

-Original Message-
From: r-help-boun...@r-project.org 
[mailto:r-help-boun...@r-project.org] On Behalf Of John Sorkin

Sent: Wednesday, January 13, 2010 12:11 PM
To: r-help@r-project.org
Subject: [R] Applying function to parts of a matrix based on a factor

R 2.9
Windows XP

I have a matrix, Data, which contains a factor Sex and a continuous 
variable Age.

I want to get mean age by sex. I know I can do this with two statements,
mean(Data[Data[,"Sex"]=="Male","Age"]) and
mean(Data[Data[,"Sex"]=="Female","Age"])

I know this can be done in a single command, but I can't remember how.
There is a function that allows another function to work within
factors, something like
magicfunction(Data,Factor=Sex). n.b. I know the function I am 
looking for is not in the lapply, sapply etc. family


Please put me out of my misery (and senior moment) and remind me 
what function I should be using.





John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)



Re: [R] optimization challenge

2010-01-13 Thread Greg Snow
WOW, your results give about half the variance of my best optim run (possibly 
due to my suboptimal use of optim).

Can you describe a little what the algorithm is doing?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: Albyn Jones [mailto:jo...@reed.edu]
> Sent: Tuesday, January 12, 2010 5:31 PM
> To: Greg Snow
> Cc: r-help@r-project.org
> Subject: Re: [R] optimization challenge
> 
> Greg
> 
> Nice problem: I wasted my whole day on it :-)
> 
> I was explaining my plan for a solution to a colleague who is a
> computer scientist, he pointed out that I was trying to re-invent the
> wheel known as dynamic programming.  here is my code, apparently it is
> called "bottom up dynamic programming".  It runs pretty quickly, and
> returns (what I hope is :-) the optimal sum of squares and the
> cut-points.
> 
> function(X=bom3$Verses,days=128){
> # find optimal BOM reading schedule for Greg Snow
> # minimize variance of quantity to read per day over 128 days
> #
> N = length(X)
> Nm1 = N-1
> SSQ<- matrix(NA,nrow=days,ncol=N)
> Cuts <- list()
> #
> #  SSQ[i,j]: the ssqs about the overall mean for the optimal partition
> #   for i days on the chapters 1 to j
> #
> M = sum(X)/days
> CS = cumsum(X)
> SSQ[1,]= (CS-M)^2
> Cuts[[1]]= as.list(1:N)
> #
> for(m in 2:days){
> Cuts[[m]]=list()
> #for(i in 1:(m-1)) Cuts[[m]][[i]] = Cuts[[m-1]][[i]]
> for(n in m:N){
> CS = cumsum(X[n:1])[n:1]
> SSQ1 = (CS-M)^2
> j = (m-1):(n-1)
> TS = SSQ[m-1,j]+(SSQ1[j+1])
> SSQ[m,n] = min(TS)
>   k = min(which((min(TS)== TS)))+m-1
>   Cuts[[m]][[n]] = c(Cuts[[m-1]][[k-1]],n)
> }
> }
> list(SSQ=SSQ[days,N],Cuts=Cuts[[days]][[N]])
> }
> 
> $SSQ
> [1] 11241.05
> 
> $Cuts
>   [1]   2   4   7   9  11  13  15  16  17  19  21  23  25  27  30  31
> 34  37
>  [19]  39  41  44  46  48  50  53  56  59  60  62  64  66  68  70  73
> 75  77
>  [37]  78  80  82  84  86  88  89  91  92  94  95  96  97  99 100 103
> 105 106
>  [55] 108 110 112 113 115 117 119 121 124 125 126 127 129 131 132 135
> 137 138
>  [73] 140 141 142 144 145 146 148 150 151 152 154 156 157 160 162 163
> 164 166
>  [91] 167 169 171 173 175 177 179 181 183 185 186 188 190 192 193 194
> 196 199
> [109] 201 204 205 207 209 211 213 214 215 217 220 222 223 225 226 228
> 234 236
> [127] 238 239
> 
> 
> 
> 
> On Tue, Jan 12, 2010 at 11:33:36AM -0700, Greg Snow wrote:
> > I have a challenge that I want to share with the group.
> >
> > This is not homework (but I may assign it as such if I teach the
> appropriate class again) and I have found one solution, so don't need
> anything urgent.  This is more for fun to see if others can find a
> better solution than I did.
> >
> > The challenge:
> >
> > I want to read a book in a given number of days.  I want to read an
> integer number of chapters each day (there are more chapters than
> days), no stopping part way through a chapter, and at least 1 chapter
> each day.  The chapters are very non uniform in length (some very
> short, a few very long, many in between) so I would like to come up
> with a reading schedule that minimizes the variance of the length of
> the days readings (read multiple short chapters on the same day, long
> chapters are the only one read that day).  I also want to read through
> the book in order (no skipping ahead to combine short chapters that are
> not naturally next to each other.
> >
> > My thought was that the optim function with method="SANN" would be an
> appropriate approach, but my first couple of tries did not give very
> good results.  I have since come up with an optim with SANN solution
> that gives what I consider good results (but I accept that better is
> possible).
> >
> > Below is a data frame with the lengths of the chapters for the book
> that originally sparked the challenge for me (but the general idea
> should work for any book).  Each row represents a chapter (in order)
> with 3 different measures of the length of the chapter.
> >
> > For this challenge I want to read the book in 128 days (there are 239
> chapters).
> >
> > I will post my solutions in a few days, but I want to wait so that my
> direction does not influence people from trying other approaches (if
> there is something better than optim, that is fine).
> >
> > Good luck for anyone interested in the challenge,
> >
> > The data frame:
> >
> > bom3 <- structure(list(Chapter = structure(1:239, .Label = c("1 Nephi
> 1",
> > "1 Nephi 2", "1 Nephi 3", "1 Nephi 4", "1 Nephi 5", "1 Nephi 6",
> > "1 Nephi 7", "1 Nephi 8", "1 Nephi 9", "1 Nephi 10", "1 Nephi 11",
> > "1 Nephi 12", "1 Nephi 13", "1 Nephi 14", "1 Nephi 15", "1 Nephi 16",
> > "1 Nephi 17", "1 Nephi 18", "1 Nephi 19", "1 Nephi 20", "1 Nephi 21",
> > "1 Nephi 22", "2 Nephi 1", "2 Nephi 2", "2 Nephi 3", "2 Nephi 4",
> > "2 Nephi 5", "2 Nephi 6", "2 Nephi 7",

[R] Simulation numbers from a probability table

2010-01-13 Thread Kelvin
Dear friends,

If I have a table like this, first row A B C D ... are different
levels of the variable, first column 0 1 2 4 ... are the levels of the
"numbers", the numbers inside the table are the probabilities of the
"number" occurring.

     A     B     C     D   ...
0    0.2   0.3   0.1   0.05
1    0.1   0.1   0.2   0.2
2    0.02  0.2   0     0.1
4    0.3   0.01  0.01  0.4
...

How can I use R to do the simulation and get a table like this, first
row A B C D ... are different levels of the variable, the numbers
inside the table are the "numbers" simulated from the probabilities
table above?

A  B  C  D ...
0  4   2   0
2   2  0   1
0   1  4   1
2   2  0   0
...


Thanks for help!


Kelvin



Re: [R] How can I store the results

2010-01-13 Thread Gavin Simpson
On Wed, 2010-01-13 at 15:59 +0100, Alex Roy wrote:
> Dear R users,
> I am running a R code which gives me 10 columns and
> 160 rows. I need to run the code for 100 times and each time I need to store
> the results in a single file.
> I do not know how can I store them in a single file without over writting
> the results?

In a list?

results <- vector(mode = "list", length = 100)
for(i in seq_along(results)) {
    ## do something
    ##
    ## store result for iteration i
    results[[i]] <- something
}

results will now contain 100 matrices of dim 160x10.
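
To get from the list to a single file, one option is to stack the matrices and write them once. The sketch below is an assumption about what is wanted: the random matrix stands in for the real computation, the run column and the object names combined and out_file are introduced here, and a temporary file is used in place of a real path.

```r
results <- vector(mode = "list", length = 100)
for (i in seq_along(results)) {
  # Stand-in for the real computation: a 160 x 10 matrix per run
  results[[i]] <- matrix(rnorm(160 * 10), nrow = 160, ncol = 10)
}

# Stack all runs and record which run each row came from
combined <- cbind(run = rep(seq_along(results), each = 160),
                  do.call(rbind, results))

out_file <- tempfile(fileext = ".csv")   # replace with a real file name
write.csv(combined, out_file, row.names = FALSE)
```

Writing once at the end avoids overwriting earlier runs; the run column keeps them distinguishable in the file.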

HTH

G

> 
> Thanks
> 
> Alex
> 
>   [[alternative HTML version deleted]]
> 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] Applying function to parts of a matrix based on a factor

2010-01-13 Thread Dimitris Rizopoulos

try this:

with(Data, tapply(Age, Sex, mean))
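
A self-contained version of the call above, assuming Data is a data frame rather than a matrix (a matrix cannot hold a factor next to a numeric column). The four rows are made up; aggregate() is shown as an alternative that returns a data frame.

```r
# Hypothetical data frame with a factor and a continuous variable
Data <- data.frame(Sex = factor(c("Male", "Female", "Male", "Female")),
                   Age = c(30, 40, 50, 60))

means <- with(Data, tapply(Age, Sex, mean))             # named vector, one mean per level
agg   <- aggregate(Age ~ Sex, data = Data, FUN = mean)  # same result as a data frame
```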


I hope it helps.

Best,
Dimitris


John Sorkin wrote:

R 2.9
Windows XP

I have a matrix, Data, which contains a factor Sex and a continuous variable 
Age.
I want to get mean age by sex. I know I can do this with two statements,
mean(Data[Data[,"Sex"]=="Male","Age"]) and
mean(Data[Data[,"Sex"]=="Female","Age"])


I know this can be done in a single command, but I can't remember how. There is a
function that allows another function to work within factors, something like
magicfunction(Data,Factor=Sex). n.b. I know the function I am looking for is 
not in the lapply, sapply etc. family

Please put me out of my misery (and senior moment) and remind me what function I should be using. 





John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)




--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014



Re: [R] Applying function to parts of a matrix based on a factor

2010-01-13 Thread Doran, Harold
with(yourdataframe, tapply(age,sex,mean))

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of John Sorkin
Sent: Wednesday, January 13, 2010 12:11 PM
To: r-help@r-project.org
Subject: [R] Applying function to parts of a matrix based on a factor

R 2.9
Windows XP

I have a matrix, Data, which contains a factor Sex and a continuous variable 
Age.
I want to get mean age by sex. I know I can do this with two statements,
mean(Data[Data[,"Sex"]=="Male","Age"]) and
mean(Data[Data[,"Sex"]=="Female","Age"])

I know this can be done in a single command, but I can't remember how. There is a
function that allows another function to work within factors, something like
magicfunction(Data,Factor=Sex). n.b. I know the function I am looking for is 
not in the lapply, sapply etc. family

Please put me out of my misery (and senior moment) and remind me what function 
I should be using. 




John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)



[R] Applying function to parts of a matrix based on a factor

2010-01-13 Thread John Sorkin
R 2.9
Windows XP

I have a matrix, Data, which contains a factor Sex and a continuous variable 
Age.
I want to get mean age by sex. I know I can do this with two statements,
mean(Data[Data[,"Sex"]=="Male","Age"]) and
mean(Data[Data[,"Sex"]=="Female","Age"])

I know this can be done in a single command, but I can't remember how. There is a
function that allows another function to work within factors, something like
magicfunction(Data,Factor=Sex). n.b. I know the function I am looking for is 
not in the lapply, sapply etc. family

Please put me out of my misery (and senior moment) and remind me what function 
I should be using. 




John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)



Re: [R] convert factor data to numeric

2010-01-13 Thread Nathalie Yauschew-Raguenes

Hello,

I found a way to convert factor data to numeric:
data_numeric <- as.numeric(as.character(data_factor)).
It's tricky, but it works.
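
A short sketch of why the detour through as.character() matters, using a made-up factor. as.numeric() applied directly to a factor returns the internal level codes rather than the values. The last line uses the idiom given in the R help page for factor, which should produce the same result.

```r
# Hypothetical factor holding numbers stored as labels
data_factor <- factor(c("3.5", "10", "0.25", "10"))

# Direct conversion gives the internal level codes, not the values
codes <- as.numeric(data_factor)

# Going through the labels recovers the values
data_numeric <- as.numeric(as.character(data_factor))

# Equivalent idiom from ?factor: convert the levels once, then index
alt <- as.numeric(levels(data_factor))[data_factor]

print(codes)
print(data_numeric)
```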


Peter Ehlers a écrit :

S Devriese wrote:

On 01/13/2010 10:47 AM, Ahmet Temiz wrote:

hello

 could you give me a hint to convert data in factor type to numeric 
(float) ?


  regards

--
Open WebMail Project (http://openwebmail.org)



you could try as.numeric but without more details it is difficult to see
if this will work. How did you end up with a factor (e.g. through 
import)?



No, don't use as.numeric(). Do follow Dimitris' advice.
But the question of how you got the factor data is good; you
can usually avoid getting factors to begin with.

 -Peter Ehlers


Stephan








--
Nathalie YAUSCHEW-RAGUENES
Ph.D Student

Unité de Recherches Ecologie Fonctionnelle et Physique de l'Environnement 
(EPHYSE)



[R] Method for reduction of independent variables

2010-01-13 Thread rubystallion

Hello

I am currently investigating software code metrics for a variety of software
projects of a company to determine the worst parts of software products
according to specified quality characteristics. 
As the gathering of metrics correlates with effort, I would like to find a
subset of the metrics preserving significant predictive power for the
"problem value" while using the least amount of code metrics. 

I have the results of 25 metrics for 6 software projects for a combined 9355
"individuals", i.e. software parts with metrics.
However, as many metrics only measure metric values above a predefined
limit, 58% of the responses for independent variables are 0.

Which method can I use to determine a reduced set of independent variables
with significant predictive power?
As I do not have a statistics background, I would also appreciate a simple
explanation of the chosen method and sensible choices for parameters, so
that I will be able to infer the reduced set of software metrics to keep.

Thank you in advance!

Johannes
-- 
View this message in context: 
http://n4.nabble.com/Method-for-reduction-of-independent-variables-tp1013171p1013171.html
Sent from the R help mailing list archive at Nabble.com.



[R] Help, How can I boxplot mse and mtry using 20 5-fold cross-validation?

2010-01-13 Thread bbslover

 Hello,
   I am learning randomForest, and I want to draw boxplots of MSE against mtry
using 20 repetitions of 5-fold cross-validation (using the median value), but I
have not found a good way to do it.

The randomForest package itself does not contain a cross-validation method. The
caret package does, but how can I get all the values of mtry together with the
corresponding MSE?


-- 
View this message in context: 
http://n4.nabble.com/Help-How-can-I-boxplot-mse-and-mtry-using-20-5-fold-cross-validation-tp1013058p1013058.html
Sent from the R help mailing list archive at Nabble.com.



[R] exporting data frame - write foreign inconsistencies

2010-01-13 Thread John Cullen
Hello List,

I have a data frame object (wa2) that I am exporting for use in
another statistics package. Using

library(foreign)
write.foreign(wa2, choose.files(), choose.files(), package='SPSS')

I noticed that there were several differences between the data sets as
seen within R (View(wa2)) and what was produced in SPSS.  Examining
the data file produced by write.foreign (before running the generated
SPSS syntax), I noticed the same inconsistencies.

I then used:

write.table(wa2, choose.files(), sep=",", col.names=TRUE,
row.names=FALSE, quote=TRUE, na="NA")

and the file generated using this method matched what was in the R object.
I'm trying to send this dataset to a colleague who will only use SPSS.
Any ideas why the two methods produce different data files?

--
sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252
[3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] Rcmdr_1.5-4    car_1.2-16     relimp_1.0-1   foreign_0.8-39

loaded via a namespace (and not attached):
[1] tools_2.10.1
--

Thanks in advance.

Sincerely;

John Cullen, M.Sc.
caninesinmotion.com



Re: [R] convert factor data to numeric

2010-01-13 Thread Peter Ehlers

S Devriese wrote:

On 01/13/2010 10:47 AM, Ahmet Temiz wrote:

hello

 could you give me a hint to convert data in factor type to numeric (float) ?

  regards

--
Open WebMail Project (http://openwebmail.org)



you could try as.numeric but without more details it is difficult to see
if this will work. How did you end up with a factor (e.g. through import)?


No, don't use as.numeric(). Do follow Dimitris' advice.
But the question of how you got the factor data is good; you
can usually avoid getting factors to begin with.

 -Peter Ehlers


Stephan





--
Peter Ehlers
University of Calgary
403.202.3921



[R] R package dependencies

2010-01-13 Thread Colin Millar
Hi there,
 
My question relates to getting information about R packages.  In particular, I 
would like to be able to find from within R:
  what are a package's dependencies
  what are a package's reverse dependencies
  does a package contain a dll
 
The reason I ask is:
 
The organisation that I work for is introducing a secure intranet operating on 
Windows PCs and laptops, and this requires that all software / executables / 
dlls are validated before they are combined to produce a generic PC build.
 
I would like to maximise the packages available to our staff, so for the 
packages that we have listed as business needs, I would like to include all 
reverse dependencies of this collection that do not have dlls.
 
I hope this makes sense (the question not the reason).
 
Kind regards,
Colin.



Re: [R] How can I store the results

2010-01-13 Thread jim holtman
Collect the results in a list (one entry for each matrix) and then 'save'
the list.  When you 'load' it back in, you can easily reference each element
for further processing.
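
A minimal sketch of this suggestion. The random matrices stand in for the real 160 x 10 results, and the file name in tempdir() is a hypothetical choice.

```r
# Collect one 160 x 10 result per run in a list
results <- vector("list", 100)
for (i in seq_along(results)) {
  # Stand-in for the real computation
  results[[i]] <- matrix(rnorm(160 * 10), nrow = 160, ncol = 10)
}

# Save the whole list to a single file (hypothetical file name)
out_file <- file.path(tempdir(), "results.RData")
save(results, file = out_file)

# Later: load it back and pick any run for further processing
rm(results)
load(out_file)
print(dim(results[[37]]))
```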

On Wed, Jan 13, 2010 at 9:59 AM, Alex Roy  wrote:

> Dear R users,
>I am running a R code which gives me 10 columns and
> 160 rows. I need to run the code for 100 times and each time I need to
> store
> the results in a single file.
> I do not know how can I store them in a single file without over writting
> the results?
>
> Thanks
>
> Alex
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] How can I store the results

2010-01-13 Thread Greg Snow
You could put all of your results into a single list, then just save the list.

Or, functions like write.table and write have an append argument; set that to 
TRUE and the information will be appended to the file rather than overwriting 
it.
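
A small sketch of the append approach with write.table. It uses three runs instead of 100 to keep it short; the random data, the extra "run" column and the file name are assumptions for illustration. The header is written only on the first run so that the file can be read back in one go.

```r
# Hypothetical output file; start from a clean slate
out_file <- file.path(tempdir(), "all_runs.csv")
if (file.exists(out_file)) file.remove(out_file)

for (run in 1:3) {
  # Stand-in for one 160 x 10 result, tagged with its run number
  res <- data.frame(run = run, matrix(rnorm(160 * 10), nrow = 160))
  write.table(res, out_file, sep = ",", row.names = FALSE,
              col.names = (run == 1),   # header only once
              append = (run > 1))       # later runs are appended
}

all_runs <- read.csv(out_file)
print(dim(all_runs))
```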

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Alex Roy
> Sent: Wednesday, January 13, 2010 8:00 AM
> To: r-help@r-project.org
> Subject: [R] How can I store the results
> 
> Dear R users,
> I am running a R code which gives me 10 columns
> and
> 160 rows. I need to run the code for 100 times and each time I need to
> store
> the results in a single file.
> I do not know how can I store them in a single file without over
> writting
> the results?
> 
> Thanks
> 
> Alex
> 



Re: [R] Problem fitting a non-linear regression model with nls

2010-01-13 Thread Nathalie Yauschew-Raguenes

Actually, the data that I used are measurements of plant growth during
an entire year. It is usual to model the growth with logistic models.
I have already tried the simple logistic model (which works). But the
problem is that with this model the inflexion point occurs half-way up
or down the logistic curve.
That's why, despite the small number of measurements, I wanted to try the
generalized logistic model proposed by Richards.
So I will still try the nls2 package, just in case. And if it doesn't
work, I'll use "a more parsimonious model" as you two have suggested.
Thank you for your answers

--
Nathalie YAUSCHEW-RAGUENES
Ph.D Student
Unité de Recherches Ecologie Fonctionnelle et Physique de l'Environnement



Re: [R] Ask about large data set

2010-01-13 Thread Magnus Torfason

On 1/12/2010 8:29 PM, Yi Du wrote:

Hi,

Is that okay to let R to read data set more than 1 rows and
use it to do some kernel density estimation? Thanks.

Yi


Why don't you just try it and see? Nothing bad will happen - the 
absolute worst case scenario is that R will hang.


But I can tell you that reading 1 rows should be a piece of cake on 
any decent computer. Different estimation techniques are different in 
terms of computational intensity. Trying it is the best approach. If you 
run into problems, you could come back with specific questions of 
optimization.


Best,
Magnus



Re: [R] wrong with using subset

2010-01-13 Thread Duncan Murdoch

On 13/01/2010 10:45 AM, Don MacQueen wrote:

I would suggest that first you look at the results of

   (as.numeric(as.character(dfpr2_r$pr2))) > 0.2 && (dfpr2_r$landa > 10)

by itself. Does it give all FALSE ?
  


I'd guess the problem is using && instead of &. 
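
A sketch of the difference on made-up data (the values are hypothetical; the column names follow the question). & compares element by element and returns one TRUE/FALSE per row, which is what subset() needs. && is meant for a single TRUE/FALSE, so it cannot select rows; recent R versions signal an error when it is given longer vectors.

```r
# Hypothetical data with pr2 stored as a factor, as in the question
dfpr2_r <- data.frame(pr2 = factor(c("0.1", "0.3", "0.5", "0.25")),
                      landa = c(20, 5, 15, 30))

pr2_num <- as.numeric(as.character(dfpr2_r$pr2))

# Element-wise AND: one logical value per row
keep <- pr2_num > 0.2 & dfpr2_r$landa > 10
print(keep)

# The same condition inside subset(); columns can be named directly
sel <- subset(dfpr2_r, as.numeric(as.character(pr2)) > 0.2 & landa > 10)
print(sel)
```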


Duncan Murdoch

Then look at each of the parts separately. What are the results of

(as.numeric(as.character(dfpr2_r$pr2))) > 0.2
and
dfpr2_r$landa > 10

Are there any TRUE among the results?

Does
as.numeric(as.character(dfpr2_r$pr2))
give what you expect?

-Don

At 5:20 PM +0200 1/13/10, Ahmet Temiz wrote:
>hello
>
>is it wrong with this expression:
>
>subset(dfpr2_r,(as.numeric(as.character(dfpr2_r$pr2))) > 0.2 && (dfpr2_r$landa
>>  10))
>
>it gives nothing
>
>regards
>--
>Open WebMail Project (http://*openwebmail.org)
>
>
>--
>This message has been scanned for viruses and
>dangerous content by MailScanner, and is
>believed to be clean.
>







Re: [R] wrong with using subset

2010-01-13 Thread Don MacQueen

I would suggest that first you look at the results of

  (as.numeric(as.character(dfpr2_r$pr2))) > 0.2 && (dfpr2_r$landa > 10)

by itself. Does it give all FALSE ?

Then look at each of the parts separately. What are the results of

   (as.numeric(as.character(dfpr2_r$pr2))) > 0.2
and
   dfpr2_r$landa > 10

Are there any TRUE among the results?

Does
   as.numeric(as.character(dfpr2_r$pr2))
give what you expect?

-Don

At 5:20 PM +0200 1/13/10, Ahmet Temiz wrote:

hello

is it wrong with this expression:

subset(dfpr2_r,(as.numeric(as.character(dfpr2_r$pr2))) > 0.2 && (dfpr2_r$landa > 10))


it gives nothing

regards
--
Open WebMail Project (http://*openwebmail.org)


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.




--
--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062



Re: [R] Problem fitting a non-linear regression model with nls

2010-01-13 Thread Bert Gunter
> My question is how could I estimate those initial values so that the nls
> fitting works.
>
You can't. Your parameters are almost certainly nonidentifiable (which is
what Gabor told you more gracefully).

Just because you believe in a complex (often mechanistic) nonlinear model
and have some data does not assure that the model parameters can be
estimated. If you do not understand why this is so, consider fitting even a
simple 4 parameter logistic when the data do not level off at the top and/or
bottom end. There are then infinitely many solutions in which the parameters
"trade off" with one another to give essentially identical fits. That is
what the "singular gradient" message is trying to tell you.

Bert Gunter
Genentech Nonclinical Statistics



Re: [R] Dynamic file / url name with read.csv

2010-01-13 Thread ivan popivanov

A few packages have support for basic download from Yahoo Finance. If that's 
what you are trying to achieve - you may want to try quantmod (getSymbols 
function) or tseries (get.hist.quote function). If you want to do something not 
supported yet - first take a look at their source code.

Regards,
Ivan

> From: k...@csusb.edu
> Date: Tue, 12 Jan 2010 22:25:17 -0800
> To: r-help@r-project.org
> Subject: Re: [R] Dynamic file / url name with read.csv
> 
> 
> A few suggestions:
>   Don't mix ' and "
>   Use paste()
>   Don't include an extraneous ;
> 
> SymA<- "SPY"
> Sym1<- 
> paste("http://ichart.finance.yahoo.com/table.csv?s=",SymA,"&ignore=.csv",sep="")
> Symbol<- read.csv(Sym1, stringsAsFactors=F)
> 
> 
> On Jan 12, 2010, at 10:03 PM, B S wrote:
> > 
> > Hi- 
> > 
> > I would like to be able to change the value of SymA below and download a 
> > file from the corresponding URL.  Hardcoded, this line works fine: 
> > 
> > Symbol<- 
> > read.csv("http://ichart.finance.yahoo.com/table.csv?s=SPY&ignore=.csv", 
> > stringsAsFactors=F)
> > 
> > However, when I incorporate using a variable for the ticker, it no longer 
> > works.  
> > 
> > SymA<- "SPY"
> > Sym1<- 
> > cat('http://ichart.finance.yahoo.com/table.csv?s=",SymA,"&ignore=.csv",sep="";)
> > Symbol<- read.csv(Sym1, stringsAsFactors=F)
> > 
> > I know that the problem lies in the concatenation, but I've tried different 
> > variations of cat() and toString() (and others) with SymA and Sym1 but 
> > cannot seem to get a string together that will work.  Would appreciate any 
> > suggestions for this simple problem?? 
> > 
> > Thank you. 
> > 
> > 
> > 
> > 
  


Re: [R] <= returns wrong result? Why

2010-01-13 Thread Magnus Torfason

Yupp, FAQ 7.31 is definitely your friend here.

You might also want to take a look at these two very recent threads on 
this help list:


"Strange behaviour of as.integer()"
http://tolstoy.newcastle.edu.au/R/e9/help/10/01/index.html#547

"Newbie question on precision"
http://tolstoy.newcastle.edu.au/R/e9/help/10/01/index.html#718
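
A minimal illustration of the floating-point issue behind FAQ 7.31. The numbers are a textbook example, not taken from the original question. Comparing with a tolerance, for example via all.equal(), is one common workaround.

```r
a <- 0.1 + 0.2

# Looks like it should be TRUE, but a is slightly larger than 0.3
naive <- a <= 0.3
print(naive)
print(format(a, digits = 17))

# Tolerance-based equality check
nearly_equal <- isTRUE(all.equal(a, 0.3))
print(nearly_equal)

# "Less than or (nearly) equal" with a tolerance
ok <- a < 0.3 || nearly_equal
print(ok)
```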

Best,
Magnus

On 1/13/2010 3:25 AM, Stephan Kolassa wrote:


take a look at FAQ 7.31.

Trafim Vanishek wrote:

Does anybody know the probable reason why <= gives false when it
should give true?




[R] wrong with using subset

2010-01-13 Thread Ahmet Temiz

hello 

Is there something wrong with this expression:

subset(dfpr2_r,(as.numeric(as.character(dfpr2_r$pr2))) > 0.2 && (dfpr2_r$landa
> 10))

it gives nothing

regards
--
Open WebMail Project (http://openwebmail.org)


-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.



Re: [R] Problem fitting a non-linear regression model with nls

2010-01-13 Thread Gabor Grothendieck
You could try the brute force of nls2 package; however, note that you
have 8 parameters and only 16 points so you might look for a more
parsimonious model.  Plotting it, it seems somewhat Gaussian in shape
so:

mod <- nls(y ~ a * dnorm(x, b, c), start = c(a = mean(y)/dnorm(0, 0,
sd(x)), b = mean(x), c = sd(x)))
matplot(x, cbind(y, fitted(mod)), type = c("p", "l"), pch = 20)



On Wed, Jan 13, 2010 at 9:02 AM, Nathalie Yauschew-Raguenes
 wrote:
> Hi,
>
> I'm trying to make a regression of the form :
>
> formula <- y ~ Asym_inf  + Asym_sup * ( (1 / (1 + (n1 * (exp( (tmid1-x) /
> scal1) )^(1/n1) ) ) ) - (1 / (1 + (n2 * (exp( (tmid2-x) / scal2) )^(1/n2) )
> ) ) )
> which is a sum of the generalized logistic model proposed by richards.
>
> with data such as these:
>
> x <- c(88,113,128,143,157,172,184,198,210,226,240,249,263,284,302,340)
> y <-
> c(0.04,0.16,1.09,2.65,2.46,2.43,1.88,2.42,1.51,1.70,1.92,1.35,0.89,0.34,0.13,0.10)
>
> I use the nls function to fit my data to the model.
>
> nls(formule, data=cbind.data.frame(x,y), start=list(Asym_inf
> =min(y),Asym_inf =max(y)-min(y),
> n1=1,n2=1,tmid1=120,tmid2=250,scal1=11,scal2=30))
>
> and it always ends with one of these errors (even if I change the initial
> values):
> - "Error in nls(formule, data = cbind.data.frame(x, y), start =
> list(Asym_inf =min(y),  : \n  step factor 0.000488281 reduced below
>  'minFactor' of 0.000976562\n"
> - "Error in nls(formule, data = cbind.data.frame(x, y), start = list(miny =
> min(y),  : \n  singular gradient\n"
> - "Error in numericDeriv(form[[3]], names(ind), env) : \n  Missing value
> or an infinity produced when evaluating the model\n"
> - "Error in nlsModel(formula, mf, start, wts) : \n  singular gradient matrix
> at initial parameter estimates\n"
> So it seems that I reach a local extremum each time. I know that most of
> the problem comes from the choice of the initial values of the parameters
> Asym_inf, Asym_sup, n1, n2, tmid1, tmid2, scal1 and scal2.
>
> My question is how could I estimate those initial values so that the nls
> fitting works.
>
> Thanks in advance
>
> --
> Nathalie YAUSCHEW-RAGUENES
> Ph.D Student
>
> Unité de Recherches Ecologie Fonctionnelle et Physique de l'Environnement
> (EPHYSE)
> INRA, Centre de Bordeaux - Aquitaine
> 71 Av Edouard Bourlaux
> 33883 Villenave d'Ornon Cedex
> France
>
>



[R] How can I store the results

2010-01-13 Thread Alex Roy
Dear R users,
I am running R code which gives me 10 columns and
160 rows. I need to run the code 100 times and each time I need to store
the results in a single file.
I do not know how I can store them in a single file without overwriting
the results.

Thanks

Alex



Re: [R] plotting moving range control chart

2010-01-13 Thread Tom Hopper
I have been having the same problem as poster Hodgess, below. It appears
that her question was never answered, so I would like to share a solution
with the community.

The problem is the (apparent?) inability to produce moving range process
behavior (a.k.a. "control") charts with individuals data in the package
"qcc" (v. 2.0). I have also struggled with the same limitation in package
"IQCC" (v. 1.0).

The package "qAnalyst" (v. 0.6.0) provides an option to produce a moving
range chart with individuals data. The example given in the qAnalyst manual
for function spc yields an individuals chart:

> #i-chart, moving range to estimate st. dev. is equal to 2 points with testType=1
> data(rawWeight)
> ichart = spc(x = rawWeight$rawWeight, sg = 2, type = "i", name = "weight", testType = 1)
> plot(ichart)
> summary(ichart)

Changing "type = 'i'" to "type = 'mr'" yields the moving range chart:

> mrchart = spc(x = rawWeight$rawWeight, sg = 2, type = "mr", name = "weight", testType = 1)
> plot(mrchart)
> summary(mrchart)

In separate tests, I have confirmed that qAnalyst correctly computes natural
process limits (a.k.a. "control limits") for X-bar and R charts, using the
average of the subgroup means. I have not yet checked the calculations for
the ImR or other charts.
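
For anyone who wants to check the ImR limits by hand, here is a rough base-R sketch. It assumes the usual textbook constants for a moving range of 2 points (2.66 for the individuals limits, 3.267 for the upper range limit). The data vector is made up and is not the rawWeight data set; no package function is involved.

```r
# Hypothetical individual measurements
x <- c(10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3)

mr     <- abs(diff(x))   # moving ranges of consecutive points
mr_bar <- mean(mr)       # average moving range
x_bar  <- mean(x)        # centre line of the individuals chart

# Individuals chart: natural process limits
i_limits <- c(lower  = x_bar - 2.66 * mr_bar,
              centre = x_bar,
              upper  = x_bar + 2.66 * mr_bar)

# Moving range chart: lower limit is 0 for a 2-point range
mr_limits <- c(lower = 0, centre = mr_bar, upper = 3.267 * mr_bar)

print(i_limits)
print(mr_limits)
```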

An additional difference between these packages is that qAnalyst uses the
lattice library to generate output, while the other two packages appear to
use the (traditional) graphics library.

Regards,

Tom


On Tue, 10 Nov 2009 23:39:23 -0600, Erin Hodgess wrote:

> Dear R People:
>
> I am using qcc for a quality control class.
>
> I have used qcc with type "xbar.one" for individuals but cannot determine
> how to plot a moving range control chart.
>
> Has anyone done that, please?
>
> Thanks,
> Erin
>
> --
> Erin Hodgess
> Associate Professor
> Department of Computer and Mathematical Sciences
> University of Houston - Downtown
> mailto: erinm.hodgess_at_gmail.com
>
>



[R] Problem fitting a non-linear regression model with nls

2010-01-13 Thread Nathalie Yauschew-Raguenes

Hi,

I'm trying to make a regression of the form :

formula <- y ~ Asym_inf  + Asym_sup * ( (1 / (1 + (n1 * (exp( (tmid1-x) 
/ scal1) )^(1/n1) ) ) ) - (1 / (1 + (n2 * (exp( (tmid2-x) / scal2) 
)^(1/n2) ) ) ) )

which is a sum of the generalized logistic model proposed by Richards.

with data such as these:

x <- c(88,113,128,143,157,172,184,198,210,226,240,249,263,284,302,340)
y <- 
c(0.04,0.16,1.09,2.65,2.46,2.43,1.88,2.42,1.51,1.70,1.92,1.35,0.89,0.34,0.13,0.10) 



I use the nls function to fit my data to the model.

nls(formule, data=cbind.data.frame(x,y), start=list(Asym_inf 
=min(y),Asym_inf =max(y)-min(y), 
n1=1,n2=1,tmid1=120,tmid2=250,scal1=11,scal2=30))


and it always ends with one of these errors (even if I change the 
initial values):
- "Error in nls(formule, data = cbind.data.frame(x, y), start = 
list(Asym_inf =min(y),  : \n  step factor 0.000488281 reduced below 
'minFactor' of 0.000976562\n"
- "Error in nls(formule, data = cbind.data.frame(x, y), start = 
list(miny = min(y),  : \n  singular gradient\n"
- "Error in numericDeriv(form[[3]], names(ind), env) : \n  Missing value 
or an infinity produced when evaluating the model\n"
- "Error in nlsModel(formula, mf, start, wts) : \n  singular gradient 
matrix at initial parameter estimates\n"
So it seems that I reach a local extremum each time. I know that most 
of the problem comes from the choice of the initial values of the 
parameters Asym_inf, Asym_sup, n1, n2, tmid1, tmid2, scal1 and scal2.


My question is how could I estimate those initial values so that the nls 
fitting works.


Thanks in advance

--
Nathalie YAUSCHEW-RAGUENES
Ph.D Student

Unité de Recherches Ecologie Fonctionnelle et Physique de l'Environnement 
(EPHYSE)
INRA, Centre de Bordeaux - Aquitaine
71 Av Edouard Bourlaux
33883 Villenave d'Ornon Cedex
France



[R] reading fifo with read.table hangs

2010-01-13 Thread Mads Jeppe Tarp-Johansen

To R-helpers,

Running
  R version 2.10.0 (2009-10-26)
  Linux ... 2.6.25.20-0.5-default #1 SMP 2009-08-14 01:48:11 +0200 x86_64 
x86_64 x86_64 GNU/Linux
  openSUSE 11.0 (X86-64)
and having difficulties reading a fifo from within R.

A short example that I find simply hanging is shown as 'SHORT SCRIPT' 
below. I expected R to print a data set read from the fifo with the 
numbers 0,1,...7 and then gracefully exit. Any ideas why not?


A longer script that actually does the job in its 2nd clause is shown in 
'LONG SCRIPT' below ... I'm confused that the open call is needed. Any 
comments on this?


Regards MJ

--- SHORT SCRIPT BEGIN
#!/bin/bash

mkfifo chops
gawk 'BEGIN {for (i=0;i<8;i++){print i}}' > chops &

R --slave --no-save < chops &

  R --slave --no-save < chops &

  R --slave --no-save < chops &
  cat chops
  unlink chops

fi
--- LONG SCRIPT END



Re: [R] selection of multiple subscripts

2010-01-13 Thread Duncan Murdoch

On 13/01/2010 8:09 AM, e-letter wrote:

On 13/01/2010, Duncan Murdoch  wrote:
> On 13/01/2010 7:36 AM, e-letter wrote:
>> Readers,
>>
>> For a data set 'x':
>>
>> 1 a
>> 2 b
>> 3 c
>> 4 d
>> 5 e
>> 6 f
>> 7 g
>> 8 h
>> 9 i
>>
>> How to select multiple subscripts to plot? For example to plot values
>> 1:3 and 9:10:
>>
>> plot(x[1:3,1],x[,2])
>>
>> and
>>
>> plot(x[9:10,1],x[,2])
>>
>> into one plot?
>
> Neither of those will work, because your x[,2] vector is longer than the
> other vector.
>
> What you want is something like this:
>
> plot(col2 ~ col1, data=x[c(1:3, 9:10),])
>
Thanks, I now understand the concatenate function would help but
forgot the syntax. Anyway I've just realised that the search database
for R yields no result for '?concatenate' which is surprising.


That's because there's no "concatenate" function in base R.  If you want 
to search for the word "concatenate", use "??concatenate".  You won't 
find the c() function, because it is called "combine", but you'll find 
several other ways to concatenate.
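
A short sketch of c() used to combine index ranges. The 10-row data frame and the column names col1 and col2 are made up to match the plot() call quoted above.

```r
# Hypothetical data frame standing in for 'x'
x <- data.frame(col1 = 1:10, col2 = (1:10)^2)

# c() combines the two ranges into a single index vector
idx <- c(1:3, 9:10)
print(idx)

# Use it to pick rows, then plot only those rows
sub <- x[idx, ]
plot(col2 ~ col1, data = sub)
```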


Duncan Murdoch



[R] Plotting a linear step function without vertical lines

2010-01-13 Thread walter.djuric
--- Begin Message ---
--- End Message ---


[R] column width in .dbf files using write.dbf ... to be continued

2010-01-13 Thread Arnaud Mosnier
Dear UseRs,

I did not get any answer to my previous message ("Is there a way to define
column widths "manually" when using the write.dbf function from the foreign
package?"), so I tried to modify the write.dbf function to do what I want.

Here is my modified version:

write.dbfMODIF <- function (dataframe, file, factor2char = TRUE,
    max_nchar = 254, width = "d")
{
    # width = "d" (the default) sizes each character column from its data;
    # a number sets the width of every character column explicitly.
    allowed_classes <- c("logical", "integer", "numeric", "character",
        "factor", "Date")
    if (!is.data.frame(dataframe))
        dataframe <- as.data.frame(dataframe)
    if (any(sapply(dataframe, function(x) !is.null(dim(x)))))
        stop("cannot handle matrix/array columns")
    cl <- sapply(dataframe, function(x) class(x[1L]))
    asis <- cl == "AsIs"
    cl[asis & sapply(dataframe, mode) == "character"] <- "character"
    if (length(cl0 <- setdiff(cl, allowed_classes)))
        stop("data frame contains columns of unsupported class(es) ",
            paste(cl0, collapse = ","))
    m <- ncol(dataframe)
    DataTypes <- c(logical = "L", integer = "N", numeric = "F",
        character = "C", factor = if (factor2char) "C" else "N",
        Date = "D")[cl]
    # Convert factors and dates to types the DBF writer understands
    for (i in seq_len(m)) {
        x <- dataframe[[i]]
        if (is.factor(x))
            dataframe[[i]] <- if (factor2char)
                as.character(x)
            else as.integer(x)
        else if (inherits(x, "Date"))
            dataframe[[i]] <- format(x, "%Y%m%d")
    }
    precision <- integer(m)
    scale <- integer(m)
    dfnames <- names(dataframe)
    # Work out field width (precision) and decimals (scale) per column
    for (i in seq_len(m)) {
        nlen <- nchar(dfnames[i], "b")
        x <- dataframe[, i]
        if (is.logical(x)) {
            precision[i] <- 1L
            scale[i] <- 0L
        }
        else if (is.integer(x)) {
            rx <- range(x, na.rm = TRUE)
            rx[!is.finite(rx)] <- 0
            if (any(rx == 0))
                rx <- rx + 1
            mrx <- as.integer(max(ceiling(log10(abs(rx)))) + 3L)
            precision[i] <- min(max(nlen, mrx), 19L)
            scale[i] <- 0L
        }
        else if (is.double(x)) {
            precision[i] <- 19L
            rx <- range(x, na.rm = TRUE)
            rx[!is.finite(rx)] <- 0
            mrx <- max(ceiling(log10(abs(rx))))
            scale[i] <- min(precision[i] - ifelse(mrx > 0L, mrx + 3L,
                3L), 15L)
        }
        else if (is.character(x)) {
            if (width == "d") {
                # Default behaviour: width taken from the longest entry
                mf <- max(nchar(x[!is.na(x)], "b"))
                p <- max(nlen, mf)
                if (p > max_nchar)
                    warning(gettextf("character column %d will be truncated to %d bytes",
                        i, max_nchar), domain = NA)
                precision[i] <- min(p, max_nchar)
                scale[i] <- 0L
            } else {
                # User-supplied width, capped at max_nchar
                if (width > max_nchar)
                    warning(gettextf("character column %d will be truncated to %d bytes",
                        i, max_nchar), domain = NA)
                precision[i] <- min(width, max_nchar)
                scale[i] <- 0L
            }
        }
        else stop("unknown column type in data frame")
    }
    if (any(is.na(precision)))
        stop("NA in precision")
    if (any(is.na(scale)))
        stop("NA in scale")
    invisible(.Call(DoWritedbf, as.character(file), dataframe,
        as.integer(precision), as.integer(scale), as.character(DataTypes)))
}


However, when I try to use this function, it cannot find the DoWritedbf
function (written in C) that is called in the last lines.

Is there a way to temporarily replace the original write.dbf function in the
foreign package with this one?
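
One possible approach, sketched here under assumptions rather than taken from the thread: a function defined at the prompt looks up names from the global environment, so a symbol that is only visible inside the foreign namespace is not found. Changing the function's environment to that namespace lets it resolve such names. The stand-in definition below only exists to make the snippet run on its own; in practice it would be the modified function from this message. Whether DoWritedbf is visible in the namespace of a given foreign version is assumed from the printed source, not verified.

```r
# Stand-in for the modified function (hypothetical; replace with the real one)
write.dbfMODIF <- function(...) NULL

# Option 1: make the copy look up names inside the 'foreign' namespace,
# where the original write.dbf finds its C entry point.
environment(write.dbfMODIF) <- asNamespace("foreign")

# Option 2 (more invasive, lasts for the current session only):
# overwrite write.dbf inside the namespace itself.
# utils::assignInNamespace("write.dbf", write.dbfMODIF, ns = "foreign")
```

Option 1 leaves the package untouched and only affects the copy, which is why it is shown first; option 2 is commented out because it changes what every caller of write.dbf gets.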

Thanks,

Arnaud

R version 2.10.0 (2009-10-26)
i386-pc-mingw32




Re: [R] selection of multiple subscripts

2010-01-13 Thread e-letter
On 13/01/2010, e-letter  wrote:
> On 13/01/2010, Duncan Murdoch  wrote:
>> On 13/01/2010 7:36 AM, e-letter wrote:
>>> Readers,
>>>
>>> For a data set 'x':
>>>
>>> 1 a
>>> 2 b
>>> 3 c
>>> 4 d
>>> 5 e
>>> 6 f
>>> 7 g
>>> 8 h
>>> 9 i
>>>
>>> How to select multiple subscripts to plot? For example to plot values
>>> 1:3 and 9:10:
>>>
>>> plot(x[1:3,1],x[,2])
>>>
>>> and
>>>
>>> plot(x[9:10,1],x[,2])
>>>
>>> into one plot?
>>
>> Neither of those will work, because your x[,2] vector is longer than the
>> other vector.
>>
>> What you want is something like this:
>>
>> plot(col2 ~ col1, data=x[c(1:3, 9:10),])
>>
> Thanks, I now understand the concatenate function would help but
> forgot the syntax. Anyway I've just realised that the search database
> for R yields no result for '?concatenate' which is surprising.
>
For the benefit of other novices: for the data set, the subscripts
should have read:

1:3

and

8:9

Alternatively, the data set should have included:

10 j

:)



Re: [R] selection of multiple subscripts

2010-01-13 Thread e-letter
On 13/01/2010, Duncan Murdoch  wrote:
> On 13/01/2010 7:36 AM, e-letter wrote:
>> Readers,
>>
>> For a data set 'x':
>>
>> 1 a
>> 2 b
>> 3 c
>> 4 d
>> 5 e
>> 6 f
>> 7 g
>> 8 h
>> 9 i
>>
>> How to select multiple subscripts to plot? For example to plot values
>> 1:3 and 9:10:
>>
>> plot(x[1:3,1],x[,2])
>>
>> and
>>
>> plot(x[9:10,1],x[,2])
>>
>> into one plot?
>
> Neither of those will work, because your x[,2] vector is longer than the
> other vector.
>
> What you want is something like this:
>
> plot(col2 ~ col1, data=x[c(1:3, 9:10),])
>
Thanks, I now understand the concatenate function would help but
forgot the syntax. Anyway I've just realised that the search database
for R yields no result for '?concatenate' which is surprising.



[R] How to do FMOLS and DOLS?

2010-01-13 Thread John Hust

Hi,

Can R do FMOLS(Fully Modified OLS) and DOLS(Dynamic OLS)?

I cannot find any useful thing in the present package.

Thanks in advance!



Re: [R] selection of multiple subscripts

2010-01-13 Thread Duncan Murdoch

On 13/01/2010 7:36 AM, e-letter wrote:

> Readers,
>
> For a data set 'x':
>
> 1 a
> 2 b
> 3 c
> 4 d
> 5 e
> 6 f
> 7 g
> 8 h
> 9 i
>
> How to select multiple subscripts to plot? For example to plot values
> 1:3 and 9:10:
>
> plot(x[1:3,1],x[,2])
>
> and
>
> plot(x[9:10,1],x[,2])
>
> into one plot?


Neither of those will work, because your x[,2] vector is longer than the 
other vector.


What you want is something like this:

plot(col2 ~ col1, data=x[c(1:3, 9:10),])

where col1 and col2 are the names of those two columns.
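
A runnable sketch of that call follows. The data frame is an assumed stand-in: the original 'x' has letters in its second column and only nine rows, so numeric values and a tenth row are invented here to make the plot possible.

```r
# Assumed stand-in data (not the poster's): ten rows, numeric second column
x <- data.frame(col1 = 1:10,
                col2 = c(2, 4, 3, 7, 5, 6, 9, 8, 10, 12))

# Combine the two index ranges, then subset the rows once
rows <- c(1:3, 9:10)
sub <- x[rows, ]

# Both x and y now come from the same five rows, so lengths match
plot(col2 ~ col1, data = sub)
```

Subsetting the whole data frame first is what avoids the length mismatch described above.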

Duncan Murdoch



[R] Odp: selection of multiple subscripts

2010-01-13 Thread Petr PIKAL
Hi

see ?points or ?lines, which you would surely have found if you had bothered
to look at the ?plot help page
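
A minimal sketch of the points() route, using made-up numeric data (the names first and second and the values in x are assumptions for illustration only): draw the first group with plot(), leaving room on both axes, then add the second group to the same plot.

```r
# Assumed stand-in data: ten rows, numeric second column
x <- data.frame(col1 = 1:10, col2 = (1:10)^2)

first  <- x[1:3, ]
second <- x[9:10, ]

# Set the axis limits from the full data so the second group fits
plot(first$col1, first$col2,
     xlim = range(x$col1), ylim = range(x$col2),
     xlab = "col1", ylab = "col2")

# Add the second group to the existing plot with a different symbol
points(second$col1, second$col2, pch = 19)
```

lines() can be used the same way if connected segments are wanted instead of symbols.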

Regards
Petr

r-help-boun...@r-project.org wrote on 13.01.2010 13:36:57:

> Readers,
> 
> For a data set 'x':
> 
> 1 a
> 2 b
> 3 c
> 4 d
> 5 e
> 6 f
> 7 g
> 8 h
> 9 i
> 
> How to select multiple subscripts to plot? For example to plot values
> 1:3 and 9:10:
> 
> plot(x[1:3,1],x[,2])
> 
> and
> 
> plot(x[9:10,1],x[,2])
> 
> into one plot?
> 
> Yours,
> 
> rhelpatconference.jabber.org
> r251
> 



Re: [R] Illustrating kernel distribution in wheat ears

2010-01-13 Thread Carl-Göran CG . Pettersson
Hi,

Thanks a lot for your suggestions and the very detailed instructions, I needed 
them...
Everything worked fine also in the full dataset, up until the last suggestion 
(the box plots)

Here I also got an error message, but a different one from what you got. And no 
output...

Here are the last two command lines and the error message:

> q <- ggplot(spikes.long, aes(side, value))
> q + geom_boxplot() + facet_grid(~ cultivar)
Error in `[.data.frame`(plot$data, , setdiff(cond, names(df)), drop = FALSE) : 
  undefined columns selected

I used the same variable names and have done the steps suggested up to this 
point, but with a much bigger dataset than in the question sample.

Sorry to say, I don't understand the error message.
But the first two variants of plots worked nice and are possible to use for me.
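
A guess at the cause, offered as a sketch rather than a diagnosis: the long data frame built in the steps quoted below has columns cn, value, loc and side, but none called cultivar, which would explain "undefined columns selected". If that is right, facetting on cn (for example facet_grid(~ cn)) would presumably be the intended call. The check below uses base R only, with a few values copied from the sample data; everything else about the full dataset is assumed.

```r
# Small long-format stand-in built from the first four sample spikelets
spikes.long <- data.frame(
  cn    = rep(c("Lans", "Kranich"), each = 4),
  loc   = rep(1:4, times = 2),
  value = c(1.8, 3.1, 3.5, 3.8, 0.6, 2.4, 3.4, 4.2)
)
spikes.long$side <- factor(2 - spikes.long$loc %% 2)

# Which of the requested facetting columns are missing from the data?
setdiff("cultivar", names(spikes.long))
# [1] "cultivar"

# Base-graphics equivalent: boxplots by side within each cultivar (cn)
boxplot(value ~ side + cn, data = spikes.long)
```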

All the best
/CG


From: Dennis Murphy [djmu...@gmail.com]
Sent: 11 January 2010 15:03
To: Carl-Göran CG. Pettersson
Cc: r-help@r-project.org
Subject: Re: [R] Illustrating kernel distribution in wheat ears

Hi:

It wasn't clear to me precisely what you wanted, but here are a couple of ideas 
in the hope that it will help.
I used ggplot2 for the graphics, so it requires some manipulation of your 
dataset from 'wide' format to 'long'.
I also add an indicator for side of the ear (odd is side one (L?), even is side 
2) and a variable I call 'loc' to
indicate the value associated with the splxx variable.

I read the data into a data frame called spikelets. The first step is to remove 
the rows of missing responses:

naind <- apply(spikelets[, -1], 1, function(x) all(is.na(x)))
spikelets2 <- spikelets[!naind, ]
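
To see what this step does, here is a small sketch on a three-row stand-in (the values for Lans and Loyal come from the sample data; the reduced two-column layout is an assumption for brevity):

```r
# Cut-down stand-in for the spikelets data: one cultivar has no counts
spikelets <- data.frame(
  cn    = c("Lans", "Boomer", "Loyal"),
  spl01 = c(1.8, NA, 1.1),
  spl02 = c(3.1, NA, 2.7)
)

# TRUE for rows where every measurement column is NA
naind <- apply(spikelets[, -1], 1, function(x) all(is.na(x)))

# Keep only the rows that have at least one observed value
spikelets2 <- spikelets[!naind, ]
spikelets2
```

Dropping the first column before apply() matters: the cultivar name is never NA, so including it would make all() return FALSE for every row.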

Next, I use the melt() function from the reshape package to convert the data 
frame from 'wide' to 'long' form:

library(ggplot2) # also attaches reshape and plyr in the loading process
spikes.long <- melt(spikelets2, id = 'cn')

The variable 'variable' contains the variable names as a vector (spl01, spl02, 
..., spl14)
Next, I create a variable called loc, which represents the numeric part of the 
spl variables, and then
create a variable side to distinguish one side of the awn from the other. 
'variable' is then removed...

spikes.long$loc <- as.numeric(substring(spikes.long$variable, 4))
spikes.long$side <- factor(2 - spikes.long$loc %% 2)
spikes.long$variable <- NULL
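
A short worked example of the two derived variables, using a few made-up entries of 'variable' (the vector below is an assumption, chosen to cover odd and even spikelet numbers):

```r
variable <- c("spl01", "spl02", "spl13", "spl14")

# Characters from position 4 onwards are the spikelet number
loc <- as.numeric(substring(variable, 4))
loc
# [1]  1  2 13 14

# Odd numbers give 2 - 1 = 1, even numbers give 2 - 0 = 2
side <- factor(2 - loc %% 2)
side
# [1] 1 2 1 2
# Levels: 1 2
```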

Now we're in a position to plot. The first is a scatterplot of the response by 
location, stratified by cultivar;
it contains color to distinguish sides.

# With color:
p <- qplot(loc, value, data = spikes.long, group = cn,
   colour = side)
p + facet_grid(cn ~ .)

The color is not terribly informative, so to get rid of it, remove the colour = 
side argument. One could
also merge the plots together and fit smooths to the different cultivars.

ggplot(spikes.long, aes(loc, value, colour = cn)) +
geom_point() + geom_smooth(se = FALSE)

I also came up with boxplot pairs by side for each cultivar, which is shown 
below:

q <- ggplot(spikes.long, aes(side, value))
q + geom_boxplot() + facet_grid(~ cultivar)

For some reason, I kept getting these messages from every ggplot2 call:

Error in recordGraphics(drawGTree(x), list(x = x), getNamespace("grid")) :
  invalid graphics state

but all of the plots rendered as expected.


HTH,
Dennis

2010/1/10 Carl-Göran CG. Pettersson <cg.petters...@vpe.slu.se>:
Dear all

R2.10  WinXP

I have a dataset dealing with the way different wheat cultivars build their 
yield.
Wheat ears are organised in spikelets where the spikelets can be numbered from 
the bottom, with even numbers on one side and odd on the other.
I know how many kernels there were in each spikelet after some months spent 
counting them...

Now I want to illustrate the differences between the cultivars in how the 
kernels are distributed in the ears.
In the best of all possible worlds it would be possible to place histograms or 
boxplots on adjacent sides of vertical lines representing different cultivars.
I have done some experimenting using boxplot() but I am stuck and out of ideas 
right now.

All ideas are welcome!
/CG


Here is a sample dataset with the kernel counts for the first 14 
spikelets:

cn        spl01 spl02 spl03 spl04 spl05 spl06 spl07 spl08 spl09 spl10 spl11 spl12 spl13 spl14
Lans        1.8   3.1   3.5   3.8   3.8   4.1   4.2   4.3   4.4   4.5   4.2   4.1   3.9   3.8
Kranich     0.6   2.4   3.4   4.2   4.5   4.7   4.9   4.9   4.8   4.7   4.4   4.1   4.1   3.9
Loyal       1.1   2.7   3.6   3.7   4.1   4.4   4.4   4.6   4.3   4.5   4.3   4.1   3.8   3.7
Boomer       NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
Oakley       NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
Hereford    0.6   2.3   3.3   3.6   3.9   4

[R] selection of multiple subscripts

2010-01-13 Thread e-letter
Readers,

For a data set 'x':

1 a
2 b
3 c
4 d
5 e
6 f
7 g
8 h
9 i

How to select multiple subscripts to plot? For example to plot values
1:3 and 9:10:

plot(x[1:3,1],x[,2])

and

plot(x[9:10,1],x[,2])

into one plot?

Yours,

rhelpatconference.jabber.org
r251


