Re: [R] Writing a single output file
There are many ways of doing this, and you have to think about the efficiency and logistics of the different approaches.

If the data is not large, you can read all n files into a list and then combine them. If the data is very large, you may wish to read one file at a time, combining it and then deleting it before reading the next file.

You can use cbind() to combine if all the Date columns are the same; otherwise merge() is useful.

The simple brute-force approach would be:

    fns <- list.files(pattern="^output")
    do.call( "cbind", lapply(fns, read.csv, row.names=1) )

The slightly more optimized and flexible, but slightly less elegant, option could be something like this:

    fns <- list.files(pattern="^output")
    out <- read.csv(fns[1], row.names=NULL)
    for(fn in fns[-1]){
      tmp <- read.csv(fn, row.names=NULL)
      out <- merge(out, tmp, by=1, all=TRUE)
      rm(tmp); gc()
    }

You have to see which option is best for your file sizes. Good luck.

Regards, Adai

On 23/12/2010 13:07, Amy Milano wrote:

Dear R helpers!

Let me first wish all of you "Merry Christmas and Very Happy New year 2011"

"Christmas day is a day of Joy and Charity, May God make you rich in both" - Phillips Brooks

I have a process which generates a number of outputs. The R code for the same is as given below.

    for(i in 1:n) {
      write.csv(output[i], file = paste("output", i, ".csv", sep = ""), row.names = FALSE)
    }

Depending on the value of 'n', I get different output files. Suppose n = 3; that means I have three output csv files, viz. 'output1.csv', 'output2.csv' and 'output3.csv'.

output1.csv

    date        yield_rate
    12/23/2010  5.25
    12/22/2010  5.19
    ...

output2.csv

    date        yield_rate
    12/23/2010  4.16
    12/22/2010  4.59
    ...

output3.csv

    date        yield_rate
    12/23/2010  6.15
    12/22/2010  6.41
    ...

Thus all the output files have the same column names, viz. Date and yield_rate. Also, I do need these files individually too.

My further requirement is to have a single dataframe as given below.

    Date        yield_rate1  yield_rate2  yield_rate3
    12/23/2010  5.25         4.16         6.15
    12/22/2010  5.19         4.59         6.41
    ...
where yield_rate1 = output1$yield_rate and so on. One way is to simply create a dataframe as df = data.frame(Date = read.csv('output1.csv')$Date, yield_rate1 = read.csv('output1.csv')$yield_rate, yield_rate2 = read.csv('output2.csv')$yield_rate, yield_rate3 = read.csv('output3.csv')$yield_rate) However, the problem arises when I am not aware how many output files are there as n can be 5 or even 100. So is it possible to write some loop or some function which will enable me to read 'n' files individually and then keeping "Date" common, only pickup the yield_curve data from each output file. Thanking in advance for any guidance. Regards Amy [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
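A minimal sketch of the merge() approach from this thread, written so that it works for any number of files. The temporary directory, the two-row sample data and the helper names (`pieces`, `combined`) are assumptions made for illustration only; the file naming follows the `output<i>.csv` pattern used in the question.

```r
# Hypothetical setup: write three small files into a temporary directory.
dir <- file.path(tempdir(), "yield_demo")
dir.create(dir, showWarnings = FALSE)
dates <- c("12/23/2010", "12/22/2010")
rates <- list(c(5.25, 5.19), c(4.16, 4.59), c(6.15, 6.41))
for (i in seq_along(rates)) {
  write.csv(data.frame(Date = dates, yield_rate = rates[[i]]),
            file = file.path(dir, paste0("output", i, ".csv")),
            row.names = FALSE)
}

# Find the files and order them by the number in the file name,
# so that output10.csv would not sort before output2.csv.
fns <- list.files(dir, pattern = "^output[0-9]+\\.csv$", full.names = TRUE)
idx <- as.integer(gsub("[^0-9]", "", basename(fns)))
fns <- fns[order(idx)]
idx <- sort(idx)

# Read each file and give its yield_rate column a unique name.
pieces <- lapply(seq_along(fns), function(k) {
  d <- read.csv(fns[k], stringsAsFactors = FALSE)
  names(d)[names(d) == "yield_rate"] <- paste0("yield_rate", idx[k])
  d
})

# Merge everything on the shared Date column, keeping unmatched dates.
combined <- Reduce(function(a, b) merge(a, b, by = "Date", all = TRUE), pieces)
combined
```

Renaming the columns before merging avoids the `.x`/`.y` suffixes that merge() would otherwise attach to identically named columns.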
Re: [R] more flexible "ave"
Here is a possible solution using sweep instead of ave:

    df <- data.frame(site = c("a", "a", "a", "b", "b", "b"),
                     gr = c("total", "x1", "x2", "x1", "total","x2"),
                     value1 = c(212, 56, 87, 33, 456, 213),
                     value2 = c(1546, 560, 543, 234, 654, 312) )

    sdf <- split(df, df$site)

    out <- lapply( sdf, function(mat){
      small.mat <- mat[ , -c(1,2)]
      totals <- mat[ which( mat[ , "gr"] == "total" ), -c(1,2) ]
      totals <- as.numeric(totals)
      percent <- sweep( small.mat, MARGIN=2, STATS=totals, FUN="/" )
      colnames(percent) <- paste("percent_", colnames(percent), sep="")
      return( cbind(mat, percent) )
    } )

    do.call("rbind", out)

        site    gr value1 value2 percent_value1 percent_value2
    a.1    a total    212   1546     1.00000000      1.0000000
    a.2    a    x1     56    560     0.26415094      0.3622251
    a.3    a    x2     87    543     0.41037736      0.3512290
    b.4    b    x1     33    234     0.07236842      0.3577982
    b.5    b total    456    654     1.00000000      1.0000000
    b.6    b    x2    213    312     0.46710526      0.4770642

Also I think it might be more efficient to replace your "gr" variable with a binary 0,1 where 1 indicates the total. That way you don't have to generate x1, x2, x3, ...

Regards, Adai

On 30/11/2010 14:42, Patrick Hausmann wrote:

Hi all,

I would like to calculate the percent of the total per group for this data.frame:

    df <- data.frame(site = c("a", "a", "a", "b", "b", "b"),
                     gr = c("total", "x1", "x2", "x1", "total","x2"),
                     value1 = c(212, 56, 87, 33, 456, 213))
    df

    calcPercent <- function(df) {
      df <- transform(df, pct_val1 = ave(df[, -c(1:2)], df$gr,
                      FUN = function(x) x/df[df$gr == "total", "value1"]) )
    }

    # This works as intended...
    w <- lapply(split(df, df$site), calcPercent)
    w <- do.call(rbind, w)
    w

    # ... but when I add a new column
    df$value2 <- c(1546, 560, 543, 234, 654, 312)

    # the result is not what I want...
    w <- lapply(split(df, df$site), calcPercent)
    w <- do.call(rbind, w)
    w

Clearly I have to change the function (particularly "value1"), but how? I've also played around with "apply" but without any success.

Thanks for any help!
Patrick
Re: [R] saving multiple panes to PNG
I cannot run your example because I cannot identify which package the function returns() is from. Nonetheless, something like par(mfrow=c(2,3)) should do the trick.

Regards, Adai

On 30/11/2010 14:22, Charles Evans wrote:

After searching multiple combinations of keywords over the past two days and downloading n R graphics tutorials, I have not been able to find anything online or in my R books about how to save multiple plot panes to PNG.

Specifically, I am using the irf() function in the vars package to generate plots of Impulse Response Functions:

    > x.data <- cbind(na.omit(returns(p[,2])),na.omit(returns(n[,2])))
    > colnames(x.data) <- c("p.ret","n.ret")
    > x.jo <- ca.jo(x.data,type="trace",ecdet="none",spec="transitory")
    > x.var <- vec2var(x.jo)
    > x.irf <- irf(x.var,n.ahead=30)
    > plot(x.irf)

This results in a plot containing a pair of IRF graphs in Quartz and the following message in the Console:

    "Hit <Return> to see next plot:"

When one hits <Return>, the next pair of IRF graphs appears in Quartz.

When I try to save the plots to PNG

    > png(...)
    > plot(...)
    > dev.off()

I am able to save only one of the plots. How does one tell plot() to plot first one of the panes and then the second?

Any help would be greatly appreciated.

Yours, Charles Evans
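An alternative to putting all panes on one page, sketched here with stand-in plots because the poster's data are not available: png() accepts a file name containing an integer format such as "%02d", and each new page drawn while the device is open goes to its own numbered file. The output location and file stem are assumptions for illustration; a multi-page call such as plot(x.irf) should then land in separate files in the same way.

```r
# Hypothetical output location; "%02d" is replaced by the page number.
out_pattern <- file.path(tempdir(), "irf_page%02d.png")

png(filename = out_pattern, width = 800, height = 600)
plot(1:10, main = "stand-in for the first pair of graphs")   # page 1
plot(10:1, main = "stand-in for the second pair of graphs")  # page 2
dev.off()

# One file per page
list.files(tempdir(), pattern = "^irf_page[0-9]+\\.png$")
```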
Re: [R] Help Please!!!!!!!!!
Dear Melissa, If Jim's solution doesn't work then for some reason your function is converting numerical values into either character or factor and I would suggest you use the colClasses argument to force the right class. For example, mat <- read.table( file="lala.txt", sep="\t", row.names=1, header=T, colClasses=rep("numeric", 4) ) Then do a str(mat) and see what you get. Regards, Adai On 29/11/2010 13:02, jim holtman wrote: Your data seems to read in just fine, so what is the problem you are trying to solve? x<- read.table('clipboard', sep='\t', header=TRUE) str(x) 'data.frame': 5 obs. of 5 variables: $ X : Factor w/ 5 levels "JE","JM","S",..: 5 2 4 1 3 $ None : int 4 4 25 18 10 $ Light : int 2 3 10 24 6 $ Medium: int 3 7 12 33 7 $ Heavy : int 2 4 4 13 2 summary(x) X None LightMedium Heavy JE:1 Min. : 4.0 Min. : 2 Min. : 3.0 Min. : 2 JM:1 1st Qu.: 4.0 1st Qu.: 3 1st Qu.: 7.0 1st Qu.: 2 S :1 Median :10.0 Median : 6 Median : 7.0 Median : 4 SE:1 Mean :12.2 Mean : 9 Mean :12.4 Mean : 5 SM:1 3rd Qu.:18.0 3rd Qu.:10 3rd Qu.:12.0 3rd Qu.: 4 Max. :25.0 Max. :24 Max. :33.0 Max. :13 On Mon, Nov 29, 2010 at 12:29 AM, Melissa Waldman wrote: Hi, I have been working with Program R for my stats class and I keep coming upon the same error, I have read so many sites about inputting data from a text file into R and I'm using the data to do a correspondence analysis. I feel like I have read everything and it is still not explaining why the error message keeps coming up, I have used the exact examples I have seen in articles and the same error keeps popping up: Error in sum(N) : invalid 'type' (character) of argument I have spent so long trying to figure this out without success, I am sure it has to do with the fact that my rows have names in them. I have attached the text file I have been using and if you have any ideas as to how I can get R to plot the data using correspondence analysis with the column and row names that would be really helpful! 
Or if you could pass this email to someone who may know how to help me, that would be much appreciated. Thank you, Melissa Waldman my email: melissawald...@gmail.com
Re: [R] Significance of the difference between two correlation coefficients
Thanks for providing the example. It would be useful to know who I am communicating with, or from which institute, but never mind ...

I don't know much about this subject but a quick google search gives me the following site: http://davidmlane.com/hyperstat/A50760.html

Using the info from that website, I can code up the following to give the two-tailed p-value of the difference in correlations:

    diff.corr <- function( r1, n1, r2, n2 ){
      Z1 <- 0.5 * log( (1+r1)/(1-r1) )
      Z2 <- 0.5 * log( (1+r2)/(1-r2) )
      diff <- Z1 - Z2
      SEdiff <- sqrt( 1/(n1 - 3) + 1/(n2 - 3) )
      diff.Z <- diff/SEdiff
      p <- 2*pnorm( abs(diff.Z), lower.tail=FALSE )
      cat( "Two-tailed p-value", p , "\n" )
    }

    diff.corr( r1=0.5, n1=100, r2=0.40, n2=80 )
    ## Two-tailed p-value 0.4103526

    diff.corr( r1=0.1, n1=100, r2=-0.1, n2=80 )
    ## Two-tailed p-value 0.1885966

The p-value here is slightly different from the Vassar website because the website rounds its "diff.Z" values to 2 digits.

Regards, Adai

On 29/11/2010 15:30, syrvn wrote:

Hi,

based on the sample size I want to calculate whether two correlation coefficients are significantly different or not. I know that as a first step both coefficients have to be converted to z values using Fisher's z transformation. I have done this already but I don't know how to proceed from there.

Unlike for correlation coefficients, I know that the difference for z values is mathematically defined, but I do not know how to incorporate the sample size.

I found a couple of websites that provide that service but since I have huge data sets I need to automate this procedure. (http://faculty.vassar.edu/lowry/rdiff.html)

Can anyone help?

Cheers, syrvn
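Since the question mentions automating this over large data sets, here is a sketch of the same test that returns the p-values instead of printing them, so it can be applied to whole vectors at once. The function name `diff_corr_p` is an assumption; atanh(r) is used because it equals Fisher's z, 0.5 * log((1 + r) / (1 - r)).

```r
# Two-tailed p-value for the difference between two independent correlations.
# All four arguments may be vectors of equal length.
diff_corr_p <- function(r1, n1, r2, n2) {
  z <- (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  2 * pnorm(abs(z), lower.tail = FALSE)
}

# The two cases from the reply above, computed in one call
p <- diff_corr_p(r1 = c(0.5, 0.1), n1 = c(100, 100),
                 r2 = c(0.4, -0.1), n2 = c(80, 80))
p
```

The two values agree with the 0.4103526 and 0.1885966 reported in the reply.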
Re: [R] standardize columns selectively within a dataframe
If you want to scale within columns, you could try

    cbind( scale(df[,1:2]), df[ ,-c(1:2)] )

       a  b c  d
    1 -1 -1 7 10
    2  0  0 8 11
    3  1  1 9 12

and it is data.frame() btw.

On 01/09/2010 15:35, Olga Lyashevska wrote:

Dear all,

I have a dataframe:

    df<-dataframe(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9),d=c(10,11,12))

I want to obtain a new dataframe with columns a and b being standardized ((x-mean(x))/sd(x)); the other two columns (c,d) I want to leave unchanged. What is the best way to achieve this? I have been trying to use subscripts but did not succeed so far. Any tips?

Many thanks, Olga
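A variant of the scale() suggestion that selects the columns by name and keeps the result as a data frame with the original column order. This is only a sketch; the vector `cols` is an assumed helper name.

```r
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9), d = c(10, 11, 12))

# Columns to standardize; everything else is left untouched.
cols <- c("a", "b")
df[cols] <- scale(df[cols])   # (x - mean(x)) / sd(x), column by column
df
```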
Re: [R] forest plot
You can also do meta.summaries() - from the rmeta package - followed by a plot() on the resulting object. Or for a much more flexible plot try the forestplot() function, also from the rmeta package, but this requires a bit of work to set it up.

Regards, Adai

On 24/08/2010 05:50, C.H. wrote:

The correct command for forest plot should be "plot" (instead of "forest") if you are using metagen from meta package. For help: ?plot.meta

On Tue, Aug 24, 2010 at 11:03 AM, zhangweiwei wrote:

Dear Sir or Madam,

I am trying to plot forest plot. I extracted odds ratio and their corresponding 95% confidence interval from papers, then I calculated the log(OR) and standard error using the following command

    OR<-metagen(logOR,selogOR,sm="OR")
    forest(OR,comb.fixed=TRUE,comb.random=TRUE,digits=2)

However, it does not produce a forest plot. Can someone kindly help? Thank you in advance.

Best wishes weiwei
Re: [R] Finding a scalar value...
Your best option is to read the relevant help files. A simple (untested) example to find R when P, T and scal.fn=Z are given is to do this:

    my.fun <- function(R, P, T, Z) scal.fn(P, R, T) - Z
    uniroot( my.fun, P=pp, T=tt, Z=zz, lower=-100, upper=100 )$root

uniroot() searches over the first argument of the function it is given, so the unknown R comes first. You have to make an intelligent guess on the upper and lower ranges for the parameter R. I have used +/- 100 as a silly example.

HOWEVER, I do not think this works when P, T and Z are vectors. Try it to be sure. If not, then you may have to write a for or apply loop.

Regards, Adai

On 16/08/2010 13:19, Petar Milin wrote:

Thanks for the answer! However, if I would have scal.fn() like below, how would I apply uniroot() or optimize() or the like?

Best, PM

On 16/08/10 13:24, Adaikalavan Ramasamy wrote:

You probably need to look up how to write functions. Try

    scal.fn <- function(P, R, T){
      out <- ( 1/R - T ) / ( P - T )
      return(out)
    }

Here is a fake example:

    df <- cbind.data.frame( P=rnorm(10), R=rnorm(10), T=rnorm(10) )
    scal.fn( df$P, df$R, df$T )

Or are you trying to solve other parameters given scal values? If so, try having a look at functions like uniroot().

Regards, Adai

On 16/08/2010 11:48, Petar Milin wrote:

Hello!

I need to find a simple scalar value: Scal = ((1/R) - T) / (P - T), where R, T, and P are vectors in a data.frame. Please, can anyone tell me how to solve that in R?

Best, PM
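A small runnable sketch of the uniroot() idea discussed in this thread. The numeric inputs (`pp`, `tt`, `zz`) and the search interval are made-up values chosen so that the true answer is known; in particular the interval is an assumption that has to bracket the root and should not include R = 0, where 1/R is undefined.

```r
scal.fn <- function(P, R, T) (1 / R - T) / (P - T)

# The unknown has to be the first argument of the function passed to uniroot().
my.fun <- function(R, P, T, Z) scal.fn(P, R, T) - Z

# Hypothetical inputs: with R = 0.25 the scalar is (4 - 0.5) / (2 - 0.5) = 7/3
pp <- 2
tt <- 0.5
zz <- 7 / 3

root <- uniroot(my.fun, P = pp, T = tt, Z = zz,
                lower = 0.01, upper = 10, tol = 1e-10)$root

# For this particular formula R can also be isolated by hand:
# Z = (1/R - T) / (P - T)  =>  R = 1 / (Z * (P - T) + T)
closed <- 1 / (zz * (pp - tt) + tt)

c(uniroot = root, closed_form = closed)
```

For vectors of P, T and Z, the call can be wrapped in mapply() or a for loop, one root per element.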
Re: [R] Finding a scalar value...
You probably need to look up how to write functions. Try

    scal.fn <- function(P, R, T){
      out <- ( 1/R - T ) / ( P - T )
      return(out)
    }

Here is a fake example:

    df <- cbind.data.frame( P=rnorm(10), R=rnorm(10), T=rnorm(10) )
    scal.fn( df$P, df$R, df$T )

Or are you trying to solve other parameters given scal values? If so, try having a look at functions like uniroot().

Regards, Adai

On 16/08/2010 11:48, Petar Milin wrote:

Hello!

I need to find a simple scalar value: Scal = ((1/R) - T) / (P - T), where R, T, and P are vectors in a data.frame. Please, can anyone tell me how to solve that in R?

Best, PM
[R] working out main effect variance when different parameterization is used and interaction term exists
Dear all,

Apologies if this question is a bit theoretical, and for the longish email.

I am meta-analyzing the coefficients and standard errors from multiple studies where the raw data is not available. Each study analyst runs a model that includes an interaction term between, say, sex and smoking, plus age. Here is an illustrative example for one study:

    set.seed(1066)
    status <- rbinom( 1000, 1, 0.2 )
    males  <- rbinom( 1000, 1, 0.6 )
    smoke  <- rbinom( 1000, 1, 0.3 )
    age    <- runif(1000, min=20, max=80)

    coef( summary( f1 <- glm( status ~ males*smoke + age, family="binomial" ) ) )
    #                 Estimate  Std. Error    z value     Pr(>|z|)
    # (Intercept) -1.520399871 0.284464584 -5.3447774 9.052825e-08
    # males        0.213851446 0.201717381  1.0601538 2.890746e-01
    # smoke       -0.123103049 0.292346483 -0.4210861 6.736922e-01
    # age         -0.001056007 0.004612947 -0.2289223 8.189293e-01
    # males:smoke  0.283775173 0.362821438  0.7821345 4.341355e-01

Now, unfortunately some analysts coded sex as females instead of males. Using the same dataset, I get the following output with females:

    females <- 1 - males
    coef( summary( f2 <- glm( status ~ females*smoke + age, family="binomial" )) )
    #                   Estimate   Std. Error    z value     Pr(>|z|)
    # (Intercept)   -1.306548425 0.262573162* -4.9759405 6.493160e-07
    # females       -0.213851446 0.201717381* -1.0601538 2.890746e-01
    # smoke          0.160672124 0.214923130*  0.7475795 4.547138e-01
    # age           -0.001056007 0.004612947  -0.2289223 8.189293e-01
    # females:smoke -0.283775173 0.362821438  -0.7821345 4.341355e-01

I have worked out algebraically (and numerically) the following:

    Beta(females)       = -Beta(males)
    Var(females)        =  Var(males)
    Beta(females:smoke) = -Beta(males:smoke)
    Var(females:smoke)  =  Var(males:smoke)
    Beta(smoke | fit1)  =  Beta(smoke | fit2) + Beta(females:smoke)
                        =  0.160672124 - 0.283775173 = -0.1231030

How can I calculate Var(smoke | fit1) from Var(smoke | fit2)? I tried to derive this algebraically but ended up with a covariance term which I could not solve.
If I could cleverly convert Var(smoke | fit2) to Var(smoke | fit1) then I could avoid going back to each analyst, since this particular analysis is only one of many hundreds we run and it would be annoying to ask each analyst to use the same parameterisation.

Any suggestions are much appreciated. Many thanks in advance.

Regards, Adai
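The covariance term mentioned in the question comes from the standard identity Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b), applied to Beta(smoke | fit1) = Beta(smoke | fit2) + Beta(females:smoke). The sketch below checks this numerically on the simulated data from the post, using vcov() on the fitted model. It assumes the fitted object (or at least the covariance between the two coefficients) is available, so it does not by itself remove the need to obtain that covariance from each analyst. The object names `fit1`, `fit2` and `V2` are chosen here for illustration.

```r
set.seed(1066)
status  <- rbinom(1000, 1, 0.2)
males   <- rbinom(1000, 1, 0.6)
smoke   <- rbinom(1000, 1, 0.3)
age     <- runif(1000, min = 20, max = 80)
females <- 1 - males

fit1 <- glm(status ~ males * smoke + age, family = "binomial")
fit2 <- glm(status ~ females * smoke + age, family = "binomial")

# Variance of (smoke + females:smoke) from the female-coded fit
V2 <- vcov(fit2)
var_smoke_fit1 <- V2["smoke", "smoke"] +
  V2["females:smoke", "females:smoke"] +
  2 * V2["smoke", "females:smoke"]

se_from_fit2 <- sqrt(var_smoke_fit1)
se_direct    <- sqrt(vcov(fit1)["smoke", "smoke"])

c(from_fit2 = se_from_fit2, direct = se_direct)
```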
Re: [R] column selection in list
If the columns of all elements of the list are in the same order, then you can collapse it first and then extract. out <- do.call("rbind", SPECSHOR_tx_Asfc) out[ , "Asfc.median"] Regards, Adai Ivan Calandra wrote: Hi everybody! I have a (stupid) question but I cannot find a way to do it! I have a list like: > SPECSHOR_tx_Asfc $cotau SPECSHOR Asfc.median 38cotau381.0247 39cotau154.6280 40cotau303.3219 41cotau351.2933 42cotau156.5327 $eqgre SPECSHOR Asfc.median 145eqgre219.5389 146eqgre162.5926 147eqgre146.3726 148eqgre127.6413 149eqgre274.2888 $gicam SPECSHOR Asfc.median 263gicam174.7445 264gicam 83.4821 265gicam157.6005 266gicam153.7519 267gicam344.9775 I would just like to remove the column "SPECSHOR" (or extract the other one) so that it looks like $cotau Asfc.median 38381.0247 39 154.6280 40303.3219 41351.2933 42156.5327 etc. How should I do it? I know how to select each element like SPECSHOR_tx_Asfc[[1]], but I don't know how to select a single column within an element. Could you please help me on that? Thanks Ivan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
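If the list structure should be kept, as in the output the poster asked for, lapply() can take the same column out of every element. A minimal sketch with a made-up two-element list; the values are hypothetical, not the poster's data.

```r
# Hypothetical stand-in for SPECSHOR_tx_Asfc
lst <- list(
  cotau = data.frame(SPECSHOR = "cotau", Asfc.median = c(1.5, 2.5, 3.5)),
  eqgre = data.frame(SPECSHOR = "eqgre", Asfc.median = c(4.5, 5.5, 6.5))
)

# Keep only Asfc.median in each element; drop = FALSE keeps a data frame
kept <- lapply(lst, function(d) d[, "Asfc.median", drop = FALSE])

# A single column of a single element can be reached with [[ ]] and $
lst[["cotau"]]$Asfc.median
kept
```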
Re: [R] first and second derivative calculation
How about? eval( D( expression( t^3-6*t^2+5*t+30 ), "t" ) ) David Winsemius wrote: On Jan 22, 2010, at 6:49 PM, Marlin Keith Cox wrote: I can plot this just fine: t<-seq(0,4, by=.1) y<- t^3-6*t^2+5*t+30 plot(t,y ,xlab="t-values", ylab="f(t)", type="l") This is the first derivative, how I I make a similar plot? t<-seq(0,4, by=.1) y<- t^3-6*t^2+5*t+30 y1<-D(expression(t^3-6*t^2+5*t+30), 't') There might be some sort of deparse() operation that one could do on y1, but what follows sidesteps that level of programming. y1fn <- function(t) {3 * t^2 - 6 * (2 * t) + 5} par(new=TRUE) plot(t, y1fn(t), ylab="", xlab="", axes=FALSE) axis(side=4, at=seq(-7,5,by=1) ) -- David. Thanks ahead of time. kc On Fri, Jan 22, 2010 at 12:41 PM, Doran, Harold wrote: D(expression(t^3-6*t^2+5*t + 30), 't') 3 * t^2 - 6 * (2 * t) + 5 D(D(expression(t^3-6*t^2+5*t + 30), 't'), 't') 3 * (2 * t) - 6 * 2 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org ] On Behalf Of Marlin Keith Cox Sent: Friday, January 22, 2010 4:37 PM To: r-help@r-project.org Subject: [R] first and second derivative calculation I would like to calculate a first and second derivative and am having problems finding a simple solution. My syntax may be off as I am not a mathematician, so pardon ahead of time. data: t<-seq(0,4, by=.1) The function is: H(t) = t^3-6*t^2+5*t + 30 from here I plot the curve: plot(x,y ,xlab="x-values", ylab="f(x)", type="l") But would like to similarly plot the curve for both the first and second derivatives. I can calculate the derivatives by hand but would like to get R to do this for me. by hand: H'(t) = 3*t^2 - 12*t + 5 H''(t) = 6*t-12 Keith -- M. Keith Cox, Ph.D. Alaska NOAA Fisheries, National Marine Fisheries Service Auke Bay Laboratories 17109 Pt. Lena Loop Rd. Juneau, AK 99801 keith@noaa.gov marlink...@gmail.com U.S. 
(907) 789-6603
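Putting the pieces of this thread together: D() returns an unevaluated expression, and eval() turns it into numbers once `t` exists, so both derivatives can be plotted without retyping them by hand. A short sketch; the object names `d1`, `d2`, `y1`, `y2` are assumptions.

```r
f  <- expression(t^3 - 6*t^2 + 5*t + 30)
d1 <- D(f, "t")    # first derivative, as an unevaluated call
d2 <- D(d1, "t")   # second derivative

t  <- seq(0, 4, by = 0.1)
y1 <- eval(d1)     # evaluated using the t defined above
y2 <- eval(d2)

plot(t, y1, type = "l", xlab = "t-values", ylab = "derivative",
     ylim = range(y1, y2))
lines(t, y2, lty = 2)
legend("topleft", legend = c("H'(t)", "H''(t)"), lty = 1:2)
```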
Re: [R] lm on group
You can guess by looking at class(g). It is a factor. It is NOT regressing on the mean of g (i.e. 2.5 and 7.5) and you could have changed g from (0,5] and (5,10] to A and B with the same results. Read some books or help(lm) to get an idea of what the outputs mean.

Regards, Adai

newbieR wrote:

Hi all,

I have a quick question about lm on group, say I have:

    x <- 1:10
    y <- x*3
    buckets <- seq(0, 10, by=5)
    g <- cut(x, buckets)
    summary(lm(y ~ g - 1))

    Coefficients:
            Estimate Std. Error t value Pr(>|t|)
    g(0,5]     9.000      2.121   4.243  0.00283 **
    g(5,10]   24.000      2.121  11.314 3.35e-06 ***

What is it doing exactly? I guess the estimate is the mean of the y's in each group. How about other stats.. what do they exactly mean when we do lm on groups?

Thanks a lot!
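The poster's guess can be checked directly. With a single factor and no intercept, each coefficient is the mean of y in that group, and each standard error is the pooled residual standard deviation divided by the square root of the group size. A sketch using the data from the question; the names `fit`, `group_means` and `se_by_hand` are assumptions.

```r
x <- 1:10
y <- x * 3
g <- cut(x, seq(0, 10, by = 5))

fit <- lm(y ~ g - 1)

# Coefficients are the group means of y
group_means <- tapply(y, g, mean)

# Standard errors: pooled residual SD / sqrt(number of observations per group)
se_by_hand <- sigma(fit) / sqrt(as.numeric(table(g)))

rbind(coef = coef(fit), mean = group_means, se = se_by_hand)
```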
Re: [R] output
Season X is taken as the reference category. So the output "factor(season)y 10.59739" means the feed_intake is higher by about 10.6 units in Season Y _compared to_ Season X (at weight = 0, since the model also contains the weight:season interaction).

Change your levels in season. E.g.

    season <- factor(season, levels=c("Z", "X", "Y"))

which means that Z will be taken as the reference category. Also read help(contrasts).

Regards, Adai

Ashta wrote:

Hi all,

I am trying to interpret the result of the following output from lm;

    fit1 = lm(Feed_Intake ~ weight + season + weight*season)

Season has three classes (x,y,z). Results are

                             Estimate
    (Intercept)              21.51559
    weight                    2.13051
    factor(season)y          10.59739
    factor(season)z           1.30421
    weight:factor(season)y   10.1
    weight:factor(season)z   21.70288

My questions are: what is the estimate of season x? Could it be possible to change the output in the following way?

    factor(season)x
    factor(season)y
    weight:factor(season)x
    weight:factor(season)y

Thanks in advance
Re: [R] Learning R
Dear Julia, Welcome. It is good that you wish to learn more about R. R has certainly become very vast in the last few years. Do you wish to learn R for a particular reason (financial analyses, multivariate, prediction/classification, genetics)? You might get more targeted reading materials, books and websites to follow up. Regards, Adai Julia Cains wrote: Dear R helpers, Almost 15 days back I have become member of this very active and wonderful group. So far I have been only raising queries and in turn got them solved too and I really thank for the spirit this group member show when it comes to the guidance. I wish to learn R language and I have given 2 months time for this. Can anyone please guide me as how do I begin i.e. from basics to advance. R is such a vast thing to learn, so I wish to learn it step by step without getting lost at any stage. Please guide me where do I start and upgrade myself to higher level step by step. Regards Julia Only a man of Worth sees Worth in other men [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to analyze this design using lmer
Dear all, A friend of mine requested me to analyze some data she has generated. I am hoping for some advice on best way of properly analyzing the data as I have never worked with such complicated or nested designs. Here is the setup. She has taken material from 5 animals and each material is subdivided into 6 plate (30 plates in total). Each plate is then assigned as either a control or a treated with a chemical AND kept at one of three concentrations. A sample is taken daily from each plate for six continuous days and measured (180 measurement in total). Her main question is whether treatment has an effect. Here is a simulated dataset: df <- expand.grid( animal=LETTERS[1:5], group=c("Control", "Treated"), conc=c("X", "Y", "Z"), day=1:6 ) df$plate <- as.numeric(factor(apply(df[ ,1:3], 1, paste, collapse=""))) df <- df[ order(df$plate), ] df$plate <- as.factor(df$plate) rownames(df) <- NULL set.seed(1066) df$value <- runif(90, 1, 2)*(df$group=="Control") + c(0, -0.5, -0.20)[as.numeric(df$conc)] + rnorm(30)[ as.numeric(df$plate) ] + runif(180, 0.9, 1.1)*df$day + rnorm(180, sd=0.5) df[1:10, ] animal group conc day plate value 1A ControlX 1 1 3.3403510 2A ControlX 2 1 5.1042965 3A ControlX 3 1 5.4003462 ... ... 178 E TreatedZ 430 2.8558186 179 E TreatedZ 530 4.4567206 180 E TreatedZ 630 5.4542460 I have tried analyzing the data as follows: library(lme4) lmer( value ~ group + day + conc + (1 | animal/plate), data=df ) lmer( value ~ group + day + conc + (1 | animal), data=df ) lmer( value ~ group + day + conc + (1 | plate), data=df ) BUT I am not sure which of the models above is appropriate. Any advice would be very useful. Many thanks in advance. Regards, Adai __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Split column
Not very elegant but this does the trick:

    df <- cbind( var1=c(1,3,2,1,2), var2=c(3,1,1,2,3) )
    out <- df
    out[ which(df==1, arr.ind=TRUE) ] <- "1&1"
    out[ which(df==2, arr.ind=TRUE) ] <- "1&2"
    out[ which(df==3, arr.ind=TRUE) ] <- "2&2"
    outlist <- apply(out, 2, strsplit, split="&")
    do.call( "cbind.data.frame", lapply( outlist, do.call, what="rbind" ) )

      var1.1 var1.2 var2.1 var2.2
    1      1      1      2      2
    2      2      2      1      1
    3      1      2      1      1
    4      1      1      1      2
    5      1      2      2      2

Please check.

Regards, Adai

Lisaj wrote:

Hello, R users,

I have a dataset that looks like this:

    id var1 var2
     1    1    3
     2    3    1
     3    2    1
     4    1    2
     5    2    3

I want to split one column to two columns with 1 = 1 and 1, 2 = 1 and 2, 3 = 2 and 2:

    id var1.1 var1.2 var2.1 var2.2
     1      1      1      2      2
     2      2      2      1      1
     3      1      2      1      1
     4      1      1      1      2
     5      1      2      2      2

Can anyone please help how to get this done? Thanks a lot in advance

Lisa
Re: [R] replace a whole word with sub()
Isn't this more straightforward?

    w <- grep("^Ig", vec)
    vec[w] <- "0"

Regards, Adai

Giulio Di Giovanni wrote:

Dear all,

I cannot figure out how to solve a small problem (well, not for me), surely somebody can help me in few seconds.

I have a series of strings in a vector X of the type "xxx", "yyy", "zzz", "IgA", "IgG", "kkk", "IgM", "aaa". I want to substitute every ENTIRE string beginning with "Ig" with "0". So, I'd like to have "xxx", "yyy", "zzz", "0", "0", "kkk", "0", "aaa".

I can easily identify these strings with grep("^Ig", X), but if I use this criterion in the sub() function (sub("^Ig", "0", X)) I obviously get "0A", "0G" etc. I didn't expect to do it in this way and I tried with metacharacters and regexps in order to grep and substitute the whole word (\b \>, $). I don't post here my tryings, because they were obviously wrong.

Please can you help me?

Giulio
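For completeness, the sub() route the poster was attempting also works once the pattern is allowed to consume the rest of the string: ".*" after "^Ig" matches everything up to the end, so the whole element is replaced. A short sketch on the vector from the question.

```r
vec <- c("xxx", "yyy", "zzz", "IgA", "IgG", "kkk", "IgM", "aaa")

# "^Ig" anchors at the start; ".*$" swallows the remainder of the string
res <- sub("^Ig.*$", "0", vec)
res
```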
Re: [R] partial matching with grep()
Try

    grep( "\\.x$", c("a.x" ,"b.x","a.xx"), value=TRUE)

The $ means end-of-line (while ^ means start-of-line). Special characters like the dot need to be escaped with a backslash, and inside an R string the backslash itself has to be doubled, hence "\\.".

Regards, Adai

Vito Muggeo (UniPa) wrote:

dear all,

This is probably a silly question. If I type

    > grep("x",c("a.x" ,"b.x","a.xx"),value=TRUE)
    [1] "a.x"  "b.x"  "a.xx"

Instead, I would like to obtain only "a.x" "b.x". How is it possible to get this result with grep()?

many thanks for your attention,

best, vito
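A quick comparison showing why the escaping matters, using the strings from the question. An unescaped dot matches any character, so ".x$" still accepts "a.xx"; the escaped version and the fixed-string helper endsWith() (base R 3.3.0 and later) do not.

```r
x <- c("a.x", "b.x", "a.xx")

escaped   <- grep("\\.x$", x, value = TRUE)  # literal dot, then x, at the end
unescaped <- grep(".x$", x, value = TRUE)    # any character, then x, at the end
fixed     <- x[endsWith(x, ".x")]            # no regular expression involved

escaped
unescaped
fixed
```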
Re: [R] how to print the full name of the factors in summary?
It would be useful to say which package the object SJ comes from or provide a more reproducible example. Assuming that the Demand variable is continuous and you are fitting a standard lm() model, then your results look suspicious. Where are the coefficients for Month, Holiday, Season?

Jen-Chien Chang wrote:

Hi, I am wondering if there is a simple way to fix the problem I am having. For some unknown reason, I could not get the full name of the factors to be printed in the summary. I have tried to use summary.lm as well but the problem still persists.

    SJ$Weekday <- factor(SJ$Weekday,1:7,c("Mon","Tue","Wed","Thu","Fri","Sat","Sun"),ordered=T)
    attach(SJ)
    lm.SJ <- lm(Demand ~ Weekday+Month+Holiday+Season)
    summary(lm.SJ)

    Call:
    lm(formula = Demand ~ Weekday + Month + Holiday + Season)

    Residuals:
        Min      1Q  Median      3Q     Max
    -69.767 -12.224  -1.378  10.857  91.376

    Coefficients: (3 not defined because of singularities)
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  88.7091     3.3442  26.527  < 2e-16 ***
    Weekday.L    20.8132     2.8140   7.396 1.08e-12 ***
    Weekday.Q   -12.7667     2.8156  -4.534 7.99e-06 ***
    Weekday.C   -10.6375     2.8113  -3.784 0.000182 ***
    Weekday^4    -8.3325     2.8103  -2.965 0.003238 **
    -

Is there a way for summary to print the full name of the factors and levels? Say Weekday.Tue instead of Weekday.L? Thanks!

Jack Chang
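A note on the .L/.Q/.C names, which the reply above does not address: by default R uses polynomial contrasts for ordered factors, which is where those suffixes come from, while unordered factors get treatment contrasts named after the levels. The sketch below uses made-up data (the data frame d and its columns are purely illustrative, not the poster's SJ object) to show the difference.

```r
set.seed(1)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Made-up data: four weeks of daily "demand"
d <- data.frame(day = rep(1:7, times = 4),
                demand = rnorm(28, mean = 100, sd = 10))

d$wd_ord <- factor(d$day, 1:7, days, ordered = TRUE)  # ordered: polynomial contrasts
d$wd_fac <- factor(d$day, 1:7, days)                  # unordered: treatment contrasts

names_ord <- names(coef(lm(demand ~ wd_ord, data = d)))
names_fac <- names(coef(lm(demand ~ wd_fac, data = d)))

print(names_ord)  # linear, quadratic, cubic, ... terms
print(names_fac)  # one coefficient per level other than the baseline
```

So dropping ordered=T from the factor() call is one way to get per-level names, at the price of changing how the coefficients are interpreted.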
Re: [R] convert list to numeric
It's a way of extracting from a list. See help("[") or help("Extract"). dadrivr wrote: Great, that works very well. What is the purpose of double brackets vs single ones? I will remember next time to include a subset of the data, so that readers can run the script. Thanks again for your help! Benilton Carvalho wrote: it appears that what you really want is to use: task[[i]] instead of task[i] b On Nov 1, 2009, at 11:04 PM, dadrivr wrote: I would like to preface this by saying that I am new to R, so I would ask that you be patient and thorough, so that I'm not completely clueless. I am trying to convert a list to numeric so that I can perform computations on it (specifically mean-center the variable), but I am running into problems. I have imported the data set into "task" (data frame). The data frame is made of factors with variable names in the first row. I am running a loop to set a variable equal to a column in the data frame. Here is an example of my problem: for (i in 1:dim(task)[2]){ predictor.loop <- c(task[i]) predictor.loop.mc <- predictor.loop - mean(predictor.loop, na.rm=T) } I get the following error: Error in predictor.loop - mean(predictor.loop, na.rm = T) : non-numeric argument to binary operator In addition: Warning message: In mean.default(predictor.loop, na.rm = T) : argument is not numeric or logical: returning NA The column is entirely made up of numerical data, except for the header, which is a string. My problem is that I receive an error because the predictor.loop variable is not numerical, so I need to find a way to convert it. I tried using: predictor.loop <- c(as.numeric(task[i])) But I get the following error: "Error: (list) object cannot be coerced to type 'double'" If I call the variable, I can assign it to a numerical list (e.g., predictor loop <- task$variablename), but since I am assigning the variable in a loop, I have to find another way as the variable name would have to change in each loop iteration. 
Any help would be greatly appreciated. Thanks! -- View this message in context: http://old.nabble.com/convert-list-to-numeric-tp26155039p26155039.html Sent from the R help mailing list archive at Nabble.com.
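A short sketch of the single versus double bracket difference on a data frame. The data frame below is a made-up stand-in for the poster's "task", and the last part assumes (as the post suggests) that a column may have been read in as a factor.

```r
task <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60))

print(class(task[1]))    # single bracket keeps the container: a one-column data frame
print(class(task[[1]]))  # double bracket extracts the column itself: a numeric vector

# Mean-center every column, extracting each one with [[ ]]
centered <- task
for (i in seq_along(task)) {
  centered[[i]] <- task[[i]] - mean(task[[i]], na.rm = TRUE)
}

# If a column arrived as a factor, convert via character first;
# as.numeric() on a factor returns the internal level codes instead
f <- factor(c("10", "20", "60"))
f_num <- as.numeric(as.character(f))
```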
Re: [R] Removing & generating data by category
Hmm, so if I read correctly, you want to remove exactly duplicated rows. So maybe try the following to begin with.

    duplicated(newdf[ , c("id", "loc", "clm")])
     [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE

Then you can remove the duplicated rows before proceeding with what has been suggested before. You can also try unique(newdf[ , c("id", "loc", "clm")]) if you are not interested in carrying over the other corresponding variables. See help(duplicated) and help(unique).

Regards, Adai

David Winsemius wrote:

Color me puzzled. Can you express the rule more clearly in Boolean logic? If someone has five policies: 3 Life and 2 General ... is he in or out? Applying the alternate strategy to that data set I get:

    out <- tapply( dat$clm, dat$uid, paste, collapse=",")
    > out
                             A1.B1                  A2.B2          A3.B1
                         "General"         "General,Life"      "General"
                             A3.B3                  A4.B4          A5.B5
    "General,Life,General,General" "General,Life,General" "General,Life"

Please explain why you want A3.B3.
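A minimal sketch of duplicated() and unique() on key columns. The data frame newdf and its "amount" column are made up for illustration; only the column names id, loc and clm come from the thread.

```r
newdf <- data.frame(id     = c("A1", "A2", "A2", "A3"),
                    loc    = c("B1", "B2", "B2", "B3"),
                    clm    = c("General", "Life", "Life", "General"),
                    amount = c(100, 250, 300, 50))

keys <- c("id", "loc", "clm")

# TRUE marks rows whose key combination has already been seen further up
dup <- duplicated(newdf[, keys])

dedup <- newdf[!dup, ]        # keeps the first occurrence, with all other columns
uniq  <- unique(newdf[, keys]) # keeps only the key columns

print(dup)
print(dedup)
print(uniq)
```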
Re: [R] Removing & generating data by category
Here is another way based on pasting ids as hinted below: a <- data.frame(id=c(c("A1","A2","A3","A4","A5"), c("A3","A2","A3","A4","A5")), loc=c("B1","B2","B3","B4","B5"), clm=c(rep(("General"),6),rep("Life",4))) a$uid <- paste(a$id, ".", a$loc, sep="") out <- tapply( a$clm, a$uid, paste ) # can also add collapse="," $A1.B1 [1] "General" $A2.B2 [1] "General" "Life" $A3.B1 [1] "General" $A3.B3 [1] "General" "Life" $A4.B4 [1] "General" "Life" $A5.B5 [1] "General" "Life" Then here are those with single policies. > out[ which( sapply(out, length) == 1 ) ] $A1.B1 [1] "General" $A3.B1 [1] "General" David Winsemius wrote: On Oct 28, 2009, at 9:30 PM, Steven Kang wrote: Dear R users, Basically, from the following arbitrary data set: a <- data .frame (id = c (c ("A1 ","A2 ","A3 ","A4 ","A5 "),c ("A3 ","A2 ","A3 ","A4","A5")),loc=c("B1","B2","B3","B4","B5"),clm=c(rep(("General"), 6),rep("Life",4))) a id loc clm 1 A1 B1 General 2 A2 B2 General 3 A3 B3 General 4 A4 B4 General 5 A5 B5 General 6 A3 B1 General 7 A2 B2Life 8 A3 B3Life 9 A4 B4Life 10 A5 B5Life I desire removing records (highlighted records above) with identical values in each fields ("id" & "loc") but with different value of "clm" (i.e according to category) Take a look at this merge operation on separate rows of "a". > merge( a[a$clm=="Life", ], a[a$clm=="General", ] , by=c("id", "loc"), all=T) id loc clm.x clm.y 1 A1 B1 General 2 A2 B2 Life General 3 A3 B1 General 4 A3 B3 Life General 5 A4 B4 Life General 6 A5 B5 Life General Assignment of that object and selection with is.na should complete the process. > a2m <- merge( a[a$clm=="Life", ], a[a$clm=="General", ] , by=c("id", "loc"), all=T) > a2m[ is.na(a2m$clm.x) | is.na(a2m$clm.y), ] id loc clm.x clm.y 1 A1 B1 General 3 A3 B1 General Alternate methods might include paste-ing id to loc and removing duplicates. 
i.e categ <- table(a$id,a$clm) categ General Life A1 10 A2 11 A3 21 A4 11 A5 11 The desired output is id loc clm 1 A1 B1 General 6 A3 B1 General Because the data set I am working on is quite big (~ 800,000 x 20) with majority of the fields values being long strings, looping turned out to be very inefficient in comapring individual rows.. Are there any alternative efficient methods in implementing this problem? Steven __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
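Under one reading of the request (keep the id/loc pairs that occur with a single category only), the loop can be avoided entirely. This is a sketch under that assumption, not a statement of what the poster finally used; the names uid, n_clm and result are illustrative.

```r
a <- data.frame(id  = c("A1", "A2", "A3", "A4", "A5", "A3", "A2", "A3", "A4", "A5"),
                loc = c("B1", "B2", "B3", "B4", "B5"),   # recycled to 10 rows
                clm = c(rep("General", 6), rep("Life", 4)),
                stringsAsFactors = FALSE)

# One key per row, then the number of distinct categories per key
uid   <- paste(a$id, a$loc, sep = ".")
n_clm <- tapply(a$clm, uid, function(v) length(unique(v)))

# Look up each row's count by its key and keep single-category rows
keep   <- as.vector(n_clm[uid] == 1)
result <- a[keep, ]
print(result)
```

On the example data this returns rows 1 and 6, matching the desired output in the question.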
Re: [R] Is there a faster way to do it?
You might also want to consider using na.string="9" in the scan(). jim holtman wrote: Here is a faster way of doing the replacement: (provide reproducible data next time) x <- matrix(sample(6:9, 64, TRUE), 8) x [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,]87767879 [2,]77867677 [3,]77769667 [4,]99768766 [5,]69988989 [6,]97697867 [7,]79897978 [8,]99699886 x.f <- 1:8 # replacement values based on column x.ind <- which(x == 9, arr.ind=TRUE) x.ind row col [1,] 4 1 [2,] 6 1 [3,] 8 1 [4,] 4 2 [5,] 5 2 [6,] 7 2 [7,] 8 2 [8,] 5 3 [9,] 6 4 [10,] 7 4 [11,] 8 4 [12,] 3 5 [13,] 8 5 [14,] 5 6 [15,] 7 6 [16,] 1 8 [17,] 5 8 x[x.ind] <- x.f[x.ind[,'col']] x [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,]87767878 [2,]77867677 [3,]77765667 [4,]12768766 [5,]62388688 [6,]17647867 [7,]72847678 [8,]12645886 On Wed, Oct 28, 2009 at 12:55 PM, Marcio Resende wrote: #Mdarts is a matrix 2343x788 #frequencia is a vector 2343x1 # 9 in Mdarts[fri,frj] stands for my missing values which i want to replace by the value in the vector frequencia Mdarts<-t(matrix(scan("C:/GWS/CNB/dartg.txt"),ncol=nindT,nrow=nm, byrow=T)) frequencia <- matrix(scan("C:/GWS/CNB/freq.txt"),ncol=1) for (fri in 1:nindT){ for (frj in 1:nm){ Mdarts[fri,frj] <- if (Mdarts[fri,frj] == 9) frequencia[frj] else Mdarts[fri,frj] Mdarts[fri,frj] <- Mdarts[fri,frj]/1-(frequencia[frj]^2) } } Is there a faster way to it? Maybe using any apply function? Thanks in advance -- View this message in context: http://www.nabble.com/Is-there-a-faster-way-to-do-it--tp26098223p26098223.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
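The same replacement can be written with a logical mask and col(), which may be easier to read than the which(..., arr.ind=TRUE) route. The matrix and frequency vector below are small made-up stand-ins for Mdarts and frequencia; the explicit loop is included only as a reference to check against.

```r
set.seed(1)
M    <- matrix(sample(c(0, 1, 2, 9), 20, replace = TRUE), nrow = 4)  # 9 = missing
freq <- c(0.1, 0.2, 0.3, 0.4, 0.5)                                   # one value per column

# Vectorized: replace each 9 by the frequency belonging to its column
miss     <- M == 9
M2       <- M
M2[miss] <- freq[col(M)[miss]]

# Reference implementation with explicit loops
M3 <- M
for (i in seq_len(nrow(M))) {
  for (j in seq_len(ncol(M))) {
    if (M3[i, j] == 9) M3[i, j] <- freq[j]
  }
}
```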
Re: [R] New variables "remember" how they were created?
Your example is too complicated for me. But few points: 1) What do you mean by "instrument"? Do you mean variable? 2) diff(demand) is identical to demand[-1] - demand[-204] 3) system() is a built-in R function, so avoid using it as variable name 4) The variable "yd" is in the eqInvest formula and subsequently to the system formula. The variable "y.1" is in the instruments formula. Both formulas are passed onto systemfit() call. Thus I see no surprises here. Try simplifying and rephrasing please if you want further help. Regards, Adai Skipper Seabold wrote: Hello all, I hope this question is appropriate for this ML. Basically, I am wondering if when you create a new variable, if the variable holds some information about how it was created. Let me explain, I have the following code to replicate an example in a textbook (Greene's Econometric Analysis), using the systemfit package. dta <- read.table('http://pages.stern.nyu.edu/~wgreene/Text/Edition6/TableF5-1.txt', header = TRUE) attach(dta) library(systemfit) demand <- realcons + realinvs + realgovt c.1 <- realcons[-204] y.1 <- demand[-204] yd <- demand[-1] - y.1 eqConsump <- realcons[-1] ~ demand[-1] + c.1 eqInvest <- realinvs[-1] ~ tbilrate[-1] + yd system <- list( Consumption = eqConsump, Investment = eqInvest) instruments <- ~ realgovt[-1] + tbilrate[-1] + c.1 + y.1 # 2SLS greene2sls <- systemfit( system, "2SLS", inst = instruments, methodResidCov = "noDfCor" ) When I do the 2SLS fit, it seems that even though I declared y.1 as an instrument that the estimator "knows" that yd was created using y1, so it (correctly) transforms yd to use the instrument in the final estimation. So I'm wondering if yd somehow carries knowledge of how it was created. Thanks, Skipper __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] Selecting rows according to a column
Not very elegant but try:

    z <- data.frame(a = 1:5, b=10*(1:5), c = c("a", "a", "b", "b", "b") )
    z[ cbind( 1:nrow(z), match( as.character(z$c) , colnames(z) ) ) ]

If you have very few columns, you can use ifelse() too.

Regards, Adai

Gurpal Kalsi wrote:

Hi, With a data such as:

    z = data.frame(a = 1:5, b=10*a, c = c("a", "a", "b", "b", "b") )

    a  b c
    1 10 a
    2 20 a
    3 30 b
    4 40 b
    5 50 b

Can anyone suggest a way to select [1, 2, 30, 40, 50], i.e. using column "c" to specify which column is selected for each row. Many thanks G
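A sketch of the matrix-indexing idea, with one adjustment that is my own assumption rather than part of the reply: the lookup is done on a numeric matrix built from columns a and b only, so the result stays numeric instead of being coerced to character by the text column. The names num, idx, picked and picked2 are illustrative.

```r
z <- data.frame(a = 1:5, b = 10 * (1:5), c = c("a", "a", "b", "b", "b"))

# Numeric matrix of the candidate columns only
num <- as.matrix(z[c("a", "b")])

# Two-column index matrix: (row number, column number chosen by z$c)
idx    <- cbind(seq_len(nrow(z)), match(as.character(z$c), colnames(num)))
picked <- num[idx]

# With only two candidate columns, ifelse() does the same job
picked2 <- ifelse(z$c == "a", z$a, z$b)

print(picked)
```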
Re: [R] Lost all script
To stop in Rgui mode, you can try pressing the ESC key. If you are running R within emacs, switch to the R buffer and try C-c C-c to stop it. I am not sure how to recover the script (emacs usually makes a .R~ backup). If you still have the output printed to the screen or terminal, make a copy of it; you may be able to rewrite the script from it with some work. If your machine is backed up on a regular basis, then try to get the last available backup. Also note that you can view the same file (even while it is open in the R session) externally using notepad etc. So next time you face a similar situation, you can check/save externally first.

Regards, Adai

David Young wrote:

Hi all, I just had a rather unpleasant experience. After considerable work I finally got a script working and set it to run. It had some memory allocation problems when I came back so I used Windows to stop it. During that process it told me that the script had been changed and asked if I wanted to save it. Not being positive that I'd saved the very last changes I said yes. Now when I turn on R again the script is completely blank. I guess my questions are: Is there a way to interrupt a program without using Windows? Is there any way to recover my script? And a nice to know: Anybody know why it saved blank space as the new script? Thanks for any advice. A humble, and humbled, new R user.
Re: [R] basic statistics to csv
It would be useful to have a simplified version of the 'nsu' object. I am guessing it is a list of some sort (e.g. mean is single value, quantiles here returns 5 numbers) and not a matrix or dataframe (i.e. regular array). So you can have several choices here: 1) print nsu to a file. e.g. cat(nsu, file="lala", append=T) or using the sequence sink(file="lala"); print(nsu); sink() 2) compile the nsu objects into a list (if generating nsu takes time, you can save each nsu and then have a script to read them all into a list). Then extract the means across the elements in the list (e.g. sapply) and compile into a regular array before using csv. Regards, Adai lanc...@fns.uniba.sk wrote: I know that my question is like a very newbie question, but at the moment I stacked with it and I need a quick solution. I need to make an overall statistical overview of various datasets, the summary() and numSummary() functions are fully sufficient. My question is, how can I export results to a spreadsheet-like file, as a .csv. For the summary() with an "x" dataset I can use this way: su <- summary(x) write.csv(su, file = "summary.csv") The problem with this is that the csv file is rather chaotic. but when I apply the same for the numSummary(x) output like: nsu <- numSummary(x[,c("a", "b", "c")], statistics=c("mean", "sd", "quantiles"), quantiles=c(0,.25,.5,.75,1)) write.csv(nsu, file = "numsummary.csv") I get the "ERROR: cannot coerce class "numSummary" into a data.frame" message. Is there a more convenient way to get a spreadsheet-like output for the basic statistics? Many thanks for any help Tomas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
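One way to get a regular table that write.csv() accepts is to compute the statistics directly with sapply() rather than going through the numSummary object. This is a sketch with made-up data; the data frame x, the object stats and the temporary output file are all illustrative.

```r
x <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40), c = c(5, 5, 6, 8))

# One row per variable, one column per statistic
stats <- t(sapply(x, function(v) {
  c(mean = mean(v), sd = sd(v), quantile(v, c(0, 0.25, 0.5, 0.75, 1)))
}))
print(stats)

# A plain numeric matrix writes cleanly to csv
out_file <- tempfile(fileext = ".csv")
write.csv(stats, out_file)
back <- read.csv(out_file, row.names = 1, check.names = FALSE)
```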
Re: [R] writing R extensions
It sounds like you simply uncompressed your .tar.gz file and then zipped it up. If so, it should not be expected to work. You need to compile it for Windows. Try something like

    Rcmd build --binary myRpackageDir

and you may need to include the "--force" option in the command above. Also check that the R versions on the machine you compile on and the machine you install on are both recent.

Regards, Adai

micha_ wrote:

Hi, I'm working on a package and got some problems. After I've done R CMD check and build I get the package.tar.gz which I can install under Linux without any problems. Now I wanted to have a Windows version. I heard that I only have to zip the package folder. That worked once, but now the package can't be installed. I got 1 warning while I did R CMD check, and this was 1 undocumented dataset, but it was also already in the old version that worked. So is there anything I have to take care of for the Windows version? Or is there a way to check what happened? The error message in Windows is this:

    utils:::menuInstallLocal()
    updating HTML package descriptions
    library(mask)
    Error in library(mask) : 'mask' is not a valid package -- installed < 2.0.0?

Can anybody help me with this? Michael
Re: [R] request: How can we ignore a component of list having no element
Try

    x[ !sapply(x, is.null) ]

hadley wickham wrote:

An alternative approach would be to store 0 x 0 matrices instead of NULLs. This way every object in your list is a consistent type. Hadley

On Wed, Oct 15, 2008 at 5:23 AM, Muhammad Azam <[EMAIL PROTECTED]> wrote:

Dear friends, There is a list of arrays comprising different numbers of rows and columns, sometimes even NULL, such as [[2]] given below. How can we ignore [[2]] or others like it in the complete list? Any help in this regard is needed. Thanks

    [[1]]
         [,1] [,2]
    [1,]    3    1
    [2,]    3    1
    [3,]    3    1

    [[2]]
    NULL

    [[3]]
         [,1] [,2] [,3] [,4] [,5] [,6] [,7]
    [1,]    3    1    0    0    0    0    0
    [2,]    3    1    0    0    0    0    0
    [3,]    3    1    0    0    0    0    0
    [4,]    3    1    3    1    3    2    1
    [5,]    3    1    3    1    3    2    1
    [6,]    3    1    3    1    3    2    0

    [[4]]
         [,1] [,2] [,3] [,4]
    [1,]    3    0    0    0
    [2,]    3    1    3    3
    [3,]    3    1    3    3
    [4,]    3    1    3    0

OR

    x1=c(1,2,3); x2=c(1,2,3,4,6); x3=c(); x=list(x1,x2,x3)

M.Azam
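A small sketch of dropping NULL components, using the second example from the question plus one matrix. Filter() with Negate() is shown as an equivalent base R spelling; the names kept and kept2 are illustrative.

```r
x1 <- c(1, 2, 3)
x2 <- c(1, 2, 3, 4, 6)
x3 <- c()                                   # c() with no arguments is NULL
x  <- list(x1, x2, x3, matrix(c(3, 1), nrow = 1))

# Logical index: TRUE for every component that is not NULL
kept  <- x[!sapply(x, is.null)]

# Same result using Filter()
kept2 <- Filter(Negate(is.null), x)

print(length(x))     # the NULL still occupies a slot in the original list
print(length(kept))
```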
Re: [R] back transforming output from negative binomial
Ben, fantastic. Thank you for confirming it. One more question. What do you call the back-transformed variable? In my domain, people use something called the ratio of means but I am not sure if it is the same. I am not sure what the "ratio" is between.

Regards, Adai

Ben Bolker wrote:

Adaikalavan Ramasamy imperial.ac.uk> writes:

Dear all, I used the glm.nb with the default values from the MASS package to run a negative binomial regression. Here is a simple example: [snip -- thanks for the example!] The question now is how do I report the results, say, for height? Do I simply take the anti logs, i.e. 1.019613 = exp(0.019423)? I have seen one paper where they report using anti log base 10 instead of natural base but they use STATA though.

Yes, exactly. If you look at ?glm.nb you will see that it uses a log link function, and therefore you should exponentiate (anti-log) to back-transform. Natural, not base-10 logs, are used. Don't forget that back-transforming standard errors by themselves is meaningless, you have to back-transform lower and upper confidence limits ...

Ben Bolker
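The advice above can be sketched as follows, assuming the MASS package is installed and using a simple Wald interval (estimate plus or minus 1.96 standard errors on the log scale) as one possible choice of confidence limits. The exponentiated coefficient from a log-link count model is commonly reported as a rate ratio: the multiplicative change in the expected count per unit increase in the predictor. The object names est, lo, hi and rate_ratio are illustrative.

```r
library(MASS)

set.seed(123)
y      <- c(rep(0, 30), rpois(70, lambda = 2))
smoke  <- factor(sample(c("NO", "YES"), 100, replace = TRUE))
height <- c(rnorm(30, mean = 100, sd = 20), rnorm(70, mean = 150, sd = 20))

fit <- glm.nb(y ~ smoke + height)
est <- coef(summary(fit))

# Build the interval on the log scale first, then exponentiate the limits;
# exponentiating the standard error on its own is not meaningful
z  <- qnorm(0.975)
lo <- est[, "Estimate"] - z * est[, "Std. Error"]
hi <- est[, "Estimate"] + z * est[, "Std. Error"]

rate_ratio <- exp(cbind(estimate = est[, "Estimate"], lower = lo, upper = hi))
print(rate_ratio)
```

Note that random number generation has changed across R versions, so the fitted numbers need not match those in the original post.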
[R] back transforming output from negative binomial
Dear all, I used glm.nb with the default values from the MASS package to run a negative binomial regression. Here is a simple example:

    library(MASS)
    set.seed(123)
    y <- c( rep(0, 30), rpois(70, lambda=2) )
    smoke <- factor( sample( c("NO", "YES"), 100, replace=T ) )
    height <- c( rnorm(30, mean=100, sd=20), rnorm(70, mean=150, sd=20) )
    fit <- glm.nb( y ~ smoke + height )
    coef(summary(fit))
                   Estimate  Std. Error    z value     Pr(>|z|)
    (Intercept) -2.34907191 0.537610710 -4.3694664 1.245505e-05
    smokeYES    -0.03479730 0.197627539 -0.1760751 8.602349e-01
    height       0.01942373 0.003527538  5.5063142 3.664243e-08

The question now is how do I report the results, say, for height? Do I simply take the anti logs, i.e. 1.019613 = exp(0.019423)? I have seen one paper where they report using anti log base 10 instead of natural base but they use STATA though. Please kindly advise. Thank you.

Regards, Adai
Re: [R] OHLC Plot with EMA in it
Can you give us a simple example which produces the same behavior?

Michael Zak wrote:

Hi there, I have some timeseries data which I plot in an OHLC plot. In the same plot I'd like to have the EMA of this timeseries. I tried to add the EMA points to the OHLC plot with lines(), but this doesn't work. Has anyone an idea how to handle it? Regards, Michael Zak
Re: [R] Calling object outside function
I don't understand why you need to use a function at all, especially when all your function arguments are overwritten inside the function. Here is a simplified example of what you are doing:

    f <- function(x){
      x <- 5
      print(x)
    }

Therefore f(1), f(2), ..., f(1000) etc. all give you the same answer. However, you can set a default value for x, which will allow you to vary it at a later stage if you wish to.

    f <- function(x=5){
      print(x)
    }

So now f() gives 5, f(10) gives 10, ... Similarly, assuming that you want to vary the file, Loc_Mod_TAZ and Dev_Size later, you might be interested in perhaps:

    loadTestData <- function(file="TAZ_VAC_ACRES.csv", Loc_Mod_TAZ=120, Dev_Size=58){
      # Loads TAZ and corresponding vacant acres data
      TAZ_VAC_ACRES <- read.csv(file=file, header=TRUE)
      # Determines vacant acres by TAZ
      TAZDetermine <- TAZ_VAC_ACRES[TAZ_VAC_ACRES$TAZ==Loc_Mod_TAZ, 2]
      return(TAZDetermine)
    }
    out <- loadTestData()

Regards, Adai

PDXRugger wrote:

What I thought was a simple process isn't working for me. After I create multiple objects in a function (see below), how do I use those objects later in the program? I tried calling the function again and then the object I wanted and it worked the first time, but now it doesn't (I think I defined the object outside the function accidentally so then it worked, but when run properly it doesn't). I did this using Testdata(TAZDetermine) to first recall the function and then the object I wanted to use. This doesn't work and it errors that the object cannot be found. Do I use attach? This didn't seem to work either. I just want to call an object defined in a function outside of the function.
Hope you can help Cheers, JR #Function to create hypothetical numbers for process testing Testdata=function(TAZ_VAC_ACRES,Loc_Mod_TAZ,Dev_Size,TAZDetermine,Dev_Size){ #Loads TAZ and corresponding vacant acres data TAZ_VAC_ACRES= read.csv(file="I:/Research/Samba/urb_transport_modeling/LUSDR/Workspace/BizLandPrice/data/TAZ_VAC_ACRES.csv",header=TRUE); #Test Location Choice Model selected TAZ Loc_Mod_TAZ = 120 #Create test Development Dev_Size=58 #Determines vacant acres by TAZ TAZDetermine=TAZ_VAC_ACRES[TAZ_VAC_ACRES$TAZ==Loc_Mod_TAZ,2] #Displays number of vacant acres in Location Choice Model selected TAZ TAZDetermine } Testdata(TAZDetermine) error indicating the that function cannot be found even thoug its part of the argument list in the main function. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
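Objects created inside a function are local to it, so the usual way to use several of them afterwards is to return them together in a named list. The sketch below assumes a small made-up table in place of the poster's csv file; make_test_data, taz_table and the VAC_ACRES column name are hypothetical.

```r
make_test_data <- function(taz_table, loc_mod_taz = 120, dev_size = 58) {
  # Vacant acres (second column) for the selected TAZ
  vacant <- taz_table[taz_table$TAZ == loc_mod_taz, 2]

  # Return everything the caller may need, by name
  list(loc_mod_taz  = loc_mod_taz,
       dev_size     = dev_size,
       vacant_acres = vacant)
}

# Made-up stand-in for TAZ_VAC_ACRES.csv
taz_table <- data.frame(TAZ = c(119, 120, 121), VAC_ACRES = c(12.5, 40, 7))

res <- make_test_data(taz_table)
print(res$vacant_acres)   # individual results are picked out with $
print(res$dev_size)
```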
Re: [R] keep the row indexes/names when do aggregate
Not the most elegant solution but here goes.

    df <- data.frame(g=c("g1","g2","g1","g1","g2"), v=c(1,7,3,2,8))
    rownames.which.max <- function(m, col){
      w <- which.max( m[ , col] )
      return( rownames(m)[w] )
    }
    df.split <- split(df, df$g)
    ws <- sapply( df.split, rownames.which.max, col="v" )
    ws
     g1  g2
    "3" "5"
    df[ws, ]
       g v
    3 g1 3
    5 g2 8

Regards, Adai

zhihuali wrote:

Hi, R-users, If I have a data frame like this:

    x<-data.frame(g=c("g1","g2","g1","g1","g2"),v=c(1,7,3,2,8))
       g v
    1 g1 1
    2 g2 7
    3 g1 3
    4 g1 2
    5 g2 8

It contains two groups, g1 and g2. Now for each group I want the max v:

    aggregate(x$v,list(g=x$g),max)
       g x
    1 g1 3
    2 g2 8

Beautiful. But what if I want to keep the row index of (g1 3) and (g2 8) in the original x? So what I want is:

    do something
       g x
    3 g1 3
    5 g2 8

Of course it would make much more sense if the row indexes are some row names that I want to keep. Is there a simple way to do that? Thanks a lot! Z
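An alternative sketch using ave(), which returns the group maximum aligned with the original rows, so the original row names survive a plain logical subset. Unlike which.max(), this keeps all tied rows if a group has more than one maximum. The names is_max and top are illustrative.

```r
x <- data.frame(g = c("g1", "g2", "g1", "g1", "g2"), v = c(1, 7, 3, 2, 8))

# ave() repeats each group's max on every row of that group
is_max <- x$v == ave(x$v, x$g, FUN = max)

top <- x[is_max, ]
print(top)
```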
Re: [R] Is there a single command that can revert all the plotting parameters to default?
What do you mean? If you kill the existing graph, perhaps using dev.off(), the next plot generated should use default values. Is this what you want? Some plotting functions use this at the start, before modifying any parameters:

    oldpar <- par(no.readonly=T)
    on.exit(par(oldpar))

Regards, Adai

Arthur Roberts wrote:

Hi, all, This might be a stupid question. Is there a single command in R that can revert parameters to default? It is much appreciated. Best wishes, Art Roberts University of Washington
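A minimal sketch of the save-and-restore idiom outside a function. The call to pdf(NULL) is only there so the example runs without opening a window or writing a file; in interactive use you would work on your normal device. The names op, changed and restored are illustrative.

```r
pdf(NULL)                                   # throwaway device, no file is written

op <- par(no.readonly = TRUE)               # snapshot of all settable parameters

par(mfrow = c(1, 2), mar = c(2, 2, 1, 1))   # change some settings
changed <- par("mfrow")

par(op)                                     # put everything back in one call
restored <- par("mfrow")

dev.off()
```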
Re: [R] looping through variables
Perhaps what you want is get().

    apple <- rnorm(5)
    orange <- runif(5)
    fruits <- c("apple", "orange")
    fruit.data <- NULL
    for( fruit in fruits ){
      v <- get(fruit)
      fruit.data <- cbind(fruit.data, v)
    }
    colnames(fruit.data) <- fruits
    fruit.data

Here the resulting output is a matrix, which works if all of your inputs have the same length. If they don't, then you probably want to use a list instead. Also have a look at assign().

Regards, Adai

K. Fleischer wrote:

Hello everyone, I have the following problem: My analysis includes many predictor variables (>50) in the form of raster maps (asc), but I am trying to avoid having to type all their names over and over again in the analysis (e.g. for vectorisation, for deletion of NA's, etc.) So ideally I would like to store them in some way so that their names only have to be typed once and can always be referred back to. The first step would be to automate the vectorisation of the raster maps:

    # these are the raster maps which need to be combined somehow ??
    variables <- (temperature, precipitation, elevation, vegcover)
    VariablesNew=c()
    For (i in 1:length(variables)) {
      Varnew <- as.vector(variables[i])
      VariablesNew <- cbind(VariablesNew, Varnew)
    }

This should return a data frame called VariablesNew with each column representing one of the variables. So the BIG QUESTION is how to input the variable names so that they can be referred to easily and the variable itself can be pulled out, not just its name!! I believe this can't be too difficult?? Thanx in advance, Katrin
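The loop with get() can also be written with mget(), which fetches several objects by name into a named list in one step. A list also copes with inputs of different lengths; cbind is only appropriate when all lengths agree, as assumed in this sketch. The names fruit.list and fruit.data follow the example above.

```r
apple  <- rnorm(5)
orange <- runif(5)
fruits <- c("apple", "orange")

# Named list of the objects, looked up in the current environment
fruit.list <- mget(fruits, envir = environment())

# Same-length inputs can then be bound into a matrix, one column per name
fruit.data <- do.call(cbind, fruit.list)
print(fruit.data)

# Each variable itself, not just its name, is available by name
print(fruit.list[["orange"]])
```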
Re: [R] t tests/ANOVA
First check that your data satisfies the normality assumption. If yes, then start with the ANOVA test summary( fit <- aov( genomes ~ clonefed ) ) and *if* you find a significant F-value, you can see which difference is significant. i.e. post-hoc analysis. TukeyHSD( fit, "clonefed" ) You can use help("aov") etc to find out more details including examples. Regards, Adai Georgina Sarah Humphreys wrote: I have a set of data that comprises genome numbers in single eggs from three different parasite clones - 3D7, HB3, and MIX. I can draw a boxplot of the genome numbers for each clonefed but how do I carry out a t test or ANOVA to compare if the means are signifcantly different? (Data is listed below) Many thanks, Georgina Humphreys clonefedgenomes HB3 21.3 HB3 23.5 HB3 25.9 3D7 27.2 HB3 28.1 MIX 35.1 MIX 37.9 MIX 42.1 MIX 42.4 HB3 46.3 HB3 46.3 MIX 48.4 MIX 52.1 HB3 54.6 MIX 55.4 3D7 57.6 HB3 58.4 3D7 62.1 MIX 63.6 MIX 66.5 3D7 69.1 3D7 76.2 MIX 77.5 MIX 80.4 MIX 85.5 MIX 85.9 HB3 96 HB3 106.3 3D7 108.1 MIX 113.8 MIX 117.4 MIX 118 3D7 122.8 3D7 131.4 MIX 138.7 MIX 142.6 MIX 143 3D7 144 MIX 151.6 MIX 155.2 MIX 162.4 MIX 168.4 MIX 169.3 3D7 172.3 HB3 173 HB3 191.9 MIX 192.7 HB3 200 MIX 206.3 3D7 210.2 HB3 223.7 HB3 223.9 3D7 232.1 HB3 238.6 MIX 240.8 3D7 254.3 3D7 257.6 3D7 261.8 3D7 269.9 HB3 277 MIX 289.1 MIX 293.2 MIX 295.2 MIX 295.7 MIX 310.4 3D7 311.9 3D7 311.9 MIX 313.1 MIX 317.8 MIX 332.2 3D7 334.9 3D7 338.2 MIX 340 MIX 360.5 3D7 372.8 3D7 376.6 HB3 390.3 MIX 419.1 3D7 420 MIX 427.4 MIX 443 MIX 449.7 MIX 452.8 MIX 501.4 3D7 502.9 3D7 505.5 3D7 506.3 3D7 529 MIX 534.4 MIX 540.6 MIX 542 3D7 545.2 MIX 547.2 MIX 554.2 MIX 556.5 3D7 564.9 3D7 575.1 3D7 580.6 MIX 591.5 3D7 655.5 3D7 666.1 3D7 667.2 3D7 699 3D7 741.2 3D7 744.8 3D7 752.2 MIX 795.9 3D7 810.9 HB3 816.4 MIX 849.2 3D7 852.9 3D7 875.4 3D7 891.3 MIX 906.5 MIX 922.3 MIX 949.6 MIX 986.1 MIX 994.3 MIX 1005.3 MIX 1061.3 MIX 1159.5 3D7 1163.2 MIX 1177.5 3D7 1211.3 3D7 1249.7 3D7 1318.3 MIX 1579.3 MIX 1585.2 MIX 1590.3 
MIX 1788.7 MIX 2012.9 3D7 2067.4 PhD Student Division of Infection and Immunity B5-29, GBRC 120 University Place Glasgow G12 8TA Tel: 0141 330 5650 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
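A minimal sketch of the workflow described in the reply, for readers who want something to paste in. It uses only the first five values listed for each clone in the posted data, so the output says nothing about the full data set. The shapiro.test() call on the residuals and the kruskal.test() fallback are additions made here as one possible way to handle the normality check, not part of the original advice.

```r
# First five posted values per clone (a small subset, for illustration only)
dat <- data.frame(
  clonefed = factor(rep(c("HB3", "3D7", "MIX"), each = 5)),
  genomes  = c(21.3, 23.5, 25.9, 28.1, 46.3,
               27.2, 57.6, 62.1, 69.1, 76.2,
               35.1, 37.9, 42.1, 42.4, 48.4)
)

# One-way ANOVA of genome number on clone
fit <- aov(genomes ~ clonefed, data = dat)
summary(fit)

# Rough normality check on the residuals (an assumption of this sketch)
shapiro.test(residuals(fit))

# Post-hoc pairwise comparisons, only meaningful if the F-test is significant
tuk <- TukeyHSD(fit, "clonefed")
tuk

# Rank-based alternative if the normality assumption looks doubtful
kruskal.test(genomes ~ clonefed, data = dat)
```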
Re: [R] lower / upper case letters in a plot
An example would help. You generally control the titles using arguments such as main, xlab, ylab and sub in the plotting functions, or afterwards using the title() function. You can convert a string to upper or lower case using toupper() or tolower(). See help(par), help(title), help(tolower). Here is an example (swap toupper for tolower to get lower case):

string <- "My x-axis corresponding to something"
plot( rnorm(10), xlab=toupper(string) )

Regards, Adai

Jörg Groß wrote:
Hi, How can I generate lower case letters for my axis-titles? Thanks, Jörg
Re: [R] Changing a plot
One way is to keep a copy of the original and then return to it when you need it. x <- rnorm(100,1,0.5) y <- rnorm(100,1,0.5) plot(x,y,pch=16) original <- recordPlot() for( i in 1:10 ){ points( x[i], y[i], pch=19, col="yellow", cex=3) points( x[i], y[i], pch=16) Sys.sleep(1) # slow the graphs a bit replayPlot(original) } Regards, Adai R Help wrote: Hello list, I've been working on this problem for a while and I haven't been able to come up with a solution. I have a couple of functions that plot a bunch of data, then a single point on top of it. What I want is to be able to change the plot of the point without replotting all the data. Consider the following example: x = rnorm(100,1,0.5) y = rnorm(100,1,0.5) plot(x,y,pch=16) points(x[35],y[35],pch=19,col=6,cex=3) What I want to be able to do is to change the purple point to a different value without replotting everything. I know this seems like an odd suggestion, but it comes up a lot with the work I'm doing. I've prepared a package on CRAN called ResearchMethods for a course I'm working on, and there are several functions in there who's GUIs could work better if I could figure this out. If anyone has any ideas, or needs some further explanation, feel free to contact me. Thanks a lot, Sam Stewart __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rowSums()
I guess the fastest way would be:

rs <- rowSums( testDat, na.rm=TRUE )
rs[ which( rowMeans(is.na(testDat)) == 1 ) ] <- NA

since both rowSums and rowMeans are internally coded in C.

Regards, Adai

Doran, Harold wrote:
Say I have the following data:

testDat <- data.frame(A = c(1,NA,3), B = c(NA, NA, 3))
testDat
   A  B
1  1 NA
2 NA NA
3  3  3

rowSums() with na.rm=TRUE generates the following, which is not desired:

rowSums(testDat[, c('A', 'B')], na.rm=T)
[1] 1 0 6

rowSums() with na.rm=F generates the following, which is also not desired:

rowSums(testDat[, c('A', 'B')], na.rm=F)
[1] NA NA 6

I see why this occurs, but what I hope to have returned would be:

[1] 1 NA 6

To get what I want I could do the following, but normally my ideas are bad ideas and there are codified and proper ways to do things.

rr <- numeric(nrow(testDat))
for(i in 1:nrow(testDat)) rr[i] <- if(all(is.na(testDat[i,]))) NA else sum(testDat[i,], na.rm=T)
rr
[1] 1 NA 6

Is there a "proper" way to do this? In my real data, nrow is over 100,000

Thanks, Harold

sessionInfo()
R version 2.7.2 (2008-08-25)
i386-pc-mingw32
locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] MiscPsycho_1.2 lattice_0.17-13 statmod_1.3.6
loaded via a namespace (and not attached):
[1] grid_2.7.2
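For convenience, the vectorised approach from this thread can be wrapped in a small helper and checked against the example data. The function name rowSumsNA is hypothetical (it is not part of base R), and counting non-missing cells with rowSums(!is.na(x)) is just one equivalent way of detecting all-NA rows.

```r
# Hypothetical helper: row sums that ignore NAs, except that rows consisting
# entirely of NAs return NA instead of 0
rowSumsNA <- function(x) {
  rs <- rowSums(x, na.rm = TRUE)
  all.na <- rowSums(!is.na(x)) == 0   # TRUE where every cell in the row is NA
  rs[all.na] <- NA
  rs
}

testDat <- data.frame(A = c(1, NA, 3), B = c(NA, NA, 3))
rs <- rowSumsNA(testDat)
rs
```

On the three-row example this gives 1, NA and 6, the result asked for in the question, without looping over rows.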
Re: [R] selecting dataframe values that are not nulls
Ramya, you sent four near-identical emails with different subject lines. Since the list is run by unpaid volunteers, please avoid wasting people's time (and yours too) with such redundancies. Please read http://www.r-project.org/posting-guide.html and search the mailing lists and documentation. Did you receive the replies to your first request from miltinho and Moshe? If not, have a look at help(merge) with the all.x, all.y and all arguments. You might also be interested in unique, is.na, list.

Regards, Adai

Rajasekaramya wrote:
Hi, I have a dataframe with 14319 rows and 9 columns. For some rows there are null values. I want a dataframe without these null values. I want to select only those that have values != NA. Kindly let me know how to do that. Ramya
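Since the question is about dropping rows with missing values, here is a short sketch built around is.na() and its relatives. The data frame and its column names are invented for illustration; complete.cases() and na.omit() are standard functions in the stats package.

```r
# Invented example data with some missing values
df <- data.frame(id = 1:5,
                 x  = c(2.5, NA, 1.0, 4.2, NA),
                 y  = c("a", "b", NA, "d", NA))

# Keep only rows with no missing value in any column
df.complete <- df[complete.cases(df), ]
df.complete

# na.omit() gives the same rows in one step
df.omit <- na.omit(df)

# Alternatively, drop only rows where ALL of the chosen columns are missing
df.some <- df[rowSums(!is.na(df[, c("x", "y")])) > 0, ]
df.some
```

Which of the two rules is appropriate depends on whether a single missing value should disqualify a row.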
Re: [R] how to keep up with R?
I agree! The best way to learn (and remember for longer) is to teach someone else about it. And there is no reason not to repeat some of the analysis done in SAS with R. That way you can verify your outputs or compare the presentations. If you consistently find differences in the outputs, then trying to figure out the reason may lead you to better understand the methods (e.g. different optimization or estimation procedures).

Regards, Adai

Barry Rowlingson wrote:
2008/9/19 Wensui Liu <[EMAIL PROTECTED]>:
Dear Listers, I've been a big fan of R since graduate school. After working in the industry for years, I haven't had many opportunities to use R and am mainly using SAS. However, I am still forcing myself really hard to stay close to R by reading R-help and books and writing R code by myself for fun. But by and by, I start realizing I have a hard time keeping up with R and am afraid that I would totally forget how to program in R. I really like it and am very unwilling to give it up. Is there any idea how I might keep in touch with R without using it at work on a daily basis? I really appreciate it.

How about doing some kind of presentation on R at your work? It's possible that some of the old fossils don't even know about it at all, and use SAS because to them the alternative is SPSS. Do some R evangelization. Find a task that R does better than SAS (not difficult) and illustrate that to your superiors. Then when they ask how much a corporate R license is, you tell them it's free, or say it'll cost them a 2% raise in your salary, or say it will cost them your resignation if you are feeling brave! Sure you may be tied to SAS for some other reasons, but there's no reason why you can't use R for other things. Work out how to get it into your corporate framework. Encourage your colleagues to look at it for their tasks. Enthuse.

The good thing about training and evangelization is that at first you don't need mad skillz at R to do it. I have trouble understanding some of the tips on R-help (especially when do.call() is used), but you can teach new people with a good knowledge of the basics, which you should still have. Eventually the hope is that enough people use R at your workplace to develop a community where everyone keeps everyone else on their toes with R questions! Good luck! Barry
Re: [R] unix-type commandline keystrokes in the windows RGUI
Ah, I didn't realize Rterm existed (Start -> Run -> Rterm). It works with CTRL-R as you said. Thank you!

Regards, Adai

Peter Dalgaard wrote:
Adaikalavan Ramasamy wrote: ...
Anyway, here is how to do what you want:
1) Install bash on your Windows machine - You can use cygwin. Or download and unzip http://www.steve.org.uk/Software/bash/
2) Make sure the directories containing bash.exe and R.exe are in your PATH variable.
3) Start -> Run -> cmd
4) Start R.exe and now you should have your CTRL-R functionality (along with ls and other bash goodies).
Yes, I know you asked about Rgui.exe and not R.exe. But this is the best I can do.

Er, I don't think you need bash nor cygwin for this, do you? It is not normal that the shell has any influence on programs that run under it. Plain Rterm in a console should do it, if and only if linked against libreadline, which I believe it is.
Re: [R] unix-type commandline keystrokes in the windows RGUI
Well, I don't see why you need the CTRL-R functionality when you can just as rapidly and efficiently using SEARCH functionality in scripts too (CTRL-F in most applications, CTRL-S in emacs etc). BTW, I am quite familiar with Unix, Linux and Sun Solaris and what CTRL-R does (yes, I used it frequently). Which is why I am able to tell you that CTRL-R will pull up all matching commands - even commands that had failed! At least in a script environment, you tend to correct failed commands. So you know when you search scripts, it will likely be the correct command. To summarize my view, I feel that CTRL-R is appropriate for shell operations where one codes on the fly while using a search functionality and scripting is appropriate for a scientific programming software. Anyway, here is how to do what you want: 1) Install bash on your Windows machine - You can use cgywin. Or download and unzip http://www.steve.org.uk/Software/bash/ 2) Make the directory to bash.exe and R.exe are in your PATH variable. 3) Start -> Run -> cmd 4) Start R.exe and now you should have your CTRL-R functionality (along with ls and other bash goodies). Yes, I know you asked about Rgui.exe and not R.exe. But this is the best I can do. By all means go bother the R developers (most of whom I suspect are on the mailing list). I will be interested in what they say. Regards, Adai mfrumin wrote: Adaikalavan, thanks. Perhaps I was not so specific enough in what I want, for those not so familiar with unix commandline featuers. I'm looking for the 'reverse search' functionality where you hit CTRL-R, then start typing a bit of text and it finds previous commands with that bit of text, which you just hit enter to execute. I already do write tons of code/scripts in R (using Emacs in fact!). But one of the great features of R/SPSS/Matlab/etc is that they are interactive environments. Thus, I spend lots of time issuing commands as well as writing code. 
I want to be able to search back through those commands as rapidly and efficiently as you can in the unix (and R unix) commandline. Another way to think about this is -- the unix commandline environment is a scripting environment where you can use emacs. Yet users of unix love the CTRL-R functionality anyway (they wrote it!). So, any suggestions to help do what I specifically asked, or should I go bother the R developers? thanks, Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] unix-type commandline keystrokes in the windows RGUI
Why not use a script? I feel that it is much better than using the history via [CTRL]-R in unix, which also pulls up erroneous commands. A script is vital for statistical analysis and research where you may want to, or be asked to, repeat or reproduce the analysis months later. Rgui (on Windows) has a built-in script editor. There are many external editors capable of working with R. My recommendation is to use emacs via ESS (Emacs Speaks Statistics), which works in most, if not all, operating systems and has a Unix feel to it. If you insist on wanting to use [CTRL]-R like features, then have a look at history() within R. You can also try installing cygwin or bash etc. and see if that works from the DOS prompt.

Regards, Adai

mfrumin wrote:
Hi all, I am generally quite fond of the unix commandline keystrokes (e.g. searching back in your history with [CTRL]-R, and cutting/pasting with [CTRL]-K/Y) which work in the R commandline in *nix. Does anyone know if there's any way to get similar functionality in the Windows RGUI? I know that as of now, [CTRL]-A and -E do the same as unix (beginning and end of line) and [CTRL]-Y does a paste, but [CTRL]-K crops from the cursor to the end of the line but doesn't put the text into the clipboard. The most important thing I want is the [CTRL]-R functionality, which is so poorly approximated by pressing the up arrow a million times. I've searched the archives and didn't find anything about this. Any thoughts? Thanks, Mike
Re: [R] histogram
If I understand you correctly, you already pre-computed the frequencies and bin widths and want to display them as a histogram. If correct, then what you are asking for is analogous to what bxp() is to boxplot. I am not sure if such a function exists. Instead you can think of the task as drawing a bunch of rectangles (perhaps using symbols?). Or you can hack the hist() code and try br<- c(0,20,30,40,50,60,70,80,100) dens <- runif( length(br) - 1 ) r <- structure(list(breaks = br, density = dens), class = "histogram") plot(r, main="Felipe's Histogram") However, I do emphasize that this is a hack. If you have the original data that you used to calculate the densities, consider using the breaks argument with hist(). It is better to use tried and tested codes. Regards, Adai Felipe wrote: i calculated the density and wanna do something like this separate in 0-19-29-39-49-59-69-79-99 and put in these spaces 8 densities .. 0.something i have the frequency in % and divided already in 20 or 10 to get the density i tried and tried..made breaks vector to separate but couldn't put the other vector with the frequency density onit directly anyone know how to do it?? tks [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
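A slightly fuller version of the hack is sketched below, starting from percentages per bin as described in the question. The percentages are hypothetical (none were posted), and the extra list components (counts, mids, xname, equidist) are filled in on the assumption that they mirror what hist() itself returns. The densities are chosen so that the bar areas sum to one.

```r
br  <- c(0, 20, 30, 40, 50, 60, 70, 80, 100)
pct <- c(10, 15, 20, 20, 15, 10, 5, 5)      # hypothetical percentages per bin

# Density = proportion in the bin divided by the bin width
dens <- (pct / 100) / diff(br)

r <- structure(
  list(breaks   = br,
       counts   = pct,                       # stands in for raw counts
       density  = dens,
       mids     = (br[-1] + br[-length(br)]) / 2,
       xname    = "x",
       equidist = FALSE),                    # bin widths differ
  class = "histogram")

# freq = FALSE asks for densities, so unequal bin widths are drawn correctly
plot(r, freq = FALSE, main = "Histogram from pre-computed densities", xlab = "x")

# Sanity check: total area under the bars
sum(r$density * diff(r$breaks))
```

Plotting densities rather than raw percentages matters here because the first and last bins are twice as wide as the others.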
Re: [R] difference of two data frames
It would be useful to have both data frames indexed with a unique identifier, such as in rownames etc. Without that information, you could try the same approach that duplicated() uses, "pasting together a character representation of rows" using "|" (or any other separator).

keys1 <- apply(DF1, 1, paste, collapse="|")
keys1
[1] "1|a" "2|b" "3|c" "4|d" "5|e" "6|f"
duplicated(keys1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE

keys2 <- apply(DF2, 1, paste, collapse="|")
keys2
[1] "1|a" "2|b" "3|c"
duplicated(keys2)
[1] FALSE FALSE FALSE

The duplicated() check is necessary to ensure that the generated key is truly unique. You might want to experiment and see if you can create a unique key using just a few columns.

keys1 %in% keys2
[1] TRUE TRUE TRUE FALSE FALSE FALSE

w <- !( keys1 %in% keys2 )
DF1[ w, ]
  V1 V2
4  4  d
5  5  e
6  6  f

Regards, Adai

joseph wrote:
Hi Jorge, both commands work; can you extend it to several columns? The reason I am asking is that in my real data the uniqueness of the rows is made of all the columns; in other words V1 might have duplicates. Thanks

- Original Message
From: Jorge Ivan Velez <[EMAIL PROTECTED]>
To: joseph <[EMAIL PROTECTED]>
Cc: r-help@r-project.org
Sent: Sunday, September 14, 2008 10:23:33 AM
Subject: Re: [R] difference of two data frames

Hi Joseph, Try this:

DF1[!DF1$V1%in%DF2$V1,]
subset(DF1,!V1%in%DF2$V1)

HTH, Jorge

On Sun, Sep 14, 2008 at 12:49 PM, joseph <[EMAIL PROTECTED]> wrote:
Hello, I have 2 data frames DF1 and DF2 where DF2 is a subset of DF1:

DF1= data.frame(V1=1:6, V2= letters[1:6])
DF2= data.frame(V1=1:3, V2= letters[1:3])

How do I create a new data frame of the difference between DF1 and DF2

newDF=data.frame(V1=4:6, V2= letters[4:6])

In my real data, the rows are not in order as in the example I provided.
Thanks much Joseph
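The key-based comparison discussed in this thread can be sketched as a self-contained example in which the rows of DF2 are deliberately out of order. Building the keys with do.call(paste, ...) instead of apply() is a choice made for this sketch; it pastes the columns together element-wise without first converting the data frame to a character matrix. Note that the rows are selected with a logical vector: indexing DF1 with the key strings themselves would look them up as row names and find nothing.

```r
DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6])
DF2 <- DF1[c(3, 1, 2), ]                 # a subset of DF1, rows shuffled

# One key per row, built from all columns
keys1 <- do.call(paste, c(DF1, sep = "|"))
keys2 <- do.call(paste, c(DF2, sep = "|"))

# Check that the keys really identify rows uniquely
stopifnot(!any(duplicated(keys1)), !any(duplicated(keys2)))

# Rows of DF1 whose key does not occur in DF2
newDF <- DF1[!(keys1 %in% keys2), ]
newDF
```

If a column could itself contain the separator character, a different separator should be chosen so that distinct rows cannot produce the same key.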
Re: [R] Again, about boxplot
I just changed the 'at' argument and added an 'xlim' option. boxplot(len ~ dose, data = ToothGrowth, boxwex = 0.25, at = 1:3, subset = which(supp == "VC"), col = "yellow", main = "Guinea Pigs' Tooth Growth", xlab = "Vitamin C dose mg", ylab = "tooth length", xlim=c(1, 6), ylim=c(0, 35), yaxs="i") abline(v=3.5) boxplot(len ~ dose, data = ToothGrowth, add = TRUE, boxwex = 0.25, at = 1:3 + 3, subset = which(supp == "OJ"), col = "orange") legend("bottomright", c("Ascorbic acid", "Orange juice"), fill = c("yellow", "orange")) Regards, Adai cathelf wrote: Thank you for your guys reply for my previous question. But I got one more question about the boxplot. With the code in the R-help: boxplot(len ~ dose, data = ToothGrowth, boxwex = 0.25, at = 1:3 - 0.2, subset = supp == "VC", col = "yellow", main = "Guinea Pigs' Tooth Growth", xlab = "Vitamin C dose mg", ylab = "tooth length", ylim = c(0, 35), yaxs = "i") boxplot(len ~ dose, data = ToothGrowth, add = TRUE, boxwex = 0.25, at = 1:3 + 0.2, subset = supp == "OJ", col = "orange") legend(2, 9, c("Ascorbic acid", "Orange juice"), fill = c("yellow", "orange")) I got 6 boxplots, which is ordered as "0.5, 0.5, 1, 1, 2, 2" How can I reorder the 6 boxplots as "0.5, 1, 2, 0.5, 1, 2"? Thank you very much! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] subsetting of factor
help(rowttests) says that fac needs to be a factor. So how about ? m <- matrix( rnorm(30), nc=6 ) genotype <- c("a", "a", "b", "b", "c", "c") w1 <- which( genotype %in% c("a", "b") ) w2 <- which( genotype %in% c("a", "c") ) w3 <- which( genotype %in% c("b", "c") ) list( ab = rowttests( m[ , w1], factor( genotype[w1] ) ), ac = rowttests( m[ , w2], factor( genotype[w2] ) ), bc = rowttests( m[ , w3], factor( genotype[w3] ) ) ) Regards, Adai Hui-Yi Chu wrote: Dear R list, I think my question maybe easy for you but I really spent entire day to resolve it. Say I have a matrix, rows are 6000 genes, columns(1-6) are 3 genotypes (a,b,c) with 2 repeat. I have to use two groups each time for t-test, a vs. c or b vs. c, but I dont know how to write correct codes. Below is my codes, the last two lines are needed to be corrected library("genefilter") ef <- exprs(esetsub) kk <- factor(esetsub$genotype == c("a", "c")) tt <- rowttests(ef[,c(1,2,5,6)], kk) ps. column 1-6 is a,a,b,b,c,c depending on the document, the kk should be a factor.. Any suggestions are really appreciated!! Best regards, Hui-Yi [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Calculate mean/var by ID
AFAIK, tapply() only works for one variable (apart from the grouping variable). It might be perhaps better to use split() here: df <- data.frame(ID = c(111, 111, 111, 178, 178, 138, 138, 138, 138), value = c(5, 6, 2, 7, 3, 3, 8, 7, 6), Seg = c(2, 2, 2, 4, 4, 1, 1, 1, 1) ) df.s <- split( df, df$ID ) out <- sapply( df.s, function(m){ c( mu=mean(m$value), var=var(m$value), min=min(m$Seg), max=max(m$Seg) ) }) out <- t(out) mu var min max 111 4.33 4.33 2 2 138 6.00 4.67 1 1 178 5.00 8.00 4 4 You could also have used range() here instead of calculating min and max separately but naming the resulting columns becomes a bit tricky. Regards, Adai PS: If you do a dput() on a subset of the data, you can get a simple reproducible example that other R users can easily read in. Julia Liu wrote: Adai, Thank you so much for your help. I like your code the best. :) So simple. I have another question though, if you don't mind. I'd like to include another variable in "res". This variable defines the segmentation of each person (ranges, say, from 1 to 4). ID value Seg 111 5 2 111 6 2 111 2 2 178 7 4 178 3 4 138 3 1 138 8 1 138 7 1 138 6 1How to do this? Thank you so much for the help. 
Sincerely Julia --- On Thu, 9/11/08, Adaikalavan Ramasamy <[EMAIL PROTECTED]> wrote: From: Adaikalavan Ramasamy <[EMAIL PROTECTED]> Subject: Re: [R] Calculate mean/var by ID To: "Jorge Ivan Velez" <[EMAIL PROTECTED]> Cc: "liujb" <[EMAIL PROTECTED]>, r-help@r-project.org Date: Thursday, September 11, 2008, 10:28 PM A slight variation of what Jorge has proposed is: f <- function(x) c( mu=mean(x), var=var(x) ) do.call( "rbind", tapply( df$value, df$ID, f ) ) mu var 111 4.33 4.33 138 6.00 4.67 178 5.00 8.00 Regards, Adai Jorge Ivan Velez wrote: Dear Julia, Try also x=read.table(textConnection("IDvalue 111 5 111 6 111 2 178 7 178 3 138 3 138 8 138 7 138 6"),header=TRUE) closeAllConnections() attach(x) do.call(rbind,tapply(value,ID, function(x){ res=c(mean(x,na.rm=TRUE),var(x,na.rm=TRUE)) names(res)=c('Mean','Variance') res } ) ) HTH, Jorge On Thu, Sep 11, 2008 at 1:45 PM, liujb <[EMAIL PROTECTED]> wrote: Hello, I have a data set that looks like this. IDvalue 111 5 111 6 111 2 178 7 178 3 138 3 138 8 138 7 138 6 . . . I'd like to calculate the mean and var for each object identified by the ID. I can in theory just loop through the whole thing..., but is there a easier way/command which let me calculate the mean/var by ID? Thanks, Julia -- View this message in context: http://www.nabble.com/Calculate-mean-var-by-ID-tp19440461p19440461.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] Calculate mean/var by ID
A slight variation of what Jorge has proposed is: f <- function(x) c( mu=mean(x), var=var(x) ) do.call( "rbind", tapply( df$value, df$ID, f ) ) mu var 111 4.33 4.33 138 6.00 4.67 178 5.00 8.00 Regards, Adai Jorge Ivan Velez wrote: Dear Julia, Try also x=read.table(textConnection("IDvalue 111 5 111 6 111 2 178 7 178 3 138 3 138 8 138 7 138 6"),header=TRUE) closeAllConnections() attach(x) do.call(rbind,tapply(value,ID, function(x){ res=c(mean(x,na.rm=TRUE),var(x,na.rm=TRUE)) names(res)=c('Mean','Variance') res } ) ) HTH, Jorge On Thu, Sep 11, 2008 at 1:45 PM, liujb <[EMAIL PROTECTED]> wrote: Hello, I have a data set that looks like this. IDvalue 111 5 111 6 111 2 178 7 178 3 138 3 138 8 138 7 138 6 . . . I'd like to calculate the mean and var for each object identified by the ID. I can in theory just loop through the whole thing..., but is there a easier way/command which let me calculate the mean/var by ID? Thanks, Julia -- View this message in context: http://www.nabble.com/Calculate-mean-var-by-ID-tp19440461p19440461.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
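For reference, a runnable version of the tapply() approach using the nine rows posted in the question, followed by an aggregate() alternative. The aggregate() call is an addition here, not something proposed in the thread; it returns a data frame whose value column is a two-column matrix (mu and var).

```r
df <- data.frame(ID    = c(111, 111, 111, 178, 178, 138, 138, 138, 138),
                 value = c(5, 6, 2, 7, 3, 3, 8, 7, 6))

# Summary statistics for one group
f <- function(x) c(mu = mean(x), var = var(x))

# tapply() returns one result per ID; rbind() stacks them into a matrix
res <- do.call("rbind", tapply(df$value, df$ID, f))
round(res, 2)

# Alternative with the formula interface of aggregate()
agg <- aggregate(value ~ ID, data = df, FUN = f)
agg
```

Both results are ordered by ID (111, 138, 178) rather than by order of appearance in the data.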
Re: [R] How to load functions in R
Strange. source() should read all the function in that file unless there was a syntax error or something else preventing the other function from being parsed correctly. Could you send us a simplified example that reproduces this problem? Thanks. Regards, Adai [EMAIL PROTECTED] wrote: Hello, It seems that all methods work. Source() however loads only the last function. with save(a,b,file="path") i can save more than 1 function. Thanks a lot, Mihai -Ursprüngliche Nachricht- Von: Yihui Xie [mailto:[EMAIL PROTECTED] Gesendet: Donnerstag, 11. September 2008 16:48 An: [EMAIL PROTECTED] Cc: Mirauta, Mihai; r-help@r-project.org Betreff: Re: [R] How to load functions in R We may just read them in the R console instead of an external editor, and "fix()" or "edit()" them when we need to make any modifications. A trivial advantage of saving them as an image file in Windows is that you can double-click the file and R will be started with these objects loaded automatically. Anyway, to save the functions as ASCII files or even write a package are also good solutions :-) Regards, Yihui On Thu, Sep 11, 2008 at 10:34 PM, Adaikalavan Ramasamy <[EMAIL PROTECTED]> wrote: I would recommend saving the functions into a separate file and then using source() as bartjoosen suggested. I do not recommend using save() here because the output is non-readable (even when using ascii=TRUE option). Which means that you have to load() it, then copy-and-paste into an editor before making changes and then running it again in R and then save() again. Another better option is to consider making your own package. It may sound complicated but once you mastered it, it makes your functions more portable and encourages you to document it. Further, the function package.skeleton() simplifies much of it. Regards, Adai Yihui Xie wrote: Hi, you may save your functions somewhere on your disk using "save()" and load them next time when you want to use them. 
See ?save and ?load Yihui On Thu, Sep 11, 2008 at 9:30 PM, <[EMAIL PROTECTED]> wrote: Hello, I am trying to use self created functions in other scripts than the one where they are stored. For the moment I am using the following structure of commands to do that: 1. Load the text file with the functions in the current script: x=parse("path") 2. transform the tex in a function: f1=eval(x[1]), f2=eval(x[2]) if more than one function is stored in the text file 3. use the functions as normal Is there another possibility to do the same? Thank you, Mihai Mirauta [[alternative HTML version deleted]] -- Yihui Xie <[EMAIL PROTECTED]> Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086 Mobile: +86-15810805877 Homepage: http://www.yihui.name School of Statistics, Room 1037, Mingde Main Building, Renmin University of China, Beijing, 100872, China __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
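To illustrate that source() defines every function in a file, not just the last one, here is a small self-contained check. The file is a temporary one created by the sketch itself, the function names square and cube are made up, and sourcing into a separate environment (local = env) is only done to keep the demonstration tidy.

```r
# Write two function definitions to a temporary script file
fn.file <- tempfile(fileext = ".R")
writeLines(c(
  "square <- function(x) x^2",
  "cube   <- function(x) x^3"
), fn.file)

# Evaluate the file; every top-level expression in it is run in turn
env <- new.env()
source(fn.file, local = env)

ls(env)            # both functions should be listed
env$square(3)
env$cube(2)

unlink(fn.file)    # clean up the temporary file
```

In everyday use a plain source("myfunctions.R") at the top of a script is enough; if only the last function appears afterwards, a syntax error or an unbalanced brace earlier in the file is a likely cause.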
Re: [R] How to load functions in R
I would recommend saving the functions into a separate file and then using source() as bartjoosen suggested.

I do not recommend using save() here because the output is non-readable (even when using the ascii=TRUE option). This means that you have to load() it, copy-and-paste it into an editor before making changes, run it again in R and then save() it again.

Another, better option is to consider making your own package. It may sound complicated, but once you have mastered it, it makes your functions more portable and encourages you to document them. Further, the function package.skeleton() simplifies much of it.

Regards, Adai

Yihui Xie wrote:

Hi, you may save your functions somewhere on your disk using "save()" and load them next time when you want to use them. See ?save and ?load

Yihui

On Thu, Sep 11, 2008 at 9:30 PM, <[EMAIL PROTECTED]> wrote:

Hello,

I am trying to use self-created functions in scripts other than the one where they are stored. For the moment I am using the following sequence of commands to do that:

1. Load the text file with the functions in the current script: x=parse("path")
2. Transform the text into functions: f1=eval(x[1]), f2=eval(x[2]) if more than one function is stored in the text file
3. Use the functions as normal

Is there another possibility to do the same?

Thank you,
Mihai Mirauta
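As a rough illustration of the package.skeleton() route, the sketch below builds a skeleton from two made-up functions. The names f1, f2 and myfuns, and the use of a temporary directory, are assumptions for the example only.

```r
# Two hypothetical functions, held in their own environment
fn.env <- new.env()
assign("f1", function(x) x + 1, envir = fn.env)
assign("f2", function(x) x * 2, envir = fn.env)

# Create the package skeleton under a temporary directory
pkg.dir <- tempfile("pkgdemo")
dir.create(pkg.dir)
package.skeleton(name = "myfuns", list = c("f1", "f2"),
                 environment = fn.env, path = pkg.dir)

# The skeleton holds a DESCRIPTION file plus R/ and man/ directories
list.files(file.path(pkg.dir, "myfuns"))
```

The generated files are only a starting point: the help pages under man/ still have to be filled in before the package can be built and installed.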
Re: [R] subsetting the gene list
Have you tried reading some of the material from the BioConductor workshops at http://bioconductor.org/workshops/ ?

Here is a simplistic way of proceeding, assuming mat is the normalized expression matrix (genes in rows, samples in columns) and cl holds the class label of each sample:

    ## Calculate p-values from a t-test on each gene (row)
    p <- apply( mat, 1, function(x) t.test( x ~ cl )$p.value )

    ## Subset the genes below the p-value cutoff
    mat.sub <- mat[ p < 0.001, ]

    ## Cluster
    heatmap( mat.sub )

Regards, Adai

Abhilash Venu wrote:

Hi all,

I am working on single-color expression data using limma. I would like to perform a cluster analysis after selecting the differentially expressed genes based on the P value (say 0.001). As far as I know, I have to subset these selected genes from the normalized data (MA) to retrieve their distribution across the samples. But I am wondering whether I can do this with an R script?

I would appreciate any help.
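To make the three steps above concrete, here is a small self-contained sketch on simulated data. The matrix, the group labels and the 0.001 cutoff are stand-ins; in a limma workflow the p-values would more likely come from topTable() than from a row-wise t-test.

```r
set.seed(1)
# Simulated stand-in for normalized expression data: 200 genes x 10 samples
mat <- matrix(rnorm(200 * 10), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("s", 1:10)))
cl <- factor(rep(c("ctrl", "trt"), each = 5))

# Make the first 10 genes genuinely different between the two groups
mat[1:10, cl == "trt"] <- mat[1:10, cl == "trt"] + 5

# Row-wise t-test p-values, then keep the genes below the cutoff
p <- apply(mat, 1, function(x) t.test(x ~ cl)$p.value)
mat.sub <- mat[p < 0.001, , drop = FALSE]
dim(mat.sub)

# heatmap() clusters rows and columns, so it needs at least 2 rows
if (nrow(mat.sub) >= 2) heatmap(mat.sub)
```

drop = FALSE keeps mat.sub a matrix even when only one gene passes the cutoff.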
Re: [R] give all combinations
Yuan Jian, sending 9 emails within the span of a few seconds, all with similar text, is very confusing to say the least!

Carl, look up combinations() and permutations() in the gtools package.

For the two-case scenario, you can use combinations():

    v <- c("a","b","c")
    library(gtools)
    tmp <- combinations(3, 2, v, repeats.allowed=TRUE)
    apply( tmp, 1, paste, collapse="" )
    [1] "aa" "ab" "ac" "bb" "bc" "cc"

For more than two cases, I don't know of an elegant way except to generate all possible permutations and then eliminate those with the same ingredients. This function will be slow for large numbers!

    multiple.combinations <- function( vec, times ){
      input <- vector( mode="list", times )
      for(i in 1:times) input[[i]] <- vec
      out <- expand.grid( input )
      out <- apply( out, 1, function(x) paste( sort(x), collapse="" ) )
      unique.out <- unique(out)
      return(unique.out)
    }

    multiple.combinations( v, 3 )
    [1] "aaa" "aab" "aac" "abb" "abc" "acc" "bbb" "bbc" "bcc" "ccc"

    multiple.combinations( v, 6 )
     [1] "aaaaaa" "aaaaab" "aaaaac" "aaaabb" "aaaabc" "aaaacc" "aaabbb" "aaabbc"
     [9] "aaabcc" "aaaccc" "aabbbb" "aabbbc" "aabbcc" "aabccc" "aacccc" "abbbbb"
    [17] "abbbbc" "abbbcc" "abbccc" "abcccc" "accccc" "bbbbbb" "bbbbbc" "bbbbcc"
    [25] "bbbccc" "bbcccc" "bccccc" "cccccc"

Regards, Adai

Carl Witthoft wrote:

I seem to be missing something here: given a set X:{a,b,c,whatever...}, the mathematical definition of 'permutation' is the set of all possible sequences of the elements of X. The definition of 'combination' is all elements of 'permutation' which cannot be re-ordered to become a different element.

example: X:{a,b,c}
perm(X) = ab, ac, bc, ba, ca, cb
comb(X) = ab, ac, bc

So maybe a better question for this mailing list is: are there functions available in any R package which produce perm() and comb() (perhaps as well as other standard combinatoric functions)?

Carl
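For larger cases, a base-R alternative that avoids generating every permutation first is sketched below. It is not from the thread: it uses the standard "stars and bars" correspondence, where each combination of `times` positions out of `n + times - 1` maps to one multiset. The name multiset.combinations is made up for this example.

```r
multiset.combinations <- function(vec, times) {
  n <- length(vec)
  # Each column of combn() is a strictly increasing index set; subtracting
  # 0, 1, ..., times-1 turns it into a non-decreasing index set in 1..n
  idx <- combn(n + times - 1, times) - (seq_len(times) - 1)
  apply(idx, 2, function(i) paste(vec[i], collapse = ""))
}

v <- c("a", "b", "c")
multiset.combinations(v, 2)
length(multiset.combinations(v, 6))   # 28 combinations, as listed above
```

This builds only choose(n + times - 1, times) candidates rather than n^times rows, so it should scale better than the expand.grid() approach.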
Re: [R] Cluster
Try reading help(hclust) and help(matplot) and run the examples given in the documentation. If that doesn't work, try posting again with a simple reproducible example.

Regards, Adai

Marco Chiapello wrote:

Hi all,

I'm trying to do a cluster analysis, but I don't know if it's possible in the way that I want. I have a data set like the following:

    115/114 116/114 117/114
       0.45    0.72    0.41
       1.16    0.63    0.91
       0.42    0.94    0.61

My real data set is just a bit bigger: 610 entries. I want to plot each row on the same graph as a line (see the attached file). Then, if possible, I want to perform a cluster analysis. The final perfect result would be a graph with many lines, with the lines of each cluster drawn in the same color.

Any advice?

Marco
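A minimal sketch of what hclust() and matplot() can do here, using the three example rows from the question. Cutting the tree into k = 2 groups is an arbitrary choice for illustration, and the default Euclidean distance and complete linkage may not suit the real data.

```r
# The three example rows; columns are the ratios 115/114, 116/114, 117/114
x <- matrix(c(0.45, 0.72, 0.41,
              1.16, 0.63, 0.91,
              0.42, 0.94, 0.61),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("115/114", "116/114", "117/114")))

hc  <- hclust(dist(x))      # hierarchical clustering of the rows
grp <- cutree(hc, k = 2)    # assign each row to one of k = 2 clusters

# matplot() draws one line per column, so transpose to get one line per row;
# col = grp gives every row in the same cluster the same color
matplot(t(x), type = "l", lty = 1, col = grp, xaxt = "n",
        xlab = "ratio", ylab = "value")
axis(1, at = seq_len(ncol(x)), labels = colnames(x))
```

With 610 rows the same calls apply unchanged; only k (or a cut height passed to cutree() via h) needs choosing.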
Re: [R] Maintaining repeated ID numbers when transposing with reshape
Not the prettiest code, but it returns what you want. It might be slow for large data frames, and it assumes the rows are sorted by TEST within each ID.

    df <- data.frame( ID=c(1,1,1,1,2,2),
                      TEST=factor(c("A","A","B","C","B","B")),
                      RESULT=c(17,12,15,12,8,9) )

    big.out <- list(NULL)
    for( uID in unique(df$ID) ){
      m <- df[ df$ID == uID, , drop=FALSE ]
      ## number the repeats of each test within this ID: 1, 2, ...
      run.order <- unlist(sapply( table(m$TEST), function(x) if(x > 0) 1:x) )
      m <- cbind( m, run.order=run.order )
      nr <- max(run.order)
      out <- matrix( nrow=nr, ncol=nlevels(m$TEST),
                     dimnames=list( rep(uID, nr), levels(m$TEST) ))
      for(i in 1:nrow(m)) out[ m$run.order[i], as.character(m$TEST[i]) ] <- m$RESULT[i]
      big.out[[uID]] <- out
    }
    do.call( "rbind", big.out )
       A  B  C
    1 17 15 12
    1 12 NA NA
    2 NA  8 NA
    2 NA  9 NA

Regards, Adai

jcarmichael wrote:

Thank you for your suggestion, I will play around with it. I guess my concern is that I need each test result to occupy its own "cell" rather than have one or more in the same row.

Adaikalavan Ramasamy-2 wrote:

There might be a more elegant way of doing this, but here is a way of doing it without reshape().

    df <- data.frame( ID=c(1,1,1,1,2,2),
                      TEST=c("A","A","B","C","B","B"),
                      RESULT=c(17,12,15,12,8,9) )
    df.s <- split( df, df$ID )
    out <- sapply( df.s, function(m) tapply( m$RESULT, m$TEST, paste, collapse="," ) )
    t(out)
      A       B     C
    1 "17,12" "15"  "12"
    2 NA      "8,9" NA

This is not the same output as you wanted, but it makes more sense unless you have a reason to prioritize 17 over 12 in the first row.

Regards, Adai

jcarmichael wrote:

I have a dataset in "long" format that looks something like this:

    ID  TEST  RESULT
    1   A     17
    1   A     12
    1   B     15
    1   C     12
    2   B     8
    2   B     9

Now what I would like to do is transpose it like so:

    ID  TEST A  TEST B  TEST C
    1   17      15      12
    1   12      .       .
    2   .       8       .
    2   .       9       .

When I try:

    reshape(mydata, v.names="result", idvar="id", timevar="test", direction="wide")

it gives me only the first occurrence of each test for each subject. How can I transpose my dataset in this way without losing information about repeated tests?

Any help or guidance would be appreciated! Thanks!
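An alternative sketch that stays with reshape(): number the repeats of each test within each ID first, then use ID plus that counter as the row identifier so that no repeat is dropped. The column name run is made up for this example, and the approach is a suggestion rather than anything proposed in the thread.

```r
df <- data.frame(ID = c(1, 1, 1, 1, 2, 2),
                 TEST = c("A", "A", "B", "C", "B", "B"),
                 RESULT = c(17, 12, 15, 12, 8, 9))

# Number the repeats within each ID/TEST pair: 1, 2, ...
df$run <- ave(df$RESULT, df$ID, df$TEST, FUN = seq_along)

# ID and run together identify an output row, so repeated tests are kept
wide <- reshape(df, idvar = c("ID", "run"), timevar = "TEST",
                direction = "wide")
wide
```

The result has one row per ID/run pair and the columns RESULT.A, RESULT.B and RESULT.C, with NA where a test was not repeated. Unlike the loop above, it does not depend on the rows being sorted by TEST.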
Re: [R] Simple programming problem with 'with' or 'data ='
with(gdsgraph, boxplot(BReT3T5T ~ gds3lev)) is equivalent to boxplot(gdsgraph$BReT3T5T ~ gdsgraph$gds3lev), so all the terms are found. In your function call, gds3lev is not found within the scope of gdsbox(): the function body looks for it where gdsbox() was defined, not in the data frame supplied to with().

This function has too many hard codings to be of much use generically. Consider rewriting it as (untested function)

    gdsbox <- function(y, x, data, main.title=NULL, ...){
      ## y and x are column names of data, given as character strings
      if(is.null(main.title)){
        main.title <- paste('Boxplot of', y, "for GDS groups")
      }
      boxplot( data[[y]] ~ data[[x]], main=main.title, ... )
    }

but why bother when you can call the more robust, well-tested and well-documented boxplot() directly?

If you still want help, can you give a simple reproducible example?

Regards, Adai

Peter Flom wrote:

Hello

I wrote a simple program to modify a boxplot:

    gdsbox <- function(indvar){
      boxplot(indvar~gds3lev,
              main = paste('Boxplot of', substitute(indvar), "for GDS groups"),
              names = c('1', '3', '4, 5, 6'))
    }

If I attach the dataframe gdsgraph, this works fine. However, I've been warned against attach. When I tried to run this program using 'with', it did not work, e.g.

    with(gdsgraph, gdsbox(BReT3T5T))

produced an error that gds3level was not found, but if I try

    with(gdsgraph, boxplot(BReT3T5T~gds3lev))

it works fine. Similar problems occurred when I tried to use data =

What am I missing?

Thanks
Peter

Peter L. Flom, PhD
Brainscope, Inc.
212 263 7863 (MTW)
917 488 7176 (ThF)
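One way to sidestep the scoping problem is to build the formula from column names and hand the data frame to boxplot() through its data argument. The sketch below is only an illustration: gdsbox2 is a made-up name, and the built-in ToothGrowth data stands in for gdsgraph, which is not available here.

```r
gdsbox2 <- function(yname, xname, data, main.title = NULL, ...) {
  if (is.null(main.title)) {
    main.title <- paste("Boxplot of", yname, "for GDS groups")
  }
  # Build the formula yname ~ xname; boxplot() then looks both up in 'data'
  f <- reformulate(xname, response = yname)
  boxplot(f, data = data, main = main.title, ...)
}

# Stand-in call: tooth length by supplement type
gdsbox2("len", "supp", data = ToothGrowth)
```

Because both variables are resolved inside the data frame passed as data, neither attach() nor with() is needed, and extra arguments such as names = can still be passed through the dots.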
Re: [R] howto optimize operations between pairs of rows in a single matrix like cor and pairs
Thank you to Jim and Moshe. I will try the Rprof option as well as rewriting the function to run on columns instead. Thank you.

jim holtman wrote:

Use Rprof to see where time is being spent. If it is in FUN, then there is probably no way to "optimize" outside of changing the way FUN works. So the first thing is to decide where time is being spent.

On Sun, Aug 24, 2008 at 6:35 PM, Adaikalavan Ramasamy <[EMAIL PROTECTED]> wrote:

Hi,

I am calculating the output of a function when applied to pairs of rows from a single matrix or data frame, similar to how cor() and pairs() work. This is the code that I have been using:

    pairwise.apply <- function(x, FUN, ...){
      n <- nrow(x)
      r <- rownames(x)
      output <- matrix(NA, ncol=n, nrow=n, dimnames=list(r, r))
      for(i in 1:n){
        for(j in 1:n){
          if(i >= j) next
          output[i, j] <- FUN( x[i,], x[j,], ... )
        }
      }
      return(output)
    }

I realize that the output of the pairwise operation needs to be scalar. Here is an example. The actual function and dataset I want to use are more complicated, and thus the function runs slowly for large datasets.

    m <- iris[ 1:5, 1:4 ]
    pairwise.apply(m, sum)
       1    2    3    4    5
    1 NA 19.7 19.6 19.6 20.4
    2 NA   NA 18.9 18.9 19.7
    3 NA   NA   NA 18.8 19.6
    4 NA   NA   NA   NA 19.6
    5 NA   NA   NA   NA   NA

Can I use apply() or any of its family to optimize the code? I have tried playing around with outer, kronecker and mapply without any success. Any suggestions?

Thank you.

Regards, Adai
Re: [R] Maintaining repeated ID numbers when transposing with reshape
There might be a more elegant way of doing this, but here is a way of doing it without reshape().

    df <- data.frame( ID=c(1,1,1,1,2,2),
                      TEST=c("A","A","B","C","B","B"),
                      RESULT=c(17,12,15,12,8,9) )
    df.s <- split( df, df$ID )
    out <- sapply( df.s, function(m) tapply( m$RESULT, m$TEST, paste, collapse="," ) )
    t(out)
      A       B     C
    1 "17,12" "15"  "12"
    2 NA      "8,9" NA

This is not the same output as you wanted, but it makes more sense unless you have a reason to prioritize 17 over 12 in the first row.

Regards, Adai

jcarmichael wrote:

I have a dataset in "long" format that looks something like this:

    ID  TEST  RESULT
    1   A     17
    1   A     12
    1   B     15
    1   C     12
    2   B     8
    2   B     9

Now what I would like to do is transpose it like so:

    ID  TEST A  TEST B  TEST C
    1   17      15      12
    1   12      .       .
    2   .       8       .
    2   .       9       .

When I try:

    reshape(mydata, v.names="result", idvar="id", timevar="test", direction="wide")

it gives me only the first occurrence of each test for each subject. How can I transpose my dataset in this way without losing information about repeated tests?

Any help or guidance would be appreciated! Thanks!
[R] howto optimize operations between pairs of rows in a single matrix like cor and pairs
Hi,

I am calculating the output of a function when applied to pairs of rows from a single matrix or data frame, similar to how cor() and pairs() work. This is the code that I have been using:

    pairwise.apply <- function(x, FUN, ...){
      n <- nrow(x)
      r <- rownames(x)
      output <- matrix(NA, ncol=n, nrow=n, dimnames=list(r, r))
      for(i in 1:n){
        for(j in 1:n){
          if(i >= j) next
          output[i, j] <- FUN( x[i,], x[j,], ... )
        }
      }
      return(output)
    }

I realize that the output of the pairwise operation needs to be scalar. Here is an example. The actual function and dataset I want to use are more complicated, and thus the function runs slowly for large datasets.

    m <- iris[ 1:5, 1:4 ]
    pairwise.apply(m, sum)
       1    2    3    4    5
    1 NA 19.7 19.6 19.6 20.4
    2 NA   NA 18.9 18.9 19.7
    3 NA   NA   NA 18.8 19.6
    4 NA   NA   NA   NA 19.6
    5 NA   NA   NA   NA   NA

Can I use apply() or any of its family to optimize the code? I have tried playing around with outer, kronecker and mapply without any success. Any suggestions?

Thank you.

Regards, Adai
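One possible variant, sketched here as an assumption rather than a tested optimization: enumerate the row pairs once with combn() and convert the data frame to a matrix up front, since extracting rows from a matrix is generally cheaper than extracting them from a data frame. The name pairwise.combn is made up for this example.

```r
pairwise.combn <- function(x, FUN, ...) {
  x <- as.matrix(x)              # assumes all columns share one type
  n <- nrow(x)
  r <- rownames(x)
  output <- matrix(NA, nrow = n, ncol = n, dimnames = list(r, r))
  pairs <- combn(n, 2)           # 2 x choose(n, 2) matrix of pairs with i < j
  vals <- apply(pairs, 2, function(ij) FUN(x[ij[1], ], x[ij[2], ], ...))
  output[t(pairs)] <- vals       # fill the upper triangle by (i, j) index
  output
}

m <- iris[1:5, 1:4]
pairwise.combn(m, sum)
```

FUN is still called choose(n, 2) times, so if FUN itself dominates the run time, as Rprof would reveal, this rearrangement will not help much.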