Re: [R] removing outlier
Hi Jim,

thank you for your help. :)

My point is that there are outliers and I don't really know how to deal with them. I need the data frame for a regression, and I have often read that just a few outliers can change the results considerably. In addition, the regression diagnostics did not indicate good results. Yes, and I know it is not the point of statistics to work in a way that gives you the results you would like to have ;).

So what is your suggestion? If I remove the outliers, my problem is that, as you said, the columns then differ in length. I need the data frame for a regression, so can I remove the whole column, or is there a call to exclude the data?

JULI

--
View this message in context: http://r.789695.n4.nabble.com/removing-outlier-tp4712137p4712170.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] removing outlier
Hey,

I want to remove outliers, so I tried to do this:

# 1 define mean and sd
sd.AT_ZU_SPAET <- sd(AT_ZU_SPAET)
mitt.AT_ZU_SPAET <- mean(AT_ZU_SPAET)
#
sd.Anzahl_BAF <- sd(Anzahl_BAF)
mitt.Anzahl_BAF <- mean(Anzahl_BAF)
#
sd.Änderungsintervall <- sd(Änderungsintervall)
mitt.Änderungsintervall <- mean(Änderungsintervall)
#
# 2 identify outliers
DA[ abs(AT_ZU_SPAET - mitt.AT_ZU_SPAET) > ( 3 * sd.AT_ZU_SPAET) , ]
DA[ abs(Anzahl_BAF - mitt.Anzahl_BAF) > ( 3 * sd.Anzahl_BAF) , ]
DA[ abs(Änderungsintervall - mitt.Änderungsintervall) > ( 3 * sd.Änderungsintervall) , ]
#
# 3 remove outliers
AT_ZU_SPAET.clean <- DA[ (abs(AT_ZU_SPAET - mitt.AT_ZU_SPAET) < (3*sd.AT_ZU_SPAET)), ]
Anzahl_BAF.clean <- DA[ (abs(Anzahl_BAF - mitt.Anzahl_BAF) < (3*sd.Anzahl_BAF)), ]
Änderungsintervall.clean <- DA[ (abs(Änderungsintervall - mitt.Änderungsintervall) < (3*sd.Änderungsintervall)), ]

My problem is that I am only able to remove the outliers of one column of my table at a time, but I want to remove the outliers of every column of the table. Could anybody help me?

--
View this message in context: http://r.789695.n4.nabble.com/removing-outlier-tp4712137.html
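One possible way to apply the 3-sd rule from the post to every column at once is sketched below. It is only an illustration under stated assumptions: the data frame DA is a made-up stand-in with numeric columns only, the column name Aenderungsintervall is an ASCII substitute for the poster's Änderungsintervall, and the helper within_k_sd is hypothetical, not an existing R function.

```r
# Hypothetical stand-in for the poster's data frame DA (numeric columns only).
set.seed(1)
DA <- data.frame(AT_ZU_SPAET         = c(rnorm(49), 25),   # row 50 is extreme
                 Anzahl_BAF          = c(30, rnorm(49)),   # row 1 is extreme
                 Aenderungsintervall = rnorm(50))

# Hypothetical helper: TRUE where a value lies within k standard
# deviations of its own column mean (the rule used in the post).
within_k_sd <- function(x, k = 3) {
  abs(x - mean(x, na.rm = TRUE)) < k * sd(x, na.rm = TRUE)
}

# One logical column per variable ...
ok <- sapply(DA, within_k_sd)

# ... then keep only the rows that pass the rule in every column.
DA.clean <- DA[rowSums(ok) == ncol(DA), , drop = FALSE]
nrow(DA.clean)
```

Because whole rows are dropped, all columns of DA.clean keep the same length, so the result can be passed directly to lm(). Whether removing such points is statistically justified is a separate question that this sketch does not address.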
[R] removing blanks from a string
Hi

Is there a way to remove blank characters from the end of strings in a vector? Something like the =TRIM function of the OpenOffice spreadsheet. E.g.,

a <- c("hola", "Yes ", "hello ")
# I'd like to get: c("hola", "Yes", "hello")

Thanks

Juli
--
http://www.ceam.es/pausas
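A minimal sketch of one way to do this with a regular expression, using the vector from the question; the name a_trimmed is introduced here only for illustration.

```r
a <- c("hola", "Yes ", "hello ")

# Delete one or more whitespace characters anchored at the end of each string.
a_trimmed <- sub("[[:space:]]+$", "", a)
a_trimmed
```

In R 3.2.0 and later, trimws(a, which = "right") should give the same result for this input.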
[R] write.table a df with specific column order
Hi

I'd like to write.table a data frame, but with a specific order of columns. Is there a direct way to do it, or do I have to generate a new data frame as follows:

t <- data.frame(c=1:10, b=11:20, a=letters[1:10])
t2 <- data.frame(a=t$a, b=t$b, c=t$c)
write.table(t2, row.names=FALSE)

Thanks for any comment

Juli
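As a sketch of a more direct route, the columns can be selected by name in the desired order inside the write.table call itself, so no second data frame has to be built by hand. The data frame is the one from the question.

```r
t <- data.frame(c = 1:10, b = 11:20, a = letters[1:10])

# Index the columns by name in the order they should be written.
write.table(t[, c("a", "b", "c")], row.names = FALSE)
```

The same indexing works with column positions, e.g. t[, 3:1], although names are less fragile if the data frame later gains columns.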
[R] envelope line from a cloud of points
Hi,

Is there a way in R to plot an envelope line around a cloud of points (x, y data)? That is, a smooth line that encloses all the points, where the points do not follow a straight linear pattern. Could somebody point me to a package or function for this?

Thanks in advance.

Juli
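One simple kind of envelope is the convex hull, which base R provides through chull(). The sketch below uses made-up data and draws the hull as a closed polygon; note that a convex hull encloses all points but is piecewise linear, not the smooth line asked for, so it may only be a starting point.

```r
# Hypothetical point cloud with a curved (non-linear) pattern.
set.seed(42)
x <- rnorm(100)
y <- x^2 + rnorm(100)

# chull() returns the indices of the points on the convex hull.
hull <- chull(x, y)
hull <- c(hull, hull[1])   # repeat the first vertex to close the polygon

plot(x, y)
lines(x[hull], y[hull], col = "red")
```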
[R] glm binomial with no successes
Dear all,

I have a question on glm, family binomial. I do not see significant differences between the levels of a factor (treatment) if all data for a level are 0; if I replace a 0 with a 1 (in fact reducing the difference), then I detect the significant difference that I expected. Is there a way to overcome this problem, or is this expected behaviour?

Here is an example:

s <- c(2,4,4,5,0,0,0,0)
f <- c(31,28,28,28,32,37,34,35)
tr <- gl(2, 4)
sf <- cbind(s,f)  # numbers of successes and failures
summary(glm(sf ~ tr, family=binomial))  # tr ns
sf[8,1] <- 1
summary(glm(sf ~ tr, family=binomial))  # tr significant **

Thanks for any suggestion

Juli
Re: [R] glm binomial with no successes
Thank you very much for your reply.

So I understand that it would not be correct to use the tests in summary() to assess the significance of the different levels of a factor relative to the first level, including when there are more than 2 levels, as in my real case; at least for binomial regressions.

Here is a more close-to-real example, with a 3-level factor:

s <- c(rpois(8, 4), rep(0, 4))
f <- rpois(12, 30)
tr <- gl(3, 4)
sf <- cbind(s,f)
drop1(glm(sf ~ tr, family=binomial), test="Chisq")  # significant
summary(glm(sf ~ tr, family=binomial))  # the 3rd level is not significantly different from the 1st

So I understand that I need to split the data and perform the two tests separately:

drop1(glm(sf ~ tr, family=binomial, subset=(tr %in% c(1, 2))), test="Chisq")  # ns, as expected
drop1(glm(sf ~ tr, family=binomial, subset=(tr %in% c(1, 3))), test="Chisq")  # significant, as expected

Is this the correct approach?

Many thanks

Juli

On Wed, Feb 27, 2008 at 12:13 PM, Prof Brian Ripley [EMAIL PROTECTED] wrote:
> On Wed, 27 Feb 2008, juli pausas wrote:
>
>> Dear all,
>> I have a question on glm, family binomial. I do not see significant
>> differences between the levels of a factor (treatment) if all data for
>> a level is 0; and replacing a 0 for a 1 (in fact reducing the
>> difference), then I detect the significant difference that I expected.
>
> This is because you are using the wrong test, one with negligible power.
> See MASS4 pp.197-8 -- you need to use the LRT, as in
>
> drop1(glm(sf ~ tr, family=binomial), test="Chisq")
> Single term deletions
>
> Model:
> sf ~ tr
>        Df Deviance    AIC    LRT   Pr(Chi)
> <none>       1.595 17.730
> tr      1   24.244 38.379 22.649 1.944e-06
>
> (and in your example you can replace 'drop1' by 'anova').
>
>> Is there a way to overcome this problem? or this is an expected
>> behaviour ?
>>
>> Here is an example:
>>
>> s <- c(2,4,4,5,0,0,0,0)
>> f <- c(31,28,28,28,32,37,34,35)
>> tr <- gl(2, 4)
>> sf <- cbind(s,f)  # numbers of successes and failures
>> summary(glm(sf ~ tr, family=binomial))  # tr ns
>> sf[8,1] <- 1
>> summary(glm(sf ~ tr, family=binomial))  # tr significative **
>>
>> Thanks for any suggestion
>>
>> Juli
>
> --
> Brian D. Ripley, [EMAIL PROTECTED]
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
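The contrast discussed in this thread, between the Wald test reported by summary() and the likelihood-ratio test (LRT), can be reproduced with the two-level data from the original question. This is only a sketch: the names fit, wald_p, lrt_stat and lrt_p are introduced here for illustration, and the LRT is computed by hand from the deviances rather than read from the drop1() table.

```r
s  <- c(2, 4, 4, 5, 0, 0, 0, 0)
f  <- c(31, 28, 28, 28, 32, 37, 34, 35)
tr <- gl(2, 4)
sf <- cbind(s, f)   # numbers of successes and failures

fit <- glm(sf ~ tr, family = binomial)

# Wald test for level 2 versus level 1, as printed by summary().
wald_p <- coef(summary(fit))["tr2", "Pr(>|z|)"]

# Likelihood-ratio test: drop in deviance from the null model to the
# fitted model, referred to a chi-squared distribution.
lrt_stat <- fit$null.deviance - fit$deviance
lrt_df   <- fit$df.null - fit$df.residual
lrt_p    <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)

c(wald = wald_p, lrt = lrt_p)
```

When one level has no successes at all, the estimated coefficient runs off towards minus infinity and its standard error becomes very large, which is why the Wald p-value is uninformative here while the deviance-based test still detects the difference.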
[R] reshape
Dear colleagues,

I'd like to reshape a data frame from long format to wide format, but I do not quite get what I want. Here is an example of the data I have (dat):

sp <- c("a", "a", "a", "a", "b", "b", "b", "c", "d", "d", "d", "d")
tr <- c("A", "B", "B", "C", "A", "B", "C", "A", "A", "B", "C", "C")
code <- c("a1", "a2", "a2", "a3", "a3", "a3", "a4", "a4", "a4", "a5", "a5", "a6")
dat <- data.frame(id=1:12, sp=sp, tr=tr, val=31:42, code=code)

and below is what I'd like to obtain. That is, I'd like the tr variable in different columns (as a timevar) with their value (val).

sp code tr.A tr.B tr.C
a  a1     31   NA   NA
a  a2     NA   32   NA
a  a2     NA   33   NA  **
a  a3     NA   NA   34
b  a3     35   36   NA
b  a4     NA   NA   37
c  a4     38   NA   NA
d  a4     39   NA   NA
d  a5     NA   40   41
d  a6     NA   NA   42

Using reshape:

reshape(dat[,2:5], direction="wide", timevar="tr", idvar=c("code","sp"))

I'm getting very close. The only difference is in the 3rd row (**): when sp, code and tr are all the same, I only get one record. Is there a way to get all records? Any idea?

Thank you very much for any help

Juli Pausas
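reshape() keeps only one record per combination of idvar and timevar, which would explain the missing row. One possible workaround, sketched here with the data from the question, is to add a counter that distinguishes repeated sp/code/tr combinations and include it in idvar. The column name rep and the object name wide are introduced only for this illustration.

```r
sp   <- c("a", "a", "a", "a", "b", "b", "b", "c", "d", "d", "d", "d")
tr   <- c("A", "B", "B", "C", "A", "B", "C", "A", "A", "B", "C", "C")
code <- c("a1", "a2", "a2", "a3", "a3", "a3", "a4", "a4", "a4", "a5", "a5", "a6")
dat  <- data.frame(id = 1:12, sp = sp, tr = tr, val = 31:42, code = code)

# rep is 1 for the first record of each sp/code/tr combination,
# 2 for the second, and so on.
dat$rep <- ave(dat$id, dat$sp, dat$code, dat$tr, FUN = seq_along)

# With rep as part of the id, repeated combinations land in separate rows.
wide <- reshape(dat[, c("sp", "code", "rep", "tr", "val")],
                direction = "wide", timevar = "tr",
                idvar = c("sp", "code", "rep"))
wide
```

The value columns come out named val.A, val.B and val.C rather than tr.A, tr.B and tr.C; they can be renamed afterwards, and the helper column rep can be dropped once the reshape is done.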
[R] repeated measures - aov, lme, lmer - help
Dear all,

I'm not very sure about the use of repeated measures in R, so some advice would be much appreciated. Here is a simple example similar to my real problem (R 2.6.0 for Windows):

Let's suppose I have annual tree production measured in 9 trees during 3 years; the 9 trees are located in 3 different mountains (sites), and each tree receives different annual rainfall (different locations). I would like to know the parameters that explain the variability in production. The data would be something like:

set.seed(111)
mydat <- data.frame(tree= factor(rep(1:9,3)),
  year= gl(3,9,lab=2001:2003, orde=T),
  site= gl(3,3,27,lab=c("A","B","C")),
  rain= c(rnorm(9, 100), rnorm(9, 200), rnorm(9, 300)),
  prod= 51:77+rnorm(27, 1),
  pr01= rbinom(27, 1, 0.5))
mydat
# see for instance
interaction.plot(mydat$year, mydat$site, mydat$prod)

# My first attempt was to use aov:
summary(aov(prod ~ rain + year + site + Error(tree), data=mydat))
#
# Error: tree
#           Df  Sum Sq Mean Sq F value  Pr(>F)
# rain       1  36.814  36.814  6.4423 0.05201 .
# site       2 112.588  56.294  9.8513 0.01843 *
# Residuals  5  28.572   5.714
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Error: Within
#           Df  Sum Sq Mean Sq   F value    Pr(>F)
# rain       1 1688.50 1688.50 1142.6437 1.422e-15 ***
# year       2   12.74    6.37    4.3103   0.03319 *
# Residuals 15   22.17    1.48
#
# The results seem OK to me.
# Is there a way to get predictions from the model? (as in predict.lm)
# And to get the explained variance?

# Using lme (which allows prediction), I suppose the same model would be (is it right?):
library(nlme)
res <- lme(prod ~ rain + year + site, data=mydat, random= ~ 1 | tree); anova(res)
#             numDF denDF   F-value p-value
# (Intercept)     1    15 23027.700  <.0001
# rain            1    15  1144.461  <.0001
# year            2    15     2.288  0.1358
# site            2     6    17.267  0.0032

# Rain varies within trees and between trees (in time), thus aov gives me the significance of each part (less significant Between than Within).
# Does lme give me Within only, or does it include both Between and Within? Year was significant in the aov and not in the lme.

# I also want to test a binary (binomial) variable (no production vs production), so I guess I should use lmer.
# First the same model as above but with lmer:
library(lme4)
res2 <- lmer(prod ~ rain + year + site + (1|tree), data=mydat); anova(res2)
res2
# Is this the correct way to fit the model above? I'm unsure; the results are slightly different.
# I understand that there is no predict for lmer models. anova does not give me the significance for lmer models, so I guess I should enter the variables in steps and compare models with anova(m1, m2, ...)

# And now the binary data. Is this correct?
res3 <- lmer(pr01 ~ rain + year + site + (1|tree), data=mydat, family=binomial); anova(res3)
res3

Many thanks for any comments

Juli
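On the question of predictions, a minimal sketch with the lme fit from the post is given below. It assumes the nlme package is installed and rebuilds the example data without the pr01 column; the names pred_pop, pred_tree and r2_crude are hypothetical. The squared correlation at the end is only one crude summary of fit, not a definitive "explained variance" for a mixed model.

```r
library(nlme)

# Example data as in the post (pr01 omitted, it is not needed here).
set.seed(111)
mydat <- data.frame(tree = factor(rep(1:9, 3)),
                    year = gl(3, 9, labels = 2001:2003, ordered = TRUE),
                    site = gl(3, 3, 27, labels = c("A", "B", "C")),
                    rain = c(rnorm(9, 100), rnorm(9, 200), rnorm(9, 300)),
                    prod = 51:77 + rnorm(27, 1))

res <- lme(prod ~ rain + year + site, data = mydat, random = ~ 1 | tree)

# level = 0: predictions from the fixed effects only (population level).
pred_pop  <- predict(res, level = 0)
# level = 1: fixed effects plus each tree's estimated random intercept.
pred_tree <- predict(res, level = 1)

# Crude summary: squared correlation of observed and tree-level fitted values.
r2_crude <- cor(mydat$prod, pred_tree)^2
r2_crude
```

Passing a newdata data frame with the same columns to predict() gives predictions for new rainfall values. In current versions of lme4, binomial models are fitted with glmer() rather than lmer(..., family=binomial), and fitted lme4 models also have a predict method.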