[R] Introduction to R (in French)
Hi all,

I recently put a new version of my French introduction to R online. It is aimed specifically at social sciences students and researchers, but could also interest beginners who are not very familiar with statistics and coding.

The document is available (in French) as a PDF, along with the Sweave source code, from the following page: http://alea.fr.eu.org/j/intro_R.html

Someone advised me to submit it to CRAN for the contributed documentation section, but I don't know whom I should contact for that. Should I just send a mail to c...@r-project.org?

Anyway, I would be happy to receive any feedback on the document.

Sincerely, Julien

-- Julien Barnier Groupe de recherche sur la socialisation ENS-LSH - Lyon, France

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sliding window over a large vector
Hi: Veslot: I'm too tired to even try to figure out why, but I think there is something wrong with your sl function. See below for an empirical proof of that statement. Or maybe your definition of a sliding window is different from rollapply's definition, but rollapply's answer makes more sense to me?

Output

set.seed(1)
x <- rbinom(24, 1, 0.5)
print(x)
[1] 0 0 1 1 0 1 1 1 1 0 0 0 1 0 1 0 1 1 0 1 1 0 1 0
xx1 <- sl(x,3)
print(xx1)
[1] 1 1 2 2 1 2 2 2 2 1 1 1 2 1 2 1 2 2 1 2 2
temp <- zoo(x)
ans <- rollapply(temp,3,sum)
print(ans)
 2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 1  2  2  2  2  3  3  2  1  0  1  1  2  1  2  2  2  2  2  2  2  1

On Tue, Dec 16, 2008 at 3:47 AM, Veslot Jacques wrote:

sl <- function(x,z) c(0,cumsum(diff(x)[1:(length(x)-z-1)])) + rep(sum(x[1:z]),length(x)-z)
x <- rbinom(10, 1, 0.5)
system.time(xx1 <- slide(x,12))
   user  system elapsed
  36.86    0.45   37.32
system.time(xx2 <- sl(x,12))
   user  system elapsed
   0.01    0.00    0.02
all.equal(xx1,xx2)
[1] TRUE

Jacques VESLOT CEMAGREF - UR Hydrobiologie Route de Cézanne - CS 40061 13182 AIX-EN-PROVENCE Cedex 5, France Tel. + 0033 04 42 66 99 76 fax + 0033 04 42 66 99 34 email jacques.ves...@cemagref.fr

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Chris Oldmeadow
Sent: Tuesday, 16 December 2008 05:20
To: r-help@r-project.org
Subject: [R] sliding window over a large vector

Hi all, I have a very large binary vector, and I wish to calculate the number of 1's over sliding windows. This is my very slow function:

slide <- function(seq, window) {
  n <- length(seq) - window
  tot <- c()
  tot[1] <- sum(seq[1:window])
  for (i in 2:n) {
    tot[i] <- tot[i-1] - seq[i-1] + seq[i]
  }
  return(tot)
}

This works well for reasonably sized vectors. Does anybody know a way for large vectors (length = 12 million)? I'm trying to avoid using C.
Thanks, Chris
Re: [R] sliding window over a large vector
For this particular problem (counting), doesn't cumsum solve it effectively and efficiently?

vv <- cumsum(v)
vv[n:length(vv)] - c(0, vv[1:(length(vv)-n)])

Of course, this doesn't work for the general case of an arbitrary sliding window function. -s

On 12/15/08, Chris Oldmeadow c.oldmea...@student.qut.edu.au wrote:

Hi all, I have a very large binary vector, and I wish to calculate the number of 1's over sliding windows. This is my very slow function:

slide <- function(seq, window) {
  n <- length(seq) - window
  tot <- c()
  tot[1] <- sum(seq[1:window])
  for (i in 2:n) {
    tot[i] <- tot[i-1] - seq[i-1] + seq[i]
  }
  return(tot)
}

This works well for reasonably sized vectors. Does anybody know a way for large vectors (length = 12 million)? I'm trying to avoid using C. Thanks, Chris
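A minimal sketch of the cumulative-sum idea for counting 1's per window, assuming the window of width n moves one position at a time. The function name window_counts is hypothetical and does not come from the thread; prepending a zero to the cumulative sums is one way to cover the first window.

```r
# Sliding-window sums via cumulative sums (base R only).
# window_counts() is an illustrative name, not part of any package.
window_counts <- function(v, n) {
  stopifnot(n >= 1, n <= length(v))
  cs <- c(0, cumsum(v))                  # cs[i + 1] holds sum(v[1:i])
  k <- length(v) - n + 1                 # number of complete windows
  cs[(n + 1):(length(v) + 1)] - cs[1:k]  # sum(v[i:(i + n - 1)]) for each i
}

v <- c(1, 1, 0, 1)
res <- window_counts(v, 2)  # windows are (1,1), (1,0), (0,1)
```

Because everything is vectorised, the cost grows linearly with the length of v and no explicit loop is needed; for a 0/1 vector the sums stay well within exact double precision.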
Re: [R] convert opengis wkt to geometry?
Jeff Hamann jeff.hamann at forestinformatics.com writes:

After writing some code (stupidly without checking to see if there was code to do this already) to generate PostGIS SQL insert statements for simple geometry (WKT), I didn't check to see if there is already something available to convert WKT strings into some R package geometry (sp?). Does anyone have any advice, hints, code (?) for converting the following OpenGIS strings into something useful in R:

If you are thinking of PostGIS, then readOGR() in rgdal will read them if built with the necessary driver(s). On Linux and/or OSX, the user would configure OGR to choose the external headers and libraries. On Windows, the user might choose to install rgdal from source against the FWTools DLLs, which are at: http://fwtools.maptools.org/ The PostGIS driver is described here: http://www.gdal.org/ogr/drv_pg.html There are some notes on using the driver with rgdal here: http://wiki.intamap.org/index.php/PostGIS

POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, MULTIPOLYGON, GEOMETRYCOLLECTION

In principle, only POINT, LINESTRING, and POLYGON (maybe MULTIPOLYGON, but handled like a shapefile, that is, flattened) are supported in sp/rgdal. Please consider following up on R-sig-geo; there are possibly more eyes with relevant experience there. If the OpenGIS strings could instead be treated as GML, OGR has a driver for that too.

Roger Bivand

Jeff.
[R] Soundex codes
Dear List: Has anyone done any work developing functions for producing Soundex codes in R? RSiteSearch('soundex') did not yield any results, nor did my Google searches. Harold
Re: [R] sliding window over a large vector
If you want the speed, you can simply build an fts time series from it, then apply the moving.sum function and throw away the dates. This will probably be the fastest implementation of rolling applies out there unless you do a cumsum difference function. I got a sample timing of 2 seconds on a 12m-length vector (see bottom of email).

library(fts)
your.data <- c(0,1,1,0,1,1,1,0,0,0,0,1,1,1,1)
## dates generated automatically
fake.fts <- fts(data=your.data)
answer.fts <- moving.sum(fake.fts,10)
## throw away dates
answer.as.vector <- as.numeric(answer.fts)

my timing:

library(fts)
big.fts <- fts(data=rep(1,1200))
system.time(ans.fts <- moving.sum(big.fts,20))
   user  system elapsed
  1.970   0.081   2.051
nrow(big.fts)
[1] 1200
nrow(ans.fts)
[1] 1181

-Whit

On Tue, Dec 16, 2008 at 9:12 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

On Tue, Dec 16, 2008 at 8:23 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

There seems to be something wrong:

slide(c(1, 1, 0, 1), 2)
[1] 2 2

but the output should be c(2, 1, 2)

That should be c(2, 1, 1)

At any rate try this:

library(zoo)
3 * rollmean(x, 3)

On Mon, Dec 15, 2008 at 11:19 PM, Chris Oldmeadow c.oldmea...@student.qut.edu.au wrote:

Hi all, I have a very large binary vector, and I wish to calculate the number of 1's over sliding windows. This is my very slow function:

slide <- function(seq, window) {
  n <- length(seq) - window
  tot <- c()
  tot[1] <- sum(seq[1:window])
  for (i in 2:n) {
    tot[i] <- tot[i-1] - seq[i-1] + seq[i]
  }
  return(tot)
}

This works well for reasonably sized vectors. Does anybody know a way for large vectors (length = 12 million)? I'm trying to avoid using C. Thanks, Chris
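The discrepancy pointed out in the quoted replies (the result for slide(c(1, 1, 0, 1), 2) should be 2 1 1) appears to come from the loop bound and from adding the wrong element when the window advances. A hedged sketch of a corrected loop in base R follows; slide_fixed is a hypothetical name, and preallocating the result vector is an assumption about what helps with speed, not something measured in the thread.

```r
# Corrected rolling count with an explicit loop (base R only).
# slide_fixed() is an illustrative name, not taken from the thread.
slide_fixed <- function(seq, window) {
  n <- length(seq) - window + 1       # number of complete windows
  tot <- numeric(n)                   # preallocate instead of growing
  tot[1] <- sum(seq[1:window])
  if (n > 1) {
    for (i in 2:n) {
      # drop the element leaving the window, add the one entering it
      tot[i] <- tot[i - 1] - seq[i - 1] + seq[i + window - 1]
    }
  }
  tot
}

out <- slide_fixed(c(1, 1, 0, 1), 2)
```

The loop still visits every element once, so a vectorised approach is likely to be faster on very long vectors; the sketch is mainly meant to make the indexing explicit.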
Re: [R] OT: (quasi-?) separation in a logistic GLM
Sorry for reposting. Some code was missing in my previous email...

--

Dear Gavin,

glm reported exactly what it noticed, giving a warning that some very small fitted probabilities were found. However, your data are **not** quasi-separated. The maximum likelihood estimates are really those reported by glm.

A first elementary check is to change the tolerance and maximum number of iterations in glm and see if you get the same result:

#
mod1 <- glm(formula = analogs ~ Dij, family = binomial, data = dat, control = glm.control(epsilon = 1e-16, maxit = 1000))
mod1

Call: glm(formula = analogs ~ Dij, family = binomial, data = dat, control = glm.control(epsilon = 1e-16, maxit = 1000))

Coefficients:
(Intercept)          Dij
      4.191      -29.388

Degrees of Freedom: 4033 Total (i.e. Null); 4032 Residual
Null Deviance: 1929
Residual Deviance: 613.5   AIC: 617.5
#

This is exactly the same fit as the one you have. If separation occurred, the effects would usually diverge as we allow glm more iterations.

** Secondly, an inspection of the estimated asymptotic standard errors reveals nothing to worry about.

#
summary(mod1)

Call: glm(formula = analogs ~ Dij, family = binomial, data = dat, control = glm.control(epsilon = 1e-16, maxit = 1000))

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-1.676e+00  -1.319e-02  -1.250e-04  -1.958e-06   4.104e+00

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   4.1912     0.3248   12.90   <2e-16 ***
Dij         -29.3875     1.9345  -15.19   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1928.62 on 4033 degrees of freedom
Residual deviance: 613.53 on 4032 degrees of freedom
AIC: 617.53

Number of Fisher Scoring iterations: 11
#

If separation occurred, the estimated asymptotic standard errors would be unnaturally large.
This is because, in the case of separation (quasi or not), glm would calculate the standard errors by taking the square root of the diagonal elements of the inverse of minus the Hessian of the log-likelihood, at a point where the log-likelihood appears to be flat for the given tolerance.

** To be certain, you could also try fitting with brglm, which is guaranteed to give finite estimates that have bias of smaller order than the MLE, and compare the results.

#
library(brglm)
mod.br <- brglm(analogs ~ Dij, data = dat, family = binomial)
mod.br

Call: brglm(formula = analogs ~ Dij, family = binomial, data = dat)

Coefficients:
(Intercept)          Dij
      4.161      -29.188

Degrees of Freedom: 4033 Total (i.e. Null); 4032 Residual
Deviance: 613.5448
Penalized Deviance: 610.2794   AIC: 617.5448
#

The estimates are similar, a bit shrunk towards the origin, which is natural for bias removal. If separation occurred, and given the previous discussion, the bias-reduced estimates would be considerably different from the estimates that glm reports.

** Lastly, the most certain way to check for separation is to inspect the profiles of the log-likelihood. Vito suggested this, but the chosen limits for the xval are not appropriate. If separation occurred, the estimate would be -Inf, so the profiling as done in his email should start, for example, from -40 rather than -20. This would reveal that the profile deviance starts increasing again, whereas if separation occurred there would be an asymptote on the left. Below I give the correct profiles, as reported by profileModel.

library(profileModel)
pp <- profileModel(mod1, quantile = qchisq(0.95, 1), objective = "ordinaryDeviance")
Preliminary iteration .. Done
Profiling for parameter (Intercept) ... Done
Profiling for parameter Dij ... Done
plot(pp)

The profiles are quite quadratic. In the case of separation you would have seen asymptotes on the left or on the right (see help(profileModel) for an example).
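The standard-error check described in this message can be tried on toy data. The sketch below is only an illustration under stated assumptions: the two small data sets (y_sep, y_ok) are invented for the purpose and have nothing to do with the analogs/Dij data in the thread. It fits the same logistic model to a completely separated response and to an overlapping one and compares the estimated standard errors of the slope.

```r
# Toy data: y_sep is perfectly predicted by x > 5.5 (complete separation).
x <- 1:10
y_sep <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
# Same x, but the 0s and 1s overlap, so the MLE is finite.
y_ok <- c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1)

# glm() warns under separation; warnings are suppressed only to keep
# the demonstration quiet.
fit_sep <- suppressWarnings(glm(y_sep ~ x, family = binomial))
fit_ok <- glm(y_ok ~ x, family = binomial)

# Estimated asymptotic standard errors of the slope in each fit.
se_sep <- sqrt(diag(vcov(fit_sep)))[["x"]]
se_ok <- sqrt(diag(vcov(fit_ok)))[["x"]]
c(separated = se_sep, overlapping = se_ok)
```

In the separated case the reported standard error should come out orders of magnitude larger than in the overlapping case, which is the symptom the message describes.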
** It appears that the fitted logistic curve, while steep, still has a finite gradient, for example at the LD50 point:

library(MASS)
dose.p(mod)
             Dose          SE
p = 0.5: 0.1426167 0.003646903

When separation occurs the LD50 point cannot be identified (computer software would return something with an enormous estimated standard error).

In conclusion, if you get data sets that result in large estimated effects on the log-odds scale, the above checks can be used to convince you whether separation occurred or not. If there is separation (not the case in the current example), you could use an alternative to maximum likelihood for estimation, such as penalized maximum likelihood in brglm, which always returns finite estimates. Though in that case, I suggest you incorporate the uncertainty on how large the estimated
[R] Prediction intervals for zero inflated Poisson regression
Dear all,

I'm using zeroinfl() from the pscl package for zero-inflated Poisson regression. I would like to calculate (approximate) prediction intervals for the fitted values. The package itself does not provide them. Can these be calculated analytically, or do I have to use the bootstrap?

What I have tried so far is to use the bootstrap to estimate these intervals. Any comments on the code are welcome. The data and the model are based on the examples in zeroinfl().

# approximate prediction intervals with Poisson regression
fm_pois <- glm(art ~ fem, data = bioChemists, family = poisson)
newdata <- na.omit(unique(bioChemists[, "fem", drop = FALSE]))
prediction <- predict(fm_pois, newdata = newdata, se.fit = TRUE)
ci <- data.frame(exp(prediction$fit + matrix(prediction$se.fit, ncol = 1) %*% c(-1.96, 1.96)))
newdata$fit <- exp(prediction$fit)
newdata <- cbind(newdata, ci)
newdata$model <- "Poisson"

library(pscl)
# approximate prediction intervals with zero inflated Poisson regression
fm_zip <- zeroinfl(art ~ fem | 1, data = bioChemists)
fit <- predict(fm_zip)
Pearson <- resid(fm_zip, type = "pearson")
VarComp <- resid(fm_zip, type = "response") / Pearson
fem <- bioChemists$fem
bootstrap <- replicate(999, {
  yStar <- pmax(round(fit + sample(Pearson) * VarComp, 0), 0)
  predict(zeroinfl(yStar ~ fem | 1), newdata = newdata)
})
newdata0 <- newdata
newdata0$fit <- predict(fm_zip, newdata = newdata, type = "response")
newdata0[, 3:4] <- t(apply(bootstrap, 1, quantile, c(0.025, 0.975)))
newdata0$model <- "Zero inflated"

# compare the intervals in a nice plot.
newdata <- rbind(newdata, newdata0)
library(ggplot2)
ggplot(newdata, aes(x = fem, y = fit, min = X1, max = X2, colour = model)) +
  geom_point(position = position_dodge(width = 0.4)) +
  geom_errorbar(position = position_dodge(width = 0.4))

Best regards, Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4 9500 Geraardsbergen Belgium
tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data. ~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
[R] structure Arrays
Hi, does anyone know how I can use structured arrays in R, similar to a dataframe in MATLAB?
Re: [R] Find all numbers in a certain interval
Hi, sorry, but it shouldn't be different. The result should be the same, but I was looking for a method I could use...

# having a function defined like baptiste proposed:
isIn <- function(interval, x) {
  (x > min(interval)) & (x < max(interval))
}
#--
a <- rnorm(100)
# it's simply more human readable if I can write
which( isIn( c(-0.5, 0.5), a) )
# instead of
which( a > -0.5 & a < 0.5 )

Thanks to baptiste! So there is no method available for doing this and I have to define it myself. That's all I wanted to know :-) Antje

markle...@verizon.net wrote:

hi: could you explain EXACTLY what you want to do with the dataframe, because it shouldn't be that different?

On Tue, Dec 16, 2008 at 5:09 AM, Antje wrote:

Hi all, I'd like to know if I can solve this with a shorter command:

a <- rnorm(100)
which(a > -0.5 & a < 0.5)
# would give me all indices of numbers greater than -0.5 and smaller than +0.5

I have something similar with a dataframe and it sometimes produces quite long commands... I'd like to have something like:

which(within.interval(a, -0.5, 0.5))

Is there anything I could use for this purpose? Antje
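A small sketch of such a helper, in the spirit of the isIn function discussed in this message. The name in_interval and the inclusive argument are assumptions added for illustration; they are not an existing base R function.

```r
# in_interval() is a hypothetical helper; bounds are exclusive by default.
in_interval <- function(x, lower, upper, inclusive = FALSE) {
  if (inclusive) {
    x >= lower & x <= upper
  } else {
    x > lower & x < upper
  }
}

a <- c(-1.2, -0.5, -0.1, 0, 0.3, 0.5, 0.9)
idx <- which(in_interval(a, -0.5, 0.5))            # strictly inside
idx_incl <- which(in_interval(a, -0.5, 0.5, TRUE)) # endpoints included
```

The same logical vector can be used to subset the rows of a data frame, for example df[in_interval(df$value, -0.5, 0.5), ], where df and value are placeholder names.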
Re: [R] Sorting a date vector
You cannot keep them as strings and still get the benefits of working with date-class objects. You should read more documentation regarding dates. The as.Date function turns strings into a form that is stored internally as the number of days since a reference date, and what you are seeing is the default display format, %Y-%m-%d. Learn how to use the output formats so that you see what you desire.

?as.Date
?Dates
?format.Date

-- David Winsemius

On Dec 16, 2008, at 8:24 AM, RON70 wrote:

Yes, you are right. However, using that code, the format of the date is altered. I need to maintain the same format as the input data, i.e. 10-02-2008 not 2008-10-02, while still having date class. Any better idea?

David Winsemius wrote:

You might want to look at your date format more closely. Both the separator and the year format specs fail to match your input.

as.Date("10-02-2008", format = "%m/%d/%y")
[1] NA
as.Date("10-02-2008", format = "%m-%d-%Y")
[1] "2008-10-02"

-- David Winsemius

On Dec 16, 2008, at 7:54 AM, RON70 wrote:

I have a date-like vector:

date_file
10-02-2008 10-03-2008 10-06-2008 10-07-2008 10-09-2008 10-10-2008 10-13-2008 10-14-2008 10-15-2008 10-16-2008 10-17-2008 10-20-2008 10-21-2008 10-22-2008 10-23-2008 10-24-2008 10-28-2008 10-29-2008 10-30-2008 10-31-2008 11-03-2008 11-04-2008 11-05-2008 11-06-2008 11-07-2008 11-10-2008 11-11-2008 11-12-2008 11-13-2008 11-14-2008 11-17-2008 11-18-2008 11-19-2008 11-20-2008 11-21-2008 11-24-2008 11-25-2008 11-26-2008 11-28-2008 12-01-2008 12-02-2008 12-03-2008 12-04-2008 12-05-2008 12-08-2008 12-09-2008 12-10-2008 12-11-2008 12-12-2008 12-15-2008 4-18-2008 4-21-2008 4-22-2008 4-23-2008 4-24-2008 4-28-2008 4-29-2008 5-01-2008 5-05-2008 5-06-2008 5-07-2008 5-09-2008 5-12-2008 5-13-2008 5-14-2008 5-15-2008 5-16-2008 5-19-2008 5-20-2008 5-21-2008 5-22-2008 5-23-2008 5-27-2008 5-28-2008 5-29-2008 5-30-2008 6-02-2008 6-03-2008 6-05-2008 6-06-2008 6-09-2008 6-10-2008 6-11-2008 6-12-2008 6-13-2008 6-17-2008 6-18-2008 6-19-2008 6-20-2008 6-23-2008 6-24-2008 6-25-2008 6-26-2008 6-27-2008 7-01-2008 7-02-2008 7-04-2008 7-07-2008 7-08-2008 7-09-2008 7-10-2008 7-11-2008 7-15-2008 7-16-2008 7-18-2008 7-21-2008 7-22-2008 7-23-2008 7-24-2008 7-25-2008 7-28-2008 7-30-2008 7-31-2008 8-01-2008 8-04-2008 8-05-2008 8-06-2008 8-07-2008 8-08-2008 8-11-2008 8-12-2008 8-13-2008 8-15-2008 8-18-2008 8-19-2008 8-20-2008 8-21-2008 8-22-2008 8-25-2008 8-26-2008 8-27-2008 8-28-2008 8-29-2008 9-03-2008 9-04-2008 9-05-2008 9-08-2008 9-09-2008 9-10-2008 9-11-2008 9-12-2008 9-15-2008 9-16-2008 9-17-2008 9-18-2008 9-19-2008 9-22-2008 9-23-2008 9-24-2008 9-25-2008 9-26-2008 9-29-2008 9-30-2008

I wanted to sort this in ascending order. I tried simply using the sort() function, without altering the format of the dates, but it did not work. Next I tried to convert that vector into a date-class vector so that I could sort it, but in vain :( I used:

as.Date(date_file, format="%m/%d/%y")

However, it did not work. Can anyone please tell me what the correct approach would be?
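To make the advice above concrete, here is a minimal sketch using a four-element stand-in for date_file (the full vector from the question is not repeated). It parses the strings with a matching format, sorts them as Date objects, and then either formats the sorted dates for display or reorders the original strings. The variable names other than date_file are assumptions for illustration.

```r
# A short stand-in for the date_file vector in the question.
date_file <- c("10-02-2008", "4-18-2008", "12-15-2008", "9-30-2008")

# Parse as month-day-year with a four-digit year and "-" separators.
d <- as.Date(date_file, format = "%m-%d-%Y")

sorted_dates <- sort(d)                      # still of class "Date"
shown <- format(sorted_dates, "%m-%d-%Y")    # character, for display only
original_order <- date_file[order(d)]        # original strings, date-sorted
```

format() returns character strings, so the Date object is the one to keep for calculations, with formatting applied only when printing or exporting.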
[R] [R-pkgs] R CMD check on Windows XP
Hi there,

I used R CMD check to build my ATGGS package under Windows XP. My R version is 2.7.2. But I encountered some problems. The log file looks like this:

** installing R.css in C:/ATGGS.Rcheck
-- Making package ATGGS
adding build stamp to DESCRIPTION
installing R files
installing inst files
find: `C:/ATGGS.Rcheck/ATGGS/csvscripts': Permission denied
make[2]: *** [C:/ATGGS.Rcheck/ATGGS/inst] Error 1
make[1]: *** [all] Error 2
make: *** [pkg-ATGGS] Error 2
Can't read C:/ATGGS.Rcheck/ATGGS/auxData: Invalid argument at c:\R\R-27~1.2/bin/INSTALL line 434
Can't remove directory C:/ATGGS.Rcheck/ATGGS/auxData: Directory not empty at c:\R\R-27~1.2/bin/INSTALL line 434
Can't read C:/ATGGS.Rcheck/ATGGS/csvData: Invalid argument at c:\R\R-27~1.2/bin/INSTALL line 434
Can't remove directory C:/ATGGS.Rcheck/ATGGS/csvData: Directory not empty at c:\R\R-27~1.2/bin/INSTALL line 434
Can't read C:/ATGGS.Rcheck/ATGGS/csvscripts: Invalid argument at c:\R\R-27~1.2/bin/INSTALL line 434
Can't remove directory C:/ATGGS.Rcheck/ATGGS/csvscripts: Directory not empty at c:\R\R-27~1.2/bin/INSTALL line 434
Can't read C:/ATGGS.Rcheck/ATGGS/doc: Invalid argument at c:\R\R-27~1.2/bin/INSTALL line 434
Can't remove directory C:/ATGGS.Rcheck/ATGGS: Directory not empty at c:\R\R-27~1.2/bin/INSTALL line 434
*** Installation of ATGGS failed ***
Removing 'C:/ATGGS.Rcheck/ATGGS'

I am not able to delete c:/ATGGS.Rcheck until I change the permissions of the folder. I'm the admin of the C drive and have full control of all other folders on it.

Thanks for any help. Sue
Re: [R] Parameter Estimation - Generalized Extreme Value Distribution
Maithili Shiva wrote:

Dear R helpers, How do you estimate the (Location, Scale, Shape) parameters of Generalized Extreme Value distribution using R? ...

Package lmom, function pelgev.

J. R. M. Hosking
Re: [R] Creating a pdf
On Tue, 16 Dec 2008, Sergi M.Garrido wrote:

Hi guys, I'm working on a package, and I want to create a new version of the PDF file. On R 2.6.2 it ran OK with the command: R CMD Rd2dvi.sh --pdf pkg. But it doesn't run on R 2.8.0. What am I doing wrong?

Not telling us what the problem was. But the correct syntax is

R CMD Rd2dvi --pdf pkg

(and probably was in 2.6.2). If that does not work, show us a session log including all the errors.

These are my components: ActivePerl-5.8.8.822-MSWin32-x86-280952 basic-miktex-2.7.2960 htmlhelp MinGW-3.2.0-rc-3

(BTW, looks really old.)

Rtools28

Thanks in advance, Sergi M.Garrido

-- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
[R] socket server, textConnection and readLines
Hello,

This is a rather long email, but I hope someone can guide me. I have questions regarding sockets, readLines and textConnection. I am not sure whether my code is efficient (because of textConnection), or how to handle a client disconnect and a restart of the socket server in R.

I have a huge (3.5+ GB) text file on machine 'A', which I want to process on machine 'B' using read.table (one line or one chunk at a time). On machine B, I would like to use NWS and multiple R scripts to process each line/chunk. To do this I am running netcat (http://netcat.sourceforge.net/) on machine 'A' and sending data to the R socket server on machine 'B'.

Here is the data that I have on machine 'A':

---data---
RELIANCE,1200.00,03-NOV-2008,09:00:02:286
RELIANCE,1200.20,03-NOV-2008,09:00:02:287
RELIANCE,1200.10,03-NOV-2008,09:00:02:289
RELIANCE,1201.10,03-NOV-2008,09:00:02:310
INFOSYSTCH,1400.00,03-NOV-2008,09:00:02:286
INFOSYSTCH,1400.20,03-NOV-2008,09:00:02:287
INFOSYSTCH,1400.10,03-NOV-2008,09:00:02:289
INFOSYSTCH,1401.10,03-NOV-2008,09:00:02:310
---end data---

Here is the code that I am using for reading this data on machine 'B':

---code---
a.connection <- socketConnection(host = 'localhost', 1234, server = TRUE, blocking = TRUE, open = "r", encoding = getOption("encoding"))
while (1) {
  line.raw <- NULL;
  line.raw <- readLines(a.connection, n = 1, ok = TRUE);
  tConnection <- textConnection(line.raw);
  line.data <- read.table(tConnection);
  if ((class(line.data) == 'try-error') || (length(line.data) <= 0)) {
    print("may be client is disconnected! ");
    break;
  }
  # validate line.data and store it using
  print(line.data);
  close(tConnection);
}
---end code---

Questions:

1) Is there a way to avoid creating and closing a textConnection in the code above? How can I read a line directly from the socket in R? If I do not explicitly close the connection, I get a warning message saying "closing unused connection 7 (line.raw)".

2) What is the best way to detect that the client has disconnected?

3) In C, we can create a socket, bind it, and then call accept() inside a while loop using select(). How do I do the same in R?

Thanks again for reading such a long email, and thanks in advance for your pointers.

Thanks and Regards, Krishna
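One possible direction for questions 1 and 2, offered as a sketch under assumptions rather than a tested socket server: read.table() accepts a text argument in recent R versions, which avoids creating a textConnection by hand, and readLines() returning a zero-length vector can be treated as "no more input". The helper names parse_tick and read_ticks and the column names are hypothetical; whether a zero-length read reliably signals a client disconnect on a blocking socketConnection is an assumption to verify on the target platform.

```r
# parse_tick() is a hypothetical helper: it turns one comma-separated line
# into a one-row data frame without an explicit textConnection.
parse_tick <- function(line) {
  read.table(text = line, sep = ",", header = FALSE,
             stringsAsFactors = FALSE,
             col.names = c("symbol", "price", "date", "time"))
}

# Sketch of the read loop; `con` is assumed to be an already open
# connection (for example the socketConnection from the question).
read_ticks <- function(con, handle = print) {
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break   # zero-length result: no more input
    handle(parse_tick(line))
  }
  invisible(NULL)
}

tick <- parse_tick("RELIANCE,1200.20,03-NOV-2008,09:00:02:287")
```

The loop can be exercised without a network by passing a textConnection over a few sample lines as con, which makes the parsing logic easy to check before wiring it to a socket.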
[R] Problem assigning NA as a level name in a list
I want to generate a list (called dataList below) where each of its levels is named. These names are assigned from nameList, which contains all possible permutations of size two taking letters from a larger alphabet, e.g., "aa",...,"Fd",..,"Z1",... One of these permutations is the character string "NA". It seems that when I try to name one of the dataList levels "NA", using names(dataList) <- nameList, the names() function assigns the missing character value to the level. Is there some way to preserve "NA" as the name of a level in dataList?

Here is the R code I have been using to do this:

namePerms <- permutations(ncol(coinMat), 2, colnames(coinMat), repeats=TRUE)
nameList <- paste(namePerms[,1], namePerms[,2], sep="")
dataList <- lapply(1:length(nameList), function(level) {})
names(dataList) <- nameList
## The "NA" in nameList is interpreted so that the name "NA" is missing for one level in dataList

I am running R 2.4.1 in the Windows XP environment. Thanks for any help that can be offered.

Cliff Behrens
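A short check that may help narrow this down, assuming a recent R version (behaviour in R 2.4.1 may differ, and this is not a diagnosis of the original problem). In current R the two-character string "NA" is an ordinary string, distinct from the missing value NA_character_, so it can be tested with is.na() and used for indexing by name. The three-element nameList below is an invented stand-in for the full set of permutations.

```r
# "NA" below is an ordinary two-character string, not a missing value.
nameList <- c("aa", "NA", "Fd")
dataList <- vector("list", length(nameList))
names(dataList) <- nameList

name_is_missing <- is.na(names(dataList))  # TRUE only for truly missing names
dataList[["NA"]] <- 42                     # assign by the literal name "NA"
```

If is.na(names(dataList)) reports TRUE for that element in a given setup, the string was probably converted to a missing value somewhere before names<- was called, for example while the names were being read in or pasted together.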
Re: [R] Sorting a date vector
Yes, you are right. However, using that code, the format of the date is altered. I need to maintain the same format as the input data, i.e. 10-02-2008 not 2008-10-02, while still having date class. Any better idea?

David Winsemius wrote:

You might want to look at your date format more closely. Both the separator and the year format specs fail to match your input.

as.Date("10-02-2008", format = "%m/%d/%y")
[1] NA
as.Date("10-02-2008", format = "%m-%d-%Y")
[1] "2008-10-02"

-- David Winsemius

On Dec 16, 2008, at 7:54 AM, RON70 wrote:

I have a date-like vector:

date_file
10-02-2008 10-03-2008 10-06-2008 10-07-2008 10-09-2008 10-10-2008 10-13-2008 10-14-2008 10-15-2008 10-16-2008 10-17-2008 10-20-2008 10-21-2008 10-22-2008 10-23-2008 10-24-2008 10-28-2008 10-29-2008 10-30-2008 10-31-2008 11-03-2008 11-04-2008 11-05-2008 11-06-2008 11-07-2008 11-10-2008 11-11-2008 11-12-2008 11-13-2008 11-14-2008 11-17-2008 11-18-2008 11-19-2008 11-20-2008 11-21-2008 11-24-2008 11-25-2008 11-26-2008 11-28-2008 12-01-2008 12-02-2008 12-03-2008 12-04-2008 12-05-2008 12-08-2008 12-09-2008 12-10-2008 12-11-2008 12-12-2008 12-15-2008 4-18-2008 4-21-2008 4-22-2008 4-23-2008 4-24-2008 4-28-2008 4-29-2008 5-01-2008 5-05-2008 5-06-2008 5-07-2008 5-09-2008 5-12-2008 5-13-2008 5-14-2008 5-15-2008 5-16-2008 5-19-2008 5-20-2008 5-21-2008 5-22-2008 5-23-2008 5-27-2008 5-28-2008 5-29-2008 5-30-2008 6-02-2008 6-03-2008 6-05-2008 6-06-2008 6-09-2008 6-10-2008 6-11-2008 6-12-2008 6-13-2008 6-17-2008 6-18-2008 6-19-2008 6-20-2008 6-23-2008 6-24-2008 6-25-2008 6-26-2008 6-27-2008 7-01-2008 7-02-2008 7-04-2008 7-07-2008 7-08-2008 7-09-2008 7-10-2008 7-11-2008 7-15-2008 7-16-2008 7-18-2008 7-21-2008 7-22-2008 7-23-2008 7-24-2008 7-25-2008 7-28-2008 7-30-2008 7-31-2008 8-01-2008 8-04-2008 8-05-2008 8-06-2008 8-07-2008 8-08-2008 8-11-2008 8-12-2008 8-13-2008 8-15-2008 8-18-2008 8-19-2008 8-20-2008 8-21-2008 8-22-2008 8-25-2008 8-26-2008 8-27-2008 8-28-2008 8-29-2008 9-03-2008 9-04-2008 9-05-2008 9-08-2008 9-09-2008 9-10-2008 9-11-2008 9-12-2008 9-15-2008 9-16-2008 9-17-2008 9-18-2008 9-19-2008 9-22-2008 9-23-2008 9-24-2008 9-25-2008 9-26-2008 9-29-2008 9-30-2008

I wanted to sort this in ascending order. I tried simply using the sort() function, without altering the format of the dates, but it did not work. Next I tried to convert that vector into a date-class vector so that I could sort it, but in vain :( I used:

as.Date(date_file, format="%m/%d/%y")

However, it did not work. Can anyone please tell me what the correct approach would be?
[R] Parameter Estimation - Generalized Extreme Value Distribution
Dear R helpers,

How do you estimate the (location, scale, shape) parameters of the Generalized Extreme Value distribution using R? I have tried VGAM but have not been able to write the R script. Please advise.

With regards,
Maithili
Re: [R] Problem assigning NA as a level name in a list
Cliff Behrens wrote:
> I want to generate a list (called dataList below) where each of its levels is named. These names are assigned to nameList, which contains all possible permutations of size two taking letters from a larger alphabet, e.g., "aa",...,"Fd",..,"Z1",... One of these permutations is the character string "NA". It seems that when I try to name one of the dataList levels "NA", using names(dataList) <- nameList, the names() function assigns the missing character to the level. Is there some way to preserve "NA" as the name of a level in dataList? Here is the R code I have been using to do this.
>
> namePerms <- permutations(ncol(coinMat), 2, colnames(coinMat), repeats=TRUE)
> nameList <- paste(namePerms[,1], namePerms[,2], sep="")
> dataList <- lapply(1:length(nameList), function(level) {})
> names(dataList) <- nameList  ## The "NA" in nameList is interpreted so that the name "NA" is missing for one level in dataList
>
> I am running R 2.4.1 in the Windows XP environment. Thanks for any help that can be offered.

Your example is not reproducible and self-contained. What are permutations and coinMat? I bet it isn't minimal either. It doesn't seem to be happening for me with a recent(!) version of R, but you could just be misinterpreting the backtick quoting.

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
(p.dalga...@biostat.ku.dk)
[R] Find all numbers in a certain interval
Hi all,

I'd like to know if I can solve this with a shorter command:

    a <- rnorm(100)
    which(a > -0.5 & a < 0.5)  # would give me all indices of numbers greater than -0.5 and smaller than +0.5

I have something similar with a data frame, and it sometimes produces quite long commands. I'd like to have something like:

    which(within.interval(a, -0.5, 0.5))

Is there anything I could use for this purpose?

Antje
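
One possible approach, sketched here under the assumption that a small user-defined helper is acceptable: `within.interval` is not a base R function, so the version below is a hypothetical definition that simply wraps the two comparisons. The data frame `d` and its column `value` are made-up names for illustration.

```r
# Hypothetical helper (not part of base R): TRUE where lower < x < upper.
within.interval <- function(x, lower, upper) {
  x > lower & x < upper
}

set.seed(1)
a <- rnorm(100)

# Indices of the elements strictly between -0.5 and 0.5.
idx <- which(within.interval(a, -0.5, 0.5))

# The same helper used to subset a data frame by one of its columns.
d <- data.frame(value = a)
d_inside <- d[within.interval(d$value, -0.5, 0.5), , drop = FALSE]
```

For an interval that is symmetric around zero, `which(abs(a) < 0.5)` is shorter still.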
Re: [R] Problem assigning NA as a level name in a list
Quite irritating to me as the Manager of .NA too, when I used NA for .NA :-)-O

el

Peter Dalgaard wrote:
> Cliff Behrens wrote:
>> One of these permutations is the character string "NA". It seems that when I try to name one of the dataList levels "NA", using names(dataList) <- nameList, the names() function assigns the missing character to the level. Is there some way to preserve "NA" as the name of a level in dataList?
Re: [R] Problem assigning NA as a level name in a list
Peter,

OK... here is reproducible, self-contained code:

    library(gregmisc)
    columnNames <- c("A","B","C","D","N","a","b","c")
    namePerms <- permutations(length(columnNames), 2, columnNames, repeats=TRUE)
    nameList <- paste(namePerms[,1], namePerms[,2], sep="")
    dataList <- lapply(1:length(nameList), function(level) {})
    names(dataList) <- nameList  ## The "NA" is interpreted that the name is missing for one list in dataList

If you inspect the contents of dataList, you will find the following, showing that the name "NA" is treated differently:

    ..
    $Na
    NULL

    $`NA`
    NULL

    $Nb
    NULL
    .
Re: [R] sliding window over a large vector
The function works for me:

    s <- rbinom(1000, 1, 0.5)
    t <- slide(s, 50)

It is just too slow. Thanks.
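
For what it's worth, a loop-free version can be sketched with a cumulative sum. This is an illustration only, not code posted in the thread: `slide_cumsum` is a made-up name, and it returns one value per complete window (`length(x) - window + 1` values), which is one more than the `slide()` loop produces.

```r
# Sum of x over every run of `window` consecutive elements,
# computed from the cumulative sum in a single vectorised step.
slide_cumsum <- function(x, window) {
  n <- length(x)
  stopifnot(window >= 1, window <= n)
  cs <- c(0, cumsum(x))                      # cs[k + 1] is sum(x[1:k])
  cs[(window + 1):(n + 1)] - cs[1:(n - window + 1)]
}

set.seed(1)
s <- rbinom(1000, 1, 0.5)
tot <- slide_cumsum(s, 50)
```

Because the work is a single `cumsum()` and one vector subtraction, the cost grows linearly with the length of the vector, so a vector of several million elements should not need C.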
Re: [R] pwr.prop.test and continuity correction
Daniel Brewer wrote:
> Hi,
>
> I am trying to sort out a discrepancy in power calculation results between me and another statistician. I use R but I am not sure what she uses. It is on the proportions test and so I have been using pwr.prop.test. I think I have tracked the problem down to pwr.prop.test not using the continuity correction for the test (I did this by using the java applet from http://stat.ethz.ch/R-manual/R-patched/library/stats/html/power.prop.test.html). So I was wondering whether:
>
> 1) Someone could confirm that pwr.prop.test does not use a continuity correction in its calculation.
> 2) Someone could tell me either how to use pwr.prop.test or another function to get the power of a prop.test with continuity correction.
>
> The reason I want this is that I would normally apply the correction when I actually used the test.
>
> Many thanks
> Dan

power.prop.test (sic) is relying heavily on asymptotic normality, as do similar formulas. It doesn't use continuity correction, but if you're working with such small group sizes, I suspect that the correction term is the least of your worries and that direct simulation would be better.

(Another source of discrepancy, sometimes seen in textbooks, is that authors use the null variance of p1-p2 also under the alternative. This simplifies the formulas considerably, but it does assume that the actual difference is rather small.)

R is Open Source. If you want a correction term, it is just a matter of figuring out where to modify expressions like

    p.body <- quote(pnorm(((sqrt(n) * abs(p1 - p2) - (qnorm(sig.level/tside, lower.tail = FALSE) * sqrt((p1 + p2) * (1 - (p1 + p2)/2)))) / sqrt(p1 * (1 - p1) + p2 * (1 - p2)))))

by adding or subtracting 0.5 or 0.5/n in the appropriate places.

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
(p.dalga...@biostat.ku.dk)
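
The direct simulation suggested above could look roughly like the following. This is only a sketch under stated assumptions: `sim_power` is a made-up helper, and the group size and the two proportions are arbitrary illustration values, not figures from the thread. It uses only `rbinom()`, `prop.test()` and `power.prop.test()` from base R.

```r
# Estimate, by simulation, the power of prop.test() with continuity
# correction for two groups of equal size n.
sim_power <- function(n, p1, p2, sig.level = 0.05, nsim = 2000) {
  reject <- logical(nsim)
  for (i in seq_len(nsim)) {
    x1 <- rbinom(1, n, p1)
    x2 <- rbinom(1, n, p2)
    p <- suppressWarnings(
      prop.test(c(x1, x2), c(n, n), correct = TRUE)$p.value
    )
    # The p-value is NaN when both groups are all successes or all
    # failures; such a sample is counted as "not rejected".
    reject[i] <- !is.na(p) && p < sig.level
  }
  mean(reject)
}

set.seed(42)
pw_sim  <- sim_power(n = 50, p1 = 0.5, p2 = 0.75)               # simulated, with correction
pw_asym <- power.prop.test(n = 50, p1 = 0.5, p2 = 0.75)$power   # asymptotic, no correction
```

Comparing `pw_sim` with `pw_asym` for the design in question gives a feel for how much of a discrepancy the correction alone could account for.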
Re: [R] pwr.prop.test and continuity correction
In addition to what Peter said, the continuity correction is in effect an attempt to make the proportion test behave like Fisher's exact test, which is known to be conservative. We don't usually desire P-values that are too large, so I don't recommend the continuity correction. See the bpower.sim function in the Hmisc package for a simulation-based method, and the reference below.

Frank

@Article{cra08how,
  author  = {Crans, Gerald G. and Shuster, Jonathan J.},
  title   = {How conservative is {Fisher's} exact test? {A} quantitative evaluation of the two-sample comparative binomial trial},
  journal = "Stat in Med",
  year    = 2008,
  volume  = 27,
  pages   = {3598-3611},
  annote  = {Fisher's exact test; $2\times 2$ contingency table; size of test; comparative binomial experiment; first paper to truly quantify the conservativeness of Fisher's test; ``the test size of FET was less than 0.035 for nearly all sample sizes before 50 and did not approach 0.05 even for sample sizes over 100.''; conservativeness of ``exact'' methods}
}

--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University
[R] Sorting a date vector
I have a date-like vector, date_file:

10-02-2008 10-03-2008 10-06-2008 10-07-2008 10-09-2008 10-10-2008 10-13-2008 10-14-2008 10-15-2008 10-16-2008 10-17-2008 10-20-2008 10-21-2008 10-22-2008 10-23-2008 10-24-2008 10-28-2008 10-29-2008 10-30-2008 10-31-2008 11-03-2008 11-04-2008 11-05-2008 11-06-2008 11-07-2008 11-10-2008 11-11-2008 11-12-2008 11-13-2008 11-14-2008 11-17-2008 11-18-2008 11-19-2008 11-20-2008 11-21-2008 11-24-2008 11-25-2008 11-26-2008 11-28-2008 12-01-2008 12-02-2008 12-03-2008 12-04-2008 12-05-2008 12-08-2008 12-09-2008 12-10-2008 12-11-2008 12-12-2008 12-15-2008 4-18-2008 4-21-2008 4-22-2008 4-23-2008 4-24-2008 4-28-2008 4-29-2008 5-01-2008 5-05-2008 5-06-2008 5-07-2008 5-09-2008 5-12-2008 5-13-2008 5-14-2008 5-15-2008 5-16-2008 5-19-2008 5-20-2008 5-21-2008 5-22-2008 5-23-2008 5-27-2008 5-28-2008 5-29-2008 5-30-2008 6-02-2008 6-03-2008 6-05-2008 6-06-2008 6-09-2008 6-10-2008 6-11-2008 6-12-2008 6-13-2008 6-17-2008 6-18-2008 6-19-2008 6-20-2008 6-23-2008 6-24-2008 6-25-2008 6-26-2008 6-27-2008 7-01-2008 7-02-2008 7-04-2008 7-07-2008 7-08-2008 7-09-2008 7-10-2008 7-11-2008 7-15-2008 7-16-2008 7-18-2008 7-21-2008 7-22-2008 7-23-2008 7-24-2008 7-25-2008 7-28-2008 7-30-2008 7-31-2008 8-01-2008 8-04-2008 8-05-2008 8-06-2008 8-07-2008 8-08-2008 8-11-2008 8-12-2008 8-13-2008 8-15-2008 8-18-2008 8-19-2008 8-20-2008 8-21-2008 8-22-2008 8-25-2008 8-26-2008 8-27-2008 8-28-2008 8-29-2008 9-03-2008 9-04-2008 9-05-2008 9-08-2008 9-09-2008 9-10-2008 9-11-2008 9-12-2008 9-15-2008 9-16-2008 9-17-2008 9-18-2008 9-19-2008 9-22-2008 9-23-2008 9-24-2008 9-25-2008 9-26-2008 9-29-2008 9-30-2008

I wanted to sort this in ascending order. I tried simply using the sort() function, without altering the format of the dates, but it did not work. Next I tried to convert the vector to a Date-class vector so that I could sort it, but in vain :( I used:

    as.Date(date_file, format="%m/%d/%y")

However, it did not work. Can anyone please tell me what the correct approach would be?
Re: [R] Problem assigning NA as a level name in a list
Cliff Behrens wrote:
> Peter,
>
> OK... here is reproducible, self-contained code:
>
> library(gregmisc)

Relying on a 3rd party package is not kosher either... Whatever did

    list("NA"=2)

or

    l <- list(2); names(l) <- "NA"

do to you?

> columnNames <- c("A","B","C","D","N","a","b","c")
> namePerms <- permutations(length(columnNames), 2, columnNames, repeats=TRUE)
> nameList <- paste(namePerms[,1], namePerms[,2], sep="")
> dataList <- lapply(1:length(nameList), function(level) {})
> names(dataList) <- nameList  ## The "NA" is interpreted that the name is missing for one list in dataList
>
> If you inspect the contents of dataList, you will find the following showing that the name "NA" is treated differently:

Anyways... As I thought: remember that NA is a reserved word. You get the same kind of reaction if you name an element "for" or "in". The backticks denote that you need to quote the name for indexing with $:

    > names(l) <- "NA"
    > l$NA
    Error: unexpected numeric constant in "l$NA"
    > l$`NA`
    [1] 2
    > l$"NA"
    [1] 2
    > l[["NA"]]
    [1] 2
    > names(l)
    [1] "NA"

> ..
> $Na
> NULL
>
> $`NA`
> NULL
>
> $Nb
> NULL
> .

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
(p.dalga...@biostat.ku.dk)
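
The point about quoting can be checked with a few lines of base R. This is a minimal sketch, using the same one-element list as in the reply; nothing here depends on gregmisc.

```r
l <- list(2)
names(l) <- "NA"        # the character string "NA", not a missing value

name_is_missing <- is.na(names(l))   # FALSE: the name is really there

by_string   <- l[["NA"]]   # index with a quoted string
by_backtick <- l$`NA`      # or use backticks after $

print(l)   # the backticks in the printed $`NA` only mark a non-syntactic name
```

So the name is preserved; the printed backticks are a hint about how to type it, not a sign that it was lost.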
[R] problem install modul R-base-2.5.0-2.1.x86_64.rpm on SLES9 64-bit
Good morning,

I need to install R-2.x on my host:

    Linux, 2.6.5-7.308-smp #1 SMP Mon Dec 10 11:36:40 UTC 2007 x86_64 x86_64 x86_64 GNU/Linux

I have checked the packages listed in your document, http://cran.r-project.org/bin/linux/suse/ReadMe.html, but I have a problem with xorg-x11-libs: my host has the package XFree86-libs-4.3.99.902-43.94 installed, and the two conflict (see my log, log1.txt). Can you help me, or tell me where to find the right information for installing on my host?

Thanks in advance, best regards

Luca Mutti
Repubblica e Cantone Ticino, www.ti.ch/csi
Dipartimento delle finanze e dell'economia
Centro sistemi informativi/PESC
Via Pretorio 16, 6901 Lugano
(+41 91) 815.57.49
luca.mu...@ti.ch
# rpm -i xorg-x11-libs-6.8.2-0.1.x86_64.rpm
warning: xorg-x11-libs-6.8.2-0.1.x86_64.rpm: V3 DSA signature: NOKEY, key ID 6143b445
file /usr/X11R6/bin/xauth from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/compose.dir from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/en_US.UTF-8/XLC_LOCALE from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/iso8859-15/Compose from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/ja_JP.UTF-8/Compose from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/ja_JP.UTF-8/XLC_LOCALE from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/ko_KR.UTF-8/Compose from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/ko_KR.UTF-8/XLC_LOCALE from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/lib64/common/ximcp.so.2 from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/lib64/common/xlcDef.so.2 from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/lib64/common/xlcUTF8Load.so.2 from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/lib64/common/xlibi18n.so.2 from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/lib64/common/xlocale.so.2 from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/lib64/common/xomGeneric.so.2 from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/locale.alias from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/locale.dir from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/th_TH.UTF-8/Compose from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/th_TH.UTF-8/XLC_LOCALE from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/zh_CN.UTF-8/Compose from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file /usr/X11R6/lib/X11/locale/zh_CN.UTF-8/XLC_LOCALE from install of xorg-x11-libs-6.8.2-0.1 conflicts with file from package XFree86-libs-4.3.99.902-43.94
file
Re: [R] ggplot2 and lattice
stephen sefick wrote:
> yes, a parallel coordinates plot. I understand that it is for multivariate data, but I am having a hard time figuring out what it is telling me. Thanks for your help.

In the lattice book, the author mentions that static parallel plots aren't very useful, in general. With a lot of data, they tend to be a spaghetti mess. They're more useful when you can brush over data to highlight it dynamically, which could show you common patterns. (E.g. that cars with smaller engines tend to have better mileage, but poorer acceleration.) At least that's my limited experience with them.

Wikipedia has a page, http://en.wikipedia.org/wiki/Parallel_coordinates, and the sample graph at the top of the page shows data that clusters on the first 5 features/dimensions, and then goes spaghetti on you. (As the article says, ordering of the dimensions is important, and they obviously got a reasonable order... or had boring data.)
[R] Parameter estimation - Generalized Extreme value Distribution
Dear R help,

I have a file named ONS.csv with 25 observations, given below. This is my data (i.e. the first column of the file ONS.csv):

    (5.55,4.56,17.82,5.03,5.3,40.28,8.05,27.8,5.85,5.42,14.75,46.13,18.5,4.58,
    4.31,9.19,6.61,15.92,96.94,21.63,4.44,4.88,241.74,38592.1,5.24)

I am trying to fit the Generalized Extreme Value distribution to this data. The following is my R script:

    library(ismev)
    ONS <- read.csv("GEV.csv", header = TRUE)
    gev.fit(ONS[,1])

I get the following output:

    $conv
    [1] 0

    $nllh
    [1] 99.28817

    $mle
    [1] 5.940866 2.703154 1.425794

    $se
    [1] 0.6827288 1.1263298 0.2590853

What is the meaning of the mle entries? Do they give me the estimates of the location (5.940866), scale (2.703154) and shape (1.425794) parameters of the Generalized Extreme Value distribution? Please guide.

Thanking you in advance
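
To make the three entries of such a fit concrete, here is a minimal base-R sketch of a GEV maximum-likelihood fit, with the parameter vector ordered as location, scale, shape. It is an illustration under stated assumptions, not the code inside any package: `gev_nll` is a made-up helper, the data are simulated with known parameters rather than read from the poster's file, and the case of a shape parameter of exactly zero (the Gumbel limit) is deliberately left out.

```r
# Negative log-likelihood of the GEV distribution (shape != 0).
# par = c(location, scale, shape)
gev_nll <- function(par, x) {
  mu <- par[1]; sigma <- par[2]; xi <- par[3]
  if (sigma <= 0 || abs(xi) < 1e-8) return(1e10)   # invalid or unsupported
  z <- 1 + xi * (x - mu) / sigma
  if (any(z <= 0)) return(1e10)                    # data outside the support
  length(x) * log(sigma) + (1 + 1 / xi) * sum(log(z)) + sum(z^(-1 / xi))
}

# Simulated data with known parameters, via the inverse CDF.
set.seed(123)
mu <- 5; sigma <- 2; xi <- 0.2
u <- runif(2000)
x <- mu + sigma * ((-log(u))^(-xi) - 1) / xi

# Rough moment-based starting values, then a general-purpose optimiser.
sigma0 <- sqrt(6 * var(x)) / pi
start  <- c(mean(x) - 0.5772 * sigma0, sigma0, 0.1)
fit <- optim(start, gev_nll, x = x, control = list(maxit = 2000))

est <- fit$par   # estimates in the order location, scale, shape
```

With simulated data the three estimates can be compared against the values used to generate the sample, which is a quick way to confirm which position holds which parameter.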
Re: [R] Sorting a date vector
2008/12/16 RON70 ron_michae...@yahoo.com
> Yes you are right. However using that code, format of date is altered. I need to maintain the same format as the input data, i.e. 10-02-2008 not 2008-10-02, still having date-class. Any better idea?

You may try this:

    format(sort(a, decreasing=TRUE), "%m-%d-%Y")
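
Putting the pieces of this thread together, a small sketch follows. It uses only a handful of the dates from the original post, and the variable names are made up for illustration. A Date object always prints in ISO form, so keeping the 10-02-2008 look means either reordering the original strings or formatting the sorted dates for display.

```r
date_file <- c("10-02-2008", "12-15-2008", "4-18-2008", "9-30-2008", "5-01-2008")

# Parse with a format that matches the input: "-" separators, four-digit year.
d <- as.Date(date_file, format = "%m-%d-%Y")

# Option 1: reorder the original strings by their parsed dates (ascending).
sorted_chr <- date_file[order(d)]

# Option 2: sort the Date vector and format it only for display.
# Note that %m zero-pads the month, so "4-18-2008" becomes "04-18-2008".
sorted_fmt <- format(sort(d), "%m-%d-%Y")
```

Option 1 leaves the strings exactly as they were entered; option 2 returns character strings as well, so the Date-class vector `d` itself is what should be kept for any further date arithmetic.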
Re: [R] Sorting a date vector
You might want to look at your date format more closely. Both the separator and the year format specs fail to match your input.

    > as.Date("10-02-2008", format = "%m/%d/%y")
    [1] NA
    > as.Date("10-02-2008", format = "%m-%d-%Y")
    [1] "2008-10-02"

-- David Winsemius
[R] surface contour plot help
I am trying to do a surface profile plot. The data are:

X          Y(1)         Z(1)
1-jan-02   2002         number
2-jan-02   2002         number
. . .
1-jan-03   2003 (Y2)    number Z(2)
2-jan-03   2003 (Y2)    number Z(2)
. . .
until dec 31 2007.

I used the plot3d function to build a scatter point plot.

Call rinterface.rrun("library(rgl)")
Call rinterface.rrun("plot3d(x,y1,z1,xlab='Date',ylab='Year',zlab='Vol',ylim=c(2001,2008))")
Call rinterface.rrun("plot3d(x,y2,z2,add=TRUE)")
Call rinterface.rrun("plot3d(x,y3,z3,add=TRUE)")
Call rinterface.rrun("plot3d(x,y4,z4,add=TRUE)")
Call rinterface.rrun("plot3d(x,y5,z5,add=TRUE)")
Call rinterface.rrun("plot3d(x,y6,z6,add=TRUE)")

Is there a way to lay a surface over this?
Re: [R] [R-pkgs] R CMD check on window XP
This message accidentally (list moderator mistake) made it to R-packages; it clearly should have been R-help only. Please only reply to R-help if you can help Sue.

Martin Maechler, R-packages list moderator

Hi, there,

I used R CMD check to build my ATGGS package under Windows XP. My R version is 2.7.2. But I encounter some problems. The log file is like: ..
[R] Check if data frame column is numeric
Hi R-users,

I want to apply a function to each column of a data frame that is numeric. Thus I tried to check it for each column first:

apply(df, 2, function(x) is.numeric(x))
  A60   A64  A66a   A67   A71  A75a   A80   A85   A91   A95   A96   A97   A98   A99
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

I get only FALSE results although the variables are numeric. When I try the following it works:

is.numeric(df$A60)
[1] TRUE

What am I doing wrong?

TIA
Mark
Re: [R] Check if data frame column is numeric
Try:

sapply(df, is.numeric)

On Tue, Dec 16, 2008 at 1:25 PM, Mark Heckmann mark.heckm...@gmx.de wrote:

Hi R-users,

I want to apply a function to each column of a data frame that is numeric. Thus I tried to check it for each column first:

apply(df, 2, function(x) is.numeric(x))
  A60   A64  A66a   A67   A71  A75a   A80   A85   A91   A95   A96   A97   A98   A99
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

I get only FALSE results although the variables are numeric. When I try the following it works:

is.numeric(df$A60)
[1] TRUE

What am I doing wrong?

TIA
Mark

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
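A likely reason apply() returns FALSE everywhere, offered as an assumption because the poster's df is not shown: apply() first converts the data frame to a matrix, and if even one column is non-numeric the whole matrix becomes character. sapply() visits the columns as they are. A minimal sketch with a made-up data frame (the column `grp` is hypothetical):

```r
# Hypothetical data frame: two numeric columns and one character column
df <- data.frame(A60 = c(1.5, 2.5), A64 = c(3L, 4L), grp = c("a", "b"),
                 stringsAsFactors = FALSE)

# apply() coerces df to a character matrix first, so every column tests FALSE
via_apply <- apply(df, 2, is.numeric)

# sapply() treats df as a list of columns and tests each one in place
via_sapply <- sapply(df, is.numeric)

# Apply a function to the numeric columns only
num_means <- sapply(df[via_sapply], mean)
print(num_means)
```

If every column of df really were numeric, apply() would also report TRUE, so all-FALSE output suggests at least one factor or character column is present.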
Re: [R] Find all numbers in a certain interval
On Dec 16, 2008, at 7:19 AM, Antje wrote:

Hi David,

thanks a lot for your proposal. I got a lot of useful hints from all of you :-)

David Winsemius wrote:

It's not entirely clear what you are asking for, since which(within.interval(a, -0.5, 0.5)) is actually longer than which(a > -0.5 & a < 0.5).

Right, but in case 'a' is something with a long name and '0.5' is a variable you might end up with something like this (for the data frame example):

DF[which( DF$myReallyLongColumnName > -myReallyLongThreshold & DF$myReallyLongColumnName < -myReallyLongThreshold ), ]

I see your point, but I must point out that no cases would ever satisfy that construction.

instead of:

DF[which( within.interval(DF$myReallyLongColumnName, myReallyLongThreshold) ), ]

That would be a different within.interval function than I suggested, but you could certainly create one which accepted a vector.

within.interval <- function(x, y) { min(y) < x & x < max(y) }

--

within.interval2 <- function(x,y) { min(y) < x & x < max(y) }
y <- c(-.1, -.2, .1, .2)
which(within.interval2(DF$a, y))
[1]  7 13 14 17

You mention that you want a solution that applies to dataframes. Using indexing you can get entire rows of dataframes that satisfy multiple conditions on one of its columns:

DF <- data.frame(a = rnorm(20), b = LETTERS[1:20], c = letters[20:1], stringsAsFactors=FALSE)
DF[which( DF$a > -0.5 & DF$a < 0.5 ), ]
# note that one needs to avoid DF[which(a > -0.5 & a < 0.5) , ]
# the "a" vector is not the same as the "a" column vector within DF
             a b c
3  -0.47310672 C r
6  -0.49784460 F o
9   0.02571058 I l
10  0.16893759 J k
11 -0.11963322 K j
12  0.39378887 L i
16  0.03712263 P e

Could get the indices that satisfy more than one condition:

which(DF$a > 0.5 & DF$b < "K")
[1]  1  2  6 10

Or you can get rows of DF that satisfy conditions on multiple columns with the subset function:

subset(DF, a > 0.5 & b < "K")
           a b c
1  2.2500997 A t
2  0.7251357 B s
6  0.7845355 F o
10 1.0685649 J k

Or if you wanted a within.interval function:

within.interval <- function(x,a,b) { x > a & x < b }
which(within.interval(DF$a, -0.5, 0.5))
[1]  3  4  7  8  9 13 14 17 20
Re: [R] Is = now the same as - in assigning values
Thank you all for the replies. I'll start using <-.

Best regards,
Petter Hedberg
University of Warsaw.

2008/12/16 Gabor Grothendieck ggrothendi...@gmail.com:

In most cases <- and = are the same, yet it's not always true, so it's safest to use <- for assignment. Check this out:

http://tolstoy.newcastle.edu.au/R/e4/help/08/06/12940.html

On Mon, Dec 15, 2008 at 4:26 PM, Petter Hedberg ekologkons...@gmail.com wrote:

I'm a PhD student at the University of Warsaw, and have started using R. In many books they specify to use <- instead of = when assigning values, and this is also mentioned in older posts on the R website.

However, it seems to me that some update has occurred, because I continuously get the same result whether I use <- or =.

I would be extremely grateful for any answer to this. = seems more intuitive, so I assumed that an update had been made due to popular demand and that was why I get the same output whether I use <- or =.

Best regards,
Petter Hedberg
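One place where the two operators behave differently is inside a function call, where = names an argument rather than assigning. The sketch below is my own illustration, not taken from the linked post; `demo_assign` and its result names are made up for the example.

```r
demo_assign <- function() {
  # Inside a call, `=` names the argument x of mean(); no variable x is created
  a <- mean(x = c(1, 2, 3))
  made_by_eq <- exists("x", envir = environment(), inherits = FALSE)

  # `<-` inside a call assigns x in this function's frame, then passes the value on
  b <- mean(x <- c(1, 2, 3))
  made_by_arrow <- exists("x", envir = environment(), inherits = FALSE)

  list(a = a, b = b, made_by_eq = made_by_eq, made_by_arrow = made_by_arrow)
}

res <- demo_assign()
str(res)
```

At top level, `x = 5` and `x <- 5` do the same thing, which is why the two usually look interchangeable.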
Re: [R] OT: (quasi-?) separation in a logistic GLM
On Tue, 2008-12-16 at 13:31 +0100, vito muggeo wrote:

dear Gavin,
I do not know whether such comment may be still useful..

Very much so, thank you.

Why are you unsure about quasi-separation? I think that it is quite evident in the plot

Unsure in the sense that I had been unable to ascertain what quasi-complete separation was ;-)

I'm still not convinced about the quasi-separation issue though. The coefficients of the glm are large but the standard errors don't indicate anything much wrong. I tried brglm() in the package of the same name and this gave effectively the same coefficients and standard errors as glm(), where I would have expected them to differ considerably if (quasi-)separation were an issue. I'm not very familiar with the approach behind brglm(), however.

I'll also take a look at the profiling you describe below when our computing problems here get sorted.

Apologies if people have had problems downloading the file from my web space; we are having all sorts of filestore problems here this week.

Thanks again Vito for your comments,

G

plot(analogs ~ Dij, data = dat)

Also it may be useful to see the plot of the monotone (profile) deviance (or the log-lik) for the coef of Dij,

xval <- seq(-20, 0, l = 50)
ll <- vector(length = 50)
for (i in 1:length(xval)) {
  mod <- glm(analogs ~ offset(xval[i]*Dij), data = dat, family = binomial)
  ll[i] <- mod$dev
}
plot(xval, ll)

Hope this helps you,
vito

Gavin Simpson wrote:

Dear List,

Apologies for this off-topic post but it is R-related in the sense that I am trying to understand what R is telling me with the data to hand.

ROC curves have recently been used to determine a dissimilarity threshold for identifying whether two samples are from the same type or not. Given the bashing that ROC curves get whenever anyone asks about them on this list (and having implemented the ROC methodology in my analogue package) I wanted to try directly modelling the probability that two sites are analogues for one another for a given dissimilarity using glm().

The data I have then are a logical vector ('analogs') indicating whether the two sites come from the same vegetation and a vector of the dissimilarity between the two sites ('Dij'). These are in a csv file currently in my university web space. Each 'row' in this file corresponds to a single comparison between 2 sites.

When I analyse these data using glm() I get the familiar "fitted probabilities numerically 0 or 1 occurred" warning. The data do not look linearly separable when plotted (code for which is below). I have read Venables and Ripley's discussion of this in MASS4 and other sources that discuss this warning and R (Faraway's Extending the Linear Model with R and John Fox's new Applied Regression, Generalized Linear Models, and Related Methods, 2nd Ed) as well as some of the literature on Firth's bias reduction method. But I am still somewhat unsure what (quasi-)separation is and if this is the reason for the warnings in this case.

My question then is, is this a separation issue with my data, or is it quasi-separation that I have read a bit about whilst researching this problem? Or is this something completely different?

Code to reproduce my problem with the actual data is given below. I'd appreciate any comments or thoughts on this.

## Begin code snippet ##
## note data file is ~93Kb in size
dat <- read.csv(url("http://www.homepages.ucl.ac.uk/~ucfagls/dat.csv"))
head(dat)
## fit model --- produces warning
mod <- glm(analogs ~ Dij, data = dat, family = binomial)
## plot the data
plot(analogs ~ Dij, data = dat)
fit.mod <- fitted(mod)
ord <- with(dat, order(Dij))
with(dat, lines(Dij[ord], fit.mod[ord], col = "red", lwd = 2))
## End code snippet ##

Thanks in advance

Gavin

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
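The profile-deviance check Vito describes can be tried on toy data where separation is known to be present. This is a sketch under my own assumptions: the six-point data set is invented and completely separated, so it shows what the profile looks like in the clear-cut case, not what Gavin's data do.

```r
# Toy data with complete separation: y is 0 below x = 3.5 and 1 above it
x <- 1:6
y <- c(0, 0, 0, 1, 1, 1)

# Fix the slope at each grid value via offset(); only the intercept is estimated
slopes <- seq(0, 5, by = 0.5)
dev <- sapply(slopes, function(b) {
  fit <- suppressWarnings(glm(y ~ offset(b * x), family = binomial))
  deviance(fit)
})

# Under separation the deviance keeps falling as the slope moves away from 0,
# so there is no finite minimum; plot(slopes, dev) shows the monotone curve
print(round(dev, 3))
```

With data that are not separated, the same profile would turn back upward past the maximum likelihood estimate, which is the contrast the plot is meant to reveal.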
Re: [R] [ExternalEmail] Pearson Correlation Speed
On Tue, 16 Dec 2008, Nathan S. Watson-Haigh wrote:

Charles C. Berry wrote:

On Mon, 15 Dec 2008, Nathan S. Watson-Haigh wrote:

Nathan S. Watson-Haigh wrote:

I'm trying to calculate Pearson correlation coefficients for a large matrix of size 18563 x 18563. The following function takes about XX minutes to complete, and I'd like to do this calculation about 15 times and so speed is somewhat of an issue.

I think you are on the wrong track, Nathan.

The matrix you are starting with is 18563 x 18563 and the result of finding the correlations amongst the columns of that matrix is also 18563 x 18563. It will require more than 5 Gigabytes of memory to store the result and the original matrix.

Yes, the memory usage is somewhat large - luckily I have the use of a cluster with lots of shared memory! However, I'm interested to learn how you came about the calculation to determine the memory requirements.

The original object is

18563^2*8/1024^3
[1] 2.567358

Gigabytes, and so is the result. I added them together.

Likely the time needed to do the calc is inflated because of caching issues and, if your machine has less than enough memory to store the result and all the intermediate pieces, by swapping as well.

You can finesse these by breaking your problem into smaller pieces, say computing the correlations between each pair of 19 blocks of columns (columns 1:977, 977+1:977, ... 18*977+1:977), then assembling the results.

This is possible; however, why is something like this not implemented internally in the cor() function if it scales poorly due to the large memory requirements?

Because nobody ever really needed it?

Seriously, optimizing something like this is machine dependent, and R-core probably has higher priorities. cor() provides lots of options - it handles NAs, for example - and it is probably not worth the trouble to try to optimize over those options. The calculation sans NAs is a simple one and can be done using the built-in BLAS (as crossprod() does), which BLAS can in turn be tuned to the machine used. So, if your environment has a tuned or multithreaded BLAS, you might be better off to use crossprod() and scale the result.

---

BTW, R already has the necessary machinery to calculate the crossproduct matrix (etc) needed to find the correlations. You can access the low level linear algebra that R uses. You can marry R to an optimized BLAS if you like. So pulling in some other code to do this will not save you anything. If you ever do decide to import C[++] code there is excellent documentation in the Writing R Extensions manual, which you should review before attempting to import C++ code into R.

Thanks, I have seen this and it seemed quite technical to use as a starting point for someone unfamiliar with both C++ and incorporating C++ code into R.

Well, in that case the path of least resistance is to start the process when you leave for the night and pick up the results the next morning.

HTH,

Chuck

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
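The crossprod() route suggested above can be sketched in a few lines. This is an illustration under stated assumptions, not a tuned implementation: the helper name `cor_crossprod` is mine, and it assumes a numeric matrix with no NAs and no constant columns. The first line repeats the memory arithmetic from the message.

```r
# Memory for one 18563 x 18563 matrix of doubles (8 bytes each), in units of 1024^3 bytes
gib <- 18563^2 * 8 / 1024^3

# Correlation via crossprod(); assumes X has no NAs and no constant columns
cor_crossprod <- function(X) {
  Xc <- sweep(X, 2, colMeans(X))   # centre each column
  S  <- crossprod(Xc)              # t(Xc) %*% Xc, handed to the BLAS
  d  <- sqrt(diag(S))              # column norms after centring
  S / outer(d, d)                  # scale cross-products to correlations
}

set.seed(1)
X  <- matrix(rnorm(200), nrow = 20, ncol = 10)  # small made-up example
R1 <- cor_crossprod(X)
R2 <- cor(X)
print(all.equal(R1, R2, check.attributes = FALSE))
```

Whether this is faster than cor() depends on the BLAS R is linked against; the blockwise variant described in the message would apply the same arithmetic to pairs of column blocks.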
Re: [R] [R-pkgs] R CMD check on window XP
On Mon, 15 Dec 2008, Shu Chen wrote:

Hi, there,

I used R CMD check to build my ATGGS package under Windows XP. My R version is 2.7.2. But I encounter some problems. The log file is like:

** installing R.css in C:/ATGGS.Rcheck
-- Making package ATGGS
adding build stamp to DESCRIPTION
installing R files
installing inst files
find: `C:/ATGGS.Rcheck/ATGGS/csvscripts': Permission denied
make[2]: *** [C:/ATGGS.Rcheck/ATGGS/inst] Error 1
make[1]: *** [all] Error 2
make: *** [pkg-ATGGS] Error 2
Can't read C:/ATGGS.Rcheck/ATGGS/auxData: Invalid argument at c:\R\R-27~1.2/bin/INSTALL line 434
Can't remove directory C:/ATGGS.Rcheck/ATGGS/auxData: Directory not empty at c:\R\R-27~1.2/bin/INSTALL line 434
Can't read C:/ATGGS.Rcheck/ATGGS/csvData: Invalid argument at c:\R\R-27~1.2/bin/INSTALL line 434
Can't remove directory C:/ATGGS.Rcheck/ATGGS/csvData: Directory not empty at c:\R\R-27~1.2/bin/INSTALL line 434
Can't read C:/ATGGS.Rcheck/ATGGS/csvscripts: Invalid argument at c:\R\R-27~1.2/bin/INSTALL line 434
Can't remove directory C:/ATGGS.Rcheck/ATGGS/csvscripts: Directory not empty at c:\R\R-27~1.2/bin/INSTALL line 434
Can't read C:/ATGGS.Rcheck/ATGGS/doc: Invalid argument at c:\R\R-27~1.2/bin/INSTALL line 434
Can't remove directory C:/ATGGS.Rcheck/ATGGS: Directory not empty at c:\R\R-27~1.2/bin/INSTALL line 434
*** Installation of ATGGS failed ***
Removing 'C:/ATGGS.Rcheck/ATGGS'

I am not able to delete c:/ATGGS.Rcheck until I change the permission of the folder. I'm the admin of the C drive. I have full control of all other folders under the C drive.

Please sort out the permissions in your source directory, including under ATGGS/inst. Something there has incorrect permissions that are confusing the Cygwin tools used. (It might be worth checking ownership too: I've seen similar problems on a drive shared between XP and Vista where they disagreed about ownership.)

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Re: [R] Find all numbers in a certain interval
Hi David,

thanks a lot for your proposal. I got a lot of useful hints from all of you :-)

David Winsemius wrote:

It's not entirely clear what you are asking for, since which(within.interval(a, -0.5, 0.5)) is actually longer than which(a > -0.5 & a < 0.5).

Right, but in case 'a' is something with a long name and '0.5' is a variable you might end up with something like this (for the data frame example):

DF[which( DF$myReallyLongColumnName > -myReallyLongThreshold & DF$myReallyLongColumnName < -myReallyLongThreshold ), ]

instead of:

DF[which( within.interval(DF$myReallyLongColumnName, myReallyLongThreshold) ), ]

You mention that you want a solution that applies to dataframes. Using indexing you can get entire rows of dataframes that satisfy multiple conditions on one of its columns:

DF <- data.frame(a = rnorm(20), b = LETTERS[1:20], c = letters[20:1], stringsAsFactors=FALSE)
DF[which( DF$a > -0.5 & DF$a < 0.5 ), ]
# note that one needs to avoid DF[which(a > -0.5 & a < 0.5) , ]
# the "a" vector is not the same as the "a" column vector within DF
             a b c
3  -0.47310672 C r
6  -0.49784460 F o
9   0.02571058 I l
10  0.16893759 J k
11 -0.11963322 K j
12  0.39378887 L i
16  0.03712263 P e

Could get the indices that satisfy more than one condition:

which(DF$a > 0.5 & DF$b < "K")
[1]  1  2  6 10

Or you can get rows of DF that satisfy conditions on multiple columns with the subset function:

subset(DF, a > 0.5 & b < "K")
           a b c
1  2.2500997 A t
2  0.7251357 B s
6  0.7845355 F o
10 1.0685649 J k

Or if you wanted a within.interval function:

within.interval <- function(x,a,b) { x > a & x < b }
which(within.interval(DF$a, -0.5, 0.5))
[1]  3  4  7  8  9 13 14 17 20
Re: [R] Application b-spline basis for polynomial splines
On Mon, 15 Dec 2008, ARIF WIJAYA wrote:

Hi everybody,
Does anyone have a simple application of B-splines in the R language? I need it to help me understand the B-spline basis for polynomial splines.

Try this:

library(splines)
example(bs)

Did you read the posting guide?

There are some terrific hints about how to learn about things -- like R's splines capabilities -- in the 'Do Your Homework' section.

HTH,

Chuck

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
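Beyond example(bs), a very small sketch may help show what a B-spline basis is. The grid, the choice of six basis functions and the sine target below are my own assumptions for illustration; only bs() from the splines package comes from the reply.

```r
library(splines)

x <- seq(0, 1, length.out = 101)

# Cubic B-spline basis with 6 basis functions; intercept = TRUE keeps the full basis
B <- bs(x, df = 6, degree = 3, intercept = TRUE)

dim(B)             # one row per x value, one column per basis function
range(rowSums(B))  # the basis functions sum to 1 at every x

# A cubic spline on these knots is a linear combination of the columns of B,
# so a least-squares spline fit is an ordinary lm() on the basis
y   <- sin(2 * pi * x)
fit <- lm(y ~ B - 1)
print(coef(fit))
```

Plotting matplot(x, B, type = "l") draws the six basis functions, which is often the quickest way to see how each one is non-zero only over a few adjacent knot intervals.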
Re: [R] Problem assigning NA as a level name in a list
Peter, I've inserted responses inline below:

Cliff

Peter Dalgaard wrote:

Cliff Behrens wrote:

Peter, OK...here is reproducible, self-contained code:

library(gregmisc)

Relying on a 3rd party package is not kosher either... Whatever did list("NA"=2) or l <- list(2); names(l) <- "NA" do to you?

I'm not sure what you mean by 3rd party? I downloaded this package from the CRAN site where I get all others. I don't understand your question.

columnNames <- c("A","B","C","D","N","a","b","c")
namePerms <- permutations(length(columnNames),2,columnNames,repeats=TRUE)
nameList <- paste(namePerms[,1],namePerms[,2],sep="")
dataList <- lapply(1:length(nameList), function(level) {})
names(dataList) <- nameList ## The "NA" is interpreted that the name is missing for one list in dataList

If you inspect the contents of dataList, you will find the following showing that the name "NA" is treated differently:

Anyway, as I thought: remember that NA is a reserved word. You get the same kind of reaction if you name an element "for" or "in". It denotes that you need to quote the name for indexing with $:

I thought that since all of the names in nameList were type char, there was no need to enclose these in quotation marks.

names(l) <- "NA"
l$NA
Error: unexpected numeric constant in "l$NA"
l$`NA`
[1] 2
l$"NA"
[1] 2
l[["NA"]]
[1] 2
names(l)
[1] "NA"

..
$Na
NULL

$`NA`
NULL

$Nb
NULL
.

Peter Dalgaard wrote:

Cliff Behrens wrote:

I want to generate a list (called dataList below) where each of its levels is named. These names are assigned to nameList, which contains all possible permutations of size two taking letters from a larger alphabet, e.g., aa,...,Fd,..,Z1,... One of these permutations is the character string "NA". It seems that when I try to name one of the dataList levels "NA", using names(dataList) <- nameList, the names() function assigns the missing character to the level. Is there some way to preserve "NA" as the name of a level in dataList? Here is the R code I have been using to do this.

namePerms <- permutations(ncol(coinMat),2,colnames(coinMat),repeats=TRUE)
nameList <- paste(namePerms[,1],namePerms[,2],sep="")
dataList <- lapply(1:length(nameList), function(level) {})
names(dataList) <- nameList ## The "NA" in nameList is interpreted so that the name "NA" is missing for one level in dataList

I am running R 2.4.1 in the Windows XP environment. Thanks for any help that can be offered.

Your example is not reproducible and self-contained. What is permutations and coinMat?? I bet it isn't minimal either.

It doesn't seem to be happening for me with a recent(!) version of R, but you could just be misinterpreting the backtick quoting.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907
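The distinction Peter draws, between an element that is really named "NA" and one whose name is missing, can be checked directly. A minimal sketch with two one-element lists (`l` follows his example, `m` is my own addition for contrast):

```r
# paste() builds the ordinary character string "NA", not a missing value
nm <- paste("N", "A", sep = "")

l <- list(2)
names(l) <- nm          # the name is the string "NA"

is.na(names(l))         # FALSE: the name is present
l$`NA`                  # backticks quote the reserved word for `$`
l[["NA"]]               # string indexing needs no backticks

m <- list(2)
names(m) <- NA          # a genuinely missing name, printed as <NA>
is.na(names(m))         # TRUE
```

So the backticks in the printed `$`NA`` heading are only quoting: the element is still found by l[["NA"]].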
[R] Beta Conjugate Prior for Random intercept model -WInBUGS
I have been using the following random intercept model with non-informative prior:

model {
  for (i in 1:n.samples) {
    vomit[i] ~ dbern(p[i])
    logit(p[i]) <- beta0 + alpha[siteid[i]]
  }
  for (j in 1:n.sites) {
    alpha[j] ~ dnorm(0, tau)
  }
  beta0 ~ dnorm(0.0, 1.0E-6)
  tau ~ dgamma(0.01, 0.01)
}

list(n.samples=3780, n.sites=63)

How could I use a beta conjugate prior for the same model so that p[i] ~ dbeta(alpha, beta)?

Thanks for your help.
Re: [R] Find all numbers in a certain interval
Antje wrote:

Hi,

sorry, but it shouldn't be different. The result should be the same but I was looking if there is a method I can use...

# having a function defined like baptiste proposed:
isIn <- function(interval, x) { (x > min(interval)) & (x < max(interval)) }

Along the lines I suggested before, I'd suggest a function ordered(...) (or increasing()?) that could be called as

ordered(-0.5, x, 0.5)

If you do write this, be careful about how you handle recycling of values.

Duncan Murdoch

#--
a <- rnorm(100)
# it's simply more human readable if I can write
which( isIn( c(-0.5, 0.5), a) )
# instead of
which( a > -0.5 & a < 0.5 )

Thanks to baptiste!

So there is no method available doing this and I have to define this by myself. That's all I wanted to know :-)

Antje

markle...@verizon.net wrote:

hi: could you explain EXACTLY what you want to do with the dataframe because it shouldn't be that different ?

On Tue, Dec 16, 2008 at 5:09 AM, Antje wrote:

Hi all,

I'd like to know, if I can solve this with a shorter command:

a <- rnorm(100)
which(a > -0.5 & a < 0.5)
# would give me all indices of numbers greater than -0.5 and smaller than +0.5

I have something similar with a dataframe and it produces sometimes quite long commands... I'd like to have something like:

which(within.interval(a, -0.5, 0.5))

Is there anything I could use for this purpose?

Antje
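One way the function Duncan Murdoch suggests might look, offered as a sketch rather than his design: I use the name `increasing` because `ordered` already exists in base R for ordered factors, and I handle recycling by refusing partial recycling instead of silently reusing values.

```r
# TRUE where the arguments are strictly increasing, element by element
increasing <- function(...) {
  args <- list(...)
  stopifnot(length(args) >= 2)
  n <- max(lengths(args))
  # Recycle each argument to length n, refusing partial recycling
  args <- lapply(args, function(a) {
    if (n %% length(a) != 0) stop("argument lengths are not compatible")
    rep_len(a, n)
  })
  ok <- rep(TRUE, n)
  for (i in seq_len(length(args) - 1)) {
    ok <- ok & (args[[i]] < args[[i + 1]])
  }
  ok
}

a <- c(-1, 0, 0.4, 0.6)
inside <- increasing(-0.5, a, 0.5)
print(which(inside))
```

Because the bounds are ordinary arguments, they may themselves be vectors of the same length as x, which is where the recycling check earns its keep.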
Re: [R] Sorting a date vector
On Dec 16, 2008, at 8:49 AM, Prof Brian Ripley wrote:

On Tue, 16 Dec 2008, David Winsemius wrote:

You cannot keep them as strings and still get the benefits of working with date-class objects. You should read more documentation regarding dates. The

You can: order() will work on the Date class and the ordering can be applied to the original data.

Got it. Worked examples:

dts <- c("10-02-2008", "10-03-2008", "10-06-2008", "10-07-2008", "10-09-2008", "12-09-2008", "12-10-2008", "12-11-2008", "12-12-2008", "12-15-2008", "4-18-2008", "4-21-2008", "4-22-2008", "4-23-2008")

order(as.Date(dts, format = "%m-%d-%Y"))
 [1] 11 12 13 14  1  2  3  4  5  6  7  8  9 10

rev(order(as.Date(dts, format = "%m-%d-%Y")))
 [1] 10  9  8  7  6  5  4  3  2  1 14 13 12 11

dts[rev(order(as.Date(dts, format = "%m-%d-%Y")))]
 [1] "12-15-2008" "12-12-2008" "12-11-2008" "12-10-2008" "12-09-2008" "10-09-2008" "10-07-2008" "10-06-2008" "10-03-2008"
[10] "10-02-2008" "4-23-2008"  "4-22-2008"  "4-21-2008"  "4-18-2008"

dts[order(as.Date(dts, format = "%m-%d-%Y"))]
 [1] "4-18-2008"  "4-21-2008"  "4-22-2008"  "4-23-2008"  "10-02-2008" "10-03-2008" "10-06-2008" "10-07-2008" "10-09-2008"
[10] "12-09-2008" "12-10-2008" "12-11-2008" "12-12-2008" "12-15-2008"

I still suggest that RON70 educate himself further regarding the Date class and formats.

--
David Winsemius

as.Date function turns strings into a form that is stored internally as the number of days since some reference date, and what you are seeing is the default display format, "%Y-%m-%d". Learn how to use the output formats so that you see what you desire.

?as.Date
?Dates
?format.Date

--
David Winsemius

On Dec 16, 2008, at 8:24 AM, RON70 wrote:

Yes, you are right. However, using that code, the format of the date is altered. I need to maintain the same format as the input data, i.e. 10-02-2008 not 2008-10-02, while still having date class. Any better idea?

David Winsemius wrote:

You might want to look at your date format more closely. Both the separator and the year format specs fail to match your input.
[R] stably updating the SD
Hi,

I have some summary data from which I know a few points and I'd like to remove them. Does anyone know if there is an R module that implements something like Hanson (1975)?

Hanson (1975). Stably updating mean and standard deviation of data. Communications of the ACM, 18(1), 57-58.
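For readers wondering what removing known points from a summary involves, here is a minimal sketch. It is not Hanson's algorithm, which I have not reproduced; it simply runs the familiar one-pass (Welford-style) update in reverse, and the names `remove_point`, `n`, `mean` and `M2` are my own. Because it subtracts, it can lose precision when the removed points dominate the variance.

```r
# Remove one known value x from a summary (n, mean, M2), where
# M2 = sum((values - mean)^2) = (n - 1) * sd^2. Assumes n >= 2.
remove_point <- function(s, x) {
  n_new    <- s$n - 1
  mean_new <- (s$n * s$mean - x) / n_new
  M2_new   <- s$M2 - (x - s$mean) * (x - mean_new)
  list(n = n_new, mean = mean_new, M2 = M2_new)
}

# Made-up data, used only to build a summary and to check the result
vals <- c(2, 4, 4, 4, 5, 5, 7, 9)
s <- list(n = length(vals), mean = mean(vals),
          M2 = (length(vals) - 1) * var(vals))

# Remove the two known points 9 and 2 from the summary
for (x in c(9, 2)) s <- remove_point(s, x)
sd_after <- sqrt(s$M2 / (s$n - 1))
print(c(mean = s$mean, sd = sd_after))
```

Starting from a reported n, mean and SD, the summary list is built the same way with M2 = (n - 1) * SD^2.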
[R] pwr.prop.test and continuity correction
Hi,

I am trying to sort out a discrepancy in power calculation results between me and another statistician. I use R but I am not sure what she uses. It concerns the test of proportions, and so I have been using pwr.prop.test. I think I have tracked the problem down to pwr.prop.test not using the continuity correction for the test (I did this by using the java applet from http://stat.ethz.ch/R-manual/R-patched/library/stats/html/power.prop.test.html). So I was wondering whether:

1) Someone could confirm that pwr.prop.test does not use a continuity correction in its calculation.
2) Someone could tell me either how to use pwr.prop.test or another function to get the power of a prop.test with continuity correction.

The reason I want this is that I would normally apply the correction when I actually used the test.

Many thanks

Dan

--
Daniel Brewer, Ph.D.
Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.bre...@icr.ac.uk
Re: [R] OT: (quasi-?) separation in a logistic GLM
Dear Gavin,

I do not know whether this comment is still useful. Why are you unsure about quasi-separation? I think that it is quite evident in the plot

    plot(analogs ~ Dij, data = dat)

It may also be useful to plot the monotone (profile) deviance, or the log-likelihood, for the coefficient of Dij:

    xval <- seq(-20, 0, length.out = 50)
    ll <- numeric(length(xval))
    for (i in seq_along(xval)) {
      # fix the coefficient of Dij at xval[i] through an offset
      mod <- glm(analogs ~ offset(xval[i] * Dij), data = dat, family = binomial)
      ll[i] <- mod$deviance
    }
    plot(xval, ll)

Hope this helps you,
vito

Gavin Simpson wrote:
> Dear List,
>
> Apologies for this off-topic post but it is R-related in the sense that I am trying to understand what R is telling me with the data to hand.
>
> ROC curves have recently been used to determine a dissimilarity threshold for identifying whether two samples are from the same type or not. Given the bashing that ROC curves get whenever anyone asks about them on this list (and having implemented the ROC methodology in my analogue package) I wanted to try directly modelling the probability that two sites are analogues for one another for given dissimilarity using glm().
>
> The data I have then are a logical vector ('analogs') indicating whether the two sites come from the same vegetation and a vector of the dissimilarity between the two sites ('Dij'). These are in a csv file currently in my university web space. Each 'row' in this file corresponds to a single comparison between 2 sites.
>
> When I analyse these data using glm() I get the familiar "fitted probabilities numerically 0 or 1 occurred" warning. The data do not look linearly separable when plotted (code for which is below). I have read Venables and Ripley's discussion of this in MASS4 and other sources that discuss this warning and R (Faraway's Extending the Linear Model with R and John Fox's new Applied Regression, Generalized Linear Models, and Related Methods, 2nd Ed) as well as some of the literature on Firth's bias reduction method.
> But I am still somewhat unsure what (quasi-)separation is and if this is the reason for the warnings in this case.
>
> My question then is, is this a separation issue with my data, or is it quasi-separation that I have read a bit about whilst researching this problem? Or is this something completely different?
>
> Code to reproduce my problem with the actual data is given below. I'd appreciate any comments or thoughts on this.
>
>     ## Begin code snippet ##
>     ## note data file is ~93Kb in size
>     dat <- read.csv(url("http://www.homepages.ucl.ac.uk/~ucfagls/dat.csv"))
>     head(dat)
>     ## fit model --- produces warning
>     mod <- glm(analogs ~ Dij, data = dat, family = binomial)
>     ## plot the data
>     plot(analogs ~ Dij, data = dat)
>     fit.mod <- fitted(mod)
>     ord <- with(dat, order(Dij))
>     with(dat, lines(Dij[ord], fit.mod[ord], col = "red", lwd = 2))
>     ## End code snippet ##
>
> Thanks in advance
> Gavin

--
Vito M.R. Muggeo
Dip.to Sc Statist e Matem `Vianelli', Università di Palermo
viale delle Scienze, edificio 13, 90128 Palermo - ITALY
tel: 091 6626240, fax: 091 485726/485612
http://dssm.unipa.it/vmuggeo
Re: [R] Check if data frame column is numeric
From ?apply:

    If 'X' is not an array but has a dimension attribute, 'apply' attempts to coerce it to an array via 'as.matrix' if it is two-dimensional (e.g., data frames) or via 'as.array'.

If any of the columns in your data frame is not numeric, apply will try to coerce all of them to the least common supertype, and you'll get FALSE for each column; this is not the case with sapply.

    d1 = data.frame(x=numeric(10), y=numeric(10))
    d2 = data.frame(d1, z=character(10))

    apply(d1, 2, is.numeric)                  # TRUE TRUE
    apply(d1, 2, function(x) is.numeric(x))   # same as above, redundant code
    sapply(d1, is.numeric)                    # TRUE TRUE

    apply(d2, 2, is.numeric)                  # FALSE FALSE FALSE
    sapply(d2, is.numeric)                    # TRUE TRUE FALSE

vQ

Mark Heckmann wrote:
> Hi R-users,
>
> I want to apply a function to each column of a data frame that is numeric. Thus I tried to check it for each column first:
>
>     apply(df, 2, function(x) is.numeric(x))
>       A60   A64  A66a   A67   A71  A75a   A80   A85   A91   A95   A96   A97   A98   A99
>     FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>
> I get only FALSE results although the variables are numeric. When I try the following it works:
>
>     is.numeric(df$A60)
>     [1] TRUE
>
> What am I doing wrong?
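To connect the explanation above to the original goal (applying a function only to the numeric columns), the following sketch shows both the coercion that apply() performs and a column-wise alternative. The data frame d2 and the choice of mean() as the function to apply are made up for illustration.

```r
d2 <- data.frame(x = numeric(3), y = c(1.5, 2.5, 3.5), z = c("a", "b", "c"))

# apply() first converts the data frame with as.matrix(); one
# non-numeric column turns the whole matrix into character.
m <- as.matrix(d2)

# Working column by column keeps each column's own type.
is_num <- vapply(d2, is.numeric, logical(1))

# Apply a function (here mean) to the numeric columns only.
col_means <- vapply(d2[is_num], mean, numeric(1))
```

vapply() is used here instead of sapply() only because it checks that every result has the declared type and length; sapply(d2, is.numeric) gives the same logical vector.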
[R] Change in Lattice bwplot?
Dear list,

Sorry for asking this question, but has something changed in the syntax for bwplot in Lattice? In an old publication, I used

    bwplot(VOTMS ~ gender | type * group, data=merge(vot,words,by="ord"), nint=30, horizontal=F, layout=c(3,3), box.ratio=0.8)

which produced a lovely 3x3 lattice plot with one box per gender in each panel. Now, I try

    bwplot(SyllableNucleusDiff ~ SourceLanguage, data=Hstar, horizontal=F)

to get just a simple 1x1 panel plot with groups (which I will then of course make into a panel plot by adding | factor1 + factor2...), but I get a syntax error.

So, has anything changed, or am I just doing something very silly?

/Fredrik

--
Life is like a trumpet - if you don't put anything into it, you don't get anything out of it.
Re: [R] sliding window over a large vector
On Tue, Dec 16, 2008 at 8:23 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> There seems to be something wrong:
>
>     slide(c(1, 1, 0, 1), 2)
>     [1] 2 2
>
> but the output should be c(2, 1, 2)

That should be c(2, 1, 1)

> At any rate try this:
>
>     library(zoo)
>     3 * rollmean(x, 3)
>
> On Mon, Dec 15, 2008 at 11:19 PM, Chris Oldmeadow c.oldmea...@student.qut.edu.au wrote:
>> Hi all,
>>
>> I have a very large binary vector, and I wish to calculate the number of 1's over sliding windows. This is my very slow function:
>>
>>     slide <- function(seq,window){
>>       n <- length(seq)-window
>>       tot <- c()
>>       tot[1] <- sum(seq[1:window])
>>       for (i in 2:n) {
>>         tot[i] <- tot[i-1]-seq[i-1]+seq[i]
>>       }
>>       return(tot)
>>     }
>>
>> This works well for reasonably sized vectors. Does anybody know a way for large vectors (length = 12 million)? I'm trying to avoid using C.
>>
>> Thanks,
>> Chris
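For reference, the window counts can also be computed in base R without a loop by differencing a cumulative sum. This is only a sketch of one possible approach, not code from the thread; the function name window_sum is hypothetical.

```r
# Sum of x over every window of k consecutive elements.
# Returns length(x) - k + 1 values.
window_sum <- function(x, k) {
  stopifnot(k >= 1, k <= length(x))
  cs <- c(0, cumsum(x))                      # cs[i + 1] = sum(x[1:i])
  cs[(k + 1):length(cs)] - cs[1:(length(cs) - k)]
}

res <- window_sum(c(1, 1, 0, 1), 2)          # windows (1,1), (1,0), (0,1)
```

The work is a single cumsum() and one vector subtraction, so it scales linearly with the length of the vector. For a 0/1 vector of 12 million elements the cumulative sums stay far below the range where doubles lose integer precision.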
Re: [R] Check if data frame column is numeric
... and an addendum:

Hadley Wickham's plyr package attempts to redress these (nevertheless documented) apparent inconsistencies in the *apply family of functions by handling everything in a more consistent, intuitive manner. You may wish to use its functions instead of the base R *apply functions.

-- Bert Gunter
Genentech

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Henrique Dallazuanna
Sent: Tuesday, December 16, 2008 9:32 AM
To: Mark Heckmann
Cc: r-help@r-project.org
Subject: Re: [R] Check if data frame column is numeric

Try:

    sapply(df, is.numeric)

On Tue, Dec 16, 2008 at 1:25 PM, Mark Heckmann mark.heckm...@gmx.de wrote:
> Hi R-users,
>
> I want to apply a function to each column of a data frame that is numeric. Thus I tried to check it for each column first:
>
>     apply(df, 2, function(x) is.numeric(x))
>       A60   A64  A66a   A67   A71  A75a   A80   A85   A91   A95   A96   A97   A98   A99
>     FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>
> I get only FALSE results although the variables are numeric. When I try the following it works:
>
>     is.numeric(df$A60)
>     [1] TRUE
>
> What am I doing wrong?
>
> TIA
> Mark

--
Henrique Dallazuanna
Curitiba-Parana-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] OT: (quasi-?) separation in a logistic GLM
Dear Gavin,

glm reported exactly what it noticed, giving a warning that some very small fitted probabilities have been found. However, your data are **not** quasi-separated. The maximum likelihood estimates are really those reported by glm.

A first elementary check is to change the tolerance and maximum number of iterations in glm and see if you get the same result:

    > mod1

    Call:  glm(formula = analogs ~ Dij, family = binomial, data = dat,
        control = glm.control(epsilon = 1e-16, maxit = 1000))

    Coefficients:
    (Intercept)          Dij
          4.191      -29.388

    Degrees of Freedom: 4033 Total (i.e. Null);  4032 Residual
    Null Deviance:      1929
    Residual Deviance: 613.5    AIC: 617.5

This is exactly the same fit as the one you have. If separation occurred, the effects would usually diverge as we allow glm more iterations.

Secondly, an inspection of the estimated asymptotic standard errors reveals nothing to worry about.

    > summary(mod1)

    Call:
    glm(formula = analogs ~ Dij, family = binomial, data = dat,
        control = glm.control(epsilon = 1e-16, maxit = 1000))

    Deviance Residuals:
           Min          1Q      Median          3Q         Max
    -1.676e+00  -1.319e-02  -1.250e-04  -1.958e-06   4.104e+00

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)   4.1912     0.3248   12.90   <2e-16 ***
    Dij         -29.3875     1.9345  -15.19   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 1928.62  on 4033  degrees of freedom
    Residual deviance:  613.53  on 4032  degrees of freedom
    AIC: 617.53

    Number of Fisher Scoring iterations: 11

If separation occurred, the estimated asymptotic standard errors would be unnaturally large. This is because, in the case of separation (quasi or not), glm would calculate the standard errors by taking the square root of the diagonal elements of the inverse of minus the Hessian of the log-likelihood, at a point where the log-likelihood appears to be flat for the given tolerance.
To be certain, you could also try fitting with brglm, which is guaranteed to give finite estimates that have bias of smaller order than the MLE, and compare the results.

    > library(brglm)
    > mod.br <- brglm(analogs ~ Dij, data = dat, family = binomial)
    > mod.br

    Call:  brglm(formula = analogs ~ Dij, family = binomial, data = dat)

    Coefficients:
    (Intercept)          Dij
          4.161      -29.188

    Degrees of Freedom: 4033 Total (i.e. Null);  4032 Residual
    Deviance:           613.5448
    Penalized Deviance: 610.2794    AIC: 617.5448

The estimates are similar, only a bit shrunk towards the origin, which is natural for bias removal. If separation occurred, and given the previous discussion, the bias-reduced estimates would be considerably different from the estimates that glm reports.

Lastly, the most certain way to check for separation is to inspect the profiles of the log-likelihood. Vito suggested this, but the chosen limits for xval are not appropriate. If separation occurred the estimate would be -Inf, so the profiling as done in his email should start from, for example, -40 rather than -20. This would reveal that the profile deviance starts increasing again, whereas if separation occurred there would be an asymptote on the left. Below I give the correct profiles, as reported by profileModel.

    > library(profileModel)
    > pp <- profileModel(mod1, quantile = qchisq(0.95, 1), objective = "ordinaryDeviance")
    Preliminary iteration .. Done

    Profiling for parameter (Intercept) ... Done
    Profiling for parameter Dij ... Done
    > plot(pp)

The profiles are quite quadratic. In the case of separation you would have seen asymptotes on the left or on the right (see help(profileModel) for an example).

It appears that the fitted logistic curve, while steep, still has a finite gradient, for example at the LD50 point:

    > library(MASS)
    > dose.p(mod)
                  Dose          SE
    p = 0.5: 0.1426167 0.003646903

When separation occurs the LD50 point cannot be identified (computer software would return something with an enormous estimated standard error).
In conclusion, if you get data sets that result in large estimated effects on the log-odds scale, the above checks can be used to convince yourself whether separation occurred or not. If there is separation (not the case in the current example), then you could use an alternative to maximum likelihood for estimation, such as penalized maximum likelihood in brglm, which always returns finite estimates. In that case, though, I suggest you reflect the uncertainty about how large the estimated effects are by reporting confidence intervals with one infinite endpoint, for example confidence intervals as in help(profile.brglm).

Hope this helps,
Best wishes,
Ioannis

On 15 Dec 2008, at 18:03, Gavin Simpson wrote: Dear List,
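For contrast with the checks above, this small sketch shows what the first two diagnostics look like on data that really are completely separated. The six-point data set is invented for illustration and has nothing to do with Gavin's data; fit_with, slope and slope_se are hypothetical helper names.

```r
# Toy data with complete separation: y is 0 for x <= 3 and 1 for x >= 4.
x <- 1:6
y <- c(0, 0, 0, 1, 1, 1)

# Fit the same model while allowing a different number of iterations.
fit_with <- function(maxit) {
  suppressWarnings(
    glm(y ~ x, family = binomial, control = glm.control(maxit = maxit))
  )
}
short <- fit_with(5)
long  <- fit_with(25)

slope    <- function(fit) unname(coef(fit)["x"])
slope_se <- function(fit) unname(sqrt(diag(vcov(fit)))["x"])

# Under separation the slope keeps growing with more iterations and its
# standard error grows with it; a well-identified fit would not change.
c(slope(short), slope(long))
c(slope_se(short), slope_se(long))
```

On Gavin's data the corresponding comparison leaves the coefficients and standard errors essentially unchanged, which is the behaviour described in the message above.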
Re: [R] Find all numbers in a certain interval
Antje wrote:
> Hi all,
>
> I'd like to know if I can solve this with a shorter command:
>
>     a <- rnorm(100)
>     which(a > -0.5 & a < 0.5)
>     # would give me all indices of numbers greater than -0.5 and smaller than +0.5
>
> I have something similar with a dataframe and it sometimes produces quite long commands... I'd like to have something like:
>
>     which(within.interval(a, -0.5, 0.5))
>
> Is there anything I could use for this purpose?

Not in general, but in this particular case

    abs(a) < 0.5

gives you the right result.

By the way, some advice I read many years ago (in Kernighan and Plauger): always use < or <=, avoid > or >= in multiple comparisons. It's easier to read

    -0.5 < a & a < 0.5

than it is to read the form you used, because it is so much like the math notation -0.5 < a < 0.5.

Duncan Murdoch
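The abs() shortcut works because the interval is symmetric around zero. As a sketch, the same trick can be written for an arbitrary open interval by shifting to the midpoint; the function name in_open_interval is hypothetical, and the explicit comparison is kept alongside it as the reference form.

```r
# TRUE where lo < x < hi, written in the "ascending" style recommended above.
in_open_interval <- function(x, lo, hi) lo < x & x < hi

# The abs() form generalised: distance from the midpoint is less than
# half the interval width. Equivalent up to floating-point rounding.
in_open_interval_abs <- function(x, lo, hi) abs(x - (lo + hi) / 2) < (hi - lo) / 2

a <- c(-1, -0.25, 0, 0.3, 0.75)
idx <- which(in_open_interval(a, -0.5, 0.5))
```

The explicit comparison is usually the safer choice, since the midpoint form can differ for values that lie within rounding error of an endpoint.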
Re: [R] R2winbugs : vectorization
I remember having a similar problem with the inprod function. As far as I can remember, the sole difference between my models was that I used inprod instead of an explicit sum (exactly as you did). In my case the inprod version was faster, but the results were completely aberrant, so I abandoned inprod as unreliable. I used OpenBUGS (a newer version of WinBUGS) and the BRugs interface from R.

On Mon, 15 Dec 2008 18:23:39 +0100, Philip A. Viton vito...@osu.edu wrote:
> I'm new to bugs, so please bear with me. Can someone tell me if the following two models are doing the same thing? The reason I ask is that with the same data, the first (based on 4 separate coeffs a1--a4) takes about 50 secs, while the second (based on a vectorized form, a[]) takes about 300. The means are about the same, though the R-hats in the second version are quite a bit better.
>
> (Also, and completely unrelated: is there any way to get more than 2 decimal places in the display of the means?)
>
> Thanks!!
>
> Here are the two models (these are censored regressions; the first is essentially a copy of code in Gelman+Hill):
>
>     ## model 1 : 4 separate a's
>     model{
>       for (i in 1:n){
>         z.lo[i] <- C * equals(y[i],C)
>         z[i] ~ dnorm(z.hat[i],tau.y) I(z.lo[i],)
>         z.hat[i] <- a1*x[i,1]+a2*x[i,2]+a3*x[i,3]+a4*x[i,4]
>       }
>       a1 ~ dunif(0,100)
>       a2 ~ dunif(0,100)
>       a3 ~ dunif(0,100)
>       a4 ~ dunif(0,100)
>       tau.y <- pow(sigma.y,-2)
>       sigma.y ~ dunif(0,100)
>     }
>
>     ## model 2 : vector of a's
>     model{
>       for (i in 1:n){
>         z.lo[i] <- C * equals(y[i],C)
>         z[i] ~ dnorm(z.hat[i],tau.y) I(z.lo[i],)
>         z.hat[i] <- inprod(a[],x[i,])
>       }
>       for (j in 1:k){
>         a[j] ~ dunif(0,100)
>       }
>       tau.y <- pow(sigma.y,-2)
>       sigma.y ~ dunif(0,100)
>     }
>
> and here, for reference, is the R calling code:
>
>     x <- as.matrix(iv)
>     y <- dv
>     C <- cens
>     z <- ifelse(y==C,NA,y)
>     n <- length(z)
>     data1 <- list(x=x,y=y,z=z,n=n,C=C)
>     inits1 <- function(){
>       list(a1=runif(1),a2=runif(1),a3=runif(1),a4=runif(1),sigma.y=runif(1))}
>     params1 <- c("a1","a2","a3","a4","sigma.y")
>     ## now the bugs call for model 1
>     proc.time()
>     aasho.1 <- bugs(data1,inits1,params1,"aasho1.bug",n.iter=1,debug=FALSE)
>     proc.time()
>     print(aasho.1,digits=4)
>
>     ## now we try a vector approach
>     k <- 4 # niv
>     data2 <- list(x=x,y=y,z=z,n=n,C=C,k=k)
>     inits2 <- function(){
>       list(a=runif(k),sigma.y=runif(1))}
>     params2 <- c("a","sigma.y")
>     ## now the bugs call for model 2
>     proc.time()
>     aasho.2 <- bugs(data2,inits2,params2,"aasho2.bug",n.iter=1,debug=FALSE)
>     proc.time()
>     print(aasho.2,digits=6)
>
> Philip A. Viton
> City Planning, Ohio State University
> 275 West Woodruff Avenue, Columbus OH 43210
> vito...@osu.edu
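On the question of whether the two models are doing the same thing: algebraically, the explicit sum in model 1 and the inner product in model 2 define the same linear predictor. The R sketch below checks only that algebra, on made-up numbers; it says nothing about how WinBUGS or OpenBUGS evaluate or sample the two forms, which is where the timing and mixing differences would come from.

```r
set.seed(42)
x <- matrix(rnorm(20), nrow = 5, ncol = 4)   # 5 observations, 4 covariates
a <- runif(4, 0, 100)                        # coefficients, as in the dunif(0,100) priors

# Model 1 style: explicit sum over the four coefficients.
z_hat_explicit <- a[1] * x[, 1] + a[2] * x[, 2] + a[3] * x[, 3] + a[4] * x[, 4]

# Model 2 style: inner product of a with each row of x.
z_hat_inprod <- drop(x %*% a)
```

If the two BUGS models give materially different posteriors on the same data, the difference therefore comes from the sampler or the run length rather than from the model specification.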
Re: [R] Problem assigning NA as a level name in a list
Cliff Behrens wrote:
> Peter,
>
> I've inserted response inline below:
>
> Cliff
>
> Peter Dalgaard wrote:
>> Cliff Behrens wrote:
>>> Peter,
>>>
>>> OK...here is reproducible, self-contained code:
>>>
>>>     library(gregmisc)
>>
>> Relying on a 3rd party package is not kosher either... Whatever did
>>
>>     list("NA"=2)
>>
>> or
>>
>>     l <- list(2); names(l) <- "NA"
>>
>> do to you?
>
> I'm not sure what you mean by 3rd party? I downloaded this package from the CRAN site where I get all others. I don't understand your question.

3rd party means that you didn't write it and neither did I/we. You are requesting people to help you, yet expecting that they go out of their way to install a package first. (As it happens, I really don't have gregmisc on this machine.) You could easily have created an example of a list with "NA" as a name, but that would of course have been work for you rather than for people on the list.

>>>     columnNames <- c("A","B","C","D","N","a","b","c")
>>>     namePerms <- permutations(length(columnNames),2,columnNames,repeats=TRUE)
>>>     nameList <- paste(namePerms[,1],namePerms[,2],sep="")
>>>     dataList <- lapply(1:length(nameList), function(level) {})
>>>     names(dataList) <- nameList
>>>     ## The "NA" is interpreted that the name is missing for one list in dataList
>>>
>>> If you inspect the contents of dataList, you will find the following showing that the name "NA" is treated differently:
>>
>> Anyways
>>
>> As I thought: Remember that NA is a reserved word. You get the same kind of reaction if you name an element "for" or "in". It denotes that you need to quote the name for indexing with $:
>
> I thought that since all of the names in namesList were type char, there was no need to enclose these in quotation marks.

That's not the point. It works fine, it is just that the output is showing you how to access the element afterwards.

>>     > names(l) <- "NA"
>>     > l$NA
>>     Error: unexpected numeric constant in "l$NA"
>>     > l$`NA`
>>     [1] 2
>>     > l$"NA"
>>     [1] 2
>>     > l[["NA"]]
>>     [1] 2
>>     > names(l)
>>     [1] "NA"

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
(p.dalga...@biostat.ku.dk)
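As a follow-up to the point about self-contained examples, here is a sketch that reproduces the situation with base R only. It uses expand.grid() in place of the permutations() call from the contributed package, which is an assumption about what that call was meant to produce (all ordered pairs, with repeats).

```r
columnNames <- c("A", "B", "C", "D", "N", "a", "b", "c")

# All ordered pairs of names, repeats included (8 * 8 = 64 combinations).
pairs <- expand.grid(first = columnNames, second = columnNames,
                     stringsAsFactors = FALSE)
nameList <- paste(pairs$first, pairs$second, sep = "")

# A list with one (empty) element per name; "N" + "A" gives the name "NA".
dataList <- setNames(vector("list", length(nameList)), nameList)

# The name is an ordinary string, not a missing value. Index it with
# [["NA"]] or with backticks after $.
dataList[["NA"]] <- 2
value <- dataList$`NA`
```

Printing dataList shows the element as $`NA`, with backticks, which is R's way of indicating how to refer to a name that would otherwise be parsed as the reserved word NA.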
Re: [R] Sorting a date vector
On Tue, 16 Dec 2008, RON70 wrote:
> I have a date-like vector like this:
>
> date_file
> 10-02-2008 10-03-2008 10-06-2008 10-07-2008 10-09-2008 10-10-2008 10-13-2008 10-14-2008 10-15-2008 10-16-2008 10-17-2008 10-20-2008 10-21-2008 10-22-2008 10-23-2008 10-24-2008 10-28-2008 10-29-2008 10-30-2008 10-31-2008 11-03-2008 11-04-2008 11-05-2008 11-06-2008 11-07-2008 11-10-2008 11-11-2008 11-12-2008 11-13-2008 11-14-2008 11-17-2008 11-18-2008 11-19-2008 11-20-2008 11-21-2008 11-24-2008 11-25-2008 11-26-2008 11-28-2008 12-01-2008 12-02-2008 12-03-2008 12-04-2008 12-05-2008 12-08-2008 12-09-2008 12-10-2008 12-11-2008 12-12-2008 12-15-2008 4-18-2008 4-21-2008 4-22-2008 4-23-2008 4-24-2008 4-28-2008 4-29-2008 5-01-2008 5-05-2008 5-06-2008 5-07-2008 5-09-2008 5-12-2008 5-13-2008 5-14-2008 5-15-2008 5-16-2008 5-19-2008 5-20-2008 5-21-2008 5-22-2008 5-23-2008 5-27-2008 5-28-2008 5-29-2008 5-30-2008 6-02-2008 6-03-2008 6-05-2008 6-06-2008 6-09-2008 6-10-2008 6-11-2008 6-12-2008 6-13-2008 6-17-2008 6-18-2008 6-19-2008 6-20-2008 6-23-2008 6-24-2008 6-25-2008 6-26-2008 6-27-2008 7-01-2008 7-02-2008 7-04-2008 7-07-2008 7-08-2008 7-09-2008 7-10-2008 7-11-2008 7-15-2008 7-16-2008 7-18-2008 7-21-2008 7-22-2008 7-23-2008 7-24-2008 7-25-2008 7-28-2008 7-30-2008 7-31-2008 8-01-2008 8-04-2008 8-05-2008 8-06-2008 8-07-2008 8-08-2008 8-11-2008 8-12-2008 8-13-2008 8-15-2008 8-18-2008 8-19-2008 8-20-2008 8-21-2008 8-22-2008 8-25-2008 8-26-2008 8-27-2008 8-28-2008 8-29-2008 9-03-2008 9-04-2008 9-05-2008 9-08-2008 9-09-2008 9-10-2008 9-11-2008 9-12-2008 9-15-2008 9-16-2008 9-17-2008 9-18-2008 9-19-2008 9-22-2008 9-23-2008 9-24-2008 9-25-2008 9-26-2008 9-29-2008 9-30-2008
>
> I wanted to sort this in ascending order. I tried simply using the sort() function, without altering the format of the dates, but it did not work. Next I tried to convert that vector to a date-class vector so that I could sort it, but in vain :( I used:
>
>     as.Date(date_file, format="%m/%d/%y")
>
> However it did not work.

Your separator is '-' not '/', and you have 4-figure years. Looks like

    sort(as.Date(date_file, format="%m-%d-%Y"))

is what you intended.

> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Please do, and remember to be more helpful than 'it did not work'!

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
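A small sketch of the difference between sorting the strings and sorting the parsed dates, using four values taken from the vector in the question. It also shows how to keep the original month-day-year text while ordering by date, which the thread does not ask for but which follows directly from the suggestion above.

```r
date_file <- c("10-02-2008", "4-18-2008", "12-15-2008", "9-30-2008")

# Sorting the strings compares characters, not calendar dates, so
# "10-..." and "12-..." sort before "4-..." and "9-...".
lexical <- sort(date_file)

# Parse with the separator and 4-digit year that the data actually use.
d <- as.Date(date_file, format = "%m-%d-%Y")

by_date       <- sort(d)                 # Date objects in calendar order
original_text <- date_file[order(d)]     # original strings, calendar order
```

If as.Date() returns NA for every element, that is the usual symptom of a format string that does not match the data, as with "%m/%d/%y" in the question.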
Re: [R] Problem assigning NA as a level name in a list
Sorry...I didn't realize that there were such distinct lines drawn around core vs contributed packages. I merely thought that r-help put those with questions in touch with others who might have used or authored a package and experienced the same problem. I didn't intend to make more work for you or anyone else on this list. In fact, I was merely trying to be thorough and exact, including a note with the version of R and the OS I am running. I have no idea what packages others have installed in their R environments.

For future reference, am I to assume that no contributed packages should be implicated in resolving a problem?
Re: [R] Help!
Laura,

Try using a different browser for your download. On MacOS X, Safari quite often does weird stuff to files I want to download, frequently damaging the files. Downloading the same file from the same site using Firefox usually works fine.

Hope this helps,
Kathi
[R] dip test p-values for large sample sizes
Hi list,

I'm calculating dip statistics using the diptest package for large sample sizes. For everything under 5000 samples I can use the table qDiptab, but over 5000 I have no reference. Is there any way to extend the table from Hartigan's paper to larger sample sizes? Other solutions are also welcome.

Kind regards,
Koen
Re: [R] Find all numbers in a certain interval
It's not entirely clear what you are asking for, since which(within.interval(a, -0.5, 0.5)) is actually longer than which(a > -0.5 & a < 0.5).

You mention that you want a solution that applies to dataframes. Using indexing you can get entire rows of dataframes that satisfy multiple conditions on one of its columns:

    DF <- data.frame(a = rnorm(20), b = LETTERS[1:20], c = letters[20:1], stringsAsFactors=FALSE)

    DF[which(DF$a > -0.5 & DF$a < 0.5), ]
    # note that one needs to avoid DF[which(a > -0.5 & a < 0.5), ]
    # the a vector is not the same as the a column vector within DF
                 a b c
    3  -0.47310672 C r
    6  -0.49784460 F o
    9   0.02571058 I l
    10  0.16893759 J k
    11 -0.11963322 K j
    12  0.39378887 L i
    16  0.03712263 P e

You could get the indices that satisfy more than one condition:

    which(DF$a > 0.5 & DF$b < "K")
    [1]  1  2  6 10

Or you can get rows of DF that satisfy conditions on multiple columns with the subset function:

    subset(DF, a > 0.5 & b < "K")
               a b c
    1  2.2500997 A t
    2  0.7251357 B s
    6  0.7845355 F o
    10 1.0685649 J k

Or if you wanted a within.interval function:

    within.interval <- function(x, a, b) { x > a & x < b }
    which(within.interval(DF$a, -0.5, 0.5))
    [1]  3  4  7  8  9 13 14 17 20

--
David Winsemius
Heritage Labs
Re: [R] Problem assigning NA as a level name in a list
Cliff Behrens-3 wrote:
> For future reference, am I to assume that no contributed packages should be implicated in resolving a problem?

It does bring things one step closer to "minimal, reproducible". If you can identify the problem as specifically involving the package, then it's still OK to query the general R list, but it's generally a good idea to Cc: the package maintainer as well.

Ben Bolker
Re: [R] Problem assigning NA as a level name in a list
Very good...thanks! As you can tell, I really haven't made much (READ any) previous use of this list.

Cliff
Re: [R] sliding window over a large vector
There seems to be something wrong:

    slide(c(1, 1, 0, 1), 2)
    [1] 2 2

but the output should be c(2, 1, 2)

At any rate try this:

    library(zoo)
    3 * rollmean(x, 3)
Re: [R] sliding window over a large vector
Hi,

I just wrote a function that is quicker than slide() and gives the same output, but I don't know what to do with this function!

sl <- function(x,z) c(0,cumsum(diff(x)[1:(length(x)-z-1)])) + rep(sum(x[1:z]),length(x)-z)

slide <- function(seq,window){
  n <- length(seq)-window
  tot <- c()
  tot[1] <- sum(seq[1:window])
  for (i in 2:n) {
    tot[i] <- tot[i-1]-seq[i-1]+seq[i]
  }
  return(tot)
}

sl(c(0,0,1,1,0,1,1,1,1,0,0,0,1,0,1,0,1,1,0,1,1,0,1,0),3)
 [1] 1 1 2 2 1 2 2 2 2 1 1 1 2 1 2 1 2 2 1 2 2
slide(c(0,0,1,1,0,1,1,1,1,0,0,0,1,0,1,0,1,1,0,1,1,0,1,0),3)
 [1] 1 1 2 2 1 2 2 2 2 1 1 1 2 1 2 1 2 2 1 2 2

x <- rbinom(10, 1, 0.5)
system.time(xx1 <- slide(x,12))
   user  system elapsed
  36.86    0.45   37.32
system.time(xx2 <- sl(x,12))
   user  system elapsed
   0.01    0.00    0.02
all.equal(xx1,xx2)
[1] TRUE

Jacques VESLOT
CEMAGREF - UR Hydrobiologie
Route de Cézanne - CS 40061
13182 AIX-EN-PROVENCE Cedex 5, France
Tél. + 0033 04 42 66 99 76
fax + 0033 04 42 66 99 34
email jacques.ves...@cemagref.fr

-----Original Message-----
From: markle...@verizon.net [mailto:markle...@verizon.net]
Sent: Tuesday, 16 December 2008 10:25
To: Veslot Jacques
Cc: Chris Oldmeadow; r-help@r-project.org
Subject: Re: [R] sliding window over a large vector

Hi Veslot,

I'm too tired to even try to figure out why, but I think that there is something wrong with your sl function. See below for an empirical proof of that statement. Or maybe your definition of sliding window is different from rollapply's definition, but rollapply's answer makes more sense to me?

Output:

set.seed(1)
x <- rbinom(24, 1, 0.5)
print(x)
 [1] 0 0 1 1 0 1 1 1 1 0 0 0 1 0 1 0 1 1 0 1 1 0 1 0
xx1 <- sl(x,3)
print(xx1)
 [1] 1 1 2 2 1 2 2 2 2 1 1 1 2 1 2 1 2 2 1 2 2
temp <- zoo(x)
ans <- rollapply(temp,3,sum)
print(ans)
 2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 1  2  2  2  2  3  3  2  1  0  1  1  2  1  2  2  2  2  2  2  2  1

On Tue, Dec 16, 2008 at 3:47 AM, Veslot Jacques wrote:
> [...]
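One possible explanation for the mismatch reported above: sl() accumulates diff(x), the difference between adjacent elements, so it reproduces slide() exactly, including its off-by-window update. The difference between two consecutive window sums is x[i + z] - x[i], which is diff(x, lag = z). The sketch below assumes that the intended result is the sum over each complete window of width z, which is what rollapply(temp, 3, sum) returns; sl_lag is an illustrative name, not a function from the thread.

```r
# Rolling sum over windows of width z, built from lagged differences:
# s[i + 1] - s[i] = x[i + z] - x[i] = diff(x, lag = z)[i]
sl_lag <- function(x, z) {
  sum(x[1:z]) + c(0, cumsum(diff(x, lag = z)))
}

# The vector printed in the message above (set.seed(1); rbinom(24, 1, 0.5)),
# typed in literally so the example does not depend on the random generator.
x <- c(0,0,1,1,0,1,1,1,1,0,0,0,1,0,1,0,1,1,0,1,1,0,1,0)
sl_lag(x, 3)
```

For this vector the result has 22 elements, one per complete window, and agrees with the rollapply values quoted in the message.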
[R] Problem with alignDailySeries in R-metrics
Hi Folks!

I seem to be having a problem with alignDailySeries in Rmetrics:

DTB6 <- fredSeries("DTB6", frequency = "daily", from = "1980-01-01")
trying URL 'http://research.stlouisfed.org/fred2/series/DTB6/downloaddata/DTB6.txt'
Content type 'text/plain; charset=UTF-8' length 248392 bytes (242 Kb)
opened URL
downloaded 242 Kb

Read 13060 items
class(DTB6)
[1] "timeSeries"
attr(,"package")
[1] "fSeries"
DTB6 <- alignDailySeries(DTB6, method = "interp", include.weekends = FALSE, units = NULL)
Error in getDataPart(<S4 object of class "timeSeries">) :
  no '.Data' slot defined for class "timeSeries"

What's causing this error?

--John

sessionInfo()
R version 2.8.0 (2008-10-20)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] strucchange_1.3-5  sandwich_2.1-0     quantmod_0.3-7     Defaults_1.1-1     xts_0.0-16         FinTS_0.3-6
 [7] fracdiff_1.3-1     fTrading_270.74    fGarch_280.75      fMultivar_270.74   fBasics_280.74     sn_0.4-8
[13] mnormt_1.3-1       fSeries_270.76.1   fCalendar_270.78.1 fEcofin_270.75     fUtilities_270.75  MASS_7.2-44
[19] robustbase_0.4-3   dyn_0.2-6          zoo_1.5-4          fImport_270.74     timeSeries_290.79  timeDate_290.81

loaded via a namespace (and not attached):
[1] grid_2.8.0 lattice_0.17-15 tools_2.8.0
[R] Dotted lines at the end of the KM-curve
R-ers!

Referees demand that the line in the KM curve be changed to dotted from the point where the standard error is >= 10 %. I don't think it's a good habit, but I urgently need to implement such a thing in R with survfit, survplot or another program. They also want numbers at risk below the curve.

Some help, please.

Fredrik

Fredrik Lundgren
fredrik.bg.lundg...@gmail.com
Engelbrektsgatan 31
582 21 Linköping
tel 013 - 47 30 117
mob 0706 - 86 39 29
Sommarhus: Ljungnäs 158
380 30 Rockneby
0480 - 650 98
Re: [R] ggplot2 and lattice
On Tuesday 16 December 2008 17:13:33, Wayne F wrote:
> stephen sefick wrote:
> > yes a parallel coordinates plot- I understand that it is for multivariate data, but I am having a hard time figuring out what it is telling me. Thanks for your help.
>
> In the lattice book, the author mentions that static parallel plots aren't very useful, in general.

For some data, though, they are just natural, e.g. when spectra are treated as multidimensional data. Then the parallel coordinate plot just gives you the spectrum. Of course, in this situation it is maybe the treatment as high-dimensional data that is somewhat weird for spectra.

However, this offers a way that might help in understanding what's going on. Say I have a data set of p dimensions, e.g. spectra measured with p channels. We can think of such a spectrum as a point in p-dimensional space: a spectrum consisting of red, green and blue intensities is at a certain point in RGB space. On the other hand, here the p dimensions have something to do with each other (e.g. an intrinsic order, let's say, by wavelength). So it does make sense to plot the intensity over the p dimensions. That's the parallel coordinate plot.

What you can tell from such a plot depends very much on your data and how you treated it.

Claudia

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste
phone: +39 (0 40) 5 58-34 47
email: cbelei...@units.it
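To make the point about spectra concrete, here is a small sketch with simulated data. The wavelengths, peak positions and noise level are made up for illustration, and base graphics (matplot) stands in for the lattice and ggplot2 functions discussed in the thread. Each row of the matrix is one observation in p dimensions; drawing one line per row across the ordered channels is a parallel coordinate plot, and here it simply looks like the spectrum.

```r
set.seed(1)

p <- 50                                           # number of channels
wavelength <- seq(400, 700, length.out = p)       # ordered "dimensions"
centres <- c(500, 520, 600)                       # one peak position per spectrum

# One row per spectrum, one column per channel (3 x 50 matrix).
spectra <- t(sapply(centres, function(m) {
  exp(-((wavelength - m) / 40)^2) + rnorm(p, sd = 0.02)
}))

# One line per observation across the ordered variables.
matplot(wavelength, t(spectra), type = "l", lty = 1,
        xlab = "wavelength (channel)", ylab = "intensity")
```

With variables that have no natural order, the same plot is much harder to read, which may be why static parallel plots are often considered of limited use.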
[R] model.tables error from aov
Hi,

I'm a new R user, coming from SPSS, and without a particularly strong stats background. I've got a data set that I'd like to do a mixed-design ANOVA with. No missing values. Here's the summary:

summary(learnDat.ae)
 Type      Subject        idio      struct     TrainErrs            cond
 0:20   11     : 3   idio   :28   ae  :58   Min.   : 0.00   idioae   :28
 2:19   12     : 3   nonidio:30   fact: 0   1st Qu.: 6.25   idiofact : 0
 3:19   14     : 3                          Median :11.50   nonidioae:30
        15     : 3                          Mean   :13.40
        18     : 3                          3rd Qu.:16.00
        2      : 3                          Max.   :59.00
        (Other):40

Note that the TrainErrs column is the only numeric column, and I forced everything else to be a factor. (Is that correct?) I then do the following:

aov.errs.ae <- aov(TrainErrs ~ (idio*Type) + Error(Subject/Type) + (idio), learnDat.ae)

So, idio is between-subjects and Type is within-subjects. This is based on examples I've found elsewhere.

summary(aov.errs.ae)

This seems to work fine:

Error: Subject
          Df Sum Sq Mean Sq F value Pr(>F)
idio       1    179     179    0.89   0.36
Type       1    210     210    1.05   0.32
Residuals 17   3401     200

Error: Subject:Type
          Df Sum Sq Mean Sq F value Pr(>F)
Type       2    515     258    2.44  0.103
idio:Type  2    680     340    3.22  0.053 .
Residuals 34   3595     106
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now the problem:

model.tables(aov.errs.ae, "means")
Error in outer(rownames(efficiency), colnames(efficiency), paste)[eff.used] :
  invalid subscript type 'list'
In addition: Warning message:
In any(efficiency) : coercing argument of type 'double' to logical

All the examples and manuals I've found said this should work. When I did a fully between-subjects ANOVA on another data set, I had no problem with model.tables. I have no idea what to make of this error message. I've tried a number of variations on things, with no improvement. This is R version 2.7.2 (2008-08-25), running on Redhat, x86_64.

Suggestions? Thanks!

-Harlan
Re: [R] model.tables error from aov
Your design seems to be unbalanced: multistratum aov is intended for balanced designs. My guess is that one idio subject has two Type=1 observations, in which case try removing one of them.

On Tue, 16 Dec 2008, Harlan Harris wrote:
> Hi, I'm a new R user, coming from SPSS, and without a particularly strong stats background. I've got a data set that I'd like to do a mixed-design ANOVA with. No missing values.
> [...]
> Now the problem:
>
> model.tables(aov.errs.ae, "means")
> Error in outer(rownames(efficiency), colnames(efficiency), paste)[eff.used] :
>   invalid subscript type 'list'
> [...]
> Suggestions? Thanks!
> -Harlan

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
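A quick way to look for the kind of imbalance described above is to tabulate the design cells before fitting. The following is only a sketch on simulated data: the data frame d and the helper names cell_counts and is_balanced are invented for illustration, with column names borrowed from the question. It uses xtabs() to count observations per Subject and Type combination, and replications(), whose help page gives !is.list(replications(formula, data)) as a test for balance.

```r
set.seed(42)

# Simulated stand-in for the real data: 6 subjects, 3 levels of Type,
# exactly one observation per Subject x Type cell.
d <- expand.grid(Subject = factor(1:6), Type = factor(c(0, 2, 3)))
d$idio <- factor(ifelse(as.integer(d$Subject) <= 3, "idio", "nonidio"))
d$TrainErrs <- rpois(nrow(d), 12)

# Same data with one accidentally duplicated row.
d_bad <- rbind(d, d[1, ])

# Observations per design cell; anything other than 1 deserves a look.
cell_counts <- function(dat) xtabs(~ Subject + Type, data = dat)

# replications() returns a list (rather than a vector) for unbalanced data.
is_balanced <- function(dat) {
  !is.list(replications(~ Subject + Type, data = dat))
}

cell_counts(d_bad)
is_balanced(d)
is_balanced(d_bad)
```

In the duplicated data set the offending cell shows a count of 2, which points directly at the row to inspect.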
[R] odfWeave learning resources
In general I try not to post questions to forums until I've tried my best to read about them in the available documentation. I recently undertook a project that used odfWeave and have been very pleased with the package. But the R help documentation suggests that there are more sophisticated things I can do, for example with conditionally formatted tables. Can anyone point me to resources I could review to educate myself about the full capabilities of this lovely package?

Thanks!
Re: [R] model.tables error from aov
Ah, that was it. I had a bad row in there that I had forgotten to remove. Thank you very much for the prompt (and correct!) response.

-Harlan

On Tue, Dec 16, 2008 at 3:58 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
> Your design seems to be unbalanced: multistratum aov is intended for balanced designs. My guess is that one idio subject has two Type=1 observations, in which case try removing one of them.
>
> On Tue, 16 Dec 2008, Harlan Harris wrote:
> > [...]
Re: [R] pwr.prop.test and continuity correction
On Tue, 16 Dec 2008, Peter Dalgaard wrote:
> power.prop.test (sic) is relying heavily on asymptotic normality, as do similar formulas. It doesn't use continuity correction, but if you're working with such small group sizes, I suspect that the correction term is the least of your worries and that direct simulation would be better.

In fact, for tests in 2x2 tables, it is fairly straightforward and fast to compute the entire sampling distribution explicitly, over a grid of parameter values. This gives the exact power (under alternatives) and the exact Type I error (under the null). You can also compare different tests and see how much the continuity correction moves the actual Type I error rate away from the nominal rate.

    -thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
tlum...@u.washington.edu        University of Washington, Seattle
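The enumeration described above can be sketched in a few lines. This is an illustration under stated assumptions, not code from the thread: the test is a hand-coded two-sample z test for proportions with pooled variance, the continuity correction is a Yates-type shift of 0.5 * (1/n1 + 1/n2), and the function names z_pvalue and exact_reject_prob are invented here. Every possible pair of counts (x1, x2) is weighted by its binomial probability, and the weights of the outcomes that the test rejects are added up.

```r
# Two-sided p-value of the pooled z test for two proportions,
# vectorised over the counts x1 and x2.
z_pvalue <- function(x1, x2, n1, n2, correct = FALSE) {
  pooled <- (x1 + x2) / (n1 + n2)
  se <- sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
  d <- abs(x1 / n1 - x2 / n2)
  if (correct) {
    d <- pmax(0, d - 0.5 * (1 / n1 + 1 / n2))   # Yates-type correction
  }
  z <- ifelse(se > 0, d / se, 0)                # no evidence when se is 0
  2 * pnorm(-z)
}

# Exact probability of rejecting at level alpha when the true
# success probabilities are p1 and p2.
exact_reject_prob <- function(n1, n2, p1, p2, alpha = 0.05, correct = FALSE) {
  grid <- expand.grid(x1 = 0:n1, x2 = 0:n2)     # all possible outcomes
  pval <- z_pvalue(grid$x1, grid$x2, n1, n2, correct)
  prob <- dbinom(grid$x1, n1, p1) * dbinom(grid$x2, n2, p2)
  sum(prob[pval <= alpha])
}

# Exact Type I error (p1 == p2) with and without correction, and exact power.
size    <- exact_reject_prob(15, 15, 0.3, 0.3)
size_cc <- exact_reject_prob(15, 15, 0.3, 0.3, correct = TRUE)
power   <- exact_reject_prob(15, 15, 0.2, 0.8)
c(size = size, size_cc = size_cc, power = power)
```

Looping exact_reject_prob() over a grid of p1 values with p2 = p1 shows how far the actual Type I error drifts from the nominal alpha for each variant of the test.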
Re: [R] Using a covariance matrix as input to relaimpo package
By trial and error I have discovered that it works if I don't use the formula interface in combination with a covariance matrix as input. If the covariance matrix has the dependent variable as its left-most variable, as the relaimpo documentation suggests, then the relaimpo package will run by simply naming the covariance matrix as the first object in the call and not using a formula. The downside of this is needing to create different covariance matrices for different models. The following will work:

# calculate covariance matrix from survey respondent data using pairwise deletion
covmatrx = cov(respdata[,c("V0007","V0029","V0031","V0032","V0034","V0035","V0036")], use = "pairwise")

# try the lmg method of relative importance
imps1 = calc.relimp(covmatrx, type="lmg", rela=TRUE)
Re: [R] Dotted lines at the end of the KM-curve
Fredrik Lundgren wrote:
> R-ers!
>
> Referees demand that the line in the KM curve be changed to dotted from the point where the standard error is >= 10 %. I don't think it's a good habit, but I urgently need to implement such a thing in R with survfit, survplot or another program. They also want numbers at risk below the curve.
>
> Some help, please.
>
> Fredrik

Numbers at risk can be done with

library(Design)
f <- cph(Surv( ) ~ ..., surv=TRUE)
survplot(f, n.risk=TRUE, ...)

Frank

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
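The dotted-line part of the request is not covered by the reply above. One way it might be approached with the survival package is sketched below, under several assumptions: there is a single curve (no strata), the referees' 10 % refers to the absolute standard error of the survival estimate, and the std.err component of summary() of a survfit object is on that scale. The helper names first_unreliable and plot_km_dotted are invented for this sketch, and the aml data set shipped with survival is used only as a stand-in.

```r
library(survival)  # recommended package distributed with R

# Index of the first time point whose standard error reaches the threshold,
# or NA if it never does.
first_unreliable <- function(se, threshold = 0.10) {
  idx <- which(!is.na(se) & se >= threshold)
  if (length(idx) == 0) NA_integer_ else idx[1]
}

# Draw a single Kaplan-Meier curve, solid up to the first time point where
# the standard error reaches the threshold and dotted afterwards.
plot_km_dotted <- function(fit, threshold = 0.10) {
  s <- summary(fit, censored = TRUE)
  time <- c(0, s$time)                 # start the curve at (0, 1)
  surv <- c(1, s$surv)
  se   <- c(0, s$std.err)
  k <- first_unreliable(se, threshold)
  plot(time, surv, type = "n", ylim = c(0, 1),
       xlab = "Time", ylab = "Survival probability")
  if (is.na(k)) {
    lines(time, surv, type = "s", lty = 1)
  } else {
    lines(time[1:k], surv[1:k], type = "s", lty = 1)
    lines(time[k:length(time)], surv[k:length(time)], type = "s", lty = 3)
  }
  invisible(k)                         # position of the switch, if any
}

fit <- survfit(Surv(time, status) ~ 1, data = aml)
k <- plot_km_dotted(fit, threshold = 0.10)
```

If the referees mean a relative standard error instead, only the quantity passed to first_unreliable() would need to change.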
Re: [R] Find all numbers in a certain interval
Here are a couple of function definitions that may be more intuitive for some people (see the examples below the function defs). They are not perfect, but my tests showed they work left to right, right to left, outside in, but not inside out.

`%<%` <- function(x,y) {
  xx <- attr(x,'orig.y')
  yy <- attr(y,'orig.x')
  if(is.null(xx)) {
    xx <- x
    x <- rep(TRUE, length(x))
  }
  if(is.null(yy)) {
    yy <- y
    y <- rep(TRUE, length(y))
  }
  out <- x & y & (xx < yy)
  attr(out, 'orig.x') <- xx
  attr(out, 'orig.y') <- yy
  out
}

`%<=%` <- function(x,y) {
  xx <- attr(x,'orig.y')
  yy <- attr(y,'orig.x')
  if(is.null(xx)) {
    xx <- x
    x <- rep(TRUE, length(x))
  }
  if(is.null(yy)) {
    yy <- y
    y <- rep(TRUE, length(y))
  }
  out <- x & y & (xx <= yy)
  attr(out, 'orig.x') <- xx
  attr(out, 'orig.y') <- yy
  out
}

x <- -3:3
-2 %<% x %<% 2
c( -2 %<% x %<% 2 )
x[ -2 %<% x %<% 2 ]
x[ -2 %<=% x %<=% 2 ]

x <- rnorm(100)
y <- rnorm(100)
x[ -1 %<% x %<% 1 ]
range( x[ -1 %<% x %<% 1 ] )
cbind(x,y)[ -1 %<% x %<% y %<% 1, ]
cbind(x,y)[ (-1 %<% x) %<% (y %<% 1), ]
cbind(x,y)[ ((-1 %<% x) %<% y) %<% 1, ]
cbind(x,y)[ -1 %<% (x %<% (y %<% 1)), ]
cbind(x,y)[ -1 %<% (x %<% y) %<% 1, ] # oops

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Antje
Sent: Tuesday, December 16, 2008 3:09 AM
To: r-h...@stat.math.ethz.ch
Subject: [R] Find all numbers in a certain interval

> Hi all,
>
> I'd like to know if I can solve this with a shorter command:
>
> a <- rnorm(100)
> which(a > -0.5 & a < 0.5)
> # would give me all indices of numbers greater than -0.5 and smaller than +0.5
>
> I have something similar with a dataframe, and it sometimes produces quite long commands...
> I'd like to have something like:
>
> which(within.interval(a, -0.5, 0.5))
>
> Is there anything I could use for this purpose?
>
> Antje
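For the specific call asked for in the original question, a plain helper function may already be enough. The sketch below is an assumption about what such a helper could look like; within_interval and its closed argument are invented names, not functions from base R or from the thread.

```r
# TRUE where x lies inside (lower, upper); with closed = TRUE the
# end points are included, i.e. [lower, upper].
within_interval <- function(x, lower, upper, closed = FALSE) {
  if (closed) {
    x >= lower & x <= upper
  } else {
    x > lower & x < upper
  }
}

a <- c(-1, -0.5, -0.2, 0, 0.3, 0.5, 2)
which(within_interval(a, -0.5, 0.5))                 # open interval
which(within_interval(a, -0.5, 0.5, closed = TRUE))  # closed interval
```

Because the helper returns a logical vector, it can also be used directly for subsetting rows of a data frame.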
[R] Programmatically minimising main R window (on windows)
Hi all,

Is it possible to programmatically minimise the main window of the Windows R GUI? I'm designing a small GUI with gWidgets and RGtk2 for a non-statistician to use, and it would be nice if I could easily hide all the R stuff that they don't need.

Thanks,

Hadley

--
http://had.co.nz/
[R] Applying a function to a dataframe
Another newbie question, sorry:

I am trying to apply a function to a dataframe and could use some help.

Assuming dim(df) = (10,2), say, I would like to apply a function that looks at each row in turn and returns a list (dim = (10,1)), using the columns as inputs to the function, but with no INDEX stuff that the by() function refers to.

For a simple function x+y, I know in Mathematica it would be this:

Table[df[[i,1]] + df[[i,2]], {i,1,10}]

Or, if the function was defined, it would read:

Table[f[df[[i,1]], df[[i,2]]], {i,1,10}]

Thanks for help in advance,

Glenn
Re: [R] Applying a function to a dataframe
On Dec 16, 2008, at 6:00 PM, glenn roberts wrote:
> Another newbie question, sorry:
>
> I am trying to apply a function to a dataframe and could use some help.
>
> Assuming dim(df) = (10,2), say, I would like to apply a function that looks at each row in turn and returns a list (dim = (10,1)), using the columns as inputs to the function, but with no INDEX stuff that the by() function refers to.
>
> For a simple function x+y, I know in Mathematica it would be this:
>
> Table[df[[i,1]] + df[[i,2]], {i,1,10}]
>
> Or, if the function was defined, it would read:
>
> Table[f[df[[i,1]], df[[i,2]]], {i,1,10}]
>
> Thanks for help in advance,
> Glenn

I don't know Mathematica, but if you just want the sum by rows, see

?apply

When the second argument is 1, the rows are taken singly as arguments to the third argument, FUN:

DF <- data.frame(col1 = 1:10, col2 = 11:20)
apply(DF,1,sum)
 [1] 12 14 16 18 20 22 24 26 28 30

If you want minimums, then the apply method with FUN=min would still work:

apply(DF,1,min)
 [1]  1  2  3  4  5  6  7  8  9 10
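For the second Mathematica form, where a user-defined function takes the two columns as separate arguments, mapply() or with() may map more directly than apply(). A small sketch follows; the function f is a made-up example and not something from the question.

```r
DF <- data.frame(col1 = 1:10, col2 = 11:20)

# Hypothetical function of two arguments, standing in for f[x, y].
f <- function(a, b) a^2 + b

# If f is vectorised, it can be evaluated on whole columns at once.
res_vec <- with(DF, f(col1, col2))

# If f only handles one pair of values at a time, mapply() calls it
# once per row, much like Table[f[df[[i,1]], df[[i,2]]], {i,1,10}].
res_map <- mapply(f, DF$col1, DF$col2)

res_map
```

Both forms return an ordinary vector with one element per row; it can be stored back with DF$result <- res_map if a column is wanted.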
Re: [R] Programmatically minimising main R window (on windows)
On Tue, 16 Dec 2008, hadley wickham wrote:
> Hi all,
>
> Is it possible to programmatically minimise the main window of the Windows R GUI? I'm designing a small GUI with gWidgets and RGtk2 for a non-statistician to use, and it would be nice if I could easily hide all the R stuff that they don't need.

Not from R itself, but you can by Windows script programming (which you can launch by 'system'). It would also be easy to add a small bit of C code to do so.

However, why are you using Rgui if you don't want a GUI? That is what Rterm or Rscript or embedded R are for.

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
[R] Extract Data from a Webpage
Hi All:

I would like to extract the provider name, address, and phone number from multiple webpages like this:

http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489P2=11490

Based on searching the R-help archives, it seems like the XML package might have something useful for this task. I can load the XML package and supply the URL as an argument to htmlTreeParse(), but I don't know how to go from there.

thanks,

Chuck Cleland

sessionInfo()
R version 2.8.0 Patched (2008-12-04 r47066)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] XML_1.98-1

--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
[R] append lines to a created file
Hi,

I try to append a line to a file with:

writeLines("xxx", con = "file.txt", sep = "\n")

but it always overwrites the existing content. How can I change the mode of writeLines to append ("a")?
Re: [R] Programmatically minimising main R window (on windows)
On Tue, Dec 16, 2008 at 5:40 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
> On Tue, 16 Dec 2008, hadley wickham wrote:
> > Hi all,
> >
> > Is it possible to programmatically minimise the main window of the Windows R GUI? I'm designing a small GUI with gWidgets and RGtk2 for a non-statistician to use, and it would be nice if I could easily hide all the R stuff that they don't need.
>
> Not from R itself, but you can by Windows script programming (which you can launch by 'system'). It would also be easy to add a small bit of C code to do so.
>
> However, why are you using Rgui if you don't want a GUI? That is what Rterm or Rscript or embedded R are for.

That's a good question. The main reason is that it's easy for me to tell my remote user how to load the GUI: source("http://...")

Hadley

--
http://had.co.nz/
Re: [R] append lines to a created file
On 17/12/2008, at 1:43 PM, Jörg Groß wrote:
> Hi,
>
> I try to append a line to a file with:
>
> writeLines("xxx", con = "file.txt", sep = "\n")
>
> but it always overwrites the existing content. How can I change the mode of writeLines to append ("a")?

The help on connections says:

    In general functions using connections will open them if they are not open, but then close them again, so to leave a connection open call open explicitly.

So do something like:

zz <- file("file.txt", "w")
writeLines("xxx", con = zz, sep = "\n")
writeLines("A load of dingos' kidneys.", con = zz, sep = "\n")
etc. etc.
close(zz)

But why not just use sink() and cat()? Much simpler, IMHO.

cheers,

Rolf Turner
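Note that opening the connection with "w" keeps it open across several writeLines() calls, but it still truncates a file that already exists. To add lines to existing content, the connection can be opened in append mode ("a"), or cat() can be used with append = TRUE. The sketch below writes to a temporary file so that it does not touch any real file.txt; the file contents are placeholders.

```r
path <- tempfile(fileext = ".txt")

# Create the file with some initial content.
writeLines("first line", con = path)

# Option 1: open a connection in append mode and write through it.
con <- file(path, open = "a")
writeLines("second line", con = con)
close(con)

# Option 2: cat() with append = TRUE (the newline has to be supplied).
cat("third line\n", file = path, append = TRUE)

readLines(path)
```

Passing a file name straight to writeLines() always opens the file for writing from scratch, which is why the original call overwrote the content.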
Re: [R] Programmatically minimising main R window (on windows)
G'day Hadley, On Tue, 16 Dec 2008 18:54:48 -0600 hadley wickham h.wick...@gmail.com wrote: On Tue, Dec 16, 2008 at 5:40 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: On Tue, 16 Dec 2008, hadley wickham wrote: [...] Is it possible to programmatically minimise the main window of the windows R gui? I'm designing a small gui with gwidgets RGtk2 for an non-statistician to use, and it would be nice if I could easily hide all the R stuff that they don't need. Not from R itself, but you can by Windows script programming (which you can launch by 'system'. It would also be esay to add a small bit of C code to do so. [...] Not sure if this is what you are after, but I believe we had once a similar problem. Client wanted to have a GUI that would read in a file with the list of people who had bought Raffle tickets, select the winners, and write the winners to a file. They were just interested in seeing the GUI stuff and not the underlying R main window c. We ended up giving them an USB stick with R on it and the package we wrote for them together with other packages they needed. I attach the instructions that I wrote up for our consultant (sitting in Perth) on how I could create such an USB stick (sitting in Singapore). UWA has an authenticating proxy while NUS does not, hence the references to --internet2 on that write up. As it turned out, if you do not use the standard way of installing R but select SDI during the installation (if memory serves correctly, this choice can also be made after R is installed, but then doing this change is a bit more involved), then you can start R minimized. That is the R main window does not appear. You just have to make sure that your code is executed when you start R (via .onAttach, .First c) and brings up the GUI that the user is supposed to see. HTH. 
Cheers,

Berwin

=========================== Full address =============================
Berwin A Turlach                             Tel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability         +65 6515 6650 (self)
Faculty of Science                           FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7          e-mail: sta...@nus.edu.sg
Singapore 117546                             http://www.stat.nus.edu.sg/~statba

1) Install R on USB stick: Start the R installer (named something like R-2.x.y-win32.exe). During installation:
   i) Select the drive corresponding to the USB stick as the location to which R is to be installed (i.e. if the USB stick is drive DRV, then install R to location DRV:\R-2.x.y).
   ii) Customize startup:
       a) select SDI, everything else per default
       b) it may be necessary to select internet2 as internet connection if sitting behind a proxy (but it is also possible to do so later, i.e. o.k. to use the default)
   iii) Don't create a Start Menu folder.
   iv) Don't create a desktop icon or registry entries.

2) Create a short cut to Rgui.exe (located in DRV:\R-2.x.y\bin\Rgui.exe) and move it to the top folder of the USB stick. Optionally, rename the short cut (e.g. "Raffle Draw").

3) Start R using the short cut.

4) Select "Install package(s) from local zip files" from the "Packages" menu: select RaffleDraw_1.y.zip (currently y=1, but may change) for installation.

5) Select "Install package(s)..." from the "Packages" menu: select an appropriate CRAN mirror, then select gWidgets and gWidgetsrJava to be installed. Quit R.

   If this step does not work, then you are probably sitting behind a proxy. In that case, go to the short cut that points to Rgui.exe, right-click on the short cut and select "Properties" from the pop-up window; add --internet2 to the target (i.e. the target should read something like DRV:\R-2.x.y\bin\Rgui.exe --internet2). Click "Apply" and then "Ok" and try again.

6) Right-click on the short cut and select "Properties" from the pop-up window; change the entry for "Run:" from "Normal window" to "Minimized". Click "Apply" and then "Ok". (Remove the --internet2 option if it had been added.)

7) Go to the folder DRV:\R-2.x.y\etc and edit the Rprofile.site file located in that folder: add "library(RaffleDraw)" as the last line (without the quotation marks), save the file and quit.

8) Go to the top folder of the USB stick and double-click on the short cut.
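The startup hook mentioned in this message (running code via .First or .onAttach when R starts) could look roughly like the fragment below. This is only a sketch: it uses .First rather than the bare library() line of step 7, it assumes the client package is called RaffleDraw as in the instructions, and the guard around the call is an addition for illustration, not part of the original write-up.

```r
## Hypothetical fragment for DRV:\R-2.x.y\etc\Rprofile.site.
## R runs a function called .First (if it finds one) at the end of startup.
.First <- function() {
  ## Only bring up the GUI package in an interactive session, and only if
  ## the (assumed) RaffleDraw package is actually installed on the stick.
  if (interactive() && requireNamespace("RaffleDraw", quietly = TRUE)) {
    library(RaffleDraw)  # the package's own .onAttach would then open the GUI
  }
}
```

With the short cut set to run minimized, the user would then only see whatever window the package opens when it is attached.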
[R] How to iterate dataframe within a hash
Dear all,

I have the following data structure:

> print(testlib)
$tags
  tag count.raw count.adj   err
1  aa        94        93   0.5
2  ac         1         2   0.2
3  ag         3         2   0.1
4  ca         1         1 0.003

I want to iterate over the data above and print only the tag, count.raw and count.adj columns. Why does my script below fail to do the task?

    for (i in 1:nrow(testlib)) {
        cat(testlib$tags[["count.tag"]], ",", testlib$tags[["count.raw"]], ",", testlib$tags[["count.adj"]], "\n")
    }

- Gundala Viswanath
Jakarta - Indonesia
Re: [R] How to iterate dataframe within a hash
I don't know if this is what you want, but it seems that you just want to print a subset of your columns:

    testlib$tags[, c("tag", "count.raw", "count.adj")]

If you want to do something other than just print the columns, then look at the apply family of functions.

On Tue, Dec 16, 2008 at 9:02 PM, Gundala Viswanath gunda...@gmail.com wrote:
> Dear all,
>
> I have the following data structure:
>
> > print(testlib)
> $tags
>   tag count.raw count.adj   err
> 1  aa        94        93   0.5
> 2  ac         1         2   0.2
> 3  ag         3         2   0.1
> 4  ca         1         1 0.003
>
> I want to iterate over the data above and print only the tag, count.raw
> and count.adj columns. Why does my script below fail to do the task?
>
>     for (i in 1:nrow(testlib)) {
>         cat(testlib$tags[["count.tag"]], ",", testlib$tags[["count.raw"]], ",", testlib$tags[["count.adj"]], "\n")
>     }
>
> - Gundala Viswanath
> Jakarta - Indonesia

-- 
Stephen Sefick

Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
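To make the answer above concrete, here is a small sketch that rebuilds the data from the question and shows both the column subset and a row-by-row loop. The data frame is retyped from the printed output, so the column types are an assumption. Two things appear to trip up the original loop: testlib is a list, so nrow(testlib) is NULL, and there is no column called count.tag (the column is named tag).

```r
## Data retyped from the question; stringsAsFactors = FALSE keeps 'tag' as text.
testlib <- list(tags = data.frame(
  tag       = c("aa", "ac", "ag", "ca"),
  count.raw = c(94, 1, 3, 1),
  count.adj = c(93, 2, 2, 1),
  err       = c(0.5, 0.2, 0.1, 0.003),
  stringsAsFactors = FALSE
))

tags <- testlib$tags          # the data frame lives inside the list

## Column subset, as suggested in the reply.
wanted <- tags[, c("tag", "count.raw", "count.adj")]
print(wanted)

## Row-by-row version: loop over the rows of the data frame, not of the list.
for (i in seq_len(nrow(tags))) {
  cat(tags$tag[i], tags$count.raw[i], tags$count.adj[i], sep = ",")
  cat("\n")
}
```

Using seq_len(nrow(tags)) instead of 1:nrow(tags) also keeps the loop from running when the data frame has no rows.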
Re: [R] Noobie question, regression across levels
Much thanks! This helped a lot. Another quick one: when using the lmList function in the nlme package, is it possible to subset my data according to the number of observations in each level? (I.e. I obviously want to include only those levels that have enough observations for a regression.) What is the best way to exclude levels with too few observations? Can I do it inside the lmList function? I've read the requisite help files etc. and two hours later am still confused.

Thanks in advance,
Allen

Ben Bolker wrote:
> RichardLang wrote:
> > I've just started using R last week and am still scratching my head.
> > I have a data set and want to run a separate regression across each
> > level of a factor (treating each one separately). The data right now
> > is arranged such that the value of the factor along which I want to
> > split my data is one column among many. Best way to do this? Thanks!
>
> You can check out the lmList function in the nlme package, or more crudely:
>
>     lmfun <- function(d) { lm(y~x, data=d) }
>     myLmList <- lapply(split(mydata, splitfactor), lmfun)
>
> even more compactly/confusingly:
>
>     myLmList <- lapply(split(mydata, splitfactor), lm, formula=y~x)
>
> good luck
> Ben Bolker
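One way to approach the question above is to drop the sparse levels before fitting, rather than inside lmList. The sketch below uses made-up data; the names mydata, g, x, y and the threshold min_obs are illustrative assumptions, not taken from the thread. It uses the split/lapply form from the quoted reply so that it runs with base R only.

```r
set.seed(42)

## Toy data: level "c" has only 2 observations.
mydata <- data.frame(
  g = factor(rep(c("a", "b", "c"), times = c(10, 8, 2))),
  x = rnorm(20)
)
mydata$y <- 1 + 2 * mydata$x + rnorm(20, sd = 0.1)

min_obs <- 5                                  # assumed minimum group size
counts  <- table(mydata$g)                    # observations per level
keep    <- names(counts)[counts >= min_obs]   # levels that are large enough

## Keep only rows from large-enough levels and drop the now-empty levels.
big_enough <- droplevels(mydata[mydata$g %in% keep, ])

## One regression per remaining level.
fits <- lapply(split(big_enough, big_enough$g),
               function(d) lm(y ~ x, data = d))
sapply(fits, coef)
```

The filtered data frame could presumably be passed to nlme's lmList in the same way, e.g. lmList(y ~ x | g, data = big_enough).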
[R] append string to a string
hi,

I want to append a string to a string, like:

    x <- c("abc")
    append(x, "def")

so that I get for x:

    [1] "abcdef"

not (!)

    [1] "abc" "def"

How can I do that in R?
Re: [R] Extract Data from a Webpage
Hi Chuck.

Well, here is one way:

    theURL = "http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489P2=11490"
    doc = htmlParse(theURL, useInternalNodes = TRUE,
                    error = function(...) {})  # discard any error messages

    # Find the nodes in the table that are of interest.
    x = xpathSApply(doc, "//table//td|//table//th", xmlValue)

Now, depending on the regularity of the page, we can do something like

    i = seq(1, by = 2, length = 3)
    structure(x[i + 1], names = x[i])

and we end up with a named character vector with the fields of interest.

The useInternalNodes is vital so that we can use XPath. The XPath language is very convenient for navigating subsets of the resulting XML tree.

D.

Chuck Cleland wrote:
> Hi All:
>
> I would like to extract the provider name, address, and phone number
> from multiple webpages like this:
>
> http://oasasapps.oasas.state.ny.us/portal/pls/portal/oasasrep.providersearch.take_to_rpt?P1=3489P2=11490
>
> Based on searching R-help archives, it seems like the XML package might
> have something useful for this task. I can load the XML package and
> supply the url as an argument to htmlTreeParse(), but I don't know how
> to go from there.
>
> thanks,
> Chuck Cleland
>
> > sessionInfo()
> R version 2.8.0 Patched (2008-12-04 r47066)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] XML_1.98-1
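The last step of the reply above (turning alternating label/value cells into a named vector) can be tried without the XML package. In the sketch below the vector x is invented stand-in data in the shape xpathSApply() might return; the labels and values are assumptions, not the contents of the real page.

```r
## Stand-in for the cell texts extracted from the table:
## label, value, label, value, ...
x <- c("Provider", "Example Clinic",
       "Address",  "1 Main St",
       "Phone",    "555-0100")

i <- seq(1, by = 2, length.out = 3)            # positions of the labels: 1, 3, 5
fields <- structure(x[i + 1], names = x[i])    # values named by their labels

fields["Phone"]
```

This only works when the cells really do alternate label and value, which is what "depending on the regularity of the page" refers to.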
Re: [R] append string to a string
?paste

On Dec 16, 2008, at 10:39 PM, Jörg Groß wrote:
> hi,
>
> I want to append a string to a string, like:
>
>     x <- c("abc")
>     append(x, "def")
>
> so that I get for x:
>
>     [1] "abcdef"
>
> not (!)
>
>     [1] "abc" "def"
>
> How can I do that in R?
Re: [R] append string to a string
On Wed, Dec 17, 2008 at 9:09 AM, Jörg Groß jo...@licht-malerei.de wrote:
> hi,
>
> I want to append a string to a string, like:
>
>     x <- c("abc")
>     append(x, "def")

    paste(x, "def", sep="")

see ?paste

HTH
Aval
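A short sketch of the difference between the two calls discussed in this thread; the variable names y and z are only for illustration.

```r
x <- c("abc")

## paste() joins the strings themselves; sep = "" means no separator.
y <- paste(x, "def", sep = "")
print(y)

## append() adds a new element to the vector instead of joining strings.
z <- append(x, "def")
print(z)

## In R 2.15.0 and later, paste0(x, "def") is shorthand for sep = "".
```

To change x itself, assign the result back: x <- paste(x, "def", sep = "").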
Re: [R] surface contour plot help
I was able to get a surface plot with wireframe; however, I can't rotate it around like you can with the plot3d function. Is there a way to do this in R?

To: r-help@r-project.org
Date: Tuesday, December 16, 2008, 9:13 AM

> I am trying to do a surface profile plot. The data is
>
>     X         Y(1)       Z(1)
>     1-jan-02  2002       number
>     2-jan-02  2002       number
>     . . .
>     1-jan-03  2003 (Y2)  number Z(2)
>     2-jan-03  2003 (Y2)  number Z(2)
>     . . .
>
> until dec 31 2007.
>
> I used the plot3d functions to build a scatter point plot:
>
>     Call rinterface.rrun("library(rgl)")
>     Call rinterface.rrun("plot3d(x,y1,z1,xlab='Date',ylab='Year',zlab='Vol',ylim=c(2001,2008))")
>     Call rinterface.rrun("plot3d(x,y2,z2,add=TRUE)")
>     Call rinterface.rrun("plot3d(x,y3,z3,add=TRUE)")
>     Call rinterface.rrun("plot3d(x,y4,z4,add=TRUE)")
>     Call rinterface.rrun("plot3d(x,y5,z5,add=TRUE)")
>     Call rinterface.rrun("plot3d(x,y6,z6,add=TRUE)")
>
> Is there a way to lay a surface to this?
[R] simulate binary markov chain
Hi all,

I was hoping somebody may know of a function for simulating a large binary sequence (length 10 million) using a (1st order) Markov model with a known (2x2) transition matrix. It needs to be reasonably fast.

I have tried the following:

    mc <- function(sq, P) {
        s <- c()
        x <- row.names(P)
        n <- length(sq)
        p1 <- sum(sq)/n
        s[1] <- rbinom(1, 1, p1);
        for (i in 2:n) {
            s[i] <- rbinom(1, 1, P[s[i-1]+1])
        }
        return(s)
    }

    P <- c(0.63, 0.27)
    x <- rbinom(500, 1, 0.5)
    new <- mc(x, P)

thanks in advance!
Chris
Re: [R] surface contour plot help
On 16/12/2008 5:05 PM, Brad B wrote:
> I was able to get a surface plot with wireframe; however, I can't rotate
> it around like you can with the plot3d function. Is there a way to do
> this in R?

You are making your question impossible to answer by not giving the right details. If you show us code that uses wireframe to do what you want, surely someone could show you how to do the same thing in rgl. That means putting together a small, self-contained example. Don't post a vague description of your data and what you want; simplify your question to something we can see.

There are many ways to add a surface to a plot in rgl. We have no idea which one would be appropriate for you.

Duncan Murdoch

> To: r-help@r-project.org
> Date: Tuesday, December 16, 2008, 9:13 AM
>
> I am trying to do a surface profile plot. The data is
>
>     X         Y(1)       Z(1)
>     1-jan-02  2002       number
>     2-jan-02  2002       number
>     . . .
>     1-jan-03  2003 (Y2)  number Z(2)
>     2-jan-03  2003 (Y2)  number Z(2)
>     . . .
>
> until dec 31 2007.
>
> I used the plot3d functions to build a scatter point plot:
>
>     Call rinterface.rrun("library(rgl)")
>     Call rinterface.rrun("plot3d(x,y1,z1,xlab='Date',ylab='Year',zlab='Vol',ylim=c(2001,2008))")
>     Call rinterface.rrun("plot3d(x,y2,z2,add=TRUE)")
>     Call rinterface.rrun("plot3d(x,y3,z3,add=TRUE)")
>     Call rinterface.rrun("plot3d(x,y4,z4,add=TRUE)")
>     Call rinterface.rrun("plot3d(x,y5,z5,add=TRUE)")
>     Call rinterface.rrun("plot3d(x,y6,z6,add=TRUE)")
>
> Is there a way to lay a surface to this?
Re: [R] simulate binary markov chain
On Wed, 17 Dec 2008, Chris Oldmeadow wrote:
> Hi all,
>
> I was hoping somebody may know of a function for simulating a large
> binary sequence (length 10 million) using a (1st order) Markov model
> with a known (2x2) transition matrix. It needs to be reasonably fast.

Chris,

The trick is to recognize that the length of each run is a sample from the geometric distribution (with 1 added to it). rgeom() is vectorized, so using it provides fast results.

Suppose that your transition matrix is

    |   | 0     | 1     |
    |---+-------+-------|
    | 0 | pi.11 | pi.12 |
    | 1 | pi.21 | pi.22 |
    |---+-------+-------|

where pi.11+pi.12 == 1 and pi.21+pi.22 == 1.

This function

    foo <- function(n, pi.12, pi.21)
        inverse.rle(
            list(values = rep(0:1, n),
                 lengths = 1 + rgeom(2*n, rep(c(pi.12, pi.21), n))))

will generate a sequence of 0/1's according to that matrix with length approximately n/pi.12+n/pi.21.

On my macbook I get this timing:

    > system.time(res <- foo(1205000,.3,.2))
       user  system elapsed
      1.088   0.204   1.291
    > prop.table(table(head(res,-1),tail(res,-1)),1) # check!
                0         1
      0 0.6999024 0.3000976
      1 0.1997453 0.8002547
    > length(res) # long enough!
    [1] 10048040

So, if this is fast enough, you just choose 'n' to be a bit larger than the desired length divided by (1/pi.12+1/pi.21) and then discard the excess.

Chuck

> I have tried the following:
>
>     mc <- function(sq, P) {
>         s <- c()
>         x <- row.names(P)
>         n <- length(sq)
>         p1 <- sum(sq)/n
>         s[1] <- rbinom(1, 1, p1);
>         for (i in 2:n) {
>             s[i] <- rbinom(1, 1, P[s[i-1]+1])
>         }
>         return(s)
>     }
>
>     P <- c(0.63, 0.27)
>     x <- rbinom(500, 1, 0.5)
>     new <- mc(x, P)
>
> thanks in advance!
> Chris

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
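The run-length idea in the reply above can be wrapped so that it returns exactly the requested number of draws. The sketch below is one possible packaging, not the poster's code: the function name sim_chain, the 10% safety margin and the retry loop are assumptions added for illustration. Like foo(), it always starts the chain in state 0.

```r
## Simulate a two-state (0/1) first-order Markov chain of length 'len'.
## pi.12 = P(0 -> 1), pi.21 = P(1 -> 0). The chain starts in state 0.
sim_chain <- function(len, pi.12, pi.21) {
  ## Number of (0-run, 1-run) pairs: expected total length is
  ## n/pi.12 + n/pi.21, so aim about 10% above what is needed.
  n <- ceiling(1.1 * len / (1 / pi.12 + 1 / pi.21)) + 10
  repeat {
    ## Run lengths: geometric + 1, alternating between the two states.
    runs <- 1 + rgeom(2 * n, rep(c(pi.12, pi.21), n))
    if (sum(runs) >= len) break
    n <- 2 * n                      # unlucky draw: try again with more runs
  }
  s <- inverse.rle(list(values = rep(0:1, n), lengths = runs))
  s[seq_len(len)]                   # discard the excess
}

set.seed(1)
res <- sim_chain(1e5, 0.3, 0.2)

## Empirical transition proportions, one row per current state.
trans <- prop.table(table(head(res, -1), tail(res, -1)), 1)
trans
```

The rows of trans should come out close to (0.7, 0.3) and (0.2, 0.8), mirroring the check in the reply.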
[R] Problems with graphical devices, e.g., png(), pdf(): blurry graphical output
On my current home system, I am getting undesirable output from graphical devices such as png() and pdf(). The graphical output is blurry. I haven't experienced the problem on other systems. As you will see from the attached text file (more information on this file below), the problem does not occur when type='Xlib' is forced. The blurriness is more severe with bitmap output (yes, I am viewing the bitmap files at 100%), but occurs with pdf output as well. Software details: Fedora 10, with at least the following packages: -- R, R-core, R-devel -- cairo, cairo-devel -- pixman, pixman-devel -- libpng, libpng-devel -- poppler Everything is current and updated via Fedora's repository. R was installed via Fedora's repository. I've attached some commands and output in a text file. This file includes: (1) hardware information (2) information about my R installation (3) code for simple R graphics, with comments re output, plus URLs for the corresponding graphical output Any advice would be really appreciated. 
[u...@localhost ~]$ lspci
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 02)
00:1c.2 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 3 (rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 02)
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02)
15:00.0 CardBus bridge: Texas Instruments PCI1510 PC card Cardbus Controller

[u...@localhost ~]$ R

> sessionInfo()
R version 2.8.0 (2008-10-20)
i386-redhat-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

> version
               _
platform       i386-redhat-linux-gnu
arch           i386
os             linux-gnu
system         i386, linux-gnu
status
major          2
minor          8.0
year           2008
month          10
day            20
svn rev        46754
language       R
version.string R version 2.8.0 (2008-10-20)

> capabilities()
    jpeg      png     tiff    tcltk      X11     aqua http/ftp  sockets
    TRUE     TRUE     TRUE     TRUE     TRUE    FALSE     TRUE     TRUE
  libxml     fifo   cledit    iconv      NLS  profmem    cairo
    TRUE     TRUE     TRUE     TRUE     TRUE    FALSE     TRUE

> X11.options(reset=TRUE)
> X11.options()
$display
[1] ""

$width
[1] NA

$height
[1] NA

$pointsize
[1] 12

$bg
[1] "transparent"

$canvas
[1] "white"

$gamma
[1] 1

$colortype
[1] "true"

$maxcubesize
[1] 256

$fonts
[1] "-adobe-helvetica-%s-%s-*-*-%d-*-*-*-*-*-*-*"
[2] "-adobe-symbol-medium-r-*-*-%d-*-*-*-*-*-*-*"

$xpos
[1] NA

$ypos
[1] NA

$title
[1] ""

$type
[1] "cairo"

$antialias
[1] 1

> X11.options(reset=TRUE)
> plot(1:10)
## result: box lines fuzzy at top and left, and appears
## darker and thicker where the axes are overplotted

> X11.options(reset=TRUE)
> X11.options(antialias=2) # antialias=2 is 'none'
> plot(1:10,