Re: [R] Finding convex hull? [Broadcast]
From: Dong-hyun Oh

Dear UseRs,

I would like to know which function is the most efficient for finding the convex hull of points in the 3 (or 2)-dimensional case. The functions for finding a convex hull are the following: convex.hull (tripack), chull (grDevices), in.chull (sgeostat), convhulln (geometry), convexhull.xy (spatstat), calcConvexHull (PBSmapping).

I also would like to know if there is a function that can be used for finding the convex hull in the multi-dimensional case, that is, in more than 3 dimensions.

If you had looked a bit more carefully, you would have seen that convhulln (geometry) handles more than 3 dimensions.

Andy

Thank you in advance.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
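A minimal sketch of the 2-D case using chull(), the one function in the list above that ships with R itself. The point set is made up for illustration; convhulln() is only mentioned in a comment so that the sketch runs without add-on packages.

```r
# Five made-up points: the corners of the unit square plus one interior point.
pts <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1),
             c(0.5, 0.5))

# chull() returns the row indices of the points on the hull, in clockwise order.
hull_idx <- chull(pts)
hull_pts <- pts[hull_idx, ]
print(hull_idx)

# For 3 or more dimensions the thread points to convhulln() in the
# geometry package, which takes a matrix with one column per dimension.
```

The interior point (row 5) never appears among the returned indices, which is a quick way to check that a hull routine behaves as expected on a toy data set.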
Re: [R] Q: loess-like function that allows more predictors?
locfit() in the locfit package can do that.

Andy

From: D. R. Evans

I have a feeling that this may be a stupid question, but here goes anyway: is there a function that I can use to replace loess but which allows a larger number of predictors? (I have a situation in which it would be very convenient to use 5 predictors, which violates the constraint in loess that the number of predictors be in the range from 1 to 4.)
Re: [R] Monotonic interpolation
Not if Mr. excalibur really wants interpolating (as opposed to smoothing) splines. Other than linear, I'm not even sure it can be done (though I'm no expert on this). One possibility is to use the cobs package and play with the amount of smoothing...

Andy

From: Bert Gunter

RSiteSearch("monotone", restr="func") will give you several packages and functions for monotone smoothing, including the isoreg() function in the standard stats package. You can determine if any of these does what you want.

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of excalibur
Sent: Thursday, September 06, 2007 8:04 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Monotonic interpolation

On Thu, 6 Sept. at 09:45, excalibur wrote:

Hello everybody, has anyone got a function for smooth monotonic interpolation (splines ...) of a univariate function (like a distribution function, for example)?

approxfun() might be what you're looking for.

Is the result of approxfun() necessarily monotonic?
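A small sketch bearing on the last question above. Linear interpolation through monotone data is itself monotone, so approxfun() is safe in that sense. In current versions of R, splinefun() in the stats package also offers method = "monoH.FC", a monotone cubic Hermite interpolant. The data values below are made up for illustration.

```r
# Made-up monotone data, e.g. points on a distribution function.
x <- c(0, 1, 2, 3, 4, 5)
y <- c(0, 0.05, 0.1, 0.8, 0.95, 1)

f_lin  <- approxfun(x, y)                        # piecewise-linear interpolant
f_mono <- splinefun(x, y, method = "monoH.FC")   # monotone cubic interpolant

# Check monotonicity on a fine grid (small tolerance for rounding error).
grid <- seq(0, 5, length.out = 501)
lin_ok  <- all(diff(f_lin(grid))  >= -1e-12)
mono_ok <- all(diff(f_mono(grid)) >= -1e-12)

# Both functions pass through the data points exactly (up to rounding).
hits_data <- isTRUE(all.equal(f_mono(x), y))
print(c(lin_ok = lin_ok, mono_ok = mono_ok, hits_data = hits_data))
```

An ordinary interpolating spline (the default method of splinefun()) can overshoot between steep and flat stretches such as the jump from 0.1 to 0.8 here, which is the concern raised in the reply above.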
Re: [R] Recursive concatenation
Or something like:

    R> do.call(paste, c(expand.grid(LETTERS[1:3], 1:3), sep=""))
    [1] "A1" "B1" "C1" "A2" "B2" "C2" "A3" "B3" "C3"

(The ordering is a bit different, but that shouldn't matter.)

Andy

From: Dimitris Rizopoulos

try this:

    paste(rep(LETTERS[1:3], each = 3), 1:3, sep = "")

Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre, School of Public Health
Catholic University of Leuven

Quoting Dennis Fisher [EMAIL PROTECTED]:

Colleagues,

I want to create the following array: A1, A2, A3, B1, B2, B3, C1, C2, C3

I recall that there is a trick using c or paste permitting me to form all combinations of c("A", "B", "C") and 1:3. But I can't recall the trick.

Dennis

Dennis Fisher MD
P (The P Less Than Company)
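The two suggestions above can be compared side by side. The third variant, built on outer(), is an addition not taken from the thread; it gives the same ordering as the rep() version.

```r
# Andy's version: expand.grid() varies the first factor fastest.
a <- do.call(paste, c(expand.grid(LETTERS[1:3], 1:3), sep = ""))

# Dimitris's version: each letter repeated, numbers recycled.
b <- paste(rep(LETTERS[1:3], each = 3), 1:3, sep = "")

# Extra variant (not from the thread): outer() builds a 3 x 3 matrix of
# labels; transposing before flattening yields A1, A2, A3, B1, ...
d <- c(t(outer(LETTERS[1:3], 1:3, paste, sep = "")))

print(a)
print(b)
print(d)
```

All three contain the same nine labels; only the order of `a` differs from the order requested in the original question.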
Re: [R] Synchronzing workspaces
See the example in ?save on how to set defaults via options().

Andy

From: Gabor Grothendieck

You could try saving prior to quitting in the future if you want to try those arguments.

On 9/3/07, Paul August [EMAIL PROTECTED] wrote:

Thanks for sharing your experience. In my case, the machines involved run Windows Vista, XP and 2000. I am not sure whether that contributes to my problem or not. I will look into this further.

I just noticed the two arguments ascii and compress for save. However, my .RData file was created by q() with "yes". The manual says that q() is equivalent to save(list = ls(all=TRUE), file = ".RData"). There seems to be no way to set ascii or compress for save through the q function, unless the q function is replaced explicitly with save(list = ls(all=TRUE), file = ".RData", ascii = T).

Paul.

----- Original Message -----
From: Gabor Grothendieck [EMAIL PROTECTED]
To: Paul August [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Sent: Thursday, August 30, 2007 11:24:31 PM
Subject: Re: [R] Synchronzing workspaces

I haven't had a similar experience, but note that save has ascii= and compress= arguments. You could check if varying those parameter values makes a difference.

On 8/30/07, Paul August [EMAIL PROTECTED] wrote:

I used to work on several computers and to use a flash drive to synchronize the workspace on each machine before starting to work on it. I found that .RData always caused some trouble: often it is corrupted even though there is no error in the copying process. Does anybody have a similar experience?

Paul.

----- Original Message -----
From: Barry Rowlingson [EMAIL PROTECTED]
To: Eric Turkheimer [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Sent: Wednesday, August 22, 2007 9:43:57 AM
Subject: Re: [R] Synchronzing workspaces

Eric Turkheimer wrote:

How do people go about synchronizing multiple workspaces on different workstations? I tend to wind up with projects spread around the various machines I work on. I find that placing the directories on a server and reading them remotely tends to slow things down.

If R were to store all its workspace data objects in individual files instead of one big .RData file, then you could use a revision control system like SVN. Check out the data, work on it, check it in, then on another machine just update to get the changes.

However, SVN doesn't work too well for binary files (conflicts being hard to resolve without someone backing down), so maybe it's not such a good idea anyway...

On unix boxes and derivatives, you can keep things in sync efficiently with the 'rsync' command. I think there are GUI add-ons for it, and Windows ports.

Barry
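A minimal sketch of the save() arguments discussed in this thread: writing a workspace file in ASCII form and reading it back. The object names and the temporary file are made up for illustration; whether an ASCII file actually avoids the corruption Paul describes is not something this sketch can show.

```r
# Two made-up objects standing in for a workspace.
x   <- 1:10
dat <- data.frame(id = 1:3, value = c(2.5, 3.5, 4.5))

# Save explicitly instead of relying on q("yes"), choosing the format.
f <- tempfile(fileext = ".RData")
save(x, dat, file = f, ascii = TRUE)

# Load into a fresh environment so the round trip can be checked.
e <- new.env()
restored <- load(f, envir = e)
print(restored)
print(identical(e$dat, dat))

# Defaults can also be set once via options(), as Andy notes (see ?save):
# options(save.defaults = list(ascii = TRUE))
```

Saving a named set of objects per project, rather than one .RData per working directory, also fits Barry's suggestion of keeping files small enough to synchronize individually.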
Re: [R] Variable Importance - Random Forest
I'm slowly clearing my backlog of r-help messages... Please see reply inline below.

Andy

From: Mathe, Ewy (NIH/NCI) [F]

Hello,

I am trying to explore the use of random forests for classification and am uncertain about the interpretation of the importance measurements.

When having the option importance = T in the randomForest call, the resulting 'importance' element matrix has four columns with the following headings:

0 - mean raw importance score of variable x for class 0 (where importance is the difference between the permuted data error and the original test set error)
1 - mean raw importance score of variable x for class 1
MeanDecreaseAccuracy : average lowering of the margin across all cases (where margin is the proportion of votes for the true class - the maximum proportion of votes for the other classes)
MeanDecreaseGini : summation of the gini decreases over all trees in the forest

Are these definitions correct? Why is the raw importance score calculated for each class? Could one just average the raw importance scores for class 0 and 1 to get a composite importance score?

The permutation-based importance measures are based on OOB data. For each tree in the forest, the difference in error rates on the OOB data with and without permuting the variable of interest is computed. Call this d[i] for the i-th tree. The overall importance measure is mean(d[i]) / se(d[i]), where se(d[i]) is sd(d[i])/sqrt(ntree) (the standard error). The numbers in the 0 and 1 columns are the analogs computed separately for the 0 class and the 1 class. These are useful, e.g., when balanced sampling is used.

Now, when having the option importance = F in the randomForest call, the 'importance' element is now a vector. What values are those?

That's the MeanDecreaseGini, because they come at nearly zero additional computation, so we might as well keep them.

Thank you in advance for any input you may have.

Best,
Ewy

Ewy Mathe, Ph.D.
Laboratory of Human Carcinogenesis
National Cancer Institute, NIH
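The scaling Andy describes can be written out directly. This is only a sketch of the arithmetic: `perm_importance` is a hypothetical helper name, and the per-tree differences in `d` are made up rather than taken from a fitted forest.

```r
# d holds one value per tree: OOB error with the variable permuted
# minus OOB error without permutation (made-up numbers here).
perm_importance <- function(d) {
  ntree <- length(d)
  se <- sd(d) / sqrt(ntree)   # standard error of the mean difference
  mean(d) / se                # scaled importance, as described above
}

d <- c(0.1, 0.2, 0.3)
z <- perm_importance(d)
print(z)
```

With these three values the mean is 0.2 and the standard deviation 0.1, so the result equals 2 * sqrt(3). The class-specific columns apply the same calculation to differences computed within each class.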
Re: [R] categorical variable coefficients in QSAR [Broadcast]
No one seems to have picked up on this, so I'll take a stab:

You need to read para and meta into R as factors, and if you want the coefficients to match the way you showed, you also need to take care that the factor levels are in the same order as you showed in the coefficient table. I cut-and-pasted the three columns of data into R separately, like so:

    [copy para data to the clipboard]
    R> para <- factor(scan("clipboard", what=""))
    Read 22 items
    [copy meta data to the clipboard]
    R> meta <- factor(scan("clipboard", what=""))
    Read 22 items
    [copy biological activity to the clipboard]
    R> y <- scan("clipboard")
    Read 22 items
    [copy the column heading of the coefficient table to the clipboard]
    R> lvl <- scan("clipboard", what="")
    Read 6 items
    R> para <- factor(as.character(para), levels=lvl)
    R> meta <- factor(as.character(meta), levels=lvl)
    R> qsar <- lm(y ~ para + meta)
    R> qsar

    Call:
    lm(formula = y ~ para + meta)

    Coefficients:
    (Intercept)        paraF       paraCl       paraBr        paraI       paraMe
         7.8213       0.3400       0.7675       1.0200       1.4287       1.2560
          metaF       metaCl       metaBr        metaI       metaMe
        -0.3013       0.2068       0.4340       0.5787       0.4540

These coefficients match the ones you showed quite closely. If you don't reorder the levels of the factors, then by default R orders them alphabetically, so that Br becomes the reference and all coefficients are differences from Br.

HTH,
Andy

From: [EMAIL PROTECTED]

Dear list:

I am interested in the following sort of problem, as is found frequently in the field of QSAR. I have biological activity as a function of chemical structure, with structure defined in a categorical manner in that the SUBSTITUENT is the levels of the POSITION factor. For example, data from Kubinyi (http://www.kubinyi.de/dd-12.pdf) for this type of analysis is presented as follows:

factor para:
    H F Cl Br I Me H H H H H F F F Cl Cl Cl Br Br Br Me Me
factor meta:
    H H H H H H F Cl Br I Me Cl Br Me Cl Br Me Cl Br Me Me Br
observed biological activity:
    7.46 8.16 8.68 8.89 9.25 9.30 7.52 8.16 8.30 8.40 8.46 8.19 8.57 8.82 8.89 8.92 8.96 9.00 9.35 9.22 9.30 9.52

I then think the following analysis should be appropriate:

    meta<-factor(scan(file="meta",what="character"))
    para<-factor(scan(file="para",what="character"))
    ba<-scan(file="ba")
    rslt<-lm(ba~meta+para-1)

What I wish to obtain is a coefficient for each substituent at each position, as does Kubinyi:

             H      F     Cl     Br      I     Me
    meta  0.00  -0.30   0.21   0.43   0.58   0.45
    para  0.00   0.34   0.77   1.02   1.43   1.26

However, I do not get a coefficient for the Br substituent at the para position. I would like to know if there is an error in this formulation. The technique is quite well established in the field of medicinal chemistry, and it is traditional that the binary incidence matrix is formed by hand as an intermediate step in the analysis, instead of the much simpler formulation that I am considering here.

Thank you for whatever insight you may give.

Prof. Roy Little
Dept. Chem.
Universidad de los Andes
Mérida, Venezuela
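For readers without a clipboard-based workflow, the same fit can be sketched with the data typed in directly. The values are the ones quoted in the question; the names `para_raw` and `meta_raw` are introduced here only for illustration.

```r
# Data as quoted in the question (22 compounds).
para_raw <- c("H", "F", "Cl", "Br", "I", "Me", "H", "H", "H", "H", "H",
              "F", "F", "F", "Cl", "Cl", "Cl", "Br", "Br", "Br", "Me", "Me")
meta_raw <- c("H", "H", "H", "H", "H", "H", "F", "Cl", "Br", "I", "Me",
              "Cl", "Br", "Me", "Cl", "Br", "Me", "Cl", "Br", "Me", "Me", "Br")
y <- c(7.46, 8.16, 8.68, 8.89, 9.25, 9.30, 7.52, 8.16, 8.30, 8.40, 8.46,
       8.19, 8.57, 8.82, 8.89, 8.92, 8.96, 9.00, 9.35, 9.22, 9.30, 9.52)

# By default the levels are sorted, so "Br" would be the reference level.
default_levels <- levels(factor(para_raw))
print(default_levels)

# Put H first so that every coefficient is a difference from H.
lvl  <- c("H", "F", "Cl", "Br", "I", "Me")
para <- factor(para_raw, levels = lvl)
meta <- factor(meta_raw, levels = lvl)

qsar <- lm(y ~ para + meta)
print(coef(qsar))
```

This also explains the missing Br coefficient in the original formulation: with `- 1` in the formula, the first factor receives a coefficient for every level, while the second factor still drops its reference level, which is Br under the default alphabetical ordering.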
Re: [R] randomForest help
What software are you using, exactly? I'm the maintainer of the randomForest package, yet I do not know which manual you are quoting.

If you are using the randomForest package, the model object can be saved to a file by save(Rfobject, file="myRFobject.rda"). If you need that to be in ascii, use ascii=TRUE in save(). You can get it back into R by using load() or attach().

To run data down the model, use predict(Rfobject, datatopredict) (see ?predict.randomForest).

What exactly do you want to print to a csv file, the prediction? See ?write or ?write.table.

Andy

From: Jennifer Dawn Watts

Hello! As a new R user, I'm sure this will be a silly question for the rest of you. I've been able to successfully run a forest but have yet to figure out proper command lines for the following:

1. Saving the forest. The guide just says isavef=1. I'm unsure how to expand on this to create the command.
2. Running new data down the model. Again, the guide just states irunf.
3. Print to file. I need to be able to export this data to a csv file, to then incorporate into an Arc shapefile. The manual just says ntestout.

Again, I feel like these should be easy steps that I just can't relate to as a beginner. Any advice would be greatly appreciated.

Thanks,
Jenny
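The three steps in Andy's reply (save, predict, write to csv) can be sketched end to end. To keep the sketch runnable without add-on packages, lm() on the built-in cars data stands in for randomForest(); this is a substitution made here, not something from the thread. The save(), load() and predict() calls take the same form for a randomForest object.

```r
# Stand-in model: lm() replaces randomForest() so no extra package is needed.
fit <- lm(dist ~ speed, data = cars)

# 1. Save the fitted model, then restore it (as in a later session).
model_file <- tempfile(fileext = ".rda")
save(fit, file = model_file)
rm(fit)
load(model_file)            # brings back the object named 'fit'

# 2. Run new data down the model (made-up predictor values).
newdata <- data.frame(speed = c(10, 15, 20))
pred <- predict(fit, newdata)

# 3. Write the predictions to a csv file for use in other software.
out <- cbind(newdata, prediction = pred)
csv_file <- tempfile(fileext = ".csv")
write.csv(out, csv_file, row.names = FALSE)
print(out)
```

Setting row.names = FALSE keeps the file to the two named columns, which tends to be easier to join to other tables afterwards.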
Re: [R] ecological meaning of randomForest vegetation classification? [Broadcast]
Hi Christoph,

I'm not exactly sure what you're looking for, but I'll take a stab anyway. The trees in a random forest are not designed to be interpreted as one would an ordinary tree. There are several things you may try to see if they help you any.

One is the distribution of votes. It looks like you are classifying each data point into one of many possible classes. RF will give you the fraction of trees in the forest that classified the observation as a particular class (and the class with the highest fraction of votes is the predicted class).

Another is the partial dependence plot: You can use plot(importance(rf.object)) to see which variables are the most important, and then use partialPlot() to examine their marginal effects.

These offer some clue of what the RF black box is doing, and hopefully will make some sense to you.

Best,
Andy

From: Christoph Muller

Hi, everyone,

I haven't found anything similar in the forum, so here's my problem (I'm no expert in R nor statistics):

I have a data set of 59,000 cases with 9 variables each (fractional coverage of 9 different plant types, such as deciduous broad-leaved temperate trees or evergreen tropical trees etc.), which was generated by a vegetation model. In order to evaluate the quality of the vegetation model's output, I want to compare it to a land-cover data set which has 23 different land-cover types (such as needle-leaved evergreen forest, dense broad-leaved forest, barren, etc.).

A statistician advised me to use the randomForest package in R, and using a subset to generate the random forest, I get a very good prediction for the rest. However, I need to evaluate how meaningful this classification is in an ecological sense (boreal trees should not play a role in the definition of tropical land-cover types, for example), otherwise I cannot judge the quality of the vegetation model's output. Unfortunately, randomForest gives me about 15,000 splits of which about 5000 are end branches (rough guess), so it's very hard and time-consuming to check each single branch of one of the final trees for its ecological meaning.

Is there any utility to summarize the characteristics of each of the 23 prediction classes? Such as: land-cover class 1 has less than 5% of plant types 1-5, 20-50% of plant type 7 and at least 30% of plant type 8. Or is there a more suitable method to classify my data?

Thanks a lot in advance!

Christoph
Re: [R] (Most efficient) way to make random sequences of random sequences
Similarly:

    s <- c(replicate(N, sample(3)))

Andy

From: roger koenker

One way:

    N <- 10
    s <- c(apply(matrix(rep(1:3,N),3,N),2,sample))

Roger Koenker
Department of Economics, University of Illinois
Champaign, IL 61820
url: www.econ.uiuc.edu/~roger

On Aug 21, 2007, at 3:49 PM, Emmanuel Levy wrote:

Hi,

I was wondering what would be the (most efficient) way to generate a sequence of sequences. I mean: if I have 1, 2 and 3, I'd like to generate a sequence of length N*3 (N ~ 1,000,000 or more) where random permutations of the sequence 1,2,3 follow each other, i.e. 1,2,3,1,3,2,3,2,1

/!\ The thing is that there should never be the same number twice in the same sub-sequence, meaning that this is different from generating a vector with the numbers 1, 2 and 3 randomly distributed.

Any suggestion very welcome!

Thanks,
Emmanuel
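A quick check that the replicate() approach meets the requirement in the question, namely that every consecutive block of three is a permutation of 1, 2, 3. The seed and the small N are arbitrary choices for illustration.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
N <- 5         # small N for display; the question has N around 1e6

# replicate() returns a 3 x N matrix with one permutation per column;
# c() flattens it column by column into a vector of length 3 * N.
s <- c(replicate(N, sample(3)))

# Reshape and verify that each block contains 1, 2 and 3 exactly once.
blocks <- matrix(s, nrow = 3)
blocks_ok <- all(apply(blocks, 2, function(b) identical(sort(b), 1:3)))
print(s)
print(blocks_ok)
```

The same check applies unchanged to the apply()-based version from the other reply.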
Re: [R] RFclustering - is it available in R?
Basically the random forest algorithm can generate a proximity matrix of the data, and it's up to you how you want to proceed from there. You can feed that into clustering algorithms that accept a similarity matrix, or turn it into a distance matrix for clustering algorithms that need a distance matrix (e.g., hclust()). You may or may not want to do ordination as the UCLA folks suggest.

I think this is one of the great things about working in R: you have the freedom to choose how you want to proceed from some intermediate result, and are not locked in to something someone decided to hardwire into the software.

Andy

From: Gavin Simpson

On Wed, 2007-08-15 at 09:44 -0700, David Katz wrote:

Several searches turned up nothing. Perhaps I will try to implement it if nobody else has. Thanks.

You can do this with Andy Liaw's randomForest package, and the first hit on a Google search (on the term RFclustering) was this:

http://www.genetics.ucla.edu/labs/horvath/RFclustering/RFclustering.htm

which shows how one might go about this with some helper functions.

G

Gavin Simpson
ECRC, UCL Geography
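A sketch of the step Andy describes, going from a proximity (similarity) matrix to hclust(). The 4 x 4 matrix below is made up and merely stands in for the proximity component that randomForest returns when called with proximity = TRUE; the choice of 1 - proximity as the distance and of average linkage are assumptions for illustration.

```r
# Made-up proximities: observations 1-2 are close, as are 3-4.
prox <- matrix(c(1.0, 0.9, 0.1, 0.1,
                 0.9, 1.0, 0.1, 0.1,
                 0.1, 0.1, 1.0, 0.9,
                 0.1, 0.1, 0.9, 1.0), nrow = 4, byrow = TRUE)

# Turn similarity into dissimilarity and cluster hierarchically.
d  <- as.dist(1 - prox)
hc <- hclust(d, method = "average")

# Cut the tree into two groups.
groups <- cutree(hc, k = 2)
print(groups)
```

Other transformations of the proximities, or an ordination step before clustering as on the UCLA page, slot in at the `as.dist()` line.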
Re: [R] Loading JMP Files
JMP can write CSV, and that's probably a safer choice than XPT.

Andy

From: Diana C. Dolan

Hi,

I know how to use SPSS and JMP, and have quite a few JMP files I would like to use in R. I converted them to .xpt files, downloaded the 'foreign' library, then tried this command:

    read.xport("D:\\Databases\nameoffile.xpt")

to which I get:

    Error in lookup.xport(file) : unable to open file

I have read FAQ lists and Google searched and cannot figure out what I'm doing wrong that my files won't open. I tried saving to the C drive, but no luck there. I also have no luck getting R to read my SPSS files with read.spss.

My file names do have spaces and dashes, and I do have variables/variable names longer than 8 characters.

Please help! I am very new to R and do not understand all the package reference manuals... I cannot seem to find a simple, basic guide to how to command R and use basic functions without a bunch of jargon (e.g. 'usage' and 'arguments'). It would help to at least be able to load my files to practice on. Any help would be appreciated!

Thanks,
Diana
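A sketch of the CSV route Andy recommends, using a file name with spaces and dashes and a variable name longer than 8 characters. The data frame and file name are made up. One detail in the quoted call is worth noting: the single backslash before nameoffile makes R read "\n" as a newline, which may by itself explain the "unable to open file" error.

```r
# "\n" inside a string is one character (a newline), not a path separator.
bad_path <- "D:\\Databases\nameoffile.xpt"
has_newline <- grepl("\n", bad_path, fixed = TRUE)
print(has_newline)

# file.path() builds paths with forward slashes, which avoids escaping.
csv_path <- file.path(tempdir(), "my data - example.csv")

# Made-up data with a long variable name.
df <- data.frame(id = 1:3,
                 systolic_blood_pressure = c(120.5, 131, 118.25))

# In practice the CSV would be exported from JMP; here R writes it.
write.csv(df, csv_path, row.names = FALSE)
back <- read.csv(csv_path)
print(back)
```

Forward slashes ("D:/Databases/nameoffile.csv") or doubled backslashes throughout both work for Windows paths in R.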
Re: [R] rfImpute
I seem to recall that rfImpute() can sometimes come up with NAs at some point in the iterations. Could you please send me a (small) set of data/code that reproduces the problem?

Andy

From: Eric Turkheimer

I am having trouble with the rfImpute function in the randomForest package. Here is a sample...

    clunk.roughfix<-na.roughfix(clunk)
    clunk.impute<-rfImpute(CONVERT~.,data=clunk)
    ntree      OOB      1      2
      300:  26.80%  3.83% 85.37%
    ntree      OOB      1      2
      300:  18.56%  5.74% 51.22%
    Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree, :
      NA not permitted in predictors

So na.roughfix works, but rfImpute doesn't.

Thanks,
Eric

Eric Turkheimer, PhD
Department of Psychology
University of Virginia
Re: [R] Sourcing commands but delaying their execution
Here's one possibility: The file garbage.R has

    x <- rnorm(100)
    print(summary(x))

You can do:

    cmds <- parse(file="garbage.R", n=NA)

and when you want to execute those commands, do

    eval(cmds)

Andy

From: Dennis Fisher

Colleagues:

I have encountered the following situation:

    SERIES OF COMMANDS
    source("File1")
    MORE COMMANDS
    source("File2")

Optimally, I would like File1 and File2 to be merged into a single file (FileMerged). However, if I wrote the following:

    SERIES OF COMMANDS
    source("FileMerged")
    MORE COMMANDS

I encounter an error: the File2 portion of FileMerged contains commands that cannot be executed properly until MORE COMMANDS are executed. Similarly, sourcing FileMerged after MORE COMMANDS does not work because MORE COMMANDS requires the information from the File1 portion of FileMerged.

I am looking for a means to source FileMerged but not execute some of the commands immediately. Functionally this would look like:

    SERIES OF COMMANDS
    source("FileMerged")  # but withhold execution of some of the commands
    MORE COMMANDS
    COMMAND TO EXECUTE THE WITHHELD COMMANDS

Does R offer some option to accomplish this?

Dennis

Dennis Fisher MD
P (The P Less Than Company)
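The parse-now, evaluate-later idea can be tried on a throwaway script. The file contents and variable names below are made up; the script refers to `x`, which does not exist at the time the file is parsed.

```r
# Write a small script whose commands depend on an object created later.
script <- tempfile(fileext = ".R")
writeLines(c("y <- x * 2",
             "z <- y + 1"), script)

# Parsing reads the commands without running them, so no error occurs
# even though x is not defined yet.
cmds <- parse(file = script)
n_cmds <- length(cmds)

# ... the equivalent of MORE COMMANDS, which creates what the script needs.
x <- 10

# Now execute the withheld commands.
eval(cmds)
print(c(y = y, z = z))
```

Because `cmds` is an expression vector, a subset such as `eval(cmds[2])` runs only selected commands, which covers the case where some lines should run immediately and others later.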
Re: [R] getting the name of variables passed to a function
Here's one possibility:

    R> f <- function(...) { call <- match.call(); sapply(as.list(call[-1]), deparse) }
    R> f(x, y)
    [1] "x" "y"
    R> f(x=x, y=y)
      x   y
    "x" "y"

You basically need to know how to manipulate call objects. The relevant section in the R Language Definition should help.

Andy

From: Horace Tso

Folks,

I've entered into an R programming territory I'm not very familiar with, thus this probably very elementary question concerning the mechanics of a function call.

I want to know from within a function the names of the variables I pass down. The function makes use of the ... to allow for multiple unknown arguments,

    myfun = function(...) { do something }

In the body I put,

    { nm <- names(list(...)); nm }

When the function is called with two vectors x and y,

    myfun(x, y)

it returns NULL. However, when the call made is

    myfun(x=x, y=y)

the result is

    [1] "x" "y"

Question: how do I get the names of the unknown variables without explicitly saying x=x...

Thanks in advance.

Horace
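Andy's function, lightly expanded so that longer expressions are also handled. The name `arg_names` and the paste(..., collapse) step are additions made here for illustration; the match.call() and deparse() approach is the one from the reply.

```r
# Return the text of each argument as it was written in the call.
arg_names <- function(...) {
  cl <- match.call()              # the call as typed, e.g. arg_names(a, b)
  args <- as.list(cl[-1])         # drop the function name, keep the arguments
  vapply(args,
         function(a) paste(deparse(a), collapse = " "),
         character(1))
}

# The arguments are never evaluated, so a and b need not exist.
unnamed <- arg_names(a, b)
named   <- arg_names(p = a, q = b + 1)
print(unnamed)
print(named)
```

When arguments are supplied by name, the names appear on the result while the values still show the expressions that were passed.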
Re: [R] randomForest importance problem with combine [Broadcast]
I've been fixing some problems in the combine() function, but that's only for regression data. Looks like you are doing classification, and I don't see the problem:

    R> library(randomForest)
    randomForest 4.5-19
    Type rfNews() to see new features/changes/bug fixes.
    R> set.seed(1)
    R> rflist <- replicate(50, randomForest(iris[-5], iris[[5]], ntree=50, importance=TRUE), simplify=FALSE)
    R> rfall <- do.call(combine, rflist)
    R> importance(rfall)
                    setosa versicolor virginica MeanDecreaseAccuracy
    Sepal.Length 0.4457861 0.53883425 0.5580657            0.4120840
    Sepal.Width  0.3266790 0.07652383 0.3620240            0.2128450
    Petal.Length 1.1950989 1.42014628 1.3220471            0.7989841
    Petal.Width  1.1986973 1.40855969 1.3640620            0.7951053
                 MeanDecreaseGini
    Sepal.Length         9.578580
    Sepal.Width          2.301172
    Petal.Length        42.935832
    Petal.Width         44.409058
    R> importance(rflist[[1]])
                   setosa  versicolor virginica MeanDecreaseAccuracy
    Sepal.Length 0.401714  0.71583422 0.4946420            0.4166555
    Sepal.Width  0.000000 -0.03155946 0.6829287            0.2317111
    Petal.Length 1.290430  1.47915219 1.3456770            0.8219003
    Petal.Width  1.110142  1.44996777 1.3584799            0.7881210
                 MeanDecreaseGini
    Sepal.Length         6.168439
    Sepal.Width          2.240723
    Petal.Length        48.821726
    Petal.Width         42.059112

Please provide a reproducible example.

Andy

From: Joseph Retzer

My apologies, subject corrected.

I'm building a RF 50 trees at a time due to memory limitations (I have roughly .5 million observations and around 20 variables). I thought I could combine some or all of my forests later and look at global importance.

If I have, say, 2 forests, tree1 and tree2, they have similar Gini and Raw importances and, additionally, are similar to one another. After combining the trees into one (using the combine command), however, the combined tree Raw importances have changed in rank order rather dramatically (e.g. the top most important becomes least important; it is not, however, a completely reversed ordering). In addition, the scale of both the Raw and Gini importances is orders of magnitude smaller for the combined tree. Note that the combined tree Gini importance looks roughly similar to the individual tree Gini (and Raw) importance, at least in terms of rank ordering.

I'm using the non-formula randomForest specification along with norm.votes=FALSE to facilitate large sample estimation and tree combining. I'm using R 2.5.0 on a Windows XP machine with 2 gig RAM. I'm also using randomForest 4.5-18.

Any advice is appreciated.

Many thanks,
Joe
Re: [R] Viewing a data object
I believe JGR has an object browser. See the screenshots at the bottom of http://rosuda.org/JGR/.

Andy

From: Stephen Tucker

Hi Horace, I have also thought that it may be useful, but I don't know of any Object Explorer available for R. However (you may already know this):

(1) You can view your list of objects in R with objects().

(2) You can view objects in a spreadsheet-like table (if they are matrices or data frames) with invisible(edit(objectName)) [which isn't easy on the fingers]. fix(objectName) is a shorter option, but it has the side effect of possibly changing your object when you close the data viewer. For instance, this can happen if you mistakenly type something into a cell; it can also change your column classes when you don't. For example:

> options(stringsAsFactors=TRUE)
> x <- data.frame(letters[1:5],1:5)
> sapply(x,class)
letters.1.5.         X1.5
    "factor"    "integer"
> fix(x) # no user-changes made
> sapply(x,class)
letters.1.5.         X1.5
    "factor"    "numeric"

(3) I believe Deepayan Sarkar contributed the tab-completion capability at the command line. So unless you have a lot of objects beginning with 'AuroraStoch...' you should be able to type a few letters and let the auto-completion handle the rest.

Best regards, ST

--- Horace Tso [EMAIL PROTECTED] wrote:

Dear list, First, I apologize that this is trivial and just betrays my slothfulness at the keyboard. I'm sick of having to type a long name just to get a glimpse of something. For example, if my data frame is named 'AuroraStochasticRunsJune1.df' and I want to see what the middle looks like, I have to type

AuroraStochasticRunsJune1.df[ 400:500, ]

And often I'm not even sure rows 400 to 500 are what I want to see. I might have to type the same line many times. Is there some sort of R equivalent of the Object Explorer in Splus, where I could mouse-click an object in a list and a window pops up? Short of that, is there any trick for saving a couple of keystrokes here and there?

Thanks for tolerating this kind of annoying question. H.
Re: [R] R for bioinformatics
Just to complete this thread: A colleague sent me the following regarding the book. Following up on this post from a few months back... The author has recently posted a public-domain version of this book on CRAN under Documentation - Contributed - Statistics Using R with Biological Examples by Kim Seefeld and Ernst Linder (PDF). Unfortunately not all the mirror sites have it yet. Andy From: Benoit Ballester Marc Schwartz wrote: On Thu, 2007-02-01 at 21:32 +0100, Peter Dalgaard wrote: Marc Schwartz wrote: On Thu, 2007-02-01 at 10:45 -0800, Seth Falcon wrote: Benoit Ballester [EMAIL PROTECTED] writes: Hi, I was wondering if someone could tell me more about this book, (if it's a good or bad one). I can't find it, as it seems that O'Reilly doesn't publish any more. I've never seen a copy so I can't comment about its quality (has anyone seen a copy?). You might want to take a look at _Bioinformatics and Computational Biology Solutions Using R and Bioconductor_. http://www.bioconductor.org/pub/docs/mogr/ I'll stand (or sit) to be corrected on this as I cannot find the source, but I have a recollection from seeing something quite some time ago that the book may have never been published. It's been a while since the status was something along the lines that the authors may or may not complete it. Subject matter moving faster than pen, I suspect Peter, that wording does seem familiar, just cannot recall where I saw it. Perhaps on the O'Reilly web site, where it is no longer listed. For confirmation, I called O'Reilly's customer service in Cambridge, MA. They confirm that the book was indeed cancelled and never published. No reasons were given. Thanks for those replies. I did also contacted the O'reilly offices in UK, and they told me the same thing. The book was never published. 
I just wanted to compare the R for bioinformatics with the Bioinformatics and Computational Biology Solutions Using R and Bioconductor, and see which one suit me more - But guess I don't have the choice now :-) Ben -- Benoit Ballester Ensembl Team
Re: [R] normality tests [Broadcast]
From: [EMAIL PROTECTED] On 25/05/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote: Hi all, apologies for seeking advice on a general stats question. I ve run normality tests using 8 different methods: - Lilliefors - Shapiro-Wilk - Robust Jarque Bera - Jarque Bera - Anderson-Darling - Pearson chi-square - Cramer-von Mises - Shapiro-Francia All show that the null hypothesis that the data come from a normal distro cannot be rejected. Great. However, I don't think it looks nice to report the values of 8 different tests on a report. One note is that my sample size is really tiny (less than 20 independent cases). Without wanting to start a flame war, are there any advices of which one/ones would be more appropriate and should be reported (along with a Q-Q plot). Thank you. Regards, Wow - I have so many concerns with that approach that it's hard to know where to begin. But first of all, why care about normality? Why not use distribution-free methods? You should examine the power of the tests for n=20. You'll probably find it's not good enough to reach a reliable conclusion. And wouldn't it be even worse if I used non-parametric tests? I believe what Frank meant was that it's probably better to use a distribution-free procedure to do the real test of interest (if there is one) instead of testing for normality, and then use a test that assumes normality. I guess the question is, what exactly do you want to do with the outcome of the normality tests? If those are going to be used as basis for deciding which test(s) to do next, then I concur with Frank's reservation. Generally speaking, I do not find goodness-of-fit for distributions very useful, mostly for the reason that failure to reject the null is no evidence in favor of the null. It's difficult for me to imagine why there's insufficient evidence to show that the data did not come from a normal distribution would be interesting. 
Andy Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University -- yianni
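Frank's suggestion to examine the power of the tests for n=20 can be tried by simulation. What follows is only a rough sketch, not anything posted in the thread: it uses Shapiro-Wilk alone, and the t distribution with 5 degrees of freedom is an arbitrary alternative chosen for illustration. The names reject_rate, size_normal and power_t5 are made up here.

```r
# Sketch: how often does shapiro.test() reject at the 5% level when n = 20?
# Assumptions (not from the thread): t with 5 df as the non-normal
# alternative, and 2000 simulated samples per case.
set.seed(1)
n <- 20
nsim <- 2000

# Fraction of simulated samples for which normality is rejected
reject_rate <- function(rgen) {
  mean(replicate(nsim, shapiro.test(rgen(n))$p.value < 0.05))
}

size_normal <- reject_rate(rnorm)                        # data truly normal
power_t5    <- reject_rate(function(n) rt(n, df = 5))    # heavier tails

print(c(size_normal = size_normal, power_t5 = power_t5))
```

If the second number is not much larger than the first, then failing to reject normality with 20 cases says very little, which is the reservation expressed above.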
Re: [R] in unix opening data object created under win
What are the versions of R on the two platforms? Is the version on Unix at least as new as the one on Windows?

Andy

From: [EMAIL PROTECTED]

Hi All, I am saving a data frame in my MS-Win R with save(). Then I copy it onto my personal AFS space. Then I start R, run it with emacs, and load() the data. It loads only 2 lines: head() shows only two lines, nrow() also says it has only 2 lines, and I get an error message when trying to use this data object, saying that some row numbers are missing. If anyone has had a similar situation, I would appreciate your letting me know.

Best, Toby
Re: [R] using lm() with variable formula [Broadcast]
One way to do it is by giving a data frame with the right variables to lm() as the first argument each time. If lm() is given a data frame as the first argument, it will treat the first variable as the LHS and the rest as the RHS of the formula. As an example, you can do:

lm(myData[c("height", "weight", "BP", "Cals")])

(The drawback to this is that the formula in the fitted model object looks a bit strange...)

Andy

From: Chris Elsaesser

New to R; please excuse me if this is a dumb question. I tried to RTFM; didn't help.

I want to do a series of regressions over the columns in a data.frame, systematically varying the response variable and the terms, and not necessarily including all the non-response columns. In my case, the columns are time series. I don't know if that makes a difference; it does mean I have to call lag() to offset non-response terms. I cannot assume a specific number of columns in the data.frame; might be 3, might be 20.

My central problem is that the formula given to lm() is different each time. For example, say a data.frame had columns with the following headings: height, weight, BP (blood pressure), and Cals (calorie intake per time frame). In that case, I'd need something like the following:

lm(height ~ weight + BP + Cals)
lm(height ~ weight + BP)
lm(height ~ weight + Cals)
lm(height ~ BP + Cals)
lm(weight ~ height + BP)
lm(weight ~ height + Cals)
etc.

In general, I'll have to read the header to get the argument labels. Do I have to write several functions, each taking a different number of arguments? I'd like to construct a string or list representing the variables in the formula and apply lm(), so to say. [I'm mainly a Lisp programmer, where that part would be very simple. Anyone have a Lisp API for R? :-}]

Thanks, chris

Chris Elsaesser, PhD Principal Scientist, Machine Learning SPADAC Inc. 7921 Jones Branch Dr.
Suite 600 McLean, VA 22102 703.371.7301 (m) 703.637.9421 (o)
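Besides subsetting the data frame, the formula itself can be built from column names. The sketch below is one possible way, not something posted in the thread: it uses reformulate() from the stats package and combn() to enumerate predictor subsets. myData is filled with random numbers purely as a stand-in, the names vars, fits, preds and rhs are made up for the example, and the lag() handling Chris mentions is left out.

```r
# Sketch: fit lm() for every response and every non-empty subset of the
# remaining columns, building each formula from the column names.
# myData is simulated stand-in data, not data from the thread.
set.seed(42)
myData <- data.frame(height = rnorm(30), weight = rnorm(30),
                     BP = rnorm(30), Cals = rnorm(30))

vars <- names(myData)   # read the "header" rather than hard-coding names
fits <- list()
for (resp in vars) {
  preds <- setdiff(vars, resp)
  for (k in seq_along(preds)) {
    for (rhs in combn(preds, k, simplify = FALSE)) {
      fml <- reformulate(rhs, response = resp)      # e.g. height ~ weight + BP
      fits[[deparse(fml)]] <- lm(fml, data = myData)
    }
  }
}

length(fits)                              # 4 responses x 7 predictor subsets
coef(fits[["height ~ weight + BP"]])
```

as.formula(paste(resp, "~", paste(rhs, collapse = " + "))) would be an equivalent string-based construction. With 20 columns the number of subsets grows very quickly, so in practice the enumeration would need to be restricted.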
Re: [R] more woes trying to convert a data.frame to a numerical matrix
I think this might be a bit more straightforward:

R> mat <- do.call(cbind, scan("clipboard", what=list(NULL, 0, 0, 0), sep=",", skip=2))
Read 3 records
R> mat
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Andy

From: Andrew Yee

Thanks again to everyone for all your help. I think I've figured out the solution to my dilemma. Instead of using data.matrix or sapply, this works for me:

sample.data <- read.csv("sample.csv")
sample.matrix.raw <- as.matrix(sample.data[-1,-1])
sample.matrix <- matrix(as.numeric(sample.matrix.raw), nrow=attributes(sample.matrix.raw)$dim[1], ncol=attributes(sample.matrix.raw)$dim[2])

With the above code, I get the desired matrix of:

1 2 3
4 5 6
7 8 9

(I'd like to be able to import the whole csv and then subset the relevant header and data sections, rather than creating a separate csv for the header and a csv for the data.) Of course, the above code seems kind of clunky, and I welcome any suggestions for improvement.

Thanks, Andrew

On 5/16/07, Andrew Yee [EMAIL PROTECTED] wrote:

Thanks for the suggestion. However, I've tried sapply and data.matrix. The problem is that while it returns a numeric matrix, it gives back:

1 1 1
2 2 2
3 3 3

instead of

1 2 3
4 5 6
7 8 9

The latter matrix is the desired result.

Thanks, Andrew

On 5/16/07, Marc Schwartz [EMAIL PROTECTED] wrote:

On Wed, 2007-05-16 at 08:40 -0400, Andrew Yee wrote:

Thanks for the suggestion and the explanation for why I was running into these troubles. I've tried:

as.numeric(as.matrix(sample.data[-1, -1]))

However, this creates another vector rather than a matrix.

Right. That's because I'm an idiot and need more caffeine... :-)

Is there a straightforward way to convert this directly into a numeric matrix rather than a vector?

Yeah, Dimitris' approach below of using data.matrix(). You could also use:

> mat <- sapply(sample.data[-1, -1], as.numeric)
> rownames(mat) <- rownames(sample.data[-1, -1])
> mat
  x y z
2 1 1 1
3 2 2 2
4 3 3 3

Though, this is essentially what data.matrix() does internally.

Additionally, I've also considered:

data.matrix(sample.data[-1,-1])

but bizarrely, it returns:

  x y z
2 1 1 1
3 2 2 2
4 3 3 3

That is a numeric matrix:

> str(data.matrix(sample.data[-1, -1]))
 int [1:3, 1:3] 1 2 3 1 2 3 1 2 3
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:3] "2" "3" "4"
  ..$ : chr [1:3] "x" "y" "z"

HTH, Marc

Thanks, Andrew

On 5/16/07, Marc Schwartz [EMAIL PROTECTED] wrote:

On Wed, 2007-05-16 at 08:10 -0400, Andrew Yee wrote:

I have the following csv file:

name,x,y,z
category,delta,gamma,epsilon
a,1,2,3
b,4,5,6
c,7,8,9

I'd like to create a numeric matrix of just the numbers in this csv dataset. I've tried the following program:

sample.data <- read.csv("sample.csv")
numerical.data <- as.matrix(sample.data[-1,-1])

However, print(numerical.data) returns what appears to be a matrix of characters:

  x   y   z
2 "1" "2" "3"
3 "4" "5" "6"
4 "7" "8" "9"

How do I force it to be numbers rather than characters?

Thanks, Andrew

The problem is that you have two rows which contain alpha entries. The first row is treated as the header, but the second row is treated as actual data, thus overriding the numeric values in the subsequent rows. You could use:

as.numeric(as.matrix(sample.data[-1, -1]))

to coerce the matrix to numeric, or if you don't need the alpha entries, you could modify the read.csv() call to something like:

read.csv("sample.csv", header = FALSE, skip = 2, row.names = 1, col.names = c("name", "x", "y", "z"))

This will skip the first two rows, set the first column to the row names and give you a data frame with numeric columns, which in most cases can be treated as a numeric matrix and/or you could explicitly coerce it to one.

HTH, Marc Schwartz
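The 1 1 1 / 2 2 2 / 3 3 3 result discussed in this thread is what one gets when a factor is converted with as.numeric(): the integer level codes come back, not the printed labels. The sketch below reproduces that and shows one way around it. It is an illustration written for this archive, not code from the thread; csv_text stands in for sample.csv, and the variable names are made up.

```r
# Stand-in for sample.csv (same contents as in the original post)
csv_text <- "name,x,y,z
category,delta,gamma,epsilon
a,1,2,3
b,4,5,6
c,7,8,9"

# Why the level codes appear: the "delta" row turns column x into a factor
f <- factor(c("delta", "1", "4", "7"))
codes  <- as.numeric(f[-1])                 # level codes 1 2 3
labels <- as.numeric(as.character(f[-1]))   # the printed values 1 4 7

# Reading the columns as character avoids factors altogether
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
num <- sapply(raw[-1, -1], as.numeric)      # drop annotation row and name column
rownames(num) <- raw$name[-1]
num
```

Keeping the annotation row in raw means the header section can still be pulled out with raw[1, ], which is what Andrew wanted to do from a single import.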
Re: [R] Testing for existence inside a function [Broadcast]
Not sure which one you want, but the following should cover it:

R> f <- function(x) c(x=missing(x), y=exists("y"))
R> f(1)
    x     y
FALSE FALSE
R> f()
    x     y
 TRUE FALSE
R> y <- 1
R> f()
   x    y
TRUE TRUE
R> f(1)
    x     y
FALSE  TRUE

Andy

From: Talbot Katz

Hi. I'm having trouble testing for existence of an object inside a function. Suppose I have a function:

f <- function(x){ ... }

and I call it with argument y:

f(y)

I'd like to check inside the function whether argument y exists. Is this possible, or do I have to either check outside the function or pass the name of the argument as a separate argument? If I do exists(x) or exists(eval(x)) inside the function and y does not exist, it generates an error message. If I do exists("x") it says that x exists even if y does not. If I had a separate argument to hold the text string "y" then I could check that. But is it possible to check the existence of the argument inside the function without passing its name as a separate argument?

Thanks!

-- TMK -- 212-460-5430 home 917-656-5351 cell
Re: [R] Testing for existence inside a function
Just need a bit more work:

R> f <- function(x) exists(deparse(substitute(x)))
R> f(y)
[1] FALSE
R> y <- 1
R> f(y)
[1] TRUE
R> f(z)
[1] FALSE

Andy

From: Talbot Katz

Hi, Andy. Thank you for the quick response! Unfortunately, none of these are exactly what I'm looking for. I'm looking for the following: Suppose object y exists and object z does not exist. If I pass y as the value of the argument to my function, I want to be able to verify, inside my function, the existence of y; similarly, if I pass z as the value of the argument, I want to be able to see, inside the function, that z doesn't exist.

The missing function just checks whether the argument is missing; in my case, the argument is not missing, but the object may not exist. And the way you use the exists function inside the user-defined function doesn't test the argument to the user-defined function; it's just hard-coded for the object y. So I'm sorry if I wasn't clear before, and I hope this is clear now. Perhaps what I'm attempting to do is unavailable because it's a bad programming paradigm. But even an explanation, if that's the case, would be appreciated.

-- TMK -- 212-460-5430 home 917-656-5351 cell
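The three behaviours described in this exchange can be put side by side. This is just a sketch to illustrate the discussion; the function names f_quoted, f_forced and f_name, and the object name no_such_obj_xyz, are invented for the example and assumed not to exist in the session.

```r
# exists("x") looks up the name "x", which is always bound inside the
# function because x is its formal argument
f_quoted <- function(x) exists("x")

# exists(x) has to evaluate x first, so a missing object triggers an error
f_forced <- function(x) exists(x)

# deparse(substitute(x)) recovers the text the caller typed, without
# evaluating it
f_name <- function(x) exists(deparse(substitute(x)))

r_quoted <- f_quoted(no_such_obj_xyz)
r_forced <- tryCatch(f_forced(no_such_obj_xyz), error = function(e) "error")
r_name   <- f_name(no_such_obj_xyz)

print(list(quoted = r_quoted, forced = r_forced, name = r_name))
```

Because R evaluates arguments lazily, passing a non-existent object is harmless until something inside the function touches it, which is why only the second variant fails.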
Re: [R] Testing for existence inside a function
Another thing to watch out for is that an argument to a function can be an expression (or even a literal constant), instead of just the name of an object. exists() wouldn't really do the right thing. I'm not sure how to properly do the exhaustive check.

Andy

From: Gabor Grothendieck

Try this modification:

> chk <- function(x) exists(deparse(substitute(x)), parent.env(environment()))
> ab <- 1
> chk(ab)
[1] TRUE
> exists("x")
[1] FALSE
> chk(x)
[1] FALSE

On 5/15/07, Talbot Katz [EMAIL PROTECTED] wrote:

Hi. Thanks once more for the swift response. This solution works pretty well. The only small glitch is if I pass the function an argument with the same name as the function argument. That is, suppose x is the argument name in my user-defined function, and that object x does not exist. If I call the function f(x), i.e., using the non-existent object x as the argument value, then the function says that x exists. Here is my example log:

> chkex5 <- function(objn){
+ c(exob=exists(deparse(substitute(objn))))
+ }
> exists("objn")
[1] FALSE
> chkex5(objn)
exob
TRUE

But I suppose I can live with this. Thanks again!

-- TMK -- 212-460-5430 home 917-656-5351 cell
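The two loose ends in this thread (the name clash with the formal argument, and arguments that are expressions rather than names) could be handled along the following lines. This is only a sketch, not something proposed in the thread: it looks the name up from parent.frame(), i.e. the caller's environment, which is a swap for the parent.env(environment()) used above, and it returns NA for anything that is not a bare name. The function name obj_exists is hypothetical, and an object called obj is assumed not to exist in the session.

```r
obj_exists <- function(obj) {
  expr <- substitute(obj)
  # An expression such as 1 + 2 or a literal such as 3 is not an object
  # name, so there is nothing to look up
  if (!is.name(expr)) return(NA)
  # Start the search in the caller's frame so that the formal argument
  # `obj` of this function is never found by accident
  exists(as.character(expr), envir = parent.frame())
}

ab <- 1
r_existing <- obj_exists(ab)      # a name bound in the calling environment
r_clash    <- obj_exists(obj)     # same name as the formal argument
r_expr     <- obj_exists(1 + 2)   # an expression, not a name
r_literal  <- obj_exists(3)       # a literal constant

print(c(existing = r_existing, clash = r_clash,
        expr = r_expr, literal = r_literal))
```

Whether NA is the right answer for expressions depends on what the caller wants; evaluating the argument inside try() would be another option when the goal is simply to know whether it can be computed.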
Re: [R] Optimized File Reading with R
If it's a matrix, use scan(). If the columns are not all the same type, use the colClasses argument to read.table() to specify their types, instead of leaving it to R to guess. That will speed things up quite a lot.

Andy

From: Lorenzo Isella

Dear All, Hope I am not bumping into a FAQ, but so far my online search has been fruitless. I need to read some data file using R. I am using the (I think) standard command:

data_150 <- read.table("y_complete06000", header=FALSE)

where y_complete06000 is a 6000 by 40 table of numbers. I am puzzled at the fact that R is taking several minutes to read this file. First I thought it may have been due to its shape, but even re-expressing and saving the matrix as a 1D array does not help. It is not a small file, but not huge either (it amounts to about 5Mb of text file). Is there anything I can do to speed up the file reading?

Many thanks, Lorenzo
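Both suggestions can be tried on a throwaway file. The sketch below is an illustration only: it writes a smaller simulated table (600 by 40 rather than 6000 by 40) to a temporary file as a stand-in for y_complete06000, and the nrows hint passed to read.table() is an extra that was not mentioned in the reply.

```r
# Simulated stand-in for the data file
set.seed(1)
m <- matrix(rnorm(600 * 40), nrow = 600)
tmp <- tempfile()
write.table(m, tmp, row.names = FALSE, col.names = FALSE)

# Option 1: scan() reads everything as one numeric vector; reshape by row
via_scan <- matrix(scan(tmp, quiet = TRUE), ncol = 40, byrow = TRUE)

# Option 2: read.table() with the column types given up front
via_table <- read.table(tmp, header = FALSE,
                        colClasses = rep("numeric", 40),
                        nrows = 600)

unlink(tmp)
dim(via_scan)
dim(via_table)
```

Wrapping either call in system.time() on the real file would show how much the type guessing was costing.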
Re: [R] geeting name of an object to which a variable refers?
Something like this?

R> f <- function(x) deparse(substitute(x))
R> a <- 1:3
R> f(a)
[1] "a"

Andy

From: new ruser

#Sorry for the convoluted subject line.
#I have:
a=c(1,2,3)
x=a #example of user supplied input
#Is there any function that will tell me the name of the object x refers to, referring only to x itself?
#i.e. the answer I want is "a"
#I want:
#fun(x) == 'a'
#(I don't think this is possible, but figured I'd ask.)
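Note that deparse(substitute(x)) reports the text of the argument as typed in the call, so it does not trace an assignment such as x=a back to a. A small illustration, using the function name fun from the question:

```r
fun <- function(x) deparse(substitute(x))

a <- c(1, 2, 3)
x <- a          # x now holds a copy of the values; no link back to `a` is kept

name_a <- fun(a)   # "a"
name_x <- fun(x)   # "x", not "a"

print(c(name_a, name_x))
```

So this answers "what did the caller type?" rather than "where did these values come from?", which matches the original poster's suspicion that the latter is not available.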
Re: [R] Allocating shelf space
I don't know if there's an R solution, but this sounds to me like some variation of the knapsack problem... http://en.wikipedia.org/wiki/Knapsack_problem Andy From: [EMAIL PROTECTED] Hi Folks, This is not an R question as such, though it may well have an R answer. (And, in any case, this community probably knows more about most things than most others ... indeed, has probably pondered this very question). I: Given a catalogue of hundreds of books, where each entry has author and title (or equivalent ID), and also Ia) The dimensions (thickness, height, depth) of the book Ib) A sort of classification of its subject/type/genre II: Given also a specification of available and possibly potential bookshelf space (numbers of book-cases, the width, height and shelf-spacing of each, and the dimensions of any free wall-space where further book-cases may be placed), where some book-cases have fixed shelves and some have shelves with (discretely) adjustable position, and additional book-cases can be designed to measure (probably with adjustable shelves). Question: Is there a resource to approach the solution of the problem of optimising the placement of adjustable shelves, the design of additional bookcases, and the placement of the books in the resulting shelf-space so as to A: Make the efficient use of space B: Minimise the spatial disclocation of related books (it is acceptable to separate large books from small books on the same subject, for the sake of efficient packing). Awaiting comments and suggestions with interest! With thanks, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 09-May-07 Time: 18:23:53 -- XFMail -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] summing values according to a factor
Howdy! I guess what you want to do is compare Q1/T1 among the sections? If you want to compute the sum of Q1/T1 by Section, you can do something like:

sum.by.section <- with(mydata, tapply(Q1/T1, Section, sum))

(Note that R is case-sensitive, so the grouping variable must be spelled Section, as in your data.) Substitute sum with anything you want to compute.

Cheers, Andy

From: Salvatore Enrico Indiogine

Greetings! I have exam scores of students in several sections. The data looks like:

StuNum Section Q1 T1
111    502     45 123
112    502     23 123
113    503     58 123
114    504     63 123
115    504     83 123
..

where Q1 is the score for question 1 and T1 is the maximum possible score for question 1. I need to check whether the section has an effect on the scores. I thought about using chisq.test and calculating the sums of scores per section. I think that I have to use apply() but I am lost here.

Thanks in advance, Enrico

-- Enrico Indiogine Mathematics Education Texas A&M University [EMAIL PROTECTED]
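Applied to the five rows shown in the question, the tapply() call looks like this. Only those five rows are used, so the numbers are illustrative; mean.by.section is an extra added here to show swapping in a different summary function.

```r
# The five rows quoted in the question
mydata <- data.frame(
  StuNum  = 111:115,
  Section = c(502, 502, 503, 504, 504),
  Q1      = c(45, 23, 58, 63, 83),
  T1      = 123
)

# Proportion of the maximum score, summed and averaged within each section
sum.by.section  <- with(mydata, tapply(Q1 / T1, Section, sum))
mean.by.section <- with(mydata, tapply(Q1 / T1, Section, mean))

sum.by.section
mean.by.section
```

The result is a named vector with one entry per section, which can be passed on to whatever comparison is eventually chosen.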
Re: [R] R package development in windows
I guess it depends on what you want to be able to do with such a private package; e.g., does it not need to have any documentation (i.e., the Rd files)? If all you want is to be able to access the objects, you can just save() all those objects (mostly functions, I presume) in a .rda file, and whenever you need them, just attach() the .rda file.

Andy

From: Lucke, Joseph F

Might there be a (semi-)automated procedure to create a minimal, personal package, for my eyes only, that I can load with a library(MyStuff) command? This would be preferable to having to source() the files. Is there already such a procedure?

Joe

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Doran, Harold
Sent: Thursday, May 03, 2007 2:33 PM
To: Duncan Murdoch
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] [SPAM] - Re: R package development in windows - Bayesian Filter detected spam

Thanks, Duncan. I'll look into that. Is there an authoritative document that codifies the new package development procedures for 2.5.0 (windows-specific), or is that "Writing R Extensions"? In this thread alone I've received multiple emails pointing to multiple web sites with instructions for windows. Inasmuch as it's appreciated, I'm a bit confused as to which I should consider authoritative.

I do hope I can resolve this and appreciate the help I've received. However, I feel a bit compelled to note how very difficult this process is.

Harold

-----Original Message-----
From: Duncan Murdoch [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 03, 2007 3:24 PM
To: Doran, Harold
Cc: Gabor Grothendieck; r-help@stat.math.ethz.ch
Subject: [SPAM] - Re: [R] R package development in windows - Bayesian Filter detected spam

On 5/3/2007 3:04 PM, Doran, Harold wrote:

Thanks Gabor, Sundar, and Tony. Indeed, Rtools was missing from the path. With that resolved, and another 10 minute windows restart, I get the following below. The log suggests that hhc is not installed.
It is, and, according to the directions I am following, I have placed it in the c:\cygwin directory. I think the problem is that you are following a real mix of instructions, and they don't make sense. It would be nice if folks would submit patches to the R Admin manual (or to the Rtools web site) rather than putting together web sites with advice that is bad from day one, and quickly gets worse when it is not updated. BTW, package.skeleton() doesn't seem to create the correct DESCRIPTION template. I had to add the DEPENDS line. Without this, I get another error. C:\Program Files\R\R-2.4.1\binRcmd build --force --binary g:\foo R 2.4.1 is no longer current; the package building instructions in R 2.5.0 have been simplified a bit. You might want to try those. Duncan Murdoch * checking for file 'g:\foo/DESCRIPTION' ... OK * preparing 'g:\foo': * checking DESCRIPTION meta-information ... OK * removing junk files * checking for LF line-endings in source files * checking for empty or unneeded directories * building binary distribution WARNING * some HTML links may not be found installing R.css in c:/TEMP/Rinst40061099 Using auto-selected zip options '' latex: not found latex: not found latex: not found -- Making package foo latex: not found adding build stamp to DESCRIPTION latex: not found latex: not found latex: not found installing R files latex: not found installing data files latex: not found installing man source files installing indices latex: not found not zipping data installing help Warning: \alias{foo} already in foo-package.Rd -- skipping the one in foo.Rd Building/Updating help pages for package 'foo' Formats: text html latex example chm foo-package texthtmllatex example chm foo texthtmllatex example chm mydatatexthtmllatex example chm hhc: not found cp: cannot stat `c:/TEMP/Rbuild40048815/foo/chm/foo.chm': No such file or direct ory make[1]: *** [chm-foo] Error 1 make: *** [pkg-foo] Error 2 *** Installation of foo failed *** Removing 
'c:/TEMP/Rinst40061099/foo' ERROR * installation failed C:\Program Files\R\R-2.4.1\bin -Original Message- From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] Sent: Thursday, May 03, 2007 2:50 PM To: Doran, Harold Cc: r-help@stat.math.ethz.ch Subject: Re: [R] R package development in windows It can find sh.exe so you haven't installed Rtools. There are several HowTo's listed in the links section here that include pointers to R manuals and other step by step instructions:
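The save()/attach() route suggested at the top of this thread can be sketched as follows. The two helper functions and the search-path label "MyStuff" are made up for illustration, and tempfile() is used only so the sketch runs anywhere; in practice the .rda file would live somewhere permanent.

```r
# Two throwaway helpers standing in for a personal toolbox (hypothetical names)
cv <- function(x) sd(x) / mean(x)
se <- function(x) sd(x) / sqrt(length(x))

# Save them once; a temporary file keeps this sketch self-contained
rda <- tempfile(fileext = ".rda")
save(cv, se, file = rda)
rm(cv, se)

# Later: attach the file so the objects are found via the search path
attach(rda, name = "MyStuff")
res <- se(c(1, 2, 3, 4))
print(res)

# Clean up when done
detach("MyStuff")
```

Unlike a real package this gives no help pages or namespace, which is the trade-off mentioned in the reply.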
Re: [R] R question [Broadcast]
Bill,

A couple more points:

1. Please use an informative subject line. I'd have deleted the original post w/o reading it if I hadn't caught Marc's reply.

2. Are you sure you have a bivariate response? To me bivariate means two variables, and randomForest surely does not handle that (at least for now).

Andy

From: Marc Schwartz

On Fri, 2007-05-04 at 12:05 -0500, Bill Vorias wrote:

I had a question about Random Forests. I have a text file with 10 dichotomous variables and a bivariate response vector. I read this file into R as a data frame, and then used the command "randomForest(Response ~ ., dataset, etc..." where "Response" is the column header of the response variable and "dataset" is the name of the data frame. I get an error that says "Response not found." I was looking at the Iris data example in the R help files, and it seems like this is exactly what they did. Do you have any suggestions? Thanks.

Are you sure that you have correctly specified the column and data frame names in the call to randomForest()? Be sure to check for typos, including capitalization.

You can use:

    ls()

to check for the current objects in your working environment and you can then use:

    str(YourDataFrame)

or

    names(YourDataFrame)

to display information about the detailed structure and/or column names, respectively, in the data frame that you created from the imported data.

HTH,
Marc Schwartz
Re: [R] how to concatinate the elements of some text vectors cat() or print() ?
Is paste() what you're looking for?

Andy

From: John Kane

I have some comment text taken from a SAS data file. It is stored in two vectors and is difficult to read. I would like to simply concatenate the individual entries and end up with a character vector that gives me one line of text per comment.

I cannot see how to do this, yet it must be very easy. I have played around with cat() and print() with no success. Would someone kindly point out where I am going wrong?

Thanks

Simple Example:

    aa <- LETTERS[1:5]
    bb <- letters[1:5]
    cat(aa[1], bb[1])   # works for individuals
    cat(aa, bb)         # (concatenates entire vectors)

    # Using sink I might get it to work if I could figure out how to escape a
    # new line command. encodeString does not seem appropriate here.
    harry <- c(rep(NA, 5))
    for (i in 1:5) {
      cat(aa[i], bb[i])
    }
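To make the paste() hint concrete with the vectors from the question: paste() works element by element and returns a character vector, whereas cat() only prints. A minimal sketch (the variable names comments, joined and one_line are just illustrative):

```r
aa <- LETTERS[1:5]
bb <- letters[1:5]

# Element-wise concatenation: one string per pair, separated by a space
comments <- paste(aa, bb)

# No separator at all
joined <- paste(aa, bb, sep = "")

# Collapse the whole vector into a single string if that is ever needed
one_line <- paste(comments, collapse = "; ")

# One comment per line on the console
writeLines(comments)
```

The result of paste() can be stored directly, so no loop or sink() is needed.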
[R] thousand separator (was RE: weight)
I've run into this occasionally. My current solution is simply to read it into Excel, re-format the offending column(s) by unchecking the "thousand separator" box, and write it back out. Not exactly ideal, to say the least. If anyone can provide a better solution in R, I'm all ears...

Andy

From: Natalie O'Toole

Hi,

These are the variables in my file. I think the variable I'm having problems with is WTPP, which is of the Factor type. Does anyone know how to fix this, please?

Thanks,
Nat

    'data.frame': 290 obs. of 5 variables:
     $ PROV  : num 48 48 48 48 48 48 48 48 48 48 ...
     $ REGION: num 4 4 4 4 4 4 4 4 4 4 ...
     $ GRADE : num 7 7 7 7 7 7 7 7 7 7 ...
     $ Y_Q10A: num 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
     $ WTPP  : Factor w/ 1884 levels "1,106.8250","1,336.5138",..: 1544 67 1568 40 221 1702 1702 1434 310 310 ...

--- Douglas Bates [EMAIL PROTECTED] wrote:

On 4/28/07, John Kane [EMAIL PROTECTED] wrote:

IIRC you have a yes/no smoking variable scored 1/2? It is possibly being read in as a factor, not as an integer. Try class(df$smoking.variable) to see.

Good point. In general I would recommend using str(df) to check on the class or storage type of all variables in a data frame if you are getting unexpected results when manipulating it. That function is carefully written to provide a maximum of information in a minimum of space.

Yes, but I'm a relative newbie at R and didn't realise that str() would do that. I always thought it was some kind of string function. Thanks, it makes life much easier.

--- Natalie O'Toole [EMAIL PROTECTED] wrote:

Hi,

I'm getting an error message:

    Error in df[, 1:4] * df[, 5] : non-numeric argument to binary operator
    In addition: Warning message:
    Incompatible methods ("Ops.data.frame", "Ops.factor") for "*"

Here is my code:

    ##reading in the file
    happyguys <- read.table("c:/test4.dat", header=TRUE, row.names=1)

    ##subset the file based on Select If
    test <- subset(happyguys, PROV==48 & GRADE == 7 & Y_Q10A < 9)

    ##sorting the file
    mydata <- test
    mydataSorted <- mydata[ order(mydata$Y_Q10A), ]
    print(mydataSorted)

    ##assigning a different name to file
    happyguys <- mydataSorted

    ##trying to weight my data
    data.frame <- happyguys
    df <- data.frame
    df1 <- df[, 1:4] * df[, 5]   ##getting error message here??

    Error in df[, 1:4] * df[, 5] : non-numeric argument to binary operator
    In addition: Warning message:
    Incompatible methods ("Ops.data.frame", "Ops.factor") for "*"

Does anyone know what this error message means? I've been reviewing R code all day and getting more familiar with it.

Thanks,
Nat
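A small sketch of why the multiplication in this thread fails, and of a trap to avoid when repairing it. The three weights are made up in the style of the WTPP column; nothing here is taken from the poster's actual file.

```r
# A factor column like WTPP, as read.table() produced it at the time when a
# column contained thousands separators (values below are invented)
wtpp <- factor(c("1,106.8250", "1,336.5138", "987.5000"))

is.numeric(wtpp)   # FALSE: this is why '*' complains about a non-numeric argument

# Trap: as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(wtpp)

# Go through character, strip the commas, then convert
values <- as.numeric(gsub(",", "", as.character(wtpp)))
print(codes)
print(values)
```

Checking str() on the data frame right after import, as recommended in the thread, catches this before any arithmetic is attempted.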
Re: [R] thousand separator (was RE: weight)
Still, though, it would be nice to have the data read in correctly in the first place, instead of having to do this kind of post-processing afterwards...

Andy

From: Bert Gunter

Nothing! My mistake! gsub -- not sub -- is what you want to get 'em all.

-- Bert

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz
Sent: Monday, April 30, 2007 10:18 AM
To: Bert Gunter
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] thousand separator (was RE: weight)

Bert,

What am I missing?

    > print(as.numeric(gsub(",", "", "1,123,456.789")), 10)
    [1] 1123456.789

FWIW, this is using: R version 2.5.0 Patched (2007-04-27 r41355)

Marc

On Mon, 2007-04-30 at 10:13 -0700, Bert Gunter wrote:

Except this doesn't work for "1,123,456.789", Marc. I hesitate to suggest it, but gregexpr() will do it, as it captures the position of **every** match to ",". This could then be used to process the vector via some sort of loop/apply statement. But I think there **must** be a more elegant way using regular expressions alone, so I, too, await a clever reply.

-- Bert

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz
Sent: Monday, April 30, 2007 10:02 AM
To: Liaw, Andy
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] thousand separator (was RE: weight)

One possibility would be to use something like the following post-import:

    > WTPP
    [1] 1,106.8250 1,336.5138
    > str(WTPP)
     Factor w/ 2 levels "1,106.8250","1,336.5138": 1 2
    > as.numeric(gsub(",", "", WTPP))
    [1] 1106.825 1336.514

Essentially strip the ',' characters from the factors and then coerce the resultant character vector to numeric.

HTH,
Marc Schwartz

On Mon, 2007-04-30 at 12:26 -0400, Liaw, Andy wrote:

I've run into this occasionally. My current solution is simply to read it into Excel, re-format the offending column(s) by unchecking the "thousand separator" box, and write it back out.
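The sub() versus gsub() point in this exchange is easy to check. A minimal sketch with two made-up strings: sub() removes only the first comma, so a value with two separators still fails to convert, while gsub() removes them all.

```r
x <- c("1,106.8250", "1,123,456.789")

# sub() replaces only the first match, so the second value keeps a comma
first_only <- suppressWarnings(as.numeric(sub(",", "", x, fixed = TRUE)))

# gsub() replaces every match
all_commas <- as.numeric(gsub(",", "", x, fixed = TRUE))

print(first_only)
print(all_commas, digits = 10)
```

fixed = TRUE is optional here; it just makes explicit that the comma is a literal character and not a regular expression.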
Re: [R] thousand separator (was RE: weight)
Looks very neat, Gabor! I just cannot fathom why anyone would want to write numerics with those separators in a flat file. That's usually not for human consumption, and computers don't need those separators!

Andy

From: Gabor Grothendieck

That could be accomplished using a custom class like this:

    library(methods)
    setClass("num.with.junk")
    setAs("character", "num.with.junk",
          function(from) as.numeric(gsub(",", "", from)))

    ### test ###

    Input <- "A B
    1,000 1
    2,000 2
    3,000 3
    "
    DF <- read.table(textConnection(Input), header = TRUE,
                     colClasses = c("num.with.junk", "numeric"))
    str(DF)

On 4/30/07, Liaw, Andy [EMAIL PROTECTED] wrote:

Still, though, it would be nice to have the data read in correctly in the first place, instead of having to do this kind of post-processing afterwards...

Andy

From: Bert Gunter

Nothing! My mistake! gsub -- not sub -- is what you want to get 'em all.

-- Bert

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz
Sent: Monday, April 30, 2007 10:18 AM
To: Bert Gunter
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] thousand separator (was RE: weight)

Bert,

What am I missing?

    > print(as.numeric(gsub(",", "", "1,123,456.789")), 10)
    [1] 1123456.789

FWIW, this is using: R version 2.5.0 Patched (2007-04-27 r41355)

Marc

On Mon, 2007-04-30 at 10:13 -0700, Bert Gunter wrote:

Except this doesn't work for "1,123,456.789", Marc. I hesitate to suggest it, but gregexpr() will do it, as it captures the position of **every** match to ",". This could then be used to process the vector via some sort of loop/apply statement. But I think there **must** be a more elegant way using regular expressions alone, so I, too, await a clever reply.
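Gabor's colClasses idea carries over to a file shaped like the one in the original question. The following is only a sketch: the class name num.with.commas, the two data rows and the column subset are invented for illustration, and read.table(text = ...) stands in for reading the real file.

```r
library(methods)

# A virtual class used purely as a label for read.table's colClasses
# (the class name is hypothetical)
setClass("num.with.commas")
setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from, fixed = TRUE)))

# Made-up rows in the spirit of the poster's data
input <- "PROV GRADE WTPP
48 7 1,106.8250
48 7 1,336.5138"

DF <- read.table(text = input, header = TRUE,
                 colClasses = c("numeric", "numeric", "num.with.commas"))
str(DF)

# The weighting step that failed in the thread now works on numeric columns
weighted <- DF[, 1:2] * DF$WTPP
print(weighted)
```

The conversion happens during the read, so the column never passes through a factor and no post-processing step is needed.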
Re: [R] NA and NaN randomForest
Hi Clayton,

If you use the formula interface, then it should do what you want:

    R> library(randomForest)
    randomForest 4.5-18
    Type rfNews() to see new features/changes/bug fixes.
    R> iris1 <- iris[-(1:5),]
    R> iris2 <- iris[1:5,]
    R> iris2[1, 3] <- NA
    R> iris2[3, 1] <- NA
    R> iris.rf <- randomForest(Species ~ ., iris1)
    R> predict(iris.rf, iris2[-5])
    [1] <NA>   setosa <NA>   setosa setosa
    Levels: setosa versicolor virginica

The problem, of course, is that the formula interface is not suitable for data with a large number of variables. I'll look into doing the same thing in the default method.

Andy

From: [EMAIL PROTECTED]

Dear R-help,

This is about randomForest's handling of NA and NaNs in test set data. Currently, if the test set data contains an NA or NaN then predict.randomForest will skip that row in the output. I would like to change that behavior to outputting an NA. Can this be done with flags to randomForest? If not, can some sort of wrapper be built to put the NAs back in?

thanks,
Clayton
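On Clayton's second question (a wrapper that puts the NAs back in), one possible shape is sketched below. This is not part of randomForest: predict_keep_na and toy_predict are hypothetical names, and a trivial rule stands in for a fitted forest so the sketch runs without the package. It handles class predictions only, returned as character.

```r
# Hypothetical wrapper: predict on complete rows only, NA for the rest
predict_keep_na <- function(predict_fun, newdata) {
  ok  <- complete.cases(newdata)
  out <- rep(NA_character_, nrow(newdata))
  if (any(ok)) {
    out[ok] <- as.character(predict_fun(newdata[ok, , drop = FALSE]))
  }
  out
}

# Toy stand-in for a model's predict(); NOT a random forest
toy_predict <- function(d) ifelse(d$Petal.Length < 2.5, "setosa", "other")

new <- iris[c(1, 51, 101), 1:4]
new[2, 3] <- NA          # knock out one predictor value in the second row

pred <- predict_keep_na(toy_predict, new)
print(pred)
```

With a real forest one would presumably pass something like function(d) predict(rf, d), where rf is the fitted object; that part is an assumption and has not been run here.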
Re: [R] Getting Confused [Broadcast]
If you are serious about getting useful help, please do try to follow the suggestions in the Posting Guide. You have not told us anything about your OS, the versions of R you tried to install, and exactly what you typed to build/install them. Many Linux distros by default do not install the Fortran part of GCC, so don't be surprised if that's the case for you (if you are trying to do this on some version of Linux).

Andy

From: Steiner, Julien

Hello,

I'm getting confused with my experience of installing R. I had R installed in January without any trouble. (I just had to install gcc 4.1.1.) Now I'd like to install a package which requires tcl/tk. So basically I need to reconfigure and reinstall R right after having installed tcl/tk.

So I installed tcl/tk and ran the process to install R, but I receive this error:

    checking for dummy main to link with Fortran libraries... none
    checking for Fortran name-mangling scheme... configure: error: cannot compile a simple Fortran program
    See `config.log' for more details.

I checked in config.log, and the fact is that there's no Fortran compiler installed. But doesn't gcc already have a Fortran compiler in it?

If somebody could help I would be thankful, especially if somebody has a clue why it worked without any error before and now doesn't.

Thanks a lot,
Julien Steiner
Re: [R] how to convert the lower triangle of a matrix to a symmetricmatrix [Broadcast]
Ranjan and Prof. Fox,

A similar approach can be found in stats:::as.matrix.dist().

Andy

From: John Fox

Dear Ranjan,

If the elements are ordered by rows, then the following should do the trick:

    X <- diag(p)
    X[upper.tri(X, diag=TRUE)] <- elements
    X <- X + t(X) - diag(diag(X))

If they are ordered by columns, substitute lower.tri() for upper.tri().

I hope this helps,
John

John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ranjan Maitra
Sent: Thursday, April 19, 2007 9:28 PM
To: r-help@stat.math.ethz.ch
Subject: [R] how to convert the lower triangle of a matrix to a symmetric matrix

Hi,

I have a vector of p*(p+1)/2 elements, essentially the lower triangle of a symmetric matrix. I was wondering if there is an easy way to make it fill a symmetric matrix. I have to do it several times, hence some efficient approach would be very useful.

Many thanks and best wishes,
Ranjan
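A worked instance of John Fox's recipe for the column-ordered case, wrapped in a small function. The function name sym_from_lower and the test vector are made up for illustration; the only deviation from the posted code is the nrow argument to diag(), added so that p = 1 does not trip over diag()'s special handling of a length-one argument.

```r
# Fill a symmetric p x p matrix from its lower triangle (diagonal included),
# with the elements given column by column. Function name is hypothetical.
sym_from_lower <- function(elements, p) {
  stopifnot(length(elements) == p * (p + 1) / 2)
  X <- matrix(0, p, p)
  X[lower.tri(X, diag = TRUE)] <- elements
  # Adding the transpose doubles the diagonal, so subtract it once
  X + t(X) - diag(diag(X), nrow = p)
}

X <- sym_from_lower(c(1, 2, 3, 4, 5, 6), p = 3)
print(X)
```

If the vector holds the lower triangle row by row instead, fill upper.tri() as in the original message; R's column-major filling makes the two cases mirror images.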
Re: [R] Suggestions for statistical computing course
I really like John Monahan's Numerical Methods of Statistics (Cambridge University Press).

As to running/editing R scripts, you may want to look into JGR. The built-in editor is not as smart as ESS in some respects, but smarter than ESS in others. The only thing that keeps me from using it regularly is the fact that it won't take arguments to R itself (at least on Windows): I need the --internet2 argument to be able to access the net from R.

Andy

From: Giovanni Petris

Dear R-helpers,

I am planning a course on Statistical Computing and Computational Statistics for the Fall semester, aimed at first year Masters students in Statistics. Among the topics that I would like to cover are linear algebra related to least squares calculations, optimization and root-finding, numerical integration, Monte Carlo methods (possibly including MCMC), bootstrap, smoothing and nonparametric density estimation. Needless to say, the software I will be using is R.

1. Does anybody have a suggestion about a book to follow that covers (most of) the topics above at a reasonable level for my audience? Are there any on-line publicly-available manuals, lecture notes, instructional documents that may be useful?

2. I do most of my work in R using Emacs and ESS. That means that I keep a file in an emacs window and I submit it to R one line at a time or one region at a time, making corrections and iterating as needed. When I am done, I just save the file with the last, working, correct (hopefully!) version of my code. Is there a way of doing something like that, or in the same spirit, without using Emacs/ESS? What approach would you use to polish and save your code in this case? For my course I will be working in a Windows environment.

While I am looking for simple and effective solutions that do not require installing emacs in our computer lab, the answer "you should teach your students emacs/ess on top of R" is perfectly acceptable.

Thank you for your consideration, and thank you in advance for the useful replies.

Have a good day,
Giovanni

--
Giovanni Petris [EMAIL PROTECTED]
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/
Re: [R] Help on averaging sets of rows defined by row name
You might want to check which of the following scales better for the size of data you have.

    ## Make up some data to try.
    R> dat <- data.frame(gene=rep(letters[1:3], each=3), s1=runif(9), s2=runif(9))
    R> dat
      gene        s1        s2
    1    a 0.9959172 0.9531052
    2    a 0.2064497 0.4257022
    3    a 0.4791100 0.5977923
    4    b 0.1307096 0.8256453
    5    b 0.7887983 0.8904983
    6    b 0.7841745 0.6901540
    7    c 0.3356583 0.7125086
    8    c 0.5859311 0.0509323
    9    c 0.7681325 0.8677725

    ## Use aggregate():
    R> aggregate(dat[-1], dat[1], mean)
      gene        s1        s2
    1    a 0.5604923 0.6588666
    2    b 0.5678941 0.8020992
    3    c 0.5632407 0.5437378

    ## Do it by hand: need a bit more work if there are NAs.
    R> rowsum(dat[-1], dat[[1]]) / table(dat[[1]])
             s1        s2
    a 0.5604923 0.6588666
    b 0.5678941 0.8020992
    c 0.5632407 0.5437378

Andy

From: Booman, M

Dear all,

This is my problem: I have a table of gene expression data, where the 1st column is the gene name, and the 2nd to 39th columns each are expression data for 38 samples. There are multiple measurements per sample for each gene, so there are multiple rows for each gene name. I want to average these measurements so I end up with one value per sample for each gene name. The output data frame (table.averaged) is further used in another R script.

The code I use now (see below) takes 20 secs for each loop, so it takes 45 minutes to average my files of 13500 unique genes. Can anyone help me do this faster?

Cheers,
marije

Code I use:

    table.imputed[,1] <- as.character(table.imputed[,1])
    # table.imputed is data.frame, 1st column = gene name (class factor),
    # rest of columns = expression data (class numeric)

    genesunique <- unique(table.imputed[,1])   # To make list of unique genes in the set
    table.averaged <- NULL
    for (j in 1:length(genesunique)) {
      if (j %% 100 == 0) {   # To report progress
        cat(j, "genes finished", sep=" ", fill=TRUE)
      }
      # collects all rows of average values and binds them back into one data frame
      table.averaged <- rbind(table.averaged, givemean(genesunique[j], table.imputed))
    }

    givemean <- function (gene, table.imputed) {
      # make a subset containing only the rows for one gene name
      thisgene <- table.imputed[table.imputed[,1]==gene,]
      # calculates average for each sample (column) and outputs one row of
      # average values and the gene name
      data.frame(gene, t(sapply(thisgene[,2:ncol(thisgene)], mean, na.rm=TRUE)))
    }
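The two approaches in the reply can be compared on fixed numbers rather than runif() draws, which makes the result easy to verify by eye. This is a sketch with made-up values; converting to a matrix and to a plain count vector before dividing is a small variation on the posted one-liner, chosen only to keep the arithmetic explicit.

```r
# Small deterministic data set: two measurements per gene, two samples
dat <- data.frame(gene = rep(c("a", "b", "c"), each = 2),
                  s1   = c(1, 3, 2, 4, 10, 20),
                  s2   = c(0, 2, 5, 5, 1, 3))

# Approach 1: aggregate() with a one-column data frame as the grouping list
by_agg <- aggregate(dat[-1], dat[1], mean)

# Approach 2: group sums divided by group sizes (assumes no NAs)
sums      <- rowsum(as.matrix(dat[-1]), dat$gene)
counts    <- as.vector(table(dat$gene))
by_rowsum <- sums / counts

print(by_agg)
print(by_rowsum)
```

Both avoid growing a data frame with rbind() inside a loop, which is the likely reason the original code took 45 minutes.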
Re: [R] Help on averaging sets of rows defined by row name
Do note that I used dat[1] instead of dat[,1] or dat[[1]] as the second argument to aggregate(): if dat is a data frame, then dat[1] is also a data frame, containing only the first column. Since a data frame is also a list, dat[1] is a one-component list, which is what the 'by' argument expects. My guess is that Tierry didn't try his suggestion, or else he would have noticed the error.

Andy

From: Booman, M
Sent: Friday, April 20, 2007 10:26 AM
To: Liaw, Andy; r-help@stat.math.ethz.ch
Subject: RE: [R] Help on averaging sets of rows defined by row name

Thanks for your help everyone! I had some trouble with the 'aggregate' function because 'table.imputed[,1]' was not a list (which the 'by' argument should be), and it took a very, very long time to coerce it into one. But the rowmeans method works almost instantly! And I have no problems with NAs because I used a knn imputer first.

-----Original Message-----
From: Liaw, Andy
Sent: Fri 4/20/2007 4:09 PM
To: Booman, M; r-help@stat.math.ethz.ch
Subject: RE: [R] Help on averaging sets of rows defined by row name

You might want to check which of the following scales better for the size of data you have.

## Make up some data to try.
R> dat <- data.frame(gene=rep(letters[1:3], each=3), s1=runif(9), s2=runif(9))
R> dat
  gene        s1        s2
1    a 0.9959172 0.9531052
2    a 0.2064497 0.4257022
3    a 0.4791100 0.5977923
4    b 0.1307096 0.8256453
5    b 0.7887983 0.8904983
6    b 0.7841745 0.6901540
7    c 0.3356583 0.7125086
8    c 0.5859311 0.0509323
9    c 0.7681325 0.8677725

## Use aggregate():
R> aggregate(dat[-1], dat[1], mean)
  gene        s1        s2
1    a 0.5604923 0.6588666
2    b 0.5678941 0.8020992
3    c 0.5632407 0.5437378

## Do it by hand: need a bit more work if there are NAs.
R> rowsum(dat[-1], dat[[1]]) / table(dat[[1]])
         s1        s2
a 0.5604923 0.6588666
b 0.5678941 0.8020992
c 0.5632407 0.5437378

Andy

From: Booman, M

Dear all,

This is my problem: I have a table of gene expression data, where the 1st column is the gene name and the 2nd to 39th columns each hold expression data for one of 38 samples. There are multiple measurements per sample for each gene, so there are multiple rows for each gene name. I want to average these measurements so I end up with one value per sample for each gene name. The output data frame (table.averaged) is further used in another R script. The code I use now (see below) takes 20 secs for each loop, so it takes 45 minutes to average my files of 13500 unique genes. Can anyone help me do this faster?

Cheers,
marije

Code I use:

# table.imputed is a data.frame: 1st column = gene name (class factor),
# rest of columns = expression data (class numeric)
table.imputed[,1] <- as.character(table.imputed[,1])
genesunique <- unique(table.imputed[,1])   # list of unique genes in the set

givemean <- function (gene, table.imputed) {
  # make a subset containing only the rows for one gene name
  thisgene <- table.imputed[table.imputed[,1]==gene,]
  # calculate the average for each sample (column) and output one row
  # of average values together with the gene name
  data.frame(gene, t(sapply(thisgene[,2:ncol(thisgene)], mean, na.rm=TRUE)))
}

table.averaged <- NULL
for (j in 1:length(genesunique)) {
  if (j%%100 == 0) {   # report progress
    cat(j, "genes finished", sep=" ", fill=TRUE)
  }
  # collect all rows of average values and bind them back into one data frame
  table.averaged <- rbind(table.averaged, givemean(genesunique[j], table.imputed))
}
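The two approaches discussed in this thread can be compared side by side. The following is only a sketch on made-up data (the object names `sums`, `counts`, `avg` and `agg` are illustrative, not from the thread); it converts the table of counts to a plain vector before dividing, which is an assumption made here to keep the arithmetic explicit.

```r
# Average replicate rows per gene; the data are invented for illustration.
set.seed(1)
dat <- data.frame(gene = rep(c("a", "b", "c"), each = 3),
                  s1 = runif(9), s2 = runif(9))

# rowsum() adds up the rows within each gene; dividing by the per-gene
# counts (recycled down each column) turns the sums into means.
sums   <- rowsum(as.matrix(dat[-1]), group = dat$gene)
counts <- as.vector(table(dat$gene)[rownames(sums)])
avg    <- sums / counts

# Cross-check with aggregate(); note that 'by' is the one-column
# data frame dat["gene"], i.e. a list, not the bare vector dat$gene.
agg <- aggregate(dat[-1], by = dat["gene"], FUN = mean)
```

Both results should agree; on a table with thousands of genes, the rowsum() version avoids growing a data frame with rbind() inside a loop.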
Re: [R] How to return more than one variable from function
From: Vincent Goulet

> On Friday 20 April 2007 07:46, Julien Barnier wrote:
>> Hi,
>>
>>> I have written a function which computes variance, sd, r^2, R^2adj etc. But I am not able to return all of them in the return statement.
>>
>> You can return a vector, or a list. For example:
>>
>> func <- function() {
>>   ...
>>   result <- list(variance=3, sd=sqrt(3))
>>   return(result)  # you can omit this
>> }
>
> Nitpicking and for the record: if you omit the return(result) line, the function will return nothing since it ends with an assignment.

Have you actually checked? Counterexample:

R> f <- function(x) y <- 2 * x
R> f(3)
R> (z <- f(3))
[1] 6
R> f2 <- function(x) { y <- 2 * x; y }
R> f2(3)
[1] 6

> Furthermore, explicit use of return() is never needed at the end of a function. The above snippet is correct, but this is enough:
>
> func <- function() {
>   ...
>   result <- list(variance=3, sd=sqrt(3))
>   result
> }
>
> But then, why assign to a variable just to return its value? Better still:
>
> func <- function() {
>   ...
>   list(variance=3, sd=sqrt(3))
> }

Or, as has been suggested, if all values to be returned are of the same type, just use a (named) vector:

func <- function(...) {
  ...
  c(Variance=3, "R-squared"=0.999)
}

Andy

>> a <- func()
>> a$variance
>> a$sd
>>
>> HTH,
>> Julien
>
> --
> Vincent Goulet, Professeur agrégé
> École d'actuariat
> Université Laval, Québec
> http://vgoulet.act.ulaval.ca
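To make the thread's advice concrete, here is a small sketch of a function that returns several statistics at once as a named list. The function name `fit_stats` and the simulated data are assumptions made for illustration; the original poster's function was not shown.

```r
# Return several summary statistics from one function as a named list.
fit_stats <- function(x, y) {
  s <- summary(lm(y ~ x))
  # The last evaluated expression is the return value; no return() needed.
  list(variance      = var(y),
       sd            = sd(y),
       r.squared     = s$r.squared,
       adj.r.squared = s$adj.r.squared)
}

set.seed(42)
x <- 1:20
y <- 2 * x + rnorm(20)
res <- fit_stats(x, y)
res$sd          # individual components are extracted with $
res$r.squared
```

A list is the safer choice when the pieces differ in type or length; a named numeric vector, as suggested above, works when everything is a single number.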
Re: [R] Udpate R under a proxy
This is what I just tried (thanks, Dirk!):

1. Start R and then call Sys.putenv(http_proxy, whatever) and options(download.file.method="wget"). This doesn't work.

2. Open up a command prompt, define http_proxy there, then run Rgui. Set options(download.file.method="wget"). This works.

Perhaps you can define http_proxy in Renviron. I have not tried that.

Andy

From: justin bem

Dear all,

I get internet via a proxy server. When I try to download a package it always fails, even when I add an environment variable for the http proxy server. I use Windows XP SP2.

Sincerely,
Justin BEM
Elève Ingénieur Statisticien Economiste
BP 294 Yaoundé. Tél (00237)9597295.
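The Renviron idea mentioned above was explicitly untested in the thread, so the following is only a sketch of how such a setting is read. It writes a one-line environment file to a temporary location and loads it with readRenviron(); the proxy address is a made-up placeholder, and in real use the line would go into the user's own .Renviron file so that it is set before R starts any download.

```r
# Hypothetical proxy address; replace it with your site's real proxy.
proxy_line <- "http_proxy=http://proxy.example.org:8080/"

# Write the setting to a temporary Renviron-style file and load it.
renv <- tempfile("Renviron")
writeLines(proxy_line, renv)
readRenviron(renv)

# The variable is now visible to this R session.
Sys.getenv("http_proxy")
```

Whether a given download method honours the variable depends on the platform and R version, which is consistent with the mixed results reported above.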
Re: [R] general question about plotting multiple regression results [Broadcast]
I suspect you'll benefit greatly from reading Prof. Fox's book(s) on regression models, as well as from making use of his car package. You may want to read up on partial residual plots and partial regression plots.

Andy

From: Simon Pickett

Hi all,

I have been bumbling around with R for years now and still haven't come up with a solution for plotting reliable graphs of relationships from a linear regression. Here is an example illustrating my problem.

1. I do a linear regression as follows:

summary(lm(n.day13~n.day1+ffemale.yell+fmale.yell+fmale.chroma, data=surv))

which gives some nice sig. results:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.73917    0.43742  -1.690 0.093069 .
n.day1        1.00460    0.05369  18.711  < 2e-16 ***
ffemale.yell  0.22419    0.06251   3.586 0.000449 ***
fmale.yell    0.25874    0.06925   3.736 0.000262 ***
fmale.chroma  0.23525    0.11633   2.022 0.044868 *

2. I want to plot the effect of ffemale.yell, fmale.yell and fmale.chroma on my response variable. So, I either plot the raw values (which is fine when there is a very strong relationship), but what if I want to plot the effects from the model? In this case I would usually plot the fitted values against the raw values of x... Is this the right approach?

fit <- fitted(lm(n.day13~n.day1+ffemale.yell+fmale.yell+fmale.chroma, data=fsurv1))
plot(fit~ffemale.yell)
# make a dummy variable across the range of x
x <- seq(from=min(fsurv1$ffemale.yell), to=max(fsurv1$ffemale.yell), length=100)
# get the coefficients and draw the line
co <- coef(lm(fit~ffemale.yell, data=fsurv1))
y <- (co[2]*x)+co[1]
lines(x, y, lwd=2)

This often does the trick, but for some reason, especially when my model has many terms in it or when one of the independent variables is only significant when the other independent variables are in the equation, it gives me strange lines. Please can someone show me the light?

Thanks in advance,
Simon.

Simon Pickett
PhD student
Centre For Ecology and Conservation
Tremough Campus
University of Exeter in Cornwall
TR109EZ
Tel 01326371852
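As a rough illustration of the partial residual plots recommended above, the sketch below uses only base R and the built-in mtcars data, not the poster's data. The model and variable choice are assumptions for illustration; the car package offers more polished versions of such plots.

```r
# A multiple regression on a built-in data set (illustrative only).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Partial residuals: one column per model term, each equal to the
# ordinary residual plus that term's (centred) contribution to the fit.
pres <- residuals(fit, type = "partial")

# Partial residual plot for 'wt': the scatter shows the relationship
# with wt after adjusting for the other predictors.
plot(mtcars$wt, pres[, "wt"],
     xlab = "wt", ylab = "partial residual for wt")
pline <- lm(pres[, "wt"] ~ mtcars$wt)
abline(pline, lwd = 2)

# The slope of this line equals the wt coefficient of the full model.
slope <- unname(coef(pline)[2])
```

Unlike a line fitted to the fitted values against one predictor, the line here has exactly the slope the model estimated for that predictor, which avoids the strange lines described in the question. In base R, termplot(fit, partial.resid = TRUE) draws similar displays for every term.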
Re: [R] R in cron job: X problems
This is in the FAQ, if I remember correctly... Alternatively: as Jeff Horner recently pointed out on the list, the Cairo package is a good way of generating png without needing an X display. You may want to look into that. I've just installed cairo on our CentOS boxes and the Cairo package from CRAN.

Andy

From: Mark Liberman

I'd like to use an R CMD BATCH script as part of a cron job that is set up to run every hour. The trouble is that the script creates a graphical output in a file via png(), and apparently this in turn works through X. When cron invokes the job, no X server is available -- I suppose that the DISPLAY variable is not set -- and so R exits with an error message in the output file. (If I run the same script in an environment where an X server is properly available, it works as I want it to.)

I tried setting DISPLAY to thecomputername:0.0 (where thecomputername is the X.Y.Z form of the computer's name as used for ssh etc.), but that didn't work.

Any advice about how to persuade the graphics subsystem to bypass X, or how to set DISPLAY in a safe way to run in a cron job? This is a Linux system (a recent RedHat server system) with R 2.2.1.

Thanks,
Mark Liberman
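For readers on a current R, the sketch below shows one way a batch script can write a plot without an X display. It relies on the cairo support built into recent versions of R's own png() device rather than on the Cairo package mentioned above, and that support is much newer than the R 2.2.1 in the question, so treat it as an assumption about your installation. The file names are placeholders.

```r
# Pick a device that does not need an X server. capabilities("cairo")
# reports whether this R build can draw PNGs through cairo.
use_cairo <- capabilities("cairo")
out <- file.path(tempdir(), if (use_cairo) "hourly.png" else "hourly.pdf")

if (use_cairo) {
  png(out, width = 600, height = 400, type = "cairo")
} else {
  pdf(out)   # the pdf() device never needs a display
}
plot(1:10, main = "generated without an X display")
dev.off()
```

A script like this can be run from cron with R CMD BATCH or Rscript; falling back to pdf() keeps the job from failing on builds without cairo.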
Re: [R] installing new packages
See if "2.19 The Internet download functions fail" in the R for Windows FAQ helps.

Andy

From: Bill Shipley

Hello, I have just installed the newest version of R (2.4.1) for Windows XP. I can no longer install new packages. When trying to connect to a server (I have tried several) I get the following message:

> chooseCRANmirror()
Error in open.connection(file, "r") : unable to open connection
In addition: Warning message:
unable to connect to 'cran.r-project.org' on port 80.

Have other people had the same problem with this version, or is it unique to my computer? Can someone suggest a solution?

Thanks.
Bill Shipley
Re: [R] Subsetting list of vectors with list of (boolean) vectors?
From: Marc Schwartz

> On Thu, 2007-04-12 at 18:12 +0200, Johannes Graumann wrote:
>> Dear Rologists,
>>
>> I'm stuck with this. How would you do this efficiently:
>>
>> aPGI
>> [[1]]
>> [1]  864 5576
>>
>> aPGItest
>> [[1]]
>> [1]  TRUE FALSE
>>
>> result <- [magic box involving subset]
>> result
>> [[1]]
>> [1] 864
>>
>> Thanks for any hints,
>> Joh
>
> lapply(seq(along = aPGI), function(x) aPGI[[x]][aPGItest[[x]]])
> [[1]]
> [1] 864

Alternatively:

R> mapply("[", aPGI, aPGItest, SIMPLIFY=FALSE)
[[1]]
[1] 864

Cheers,
Andy

> I think that this should be a generic solution for multiple (but common) levels in each list.
>
> HTH,
> Marc Schwartz
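The mapply() idiom above generalises to lists with several elements. This sketch uses invented data with two list components to show the element-by-element pairing; Map() is shown as an equivalent spelling.

```r
# Two parallel lists: values and logical masks of matching lengths.
aPGI     <- list(c(864, 5576), c(10, 20, 30))
aPGItest <- list(c(TRUE, FALSE), c(FALSE, TRUE, TRUE))

# "[" is called once per pair: aPGI[[i]][aPGItest[[i]]].
res1 <- mapply("[", aPGI, aPGItest, SIMPLIFY = FALSE)

# Map() is mapply() with SIMPLIFY = FALSE, written with an explicit function.
res2 <- Map(function(v, keep) v[keep], aPGI, aPGItest)
```

SIMPLIFY = FALSE matters: without it, mapply() would collapse results of equal length into a vector or matrix instead of returning a list.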
Re: [R] Random Forest Imputations [Broadcast]
Please provide the information the posting guide asks for (version of R, packages used, version of package used, etc). There are no yaImpute() or yai() functions in the randomForest package.

Andy

From: Ricky Jacob
Sent: Wed 4/11/2007 5:55 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Random Forest Imputations [Broadcast]

Dear All,

I am not able to run the random forest with my dataset.

x: 280 records with satellite data (28 columns): B1min, b1max, b1std etc.
y: 280 records with 3 columns: TotBasal Area, Stem density and Volume

yref <- y[1:230,]    # keeping the 1st 230 records as reference records

I want to set the y values to 0 for records 231 to 280.

yimp <- y[231:280,]  # records for which we want to impute the basal area, stem density and volume

mal1 <- yai(x=x, y=yref, method="mahalanobis", k=1, noRefs = TRUE)
# This works fine for mahalanobis, msn, gnn, raw and Euclidean

I want to do a similar thing with random forest, where the 1st 230 records alone should be used for calculating nearest neighbours for records 231 to 280. What needs to be done? I went through the yaImpute document, but all I could do without any error message was to have NN generated using yai() where all 280 records have been used for finding the nearest neighbour.

Regards,
Ricky
Re: [R] Random Forest Imputations [Broadcast] [Broadcast]
The package has a doc/ subdirectory (in the pre-compiled package, or inst/doc in the source package), which contains yaImputePaper.pdf. Page 9 of that document may be of some help to you. This is the first time I've seen this package, so I can't help you much there. It looks like the package authors would like me to add some feature to the randomForest package (which I maintain). I'll look into that.

Andy

From: Ricky Jacob
Sent: Wednesday, April 11, 2007 7:11 AM
To: Liaw, Andy
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Random Forest Imputations [Broadcast] [Broadcast]

I am currently using R version 2.4.1. I am using the yaImpute package for k-NN imputation:
http://forest.moscowfsl.wsu.edu/gems/yaImpute.pdf

In yaImpute, I am using the yai function, which uses randomForest as a method for finding the k nearest neighbours:
http://cran.r-project.org/doc/packages/yaImpute.pdf

With the help of the example given I was able to use the other methods available. The MoscowMtStJoe example from the document is similar to the work I am trying to do. But the y variable needs to be entered in the form of a factor for random forest. What can be done here?

On 4/11/07, Liaw, Andy wrote:
> Please provide the information the posting guide asks for (version of R, packages used, version of package used, etc). There are no yaImpute() or yai() functions in the randomForest package.
> [rest of the earlier message, quoted in full above, snipped]
Re: [R] Reasons to Use R [Broadcast]
From: Douglas Bates

> On 4/10/07, Wensui Liu wrote:
>> Greg,
>> As far as I understand, SAS is probably more efficient at handling large data than S+/R. Do you have any idea why?
>
> SAS originated at a time when large data sets were stored on magnetic tape and the only reasonable way to process them was sequentially. Thus most statistics procedures in SAS act as filters, processing one record at a time and accumulating summary information. In the past SAS performed a least squares fit by accumulating the crossproduct of [X:y] and then using the sweep operator to reduce that matrix. For such an approach the number of observations does not affect the amount of storage required. Adding observations just requires more time.
>
> This works fine (although there are numerical disadvantages to this approach - try mentioning the sweep operator to an expert in numerical linear algebra - you get a blank stare)

For those who stared blankly at the above: the sweep operator is just a fancier version of good old Gaussian elimination...

Andy

> as long as the operations that you wish to perform fit into this model. Making the desired operations fit into the model is the primary reason for the awkwardness in many SAS analyses.
>
> The emphasis in R is on flexibility and the use of good numerical techniques - not on processing large data sets sequentially. The algorithms used in R for most least squares fits generate and analyze the complete model matrix instead of summary quantities. (The algorithms in the biglm package are a compromise that work on horizontal sections of the model matrix.)
>
> If your only criterion for comparison is the ability to work with very large data sets, performing operations that can fit into the filter model used by SAS, then SAS will be a better choice. However, you do lock yourself into a certain set of operations, and you are doing it to save memory, which is a commodity that decreases in price very rapidly.
> As mentioned in other replies, for many years the majority of SAS uses have been for data manipulation rather than for statistical analysis, so the filter model has been modified in later versions.
>
> On 4/10/07, Greg Snow wrote:
>>> -----Original Message-----
>>> From: Bi-Info (http://members.home.nl/bi-info)
>>> Sent: Monday, April 09, 2007 4:23 PM
>>> To: Gabor Grothendieck
>>> Cc: Lorenzo Isella; r-help@stat.math.ethz.ch
>>> Subject: Re: [R] Reasons to Use R
>>>
>>> [snip]
>>>
>>> So what's the big deal about S using files instead of memory like R? I don't get the point. Isn't there enough swap space for S? (Who cares anyway: it works, doesn't it?) Or are there any problems with S and large datasets? I don't get it. You use them, Greg. So you might discuss that issue.
>>>
>>> Wilfred
>>
>> This is my understanding of the issue (not anything official).
>>
>> If you use up all the memory while in R, then the OS will start swapping memory to disk, but the OS does not know what parts of memory correspond to which objects, so it is entirely possible that the chunk swapped to disk contains parts of different data objects, so when you need one of those objects again, everything needs to be swapped back in. This is very inefficient.
>>
>> S-PLUS occasionally runs into the same problem, but since it does some of its own swapping to disk it can be more efficient by swapping single data objects (data frames, etc.). Also, since S-PLUS is already saving everything to disk, it does not actually need to do a full swap; it can just look and see that a particular data frame has not been used for a while, know that it is already saved on the disk, and unload it from memory without having to write it to disk first.
>>
>> The g.data package for R has some of this functionality of keeping data on the disk until needed.
>>
>> The better approach for large data sets is to only have some of the data in memory at a time and to automatically read just the parts that you need. So for big datasets it is recommended to have the actual data stored in a database and use one of the database connection packages to only read in the subset that you need. The SQLiteDF package for R is working on automating this process for R. The bigdata module for S-PLUS and the biglm package for R also have ways of doing some of the common analyses using chunks of data at a time. This idea is not new. There was a program in the late 1970s and 80s called Rummage by Del Scott (I guess technically it still exists, I have a copy on a 5.25-inch floppy somewhere) that used the approach of specifying the model you wanted to fit first, then specifying the data file. Rummage would then figure out which
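The filter-style least squares fit described in this thread (accumulate the crossproducts one block of records at a time, then reduce them) can be sketched in a few lines. This is an illustration on simulated data, not SAS's or biglm's actual algorithm: it solves the normal equations with solve() instead of the sweep operator, and the chunk size of 100 is arbitrary.

```r
set.seed(1)
n <- 1000
X <- cbind(1, rnorm(n), rnorm(n))            # model matrix with intercept
y <- drop(X %*% c(1, 2, -1)) + rnorm(n)

# Accumulate X'X and X'y chunk by chunk; storage stays at p x p and p,
# no matter how many records pass through.
p   <- ncol(X)
XtX <- matrix(0, p, p)
Xty <- numeric(p)
chunks <- split(seq_len(n), ceiling(seq_len(n) / 100))
for (idx in chunks) {
  Xc  <- X[idx, , drop = FALSE]
  XtX <- XtX + crossprod(Xc)                 # adds t(Xc) %*% Xc
  Xty <- Xty + drop(crossprod(Xc, y[idx]))   # adds t(Xc) %*% y[idx]
}

# Reduce the accumulated summaries to coefficient estimates.
beta <- solve(XtX, Xty)
```

On well-conditioned data this matches the QR-based fit that lm() uses on the full model matrix; the numerical disadvantage mentioned above appears when X'X is close to singular, because forming the crossproduct squares the condition number.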
Re: [R] Reasons to Use R [Broadcast]
I've probably been away from SAS for too long... we've recently tried to get SAS on our 64-bit Linux boxes (because SAS on PC is not sufficient for some of my colleagues who need it). I was shocked by the quote for our 28-core Scyld cluster: the annual fee was a few times the total cost of our hardware. We ended up buying a new box with four 3GHz Opterons and 32GB of RAM just so that the fee for SAS on such a box would be more tolerable. It just boggles my mind that the right to use SAS for a year costs about the price of a nice four-bedroom house (near SAS Institute!). I don't understand people who would rather pay that kind of price for the software, instead of spending the money on state-of-the-art hardware and saving more than a bundle.

Just my $0.02...
Andy

From: Jorge Cornejo-Donoso

I have a Dell with 2 Intel XEON 3.0 processors and 2GB of RAM. The problem is the DB size.

-----Original Message-----
From: Gabor Grothendieck
Sent: Monday, April 09, 2007 11:28
To: Jorge Cornejo-Donoso
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Reasons to Use R

Have you tried 64 bit machines with larger memory, or do you mean that you can't use R on your current machines? Also, have you tried S-Plus? Will that work for you? The transition from that to R would be less than from SAS to R.

On 4/9/07, Jorge Cornejo-Donoso wrote:
> The size of the db is an issue with R. We are still using SAS because R can't handle our db, and of course we don't want to sacrifice resolution, because the data collection is expensive (at least in fisheries and oceanography), so I think that R needs to improve the use of big DBs. Now I can only use R for graph preparation and some data analysis, but we can't do the main work in R, and that is really sad.
Re: [R] subset [Broadcast]
From: Thomas Lumley

> On Mon, 26 Mar 2007, Marc Schwartz wrote:
>> Sergio,
>>
>> Please be sure to cc: the list (i.e. Reply to All) with follow-up questions.
>>
>> In this case, you would use %in% with a negation:
>>
>> NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (!var3 %in% 2:3))
>
> Probably a typo: should be !(var3 %in% 2:3) rather than (!var3 %in% 2:3)

I used to think so, but found I didn't need the parens:

R> a <- 1:3; b <- c(1, 3, 5)
R> ! a %in% b
[1] FALSE  TRUE FALSE
R> ! (a %in% b)
[1] FALSE  TRUE FALSE

Andy

> -thomas
>
>> See ?"%in%" for more information.
>>
>> HTH,
>> Marc
>>
>> On Mon, 2007-03-26 at 17:30 +0200, Sergio Della Franca wrote:
>>> Ok, this runs correctly.
>>>
>>> Another question for you: I want to put more than one condition on var3, i.e. I would like to create a subset when:
>>> - var1=0
>>> - var2=0
>>> - var3 is different from 2 and from 3.
>>>
>>> As you suggested, I run this code:
>>>
>>> NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (var3 != 2) & (var3 != 3))
>>>
>>> Is there a method to combine (var3 != 2) & (var3 != 3) in one condition?
>>>
>>> Thank you.
>>> Sergio
>>>
>>> 2007/3/26, Marc Schwartz:
>>>> On Mon, 2007-03-26 at 17:02 +0200, Sergio Della Franca wrote:
>>>>> Dear R-Helpers,
>>>>>
>>>>> I want to make a subset from my data set, using several conditions. I.e., I would like to create a subset when:
>>>>> - var1=0
>>>>> - var2=0
>>>>> - var3 is different from 2.
>>>>>
>>>>> How can I create a subset under these conditions?
>>>>>
>>>>> Thank you in advance.
>>>>> Sergio Della Franca.
>>>>
>>>> See ?subset
>>>>
>>>> Something along the lines of the following should work:
>>>>
>>>> NewDF <- subset(DF, (var1 == 0) & (var2 == 0) & (var3 != 0))
>>>>
>>>> HTH,
>>>> Marc Schwartz
>
> Thomas Lumley
> Assoc. Professor, Biostatistics
> University of Washington, Seattle
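A small self-contained check of the subset() call discussed in this thread, on invented data. The parentheses around the %in% test are optional because R's special %op% operators bind more tightly than the unary !, which is why both spellings above gave the same answer; the data frame below is made up for illustration.

```r
DF <- data.frame(var1 = c(0, 0, 0, 1, 0),
                 var2 = c(0, 0, 0, 0, 1),
                 var3 = c(1, 2, 3, 4, 5))

# Keep rows with var1 == 0, var2 == 0 and var3 neither 2 nor 3.
NewDF <- subset(DF, var1 == 0 & var2 == 0 & !(var3 %in% 2:3))

# Same logical vector with or without the parentheses.
same <- identical(!DF$var3 %in% 2:3, !(DF$var3 %in% 2:3))
```

Writing the parentheses anyway costs nothing and spares the reader from having to recall the precedence table.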
Re: [R] frequency tables and sorting by rowSum
1. This is probably overkill, but works:

> as.data.frame(table(as.data.frame(m)))
  V1 V2 V3 Freq
1  0  0  0    0
2  1  0  0    2
3  0  1  0    3
4  1  1  0    0
5  0  0  1    1
6  1  0  1    0
7  0  1  1    0
8  1  1  1    0

You can easily get rid of 0-frequency rows afterward.

2. Not sure what you want, but guessing something like:

m.sorted <- m[order(rowSums(m)), order(colSums(m))]

Andy

From: Stefan Nachtnebel
Sent: Sat 3/24/2007 8:41 AM
To: r-help@stat.math.ethz.ch
Subject: [R] frequency tables and sorting by rowSum [Broadcast]

Dear list,

I have some trouble generating a frequency table over a number of vectors. Creating these tables over simple numbers is no problem with table():

> table(c(1,1,1,3,4,5))

1 3 4 5
3 1 1 1

but how can I, for example, turn

0 1 0
0 0 1
0 1 0
1 0 0
0 1 0
1 0 0

into

0 0 1   1
1 0 0   2
0 1 0   3

My second problem is sorting the rows and columns of a matrix by the rowSums/colSums. I did it this way, but I think there should be a more efficient way:

sortRowCol <- function(taus) {
  swaprow <- function(rsum) {
    taus[(rowSums(taus)==rsum),]
  }
  for( i in 1:2 ) taus <- sapply(sort(rowSums(taus)), swaprow)
}

Thanks in advance,
Stefan Nachtnebel
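Both suggestions from the reply can be run on the matrix given in the question. The sketch below rebuilds that matrix, counts the distinct rows, drops the zero-frequency combinations, and reorders rows and columns by their sums; the object names `freq` and `m.sorted` follow the reply, the rest is illustrative.

```r
# The six rows from the question.
m <- matrix(c(0, 1, 0,
              0, 0, 1,
              0, 1, 0,
              1, 0, 0,
              0, 1, 0,
              1, 0, 0), ncol = 3, byrow = TRUE)

# Cross-tabulate the columns, then flatten to one row per combination.
freq <- as.data.frame(table(as.data.frame(m)))
freq <- freq[freq$Freq > 0, ]          # drop combinations never observed

# Order rows by row sum and columns by column sum in one indexing step.
m.sorted <- m[order(rowSums(m)), order(colSums(m)), drop = FALSE]
```

Because table() enumerates every combination of levels, this approach grows exponentially with the number of columns, which is presumably why the reply called it overkill.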
Re: [R] Get home directory and simple I/O
From: Gabor Grothendieck

> See:
>
> ?R.home

That's not what Alberto wanted: it gives the location of the R installation, not where the user's home directory is. AFAIK Windows does not set the HOME environment variable by default.

> ?dput
>
> On 3/23/07, Alberto Monteiro wrote:
>> Is there any generic function that gets the home directory? This should return /home/user in Linux and x:/Documents and Settings/user (or whatever) in Windows XP.
>>
>> Another (unrelated) question: what is the _simplest_ way to read and write R variables to/from files such that they are stored in a human-readable but R-like form? For example, if (say) x is a vector defined as x <- c(1, 2, 3), can I write (and read) x as a file with just one line, namely:
>>
>> c(1, 2, 3)
>>
>> ?
>>
>> Alberto Monteiro
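Both questions in this thread can be tried in a few lines. The sketch below uses path.expand("~"), which is not mentioned in the thread and is offered here as one commonly used way to ask R what it regards as the user's home directory, together with the dput()/dget() pair that the reply pointed to. The temporary file name is arbitrary.

```r
# What R treats as the user's home directory (platform dependent).
home <- path.expand("~")

# Write a vector in R-readable text form, then read it back.
x <- c(1, 2, 3)
f <- tempfile(fileext = ".R")
dput(x, file = f)
txt <- readLines(f)   # the file holds the single line: c(1, 2, 3)
x2  <- dget(f)
```

The file written by dput() is plain R source, so it can be edited by hand and re-read with dget() or pasted into a script.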
Re: [R] objects of class matrix and mode list? [Broadcast]
It may help to (re-)read ?sapply in a bit more detail. Simplification is done only if it's possible, and what "possible" means is defined there.

A list is a vector whose elements can be different objects, but it is a vector nonetheless. Thus a list can have dimensions. E.g.,

R> a <- list(1, 1:2, 3, c("abc", "def"))
R> dim(a) <- c(2, 2)
R> a
     [,1]      [,2]
[1,] 1         3
[2,] Integer,2 Character,2

That can sometimes be extremely useful (not like the example above!).

Andy

From: Stephen Tucker

Hello everyone,

I cannot seem to find information about objects of class "matrix" and mode "list", and how to handle them (apart from flattening the list). I get this type of object from using sapply(). Sorry for the long example, but the code below illustrates how I get this type of object. Is anyone aware of documentation regarding this object?

Thanks very much,
Stephen

= begin example

# I am just making up a fake data set
df <- data.frame(Day=rep(1:3,each=24), Hour=rep(1:24,times=3),
                 Name1=rnorm(24*3), Name2=rnorm(24*3))

# define a function to get a set of descriptive statistics
tmp <- function(x) {
  # this function will accept a data frame
  # and return a 1-row data frame of
  # max value, colname of max, min value, and colname of min
  return(data.frame(maxval=max(apply(x,2,max)),
                    maxloc=names(x)[which.max(apply(x,2,max))],
                    minval=min(apply(x,2,min)),
                    minloc=names(x)[which.min(apply(x,2,min))]))
}

# Now applying function to data:
# (1) split the data table by Day with split()
# (2) apply the tmp function defined above to each data frame from (1)
#     using lapply()
# (3) transpose the final matrix and convert it to a data frame
#     with mixed characters and numbers
#     using as.data.frame(), lapply(), and type.convert()
> final <- as.data.frame(lapply(as.data.frame(t(sapply(split(df[,-c(1:2)],
+                        f=df$Day),tmp))),
+                        type.convert,as.is=TRUE))
Error in type.convert(x, na.strings, as.is, dec) :
  the first argument must be of mode character

I thought sapply() would give me a data frame or matrix, which I would transpose into a character matrix, to which I can apply type.convert() and get the same matrix as what I would get from these two lines (Fold function taken from Gabor's post on R-help a few years ago):

Fold <- function(f, x, L) for(e in L) x <- f(x, e)
final2 <- Fold(rbind,vector(),lapply(split(df[,-c(1:2)],f=df$Day),tmp))
print(c(class(final2),mode(final2)))
[1] "data.frame" "list"

However, by my original method, sapply() gives me a matrix with mode "list":

intermediate1 <- sapply(split(df[,-c(1:2)],f=df$Day),tmp)
print(c(class(intermediate1),mode(intermediate1)))
[1] "matrix" "list"

Transposing, still a matrix with mode "list", not character:

intermediate2 <- t(sapply(split(df[,-c(1:2)],f=df$Day),tmp))
print(c(class(intermediate2),mode(intermediate2)))
[1] "matrix" "list"

Unclassing gives me the same thing...

print(c(class(unclass(intermediate2)),mode(unclass(intermediate2))))
[1] "matrix" "list"
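A minimal sketch of one way to avoid the list-mode matrix altogether: skip sapply()'s simplification, keep the per-group one-row data frames in a list, and bind them with rbind(). The names df and tmp follow the post above; stats_by_day and m are hypothetical names used only for illustration, and the seed is arbitrary.

```r
set.seed(1)
df <- data.frame(Day = rep(1:3, each = 24), Hour = rep(1:24, times = 3),
                 Name1 = rnorm(72), Name2 = rnorm(72))

# One-row data frame of summary statistics for a data frame of numeric columns
tmp <- function(x) {
  col_max <- sapply(x, max)
  col_min <- sapply(x, min)
  data.frame(maxval = max(col_max), maxloc = names(x)[which.max(col_max)],
             minval = min(col_min), minloc = names(x)[which.min(col_min)],
             stringsAsFactors = FALSE)
}

# lapply() keeps each result as a data frame; rbind() stacks them row-wise
pieces <- lapply(split(df[, -(1:2)], df$Day), tmp)
stats_by_day <- do.call(rbind, pieces)

# For comparison: sapply() simplifies to a list that carries a dim attribute
m <- sapply(split(df[, -(1:2)], df$Day), tmp)
```

Here stats_by_day is an ordinary data frame with numeric and character columns, so no type.convert() step is needed, while m is the "matrix of mode list" discussed in the thread.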
Re: [R] how to get lsmeans?
numbers of observations in the different levels of the factors that are held constant). The obstacle to computing either least-squares means or effect displays in R via predict() is that predict() wants factors in the new data to be set to particular levels. The effect() function in the effects package bypasses predict() and works directly with the model matrix, averaging over the columns that pertain to a factor (and reconstructing interactions as necessary). As mentioned, this has the effect of setting the factor to its proportional distribution in the data. This approach also has the advantage of being invariant with respect to the choice of contrasts for a factor. The only convenient way that I can think of to implement least-squares means in R would be to use deviation-coded regressors for a factor (that is, contr.sum) and then to set the columns of the model matrix for the factor(s) to be averaged over to 0. It may just be that I'm having a failure of imagination and that there's a better way to proceed. I've not implemented this solution because it is dependent upon the choice of contrasts and because I don't see a general advantage to it, but since the issue has come up several times now, maybe I should take a crack at it. Remember that I want this to work more generally, not just for levels of factors, and not just for linear models. Brian is quite right in mentioning that he suggested some time ago that I use critical values of t rather than of the standard normal distribution for producing confidence intervals, and I agree that it makes sense to do so in models in which the dispersion is estimated. My only excuse for not yet doing this is that I want to undertake a more general revision of the effects package, and haven't had time to do it. There are several changes that I'd like to make to the package. 
For example, I have results for multinomial and proportional odds logit models (described in a paper by me and Bob Andersen in the 2006 issue of Sociological Methodology) that I want to incorporate, and I'd like to improve the appearance of the default graphs. But Brian's suggestion is very straightforward, and I guess that I shouldn't wait to implement it; I'll do so very soon.

Regards,
John

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Prof Brian Ripley
Sent: Wednesday, March 21, 2007 12:03 PM
To: Chuck Cleland
Cc: r-help
Subject: Re: [R] how to get lsmeans?

On Wed, 21 Mar 2007, Chuck Cleland wrote:

Liaw, Andy wrote:

I verified the result from the following with output from JMP 6 on the same data (don't have SAS: don't need it):

set.seed(631)
n <- 100
dat <- data.frame(y=rnorm(n), A=factor(sample(1:2, n, replace=TRUE)),
                  B=factor(sample(1:2, n, replace=TRUE)),
                  C=factor(sample(1:2, n, replace=TRUE)),
                  d=rnorm(n))
fm <- lm(y ~ A + B + C + d, dat)
## Form a data frame of points to predict: all combinations of the
## three factors and the mean of the covariate.
p <- data.frame(expand.grid(A=1:2, B=1:2, C=1:2))
p[] <- lapply(p, factor)
p <- cbind(p, d=mean(dat$d))
p <- cbind(yhat=predict(fm, p), p)
## lsmeans for the three factors:
with(p, tapply(yhat, A, mean))
with(p, tapply(yhat, B, mean))
with(p, tapply(yhat, C, mean))

Using Andy's example data, these are the LSMEANS and intervals I get from SAS:

A    y LSMEAN    95% Confidence Limits
1   -0.071847   -0.387507    0.243813
2   -0.029621   -0.342358    0.283117

B    y LSMEAN    95% Confidence Limits
1   -0.104859   -0.397935    0.188216
2    0.003391   -0.333476    0.340258

C    y LSMEAN    95% Confidence Limits
1   -0.084679   -0.392343    0.222986
2   -0.016789   -0.336374    0.302795

One way of reproducing the LSMEANS and intervals from SAS using predict() seems to be the following:

dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
newdat <- expand.grid(A=factor(c(1,2)),B=1.5,C=1.5,d=mean(dat$d))
cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
  A   B   C          d         fit        lwr       upr
1 1 1.5 1.5 0.09838595 -0.07184709 -0.3875070 0.2438128
2 2 1.5 1.5 0.09838595 -0.02962086 -0.3423582 0.2831165
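The deviation-coding idea John Fox describes above (contr.sum regressors, with the model-matrix columns of the factors being averaged over set to 0) can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the effects package: the data follow Andy's example, the names grid, X, lsm_A and se_A are made up, and the column names "B1" and "C1" are those R assigns under contr.sum for two-level factors. Because sample() has changed since 2007, the numbers will not match the SAS output quoted in the thread.

```r
set.seed(631)
n <- 100
dat <- data.frame(y = rnorm(n),
                  A = factor(sample(1:2, n, replace = TRUE)),
                  B = factor(sample(1:2, n, replace = TRUE)),
                  C = factor(sample(1:2, n, replace = TRUE)),
                  d = rnorm(n))

# Deviation (sum-to-zero) coding for every factor
fm <- lm(y ~ A + B + C + d, data = dat,
         contrasts = list(A = "contr.sum", B = "contr.sum", C = "contr.sum"))

# One model-matrix row per level of A, covariate held at its mean
grid <- data.frame(A = factor(1:2),
                   B = factor(1, levels = 1:2),
                   C = factor(1, levels = 1:2),
                   d = mean(dat$d))
X <- model.matrix(delete.response(terms(fm)), grid,
                  contrasts.arg = fm$contrasts)

# Zeroing the deviation-coded columns averages over the levels of B and C
X[, c("B1", "C1")] <- 0

lsm_A <- drop(X %*% coef(fm))                 # least-squares means for A
se_A  <- sqrt(diag(X %*% vcov(fm) %*% t(X)))  # their standard errors
```

Under these assumptions lsm_A should agree with averaging predict() over all factor combinations, as in Andy's tapply() approach; se_A with a t quantile on df.residual(fm) degrees of freedom would give confidence limits.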
Re: [R] how to get lsmeans?
I verified the result from the following with output from JMP 6 on the same data (don't have SAS: don't need it):

set.seed(631)
n <- 100
dat <- data.frame(y=rnorm(n), A=factor(sample(1:2, n, replace=TRUE)),
                  B=factor(sample(1:2, n, replace=TRUE)),
                  C=factor(sample(1:2, n, replace=TRUE)),
                  d=rnorm(n))
fm <- lm(y ~ A + B + C + d, dat)
## Form a data frame of points to predict: all combinations of the
## three factors and the mean of the covariate.
p <- data.frame(expand.grid(A=1:2, B=1:2, C=1:2))
p[] <- lapply(p, factor)
p <- cbind(p, d=mean(dat$d))
p <- cbind(yhat=predict(fm, p), p)
## lsmeans for the three factors:
with(p, tapply(yhat, A, mean))
with(p, tapply(yhat, B, mean))
with(p, tapply(yhat, C, mean))

Andy

From: Xingwang Ye

Dear all,

I searched the mailing list about this topic and learned that no simple way is available to get lsmeans in R as in SAS. Dr. John Fox and Dr. Frank E Harrell have given very useful information about the lsmeans topic. Dr. Frank E Harrell suggests not thinking about lsmeans, but rather thinking about which predicted values are wanted and using the predict function. However, after reading the R help file for a whole day, I am still unclear how to do it. Could someone give me a hand?

For example: A, B and C are binomial variables (factors); d is a continuous variable; the response variable Y is a continuous variable too. To get lsmeans of Y according to A, B and C, respectively, in SAS, I tried

proc glm data=a;
  class A B C;
  model Y=A B C d;
  lsmeans A B C/cl;
run;

In R, I tried this:

library(Design)
ddist<-datadist(a)
options(datadist="ddist")
f<-ols(Y~A+B+C+D,data=a,x=TRUE,y=TRUE,se.fit=TRUE)

Then how do I get the lsmeans for A, B, and C, respectively, with the predict function?
Best wishes
yours, sincerely
Xingwang Ye
PhD candidate
Research Group of Nutrition Related Cancers and Other Chronic Diseases
Institute for Nutritional Sciences, Shanghai Institutes of Biological Sciences, Chinese Academy of Sciences
P.O.Box 32, 294 Taiyuan Road, Shanghai 200031, P.R.CHINA
Re: [R] package:AlgDesign and .Random.seed [Broadcast]
From: Michael Kubovy

On Mar 21, 2007, at 4:16 AM, Uwe Ligges wrote:

Michael Kubovy wrote:

Dear r-helpers,

Could you please help me solve the following problem: When I run

require(AlgDesign)
trt <- LETTERS[1:5]
blk <- 10
trtblk <- 3
BIB <- optBlock(~., withinData = trt, blocksizes = rep(trtblk, blk))

In response to the last command, R complains:

Error in optBlock(~., withinData = trt, blocksizes = rep(trtblk, blk)) :
  object ".Random.seed" not found

The documentation of optBlock() in AlgDesign doesn't say that I needed to set .Random.seed. I thought it was initiated automatically at the beginning of a session. What am I missing?

The first line in that function is

seed <- .Random.seed

but .Random.seed is generated at the first use of R's RNG, hence maybe later. This means the function contains a bug which you should report to the package maintainer, please.

Best,
Uwe Ligges

Bob Wheeler's response:

From: Bob Wheeler [EMAIL PROTECTED]
Date: March 21, 2007 9:19:29 AM EDT
To: Michael Kubovy [EMAIL PROTECTED]
Subject: Re:

Each workspace in R requires you to set a random seed to start. You have not done this. It is an R artifact, and has nothing to do with AlgDesign.

I do not agree with that assessment (well, it's just my $0.02 anyway). I don't need a random seed unless I'm doing computations that require pseudo-random numbers. There are plenty of times I use R without needing a random seed. None of the built-in RNGs in R requires explicit seed setting, nor does any of the ones in the contributed packages that I know of. Thus I would claim that's a flaw in AlgDesign.

Andy

--
Bob Wheeler --- http://www.bobwheeler.com/
ECHIP, Inc. ---
Randomness comes in bunches.
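As a rough illustration of the point made in this thread (that .Random.seed only exists once the RNG has been used), a function that wants to save the RNG state could guard against its absence along these lines. The helper name current_seed is hypothetical and this is not AlgDesign's code, only a sketch of a defensive pattern.

```r
# Hypothetical helper: return the current RNG state, creating it if needed
current_seed <- function() {
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    runif(1)  # any call to the RNG initialises .Random.seed
  }
  get(".Random.seed", envir = globalenv(), inherits = FALSE)
}

# Simulate a fresh session in which no random numbers have been drawn yet
if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
  rm(list = ".Random.seed", envir = globalenv())
}

seed <- current_seed()  # works even though no seed was set explicitly
```

On the user side, the workaround implied by the thread is simply to draw a random number, or call set.seed(), before calling a function that reads .Random.seed directly.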
Re: [R] Select the last two rows by id group
Something like the following should work:

last.n <- function(x, n) {
    last <- nrow(x)
    x[(last - n + 1):last, , drop=FALSE]
}

## Example: get the last two rows.
do.call(rbind, lapply(split(score, score$id), last.n, 2))

You might want to add a check in last.n() to make sure that there are at least n rows to extract.

Andy

From: Lauri Nikkinen

Hi R-users,

Following this post http://tolstoy.newcastle.edu.au/R/help/06/06/28965.html , how do I get the last two rows (or six or ten) by id group out of the data frame? The example there gives just the last row.

Sincere thanks,
Lauri
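One way to get the check Andy mentions without writing it by hand is to lean on tail(), which simply returns all rows when a group has fewer than n. The sketch below uses a made-up score data frame (the original post does not show the data), so the column names are assumptions.

```r
# Made-up example data: group 3 has only one row
score <- data.frame(id    = c(1, 1, 1, 2, 2, 3),
                    value = c(10, 12, 15, 7, 9, 4))

# Last two rows per id; tail() copes with groups shorter than n
last_two <- do.call(rbind, lapply(split(score, score$id), tail, n = 2))
```

Changing n = 2 to 6 or 10 gives the other cases asked about.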
Re: [R] Bad points in regression [Broadcast]
(My turn on the soapbox ...)

I'd like to add a bit of a caveat to Bert's view. I'd argue (perhaps even plead) that robust/resistant procedures be used with care. They should not be used as a shortcut to avoid careful analysis of data. I recall that in my first course on regression, the professor made it clear that we're fitting models to data, not the other way around. When the model fits (some of) the data badly, do examine and think carefully about what happened. Verify that bad data are indeed bad, instead of using statistical criteria to make that judgment. A scientific colleague reminded me of this point when I tried to sell him the idea of robust/resistant methods: Don't use these methods as a crutch to stand on badly run experiments (or poorly fitted models).

Cheers,
Andy

From: Bert Gunter

(mount soapbox...)

While I know the prior discussion represents common practice, I would argue -- perhaps even plead -- that the modern(?? 30 years old now) alternative of robust/resistant estimation be used, especially in the readily available situation of least-squares regression. RSiteSearch("Robust") will bring up numerous possibilities. rrcov and robustbase are at least two packages devoted to this, but the functionality is available in many others (e.g. rlm() in MASS).

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ted Harding
Sent: Friday, March 16, 2007 6:44 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Bad points in regression

On 16-Mar-07 12:41:50, Alberto Monteiro wrote:

Ted Harding wrote:

alpha <- 0.3
beta <- 0.4
sigma <- 0.5
err <- rnorm(100)
err[15] <- 5; err[25] <- -4; err[50] <- 10
x <- 1:100
y <- alpha + beta * x + sigma * err
ll <- lm(y ~ x)
plot(ll)

ll is the output of a linear model fitted by lm(), and so has several components (see ?lm in the section "Value"), one of which is "residuals" (which can be abbreviated to "res"). So, in the case of your example,

which(abs(ll$res)>2)
15 25 50

extracts the information you want (and the "2" was inspired by looking at the residuals plot from your "plot(ll)").

Ok, but how can I grab those points _in general_? What is the criterion that plot used to mark those points as bad points?

Ahh ... ! I see what you're after. OK, look at the plot method for lm():

?plot.lm

## S3 method for class 'lm':
plot(x, which = 1:4,
     caption = c("Residuals vs Fitted", "Normal Q-Q plot",
                 "Scale-Location plot", "Cook's distance plot"),
     panel = points,
     sub.caption = deparse(x$call), main = "",
     ask = prod(par("mfcol")) < length(which) && dev.interactive(),
     ...,
     id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75)

where (see further down):

id.n: number of points to be labelled in each plot, starting with the most extreme.

and note, in the default parameter-values listing above: id.n = 3

Hence, the 3 most extreme points (according to the criterion being plotted in each plot) are marked in each plot. So, for instance, try

plot(ll,id.n=5)

and you will get points 10,15,25,28,50. And so on. But that pre-supposes that you know how many points are exceptional.

What is meant by "extreme" is not stated in the help page ?plot.lm, but can be identified by inspecting the code for plot.lm(), which you can see by entering

plot.lm

In your example, if you omit the line which assigns anomalous values to err[15], err[25] and err[50], then you are likely to observe that different points get identified on different plots. For instance, I just got the following results for the default id.n=3:

[1] Residuals vs Fitted:       41,53,59
[2] Standardised Residuals:    41,53,59
[3] sqrt(Stand Res) vs Fitted: 41,53,59
[4] Cook's Distance:           59,96,97

There are several approaches (with somewhat different outcomes) to identifying "outliers". If you apply one of these, you will probably get the identities of the points anyway.

Again in the context of your example (where in fact you deliberately set 3 points to have exceptional errors, thus coincidentally the same as the default value 3 of id.n), you could try different values for id.n and inspect the graphs to see whether a given value of id.n marks some points that do not look exceptional relative to the mass of the other points. So, the above plot(ll,id.n=5) gave me one point, "10" on the residuals plot, which apparently belonged to the general distribution of residuals.

Hoping this helps,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 16-Mar-07 Time: 13:43:54
-- XFMail
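To make the "in general" part of the question concrete, here is a small sketch that pulls out candidate points programmatically rather than reading them off the plots. It reuses the simulated example from the thread; the seed, the cut-off of 2 for standardized residuals and the 4/n rule of thumb for Cook's distance are assumptions chosen for illustration, not criteria used by plot.lm(). Andy's caveat applies: a flagged point still has to be examined, not just deleted.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible
alpha <- 0.3; beta <- 0.4; sigma <- 0.5
err <- rnorm(100)
err[15] <- 5; err[25] <- -4; err[50] <- 10
x <- 1:100
y <- alpha + beta * x + sigma * err
ll <- lm(y ~ x)

# Standardized residuals beyond an (assumed) cut-off of 2
std_res <- rstandard(ll)
flagged <- which(abs(std_res) > 2)

# The id.n = 3 idea: the three largest residuals in absolute value
top3 <- order(abs(residuals(ll)), decreasing = TRUE)[1:3]

# Influence rather than residual size: Cook's distance with a 4/n rule of thumb
cook <- cooks.distance(ll)
influential <- which(cook > 4 / length(cook))
```

The three approaches need not agree, which matches Ted's observation that different plots label different points.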
Re: [R] how to assign fixed factor in lm
Either you did not read the docs sufficiently carefully, or the source you learned this from is questionable. The lm() function has no argument called "fixed", and the warning should have made that clear to you. It was sheer luck on your part that you happened to put Value as the first variable in Food, in which case lm() will treat it as the response and the rest as predictors in the absence of a model formula. You should try:

lm(Value ~ Gender, Food)

lm() itself has no concept of fixed or random effects. lme() in the nlme package does, and it has the "fixed" argument.

Andy

From: [EMAIL PROTECTED]

Hi there,

Value=c(709,679,699,657,594,677,592,538,476,508,505,539)
Lard=rep(c("Fresh","Rancid"),each=6)
Gender=rep(c("Male","Male","Male","Female","Female","Female"),2)
Food=data.frame(Value,Lard,Gender)
Food
   Value   Lard Gender
1    709  Fresh   Male
2    679  Fresh   Male
3    699  Fresh   Male
4    657  Fresh Female
5    594  Fresh Female
6    677  Fresh Female
7    592 Rancid   Male
8    538 Rancid   Male
9    476 Rancid   Male
10   508 Rancid Female
11   505 Rancid Female
12   539 Rancid Female

lm(fixed=Value~Gender,data=Food)

Call:
lm(data = Food, fixed = Value ~ Gender)

Coefficients:
(Intercept)   LardRancid   GenderMale
      651.4       -142.8         35.5

Warning message:
extra arguments fixed are just disregarded. in: lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...)

lm(fixed=Value~Lard+Gender,data=Food)

Call:
lm(data = Food, fixed = Value ~ Lard + Gender)

Coefficients:
(Intercept)   LardRancid   GenderMale
      651.4       -142.8         35.5

Warning message:
extra arguments fixed are just disregarded. in: lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...)

I wanted to consider only one factor. But why does lm(fixed=Value~Gender,data=Food) return estimates for both Gender and Lard? And I found the returned results are the same as those of lm(fixed=Value~Lard+Gender,data=Food). Why can't lm do analysis of variance according to the assigned formula?

Thank you very much.
Fan
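For reference, a short sketch of the call Andy suggests, using the data from the post. The object names fit_gender and fit_both are made up for illustration; the formula is passed as the first (formula) argument rather than through a non-existent fixed argument.

```r
Value  <- c(709, 679, 699, 657, 594, 677, 592, 538, 476, 508, 505, 539)
Lard   <- rep(c("Fresh", "Rancid"), each = 6)
Gender <- rep(c("Male", "Male", "Male", "Female", "Female", "Female"), 2)
Food   <- data.frame(Value, Lard, Gender)

# One factor only: the formula decides which terms enter the model
fit_gender <- lm(Value ~ Gender, data = Food)

# Both factors, for comparison with the output quoted in the post
fit_both <- lm(Value ~ Lard + Gender, data = Food)

# Analysis-of-variance table for the one-factor model
anova(fit_gender)
```

With the formula given properly, fit_gender has only an intercept and a GenderMale coefficient, while fit_both reproduces the three coefficients shown in the post.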
Re: [R] 0 * NA = NA
From: Alberto Monteiro

Is there any way to force 0 * NA to be 0 instead of NA?

For example, suppose I have a vector with some valid values, while other values are NA. If I matrix-pre-multiply this by a weight row vector, whose weights that correspond to the NAs are zero, the outcome will still be NA:

x <- c(1, NA, 1)
wt <- c(2, 0, 1)
wt %*% x # NA

I don't think it's prudent to bend the arithmetic rules of a system, especially when there are good reasons for them. Here's one:

R> 0 * Inf
[1] NaN

If you are absolutely sure that the NAs in x cannot be Inf (or -Inf), you might try to force the result to 0, but the only way I can think of is to do something like:

R> wt %*% ifelse(wt, x, 0)
     [,1]
[1,]    3

Andy

Alberto Monteiro
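A small sketch that wraps Andy's ifelse() idea in a function, so that only terms with zero weight are forced to 0 while an NA with non-zero weight still propagates. The function name wsum0 is hypothetical, and the same caveat about Inf applies.

```r
x  <- c(1, NA, 1)
wt <- c(2, 0, 1)

# Andy's approach: zero out the entries of x whose weight is zero
res <- wt %*% ifelse(wt != 0, x, 0)

# Hypothetical wrapper: weighted sum in which zero-weight terms count as 0
wsum0 <- function(wt, x) sum(ifelse(wt != 0, wt * x, 0))

wsum0(wt, x)           # the NA has weight 0, so it is ignored
wsum0(c(1, 1, 1), x)   # the NA has non-zero weight, so the result is NA
```

This differs from sum(wt * x, na.rm = TRUE), which would silently drop every NA regardless of its weight.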
Re: [R] how to apply the function cut( ) to many columns in a data.frame?
From: Chuck Cleland

ahimsa campos-arceiz wrote:

Dear useRs,

In a data.frame (df) I have several columns (x1, x2, x3, ..., xn) containing data as a continuous numerical response:

df
   var  x1  x2  x3
1    1 143 147 137
2    2  93  93 117
3    3 164  39 101
4    4 123 118  97
5    5  63 125  97
6    6 129  83 124
7    7 123  93 136
8    8 123  80  79
9    9  89 107 150
10  10  78  95 121

I want to classify the values in the columns x1, x2, etc., into bins of fixed margins (0-5, 5-10, ...). For one vector I can do it easily with the function cut:

df$x1 <- cut(df$x1, br=5*(0:40), labels=5*(1:40))
df$x1
 [1] 145 95 165 125 65 130 125 125 90 80
40 Levels: 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 ... 200

However, if I try to use a subset of my data.frame:

df[,3:4] <- cut(df[,3:4], br=5*(0:40), labels=5*(1:40))
Error in cut.default(df[, 3:4], br = 5 * (0:40), labels = 5 * (1:40)) :
  'x' must be numeric

How can I make this work with data frames in which I want to apply the function cut( ) to many columns in a data.frame?

You have an answer within your question - use one of the various "apply" functions. For example:

lapply(df[,3:4], function(x){cut(x, br=5*(0:40), labels=5*(1:40))})

Or perhaps a bit more simply:

lapply(df[, 3:4], cut, br=5*(0:40), labels=5*(1:40))

and if a data frame is desired as output, wrap the above in as.data.frame(). (Just keep in mind that a data frame is like a list.)

Andy

?lapply
?sapply
?apply

I guess that I might have to use something like for ( ) (which I'm not familiar with), but maybe you know a straightforward method to use with data.frames.

Thanks a lot!
Ahimsa

*

# data
var <- 1:10
x1 <- rnorm(10, mean=100, sd=25)
x2 <- rnorm(10, mean=100, sd=25)
x3 <- rnorm(10, mean=100, sd=25)
df <- data.frame(var,x1,x2,x3)
df

# classifying the values of the vector df$x1 into bins of width 5
df$x1 <- cut(df$x1, br=5*(0:40), labels=5*(1:40))
df$x1

# trying it on a subset of the data.frame
df[,3:4] <- cut(df[,3:4], br=5*(0:40), labels=5*(1:40))
df[,3:4]

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
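Putting the advice from this thread together, a minimal sketch that bins several columns in place. Because a data frame is a list of columns, assigning the result of lapply() back to df[cols] keeps the data frame structure. The seed and the cols vector are assumptions added for the example; the full argument name breaks is used instead of the abbreviation br.

```r
set.seed(1)
df <- data.frame(var = 1:10,
                 x1 = rnorm(10, mean = 100, sd = 25),
                 x2 = rnorm(10, mean = 100, sd = 25),
                 x3 = rnorm(10, mean = 100, sd = 25))

# Columns to bin; each is replaced by a factor of 5-wide bins labelled by the upper edge
cols <- c("x1", "x2", "x3")
df[cols] <- lapply(df[cols], cut, breaks = 5 * (0:40), labels = 5 * (1:40))
```

Values outside the range 0 to 200 would become NA, so the breaks may need widening for other data.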
Re: [R] How to read in this data format?
You can't expect general-purpose tools like read.table in R to be able to deal with a highly specialized file format. Here's how I'd start. It doesn't put the data in exactly the format you specified, but I doubt you'll need that. This might be sufficient for your purpose:

dat <- readLines(file("yourdata.dat"))
## Get rid of blank lines.
dat <- dat[dat != ""]
scan.lines <- grep("Scan", dat)
## Chop off the header rows.
dat <- dat[scan.lines[1]:length(dat)]
scan.lines <- scan.lines - scan.lines[1] + 1
lines.per.scan <- c(scan.lines[-1], length(dat) + 1) - scan.lines
## Split the data into a list, with each scan taking up one component.
dat <- split(dat, rep(seq(along=lines.per.scan), lines.per.scan))
## Process the data one scan at a time.
result <- lapply(dat, function(x) {
    x <- strsplit(x, "\t")
    rtime <- x[[2]][2]  # second field of second line
    t(matrix(as.numeric(do.call(rbind, c(rtime, x[-(1:2)]))), ncol=2))
})

This is what I get from the data you've shown:

R> result
$`1`
      [,1]     [,2]     [,3]     [,4]
[1,] 0.017 399.8112 399.8742 399.9372
[2,] 0.017 184.0000   0.0000 152.0000

$`2`
      [,1]     [,2]     [,3]     [,4]
[1,] 0.021 399.8112 399.8742 399.9372
[2,] 0.021 181.0000   1.0000 153.0000

Note that you probably should avoid using numbers as column names in a data frame, even if it's possible.

Andy

From: Bart Joosen

Hi,

I received an ASCII file containing the following information:

$$ Experiment Number:
$$ Associated Data:

FUNCTION 1

Scan 1
Retention Time 0.017
399.8112 184
399.8742 0
399.9372 152

Scan 2
Retention Time 0.021
399.8112 181
399.8742 1
399.9372 153
.

I would like to import this data in R into a dataframe, where there is a column time, the first numbers as column names, and the second numbers as data in the dataframe:

Time   399.8112 399.8742 399.9372
0.017       184        0      152
0.021       181        1      153

I did take a look at read.table, read.delim, scan, ... But I've no idea how to solve this problem. Anyone?
Thanks
Bart
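If the wide layout Bart asked for is really wanted, Andy's line-based approach can be pushed one step further, roughly as below. This is a sketch under assumptions: the file is tab-separated as in Andy's code, every scan lists the same masses in the same order, and the sample text stands in for the real file (which would be read with readLines()). The names txt, rows and wide are made up.

```r
# Stand-in for readLines("yourdata.dat"); tab-separated fields are assumed
txt <- c("$$ Experiment Number:", "$$ Associated Data:", "",
         "FUNCTION 1", "",
         "Scan\t1", "Retention Time\t0.017",
         "399.8112\t184", "399.8742\t0", "399.9372\t152", "",
         "Scan\t2", "Retention Time\t0.021",
         "399.8112\t181", "399.8742\t1", "399.9372\t153")

dat    <- txt[nzchar(txt)]               # drop blank lines
starts <- grep("^Scan", dat)             # first line of each scan
ends   <- c(starts[-1] - 1, length(dat)) # last line of each scan

# One named numeric vector per scan: Time followed by the intensities
rows <- Map(function(s, e) {
  block <- strsplit(dat[s:e], "\t")
  time  <- as.numeric(block[[2]][2])     # second field of the second line
  vals  <- do.call(rbind, block[-(1:2)]) # mass / intensity pairs
  out   <- as.numeric(vals[, 2])
  names(out) <- vals[, 1]
  c(Time = time, out)
}, starts, ends)

# check.names = FALSE keeps the numeric column names as they are
wide <- data.frame(do.call(rbind, rows), check.names = FALSE)
```

As Andy notes, numeric column names are legal but awkward; they have to be accessed as wide[["399.8112"]] or with backticks.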
Re: [R] Package RGtk2, rattle, libatk-1.0.0.dll Errors
The way I got that problem resolved is by installing the rggobi package using the command shown on http://www.ggobi.org/rggobi/, which is

source("http://www.ggobi.org/download/install.r")

That will install all the things that GGobi needs. Since rattle depends on rggobi, it's probably a good idea to do this anyway. After that, rattle will start just fine. I have not actually used rattle, though.

Andy

From: j.joshua thomas

Dear Group,

I have followed the instructions from the link http://datamining.togaware.com/survivor/Installing_GTK.html

However, I couldn't fix the libatk-1.0.0.dll application error. I uninstalled R-Gui-2.4.0, then did a fresh installation, and am still facing the same problem. I am using Windows XP.

*Please find the following*

R version 2.4.0 (2006-10-03)
Copyright (C) 2006 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
[Previously saved workspace restored]

> install.packages("RGtk2")
--- Please select a CRAN mirror for use in this session ---
trying URL 'http://cran.au.r-project.org/bin/windows/contrib/2.4/RGtk2_2.8.7.zip'
Content type 'application/zip' length 13050736 bytes
opened URL
downloaded 12744Kb

package 'RGtk2' successfully unpacked and MD5 sums checked

The downloaded packages are in
        C:\Documents and Settings\jjoshua\Local Settings\Temp\RtmpLetwrb\downloaded_packages
updating HTML package descriptions
> install.packages("rattle")
trying URL 'http://cran.au.r-project.org/bin/windows/contrib/2.4/rattle_2.2.0.zip'
Content type 'application/zip' length 340875 bytes
opened URL
downloaded 332Kb

package 'rattle' successfully unpacked and MD5 sums checked

The downloaded packages are in
        C:\Documents and Settings\jjoshua\Local Settings\Temp\RtmpLetwrb\downloaded_packages
updating HTML package descriptions
> library(RGtk2)
Error in dyn.load(x, as.logical(local), as.logical(now)) :
        unable to load shared library 'C:/PROGRA~1/R/R-24~1.0/library/RGtk2/libs/RGtk2.dll':
  LoadLibrary failure:  The specified module could not be found.

In addition: Warning message:
package 'RGtk2' was built under R version 2.4.1
Error: package/namespace load failed for 'RGtk2'

--
Lecturer J. Joshua Thomas
KDU College Penang Campus
Research Student, University Sains Malaysia
Re: [R] Multiple conditional without if [Broadcast]
From: bunny, lautloscrew.com

Dear all,

I am stuck with a syntax problem. I have a matrix which has about 500 rows and 6 columns. Now I want to kick some data out. I want to create a new matrix which is basically the old one except for all entries which have a 4 in the 5th column AND a 1 in the 6th column.

I tried the following but couldn't get a new matrix, just some weird errors:

newmatrix=oldmatrix[,2][oldmatrix[,5]==4]&oldmatrix[,2][oldmatrix[,6]==1]

All I get is: numeric(0)

That's not a `weird error', but a numeric vector of length 0.

Does anybody have an idea how to fix this one?

Try:

newmatrix = oldmatrix[oldmatrix[, 5]==4 & oldmatrix[, 6] == 1, 2, drop=FALSE]

If you just want a subset of column 2 as a vector, you can leave off the drop=FALSE part. Reading "An Introduction to R" should have saved you some trouble in the first place.

Andy

thx in advance
matthias
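Note that the indexing in Andy's reply keeps column 2 of the rows that match both conditions. If the aim is the one stated in the question, the old matrix minus those rows, the logical index can be negated. A small sketch on made-up data (oldmatrix here is invented for the example; drop_row is a hypothetical name):

```r
# Made-up 6-column matrix; columns 5 and 6 hold the codes being tested
oldmatrix <- cbind(1:6, 11:16, 0, 0,
                   c(4, 4, 2, 4, 3, 4),
                   c(1, 0, 1, 1, 1, 0))

# TRUE for rows with a 4 in column 5 AND a 1 in column 6
drop_row <- oldmatrix[, 5] == 4 & oldmatrix[, 6] == 1

# Everything except those rows, all columns kept
newmatrix <- oldmatrix[!drop_row, , drop = FALSE]

# Column 2 of the matching rows, as in Andy's reply (without drop = FALSE)
col2_of_matches <- oldmatrix[drop_row, 2]
```

The single & is the element-wise operator; && would only compare single values and is not suitable for indexing.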
Re: [R] how to use apply with two variables
Yes. Just try it and see. BTW, your usage of return() is not recommended anymore. This is probably easier:

myfun <- function(x) c(mean=mean(x), sd=sd(x))
out <- apply(mat, 1, myfun)
## or...
out2 <- cbind(mean=rowMeans(mat), sd=sd(t(mat)))

Andy

From: Serguei Kaniovski

Hi,

this is a made-up example. Function "myfun" returns two values. Can "apply" be used so that "myfun" is called only once?

Thanks
Serguei

mat <- matrix(runif(50),nrow=10,ncol=5)
myfun <- function(x) {
  mymean <- mean(x)
  mysd <- sd(x)
  return(mymean,mysd)
}
out1 <- t(apply(mat,1,function(x) myfun(x)$mymean))
out2 <- t(apply(mat,1,function(x) myfun(x)$mysd))
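A runnable sketch of the single-call approach on the made-up matrix from the question. apply() over rows returns one column per row, so t() is used to get one row per input row. The second version uses apply(mat, 1, sd) rather than sd(t(mat)), on the assumption that current versions of R no longer compute column-wise standard deviations when sd() is given a matrix.

```r
set.seed(1)
mat <- matrix(runif(50), nrow = 10, ncol = 5)

# Return both statistics as a named vector, so one call per row is enough
myfun <- function(x) c(mean = mean(x), sd = sd(x))

# 10 x 2 matrix: one row per row of mat, columns "mean" and "sd"
out <- t(apply(mat, 1, myfun))

# Same result without a user-defined function
out2 <- cbind(mean = rowMeans(mat), sd = apply(mat, 1, sd))
```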
Re: [R] JGR launcher for linux
Isn't it right in front of you? I get:

> JGR()
Starting JGR ...
(You can use /usr/local/lib64/R/library/JGR/cont/run to start JGR directly)

Andy

From: Ronaldo Reis Junior

Hi,

does anybody have a JGR launcher for Linux? Maybe a script that launches JGR directly, without opening R and then running library(JGR) and JGR().

Thanks
Ronaldo

--
Deflector shields just came on, Captain.
--
Prof. Ronaldo Reis Júnior
UNIMONTES/Depto. Biologia Geral/Lab. Ecologia Evolutiva
Campus Universitário Prof. Darcy Ribeiro, Vila Mauricéia
CP: 126, CEP: 39401-089, Montes Claros - MG - Brasil
Fone: (38) 3229-8190 | [EMAIL PROTECTED] | [EMAIL PROTECTED]
ICQ#: 5692561 | LinuxUser#: 205366
Re: [R] memory management uestion [Broadcast]
I don't see why making copies of the columns you need inside the loop is "better" memory management. If the data are in a matrix, accessing elements is quite fast. If you're worried about the speed of that, do what Charles suggests: work with the transpose so that you are accessing elements in the same column in each iteration of the loop.

Andy

From: Federico Calboli

Charles C. Berry wrote:

Whoa! You are accessing one ROW at a time. Either way this will tangle up your cache if you have many rows and columns in your original data. You might do better to do

Y <- t( X ) ### use '<-' !
for (i in whatever ){
    do something using Y[ , i ]
}

My question is NOT how to write the fastest code, it is whether dummy variables (for lack of better words) make the memory management better, i.e. faster, or not.

Best,
Fede

--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG
Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193
f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com
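A small sketch of the transpose idea from this thread: the same per-row computation done once by indexing rows of X and once by indexing columns of Y = t(X). R stores matrices column by column, so Y[, i] reads contiguous memory. Whether this is measurably faster depends on the matrix size and the work done per iteration, so the sketch only checks that the results agree; system.time() around each loop would be the way to measure. The matrix and the sum-of-squares computation are placeholders.

```r
set.seed(1)
X <- matrix(rnorm(200 * 50), nrow = 200)

# Row-wise access on the original matrix
by_row <- numeric(nrow(X))
for (i in seq_len(nrow(X))) by_row[i] <- sum(X[i, ]^2)

# Column-wise access on the transpose: column i of Y is row i of X
Y <- t(X)
by_col <- numeric(ncol(Y))
for (i in seq_len(ncol(Y))) by_col[i] <- sum(Y[, i]^2)
```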
Re: [R] rpart tree node label [Broadcast]
Try the following to see:

    library(rpart)
    iris.rp <- rpart(Sepal.Length ~ Species, iris)
    plot(iris.rp)
    text(iris.rp)

Two possible solutions:

1. Use text(..., pretty=0). See ?text.rpart.
2. Use post(..., filename="").

Andy

From: Wensui Liu

not sure how you want to label it. could you be more specific? thanks.

On 2/14/07, Aimin Yan wrote:

I generated a tree using rpart. In the nodes of the tree, the split is based on some factor. I want to label these nodes based on the levels of this factor. Does anyone know how to do this?

Thanks,
Aimin

--
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)
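A minimal sketch of the first suggestion, assuming the rpart package is installed. It uses the same iris model as the reply above; labels() is used here only as a way to inspect the split labels as text, and pdf(NULL) is my own addition so the plotting calls run without a display.

```r
library(rpart)

# Same model as in the reply above: one factor predictor, so every split is on Species.
iris.rp <- rpart(Sepal.Length ~ Species, data = iris)

# pretty = 0 asks for full factor level names instead of the default letter codes.
split_labels <- labels(iris.rp, pretty = 0)
print(split_labels)

# The same argument applies when annotating the plotted tree.
pdf(NULL)                      # null device; drop this line to draw on screen
plot(iris.rp, margin = 0.1)
text(iris.rp, pretty = 0)
dev.off()
```

With the default setting the levels are abbreviated to single letters, which is probably what prompted the original question.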
Re: [R] [BioC] Outlook does threading [Broadcast]
This is really off-topic for both BioC and R-help, so I'll keep it short.

From: Kimpel, Mark William

> See below for Bert Gunter's off-list reply to me (which I do appreciate). I'm putting it back on the list because it seems there is still confusion regarding the difference between threading and sorting by subject. I thought the example I give below would be instructional for other Outlook users who may be as confused as I was (am?).
>
> Per Bert's instructions, I just set up my inbox to sort by subject. I sent one email to myself with the subject test1 and then replied to it without changing the subject. The reply correctly went to test1 in the inbox sorter. I then changed the subject heading in the test1 reply to test2 and sent it to myself. This time Outlook re-categorized it and put it in a separate compartment in the view called test2.
>
> If Outlook can do threading the way the R mail server does, I don't think this is the way to do it.

AFAIK there's no proper way to get the correct threading in Outlook. What I do is group by conversation topic, but that doesn't solve the problem. This is only a problem on your (and all Outlook users'?) end, though. The bigger problem that affects the lists is that some versions of MS Exchange Server do not include the In-Reply-To header field that many mailing lists rely on for proper threading. As a result, when I reply to other people's posts, the reply may show up in Outlook as having been threaded properly (because the subject is fine), but it throws off everything else that does proper threading.

> Unless someone has an idea of how to correctly set up Outlook to do threading in the manner that the R mail server does,

Maybe some VBA coding can be done to get it right, but short of that, I very much doubt it.

> I think the message for us Outlook users is to just create, from scratch, a new message when initiating a new subject.

That message ought to be clear for everyone. You should never reply to a message when you really mean to start a new topic, regardless of what you are using.

Andy

> Thanks for all your help.
> Mark

-----Original Message-----
From: Bert Gunter
Sent: Wednesday, January 31, 2007 7:03 PM
To: Kimpel, Mark William
Subject: Outlook does threading

Mark: No need to bother the R list with this. Outlook does threading. Just sort on Subject in the viewer.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404

-----Original Message-----
On Behalf Of Kimpel, Mark William
Sent: Wednesday, January 31, 2007 3:36 PM
To: Peter Dalgaard
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] possible spam alert

Peter,

Thank you for your explanation. I had taken Mr. Connolly's message to me to imply that I was not changing the subject line. I use MS Outlook 2007 and, unless I am just not seeing it, Outlook does not normally display the in-reply-to header; I was under the mistaken impression that that was what the Subject line was for. See, for example, the header to your message to me below. Outlook will, however, sort messages by Subject, and that is what I thought was meant by threading.

Well, I learned something today and apologize for any inconvenience my posts may have caused.

BTW, I use Outlook because it is supported by my university server and will synch my appointments and contacts with my PDA, which runs Windows CE. If anyone has a suggestion for a better email program that will provide proper threading AND work with a MS email server and synch with Windows CE, I'd love to hear it.

Thanks again,
Mark

Mark W. Kimpel MD

-----Original Message-----
From: Peter Dalgaard
Sent: Wednesday, January 31, 2007 6:25 PM
To: Kimpel, Mark William
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] possible spam alert

Kimpel, Mark William wrote:

> The last two times I have originated message threads on R or Bioconductor I have received the message included below from someone named Patrick Connolly. Both times I was the originator of the message thread and used what I thought was a unique subject line that explained as best I could what my question was.
>
> Patrick seems to be implying that I am abusing the R and BioC help newsgroups in this fashion. When I emailed him to give me a specific example, he did not reply.
>
> The most recent thread that he seems concerned about was to the R list and was entitled "regexpr and parsing question". I believe the previous post of mine that he had problems with was to the BioC list but I can't remember its subject.
>
> Is this spam?

No. Breach of netiquette, yes. The
Re: [R] Memory problem on a linux cluster using a large data set [Broadcast]
In addition to my off-list reply to Iris (pointing her to an old post of mine that detailed the memory requirement of RF in R), she might consider the following:

- Use larger nodesize
- Use sampsize to control the size of bootstrap samples

Both of these have the effect of reducing the sizes of the trees grown. For a data set that large, growing smaller trees may not matter. Still, with data of that size, I'd say 64-bit is the better solution.

Cheers,
Andy

From: Martin Morgan

Iris -- I hope the following helps; I think you have too much data for a 32-bit machine.

Martin

Iris Kolder writes:

> Hello, I have a large data set 320.000 rows and 1000 columns. All the data has the values 0,1,2.

It seems like a single copy of this data set will be at least a couple of gigabytes; I think you'll have access to only 4 GB on a 32-bit machine (see section 8 of the R Installation and Administration guide), and R will probably end up, even in the best of situations, making at least a couple of copies of your data. Probably you'll need a 64-bit machine, or figure out algorithms that work on chunks of data.

> on a linux cluster with R version R 2.1.0. which operates on a 32

This is quite old, and in general it seems like R has become more sensitive to big-data issues and tracking down unnecessary memory copying.

> cannot allocate vector size 1240 kb. I've searched through

Use traceback() or options(error=recover) to figure out where this is actually occurring.

> SNP <- read.table("file.txt", header=FALSE, sep="")    # read in data file

This makes a data.frame, and data frames have several aspects (e.g., automatic creation of row names on sub-setting) that can be problematic in terms of memory use. Probably better to use a matrix, for which:

    'read.table' is not the right tool for reading large matrices, especially those with many columns: it is designed to read _data frames_ which may have columns of very different classes. Use 'scan' instead.

(from the help page for read.table). I'm not sure of the details of the algorithms you'll invoke, but it might be a false economy to try to get scan to read in 'small' versions (e.g., integer, rather than numeric) of the data -- the algorithms might insist on numeric data, and then make a copy during coercion from your small version to numeric.

> SNP$total.NAs = rowSums(is.na(SN    # calculate the number of NA per row and adds a column with total NAs

This adds a column to the data.frame or matrix, probably causing at least one copy of the entire data. Create a separate vector instead, even though this unties the coordination between columns that a data frame provides.

> SNP = t(as.matrix(SNP))    # transpose rows and columns

This will also probably trigger a copy;

> snp.na<-SNP

R might be clever enough to figure out that this simple assignment does not trigger a copy. But it probably means that any subsequent modification of snp.na or SNP *will* trigger a copy, so avoid the assignment if possible.

> snp.roughfix<-na.roughfix(snp.na)
> fSNP<-factor(snp.roughfix[, 1])    # Assigns factor to case control status
> snp.narf<- randomForest(snp.roughfix[,-1], fSNP, na.action=na.roughfix, ntree=500, mtry=10, importance=TRUE, keep.forest=FALSE, do.trace=100)

Now you're entirely in the hands of randomForest. If memory problems occur here, perhaps you'll have gained enough experience to point the package maintainer to the problem and suggest a possible solution.

> set it should be able to cope with that amount. Perhaps someone has tried this before in R or is Fortran a better choice? I added my R

If you mean a pure Fortran solution, including coding the random forest algorithm, then of course you have complete control over memory management. You'd still likely be limited to addressing 4 GB of memory.

> I wrote a script to remove all the rows with more than 46 missing values. This works perfectly on a smaller dataset. But the problem arises when I try to run it on the larger data set: I get an error "cannot allocate vector size 1240 kb". I've searched through previous posts and found out that it might be because I'm running it on a linux cluster with R version R 2.1.0. which operates on a 32 bit processor. But I could not find a solution for this problem. The cluster is a really fast one and should be able to cope with these large amounts of data; the system's configuration is: speed 3.4 GHz, memory 4 GByte. Is there a way to change the settings or processor under R? I want to run the function Random Forest on my large data set; it should be able to cope with that amount. Perhaps someone has tried this before in R or is Fortran a better choice? I added my R script down
Re: [R] plot.svm
Try debug(e1071:::plot.svm) and then re-run your plot command, stepping through one line at a time to see where it fails.

Andy

From: Aimin Yan

where is plot.svm method? I just find plot(svm, data, formula) method

Aimin
Re: [R] any way to make the code more efficient ?
I don't know about efficiency, but at least for readability, you may want to do the following:

1. Indent your code.
2. Create a list of appropriate length, and populate the list with the objects you're creating in the loop.
3. After the loop, use do.call(rbind, list).

HTH,
Andy

From: Leeds, Mark (IED)

Ravi: I appreciate your help but could you be a little more specific about what you mean? I can just stack aggfxdata below the current full one (the rbind works out the ordering by date because it's a zoo object) so it's not a question of where to put the new one. It's a question of how to avoid rbind. I apologize because I don't think I understand what you are saying. Or maybe it's not possible to avoid rbind? Thanks.

-----Original Message-----
From: Ravi Varadhan
Sent: Friday, December 08, 2006 5:21 PM
To: Leeds, Mark (IED); r-help@stat.math.ethz.ch
Subject: RE: [R] any way to make the code more efficient ?

Using rbind almost always slows things down significantly. You should define the objects aggfxdata and fullaggfxdata before the loop and then assign appropriate values to the corresponding rows and/or columns.

Ravi.

Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
On Behalf Of Leeds, Mark (IED)
Sent: Friday, December 08, 2006 4:17 PM
To: r-help@stat.math.ethz.ch
Subject: [R] any way to make the code more efficient ?

The code below works, which is why I didn't include the data to reproduce it. It loops about 500 times and each time a zoo object with 1400 rows and 4 columns gets created (the rows represent minutes, so each file is one day's worth of data). Inside the loop, I keep rbinding the newly created zoo object to the current zoo object so that it gets bigger and bigger over time. Eventually, the new zoo object, fullaggfxdata, containing all the days of data is created.

I was just wondering if there is a more efficient way of doing this. I do know the number of times the loop will run at the beginning, so maybe creating a matrix or data frame at the beginning and putting the daily ones into it would make it faster. But the problem with this is that I eventually do need a zoo object. I ask this question because at around the 250 mark of the loop, things start to slow down significantly, and I think I remember reading somewhere that doing an rbind of something to itself is not a good idea. Thanks.

    start<-1
    for (filecounter in (1:length(datafilenames))) {
        print(paste("File Counter = ", filecounter))
        datafile= paste(datadir,"/",datafilenames[filecounter],sep="")
        aggfxdata<-clnaggcompcurrencyfile(fxfile=datafile,aggminutes=aggminutes, fillholes=1)
        logbidask<-log(aggfxdata[,"bidask"])
        aggfxdata<-cbind(aggfxdata,logbidask)
        if ( start == 1 ) {
            fullaggfxdata<-aggfxdata
            start<-0
        } else {
            fullaggfxdata<-rbind(fullaggfxdata,aggfxdata)
        }
    }
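Steps 2 and 3 of the reply above can be sketched as follows. This is only an illustration under stated assumptions: make_day() is a hypothetical stand-in for the poster's per-file function, and plain data frames are used instead of zoo objects so that the sketch runs in base R.

```r
# Hypothetical stand-in for the per-file step: returns one day's worth of rows.
make_day <- function(day) {
  data.frame(day = day, minute = 1:5, bidask = day + (1:5) / 10)
}

n_days <- 4
pieces <- vector("list", n_days)   # list of known length, created before the loop

for (d in seq_len(n_days)) {
  pieces[[d]] <- make_day(d)       # fill a slot; nothing grows inside the loop
}

full <- do.call(rbind, pieces)     # one rbind at the end instead of one per iteration
```

The gain comes from the combined object not being copied on every pass through the loop; with objects that have their own rbind method the same pattern should apply, but that is worth checking on a few files first.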
Re: [R] from table to dataframe
Like this?

    R> data.frame(unclass(df2))
       a b c d
    s1 8 4 4 4
    s2 8 4 4 4
    s3 8 4 4 4

Andy

From: Milton Cezar Ribeiro

Hi there,

I have a two-entry data frame, and I would like to generate a new data frame with its frequencies. I tried this:

    site<-rep(c("s1","s2","s3"),20)
    species<-rep(c("a","b","a","c","d"),12)
    df<-data.frame(cbind(site,species))
    df2<-table(df)

But when I convert df2 to a data.frame I lose the square format. I would like to have my data.frame like this:

    site a b c d
    s1   8 4 4 4
    s2   8 4 4 4
    s3   8 4 4 4

Any help?

Miltinho
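For comparison, a short sketch of the conversions involved, using the poster's own site and species vectors. The object names counts, wide, wide2 and long are mine; as.data.frame.matrix() is shown as an alternative to the unclass() trick in the reply, not as what the reply used.

```r
site    <- rep(c("s1", "s2", "s3"), 20)
species <- rep(c("a", "b", "a", "c", "d"), 12)
counts  <- table(site, species)

# Keeps the square layout: one row per site, one column per species.
wide  <- as.data.frame.matrix(counts)

# The approach from the reply: drop the "table" class, then convert.
wide2 <- data.frame(unclass(counts))

# The default conversion, which gives the long format the poster did not want.
long  <- as.data.frame(counts)
```

The long format has one row per site and species combination with a Freq column, which explains why the square layout disappears when as.data.frame() is applied directly to a table.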
Re: [R] Count cases by indicator
I might be missing something, but the data you showed don't seem to match your expectation. Firstly, 111111111 in binary is 511 in decimal, so your coordinates are off by 1. Secondly, for the data you've shown, the matrix equivalent looks like:

    > m <- matrix(df$x, ncol=9, byrow=TRUE)
    > rownames(m) <- levels(df$cases)
    > print(m)
             [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
    093/0188    0    0    1    0    1    1    1    1    1
    093/0206    0    0    0    0    0    0    0    0    0
    093/0216    0    1    1    1    1    1    0    1    1
    093/0305    0    1    1    1    1    1    1    1    1
    093/0325    0    0    0    0    0    0    0    0    0
    093/0449    0    0    0    0    0    0    0    0    0
    093/0473    0    0    1    1    1    1    1    1    1
    093/0499    0    0    1    1    1    1    1    1    1

The counts of unique occurrences are:

    > table(do.call(paste, c(as.data.frame(m), sep="")))

    000000000 001011111 001111111 011111011 011111111
            3         1         2         1         1

which do not agree with yours. If I understood what you wanted, I would do:

    R> table(rowSums(matrix(2^(0:8) * df$x, ncol=9, byrow=TRUE)))

      0 446 500 508 510
      3   1   1   2   1

Andy

From: Serguei Kaniovski

Hi,

In the data below, case represents cases, x binary states. Each case has exactly 9 x, i.e. is a binary vector of length 9. There are 2^9=512 possible combinations of binary states in a given case, i.e. 512 possible vectors. I generate these in the order of the decimals the vectors represent, as:

    cmat<-as.matrix(expand.grid(rep(list(0:1),9)))
    cmat<-cmat[nrow(cmat):1,ncol(cmat):1]

cmat contains the binary vectors as rows.

QUESTION: I would like to know how often each of the 512 vectors occurs in case. With these data, the output should be a vector with 2^9=512 coordinates, having 2,2,1,3, as, respectively, the coordinate number 129, 193, 449, 512, and zeros in all other coordinates.

Thank you for your help,
Serguei

    df<-read.delim("clipboard",sep=";")

DATA:

    case;x
    093/0188;0
    093/0188;0
    093/0188;1
    093/0188;0
    093/0188;1
    093/0188;1
    093/0188;1
    093/0188;1
    093/0188;1
    093/0206;0
    093/0206;0
    093/0206;0
    093/0206;0
    093/0206;0
    093/0206;0
    093/0206;0
    093/0206;0
    093/0206;0
    093/0216;0
    093/0216;1
    093/0216;1
    093/0216;1
    093/0216;1
    093/0216;1
    093/0216;0
    093/0216;1
    093/0216;1
    093/0305;0
    093/0305;1
    093/0305;1
    093/0305;1
    093/0305;1
    093/0305;1
    093/0305;1
    093/0305;1
    093/0305;1
    093/0325;0
    093/0325;0
    093/0325;0
    093/0325;0
    093/0325;0
    093/0325;0
    093/0325;0
    093/0325;0
    093/0325;0
    093/0449;0
    093/0449;0
    093/0449;0
    093/0449;0
    093/0449;0
    093/0449;0
    093/0449;0
    093/0449;0
    093/0449;0
    093/0473;0
    093/0473;0
    093/0473;1
    093/0473;1
    093/0473;1
    093/0473;1
    093/0473;1
    093/0473;1
    093/0473;1
    093/0499;0
    093/0499;0
    093/0499;1
    093/0499;1
    093/0499;1
    093/0499;1
    093/0499;1
    093/0499;1
    093/0499;1

--
Austrian Institute of Economic Research (WIFO)
Serguei Kaniovski
P.O.Box 91, Arsenal Objekt 20, 1103 Vienna, Austria
http://www.wifo.ac.at/Serguei.Kaniovski
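The table() calls in the reply report only the patterns that occur. If a full-length count vector with zeros for the missing patterns is wanted, as in the question, one possible sketch is below. The data are a tiny made-up example with vectors of length 3 rather than 9, and reading the first element as the most significant bit is my assumption, not something the thread settles.

```r
# Hypothetical data: 4 cases, each a binary vector of length k = 3.
x <- c(0, 0, 0,   1, 0, 1,   0, 0, 0,   1, 1, 1)
k <- 3
m <- matrix(x, ncol = k, byrow = TRUE)

# Read each row as a binary number, first element most significant.
value <- as.vector(m %*% 2^((k - 1):0))

# Count every possible pattern 0 .. 2^k - 1, including those that never occur.
counts <- tabulate(value + 1, nbins = 2^k)
names(counts) <- 0:(2^k - 1)
print(counts)
```

For the poster's layout, where the all-ones vector comes first, the same vector can simply be reversed with rev(counts).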
Re: [R] automatic cleaning of workspace
You can avoid loading .RData at start-up without deleting the .RData file by adding the --no-restore option to the R command. I have that, and additionally --no-save, in my shortcut for the Rgui.exe command. I use explicit save() and load() in my scripts to save objects that are expensive to compute.

Andy

From: Leeds, Mark (IED)

I'm having that problem where I am sometimes using an object that's from a previous workspace when I don't want to be doing that. I was thinking of putting rm(list=ls()) in my first.R function. But the problem with doing this is that it doesn't prompt you with "are you sure", and there could be very rare cases where I don't want to delete the workspace. Is there a way to make the cleaning of the workspace automatic but still prompt you? I guess I can always just try to remember to manually do rm(list=ls()) when I start up in whatever workspace I am in, but I don't think I can trust my memory. Thanks.
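A minimal sketch of the explicit save() and load() workflow mentioned in the reply. The file name and the model are placeholders of my own; the idea is only that expensive results are cached deliberately rather than through the automatically restored workspace.

```r
# Hypothetical cache location; a real script would use a project directory.
cache_file <- file.path(tempdir(), "fit.rda")

if (file.exists(cache_file)) {
  load(cache_file)                        # restores the object named 'fit'
} else {
  fit <- lm(dist ~ speed, data = cars)    # stand-in for an expensive computation
  save(fit, file = cache_file)
}

# Simulate a fresh session started with --no-restore: nothing is carried over
# unless it is loaded explicitly.
rm(fit)
load(cache_file)
```

Because only named objects are saved, stale variables from earlier sessions cannot leak into a new one, which is the problem described in the question.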
Re: [R] random forest regression
One way is to graft the stratified sampling code from the classification part onto the regression part. I will get to it eventually, but just not now.

Andy

From: Naiara Pinto

Dear all,

I am doing a regression in randomForest, using the option sampsize to reduce the number of records used to produce the randomForest object. The manual says: "For classification, if sampsize is a vector of the length the number of strata, then sampling is stratified by strata, and the elements of sampsize indicate the numbers to be drawn from the strata." I need my sampling to be done with factors, but I am doing a regression. Does anyone know a way to do that?

Thanks,
Naiara.
Re: [R] CPU or memory
My understanding is that it doesn't have much to do with 32- vs. 64-bit, but with the instruction sets of the CPUs. If I'm not mistaken, at the same clock speed, a P4 would run slower than a PIII simply because the P4 does less per clock cycle. Also, I believe that for the same architecture, single-core chips are available at higher clock speeds than their multi-core counterparts. That's why we recently went for a box with four single-core Opterons instead of two dual-core ones.

64-bit PCs should be really affordable: I've seen HP laptops based on the Turion chip selling below $500US.

Andy

From: John C Frain

I would like to thank all who replied to my question about the efficiency of various CPUs in R. Following the advice of Bogdan Romocea I have put a sample simulation and the latest version of R on a USB drive and will go to a few suppliers to try it out. I will report back if I find anything of interest.

With regard to 64-bit and 32-bit, I thought that the 64-bit chip might require fewer clock cycles for a specific machine instruction than a 32-bit one. This was one of the advantages of moving from 8 to 16 or from 16 to 32 bit chips. Thus a slower, in terms of clock speed, 64-bit chip might run faster than a somewhat similar 32-bit chip. I fully realize that the full advantage of a 64-bit chip is available only with a 64-bit operating system and I am preparing to switch some work to Linux in case I acquire a 64-bit PC. If I do, I will time the simulations on that also.

I already do some coarse-grained parallelism as described by Brian Ripley, but on two separate PCs. This is not ideal but allows the processing time to be halved without the overheads.

FORTRAN 2 was my first programming language and I agree that I should try to use C or FORTRAN to speed things up.

Finally, Rprof could be a great help. There are lots of utilities in the utils package with which I was not familiar.

Again, many thanks to all who made various suggestions.

bogdan romocea, to r-help, 07-Nov:

Does any one know of comparisons of the Pentium 9x0, Pentium(r) Extreme/Core 2 Duo, AMD(r) Athlon(r) 64, AMD(r) Athlon(r) 64 FX/Dual Core AM2 and similar chips when used for this kind of work.

On 08/11/06, Prof Brian Ripley wrote:

On Wed, 8 Nov 2006, Christos Hatzis wrote:

> Prof. Ripley,
>
> Do you mind providing some pointers on how coarse-grained parallelism could be implemented in a Windows environment? Would it be as simple as running two R-console sessions and then (manually) combining the results of these simulations? Or would it be better to run them as batch processes?

That is what I would do in any environment (I don't do such things under Windows since all my fast machines run Linux/Unix).

Suppose you want to do 10000 simulations. Set up two batch scripts that each run 5000, and save() the results as a list or matrix under different names, and set a different seed at the top. Then run each via R CMD BATCH simultaneously. When both have finished, use an interactive session to load() both sets of results and merge them.

> RSiteSearch('coarse grained') did not produce any hits so this topic might not have been discussed on this list.
>
> I am not really familiar with running R in any mode other than the default (R-console in Windows) so I might be missing something really obvious. I am interested in running Monte-Carlo cross-validation in some sort of a parallel mode on a dual core (Pentium D) Windows XP machine.
>
> Thank you.
> -Christos
>
> Christos Hatzis, Ph.D.
> Nuvera Biosciences, Inc.
> 400 West Cummings Park Suite 5350
> Woburn, MA 01801
> www.nuverabio.com
>
> -----Original Message-----
> On Behalf Of Prof Brian Ripley
> Sent: Wednesday, November 08, 2006 5:29 AM
> To: Stefan Grosse
> Cc: r-help@stat.math.ethz.ch; Taka Matzmoto
> Subject: Re: [R] CPU or memory
>
> On Wed, 8 Nov 2006, Stefan Grosse wrote:
>
> > 64bit does not make anything faster. It is only of use if you want to use more than 4 GB of RAM or if you need a higher precision of your variables.
> >
> > The dual core question: dual core is faster if programs are able to use that. What is sure is that R cannot (until now) make use of the two cores if you are stuck on Windows. It works excellently if you use Linux. So if you want dual core you should work with Linux (and then it's faster of course).
>
> Not necessarily. We have seen several examples in which using a multithreaded BLAS (the only easy way to make use of multiple CPUs under Linux for a single R process) makes things many times slower. For tasks that do not make heavy use of linear algebra, the advantage
[R] graphics ignore tabs in text
Dear R-help,

I seem to recall that I can use \t to get a tab in a string on a graphics device, but it doesn't seem to work. Try:

    lab <- "a\tb\tc"
    cat(lab, "\n")        # works in the console output
    plot(1:5, main=lab)   # no tabs in the title
    text(3, 3, lab)       # no tabs in the text

I get the same result in both the windows() and pdf() devices. Any ideas? This is the R-patched Windows binary just downloaded from CRAN.

    R version 2.4.0 Patched (2006-10-29 r39744)

Best,
Andy

Andy Liaw, PhD
Biometrics Research, Merck Research Labs
PO Box 2000 RY33-300, Rahway, NJ 07065
Re: [R] one problem about how to hold graphic with R
I'm not familiar with Matlab, but from what I know, "hold on" is used to overlay more stuff on the existing plot. In R such things are accomplished a bit differently: one puts up a plot, then uses things like lines(), points(), abline(), etc. to add to the existing plot. The closest thing to "hold on" in Matlab, I think, is par(new=TRUE).

Andy

From: Gavin Simpson

On Tue, 2006-10-31 at 21:36 +0800, yang baohua wrote:

> Sorry to disturb you, but can you help me to solve one little problem? I want to draw a graphic after another with R but I cannot find the first one after that. Do you know the command to hold the graphic with R? I remember with Matlab you may use "hold on". Thanks.

You don't say which OS. On MS Windows one can turn on a history of plots to the graphics device and replay your plots - it is in the menu bar, for example.

In all OSes, you can start up a new device to take the plot - which is what Matlab does IIRC, so you have two or more plot windows on screen at any one time. This is done like this:

    plot(1:10)
    x11()
    plot(1:20)
    x11()
    plot(rnorm(100))

See ?Devices

You can set a device to be active, i.e. switch around between plotting windows, using dev.set(), e.g.:

    > dev.cur()    # example from above leaves device 4 active
    X11
      4
    > dev.set(3)   # switch to device
    > dev.cur()    # check
    X11
      3
    > plot(sort(rnorm(100)))    # plot something new on this device

Is this what you were looking for?

HTH

G

--
Gavin Simpson
ECRC ENSIS, UCL Geography, Pearson Building, Gower Street, London, UK. WC1E 6BT.
[e] gavin.simpsonATNOSPAMucl.ac.uk
[w] http://www.ucl.ac.uk/~ucfagls/
[w] http://www.freshwaters.org.uk
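To illustrate the overlay approach from the top of this reply, a small sketch with made-up curves. The pdf(NULL) call is my own addition so that the code runs without a display; on an interactive session it can be dropped so the plot appears on screen.

```r
pdf(NULL)    # null device; drop this line to draw on screen

x <- seq(0, 2 * pi, length.out = 50)

plot(x, sin(x), type = "l", ylim = c(-1, 1), ylab = "y")  # starts the plot
lines(x, cos(x), lty = 2)                                 # adds a second curve
points(pi / 2, 1, pch = 19)                               # marks a single point
abline(h = 0, col = "gray")                               # adds a reference line

n_open <- length(dev.list())   # number of open devices while the plot is live
dev.off()
```

Only plot() starts a new picture; the other three calls draw on whatever is already there, which is roughly the effect of "hold on".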
Re: [R] regarding large csv file import [Broadcast]
TFM (the R Data Import/Export manual in this case) can be a better place to look than the archive. Specifying colClasses in read.csv() might help.

Andy

From: Gaurav Yadav

hi All,

I have a .csv file of size 272 MB and 512 MB of RAM, and I am working on Windows XP. I am not able to import the csv file. R hangs, meaning it stops responding; even SciViews hangs. I am using read.csv(FILENAME,sep=",",header=TRUE). Is there any way to import it? I have tried the archives already but was not able to make much sense of them.

thanks in advance

With Warm Regards :-)
Gaurav Yadav
Assistant Manager, Economic Research Surveillance Department, Clearing Corporation Of India Limited.
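A minimal sketch of the colClasses suggestion. The file and its three columns are invented for the example; with the real file, the classes have to match its actual columns. The nrows argument is added as well because the read.table help page notes that it helps memory use, even as a mild over-estimate.

```r
# Hypothetical small file standing in for the 272 MB one.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,price,label",
             "1,10.5,a",
             "2,11.25,b",
             "3,9.75,c"), csv_file)

# Declaring the column types up front spares read.csv() from guessing them.
dat <- read.csv(csv_file, header = TRUE,
                colClasses = c("integer", "numeric", "character"),
                nrows = 3)
str(dat)
```

Reading the label column as character rather than factor is a deliberate choice here: it avoids building a factor for a column that may have many distinct values.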
Re: [R] how to improve the efficiency of the following lapply codes [Broadcast]
Make good use of Rprof(): it has helped me a great deal in pinpointing bottlenecks where I would not have suspected them.

Cheers,
Andy

From: Weiwei Shi

    object.size(intersect.matrix)
    41314204

but my machine has 4 GB of memory, so it should be OK. After 12 hours it has finished 16k out of 60k, but it is still slowing down non-linearly. I am thinking of chopping the 60k into multiple 5k data.frames to run the program, but I am wondering whether there is a way around it.

    version
    platform       i686-pc-linux-gnu
    arch           i686
    os             linux-gnu
    system         i686, linux-gnu
    status
    major          2
    minor          3.1
    year           2006
    month          06
    day            01
    svn rev        38247
    language       R
    version.string Version 2.3.1 (2006-06-01)

    [EMAIL PROTECTED] ox]$ more /proc/meminfo
            total:       used:       free:  shared:   buffers:     cached:
    Mem:  4189724672  3035549696  1154174976        0  282836992  2057129984
    Swap: 4293586944   645042176  3648544768

    [EMAIL PROTECTED] ox]$ more /proc/cpuinfo
    processor  : 0
    vendor_id  : GenuineIntel
    cpu family : 15
    model      : 4
    model name : Intel(R) Xeon(TM) CPU 3.60GHz
    stepping   : 3
    cpu MHz    : 3591.419
    cache size : 2048 KB

thanks.

On 10/25/06, Weiwei Shi [EMAIL PROTECTED] wrote:

Hi,

I have a series of lda analyses using the following lapply function:

    n <- dim(intersect.matrix)[1]
    net1.lda <- lapply(1:(n), function(k) i.lda(data.list, intersect.matrix, i=k, w))

i.lda is the function that does the real lda analysis. intersect.matrix is an n x 1026 matrix; n can be a really huge number like 60k. The goal is to perform a random search. Building an n=120k matrix is impossible on my machine.

When n=5k, the task can be done in 30 min, while for n=60k it is estimated to take 5 days. So I am wondering where my coding problem is, which causes this nonlinearity.

If more info is needed, I will provide it.

thanks

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

Did you always know? No, I did not. But I believed...
---Matrix III
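A minimal sketch of how Rprof() can be wrapped around an lapply() call follows. The function slow_step() is a hypothetical stand-in for i.lda(), which is not shown in the thread, and its workload is made up only so that the profiler has something to sample.

```r
# Hypothetical stand-in for the real per-item work (i.lda in the thread).
slow_step <- function(k) {
  m <- matrix(rnorm(200 * 200), nrow = 200)
  sum(solve(crossprod(m) + diag(200)))
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                  # start sampling the call stack
res <- lapply(1:200, slow_step)
Rprof(NULL)                       # stop profiling

# by.self lists the functions in which the time was actually spent.
prof <- summaryRprof(prof_file)
head(prof$by.self)
```

Profiling a small run (for example n = 1k and n = 5k) and comparing the two summaries is one way to see which call grows faster than linearly.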
Re: [R] binom.test [Broadcast]
To quote one of the previous answers you've got: The formula you're using is the TV. The one binom.test() uses is the ballpark. Take your pick.

Andy

From: Ethan Johnsons

Thank you for the info. It helps. After all, it would be:

    0.1304348-1.96*(sqrt((0.1304348*(1-0.1304348))/46))
    [1] 0.03310968
    0.1304348+1.96*(sqrt((0.1304348*(1-0.1304348))/46))
    [1] 0.2277599

Does R have a function for the calculation above?

ej

On 10/20/06, Francisco J. Zagmutt [EMAIL PROTECTED] wrote:

Ethan,

You need to explain why you think this is not the right function to use. R is doing exactly what you are asking it to do. Now is up to you to choose the methodology you feel is correct. For a good discussion on your particular issue I recommend you the following reference:

A. Agresti and B. A. Coull, Approximate is better than exact for interval estimation of binomial proportions, The American Statistician, vol. 52, no. 2, pp. 119-126, 1998.

Once you figure out the right function to use see if the function is available in R. If not readily available, and if after searching through R's documentation and the forum archives you still can't find a way to perform the calculation, then is time to get back to this forum.

Regards,
Francisco

Dr. Francisco J. Zagmutt
College of Veterinary Medicine and Biomedical Sciences
Colorado State University

From: Ethan Johnsons [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Subject: [R] binom.test
Date: Fri, 20 Oct 2006 17:18:02 -0400

A quick question, please. 46 e coli lab samples are tested, 6 of them returned positive. So, the best point estimate for p is 6/46 = 0.1304348. For a 95% CI for p, I thought binom.test would give me the correct result, but it seems it is not the right function to use. What is the R function for this?
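The hand calculation quoted above is the normal-approximation (Wald) interval, while binom.test() reports the exact (Clopper-Pearson) interval. As a small sketch, the two can be put side by side for the 6-out-of-46 example; the variable names are illustrative.

```r
x <- 6    # positives
n <- 46   # samples tested
p_hat <- x / n

# Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
z <- qnorm(0.975)
wald <- p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)

# Exact (Clopper-Pearson) interval as reported by binom.test()
exact <- binom.test(x, n, conf.level = 0.95)$conf.int

wald
exact
```

The Wald limits agree with the 0.0331 and 0.2278 computed by hand in the thread (up to the rounding of 1.96); the exact interval differs because it is built from the binomial distribution itself rather than from a normal approximation.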
Re: [R] Box M test [Broadcast]
See http://finzi.psych.upenn.edu/R/Rhelp02a/archive/0.html

Andy

From: GRAHAM LEASK

Dear List

I am looking for a script that will calculate the Box M test to test the homogeneity of the variance/covariance matrix between two matrices. If anyone could send me the script I would appreciate it.

I am aware of the scepticism about this test, where due to extreme sensitivity a p value of 0.01 is recommended. Despite this however Box's M test is the established method for identification of stable strategic time periods within the strategic management literature and I would like the opportunity to use this method within either R or S plus.

Any help would be gratefully received.

Kind regards

Dr Graham Leask
Economics and Strategy Group
Aston Business School
Aston University
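For readers who cannot follow the archive link, here is a rough sketch of Box's M test using the usual chi-squared approximation. It is not the script from the linked post; box_m() is a hypothetical helper written for this illustration, and the iris subset is used only as stand-in data for two groups.

```r
# Hypothetical helper: Box's M test with the chi-squared approximation.
box_m <- function(X, group) {
  X <- as.matrix(X)
  group <- factor(group)                 # also drops unused levels
  p <- ncol(X); g <- nlevels(group); N <- nrow(X)
  n_i <- as.vector(table(group))

  # Per-group covariance matrices and the pooled covariance matrix
  covs <- lapply(split(as.data.frame(X), group), cov)
  pooled <- Reduce(`+`, Map(function(S, n) (n - 1) * S, covs, n_i)) / (N - g)

  logdet <- function(S) as.numeric(determinant(S, logarithm = TRUE)$modulus)
  M <- (N - g) * logdet(pooled) - sum((n_i - 1) * sapply(covs, logdet))

  # Small-sample correction factor and degrees of freedom
  corr <- (sum(1 / (n_i - 1)) - 1 / (N - g)) *
    (2 * p^2 + 3 * p - 1) / (6 * (p + 1) * (g - 1))
  stat <- (1 - corr) * M
  dof <- p * (p + 1) * (g - 1) / 2
  list(statistic = stat, df = dof,
       p.value = pchisq(stat, dof, lower.tail = FALSE))
}

# Stand-in data: two iris species, four measurements each
sel <- iris$Species != "setosa"
res <- box_m(iris[sel, 1:4], iris$Species[sel])
res
```

The caution in the question still applies: the test is very sensitive to non-normality, so a significant result should be read conservatively.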
Re: [R] Getting group size in a data frame [Broadcast]
Is this sort of what you want?

R> aggregate(df[2:3], df[1], function(x) sum(!is.na(x)))
  factor val1 val2
1     24    2    1

Andy

From: Ulrik Stervbo

Hi all,

I have a data frame with some measured values of some animals. Sometimes the measurement failed, resulting in a NA for a measurement and sometimes the animal died, resulting in NA for all measurements. I have several groups of animals. How do I find the size of each group with only alive animals? And how do I find the size of the groups for each measurement?

An example:

    l1 <- list(factor=c(24,24,24), val1=c(2, 3, NA), val2=c(4, NA, NA))
    df <- as.data.frame(l1)
    df$factor <- factor(df$factor)

The size of factors should be 2 and not 3. The number of measurement in val1 should be 2 and the number of measurements in val2 should be 1

Thanks in advance for any help and suggestions

Ulrik
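The aggregate() call answers the per-measurement half of the question. As a sketch of the other half, the number of animals still alive per group, one can count the rows that have at least one non-missing measurement. This reading of "alive" is an assumption taken from the question, and the helper names are illustrative.

```r
df <- data.frame(factor = factor(c(24, 24, 24)),
                 val1 = c(2, 3, NA),
                 val2 = c(4, NA, NA))

# Number of non-missing values per measurement and group
per_measure <- aggregate(df[2:3], df[1], function(x) sum(!is.na(x)))

# An animal counts as alive if at least one measurement is present
alive <- rowSums(!is.na(df[2:3])) > 0
group_size <- table(df$factor[alive])

per_measure
group_size
```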
Re: [R] Question about random sampling in R
When sampling with replacement (like ordinary bootstrap), each draw is done independently, and in each draw every point has equal probability of being drawn. When sampling without replacement (random permutation), all possible sequences (permutations) have equal probability of occurring. E.g., if the data is 1:2, then (1, 2) has the same probability of occurring as (2, 1).

Andy

From: tom soyer

Hi,

I looked up the help file on sample(), but didn't find the info I was looking for. When sample() is used to resample from a distribution, e.g., bootstrap, how does it do it? Does it use an uniform distribution, e.g., runif(), or something else?

And, when the help file says: sample(x) generates a random permutation of the elements of x (or 1:x), would I be correct if I translate the statement as follows: it means that the order of sequence, which was generated from a uniform distribution, would look like a random normal distribution.

Thanks,
Tom
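A short simulation can illustrate both statements. The seed and the sample sizes below are arbitrary choices made for this sketch.

```r
set.seed(1)
x <- 1:5

# With replacement: every draw picks each element with probability 1/5,
# so values may repeat (an ordinary bootstrap resample).
boot <- sample(x, replace = TRUE)

# Without replacement: a random permutation of x.
perm <- sample(x)

# For data 1:2, the permutations (1, 2) and (2, 1) should each occur
# about half of the time.
first <- replicate(10000, sample(1:2)[1])
prop_12 <- mean(first == 1)

boot
perm
prop_12
```

Nothing in either case resembles a normal distribution: the draws are uniform over the elements (with replacement) or over the permutations (without replacement).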
Re: [R] [Q] How to fit data to HORIZONTAL line [Broadcast]
The horizontal line can be fitted by lm(y ~ 1).

Andy

From: Young-Jin Lee

Dear R users

I posted a question about how to fit data to a straight line this afternoon. But I realized that my question was not correct because I needed to fit data to a HORIZONTAL line, not a ordinary straight line. I looked at lm method, but could not figure out how to fix the regression coefficient to 0. I also tried nls, but it did not work.

The reason I wanted to fit the data to a horizontal line is that I want to compare AIC/BIC values of two models (a simple straight line mode vs a nonlinear curve model). I thought that I can call aic(horizontal_fit_model) and aic (nonlinear_fit_model) to achieve this goal. If I can compute AIC/BIC value of a horizontal fit model without doing acutal fitting, that would be fine, too.

Thank in advance.

Young-Jin
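A minimal sketch with simulated data shows the intercept-only fit and the comparison the poster is after. Note that the R functions are spelled AIC() and BIC(); the straight-line model here merely stands in for whatever competing model is of interest.

```r
set.seed(42)
x <- 1:20
y <- 5 + rnorm(20)          # simulated data with no trend

fit_flat <- lm(y ~ 1)       # horizontal line: the intercept is mean(y)
fit_line <- lm(y ~ x)       # ordinary straight line, for comparison

coef(fit_flat)
aic_table <- AIC(fit_flat, fit_line)
bic_table <- BIC(fit_flat, fit_line)
aic_table
bic_table
```

AIC() and BIC() also accept nls fits, so a nonlinear model fitted to the same response can be added to the same call.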
Re: [R] MARS help?
Spencer,

MARS fits splines, not disconnected lines. Perhaps the strucchange package has facility to fit your data better.

Cheers,
Andy

From: [EMAIL PROTECTED] on behalf of Spencer Graves
Sent: Tue 10/17/2006 11:43 PM
To: R-help; Kurt Hornik
Subject: [R] MARS help? [Broadcast]

I'm trying to use mars{mda} to model functions that look fairly close to a sequence of straight line segments. Unfortunately, 'mars' seems to totally miss the obvious places for the knots in the apparent first order spline model, and I wonder if someone can suggest a better way to do this. The following example consists of a slight downward trend followed by a jump up after t1=4, following by a more marked downward trend after t1=5:

    Dat0 <- cbind(t1=1:10, x=c(1, 0, 0, 90, 99, 95, 90, 87, 80, 77))
    library(mda)
    fit0 <- mars(Dat0[, 1, drop=FALSE], Dat0[, 2], penalty=.001)
    plot(Dat0, type="l")
    lines(Dat0[, 1], fit0$fitted.values, lty=2, col="red")

Are there 'mars' options I'm missing or other software I should be using? I've got thousands of traces crudely like this of different lengths, and I want an automated way of summarizing similar traces in terms of a fixed number of knots and associated slopes for each linear spline segment max(0, t1-t.knot).

Thanks,
Spencer Graves
Re: [R] Automatic File Reading [Broadcast]
Works on all platforms:

    flist <- list.files(path=file.path("somedir", "somewhere"), pattern="[.]csv$", full.names=TRUE)
    csvlist <- lapply(flist, read.csv, header=TRUE)
    whateverList <- lapply(csvlist, whatever)

(full.names=TRUE makes list.files() return complete paths, so that read.csv() finds the files even when the folder is not the working directory.)

Andy

From: Richard M. Heiberger

Wensui Lui asks:

is there a similar way to read all txt or csv files with same structure from a folder?

On Windows I use this construct to find all files with the specified wild card name. I used the \\ in the file paths with the translate=FALSE, because the / in the DOS switches /w/B must not be translated. On Windows this picks up both lower and upper case filenames. A similar construct can be written for Unix.

    tmp <- shell('dir c:\\HOME\\rmh\\tmp\\*.R /w/B', intern=TRUE, translate=FALSE) ## msdos
    for (i in tmp) source(paste("c:\\HOME\\rmh\\tmp\\", i, sep=""))
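The list.files() approach can be tried end to end with the sketch below. The directory and the three small files are created in a temporary folder purely for illustration, and the final rbind() step assumes, as in the question, that all files share the same structure.

```r
# Illustrative setup: three small CSV files in a temporary folder.
dir_path <- file.path(tempdir(), "csv_demo")
dir.create(dir_path, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = 1:2, value = i * c(1, 2)),
            file.path(dir_path, paste0("part", i, ".csv")),
            row.names = FALSE)
}

# Find the files (with full paths) and read each into a data frame.
flist <- list.files(path = dir_path, pattern = "[.]csv$", full.names = TRUE)
csvlist <- lapply(flist, read.csv, header = TRUE)
names(csvlist) <- basename(flist)

# Same structure everywhere, so the pieces can be stacked.
combined <- do.call(rbind, csvlist)
combined
```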
Re: [R] Nested source() errors [Broadcast]
I've seen people doing that without problems. It is not something I'd like to do myself, precisely because when problems occur, it's difficult to figure out what went wrong. Such practice usually indicates that you ought to organize your functions better. (You _are_ writing functions, instead of just scripts?)

Andy

From: Pierce, Ken

Does anyone know of any issues with nesting source() calls within multiple scripts? I have at least one script which always finds errors when I source it but runs fine when run on its own. It contains source() calls to other scripts and it seems to fail during the first nested source() command.

Ken

Kenneth B. Pierce Jr.
Research Ecologist
Landscape Ecology, Modeling, Mapping and Analysis Team
PNW Research Station - USDA-FS
http://www.fsl.orst.edu/lemma/gnnfire
Re: [R] CI
Here's one way:

R> x <- c(6,11,5,14,30,11,17,3,9,3,8,8)
R> confint(lm(x~1), level=.9)
                 5 %    95 %
(Intercept) 6.546834 14.2865

Andy

From: Ethan Johnsons

I have a quick question, please. Does R have function to compute i.e. a 90% confidence interval for the mean for these numbers?

    mean (6,11,5,14,30,11,17,3,9,3,8,8)
    [1] 6

I thought pt or qt would give me the interval, but it seems not.

thx much.

ej
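As a sketch, the same interval can be obtained from t.test() or directly from qt(), which is presumably the route the poster was looking for. Note also that mean(6, 11, 5, ...) returns 6 because only the first argument is treated as data; the numbers have to be combined with c() first.

```r
x <- c(6, 11, 5, 14, 30, 11, 17, 3, 9, 3, 8, 8)
mean(x)   # the sample mean, about 10.42

# Three equivalent ways to get the 90% t interval for the mean
ci_lm <- confint(lm(x ~ 1), level = 0.9)
ci_t <- t.test(x, conf.level = 0.9)$conf.int
n <- length(x)
ci_manual <- mean(x) + c(-1, 1) * qt(0.95, df = n - 1) * sd(x) / sqrt(n)

ci_lm
ci_t
ci_manual
```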
Re: [R] CI
You did ask for CI of mean, so that's what you got. If you want CI for proportion, here are two (non-bootstrap) ways:

R> confint(lm(I(x == 1) ~ 1), level=.9)
                  5 %      95 %
(Intercept) 0.2666456 0.6133544

R> binom.test(sum(x == 1), length(x), conf.level=.9)

        Exact binomial test

data:  sum(x == 1) and length(x)
number of successes = 11, number of trials = 25, p-value = 0.69
alternative hypothesis: true probability of success is not equal to 0.5
90 percent confidence interval:
 0.2698531 0.6213784
sample estimates:
probability of success
                  0.44

I hope these are not HW problems?

Andy

From: Ethan Johnsons

Thank you so much for the feedback. The random numbers are working great. I have tried non-random numbers, and the outcome is not correct with confint. Is there a way to compute i.e. a 90% confidence interval for percent of 1? i.e. where 1 = apple; 2 = orange

    x
     [1] 2 2 2 2 2 1 1 2 2 1 2 1 2 2 2 1 1 1 1 1 1 1 2 2 2
    table (x)
    x
     1  2
    11 14
    x =11
    confint(lm(x~1), level=0.90)
                5 % 95 %
    (Intercept) NaN  NaN

ej

On 10/18/06, Liaw, Andy [EMAIL PROTECTED] wrote:

Here's one way:

R> x <- c(6,11,5,14,30,11,17,3,9,3,8,8)
R> confint(lm(x~1), level=.9)
                 5 %    95 %
(Intercept) 6.546834 14.2865

Andy

From: Ethan Johnsons

I have a quick question, please. Does R have function to compute i.e. a 90% confidence interval for the mean for these numbers?

    mean (6,11,5,14,30,11,17,3,9,3,8,8)
    [1] 6

I thought pt or qt would give me the interval, but it seems not.

thx much.

ej
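The two intervals for the proportion of 1s can be reproduced from the data posted in the thread with the sketch below. The 0/1 indicator is built explicitly with as.numeric() here, a small variation on the I(x == 1) form, so that the response is unambiguously numeric.

```r
x <- c(2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 2, 1, 2,
       2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2)

# Indicator of "apple" (value 1); its mean is the sample proportion
is_one <- as.numeric(x == 1)
mean(is_one)   # 0.44

# t-based interval on the indicator, and the exact binomial interval
ci_lm <- confint(lm(is_one ~ 1), level = 0.9)
ci_exact <- binom.test(sum(is_one), length(is_one), conf.level = 0.9)$conf.int

ci_lm
ci_exact
```

The NaN result in the question most likely comes from x having been overwritten with the single value 11: with one observation there are no residual degrees of freedom, so no interval can be computed.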
Re: [R] Help on Random forest [Broadcast]
Do provide a reproducible example, as the Posting Guide suggests. Try:

    library(randomForest)
    example(predict.randomForest)
    iris.pred <- predict(iris.rf, iris[ind == 2,], nodes=TRUE)
    str(iris.pred)
    attr(iris.pred, "nodes")

Andy

From: Rupendra

Hello all,

I am trying to explore random forest in R. What I want to do is get the node number in which the case falls in the tree of random forest. For that I am calling the predict method as:

    learn.pred <- predict(learn.rf, newdata=learn.data.x, norm.votes=TRUE, predict.all=TRUE, nodes=TRUE, type="response")

Studying the manual of random forest, I suppose that learn.pred$nodes should contain the node numbers, but there is no attributes called nodes in learn.pred object. I am not much experienced with R. Please help me to resolve this issue.

Thanks in advance,
Rupendra
Re: [R] how to convert all columns of a data frame into factors
Alternatively:

    x[] <- lapply(x, factor)

Recall that a data frame is a list, so lapply() is a natural choice.

Andy

From: Gabor Grothendieck

Try this:

    replace(BOD, TRUE, lapply(BOD, factor))

On 10/4/06, Weiwei Shi [EMAIL PROTECTED] wrote:

Hi,

I use apply

    apply(x, 2, factor)

but it does not work. please help. thanks.

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

Did you always know? No, I did not. But I believed...
---Matrix III
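A small sketch on a made-up data frame shows why apply() does not work here and why the x[] <- lapply(...) idiom does: apply() first converts the data frame to a matrix and returns a matrix, whereas assigning into x[] keeps the data frame and replaces its columns one by one.

```r
# Made-up example data
x <- data.frame(a = c(1, 2, 2),
                b = c("u", "v", "u"),
                stringsAsFactors = FALSE)

# apply() works on as.matrix(x); the result is a character matrix
m <- apply(x, 2, factor)
class(m)

# lapply() visits each column; x[] <- keeps the data frame structure
x[] <- lapply(x, factor)
str(x)
```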
[R] how do I tell configure where to find Java?
Dear R-help,

I'm trying to build R-2.4.0 on our Opteron-based Scyld cluster. The system has gcj (the GNU Java compiler, part of GCC) stuff in /usr/bin. When I installed jdk 1.5.08, the install script placed it in /usr/java (I didn't have a choice, as the script didn't offer that option). Now when I run configure in R-2.4.0, it finds gcj, which is not what I want to use. Is there a way to tell configure where to look for Java? I tried configure --help but didn't see anything related to Java.

Best,
Andy

Andy Liaw, PhD
Biometrics Research      PO Box 2000 RY33-300
Merck Research Labs      Rahway, NJ 07065
andy_liaw(a)merck.com    732-594-0820
Re: [R] how do I tell configure where to find Java? [Broadcast]
Before I do that, I would need to remove the gcj stuff that are in /usr/bin. If I know how to remove gcj, I'd gladly do that. However, for the particular version of the OS, the entire GCC seems to be bundled into one rpm, and I could not remove just the gcj component. Neither do I wish to mess with files that are part of some RPMs--- in my experience that's invitation for trouble later.

Best,
Andy

From: mike waters

I'm not familiar with gcj, but my initial reaction would be a ln -s for the relevant compiler executable from /usr/java into /usr/bin.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Liaw, Andy
Sent: 03 October 2006 19:40
To: r-help
Subject: [R] how do I tell configure where to find Java?

Dear R-help,

I'm trying to build R-2.4.0 on our Opteron-based Scyld cluster. The system has gcj (the GNU Java compiler, part of GCC) stuff in /usr/bin. When I installed jdk 1.5.08, the install script placed it in /usr/java (I didn't have a choice, as the script didn't offer that option). Now when I run configure in R-2.4.0, it finds gcj, which is not what I want to use. Is there a way to tell configure where to look for Java? I tried configure --help but didn't see anything related to Java.

Best,
Andy

Andy Liaw, PhD
Biometrics Research      PO Box 2000 RY33-300
Merck Research Labs      Rahway, NJ 07065
andy_liaw(a)merck.com    732-594-0820
Re: [R] how do I tell configure where to find Java?
Thanks to everyone who provided the info. I tried Martin Morgan's suggestion (adding JAVA_HOME=/where/jdk/install/itself to the list of variables defined after `configure'), and config.log shows that the desired Java is found.

The Scyld system is based on RH, but I believe it lags far behind FC. The JDK is from Sun, and didn't come as an RPM.

Best,
Andy

From: Peter Dalgaard

Logan Lewis [EMAIL PROTECTED] writes:

Andy,

On Tuesday 03 October 2006 3:30 pm, Liaw, Andy wrote:

Before I do that, I would need to remove the gcj stuff that are in /usr/bin. If I know how to remove gcj, I'd gladly do that. However, for the particular version of the OS, the entire GCC seems to be bundled into one rpm, and I could not remove just the gcj component. Neither do I wish to mess with files that are part of some RPMs--- in my experience that's invitation for trouble later.

[Logan Lewis:] The Red Hat way of dealing with different packages providing the same binaries is alternatives. You will see a bunch of links in /etc/alternatives, and the command /usr/sbin/alternatives allows you to switch between options that provide the same binaries. The trouble is that the Sun JDK package does not interface into this system, and doesn't show up as an option when you execute /usr/sbin/alternatives --config java.

[Peter Dalgaard:] Hmm... I actually have it, but how did it get there?

    [EMAIL PROTECTED] R]$ /usr/sbin/alternatives --config java

    There are 2 programs which provide 'java'.

      Selection    Command
    -----------------------------------------------
       1           /usr/lib/jvm/jre-1.4.2-gcj/bin/java
    *+ 2           /usr/lib/jvm/jre-1.5.0-sun/bin/java

    Enter to keep the current selection[+], or type selection number:
    failed to create /var/lib/alternatives/java.new: Permission denied

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Re: [R] levels of factor when subsetting the factor
You have at least two choices:

R> factor(fact[1:6])
[1] A A A B B B
Levels: A B
R> fact[1:6, drop=TRUE]
[1] A A A B B B
Levels: A B

HTH,
Andy

From: Afshartous, David

All,

When I take a subset of a factor the reduced factor still maintains all the original levels of the factor when say forming the key in a plot. The data is correct, but the variable still remembers the original levels. See below for reproducible code. Does anyone know how to fix this?

cheers,
dave

    fact = as.factor(c(rep("A", 3), rep("B", 3), rep("C", 3)))
    new.fact = fact[1:6]
    new.fact
    [1] A A A B B B
    Levels: A B C    ## should only show A B
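The two choices above can be checked with the sketch below. It also shows droplevels(), which later versions of R added for exactly this purpose and which was not available when this thread was written.

```r
fact <- factor(c(rep("A", 3), rep("B", 3), rep("C", 3)))

sub_keep  <- fact[1:6]                 # still carries level "C"
sub_refac <- factor(fact[1:6])         # re-create the factor
sub_drop  <- fact[1:6, drop = TRUE]    # drop unused levels while subsetting
sub_dl    <- droplevels(fact[1:6])     # newer convenience function

levels(sub_keep)
levels(sub_refac)
levels(sub_drop)
levels(sub_dl)
```

droplevels() also has a data frame method, which is convenient when several factor columns in a subset need the same treatment.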
Re: [R] 4^2 factorial help
If you really want the quadratic terms, you need to keep those variables as numeric, instead of factors. (You might also want to look into something like the central composite designs.) summary() and coef() on the resulting fitted object should give you what you need. Things like these are covered in the "An Introduction to R" manual...

Andy

From: [EMAIL PROTECTED]

To whom it may concern:

I am trying a factorial design on a system of mine that has two factors. Each factor was set at four different levels, with one replication for each of the combinations. My data is as follows:

          A    B  Response
     1  600  2.5  0.0257
     2  600  2.5  0.0254
     3  600    5  0.0217
     4  600    5  0.0204
     5  600   10  0.0191
     6  600   10  0.0210
     7  600   20  0.0133
     8  600   20  0.0139
     9  800  2.5  0.0312
    10  800  2.5  0.0317
    11  800    5  0.0307
    12  800    5  0.0309
    13  800   10  0.0330
    14  800   10  0.0318
    15  800   20  0.0225
    16  800   20  0.0234
    17 1000  2.5  0.0350
    18 1000  2.5  0.0352
    19 1000    5  0.0373
    20 1000    5  0.0361
    21 1000   10  0.0432
    22 1000   10  0.0402
    23 1000   20  0.0297
    24 1000   20  0.0306
    25 1200  2.5  0.0324
    26 1200  2.5  0.0326
    27 1200    5  0.0353
    28 1200    5  0.0353
    29 1200   10  0.0453
    30 1200   10  0.0436
    31 1200   20  0.0348
    32 1200   20  0.0357

I am able to enter my data into R and obtain an ANOVA table (which I have been able to verify as correct using an Excel spreadsheet), using the following syntax:

    Factorial <- data.frame(
        A=c(rep(c(600, 600, 600, 600, 800, 800, 800, 800, 1000, 1000, 1000, 1000, 1200, 1200, 1200, 1200), each=2)),
        B=c(rep(c(2.5, 5, 10, 20, 2.5, 5, 10, 20, 2.5, 5, 10, 20, 2.5, 5, 10, 20), each=2)),
        Response = c(0.0257, 0.0254, 0.0217, 0.0204, 0.0191, 0.021, 0.0133, 0.0139,
                     0.0312, 0.0317, 0.0307, 0.0309, 0.033, 0.0318, 0.0225, 0.0234,
                     0.035, 0.0352, 0.0373, 0.0361, 0.0432, 0.0402, 0.0297, 0.0306,
                     0.0324, 0.0326, 0.0353, 0.0353, 0.0453, 0.0436, 0.0348, 0.0357))
    anova(aov(Response~A*B, data=Factorial))

However, this is as far as I am able to go. I would like to obtain the coefficients of my model, but am unable to. I would also like to use other non-linear models, as the effects of these factors are not linear. I would also like to add A^2 and B^2 to the ANOVA and the modeling.

Could you please help in this regard and offer some advice? Your help is much appreciated.

Yours sincerely,
Leslie Correia
Department of Process Engineering
University of Stellenbosch
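A minimal sketch of the advice follows, using the data from the question with A and B kept numeric. The particular model formula (main effects, the interaction, and both squared terms) is one plausible choice for illustration, not necessarily the model the poster will end up with.

```r
# Data from the question: 4 x 4 levels, two replicates per combination
Factorial <- data.frame(
  A = rep(c(600, 800, 1000, 1200), each = 8),
  B = rep(rep(c(2.5, 5, 10, 20), each = 2), times = 4),
  Response = c(0.0257, 0.0254, 0.0217, 0.0204, 0.0191, 0.0210, 0.0133, 0.0139,
               0.0312, 0.0317, 0.0307, 0.0309, 0.0330, 0.0318, 0.0225, 0.0234,
               0.0350, 0.0352, 0.0373, 0.0361, 0.0432, 0.0402, 0.0297, 0.0306,
               0.0324, 0.0326, 0.0353, 0.0353, 0.0453, 0.0436, 0.0348, 0.0357))

# I() protects ^ so that it means "square" rather than formula crossing
fit <- lm(Response ~ A * B + I(A^2) + I(B^2), data = Factorial)

coef(fit)      # estimated coefficients
summary(fit)   # standard errors, t tests, R-squared
anova(fit)     # sequential ANOVA table including the quadratic terms
```

Without I(), a term such as A^2 in a formula is interpreted as a crossing of A with itself and reduces to A, which is why the squared terms seem to vanish when written naively.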