Re: [R] Questions on pareto
Hi, thanks a lot. I think this is what I want. The actuar package has more distributions. Zero

David Winsemius wrote:

On Sep 15, 2009, at 1:09 PM, TsubasaZero wrote:

Hi all, I have generated 1000 random variates (u,v), and I would like to find the corresponding (x,y) for a bivariate Pareto distribution, where x = inverse Pareto of u and y = inverse Pareto of v. What code should I use to find (x,y)?

Perhaps:

??Pareto

On my machine it offers a choice of two packages (actuar and VGAM) that provide Pareto functions. But ?? only searches installed packages, so this would be more general:

library(sos)
???Pareto
retrieving page 1: found 187 matches; retrieving 10 pages 2 3 4 5 6 7 8 9 10

Or the old-fashioned way with r-search... hint: put it on your browser toolbar:

http://search.r-project.org/cgi-bin/namazu.cgi?query=pareto&max=100&result=normal&sort=score&idxname=functions&idxname=Rhelp08&idxname=views

-- David Winsemius, MD Heritage Laboratories West Hartford, CT

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
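For the original question (turning uniform variates into Pareto variates by the inverse-CDF method), a minimal sketch using actuar's qpareto() might look like the following; the shape and scale values are arbitrary illustrative assumptions, not from the thread:

```r
## Inverse-transform sampling: qpareto() maps Uniform(0,1) draws to Pareto draws.
## shape = 2, scale = 1 are made-up values for illustration only.
library(actuar)
set.seed(1)
u <- runif(1000)
v <- runif(1000)
x <- qpareto(u, shape = 2, scale = 1)
y <- qpareto(v, shape = 2, scale = 1)
```

Note that this treats the two margins independently; a genuinely bivariate Pareto would need a joint structure (e.g. a copula) linking u and v.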
[R] Rounding to the nearest 5
Hi all, I want to round my values to the nearest 5. For example, if I have a value of 11, I want R to round it to 10. I have been searching for this command on the web and in the help file, but I couldn't find anything useful. Any help would be greatly appreciated. Many thanks, Chris
[R] best method to format output of frequency table
I have some security alert log data that I'm parsing and doing some stats on. One of the fields is Classtype, which is the enumerated value of the type of alert found.

classtypes <- factor(alerts$Classtype)
fclass_types <- table(classtypes)

fclass_types gives me a frequency table of the intrusion types:

> fclass_types
classtypes
attempted-admin attempted-recon
18 93 35
attempted-user misc-activity misc-attack
21 30 12
protocol-command-decode unsuccessful-user web-application-activity
22 287

I would like to be able to somehow output this table in the form:

attempted-admin 93
attempted-recon 35
attempted-user 21
misc-activity 30

and so on. Is there a function I'm missing or some option that will let me dump fclass_types out in a two-column format? Thanks for any tips, Landon
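A sketch of one common idiom for this (not taken from the thread itself): converting the table to a data frame yields exactly the two-column name/count layout.

```r
## as.data.frame() on a 1-D table gives one column of level names (Var1)
## and one column of counts (Freq). Sample data is made up.
fclass_types <- table(c("attempted-admin", "attempted-admin", "misc-activity"))
out <- as.data.frame(fclass_types)
print(out, row.names = FALSE)
```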
Re: [R] cluster-lite
The R.batch package does most of this. It's been several years, but I used it to run batch jobs on multiple (30-40) machines with different OSes but sharing the same file system. No communication needed between hosts. It worked like a charm and I could process 3-4 CPU years in a few weeks for some bootstrap simulations. It has exception handling and much more. See also the r-help thread 'R.batch (Was: Re: [R] Calling R from R and specifying wait until script is finished)' on May 22, 2005: https://www.stat.math.ethz.ch/pipermail/r-help/2005-May/071981.html The installation is no longer as documented there; instead you can grab it from R-Forge: http://r-forge.r-project.org/R/?group_id=428 This very moment the R-Forge server seems to be down, so you can also install a copy via:

source("http://www.braju.com/R/hbLite.R")
hbLite("R.batch")

Then try this:

library(R.batch)
example(JobBatch)

See the FileProgressBar class in the R.utils package for reporting progress via the file size (0-100 bytes), which allows you to use 'ls -l' (or ftp remotely) to check the progress of your jobs. Feel free to do whatever you want with it. /Henrik

On Tue, Sep 15, 2009 at 5:01 PM, ivo welch ivo_we...@brown.edu wrote: I am about to write a cluster-lite R solution for myself. I wanted to know whether it already exists. If not, I will probably write up how I do this, and I will make the code available. Background: we have various Linux and OS X systems, which are networked, but not set up as a cluster. I have no one here to set up a cluster, so I need a hack that facilitates parallel programming on standard networked machines. I have accounts on all the machines, ssh access (of course password-less), and networked file directory access.
What I am ultimately trying to accomplish is built around a simple function that my master program would invoke:

master.R:

multisystem(c("R slv.R 1 20 file1.out",
              "R slv.R 21 40 file2.out",
              "ssh anotherhost R slv.R 41 80 file3.out"),
            announce = 300)

multisystem() should submit all jobs simultaneously and continue only after all are completed. It should also tell me every 300 seconds what jobs it is still waiting for, and which have completed. With basically no logic in the cluster, my master and slv programs have to make up for it. master.R must have the smarts to know where it can spawn jobs and how big each job should be. slv.R must have the smarts to place its outputs into the marked files on the networked file directory. master.R needs the smarts to combine the outputs of all jobs, and to resubmit jobs that did not complete successfully. Again, the main reason for doing all of this is to avoid setting up a cluster across OS X and Linux systems, and still to make parallel processing across Linux/OS X as easy as possible. I don't think it gets much simpler than this. Now, I know how to write multisystem() in Perl, but not in R. So if I roll it myself, I will probably rely on a mixed R/Perl system here. This is not desirable, but it is the only way I know how to do this. If something like multisystem() already exists in native R, please let me know and save me from reinventing the wheel. If it does not, some Perl/R combo for this soon will. Regards, /iaw

-- Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
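For what it's worth, a rough R-only sketch of the multisystem() idea (hypothetical code, not an existing function): each command is launched in the background via the shell and signals completion by touching a flag file, which the master polls.

```r
## Hypothetical sketch of multisystem(): run shell commands in parallel,
## report every `announce` seconds, return once all have finished.
## Assumes a POSIX shell; no error handling or resubmission.
multisystem <- function(cmds, announce = 300) {
  flags <- tempfile(rep("job", length(cmds)))   # one flag file per job
  for (i in seq_along(cmds)) {
    # wrap each command so a flag file appears when it exits
    system(sprintf("(%s; touch %s) &", cmds[i], flags[i]), wait = FALSE)
  }
  repeat {
    done <- file.exists(flags)
    if (all(done)) break
    message("still waiting for job(s): ", paste(which(!done), collapse = ", "))
    Sys.sleep(announce)
  }
  invisible(NULL)
}
```

A real version would also record exit codes (e.g. by writing them into the flag files) so the master can resubmit failed jobs.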
[R] How to calculate min of a dataset that contains empty column
Hi all, I have got a dataset like the following:

a b c
1   5
2
3   7

I am taking the minimum of each column and use it as the minimum of the y-axis of my graphs. My scripts (simplified version) are like the following:

f <- array()
f[1] <- "a"
f[2] <- "b"
f[3] <- "c"
for (i in 1:3) {
  name <- f[i]
  ymin <- min(dataset[[name]])
  plot(x, y, ylim = c(ymin, 100))
}

The script stops at b, because the min function returns an Inf value. What can I do to overcome it? Many thanks, Chris
[R] VAR analysis
[[alternative HTML version deleted]]
Re: [R] Rounding to the nearest 5
x5 <- 5 * round(x / 5)

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Chris Li [chri...@austwaterenv.com.au] Sent: 16 September 2009 09:15 To: r-help@r-project.org Subject: [R] Rounding to the nearest 5
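A quick check of that one-liner (illustrative values only; note that round() in R rounds half to even, so exact .5 multiples such as 12.5 snap to the even multiple):

```r
## Round to the nearest multiple of 5 by scaling, rounding, and scaling back.
x <- c(11, 13, 18)
x5 <- 5 * round(x / 5)
x5   # 10 15 20
```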
Re: [R] How to calculate min of a dataset that contains empty column
Let's say your data.frame is called xx. Then try:

apply(xx, 2, function(x) { min(na.omit(x)) })

Does it work?

-- My contact information: Tal Galili Phone number: 972-50-3373767 FaceBook: Tal Galili My Blogs: http://www.r-statistics.com/ http://www.talgalili.com http://www.biostatistics.co.il
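One caveat with that suggestion: na.omit() on an all-NA column leaves a zero-length vector, and min() of a zero-length vector still returns Inf (with a warning). A sketch that guards against this, using made-up data shaped like the original question:

```r
## Column b is entirely NA; return NA for it instead of Inf.
xx <- data.frame(a = c(1, 2, 3), b = c(NA, NA, NA), c = c(5, NA, 7))
mins <- sapply(xx, function(x) if (all(is.na(x))) NA else min(x, na.rm = TRUE))
mins   # a = 1, b = NA, c = 5
```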
Re: [R] how to load only lines that start with a particular symbol
On Tue, 15-Sep-2009 at 02:44PM -0700, William Dunlap wrote:
| perl can do more complicated processing and filtering than grep.

I once used Perl to filter the useful 4 or 5 MB out of a file that was over 250 MB. It took about 3 lines of Perl code and about 40 seconds to run. Perl's not exactly my strong point, but I could look it up next time I'm on that machine if anyone is interested.

--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
   ___    Patrick Connolly
 {~._.~}  Great minds discuss ideas
 _( Y )_  Average minds discuss events
(:_~*~_:) Small minds discuss people
 (_)-(_)  ... Eleanor Roosevelt
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
[R] Registration
Hi, I have a question to post, but I have a problem finding the appropriate forum. Some of the FAQs on pastecs are answered by the r-h...@stat.math.ethz.ch mailing list. I searched to see if this mailing list requires registration or not, but there was no information about that. I am a registered member of the R-help general mailing list, but I am aware it is better to post my question to a specific group in some cases. If your mailing list requires registration, please help me with the web page where I could do that. Please also advise if your forum attends to questions on use of the pastecs package or if there is a better forum for that. Thank you. Ogbos
[R] how can we check file empty or not
Hi All, I need to check whether a file is empty or not. Please help me in this regard. Best regards, Deepak.M.R
Re: [R] how can we check file empty or not
deepak m r wrote: Hi All, I need to check whether a file is empty or not. Please help me in this regard.

Hi,

file.info(your_file)$size == 0

cheers, Paul

-- Drs. Paul Hiemstra Department of Physical Geography Faculty of Geosciences University of Utrecht Heidelberglaan 2 P.O. Box 80.115 3508 TC Utrecht Phone: +3130 274 3113 Mon-Tue Phone: +3130 253 5773 Wed-Fri http://intamap.geo.uu.nl/~paul
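A quick illustration of that check (the file is a throwaway temp file, not from the thread):

```r
## file.info()$size is the file size in bytes; 0 means empty.
f <- tempfile()
file.create(f)                  # creates a zero-byte file
file.info(f)$size == 0          # TRUE
cat("hello\n", file = f)
file.info(f)$size == 0          # FALSE
```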
[R] Example data for analysis of a highly pseudoreplicate mixed-effects experiment
Hello everybody, it may be better to have sample data. I have provided data with fewer levels of gene and day and only ca. 400 data points per condition. Sample code:

small <- as.data.frame(read.csv("small.csv"))
small$species <- factor(small$species)
small$gene <- factor(small$gene)
small$day <- factor(small$day)
small$repl <- factor(small$repl)

library(lme4)
model1 <- lmer(APC ~ (FITC+species+gene)^3 + (1|day) + (1|repl), REML=FALSE, data=small)
model2 <- lmer(APC ~ (FITC+species+gene)^2 + (1|day) + (1|repl), REML=FALSE, data=small)
anova(model1, model2)

gives me a significant difference, but in fact there are far too many residual df, and this is much worse in the original data. I suppose I cannot trust this difference.

model3 <- lmer(APC ~ species*gene + (1|day/repl/FITC), data=small)

Error: length(f1) == length(f2) is not TRUE
In addition: Warning messages:
1: In FITC:(repl:day) : numerical expression has 886 elements: only the first used
2: In FITC:(repl:day) : numerical expression has 886 elements: only the first used

Can I do nesting without incurring this kind of error? And finally

model4 <- lmer(APC ~ gene*species + (1|day) + (1|repl) + (1+(gene:species)|FITC), data=small)
model5 <- lmer(APC ~ gene+species + (1|day) + (1|repl) + (1|FITC), data=small)
anova(model4, model5)

works with this reduced data set, but fails to converge for the original, much larger data set. I am unsure if this is the right kind of analysis for the data and there is only a problem of convergence, or if it is the wrong formula. Can anybody give me some advice on this problem? Or should I just stick to reducing the data from each condition to a single parameter, as explained in my first email below? Thank you again!

Matthias Gralle wrote: Hello everybody, I have been trying for some weeks to state the correct design of my experiment as a GLM formula, and have not been able to find something appropriate in Pinheiro & Bates, so I am posting it here and hope somebody can help me.
In each experimental condition, described by 1) gene (10 levels, fixed, because of high interest to me), 2) species (2 levels, fixed, because of high interest), 3) day (2 levels, random), and 4) replicate (2 levels per day, random), I have several thousand data points consisting of two variables: 5) FITC (level of transfection of a cell) and 6) APC (antibody binding to the cell). Because of intrinsic and uncontrollable cell-to-cell variation, FITC varies quite uniformly over a wide range, and APC correlates rather well with FITC. In some cases, I pasted day and replicate together as day_repl. My question is the following: Is there any gene (in my set of 10 genes) where the species makes a difference in the relation between FITC and APC? If yes, in which gene does species have an effect? And what is the effect of the species difference?

My attempts are the following:

1. Fit the data points of each experimental condition to a linear equation APC = Intercept + Slope*FITC and analyse the slopes:

lm(Slope ~ species*gene*day_repl)

This analysis shows clear differences between the genes, but no effect of species and no interaction gene:species. The linear fit to the cells is reasonably good, but of course does not represent the data set completely, so I wanted to incorporate the complete data set.

2a. lmer(APC ~ FITC*species*gene + (1|day) + (1|repl))

This gives extremely significant values for any interaction and variable because there are 200 000 df. Of course, it cannot be true, because the cells are not really independent. I have done many variations of the above, e.g.

2b. lmer(APC ~ FITC*species*gene + (1|day) + (1+FITC|day_repl)),

but they all suffer from the excess of df.

3. lmer(APC ~ species*gene + (1|day/repl/FITC))

gives several warning messages like this one: In repl:day : numerical expression has 275591 elements: only the first used

4. lmer(APC ~ gene*species + (1|day_repl) + (1+gene:species|FITC))

ran several days, but failed to converge...
Can somebody give me any hint, or do you think the only possible analysis is a simplification as in my model 1? By the way, I am using R version 2.8.0 (2008-10-20) on Ubuntu 8.04 with a Linux 2.6.24-24-generic kernel on different Intel systems. I am using the lme4 that came with R 2.8.0. Thank you very much for your time!

-- Matthias Gralle, PhD Dept. Evolutionary Genetics Max Planck Institute for Evolutionary Anthropology Deutscher Platz 6 04103 Leipzig, Germany Tel +49 341 3550 519 Fax +49 341 3550 555
[R] Error with JGR
Hi, I'm trying to run the JGR package (on Windows). I have followed all the steps on the web page http://www.rosuda.org/JGR but when I run jgr.exe it returns the error message: "Cannot find R.exe - please make sure R version 2.3.0 or higher is installed." I have R version 2.9.0 on my computer, so I don't know what the problem is... Thanks in advance for your help, Srpd
Re: [R] How to calculate min of a dataset that contains empty column
Presumably there is no reason to attempt plotting an empty column, so the best approach is probably to remove the columns that contain no values. Try

f.temp <- f[, -which(apply(f, 2, function(x) all(is.na(x))))]

and then run your script for f.temp instead of the original f. Also, you may find you have to use min(x, na.rm=TRUE) instead of just min(x) for any other columns that contain NA values as well as real values.
[R] extracting one element from all list elements
Hi, I have a list which consists of data frames of the same structure. Now I want to extract one column (let's say column A) from all data frames and have them in a matrix (or data frame). At the moment I am doing:

d <- data.frame(A=runif(100), B=rnorm(100))
theList <- list(e1=d, e2=d, e3=d, e4=d)
f <- sapply(theList, function(l){l$A})

But I am sure there is a more elegant way? Rainer

-- Rainer M. Krug, Centre of Excellence for Invasion Biology, Stellenbosch University, South Africa
Re: [R] extracting one element from all list elements
Try this:

sapply(theList, '[[', 'A')

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
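Both forms give the same result; a small check with made-up data:

```r
## `[[` indexed by name is equivalent to l$A for each list element;
## sapply() then binds the extracted columns into a matrix.
d <- data.frame(A = runif(5), B = rnorm(5))
theList <- list(e1 = d, e2 = d)
f1 <- sapply(theList, function(l) l$A)
f2 <- sapply(theList, `[[`, "A")
identical(f1, f2)   # TRUE: a 5 x 2 matrix with columns e1 and e2
```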
Re: [R] comma as decimal separator in xtable
On Wed, Sep 16, 2009 at 07:11:46AM +0200, Schalk Heunis wrote: This might be of help; first apply the formatting: print(xtable(prettyNum(d, decimal.mark=",")))

Thanks! Your solution works for the example that I gave. However, I failed to provide an example that really represents my problem, because I'm passing an lm object to xtable. Currently, I'm using the following function, which also puts the table header in bold font:

tabprint <- function(x, ...) {
  colsani <- function(x) { paste("{\\bf ", x, "}", sep = "") }
  p <- capture.output(print(x, caption.placement = "top",
                            sanitize.colnames.function = colsani, ...))
  writeLines(p, "/tmp/xtableOutPut")
  system("sed -i -e 's/\\([0-9]\\)\\.\\([0-9]\\)/\\1,\\2/g' /tmp/xtableOutPut")
  p <- readLines("/tmp/xtableOutPut")
  cat(p, sep = "\n")
}
tabprint(xtable(lm.model))

I could call gsub() if I knew the regular expression equivalent to the sed one that I'm using.

On Tue, Sep 15, 2009 at 5:06 PM, Jakson A. Aquino jaksonaqu...@gmail.com wrote: Hello, How can I make xtable print a comma as decimal separator? Setting the option OutDec isn't enough for xtable:

library(xtable)
options(OutDec = ",")
x <- c(1.1, 1.2, 1.3)
y <- c(2.3, 2.2, 2.1)
d <- data.frame(x, y)
d
print(xtable(d))

Thanks! Jakson Aquino
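For what it's worth, the gsub() equivalent of that sed substitution is straightforward, which would avoid the /tmp round-trip entirely (sample string made up):

```r
## Same pattern as the sed command: a digit, a literal dot, a digit,
## with backreferences keeping the digits and swapping "." for ",".
p <- "x & 1.23 & 45.6 \\\\"
gsub("([0-9])\\.([0-9])", "\\1,\\2", p)   # "x & 1,23 & 45,6 \\\\"
```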
Re: [R] error: could not find function sd
sselamat wrote: Hi, every time I try to get the standard deviation of a vector and enter the command sd(x), I get the reply "Error: could not find function sd". Can someone help me please? Thanks.

Hmmm. Please provide more information; it's hard to imagine how this could happen. Results of sessionInfo()??? Here's the only way I could get this to happen:

> find("sd")
[1] "package:stats"
> detach("package:stats")
> sd(1)
Error: could not find function "sd"
Re: [R] comma as decimal separator in xtable
Perhaps with the dcolumn LaTeX package?

http://newton.ex.ac.uk/research/qsystems/people/latham/LaTeX/dcolumn.pdf

David

2009/9/16 Jakson A. Aquino jaksonaqu...@gmail.com
Re: [R] installation problem
cls59 chuck at sharpsteen.net writes: wesley mathew wrote: Hello All, I have some problem installing XML_2.6-0.tar. I am working on Windows and my R version is R-2.9.1.

Unfortunately, I think there are some problems with CRAN being able to build the XML package for Windows; at least the page http://cran.r-project.org/web/packages/XML/index.html lists it as unavailable. However, I know I have installed it on a Windows machine at the university using install.packages() -- R may be set up to pull from some additional repository there. I'll check the next time I'm in the lab. -Charlie

As Charlie notes, the CRAN page lists the Windows version as unavailable, but links to a README file which details: "The packages BRugs, ncdf, RCurl, RDieHarder, RNetCDF, udunits, and XML do not build out of the box or are special in other circumstances. Nevertheless these are available at http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.9/ kindly provided by Professor Brian D. Ripley." Thus you can download the Windows zip file from there, and install it using the "Install package(s) from local zip files..." menu option. Michael Bibo, Queensland Health
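Alternatively (an untested sketch; the repository path is the one quoted from the README above), install.packages() can be pointed at that contrib directory directly via its contriburl argument, avoiding the manual zip download:

```r
## contriburl bypasses the usual repos + contrib-path construction and
## fetches the package list straight from the given directory.
install.packages("XML",
    contriburl = "http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.9/")
```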
Re: [R] comma as decimal separator in xtable
Try this also:

format(coef(summary(lm.D9)), decimal.mark = ',')

or using gsub:

apply(coef(summary(lm.D9)), 2, gsub, pattern = "\\.", replacement = ",")

using the lm.D9 object from the ?lm example.
-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
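The two suggestions in the thread can be tried without xtable at all. A minimal sketch (using a plain matrix rather than the lm.D9 coefficients, so it is self-contained):

```r
# Two ways to render numbers with a comma as decimal separator.
m <- matrix(c(1.1, 2.3, 1.2, 2.2), nrow = 2)

# 1. format() understands decimal.mark directly:
f1 <- format(m, decimal.mark = ",")

# 2. gsub() over the character representation, escaping the literal dot:
f2 <- apply(format(m), 2, gsub, pattern = "\\.", replacement = ",")

f1[1, 1]  # "1,1"
```

Either character matrix can then be handed to xtable(), which prints character cells verbatim.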
Re: [R] extracting one element from all list elements
On Wed, Sep 16, 2009 at 1:35 PM, Henrique Dallazuanna www...@gmail.com wrote: Try this: sapply(theList, '[[', 'A') Thanks Henrique - that is much more elegant. Rainer On Wed, Sep 16, 2009 at 8:34 AM, Rainer M Krug r.m.k...@gmail.com wrote: Hi I have a list which consists of data frames of the same structure. Now I want to extract one column (let's say column A) from all data frames and have them in a matrix (or data frame). At the moment I am doing:

d <- data.frame(A = runif(100), B = rnorm(100))
theList <- list(e1 = d, e2 = d, e3 = d, e4 = d)
f <- sapply(theList, function(l) l$A)

But I am sure there is a more elegant way? Rainer -- Rainer M. Krug, Centre of Excellence for Invasion Biology, Stellenbosch University, South Africa
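The '[[' idiom works because sapply passes each list element as the first argument to the extraction function `[[`, with 'A' as the second. A small sketch (with tiny made-up data frames for illustration):

```r
d <- data.frame(A = 1:3, B = 4:6)
theList <- list(e1 = d, e2 = d)

# sapply(l, '[[', 'A') calls l[[i]][['A']] for each element and
# column-binds the results into a matrix.
f <- sapply(theList, "[[", "A")
dim(f)       # 3 rows (observations) x 2 columns (list elements)
f[, "e1"]    # 1 2 3
```

The result is a matrix with one column per list element, exactly like the anonymous-function version.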
Re: [R] best method to format output of frequency table
Try this: as.matrix(table(x)) On Tue, Sep 15, 2009 at 6:50 PM, esawdust lan...@360vl.com wrote: I have some security alert log data that I'm parsing and doing some stats on. One of the fields is the Classtype, which is the enumerated value of the type of alert found.

classtypes <- factor(alerts$Classtype)
fclass_types <- table(classtypes)

fclass_types gives me a frequency table of the intrusion types: fclass_types classtypes attempted-admin attempted-recon 18 93 35 attempted-user misc-activity misc-attack 21 30 12 protocol-command-decode unsuccessful-user web-application-activity 2 2 287. I would like to be able to somehow output this table in the form:

attempted-admin 93
attempted-recon 35
attempted-user 21
misc-activity 30

and so on. Is there a function I'm missing, or some option that will let me dump fclass_types out in a two-column format? Thanks for any tips, Landon -- Henrique Dallazuanna
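Besides as.matrix(table(x)), as.data.frame() on a table gives an explicit two-column (label, count) layout, which may be closer to what is asked for. A minimal sketch with toy data:

```r
x <- c("a", "b", "a", "c", "a")
tab <- table(x)

# as.data.frame() turns the named counts into a two-column data frame:
df <- as.data.frame(tab)
df
#   x Freq
# 1 a    3
# 2 b    1
# 3 c    1

# cbind(tab) is another compact one-liner for a labelled column of counts.
```

Both forms write out cleanly with write.table() if the goal is a dump to file.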
[R] How can I replicate Excel's 'growth trend' series interpolation?
Good day all, I'm trying to replicate Excel's growth trend series interpolation, available by highlighting a number of cells and then under Edit--Fill--Series--Type=growth and select 'Trend'. For example, if my starting point is 1.4 and end point is 30 with the sequence length being 11, the resulting series generated by the above is as follows: 1.4 1.902073774 2.584203316 3.510960968 4.770076272 6.480740698 8.804890658 11.96253686 16.25259117 22.08116245 30 Any idea how this can be replicated in R would be much appreciated. Thanks in advance, George.
Re: [R] Error...
On Tue, Sep 15, 2009 at 9:05 AM, Inez Campbell i...@st-andrews.ac.uk wrote: Hi, I am trying to find in FAQ, in the manual, everywhere, the reason and most importantly, how to solve: Error: protect(): protection stack overflow It appeared after requesting a plot. That error message comes from deep inside the compiled code for the R interpreter. It may indicate that calls are nested too deeply or an infinite loop or ... We can't really tell without a reproducible example. (When an R object is created inside compiled code it must be protected from garbage collection, and this is done by putting its address onto a structure called the protection stack. When the object goes out of scope it is unprotected. Like most such structures, the protection stack has a finite capacity.)
Re: [R] How can I replicate Excel's 'growth trend' series interpolation?
exp(seq(log(1.4), log(30), length = 11))
 [1]  1.400000  1.902074  2.584203  3.510961  4.770076  6.480741  8.804891
 [8] 11.962537 16.252591 22.081162 30.000000
Re: [R] How can I replicate Excel's 'growth trend' series interpolation?
Wow :). Thanks a ton. On Wed, Sep 16, 2009 at 2:31 PM, Richard M. Heiberger r...@temple.edu wrote: exp(seq(log(1.4), log(30), length=11)) [...]
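The one-liner works because Excel's growth trend is a geometric series: each step multiplies by a constant ratio r = (end/start)^(1/(n-1)), which is exactly linear interpolation on the log scale. A sketch verifying the equivalence:

```r
start <- 1.4; end <- 30; n <- 11

# Constant ratio between consecutive terms of the geometric series:
r <- (end / start)^(1 / (n - 1))
series <- start * r^(0:(n - 1))

# Identical (up to floating-point rounding) to log-scale interpolation:
all.equal(series, exp(seq(log(start), log(end), length.out = n)))  # TRUE
```

series[2] reproduces Excel's 1.902073774 from the original post.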
Re: [R] coefficients of aov results has less number of elements?
On Tue, Sep 15, 2009 at 10:34 AM, Peng Yu pengyu...@gmail.com wrote: Hi, I run the following commands. 'A' has 3 levels and 'B' has 4 levels. Should there be totally 3+4 = 7 coefficients (A1, A2, A3, B1, B2, B3, B4)? If you were to define a model matrix for those coefficients it would be rank deficient (the sum of the first three columns is 1 and the sum of the last four columns is 1). You only have 6 degrees of freedom in total, not 7. Because of the -1 in the formula the first factor is expanded to indicator columns, but the second is represented with a set of 3 contrasts, which are the contr.treatment version by default.

a <- 3; b <- 4; n <- 1000
A <- rep(sapply(1:a, function(x) rep(x, n)), b)
B <- as.vector(sapply(sapply(1:b, function(x) rep(x, n)), function(x) rep(x, a)))
Y <- A + B + rnorm(a*b*n)
fr <- data.frame(Y = Y, A = as.factor(A), B = as.factor(B))
afit <- aov(Y ~ A + B - 1, fr)
afit$coefficients
      A1       A2       A3       B2       B3       B4
1.993067 2.969252 3.965888 1.005445 2.020717 3.023940

Regards, Peng
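The rank deficiency is easy to see directly: with full indicator coding for both factors, the columns are linearly dependent. A small sketch (with a scaled-down design, one observation per cell, just to show the ranks):

```r
A <- factor(rep(1:3, 4))
B <- factor(rep(1:4, each = 3))

# Full indicator coding for both factors gives 3 + 4 = 7 columns ...
X <- cbind(model.matrix(~ A - 1), model.matrix(~ B - 1))
ncol(X)       # 7
qr(X)$rank    # 6 -- the A columns sum to 1 and so do the B columns

# ... so R keeps indicators for the first factor and drops one level
# of the second (contr.treatment), leaving 6 estimable coefficients.
```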
Re: [R] Questions on pareto
The other package that I can think of that you might want to investigate, if you are attempting to construct bivariate distributions, is the copula package. -- David. On Sep 15, 2009, at 10:27 PM, TsubasaZero wrote: Hi, Thanks a lot. I think this is what I want. The actuar package has more distributions. Zero~ [...]
David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] error: could not find function sd
Ben Bolker wrote: sselamat wrote: Hi, Every time I try to get the standard deviation of a vector and enter the command sd(x), I get a reply Error: could not find function "sd". Can someone help me please? Thanks. Hmmm. Please provide more information; it's hard to imagine how this could happen. Results of sessionInfo()? Here's the only way I could get this to happen:

> find("sd")
[1] "package:stats"
> detach("package:stats")
> sd(1)
Error: could not find function "sd"

Or if you start R by specifying that stats is not among the default packages. Anyway, I guess it is a broken installation and re-installing will fix it. Uwe Ligges
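Uwe's diagnosis can be reproduced and repaired in one session; a sketch, assuming the only cause is a detached stats package:

```r
# sd() lives in package:stats, which is attached by default.
exists("sd")            # TRUE in a normal session
detach("package:stats") # simulate the broken state
exists("sd")            # FALSE -- "could not find function" territory
library(stats)          # re-attaching the package fixes it
exists("sd")            # TRUE again
```

In a genuinely broken installation, re-installing R (or calling stats::sd directly) is the remedy.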
Re: [R] installation problem
Michael Bibo wrote: cls59 chuck at sharpsteen.net writes: wesley mathew wrote: Hello All I have some problem installing XML_2.6-0.tar. I am working in Windows and my R version is R-2.9.1. Unfortunately, I think there are some problems with CRAN being able to build the XML package for Windows; at least the page http://cran.r-project.org/web/packages/XML/index.html lists it as unavailable. However, I know I have installed it on a Windows machine at the university using install.packages() -- R may be set up to pull from some additional repository there. I'll check the next time I'm in the lab. -Charlie As Charlie notes, the CRAN page lists the Windows version as unavailable, but links to a README file which details: The packages BRugs, ncdf, RCurl, RDieHarder, RNetCDF, udunits, and XML do not build out of the box or are special in other circumstances. Nevertheless these are available at http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.9/ kindly provided by Professor Brian D. Ripley. Thus you can download the Windows zip file from there and install it using the install package(s) from local zip files... menu option, or just say install.packages("XML"), as that CRAN extras repository is already a default under Windows. Best, Uwe Ligges Michael Bibo Queensland Health
Re: [R] comma as decimal separator in xtable
On Wed, Sep 16, 2009 at 01:59:29PM +0200, David Hajage wrote: Perhaps with the dcolumn package? http://newton.ex.ac.uk/research/qsystems/people/latham/LaTeX/dcolumn.pdf Thanks for the suggestion, but the problem isn't one of alignment. When I have commas as decimal separators, the values are aligned, because all values of each column were formatted with the same number of digits.
Re: [R] Registration
ogbos okike wrote: Hi, I have a question to post, but I have a problem finding the appropriate forum. Some of the FAQs on pastecs are answered by the r-h...@stat.math.ethz.ch mailing list. I searched to see if this mailing list requires registration or not; there was no information about that. I am a registered member of the R-help general mailing list ... which is the same as you found out now. So it is time to ask the question. Uwe Ligges ... but I am aware it is better to post my question to a specific group in some cases. If your mailing list requires registration, please help me with the web page where I could do that. Please also advise whether your forum attends to questions on the use of the pastecs package, or if there is a better forum for that. Thank you. Ogbos
[R] cycles in a graphical model
Hi, Is there any R package or existing code in R to detect cycles in a graphical model or DAG (Directed Acyclic Graph)? Thanks.
Re: [R] comma as decimal separator in xtable
But it seems that dcolumn can change the decimal separator too (see the table on the first page of the pdf document). For example, in a Sweave document:

\documentclass{article}
\usepackage{dcolumn}
\newcolumntype{d}[1]{D{.}{,}{#1}}
\begin{document}
<<results=tex>>=
x <- matrix(rnorm(4), 2, 2)
library(xtable)
xtable(x, align = c("l", "|", "d{2}", "|", "c", "|"))
@
\end{document}

2009/9/16 Jakson A. Aquino jaksonaqu...@gmail.com: [...]
[R] A matrix problem
Hi all, I have the following character matrix:

mat
     [,1]               [,2]                [,3]              [,4]
[1,] NA                 0.0671746073115122  1.15281464953731  0.822277316923348
[2,] 0.113184828073156  -0.0133150789005112 0.640912657072027 -0.0667317787225847
[3,] -1.40593584523871  1.10755549414758    0.493828059815449 -1.09233516877992
[4,] 0.577643850085066  1.10279525071026    1.16725625310315  0.367724768195794
[5,] 0.746100264271392  -0.335556133578362  NA                0.328559028366446

Now I want to convert it to a numeric matrix, so I used the following code:

as.numeric(mat)
 [1]         NA 0.11318483 -1.40593585 0.57764385 0.74610026 0.06717461 -0.01331508 1.10755549 1.10279525 -0.33555613
[11] 1.15281465 0.64091266 0.49382806 1.16725625         NA 0.82227732 -0.06673178 -1.09233517 0.36772477 0.32855903
Warning message: NAs introduced by coercion

What I noticed is that: 1. the original matrix was converted to a vector; 2. a warning message appears. I do not want such things to happen. Is there any direct way to convert my original character matrix to a numeric matrix? Thanks
Re: [R] cycles in a graphical model
On 9/16/2009 9:26 AM, shuva gupta wrote: Hi, Is there any R package or existing code in R to detect cycles in a graphical model or DAG (Directed Acyclic Graph)? Thanks. You might try this to search R packages: library(sos); findFn('Directed Acyclic Graph') -- Chuck Cleland, Ph.D. NDRI, Inc. (www.ndri.org) 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
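For a small graph, a cycle check needs no special package: Kahn's algorithm repeatedly deletes nodes with no incoming edges, and if nodes remain at the end the graph has a cycle. A base-R sketch over an adjacency matrix (function name and encoding are my own, for illustration):

```r
# TRUE if the directed graph given by adjacency matrix m (m[i, j] = 1
# for an edge i -> j) contains a cycle.
has_cycle <- function(m) {
  while (nrow(m) > 0) {
    # nodes with no incoming edges can never lie on a cycle
    roots <- which(colSums(m) == 0)
    if (length(roots) == 0) return(TRUE)  # every remaining node has a parent
    m <- m[-roots, -roots, drop = FALSE]  # delete them and repeat
  }
  FALSE
}

acyclic <- matrix(c(0, 0, 1, 0), 2, 2)  # edge 1 -> 2 only
cyclic  <- matrix(c(0, 1, 1, 0), 2, 2)  # edges 1 -> 2 and 2 -> 1
has_cycle(acyclic)  # FALSE
has_cycle(cyclic)   # TRUE
```

For large or sparse graphs, the packages that findFn() turns up will scale far better than this dense-matrix sketch.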
[R] How to extract a specific substring from a string (regular expressions) ? See details inside
Hi all, I have thousands of strings like these ones: 1159_1; YP_177963; PPE FAMILY PROTEIN 1100_13; SECRETED L-ALANINE DEHYDROGENASE ALD CAA15575 1141_24; gi;2894249;emb;CAA17111.1; PROBABLE ISOCITRATE DEHYDROGENASE and various others... I'm interested in extracting the code for the protein (in this example: YP_177963, CAA15575, CAA17111). I found only one common criterion to identify the protein codes in ALL my strings: I need a sequence of characters selected in this way: start: the first alphabetic capital letter followed after three characters by a digit; end: the last following digit before a non-digit character, or nothing. Tricky, isn't it? Well, I'm not an expert, and I played a lot with regular expressions and the sub() command with no big results. Also with substring.location in the Hmisc package (but there I don't know how to use regular expressions). Maybe there are other more useful functions, or maybe it is just a matter of using regular expressions in a better way... Can anybody help me? Thanks a lot in advance...
Re: [R] A matrix problem
On Sep 16, 2009, at 9:40 AM, Bogaso wrote: [...] Is there any direct way to convert my original character matrix to a numeric matrix? I'm not sure if this is a recommended method, but it is compact:

M <- matrix(as.character(c(NA, NA, rnorm(14))), ncol = 4)
M
     [,1]                [,2]                 [,3]                [,4]
[1,] NA                  -0.601520920256251   -1.00727470025995   -0.510937067339167
[2,] NA                  -0.643289410284616   -0.560094940433377  1.65539295869595
[3,] -1.58527121117001   -0.303929580683863   0.656380566243141   0.863929178838393
[4,] 0.466350502132621   1.85702073433996     0.349130873645143   -0.821650081436582

I put in both a real NA and a character NA just to be sure.

mode(M) <- "numeric"
Warning message: In eval(expr, envir, enclos) : NAs introduced by coercion
M
        [,1]    [,2]    [,3]    [,4]
[1,]      NA -0.6015 -1.0073 -0.5109
[2,]      NA -0.6433 -0.5601  1.6554
[3,] -1.5853 -0.3039  0.6564  0.8639
[4,]  0.4664  1.8570  0.3491 -0.8217

2. A warning message is there. You can suppress warnings: options(warn = -1), and then restore the warning level to the default of 0.
Thanks -- David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] list of symbols to substitution
Ah, whoops (and thank you). I forgot to mention an additional constraint: I'm trying to use exactly one eval statement. Some background information on this constraint: some of the expressions being evaluated may return an error. At the moment, I'm wrapping each of the individual expressions inside a try-catch block and an eval statement. I was hoping to find a solution that uses a single try-catch block and a single eval statement. Today I realized that a partial solution is to use a single try-catch block and multiple eval statements, as has been suggested. Here is the code snippet, slightly updated:

symnames <- c('foo', 'bar', 'baz')
foo <- expression(1 + 2)
bar <- expression(3 + 4)
baz <- expression(abs('c'))
expr <- substitute(expr, list(expr = lapply(symnames, as.symbol)))
eval(expr)

Anybody know of a solution that uses a single eval expression? --Michael On Wed, Sep 16, 2009 at 1:31 AM, Schalk Heunis schalk.heu...@enerweb.co.za wrote: Instead of eval(expr), use lapply(expr, eval) HTH Schalk Heunis On Wed, Sep 16, 2009 at 5:50 AM, Michael Spiegel michael.m.spie...@gmail.com wrote: Hi, I'm trying to use a list of symbols as one of the values to be substituted in a substitute expression, but I can't figure out how to write the correct expression for my problem. Let me illustrate a simple example of what I'm trying to do. The following code snippet will evaluate to 5:

symname <- 'foo'
foo <- 5
expr <- substitute(c(expr), list(expr = as.symbol(symname)))
eval(expr)

I would like the next, similar example to evaluate to the list (5, 6, 7), but instead it evaluates to the list (foo, bar, baz), where foo, bar, and baz are all of type symbol. For the purposes of finding a solution, assume that the length and contents of symnames are unknown ahead of time (symname is not a constant; assume it is some parameter to a function).
symnames <- c('foo', 'bar', 'baz')
foo <- 5
bar <- 6
baz <- 7
expr <- substitute(expr, list(expr = lapply(symnames, as.symbol)))
eval(expr)

Thanks! --Michael
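One way to keep a single eval is to build one unevaluated call that combines all the symbols, for example a call to list(). A sketch of that idea (not from the thread, just one possible answer to Michael's question):

```r
symnames <- c("foo", "bar", "baz")
foo <- 5; bar <- 6; baz <- 7

# Build the unevaluated call list(foo, bar, baz): the first symbol is the
# function, the rest become its arguments.
expr <- as.call(lapply(c("list", symnames), as.symbol))

# A single eval, so a single tryCatch() around it suffices.
vals <- eval(expr)
unlist(vals)  # 5 6 7
```

If any symbol's value errors on lookup or evaluation, the one surrounding tryCatch(eval(expr), error = ...) catches it.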
[R] a sequence that wraps around
I'd like to have something like seq() where I can pass in a length of the desired sequence and a right limit, so that the sequence goes up to the limit and then starts again from 1. # works now seq(from=2, length.out=3) [1] 2 3 4 # what I want seq(from=2, length.out=3, rlimit=3) [1] 2 3 1 # additional examples of what I want seq(from=2, length.out=4, rlimit=3) [1] 2 3 1 2 seq(from=2, length.out=4, rlimit=4) [1] 2 3 4 1 seq(from=2, length.out=3, rlimit=2) [1] 2 1 2 I can write this procedurally, but it seems like there ought to be a cleaner R way of doing it. Thanks in advance for any suggestions.
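Modular arithmetic gives a one-liner for the wrap-around: shift to a 0-based sequence, take the remainder, and shift back. A sketch (the function name rlimit/wrap_seq is my own):

```r
# Like seq(from, length.out = ...), but values wrap back to 1 after rlimit.
wrap_seq <- function(from, length.out, rlimit) {
  (seq(from = from - 1, length.out = length.out) %% rlimit) + 1
}

wrap_seq(2, 3, 3)  # 2 3 1
wrap_seq(2, 4, 3)  # 2 3 1 2
wrap_seq(2, 4, 4)  # 2 3 4 1
wrap_seq(2, 3, 2)  # 2 1 2
```

This reproduces all four desired outputs from the post.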
[R] Is there a way to round numbers up or down to the nearest natural looking number?
Hi all, Given the following series of numbers: t [1] 21.85000 30.90410 43.71000 61.82234 87.43999 123.67296 [7] 174.91997 247.40249 349.91996 494.91815 700.0 What's the simplest way of formatting them into the nearest natural looking (rounded to nearest multiple of 10) numbers? So: t[1] becomes 20, t[2] becomes 30, t[3] becomes 40, t[4] becomes 60, t[5] becomes 90, t[6] becomes 120, etc? Thus far I've tried signif(t,2), but that's not working great. Any help would be much appreciated. Thanks in advance, George.
Re: [R] A matrix problem
On 16-Sep-09 13:40:03, Bogaso wrote: [...] Is there any direct way to convert my original character matrix to a numeric matrix? Thanks

matn <- as.numeric(mat); dim(matn) <- dim(mat)

This does the job (without requiring you to use a specific indication of dimension, e.g. ncol=4), but the warning message will still be there unless, as David Winsemius said, you also use options(warn = -1). However, a warning message such as this does not mean that anything has gone wrong -- it is simply a notification that something which was not the character representation of a number was converted into NA. So you can ignore it. (It would lead to exactly the same result if, instead of NA in mat[1,1] and mat[5,3], you had some other string such as tiger -- try it! You would still get the warning, and this time there might even be some point to it ...). Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 16-Sep-09 Time: 15:13:39 -- XFMail --
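The same idea as Ted's two-statement fix can be written as a single expression that rebuilds the matrix, keeps any dimnames, and silences only the expected coercion warning. A sketch with a small made-up character matrix:

```r
mat <- matrix(c("1.5", "NA", "2.25", "3"), nrow = 2)

# Rebuild with the original shape; suppressWarnings() silences the
# expected "NAs introduced by coercion" from the non-numeric "NA" string.
matn <- suppressWarnings(
  matrix(as.numeric(mat), nrow = nrow(mat), dimnames = dimnames(mat))
)
matn
#      [,1] [,2]
# [1,] 1.50 2.25
# [2,]   NA 3.00
```

suppressWarnings() is scoped to this one call, so it avoids globally disabling warnings the way options(warn = -1) does.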
Re: [R] How to extract a specific substring from a string (regular expressions) ? See details inside
Try this:

library(gsubfn)
strapply(x, "[A-Z]{3}[0-9]+")

On Wed, Sep 16, 2009 at 10:53 AM, Giulio Di Giovanni perimessagg...@hotmail.com wrote: [...] -- Henrique Dallazuanna
Re: [R] Is there a way to round numbers up or down to the nearest natural looking number?
round(t, -1) Uwe Ligges Jorgy Porgee wrote: [...]
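round(x, -1) handles multiples of ten via its digits argument; for an arbitrary step, such as the nearest 5 asked about in the earlier "Rounding to the nearest 5" thread, divide by the step, round, and multiply back. A sketch (the helper name round_to is my own):

```r
# Round to the nearest multiple of an arbitrary step.
round_to <- function(x, step) round(x / step) * step

round_to(11, 5)                          # 10
round_to(c(21.85, 43.71, 123.67), 10)    # 20 40 120
round(21.85, -1)                         # 20 -- digits = -1 for powers of ten
```

Note that round() uses round-half-to-even, so e.g. round_to(12.5, 5) gives 10, not 15.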
Re: [R] How to extract a specific substring from a string (regular expressions) ? See details inside
This should do it for you:

pat <- ".*(\\b[A-Z]..[0-9]+).*"
grep(pat, x)
[1] 1 3 5
sub(pat, '\\1', x)
[1] "YP_177963" "CAA15575"  "CAA17111"

On Wed, Sep 16, 2009 at 9:53 AM, Giulio Di Giovanni perimessagg...@hotmail.com wrote: [...] -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
[R] Memory in R on windows running 64bit XP
Hi, I'm running 32-bit R on Windows XP 64-bit and the machine has 16Gb of RAM. The help for memory.limit states: If 32-bit R is run on some 64-bit versions of Windows the maximum value of obtainable memory is just under 4GB. So, using the help, which states the size parameter can go up to 4095: memory.limit(size=4095) When I run mclust I get: Error: cannot allocate vector of size 1.3 Gb So, now I set the max-mem-size flag: --max-mem-size=3500M Repeat, and I still get the same error: Error: cannot allocate vector of size 1.3 Gb So, can anyone help? 32-bit R is reported to be able to use up to 4Gb of RAM on 64-bit Windows, yet mine croaks at 1.3Gb. sessionInfo() R version 2.9.2 (2009-08-24) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] mclust_3.3.1 Thanks, Mick
Re: [R] a sequence that wraps around
Try rep: rep(2:4, length.out = 3, times = 10) On Wed, Sep 16, 2009 at 11:08 AM, Jack Tanner i...@hotmail.com wrote: I'd like to have something like seq() where I can pass in the length of the desired sequence and a right limit, so that the sequence goes up to the limit and then starts again from 1. # works now seq(from=2, length.out=3) [1] 2 3 4 # what I want seq(from=2, length.out=3, rlimit=3) [1] 2 3 1 # additional examples of what I want seq(from=2, length.out=4, rlimit=3) [1] 2 3 1 2 seq(from=2, length.out=4, rlimit=4) [1] 2 3 4 1 seq(from=2, length.out=3, rlimit=2) [1] 2 1 2 I can write this procedurally, but it seems like there ought to be a cleaner R way of doing it. Thanks in advance for any suggestions. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
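rep() recycles the values you give it, but it does not wrap back through 1 after the limit the way the examples above ask for. A small modular-arithmetic sketch (seq_wrap is a made-up helper name) that reproduces all four requested outputs:

```r
# wrap a sequence starting at `from` back around to 1 after `rlimit`:
# shift to 0-based, take the remainder, shift back to 1-based
seq_wrap <- function(from, length.out, rlimit) {
  (seq(from - 1, length.out = length.out) %% rlimit) + 1
}

seq_wrap(2, 3, 3)  # 2 3 1
seq_wrap(2, 4, 3)  # 2 3 1 2
seq_wrap(2, 4, 4)  # 2 3 4 1
seq_wrap(2, 3, 2)  # 2 1 2
```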
Re: [R] How to extract a specific substring from a string (regular expressions) ? See details inside
I was guessing that the .1 was part of the protein code in the third example, and looking at: http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0003840.s002 I see quite a few protein codes of that form. I am a complete ignoramus with regex strings, but I am guessing that the OP will need a . added to the termination pattern. Experimentation shows that simply adding a period after the 9 works for this example: pat <- ".*(\\b[A-Z]..[0-9.]+).*" -- David On Sep 16, 2009, at 10:15 AM, jim holtman wrote: This should do it for you: pat <- ".*(\\b[A-Z]..[0-9]+).*" grep(pat, x) [1] 1 3 5 sub(pat, '\\1', x) [1] "YP_177963" "CAA15575" "CAA17111" [...]
David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] Memory in R on windows running 64bit XP
On 9/16/2009 10:16 AM, michael watson (IAH-C) wrote: [question quoted in full above] You are misreading things in a couple of places. The most important one is the error message: it says allocation of a 1.3 Gb object failed. You may have 4 Gb available in total, but have used so much of it already that Windows won't allocate a 1.3 Gb piece to R. Using Rprofmem() might show how the allocations are happening. Another possibility is that your particular version of Windows doesn't support giving 4 Gb to R. Run memory.size(NA) to find out the limit, and memory.size() to find out the current amount in use. And even if the limit is close to 4 Gb and you have more than 1.3 Gb free, the allocation may fail because there is no sufficiently large contiguous block available. Once memory is allocated, that address is in use and unavailable. Since the 32-bit address space only covers 4 Gb, you can fairly easily end up with allocations that are spread out across it, leaving no large blocks available. That's the advantage of switching to 64-bit R: even if you still only had 4 Gb of memory available, the address space is so much bigger that fragmentation probably won't be a problem.
Duncan Murdoch
Re: [R] How to extract a specific substring from a string (regular expressions) ? See details inside
Assuming the rule is an upper-case alphabetic character followed by two other characters followed by a string of digits, then try this: library(gsubfn) strapply(x, "[A-Z][^ ][^ ][0-9]+") [[1]] [1] "YP_177963" [[2]] [1] "CAA15575" [[3]] [1] "CAA17111" If you prefer the output as one long vector of strings, try this: strapply(x, "[A-Z][^ ][^ ][0-9]+", simplify = c) [1] "YP_177963" "CAA15575" "CAA17111" If the string that denotes a protein can be part of a word which itself does not denote a protein, then we will need something like this: strapply(x, "\\b[A-Z][^ ][^ ][0-9]+\\b", perl = TRUE) [[1]] [1] "YP_177963" [[2]] [1] "CAA15575" [[3]] [1] "CAA17111" However, I would expect this second solution using perl's \b to be much slower, because the first one uses tcl code underneath whereas the second uses R code. See http://gsubfn.googlecode.com for more. On Wed, Sep 16, 2009 at 9:53 AM, Giulio Di Giovanni perimessagg...@hotmail.com wrote: [...]
Re: [R] How to extract a specific substring from a string (regular expressions) ? See details inside
That did not work on all three: strapply(x, "[A-Z]{3}[0-9]+") [[1]] NULL [[2]] [1] "CAA15575" [[3]] [1] "CAA17111" But adding a _ to the initiation pattern and a period to the termination pattern makes it complete: library(gsubfn) strapply(x, "[A-Z_]{3}[0-9.]+") [[1]] [1] "YP_177963" [[2]] [1] "CAA15575" [[3]] [1] "CAA17111.1" Maybe between the two of you and Jim Holtman, I can eventually learn how to use regular expressions. -- David. On Sep 16, 2009, at 10:14 AM, Henrique Dallazuanna wrote: Try this: library(gsubfn) strapply(x, "[A-Z]{3}[0-9]+") On Wed, Sep 16, 2009 at 10:53 AM, Giulio Di Giovanni perimessagg...@hotmail.com wrote: [...] David Winsemius, MD Heritage Laboratories West Hartford, CT
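The final pattern also works without the gsubfn package, using base R's regexpr()/regmatches() (a sketch on the three example strings; like David's version it keeps the trailing .1 of the third code):

```r
x <- c("1159_1; YP_177963; PPE FAMILY PROTEIN",
       "1100_13; SECRETED L-ALANINE DEHYDROGENASE ALD CAA15575",
       "1141_24; gi;2894249;emb;CAA17111.1; PROBABLE ISOCITRATE DEHYDROGENASE")

# three capitals/underscores followed by digits (and possibly a version suffix like .1)
m <- regexpr("[A-Z_]{3}[0-9.]+", x)
regmatches(x, m)
# "YP_177963" "CAA15575" "CAA17111.1"
```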
Re: [R] wilcox.test p-value = 0
That's right; if the test is exact, it is not possible to get a p-value of zero. wilcox.test does not provide an exact p-value in the presence of ties, so if there are any ties in your data you are getting a normal approximation. Incidentally, if there are any ties in your data set I would strongly recommend computing the *exact* p-value, because using the normal approximation on tied data sets will either inflate the type I error rate or reduce power, depending on how the ties are distributed. Depending on the pattern of ties this can result in gross under- or over-estimation of the p-value. I guess this is all by way of saying that you should always compute the exact p-value if possible. The package exactRankTests uses the algorithm by Mehta, Patel and Tsiatis (1984). If your sample sizes are larger, there is a freely available .exe by Cheung and Klotz (1995) that will do exact p-values for sample sizes larger than 100 in each group! You can find it at http://pages.cs.wisc.edu/~klotz/ Bryan Hi Murat, I am not an expert in either statistics or R, but I can imagine that since the default is exact=TRUE, it numerically computes the probability, and it may indeed be 0. If you use wilcox.test(x, y, exact=FALSE) it will give you a normal approximation, which will most likely be different from zero. No, the exact p-value can't be zero for a discrete distribution. The smallest possible value in this case would, I think, be 1/choose(length(x)+length(y), length(x)), or perhaps twice that. More generally, the approach used by format.pval() is to display very small p-values as "< 2.2e-16", where 2.2e-16 is machine epsilon. I wouldn't want to claim optimality for this choice, but it seems a reasonable way to represent very small. -thomas Hope this helps. Keo. Murat Tasan escribió: hi, folks, how have you gone about reporting a p-value from a test when the returned value from a test (in this case a rank-sum test) is numerically equal to 0 according to the machine?
the next lowest value greater than zero that is distinct from zero on the machine is likely algorithm-dependent (the algorithm of the test itself), but without knowing the explicit steps of the algorithm implementation, it is difficult to provide any non-zero value. i initially thought to look at .Machine$double.xmin, but i'm not comfortable with reporting p < .Machine$double.xmin, since without knowing the specifics of the implementation, this may not be true! to be clear, if i have data x, and i run the following line, the returned value is TRUE. wilcox.test(x)$p.value == 0 thanks for any help on this! Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.edu University of Washington, Seattle -- - Bryan Keller, Doctoral Student/Project Assistant Educational Psychology - Quantitative Methods The University of Wisconsin - Madison
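Thomas's point about the smallest attainable exact p-value can be checked directly. A sketch with two made-up groups of 10: with no ties and complete separation, the exact two-sided rank-sum p-value hits the floor of 2/choose(m + n, m) rather than zero:

```r
# two small, untied, completely separated samples -> wilcox.test uses the exact test
m <- 10; n <- 10
p <- wilcox.test(1:10, 101:110)$p.value

p > 0                    # TRUE: the exact p-value is never exactly zero
floor_p <- 2 / choose(m + n, m)   # 2/184756, the attainable floor for these sizes
```

Here p equals floor_p exactly, since the observed rank pattern is the most extreme one possible.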
Re: [R] Memory in R on windows running 64bit XP
Hi Duncan, Many thanks for the reply. Here are the results of the commands: Rprofmem() Error in Rprofmem() : memory profiling is not available on this system memory.size(NA) [1] 3500 memory.size() [1] 16.19 Clearly Rprofmem() is a dead end, though if there is some way of enabling it, I will be happy to. I am only actually using 16.19Mb of memory, as all I do is open R and read in some data (a 2.7Mb text file). I am not sure how to guard against the memory fragmentation problem. The machine has 16Gb of RAM and I am only running R and Internet Explorer, but I don't know if that helps. Is there a way to deliver contiguous memory to R? Is 64-bit R available for Windows? Sorry to bang on about Windows; I'm waiting for my IT department to build Linux machines, and this is what I have in the meantime. Thanks, Mick From: Duncan Murdoch [murd...@stats.uwo.ca] Sent: 16 September 2009 15:43 To: michael watson (IAH-C) Cc: r-help@r-project.org Subject: Re: [R] Memory in R on windows running 64bit XP [...]
[R] Can someone please explain why the last tick mark on this chart is not showing?
Hi all, I'm trying to log chart but with natural looking tick marks. My specifications are very specific -- it must indicate the lowest number's tick as well as the maximum. I've attached sample code and data for a particular case (and there are a few more like this) where the bottom tickmarks on the chart are not set to where I want them to be and yet they fit in the ylim range. Please, can anyone help? I'm tearing my hair out at this point.. Thanking you in advance, George. # Code below sample-c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 5525, 5600, 7100, 6124, 6550, 6061, 6950, 6650, 7150, 7275, 8700, 10511, 10399, 10345, 11200, 10200, 11035, 12500, 15750, 14445, 10902, 11500, 8527, 6440, 7140, 7190, 6844, 6750, 6924, 6268, 8010, 7575, 8200, 9240) # Plotting code # Box settings par(bty=l,ps=7,tcl=0.2,mgp=c(0.5,0.2,0),bg=white); # Make x axis match exactly the data par(xaxs=i,yaxs=r); # Margins par(mar=c(1.2,1.8,1,1)) # Step 1: 
Generate logs for the data sampleLog <- log(sample, 10) # Step 2: Determine pretty min/max for the data purtee <- pretty(c(sample, pxTarget)) dataMin <- min(sample, na.rm=TRUE)*0.95 dataMax <- max(purtee) # Step 3: Create exponential series t <- exp(seq(log(dataMin), log(dataMax), length=11)) t2 <- round(t, -1) t2[t2==0] <- 1 # natural-looking labels lab <- signif(round(t, -1), 2) lab[1] <- signif(lab[1], 1) ylimits <- range(log(t2, 10)) plot(sampleLog, type='l', xaxt="n", yaxt="n", ylim=ylimits, las=2, ann=FALSE) axis(2, las=2, at=log(lab, 10), labels=lab)
Re: [R] Can someone please explain why the last tick mark on this chart is not showing?
On Sep 16, 2009, at 11:13 AM, Jorgy Porgee wrote: [code and data quoted in full above] purtee <- pretty(c(sample, pxTarget)) Error in pretty(c(sample, pxTarget)) : object 'pxTarget' not found ...which then creates a cascade of errors [...] David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] Can someone please explain why the last tick mark on this chart is not showing?
Hi David, Sorry, forgot to define pxTarget. It can be anything you choose so say pxTarget-1. I've also renamed sample to sampleData. The following cut and paste should work: The last tick mark (5000) is not showing for some reason... :-/ Regards, George sampleData-c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 5525, 5600, 7100, 6124, 6550, 6061, 6950, 6650, 7150, 7275, 8700, 10511, 10399, 10345, 11200, 10200, 11035, 12500, 15750, 14445, 10902, 11500, 8527, 6440, 7140, 7190, 6844, 6750, 6924, 6268, 8010, 7575, 8200, 9240) pxTarget-1 # Plotting code # Box settings par(bty=l,ps=7,tcl=0.2,mgp=c(0.5,0.2,0),bg=white); # Make x axis match exactly the data par(xaxs=i,yaxs=r); # Margins par(mar=c(1.2,1.8,1,1)) # Step 1: Generate logs for the data sampleLog-log(sampleData,10) # Step 2: Determine pretty min/max for the data purtee-pretty(c(sampleData,pxTarget)) dataMin-min(sampleData,na.rm=T)*0.95 dataMax-max(purtee) # Step 3: Create exponential 
series t <- exp(seq(log(dataMin), log(dataMax), length=11)) t2 <- round(t, -1) t2[t2==0] <- 1 # natural-looking labels lab <- signif(round(t, -1), 2) lab[1] <- signif(lab[1], 1) ylimits <- range(log(t2, 10)) plot(sampleLog, type='l', xaxt="n", yaxt="n", ylim=ylimits, las=2, ann=FALSE) axis(2, las=2, at=log(lab, 10), labels=lab) On Wed, Sep 16, 2009 at 5:20 PM, David Winsemius dwinsem...@comcast.net wrote: [...]
Re: [R] Can someone please explain why the last tick mark on this chart is not showing?
It shows if you lower your minimum a bit by applying a factor of 0.90 to your generation of dataMin. -- David. On Sep 16, 2009, at 11:27 AM, Jorgy Porgee wrote: [...]
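The arithmetic behind David's fix can be checked without drawing anything. With the posted data, min(sampleData) is 5525 and the bottom label George's signif/round chain produces is 5000; whether its tick lands inside ylim depends only on the scaling factor (a sketch; 0.95 and 0.90 are the two factors discussed in this thread):

```r
lowest <- 5525   # min(sampleData, na.rm = TRUE) from the posted series

# the bottom axis label produced by lab <- signif(round(t, -1), 2); lab[1] <- signif(lab[1], 1)
lab1 <- signif(signif(round(lowest * 0.95, -1), 2), 1)
lab1   # 5000

# with the 0.95 factor the axis floor (log10 of the rounded minimum, 5250)
# sits above log10(5000), so the 5000 tick is outside ylim and gets clipped
log10(5000) < log10(round(lowest * 0.95, -1))   # TRUE -> tick not drawn

# with 0.90 the floor drops to log10(4970), below the 5000 tick, so it shows
log10(5000) >= log10(round(lowest * 0.90, -1))  # TRUE -> tick drawn
```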
Determine pretty min/max for the data
purtee <- pretty(c(sampleData, pxTarget))
dataMin <- min(sampleData, na.rm=TRUE) * 0.95
dataMax <- max(purtee)
# Step 3: Create exponential series
t <- exp(seq(log(dataMin), log(dataMax), length=11))
t2 <- round(t, -1)
t2[t2 == 0] <- 1
# natural looking labels
lab <- signif(round(t, -1), 2)
lab[1] <- signif(lab[1], 1)
ylimits <- range(log(t2, 10))
plot(sampleLog, type='l', xaxt="n", yaxt="n", ylim=ylimits, las=2, ann=FALSE)
axis(2, las=2, at=log(lab, 10), labels=lab)
On Wed, Sep 16, 2009 at 5:20 PM, David Winsemius dwinsem...@comcast.net wrote: On Sep 16, 2009, at 11:13 AM, Jorgy Porgee wrote: Hi all, I'm trying to make a log chart, but with natural-looking tick marks. My specifications are very specific -- it must indicate the lowest number's tick as well as the maximum. I've attached sample code and data for a particular case (and there are a few more like this) where the bottom tick marks on the chart are not set to where I want them to be, and yet they fit in the ylim range. Please, can anyone help? I'm tearing my hair out at this point.. Thanking you in advance, George.
# Code below sample-c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 5525, 5600, 7100, 6124, 6550, 6061, 6950, 6650, 7150, 7275, 8700, 10511, 10399, 10345, 11200, 10200, 11035, 12500, 15750, 14445, 10902, 11500, 8527, 6440, 7140, 7190, 6844, 6750, 6924, 6268, 8010, 7575, 8200, 9240) # Plotting code # Box settings par(bty=l,ps=7,tcl=0.2,mgp=c(0.5,0.2,0),bg=white); # Make x axis match exactly the data par(xaxs=i,yaxs=r); # Margins par(mar=c(1.2,1.8,1,1)) # Step 1: Generate logs for the data sampleLog-log(sample,10) # Step 2: Determine pretty min/max for the data purtee-pretty(c(sample,pxTarget)) Error in pretty(c(sample, pxTarget)) : object 'pxTarget' not found ...which then creates a cascade of errors dataMin-min(sample,na.rm=T)*0.95 dataMax-max(purtee) # Step 3: Create exponential series
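The tick-generating idea in George's code can be distilled into a small self-contained helper. This is a sketch only: `logTicks` is a name invented here, not part of the original post. It takes n points evenly spaced in log space between the data minimum and maximum, then rounds them to natural-looking values, exactly as Steps 2-3 above do.

```r
# Hypothetical helper (not from the original post) distilling George's approach:
# n ticks evenly spaced in log space, rounded to "natural" values.
logTicks <- function(dataMin, dataMax, n = 11) {
  t  <- exp(seq(log(dataMin), log(dataMax), length.out = n))
  t2 <- round(t, -1)    # round to the nearest 10 for natural-looking labels
  t2[t2 == 0] <- 1      # guard against small ticks rounding to zero
  unique(t2)            # rounding can create duplicates; keep one of each
}

logTicks(100, 10000)    # endpoints land exactly on the data min and max
```

On a log-scaled axis the result would be placed with `axis(2, at = log(ticks, 10), labels = ticks)`, as in the thread.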
[R] lattice: How to display no box but only a y-axis on the left + Thicker lines
Hi, I have two somewhat embarrassing questions about the lattice-related plot functions: 1.) How do I make lattice (e.g. barchart) draw not a box but only a y-axis on the left-hand side, so that the plot looks like barplot with default settings? So that the following two code snippets look more alike: barplot(VADeaths); library(reshape); vad <- melt(data.frame(VADeaths, Age=rownames(VADeaths)), id="Age"); barchart(value ~ variable, groups=Age, horizontal=FALSE, stack=TRUE, data=vad) 2.) What is the proper way to make lines in xyplots thicker? When I set lwd=2, I get thicker lines, but in conjunction with lty=1:n they also look somewhat ugly, which makes me wonder if I'm doing something wrong. I guess both questions are newbie questions and the information should be easily available somewhere. Unfortunately I wasn't able to find the answers to these questions myself. Regards, Tom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
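On Tom's second question, one common approach (a sketch only; the theme values and toy data below are invented for illustration, and this does not address question 1) is to set line width and line types once through the superpose.line settings, so the grouped lines in the panel and an auto.key legend stay consistent with each other:

```r
library(lattice)

# Invented toy data: three groups of lines
d <- data.frame(x = rep(1:10, 3),
                y = c(sin(1:10), cos(1:10), (1:10) / 10),
                g = rep(letters[1:3], each = 10))

# Theme-based line settings, picked up by both the panel and the legend
thick <- list(superpose.line = list(lwd = 2, lty = 1:3))

p <- xyplot(y ~ x, groups = g, data = d, type = "l",
            par.settings = thick,
            auto.key = list(lines = TRUE, points = FALSE))
p   # printing the trellis object renders the plot
```

Passing lwd/lty per call also works, but a legend drawn via auto.key only reflects the theme settings, which is why par.settings tends to look less "ugly" with mixed lty values.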
Re: [R] best method to format output of frequency table
Perfect, that works well. Thank you for the suggestion. Henrique Dallazuanna wrote: Try this: as.matrix(table(x))
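For anyone finding this later: the reason as.matrix() helps is that a one-dimensional table prints its counts across the page, while the matrix form prints one level per row with the counts in a single column, which is usually the wanted layout for a report. A small illustration (the vector x here is invented):

```r
# Invented example data
x <- c("a", "a", "b", "c", "c", "c")

table(x)              # levels print across the page, counts underneath
as.matrix(table(x))   # one row per level, counts in a single column
```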
Re: [R] a sequence that wraps around
Henrique Dallazuanna wwwhsd at gmail.com writes: Try rep: rep(2:4, length.out = 3, times = 10) That's close, but it doesn't wrap to start at 1.
Re: [R] a sequence that wraps around
On 2009.09.16. 16:08, Jack Tanner wrote: I'd like to have something like seq() where I can pass in a length of the desired sequence and a right limit so that the sequence goes up to the limit and then starts again from 1. Disclaimer: total R beginner here (started to learn 1 day ago), who just came to this list to learn. You could use the modulo operator.
# works now
seq(from=2, length.out=3)
[1] 2 3 4
# what I want
seq(from=2, length.out=3, rlimit=3)
[1] 2 3 1
# additional examples of what I want
seq(from=2, length.out=4, rlimit=3)
[1] 2 3 1 2
seq(from=1, length.out=4) %% 3 + 1
seq(from=2, length.out=4, rlimit=4)
[1] 2 3 4 1
seq(from=1, length.out=4) %% 4 + 1
seq(from=2, length.out=3, rlimit=2)
[1] 2 1 2
seq(from=1, length.out=3) %% 2 + 1
(the 'from' needed to be decreased by one, otherwise it'd start from 0) I can write this procedurally, but it seems like there ought to be a cleaner R way of doing it. Thanks in advance for any suggestions.
Re: [R] Memory in R on windows running 64bit XP
On 9/16/2009 10:54 AM, michael watson (IAH-C) wrote: Hi Duncan Many thanks for the reply. Here are the results of the commands: Rprofmem() Error in Rprofmem() : memory profiling is not available on this system memory.size(NA) [1] 3500 memory.size() [1] 16.19 Clearly Rprofmem() is a dead end, though if there is some way of enabling it, I will be happy to. I forgot that you need to recompile R to enable it. Not impossible, but probably not worth the trouble just for this problem. Using gcinfo(TRUE) may give you enough information: it will report whenever there's a garbage collection. I am only actually using 16.19Mb of memory, as all I do is open R and read in some data (a 2.7Mb text file). I would guess that mclust does a lot of other allocations before it fails. I am not sure how to guard against the memory fragmentation problem. The machine has 16Gb of RAM and I am only running R and Internet Explorer, but I don't know if that helps. Is there a way to deliver contiguous memory to R? R doesn't give you any real control over that. I would guess that R is getting 3.5 Gb in a contiguous chunk based on your memory.size report, but then it is fragmenting it. Is 64bit R available for Windows? Sorry to bang on about Windows, I'm waiting for my IT department to build Linux machines, and this is what I have in the meantime. Revolution Computing sells one. There's no (reliable) 64 bit version of gcc for Windows, so we don't distribute one. Duncan Murdoch Thanks Mick From: Duncan Murdoch [murd...@stats.uwo.ca] Sent: 16 September 2009 15:43 To: michael watson (IAH-C) Cc: r-help@r-project.org Subject: Re: [R] Memory in R on windows running 64bit XP On 9/16/2009 10:16 AM, michael watson (IAH-C) wrote: Hi I'm running 32-bit R on Windows XP 64bit and the machine has 16Gb of RAM. The help for memory.limit states: If 32-bit R is run on some 64-bit versions of Windows the maximum value of obtainable memory is just under 4GB. 
So, using the help which states the size parameter can go up to 4095: memory.limit(size=4095) When I run mclust I get: Error: cannot allocate vector of size 1.3 Gb So, now I set the max-mem-size flag: --max-mem-size=3500M Repeat and I still get the same error: Error: cannot allocate vector of size 1.3 Gb So, can anyone help? 32 bit R is reported to be able to use up to 4Gb of RAM on 64 bit windows, yet mine croaks at 1.3Gb. You are misreading things in a couple of places. The most important one is the error message: it says allocation of a 1.3 Gb object failed. You may have 4 Gb available in total, but have used so much of it already that Windows won't allocate a 1.3 Gb piece to R. Using Rprofmem() might show how the allocations are happening. Another possibility is that your particular version of Windows doesn't support giving 4 Gb to R. Run memory.size(NA) to find out the limit, memory.size() to find out the current amount in use. And even if the limit is close to 4 Gb and you have more than 1.3 Gb free, the allocation may fail because there is no sufficiently large contiguous block available. Once memory is allocated, that address is in use and is unavailable. Since the 32 bit address space only covers 4 Gb, you can fairly easily end up with allocations that are spread out across it leaving no large blocks available. That's the advantage of switching to 64 bit R: even if you still only had 4 Gb of memory available, the address space is so much bigger that fragmentation probably won't be a problem. 
Duncan Murdoch sessionInfo() R version 2.9.2 (2009-08-24) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] mclust_3.3.1 Thanks Mick
Re: [R] a sequence that wraps around
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jack Tanner Sent: Wednesday, September 16, 2009 7:08 AM To: r-h...@stat.math.ethz.ch Subject: [R] a sequence that wraps around I'd like to have something like seq() where I can pass in a length of the desired sequence and a right limit so that the sequence goes up to the limit and then starts again from 1.
# works now
seq(from=2, length.out=3)
[1] 2 3 4
# what I want
seq(from=2, length.out=3, rlimit=3)
[1] 2 3 1
# additional examples of what I want
seq(from=2, length.out=4, rlimit=3)
[1] 2 3 1 2
seq(from=2, length.out=4, rlimit=4)
[1] 2 3 4 1
seq(from=2, length.out=3, rlimit=2)
[1] 2 1 2
I can write this procedurally, but it seems like there ought to be a cleaner R way of doing it. Thanks in advance for any suggestions. The remainder (modulus) operator, x %% n, wraps to the range 0..n-1, but you can write a function that wraps to 1..n as mod1 <- function(x, n) (x-1L) %% n + 1L Then pass the output of seq through that: mod1(-2:10, 3) [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 Bill Dunlap TIBCO Software Inc - Spotfire Division wdunlap tibco.com
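Bill's mod1 composes naturally with seq() into exactly the interface the OP asked for. A sketch: `seq_wrap` is a name invented here, not a base R function, and it simply pipes seq() output through mod1.

```r
mod1 <- function(x, n) (x - 1L) %% n + 1L   # wrap values to 1..n rather than 0..(n-1)

# Hypothetical wrapper mimicking the requested seq(from, length.out, rlimit):
seq_wrap <- function(from, length.out, rlimit)
  mod1(seq(from = from, length.out = length.out), rlimit)

seq_wrap(2, 3, 3)   # 2 3 1
seq_wrap(2, 4, 3)   # 2 3 1 2
seq_wrap(2, 4, 4)   # 2 3 4 1
seq_wrap(2, 3, 2)   # 2 1 2
```

These reproduce all the example outputs from the original question.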
Re: [R] Memory in R on windows running 64bit XP
Fair enough, thanks for your time. I will wait for my Linux PCs! -Original Message- From: Duncan Murdoch [mailto:murd...@stats.uwo.ca] Sent: 16 September 2009 16:42 To: michael watson (IAH-C) Cc: r-help@r-project.org Subject: Re: [R] Memory in R on windows running 64bit XP
Re: [R] a sequence that wraps around
On 9/16/2009 10:08 AM, Jack Tanner wrote: I'd like to have something like seq() where I can pass in a length of the desired sequence and a right limit so that the sequence goes up to the limit and then starts again from 1.
# works now
seq(from=2, length.out=3)
[1] 2 3 4
# what I want
seq(from=2, length.out=3, rlimit=3)
[1] 2 3 1
# additional examples of what I want
seq(from=2, length.out=4, rlimit=3)
[1] 2 3 1 2
seq(from=2, length.out=4, rlimit=4)
[1] 2 3 4 1
seq(from=2, length.out=3, rlimit=2)
[1] 2 1 2
I can write this procedurally, but it seems like there ought to be a cleaner R way of doing it. Thanks in advance for any suggestions. Here's a start. You probably want some sanity checks on the arguments; e.g. from=1 doesn't work because of the old 1:0 infelicity. seq2 <- function(from, length.out, rlimit) rep(1:rlimit, length.out=length.out+from-1)[-(1:(from-1))] Duncan Murdoch
Re: [R] wilcox.test p-value = 0
Once one gets past the issue of the p-value being extremely small, irrespective of the test being used, the OP has asked the question of how to report it. Most communities will have standards for how to report p-values, covering things like how many significant digits and a minimum p-value threshold to report. For example, in medicine, it is common to report 'small' p-values as 'p < 0.001' or 'p < 0.0001'. Thus, below those numbers, the precision is largely irrelevant and one need not report the actual p-value. I just wanted to be sure that we don't lose sight of the forest for the trees... :-) The OP should consult a relevant guidance document or an experienced author in the domain of interest. HTH, Marc Schwartz On Sep 16, 2009, at 9:54 AM, Bryan Keller wrote: That's right, if the test is exact it is not possible to get a p-value of zero. wilcox.test does not provide an exact p-value in the presence of ties, so if there are any ties in your data you are getting a normal approximation. Incidentally, if there are any ties in your data set I would strongly recommend computing the *exact* p-value, because using the normal approximation on tied data sets will either inflate the type I error rate or reduce power, depending on how the ties are distributed. Depending on the pattern of ties this can result in gross under- or over-estimation of the p-value. I guess this is all by way of saying that you should always compute the exact p-value if possible. The package exactRankTests uses the algorithm by Mehta, Patel and Tsiatis (1984). If your sample sizes are larger, there is a freely available .exe by Cheung and Klotz (1995) that will do exact p-values for sample sizes larger than 100 in each group! You can find it at http://pages.cs.wisc.edu/~klotz/ Bryan Hi Murat, I am not an expert in either statistics or R, but I can imagine that since the default is exact=TRUE, it numerically computes the probability, and it may indeed be 0.
if you use wilcox.test(x, y, exact=FALSE) it will give you a normal approximation, which will most likely be different from zero. No, the exact p-value can't be zero for a discrete distribution. The smallest possible value in this case would, I think, be 1/choose(length(x)+length(y),length(x)), or perhaps twice that. More generally, the approach used by format.pval() is to display very small p-values as '< 2e-16', where 2e-16 is machine epsilon. I wouldn't want to claim optimality for this choice, but it seems a reasonable way to represent 'very small'. -thomas Hope this helps. Keo. Murat Tasan escribió: hi, folks, how have you gone about reporting a p-value from a test when the returned value from a test (in this case a rank-sum test) is numerically equal to 0 according to the machine? the next lowest value greater than zero that is distinct from zero on the machine is likely algorithm-dependent (the algorithm of the test itself), but without knowing the explicit steps of the algorithm implementation, it is difficult to provide any non-zero value. i initially thought to look at .Machine$double.xmin, but i'm not comfortable with reporting p > .Machine$double.xmin, since without knowing the specifics of the implementation, this may not be true! to be clear, if i have data x, and i run the following line, the returned value is TRUE. wilcox.test(x)$p.value == 0 thanks for any help on this!
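Thomas's lower bound is easy to check empirically. A sketch (the sample values below are invented): for two groups of sizes m and n with no ties, the exact two-sided p-value can get no smaller than 2/choose(m+n, n), reached when the two samples are completely separated.

```r
# Smallest attainable exact two-sided p-value for a rank-sum test:
# complete separation of the two samples.
x <- 1:5
y <- 6:10
p <- wilcox.test(x, y)$p.value   # exact by default: small samples, no ties
p_floor <- 2 / choose(length(x) + length(y), length(x))
all.equal(p, p_floor)            # TRUE: both are 2/252, about 0.0079
```

So even in the most extreme arrangement the reported p-value stays well away from zero; a returned value of exactly 0 signals the normal approximation, not the exact test.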
[R] expression
Dear all, I am very thankful if you could tell me the right way to write: mtext(paste(expression(R^2),round(marco2[1,i],digits=3), "N° of proteins:",marco3[i]),side=4,cex=.6). In this case the output is: R^2. I tried also in this way: mtext(paste(expression(paste(R^2)),round(marco2[1,i],digits=3), "N° of proteins:",marco3[i]),side=4,cex=.6), and the output is: paste(R^2). Thank you in advance, Marco
Re: [R] a sequence that wraps around
Szabolcs Horvát szhorvat at gmail.com writes: You could use the modulo operator. # additional examples of what I want seq(from=2, length.out=4, rlimit=3) [1] 2 3 1 2 seq(from=1, length.out=4) %% 3 + 1 Ah, that's so slick. You're off to a great start! Huge thanks to you and everyone else who responded.
Re: [R] expression
Try this: marco2 = matrix(rnorm(4), nrow=2, ncol=2); marco3 = 2.12; i = 1; plot.new(); mtext(bquote(R^2*.(round(marco2[1,i],digits=3))*" N° of proteins:"*.(marco3[i])),side=4,cex=.6) HTH, baptiste 2009/9/16 Marco Chiapello marpe...@gmail.com -- _ Baptiste Auguié School of Physics University of Exeter Stocker Road, Exeter, Devon, EX4 4QL, UK http://newton.ex.ac.uk/research/emag
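The general pattern behind baptiste's answer: expression()/bquote() build unevaluated plotmath, and .() inside bquote() splices in computed values, whereas paste() on an expression merely deparses it to a string, which is why Marco saw the literal text "R^2". A hedged sketch with placeholder values standing in for marco2[1,i] and marco3[i]:

```r
# Placeholder values (invented for illustration)
r2 <- 0.8375
n_prot <- 42

# bquote() keeps R^2 as plotmath markup and splices the computed
# values in wherever .() appears; paste() would only deparse it.
lab <- bquote(R^2 == .(round(r2, 3)) ~ "N° of proteins:" ~ .(n_prot))

plot.new()
mtext(lab, side = 4, cex = 0.6)
```

The `~` in plotmath juxtaposes pieces with a space, so the numbers and label end up on one line in the margin.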
Re: [R] How to extract partial predictions, package mgcv
Thanks Simon, I wasn't aware that predict.gam could be used in this situation. I followed your advice and managed to extract the data I needed. For people interested, I add the code with comments I used here:
# Full code for extracting partial predictions
# Start the package mgcv
library(mgcv)
# Simulate some data (this is from Simon Wood's example in the package mgcv manual)
n <- 200
sig <- 2
dat <- gamSim(1, n=n, scale=sig)
# Check the data
dat ## It looks alright :-)
# Run the GAM
b <- gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + offset(x3), data=dat)
# Plot the partial predictions (you may need to rescale your window to see the non-linearity)
par(mfrow=c(1,3))
plot(b, scale=0)
### Clearly, the variables x0 and x2 are non-linear!
# I would like to extract the solid line in the graphs and the
# associated standard errors from the plots. Thus, I use predict
# and change to a data.frame:
c <- data.frame(predict(b, type="terms", se.fit=TRUE)$fit)
names(c) <- c("pp_s.x0.", "pp_s.I.x1.2..", "pp_s.x2.")
d <- data.frame(predict(b, type="terms", se.fit=TRUE)$se.fit)
names(d) <- c("se_s.x0.", "se_s.I.x1.2..", "se_s.x2.")
# Merge the three datasets:
f = cbind(dat, c, d)
# Restrict the number of variables to the ones I am interested in:
f <- f[, c("x0", "pp_s.x0.", "se_s.x0.")]
names(f)
### This data, i.e. f, can now be exported or used for further calculations within R
# plot the data
par(mfrow=c(1,1))
plot(pp_s.x0. ~ x0, type="p", pch=2, cex=0.50)
sequence = order(x0)
lines(x0[sequence], pp_s.x0.[sequence])
lines
rug(jitter(x0))
## Thanks, Staffan -- Staffan Roos, PhD Research Ecologist BTO Scotland Staffan, Take a look at ?predict.gam. You need to use type="terms" with se=TRUE to get the quantities that you need. Simon ps. with binary data, method="REML" or method="ML" for the gam fit are often somewhat more reliable than the default. On Monday 14 September 2009 19:30, Staffan Roos wrote: Dear package mgcv users, I am using package mgcv to describe presence of a migratory bird species as a function of several variables, including year, day number (i.e.
day-of-the-year), duration of survey, latitude and longitude. Thus, the global model is: global_model <- gam(present ~ as.factor(year) + s(dayno, k=5) + s(duration, k=5) + s(x, k=5) + s(y, k=5), family = binomial(link = logit), data = presence, gamma = 1.4) The model works fine, suggesting that all the variables have strong, non-linear influences on the probability of detecting the species. My main interest is in the effect dayno has on presence, given the inclusion of the other explanatory variables. Thus, I would like to extract the values of the partial prediction of dayno and its associated 2 standard errors above and below the main effect, as shown by the code plot(global_model). I have tried to extract the values by using fitted.values(global_model, dayno), but when plotted, the figure doesn't look like the partial prediction for dayno. Instead, it gives me a very scattered figure (shotgun effect). Has anyone tried to extract the partial predictions? If so, please could you advise me how to do this? Thanks, Staffan P.S. For comparison, please have a look at Simon Wood's paper in R News, 1(2):20-25, June 2001, especially the figures showing the partial predictions of Mackerel egg densities. Using those graphs as an example, I would like to extract the partial predictions for e.g. s(b.depth), given the inclusion of the other variables. -- Staffan Roos, PhD Research Ecologist BTO Scotland
Re: [R] How to extract partial predictions, package mgcv
It is nice to see worked examples, but there were some errors that relate to not including dataframe references. I've pasted in code that runs without error on my machine: On Sep 16, 2009, at 12:39 PM, Staffan Roos wrote: Thanks Simon, I wasn't aware that predict.gam could be used in this situation. I followed your advice and managed to extract the data I needed. For people interested, I add the code with comments I used here:
# Full code for extracting partial predictions
# Start the package mgcv
library(mgcv)
# Simulate some data (this is from Simon Wood's example in the package mgcv manual)
n <- 200
sig <- 2
dat <- gamSim(1, n=n, scale=sig)
# Check the data
dat ## It looks alright :-)
# Run the GAM
b <- gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + offset(x3), data=dat)
# Plot the partial predictions (you may need to rescale your window to see the non-linearity)
par(mfrow=c(1,3))
plot(b, scale=0)
### Clearly, the variables x0 and x2 are non-linear!
# I would like to extract the solid line in the graphs and the
# associated standard errors from the plots. Thus, I use predict
# and change to a data.frame:
c <- data.frame(predict(b, type="terms", se.fit=TRUE)$fit)
names(c) <- c("pp_s.x0.", "pp_s.I.x1.2..", "pp_s.x2.")
d <- data.frame(predict(b, type="terms", se.fit=TRUE)$se.fit)
names(d) <- c("se_s.x0.", "se_s.I.x1.2..", "se_s.x2.")
# Merge the three datasets:
f = cbind(dat, c, d)
# Restrict the number of variables to the ones I am interested in:
f <- f[, c("x0", "pp_s.x0.", "se_s.x0.")]
names(f)
### This data, i.e. f, can now be exported or used for further calculations within R
# plot the data
par(mfrow=c(1,1))
plot(pp_s.x0. ~ x0, type="p", pch=2, cex=0.50, data=f)
sequence = order(f$x0)
lines(f$x0[sequence], f$pp_s.x0.[sequence])
lines
rug(jitter(f$x0))
## Staffan, Take a look at ?predict.gam. You need to use type="terms" with se=TRUE to get the quantities that you need. Simon ps. with binary data, method="REML" or method="ML" for the gam fit are often somewhat more reliable than the default.
David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] How to extract partial predictions, package mgcv
But removing the extraneous second-to-last line that says just: lines ... would leave the console looking less puzzling. -- David On Sep 16, 2009, at 1:02 PM, David Winsemius wrote: It is nice to see worked examples, but there were some errors that relate to not including dataframe references. I've pasted in code that runs without error on my machine: [...]
David Winsemius, MD Heritage Laboratories West Hartford, CT
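The type = "terms" mechanism discussed above is not specific to mgcv: predict() for plain lm objects accepts the same arguments, which allows a minimal sketch of the idea that runs without any extra packages (using the built-in mtcars data; the model and variable names here are illustrative, not from the thread):

```r
# One centered partial contribution per model term, plus matching
# standard errors -- the same structure predict.gam returns.
fit <- lm(mpg ~ wt + hp, data = mtcars)
pt  <- predict(fit, type = "terms", se.fit = TRUE)

head(data.frame(pt$fit))      # columns "wt" and "hp": partial effects
head(data.frame(pt$se.fit))   # their standard errors

# Sanity check: term contributions plus the stored constant
# reconstruct the ordinary fitted values.
recon <- rowSums(pt$fit) + attr(pt$fit, "constant")
all.equal(unname(recon), unname(fitted(fit)))   # TRUE
```

The se.fit columns are what plot() draws as the dashed confidence bands, two standard errors either side of the solid line.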
Re: [R] comma as decimal separator in xtable
On Wed, Sep 16, 2009 at 09:14:39AM -0300, Henrique Dallazuanna wrote: Try this also:
format(coef(summary(lm.D9)), decimal.mark = ',')
or using gsub:
apply(coef(summary(lm.D9)), 2, gsub, pattern = "\\.", replacement = ",")
using the lm.D9 object from the ?lm example. Thanks for your suggestion! The problem with the above approach is that all values get the same formatting. For example:
x <- c(1.2, 1.2e10)
format(x, format = 'g', decimal.mark = ',')
[1] "1,2e+00" "1,2e+10"
What I would like to get is:
[1] "1,2" "1,2e+10"
I already solved my problem, but I'm using an external program to replace "." with ",". Perhaps I should ask my question again, with a new subject, since what I need now is the perl regular expression equivalent to the sed command: s/\([0-9]\)a/a\1/ That is, 1a becomes a1; 3b becomes b3, etc.
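For the sed question at the end, base R's gsub() already supports backreferences in the replacement, so no external program is needed. A small sketch (generalizing the literal "a" in the sed pattern to any lower-case letter is my reading of the "3b becomes b3" example):

```r
# sed s/\([0-9]\)\([a-z]\)/\2\1/ maps onto gsub() with backreferences:
swap <- function(x) gsub("([0-9])([a-z])", "\\2\\1", x)
swap(c("1a", "3b"))        # "a1" "b3"

# And the decimal-comma problem, touching only a "." that sits between
# digits, so exponents like e+10 are left alone:
dec <- function(x) gsub("([0-9])\\.([0-9])", "\\1,\\2", x)
dec(c("1.2", "1.2e+10"))   # "1,2" "1,2e+10"
```

sub() instead of gsub() would replace only the first match in each element.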
[R] Fourier Transform fft help
I wrote a script in which I anticipated seeing a spike at 10 Hz with the function 10*sin(2*pi*10*t). I can't figure out why my plots do not show spikes at the frequencies I expect. Am I doing something wrong, or are my expectations wrong?
require(stats)
layout(matrix(c(1, 2, 3), 3, 1, byrow = TRUE))
# SETUP
n <- 256           # nextn(1001) gives next power of 2
F <- 100           # Hz: -50 to 50 Hz
dt <- 1/F
T <- n*dt
df <- 1/T
t <- seq(0, T, by = dt)   # also try ts function
t <- t[1:length(t) - 1]
freq <- 5          # Hz
# SIGNAL FUNCTION
y <- 10*sin(2*pi*10*t)    # 10*sin(2*pi*freq*t)
# FREQ ARRAY
# f <- seq(-F/2, F/2, by = df)
f <- F/2*seq(-1, 1, length.out = n + 1)
f <- f[1:length(f) - 1]
# FOURIER WORK
Y <- fft(y)
mag <- sqrt(Re(Y)^2 + Im(Y)^2)
phase <- atan(Im(Y)/Re(Y))
Yr <- Re(Y)
Yi <- Im(Y)
# PLOT SIGNALS
plot(t, y, type = "l", xlim = c(0, T))
plot(f, Re(Y), type = "l")
plot(f, Im(Y), type = "l")
-- View this message in context: http://www.nabble.com/Fourier-Transform-fft-help-tp25475063p25475063.html Sent from the R help mailing list archive at Nabble.com.
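Two things conspire here: with n = 256 samples at 100 Hz, 10 Hz does not fall exactly on a DFT bin (the resolution is 100/256 ≈ 0.39 Hz, so the energy leaks across neighbouring bins), and the raw fft() magnitude of a sine of amplitude A is n*A/2, not A. A minimal sketch with parameters chosen (my choice, not the poster's) so that 10 Hz lands exactly on a bin:

```r
# 1 s sampled at 100 Hz: frequency resolution 1/(n*dt) = 1 Hz,
# so 10 Hz lands exactly on bin 10 (the 11th element in R).
n  <- 100
Fs <- 100
t  <- (0:(n - 1)) / Fs          # n points, no duplicated endpoint
y  <- 10 * sin(2 * pi * 10 * t)

Y    <- fft(y)
half <- 1:(n / 2)               # frequencies 0 .. Nyquist - 1 bin
f    <- (half - 1) * Fs / n     # 0, 1, ..., 49 Hz
amp  <- 2 * Mod(Y)[half] / n    # the 2/n scaling recovers amplitude 10

f[which.max(amp)]               # 10
max(amp)                        # 10 (up to rounding)
```

With the original script's t of n + 1 points (duplicated endpoint), the bins shift slightly, which further smears the peak.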
Re: [R] comma as decimal separator in xtable
On Wed, Sep 16, 2009 at 03:29:44PM +0200, David Hajage wrote: But it seems that dcolumn can change the decimal separator too (see the table on the first page of the pdf document). For example:
\documentclass{article}
\usepackage{dcolumn}
\newcolumntype{d}[1]{D{.}{,}{#1}}
\begin{document}
<<results=tex>>=
x <- matrix(rnorm(4), 2, 2)
library(xtable)
xtable(x, align = c("l", "|", "d{2}", "|", "c", "|"))
@
\end{document}
It works. Thanks! 2009/9/16 Jakson A. Aquino jaksonaqu...@gmail.com On Wed, Sep 16, 2009 at 01:59:29PM +0200, David Hajage wrote: Perhaps with the dcolumn package? http://newton.ex.ac.uk/research/qsystems/people/latham/LaTeX/dcolumn.pdf
Re: [R] How to extract partial predictions, package mgcv
Hi David, Yes, the strange code lines were clearly a mistake on my side. Sorry. What dataframe references did you add in your code? /Staffan David Winsemius wrote: But removing the extraneous second-to-last line that says just: lines ... would leave the console looking less puzzling. -- David [...] -- View this message in context: http://www.nabble.com/How-to-extract-partial-predictions%2C-package-mgcv-tp25441149p25477056.html Sent from the R help mailing list
Re: [R] Fourier Transform fft help
Why not try spectrum (power spectrum):
spectrum(y, log = "no")
The major spike is at 0.1, and 1/0.1 = 10 Hz. On Wed, Sep 16, 2009 at 12:13 PM, delic kran...@optonline.net wrote: [...] -- Stephen Sefick Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
[R] Turnpoints
Good evening to everybody. I have a data set of four columns - three of the columns represent the date while the fourth is counts. Setting 4000 as my minimum point/value, I wish to find out the number of counts that are less than or equal to 4000 and the dates on which they occur. I have installed pastecs and have been trying to use the turnpoints function to do this. I have not had much luck. I am still learning. I have added part of my data here, where y stands for year, m for month, d for day, and lastly the count. I would be grateful to anyone who could show me the way to do this. Best regards Ogbos
y  m  d  count
93 02 07 3974.6
93 02 08 3976.7
93 02 09 3955.2
93 02 10 3955.0
93 02 11 3971.8
93 02 12 3972.8
93 02 13 3961.0
93 02 14 3972.8
93 02 15 4008.0
93 02 16 4004.2
93 02 17 3981.2
93 02 18 3996.8
93 02 19 4028.2
93 02 20 4029.5
93 02 21 3953.4
93 02 22 3857.3
93 02 23 3848.3
93 02 24 3869.8
93 02 25 3898.1
93 02 26 3920.5
93 02 27 3936.7
93 02 28 3931.9
Re: [R] How to extract partial predictions, package mgcv
Your code minus the extraneous lines was:
#plot the data
plot(pp_s.x0. ~ x0, type = "p", pch = 2, cex = 0.50)
sequence <- order(x0)
lines(x0[sequence], pp_s.x0.[sequence])
rug(jitter(x0))
My edition was:
plot(pp_s.x0. ~ x0, type = "p", pch = 2, cex = 0.50, data = f)
sequence <- order(f$x0)
lines(f$x0[sequence], f$pp_s.x0.[sequence])
rug(jitter(f$x0))
So almost every one of the last set of plotting calls needed one or more additions of f$ or data=f in order to run without error. Perhaps you attached f earlier and didn't include that code? I personally have given up using attach() for just that reason. -- David. On Sep 16, 2009, at 1:31 PM, Staffan Roos wrote: Hi David, Yes, the strange code lines were clearly a mistake on my side. Sorry. What dataframe references did you add in your code? /Staffan [...]
Re: [R] Fourier Transform fft help
After extensive research I found some information hinting that my frequency array is wrong. I thought that the data spanned -F/2 to F/2. It seems that I should only plot the first half of the fft vector? Can anyone confirm this or shed some light on it? One other thing: I thought the magnitude should be 10 for 10*sin(2*pi*10*t), but my magnitudes are much higher. delic wrote: [...] -- View this message in context: http://www.nabble.com/Fourier-Transform-fft-help-tp25475063p25477281.html Sent from the R help mailing list archive at Nabble.com.
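Confirming the first-half intuition: for real-valued input the fft() output is conjugate symmetric, so the bins above the Nyquist frequency carry no new information - their magnitudes mirror the first half. A small sketch (the test signal is arbitrary, chosen only for illustration):

```r
# Conjugate symmetry of the DFT of a real signal:
# Y[k+1] == Conj(Y[n-k+1]) for k = 1..n-1 (1-based R indexing).
n <- 16
k <- 0:(n - 1)
y <- sin(2 * pi * k / n) + 0.5 * cos(2 * pi * 3 * k / n)
Y <- fft(y)

first  <- Mod(Y)[2:(n / 2)]            # bins 1 .. n/2 - 1
mirror <- rev(Mod(Y)[(n / 2 + 2):n])   # bins n/2 + 1 .. n - 1, reversed
max(abs(first - mirror))               # ~0: the second half is redundant
```

That redundancy is also why a single-sided amplitude plot needs the factor 2 (except at the DC and Nyquist bins), which explains the "much higher" magnitudes: the raw Mod(Y) at the peak is n*A/2.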
[R] re-code missing value
Dear all, I have a partial data set with four columns. The first column is site, with three factors (i.e., A, B, and C). The second to fourth columns (v1 ~ v3) are my observations. In the observations of the data set, "." indicates a missing value. I replaced "." with NA. To do this I used two steps. First, I replaced "." with NA, and then changed each variable from factor to numeric using df1$v1 <- as.numeric(df1$v1). The second step was OK when I had a low number of variables; however, it is painful when I have a lot of variables. My question is: Is there a much more efficient way to convert this kind of large-scale data? In short, I am looking for an alternative to STEP 2, or to the whole procedure if there is one. Any comment will be highly appreciated. Thank you in advance!! Steve P.S.: Below is an example of what I did.
STEP 1
> df1
   site   v1   v2 v3
1     A   10    5  .
2     A   22   54  .
3     A   44  214  2
4     A  521   14  4
5     A    5   73  1
6     A 1654  0.4  4
7     B   16    1  .
8     B    .    4  5
9     B    .    .  4
10    B    .    4  1
11    B   51    .  2
12    B    5    .  .
13    C    1  0.4  .
14    C    0    4  .
15    C    1    1  4
16    C   40    .  7
17    C    4    .  7
18    C   10    .  1
> str(df1)
'data.frame': 18 obs. of 4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 1 1 1 ...
 $ v2  : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 1 6 ...
 $ v3  : Factor w/ 6 levels ".","1","2","4",..: 1 1 3 4 2 4 1 5 4 2 ...
> df1[df1 == "."] <- NA
Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = NA) : invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = NA) : invalid factor level, NAs generated
3: In `[<-.factor`(`*tmp*`, thisvar, value = NA) : invalid factor level, NAs generated
> df1
   site   v1   v2 v3
1     A   10    5 NA
2     A   22   54 NA
3     A   44  214  2
4     A  521   14  4
5     A    5   73  1
6     A 1654  0.4  4
7     B   16    1 NA
8     B   NA    4  5
9     B   NA   NA  4
10    B   NA    4  1
11    B   51   NA  2
12    B    5   NA NA
13    C    1  0.4 NA
14    C    0    4 NA
15    C    1    1  4
16    C   40   NA  7
17    C    4   NA  7
18    C   10   NA  1
> str(df1)
'data.frame': 18 obs. of 4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 NA NA NA ...
 $ v2  : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 NA 6 ...
 $ v3  : Factor w/ 6 levels ".","1","2","4",..: NA NA 3 4 2 4 NA 5 4 2 ...
STEP 2
> df1$v1 <- as.numeric(df1$v1)
> df1$v2 <- as.numeric(df1$v2)
> df1$v3 <- as.numeric(df1$v3)
> df1
   site v1 v2 v3
1     A  4  7 NA
2     A  7  8 NA
3     A 10  5  3
4     A 13  4  4
5     A 11  9  2
6     A  6  2  4
7     B  5  3 NA
8     B NA  6  5
9     B NA NA  4
10    B NA  6  2
11    B 12 NA  3
12    B 11 NA NA
13    C  3  2 NA
14    C  2  6 NA
15    C  3  3  4
16    C  9 NA  6
17    C  8 NA  6
18    C  4 NA  2
> str(df1)
'data.frame': 18 obs. of 4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : num 4 7 10 13 11 6 5 NA NA NA ...
 $ v2  : num 7 8 5 4 9 2 3 6 NA 6 ...
 $ v3  : num NA NA 3 4 2 4 NA 5 4 2 ...
Re: [R] How to extract partial predictions, package mgcv
Yes, you're correct, David; I used attach(f) earlier. Thanks for clarifying! I should change my own code accordingly. /Staffan David Winsemius wrote: [...]
Re: [R] Turnpoints
Hi, On Sep 16, 2009, at 1:42 PM, ogbos okike wrote: [...] Assume your data is stored in a variable called df. Look at what this gives you:
R> less4000 <- df$count <= 4000
1. You can sum that vector to tell you how many rows have a count less than or equal to 4000.
2. You can use it to select rows out of the data frame:
R> df[less4000, ]
You can also use the subset function:
R> subset(df, count <= 4000)
-steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
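Putting that together with a few rows of the posted data (the rows are taken from the table in the question; wrapping them in a data frame is my addition):

```r
df <- data.frame(
  y = 93, m = 2, d = 19:24,
  count = c(4028.2, 4029.5, 3953.4, 3857.3, 3848.3, 3869.8)
)

low <- df$count <= 4000    # logical vector, TRUE at or below 4000
sum(low)                   # 4 such days in this slice
df[low, ]                  # those rows, dates included

subset(df, count <= 4000)  # same result
```

The logical-vector form is handy when you also want the count (sum) and the rows from one intermediate object.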
Re: [R] re-code missing value
Hi Steve, Here is a suggestion using your original df1:

# Create a copy -- you can avoid this
newdf1 <- df1
# Process
newdf1[,2:4] <- apply(newdf1[,2:4], 2, function(x) as.numeric(x))
# Remove df1
rm(df1)
# Result
newdf1
# str()
str(newdf1)
# 'data.frame': 18 obs. of 4 variables:
#  $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
#  $ v1  : num 10 22 44 521 5 ...
#  $ v2  : num 5 54 214 14 73 0.4 1 4 NA 4 ...
#  $ v3  : num NA NA 2 4 1 4 NA 5 4 1 ...

HTH, Jorge

On Wed, Sep 16, 2009 at 1:50 PM, Steve Hong wrote:

Dear all, I have a partial data set with four columns. The first column is site, with three factor levels (i.e., A, B, and C). The second to fourth columns (v1 ~ v3) are my observations. In the observations, "." indicates a missing value. I replaced "." with NA in two steps. First, I replaced "." with NA, and then changed each variable from factor to numeric using df1$v1 <- as.numeric(df1$v1). The second step was OK when I had a low number of variables; however, it is painful when I have a lot of variables. My question is: Is there a more efficient way to convert this kind of large-scale data? In short, I am looking for an alternative to STEP 2, or to the whole procedure if there is one. Any comment will be highly appreciated. Thank you in advance!! Steve

P.S.: Below is an example of what I did.

STEP 1

> df1
   site   v1   v2 v3
1     A   10    5  .
2     A   22   54  .
3     A   44  214  2
4     A  521   14  4
5     A    5   73  1
6     A 1654  0.4  4
7     B   16    1  .
8     B    .    4  5
9     B    .    .  4
10    B    .    4  1
11    B   51    .  2
12    B    5    .  .
13    C    1  0.4  .
14    C    0    4  .
15    C    1    1  4
16    C   40    .  7
17    C    4    .  7
18    C   10    .  1

> str(df1)
'data.frame': 18 obs. of 4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 1 1 1 ...
 $ v2  : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 1 6 ...
 $ v3  : Factor w/ 6 levels ".","1","2","4",..: 1 1 3 4 2 4 1 5 4 2 ...

> df1[df1 == "."] <- NA
Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = NA) : invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = NA) : invalid factor level, NAs generated
3: In `[<-.factor`(`*tmp*`, thisvar, value = NA) : invalid factor level, NAs generated

> df1
   site   v1   v2 v3
1     A   10    5 NA
2     A   22   54 NA
3     A   44  214  2
4     A  521   14  4
5     A    5   73  1
6     A 1654  0.4  4
7     B   16    1 NA
8     B   NA    4  5
9     B   NA   NA  4
10    B   NA    4  1
11    B   51   NA  2
12    B    5   NA NA
13    C    1  0.4 NA
14    C    0    4 NA
15    C    1    1  4
16    C   40   NA  7
17    C    4   NA  7
18    C   10   NA  1

> str(df1)
'data.frame': 18 obs. of 4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 NA NA NA ...
 $ v2  : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 NA 6 ...
 $ v3  : Factor w/ 6 levels ".","1","2","4",..: NA NA 3 4 2 4 NA 5 4 2 ...

STEP 2.

> df1$v1 <- as.numeric(df1$v1)
> df1$v2 <- as.numeric(df1$v2)
> df1$v3 <- as.numeric(df1$v3)

> df1
   site v1 v2 v3
1     A  4  7 NA
2     A  7  8 NA
3     A 10  5  3
4     A 13  4  4
5     A 11  9  2
6     A  6  2  4
7     B  5  3 NA
8     B NA  6  5
9     B NA NA  4
10    B NA  6  2
11    B 12 NA  3
12    B 11 NA NA
13    C  3  2 NA
14    C  2  6 NA
15    C  3  3  4
16    C  9 NA  6
17    C  8 NA  6
18    C  4 NA  2

> str(df1)
'data.frame': 18 obs. of 4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : num 4 7 10 13 11 6 5 NA NA NA ...
 $ v2  : num 7 8 5 4 9 2 3 6 NA 6 ...
 $ v3  : num NA NA 3 4 2 4 NA 5 4 2 ...
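A note on why STEP 2 misbehaves, plus two common alternatives (a sketch; the file name "df1.txt" is hypothetical): as.numeric() applied directly to a factor returns the internal level codes, which is exactly what the `v1: 4 7 10 13 ...` output above shows. The conversion has to go through character, or the "." values can be declared NA at read time.

```r
# Fix 1: declare "." as NA when reading, so columns come in numeric directly
# (hypothetical file name)
# df1 <- read.table("df1.txt", header = TRUE, na.strings = ".")

# Fix 2: convert existing factor columns via character --
# as.numeric(factor) alone would return the level codes instead
df1 <- data.frame(site = c("A", "A", "B"),
                  v1   = factor(c("10", "22", ".")),
                  v2   = factor(c("5", ".", "4")))
df1[2:3] <- lapply(df1[2:3], function(x) as.numeric(as.character(x)))
str(df1)  # v1 and v2 are now num, with NA where "." was
```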
[R] Getting best three dimensions of data after dimensionality reduction (using kpca)
Hi, I am using the kernlab package and the kpca function for dimensionality reduction of my data. Can anybody tell me how I can get the best three dimensions from the output of the kpca function, which includes the principal component vectors, eigenvalues, rotated vectors, and the original matrix? I want to get back my original matrix with reduced dimensions (the best three dimensions). Thanks Abbas
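A hedged sketch of one way to do this with kernlab (using the built-in iris data as a stand-in for the original matrix): the eigenvalues returned by kpca are sorted in decreasing order, so the "best" three dimensions are simply the first three components, and `rotated()` returns the data projected onto them.

```r
library(kernlab)

X <- as.matrix(iris[, 1:4])  # example data matrix

# Keep only the first three kernel principal components
kpc <- kpca(X, kernel = "rbfdot", kpar = list(sigma = 0.1),
            features = 3)

eig(kpc)                 # eigenvalues of the three retained components
scores <- rotated(kpc)   # original data projected onto those components
dim(scores)              # nrow(X) x 3: the reduced-dimension representation
```

Note that kernel PCA is not a rotation of the original columns, so you do not get back the original matrix with three of its columns dropped; `scores` is the three-dimensional embedding of each row.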
Re: [R] Example data for analysis of a highly pseudoreplicate mixed-effects experiment
There were some wrong NA values in the provided data set; this is now corrected. The data can be read in as

small = read.csv("small.csv", colClasses = c("character", rep("integer", 2), rep("factor", 5)))

The high number of residual df can be seen using the nlme package (can it be seen in the lme4 package, too?):

library(nlme)
model1 = lme(APC ~ (FITC + species + gene)^3, random = ~1 | day_repl, method = "ML", data = small)
anova(model1)
                  numDF denDF   F-value p-value
(Intercept)           1   789 1365.7400  <.0001
FITC                  1   789 3040.8168  <.0001
species               1   789   24.0521  <.0001
gene                  1   789   10.4035  0.0013
FITC:species          1   789    5.0690  0.0246
FITC:gene             1   789   10.7925  0.0011
species:gene          1   789   72.5823  <.0001
FITC:species:gene     1   789    4.6742  0.0309

In the original data set, denDF is > 200 000. Once again, how do I correctly account for pseudoreplication, avoiding the artificially high number of df? Thank you, Matthias

Matthias Gralle wrote: Hello everybody, it may be better to have sample data. I have provided data with fewer levels of gene and day and only ca. 400 data points per condition. Sample code:

small = as.data.frame(read.csv("small.csv"))
small$species = factor(small$species)
small$gene = factor(small$gene)
small$day = factor(small$day)
small$repl = factor(small$repl)
library(lme4)
model1 = lmer(APC ~ (FITC + species + gene)^3 + (1 | day) + (1 | repl), REML = F, data = small)
model2 = lmer(APC ~ (FITC + species + gene)^2 + (1 | day) + (1 | repl), REML = F, data = small)
anova(model1, model2)

gives me a significant difference, but in fact there are far too many residual df, and this is much worse in the original data. I suppose I cannot trust this difference.

model3 = lmer(APC ~ species * gene + (1 | day/repl/FITC), data = small)
Error: length(f1) == length(f2) is not TRUE
In addition: Warning messages:
1: In FITC:(repl:day) : numerical expression has 886 elements: only the first used
2: In FITC:(repl:day) : numerical expression has 886 elements: only the first used

Can I do nesting without incurring this kind of error?
And finally

model4 = lmer(APC ~ gene * species + (1 | day) + (1 | repl) + (1 + (gene:species) | FITC), data = small)
model5 = lmer(APC ~ gene + species + (1 | day) + (1 | repl) + (1 | FITC), data = small)
anova(model4, model5)

works with this reduced data set, but fails to converge for the original, much larger data set. I am unsure whether this is the right kind of analysis for the data and there is only a problem of convergence, or whether it is the wrong formula. Can anybody give me some advice on this problem? Or should I just stick to reducing the data from each condition to a single parameter, as explained in my first email below? Thank you again!

Matthias Gralle wrote: Hello everybody, I have been trying for some weeks to state the correct design of my experiment as a GLM formula, and have not been able to find something appropriate in Pinheiro & Bates, so I am posting it here and hope somebody can help me. In each experimental condition, described by 1) gene (10 levels, fixed, because of high interest to me), 2) species (2 levels, fixed, because of high interest), 3) day (2 levels, random), and 4) replicate (2 levels per day, random), I have several thousand data points consisting of two variables: 5) FITC (level of transfection of a cell) and 6) APC (antibody binding to the cell). Because of intrinsic and uncontrollable cell-to-cell variation, FITC varies quite uniformly over a wide range, and APC correlates rather well with FITC. In some cases, I pasted day and replicate together as day_repl. My question is the following: Is there any gene (in my set of 10 genes) where the species makes a difference in the relation between FITC and APC? If yes, in which gene does species have an effect? And what is the effect of the species difference? My attempts are the following: 1.
Fit the data points of each experimental condition to a linear equation APC = Intercept + Slope*FITC and analyse the slopes:

lm(Slope ~ species * gene * day_repl)

This analysis shows clear differences between the genes, but no effect of species and no interaction gene:species. The linear fit to the cells is reasonably good, but of course does not represent the data set completely, so I wanted to incorporate the complete data set.

2a. lmer(APC ~ FITC * species * gene + (1 | day) + (1 | repl))

This gives extremely significant values for every interaction and variable because there are > 200 000 df. Of course, it cannot be true, because the cells are not really independent. I have done many variations of the above, e.g.

2b. lmer(APC ~ FITC * species * gene + (1 | day) + (1 + FITC | day_repl)),

but they all suffer from the excess of df.

3. lmer(APC ~ species * gene + (1 | day/repl/FITC))

gives several warning messages like this one:

In repl:day : numerical expression has 275591 elements: only the first used

4. lmer(APC ~ gene * species + (1 | day_repl) + (1 + gene:species | FITC))

ran several days, but failed to converge... Can somebody give me
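The slope-per-condition route in attempt 1 can be written compactly in base R. A sketch on synthetic data (the column names follow the description above; the sizes and effect are made up):

```r
set.seed(1)
# Synthetic stand-in: 2 genes x 2 species x 4 day_repl, 25 cells each
d <- data.frame(
  gene     = gl(2, 200),
  species  = gl(2, 100, 400),
  day_repl = gl(4, 25, 400),
  FITC     = runif(400)
)
d$APC <- 2 * d$FITC + rnorm(400, sd = 0.1)  # true slope 2 everywhere

# One fitted slope per experimental condition
conds  <- interaction(d$gene, d$species, d$day_repl, drop = TRUE)
slopes <- sapply(split(d, conds),
                 function(cond) coef(lm(APC ~ FITC, data = cond))[["FITC"]])

length(slopes)  # 16 conditions, each now contributing a single number
# These slopes can then be analysed with lm(Slope ~ species * gene * day_repl)
# without the inflated per-cell residual df.
```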
Re: [R] comma as decimal separator in xtable
Try this:

sapply(x, format, decimal.mark = ',')

On Wed, Sep 16, 2009 at 2:06 PM, Jakson A. Aquino jaksonaqu...@gmail.com wrote: On Wed, Sep 16, 2009 at 09:14:39AM -0300, Henrique Dallazuanna wrote: Try this also:

format(coef(summary(lm.D9)), decimal.mark = ',')

or using gsub:

apply(coef(summary(lm.D9)), 2, gsub, pattern = "\\.", replacement = ",")

using the lm.D9 object from the ?lm example.

Thanks for your suggestion! The problem with the above approach is that all values get the same formatting. For example,

x <- c(1.2, 1.2e10)
format(x, format = 'g', decimal.mark = ',')
[1] "1,2e+00" "1,2e+10"

What I would like to get was:

[1] "1,2" "1,2e+10"

I already solved my problem, but I'm using an external program to replace "." with ",". Perhaps I should ask my question again, with a new subject, since what I need now is the perl regular expression equivalent to the sed command:

s/\([0-9]\)a/a\1/

That is, 1a becomes a1; 3b becomes b3, etc.

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O
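For the sed command at the end, the base-R equivalent is gsub() with a backreference (here generalized from "a" to any lowercase letter, since the examples use both "1a" and "3b"):

```r
x <- c("1a", "3b")
# "([0-9])([a-z])" captures the digit and the letter; "\\2\\1" swaps them,
# mirroring sed's s/\([0-9]\)a/a\1/
gsub("([0-9])([a-z])", "\\2\\1", x)
# For Perl-flavoured regular expressions, add perl = TRUE to the gsub call
```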
Re: [R] Turnpoints
On Sep 16, 2009, at 1:42 PM, ogbos okike wrote:

Good evening to everybody. I have data in four columns - three of the columns represent the date while the fourth is counts. Setting 4000 as my minimum point/value, I wish to find out the number of counts that are less than or equal to 4000

You already have two good worked solutions from Steve Lianoglou for the second part. I find table(logical expression) to be a useful construct.

> ct <- read.table(textConnection("y m d count
+ 93 02 07 3974.6
+ 93 02 08 3976.7
+ 93 02 09 3955.2
+ 93 02 10 3955.0
+ 93 02 11 3971.8
+ 93 02 12 3972.8
+ 93 02 13 3961.0
+ 93 02 14 3972.8
+ 93 02 15 4008.0
+ 93 02 16 4004.2
+ 93 02 17 3981.2
+ 93 02 18 3996.8
+ 93 02 19 4028.2
+ 93 02 20 4029.5
+ 93 02 21 3953.4
+ 93 02 22 3857.3
+ 93 02 23 3848.3
+ 93 02 24 3869.8
+ 93 02 25 3898.1
+ 93 02 26 3920.5
+ 93 02 27 3936.7
+ 93 02 28 3931.9"), header=T)

> table(ct$count <= 4000)

FALSE  TRUE
    4    18

and the dates they occur. I have installed pastecs and have been trying to use the turnpoints function to do this. I have not been much lucky. I am still learning. I have added part of my data here where y stands for year, m month, d day and lastly the count. I would be grateful to anyone who could show me the way to do this. Best regards Ogbos

-- David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] comma as decimal separator in xtable
On Wed, Sep 16, 2009 at 03:16:12PM -0300, Henrique Dallazuanna wrote:

Try this:

sapply(x, format, decimal.mark = ',')

Yes, this works as I need, regarding the replacement of dots with commas. Thanks!

---snip---
[R] T-test to check equality, unable to interpret the results.
Hi, I have the precision values of a system on two different data sets. Snippets of these results are as shown:

sample1: (total 194 samples)
0.600238
0.800119
0.600238
0.200030
0.600238
...
...

sample2: (total 188 samples)
0.8001
0.2000
0.8001
0.
0.8001
0.4001
...
...

I want to check whether these results are statistically significant. Intuitively, the similarity in the two results means the results are statistically significant. I am using the t-test, t.test(sample1, sample2), to check for similarity between the two results. I get the following output:

---
Welch Two Sample t-test
data: s1p5 and s2p5
t = 0.9778, df = 374.904, p-value = 0.3288
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.03170059 0.09441172
sample estimates:
mean of x mean of y
0.5138298 0.4824742

I believe the t-test checks for difference between the two sets, and a p-value < 0.05 means both the sets are statistically different. Here, while checking for dissimilarity, the p-value is 0.3288; does it mean that the higher the p-value (while t.test checks for dissimilarity), the more similar the results are (which is the case above, as the means of the results are very close!)? Please help me interpret the results.. thanks in advance!

-- Rob Hall Masters Student ANU
Re: [R] T-test to check equality, unable to interpret the results.
Robert, We unfortunately do not have enough information to help you interpret the results, and this is not really an R question at all, but a general statistics question. You will probably have much better understanding of and confidence in your results by consulting a local statistical consultant at your university. The values shown for sample 1 and sample 2 lead me to believe that these are not drawn from a homogeneous population. Whoever ends up helping you is going to need to know how these measurements were obtained in much greater detail than you've given here. Best Regards, Erik Iverson

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Robert Hall Sent: Wednesday, September 16, 2009 1:55 PM To: r-help Subject: [R] T-test to check equality, unable to interpret the results.

---snip---
Re: [R] T-test to check equality, unable to interpret the results.
Hi, I was just going to send this when I saw Erik's post. He's right -- we can't say anything about your data, but we can say something about using a t-test. I'm not a real statistician, so this answer isn't very rigorous, but it might be helpful.

On Sep 16, 2009, at 2:55 PM, Robert Hall wrote:

I believe the t-test checks for difference between the two sets, and a p-value < 0.05 means both the sets are statistically different.

A t-test is used to check if the difference in the means of two samples is statistically significant. The null hypothesis is that the two means are not different. If you reject the null, it means you have reason to believe that the means of the two samples are different. See the "uses" section here: http://en.wikipedia.org/wiki/Student's_t-test

Here, while checking for dissimilarity, the p-value is 0.3288; does it mean that the higher the p-value (while t.test checks for dissimilarity), the more similar the results are (which is the case above, as the means of the results are very close!)? Please help me interpret the results..

Your intuition is essentially correct. In general, the higher the p-value (in any statistical test), the less confident you should be that rejecting the null hypothesis is a good idea.

-steve

-- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
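To make the interpretation concrete, a small simulation on synthetic data (not the poster's): two samples drawn from the same population typically give a large p-value, while a sample from a shifted population typically gives a small one.

```r
set.seed(42)
a <- rnorm(100, mean = 0.5, sd = 0.2)   # sample 1
b <- rnorm(100, mean = 0.5, sd = 0.2)   # sample 2, SAME population mean
t.test(a, b)$p.value                    # usually large: no detected difference

c2 <- rnorm(100, mean = 0.6, sd = 0.2)  # population mean shifted by 0.1
t.test(a, c2)$p.value                   # usually small: difference detected
```

Note that a large p-value is not evidence that the population means are equal, only that no difference was detected at this sample size; demonstrating similarity would call for an equivalence test rather than an ordinary t-test.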
[R] apply function across two variables by mult factors
Greetings, I am attempting to run a function, which produces a vector and requires two input variables, across two nested factor levels. I can do this using by(X, list(factor1, factor2), function), however I haven't found a simple way to extract the list output into an organized vector form. I can do this using nested loops but it isn't exactly an optimal approach. Thank you for any and all suggestions. Jon

# example data frame
testDF <- data.frame(x = rnorm(12), y = rnorm(12), f1 = gl(3, 4), f2 = gl(2, 2, 12))

# example function [trivial]
testFun <- function(x){
  X <- abs(x[, 1])
  Y <- abs(x[, 2])
  as.numeric(paste(round(X), round(Y), sep = '.'))
}

# apply by factor levels but hard to extract values
by(testDF[, 1:2], list(testDF$f1, testDF$f2), testFun)

# Loop works, but not efficient for large datasets
testDF$value <- NA
for(i in levels(testDF$f1)){
  for(j in levels(testDF$f2)){
    testDF[testDF$f1 == i & testDF$f2 == j, ]$value <-
      testFun(testDF[testDF$f1 == i & testDF$f2 == j, 1:2])
  }
}
testDF

sessionInfo()
#R version 2.9.1 Patched (2009-08-07 r49093)
#i386-apple-darwin8.11.1
#
#locale:
#en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base

Jon Loehrke Graduate Research Assistant Department of Fisheries Oceanography School for Marine Science and Technology University of Massachusetts 200 Mill Road, Suite 325 Fairhaven, MA 02719 jloeh...@umassd.edu T 508-910-6393 F 508-910-6396
Re: [R] apply function across two variables by mult factors
Hello,

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jon Loehrke Sent: Wednesday, September 16, 2009 2:23 PM To: r-help@r-project.org Subject: [R] apply function across two variables by mult factors

Greetings, I am attempting to run a function, which produces a vector and requires two input variables, across two nested factor levels. I can do this using by(X, list(factor1, factor2), function), however I haven't found a simple way to extract the list output into an organized vector form. I can do this using nested loops but it isn't exactly an optimal approach. Thank you for any and all suggestions. Jon

# example data frame
testDF <- data.frame(x = rnorm(12), y = rnorm(12), f1 = gl(3, 4), f2 = gl(2, 2, 12))

Try this using lapply, split, mapply? Maybe it is in a nicer output object for you?

testFun2 <- function(x, y) {
  X <- abs(x)
  Y <- abs(y)
  as.numeric(paste(round(X), round(Y), sep = '.'))
}

lapply(split(testDF, list(testDF$f1, testDF$f2)), function(x) mapply(testFun2, x[1], x[2]))

---snip---
Re: [R] apply function across two variables by mult factors
One correction below,

---snip---

# example data frame
testDF <- data.frame(x = rnorm(12), y = rnorm(12), f1 = gl(3, 4), f2 = gl(2, 2, 12))

Try this using lapply, split, mapply? Maybe it is in a nicer output object for you?

testFun2 <- function(x, y) {
  X <- abs(x)
  Y <- abs(y)
  as.numeric(paste(round(X), round(Y), sep = '.'))
}

lapply(split(testDF, list(testDF$f1, testDF$f2)), function(x) mapply(testFun2, x[1], x[2]))

Or use list indexing in the mapply call to get a vector, in this case at least...

lapply(split(testDF, list(testDF$f1, testDF$f2)), function(x) mapply(testFun2, x[[1]], x[[2]]))

---snip---
Re: [R] apply function across two variables by mult factors
Hi Jon, Here is a suggestion:

foo <- function(x) as.numeric(paste(abs(round(x)), collapse = "."))
testDF$value <- apply(testDF[, 1:2], 1, foo)
testDF

HTH, Jorge

On Wed, Sep 16, 2009 at 3:22 PM, Jon Loehrke jloeh...@umassd.edu wrote:

---snip---
Re: [R] apply function across two variables by mult factors
Try this:

transform(testDF, value = as.numeric(paste(round(abs(x)), round(abs(y)), sep = ".")))

On Wed, Sep 16, 2009 at 3:22 PM, Jon Loehrke jloeh...@umassd.edu wrote:

---snip---
Re: [R] T-test to check equality, unable to interpret the results.
I am loathe to expound basic statistics here ... but, at the considerable risk of pedantry, I must note that Steve's reply below contains fundamental errors, which I feel should not be left on this list unremarked: t-tests do **not** test for differences in **sample** means; they test for differences in **population** means. The sample means are different. Period. Furthermore, the null can be other than equality -- e.g. that the mean of the first population is less than that of the second. Finally, "statistically different" is a meaningless phrase. P < .05 means that, assuming the underlying assumptions at least approximately hold (and what "approximately hold" means operationally is a technical discussion unto itself), were this calculation to be repeated over and over again with samples of data from populations for which the null is, in fact, true, the expected (long run) proportion of times the null would be rejected is .05 (the standard frequentist interpretation). For any **particular** pair of samples, the probability of falsely rejecting when the null holds is either 1 or 0 -- either you rejected or you did not. I would not bother with this were it not for the fact that Steve's apparent confusion -- or at least his imprecise statements -- is widespread among scientists, in my experience, and leads to frequent misapplications and misinterpretations of significance testing. The woes of Stat 101 training ... But that's another diatribe...

Bert Gunter Genentech Nonclinical Biostatistics

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Steve Lianoglou Sent: Wednesday, September 16, 2009 12:19 PM To: Robert Hall Cc: r-help Subject: Re: [R] T-test to check equality, unable to interpret the results.

---snip---