Re: [R] Conversion of Matlab code to an R code
On Mar 23, 2015, at 10:10 AM, Abhinaba Roy abhinabaro...@gmail.com wrote: Hi, Can Matlab code be converted to R code? I am finding it difficult to do so. Could you please help me out with it? Your help will be highly appreciated. Here comes the Matlab code [code snipped] Hi, Not automatically, certainly. I don't know that anyone will volunteer here to convert such a large volume of code, though I could be wrong of course. That being said, there are two R/Matlab references that you should leverage, if you have not already: http://www.math.umaine.edu/~hiebeler/comp/matlabR.pdf http://mathesaurus.sourceforge.net/octave-r.html Those might make your job a bit easier. Regards, Marc Schwartz __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Superimposing 2 curves on the same graph with par(new=TRUE)
Hi, If he wants the two sets of data plotted on the same y axis scale, with the range of the y axis adjusted to the data, an alternative to the use of plot() and points() is: matplot(Date, cbind(MORTSFr, MORTSBu), type = "l") See ?matplot Regards, Marc Schwartz On Mar 23, 2015, at 12:04 PM, Boris Steipe boris.ste...@utoronto.ca wrote: ... which is exactly what he shouldn't do, because now the plot falsely asserts that both curves are plotted to the same scale. B. On Mar 23, 2015, at 12:34 PM, Clint Bowman cl...@ecy.wa.gov wrote: Try: plot(Date, MORTSBu, lwd=2, lty="dashed", axes=FALSE, xlab="", ylab="") Clint Bowman INTERNET: cl...@ecy.wa.gov Air Quality Modeler INTERNET: cl...@math.utah.edu Department of Ecology VOICE: (360) 407-6815 PO Box 47600 FAX: (360) 407-7534 Olympia, WA 98504-7600 USPS: PO Box 47600, Olympia, WA 98504-7600 Parcels: 300 Desmond Drive, Lacey, WA 98503-1274 On Mon, 23 Mar 2015, varin sacha wrote: Dear R-Experts, I try to superimpose/present 2 curves/plots on the same graph. I would like the result/graph to be readable. For that, I use the par(new=TRUE) argument, but on the Y-axis there is a superposition of writings and the Y-axis becomes unreadable. How can I solve this problem? Here is a reproducible example:

Date <- c(1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010)
MORTSFr <- c(16445,17671,18113,17043,14738,14355,15028,14283,13229,13603,13672,13547,13527,13021,12737,11388,11947,10742,11497,11476,11215,10483,9900,9568,9019,8891,8541,8444,8918,8487,8079,8160,7655,6058,5593,5318,4709,4620,4275,4273,3992)
MORTSBu <- c(838,889,934,946,960,1030,1021,1040,1153,1149,1199,1219,1229,1123,1119,1113,1070,1153,1153,1280,1567,1114,1299,1307,1390,1264,1014,915,1003,1047,1012,1011,959,960,943,957,1043,1006,1061,901,776)
plot(Date, MORTSFr, type="l")
par(new=TRUE)
plot(Date, MORTSBu, lwd=2, lty="dashed")

Thanks for your time.
Best, S
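To make Marc's matplot() suggestion concrete, here is a minimal sketch using the data from the original post (abbreviated to the first few years purely for illustration; the legend labels are my own additions):

```r
# First few years of the data from the original post, for illustration only
Date    <- c(1970, 1971, 1972, 1973, 1974)
MORTSFr <- c(16445, 17671, 18113, 17043, 14738)
MORTSBu <- c(838, 889, 934, 946, 960)

# One call draws both series on a single, data-driven y axis,
# so both curves really are on the same scale
matplot(Date, cbind(MORTSFr, MORTSBu), type = "l", lty = c(1, 2),
        col = c("black", "red"), ylab = "Deaths")
legend("topright", legend = c("MORTSFr", "MORTSBu"),
       lty = c(1, 2), col = c("black", "red"))
```

Because matplot() computes the y range from both columns at once, it avoids the overprinted-axis problem that par(new=TRUE) with two plot() calls creates.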
Re: [R] Installing R on Linux Red Hat Server
On Mar 12, 2015, at 3:39 PM, Axel Urbiz axel.ur...@gmail.com wrote: Hello, My apologies if this is not the right place to post this question. I need to get R installed on a Linux Red Hat server. I have very limited exposure to R and would appreciate some basic guidance if you could point me to resources describing the process, requirements, etc. Thank you in advance for any help. Best, Axel. Hi, Pointers to some references: 1. The EPEL, which is how you would obtain pre-compiled binary RPMs of R. You will need to have root permissions on the server in order to do this. Once their yum repos are configured on your server, 'sudo yum install R' is essentially what you would need. https://fedoraproject.org/wiki/EPEL 2. The R Installation and Administration Manual, which will provide some guidance, in the Linux section and in the appendices (primarily A), for additional items that may be relevant: http://cran.r-project.org/manuals.html 3. The R-SIG-Fedora list, which is focused on the use of R on RH and derivative (e.g. Fedora) Linux distributions. Follow-up questions should be posted there, ideally after you subscribe, lest you be subject to on-going moderation (speaking as a co-moderator of that list). https://stat.ethz.ch/mailman/listinfo/r-sig-fedora Regards, Marc Schwartz
Re: [R] SPSS command match files for merging one-to-many (hierarchical) equivalent in R?
On Mar 9, 2015, at 1:53 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 09/03/2015 1:40 PM, Kristina Loderer wrote: Dear R community, to combine data sets of hierarchical, nested nature (i.e., data sets linked by, for example, the variable study ID and then also by outcome_variable_1 and outcome_variable_2) I can use the match files command in SPSS. What is the equivalent command / function in R? Is it the merge function, or the match function? The more I read, the more confused I become. I don't know SPSS at all, so I can't help you. If nobody else does, you might try putting together a tiny example in R showing what you're starting with, and what you want to produce. From what you wrote, I'd guess merge(), not match(), but you might really be asking for something completely different. Duncan Murdoch Based upon the info here: http://www.ats.ucla.edu/stat/spss/modules/merge.htm I would go with ?merge, since the desired functionality appears to be a relational join operation. Regards, Marc Schwartz
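For what it's worth, a small one-to-many merge() sketch; the data frames and the study_ID key below are invented purely for illustration:

```r
# One row per study (the "one" side)
studies  <- data.frame(study_ID = c(1, 2),
                       author   = c("Smith", "Jones"))

# Several rows per study (the "many" side)
outcomes <- data.frame(study_ID = c(1, 1, 2),
                       outcome  = c(0.3, 0.5, 0.2))

# merge() repeats each study's row across its matching outcome rows,
# analogous to a keyed one-to-many MATCH FILES in SPSS
merge(studies, outcomes, by = "study_ID")
```

The result has one row per outcome, with the study-level columns duplicated down each study's block, which is the usual hierarchical-join behaviour being asked about.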
Re: [R] numbering consecutive rows based on length criteria
On Mar 2, 2015, at 11:43 AM, Morway, Eric emor...@usgs.gov wrote: Using this dataset:

dat <- read.table(textConnection("day noRes.Q wRes.Q
1 237074.41 215409.41
2 2336240.20 164835.16
3 84855.42 357062.72
4 76993.48 386326.78
5 73489.47 307144.09
6 70246.96 75885.75
7 69630.09 74054.33
8 66714.78 70071.80
9 122296.90 66579.08
10 63502.71 65811.37
11 63401.84 64795.12
12 63387.84 64401.14
13 63186.10 64163.95
14 63160.74 63468.25
15 60471.15 60719.15
16 58235.63 57655.14
17 58089.73 58061.34
18 57846.39 57357.89
19 57839.42 56495.69
20 57740.06 56219.97
21 58068.57 55810.91
22 58358.34 56437.81
23 76284.90 73722.92
24 105138.31 100729.00
25 147203.03 178079.38
26 109996.02 13.95
27 91424.20 87391.56
28 89065.91 87196.69
29 86628.74 84809.07
30 79357.60 77555.62"), header=TRUE)

I'm attempting to generate a column that continuously numbers consecutive rows where wRes.Q is greater than noRes.Q. To that end, I've come up with the following:

dat$flg <- dat$wRes.Q > dat$noRes.Q
dat$cnt <- with(dat, ave(integer(length(flg)), flg, FUN=seq_along))

The problem with dat$cnt is that it doesn't start over with 1 when a 'new' group of either true or false is encountered. Thus, row 9's cnt value should start over at 1, as should dat$cnt[10], and dat$cnt[11]==2, etc. (the desired result is shown below). In the larger dataset I'm working with (6,000 rows), there are blocks of rows where the number of consecutive rows with dat$flg==TRUE exceeds 100. My goal is to plot these blocks of rows as polygons in a time series plot. If, for the small example provided, the number of consecutive rows with dat$flg==TRUE is greater than or equal to 5 (the 2 blocks of rows satisfying this criterion in this small example are rows 3-8 and 10-15), is there a way to add a column that uniquely numbers these blocks of rows?
I'd like to end up with the following, which shows the correct cnt column and a column called plygn that is my ultimate goal:

dat
#    day    noRes.Q    wRes.Q   flg cnt plygn
# 1  237074.41 215409.41 FALSE 1 NA
# 2  2336240.20 164835.16 FALSE 2 NA
# 3  84855.42 357062.72 TRUE 1 1
# 4  76993.48 386326.78 TRUE 2 1
# 5  73489.47 307144.09 TRUE 3 1
# 6  70246.96 75885.75 TRUE 4 1
# 7  69630.09 74054.33 TRUE 5 1
# 8  66714.78 70071.80 TRUE 6 1
# 9  122296.90 66579.08 FALSE 1 NA
# 10 63502.71 65811.37 TRUE 1 2
# 11 63401.84 64795.12 TRUE 2 2
# 12 63387.84 64401.14 TRUE 3 2
# 13 63186.10 64163.95 TRUE 4 2
# 14 63160.74 63468.25 TRUE 5 2
# 15 60471.15 60719.15 TRUE 6 2
# 16 58235.63 57655.14 FALSE 1 NA
# 17 58089.73 58061.34 FALSE 2 NA
# 18 57846.39 57357.89 FALSE 3 NA
# 19 57839.42 56495.69 FALSE 4 NA
# 20 57740.06 56219.97 FALSE 5 NA
# 21 58068.57 55810.91 FALSE 6 NA
# 22 58358.34 56437.81 FALSE 7 NA
# 23 76284.90 73722.92 FALSE 8 NA
# 24 105138.31 100729.00 FALSE 9 NA
# 25 147203.03 178079.38 TRUE 1 NA
# 26 109996.02 13.95 TRUE 2 NA
# 27 91424.20 87391.56 FALSE 1 NA
# 28 89065.91 87196.69 FALSE 2 NA
# 29 86628.74 84809.07 FALSE 3 NA
# 30 79357.60 77555.62 FALSE 4 NA

Thanks, Eric

Hi, See ?rle

unlist(sapply(rle(with(dat, wRes.Q > noRes.Q))$lengths, seq))
 [1] 1 2 1 2 3 4 5 6 1 1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 1 2 1 2 3 4

cbind() the result above to your data frame. Regards, Marc Schwartz
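Putting the pieces together, here is one sketch of how ?rle yields both the restarting cnt counter and, with a little extra bookkeeping over the runs, the plygn column; it assumes the dat data frame from the original post, and the threshold of 5 consecutive TRUE rows follows Eric's description:

```r
# Runs of TRUE/FALSE in the comparison
r <- rle(with(dat, wRes.Q > noRes.Q))

dat$flg <- with(dat, wRes.Q > noRes.Q)
# Counter that restarts at 1 with each new run
dat$cnt <- unlist(lapply(r$lengths, seq_len))

# Number only the TRUE runs of length >= 5, leaving NA elsewhere
keep      <- r$values & r$lengths >= 5
ids       <- ifelse(keep, cumsum(keep), NA)
dat$plygn <- rep(ids, r$lengths)
```

For the example data this marks rows 3-8 as polygon 1 and rows 10-15 as polygon 2, with NA everywhere else, matching the desired output above; the short TRUE run at rows 25-26 falls below the length-5 threshold and stays NA.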
Re: [R] Convert windows source package for Mac use
On Feb 24, 2015, at 7:19 AM, Warthog arjarvis.wart...@gmail.com wrote: Hi, I am on a Mac. Is there a way to convert a Windows source package so it can be installed on a Mac? I have a package in zip form from a friend who runs Windows. I THINK that it is in compiled format for Windows. The Description says: Built: R 3.1.2 x86_64-w64-mingw32 windows I tried to convert it to a tgz then Install/Load on Mac R, but I get the error message: Error: package 'package' was built for x86_64-w64-mingw32 I can run Windows on Parallels Desktop, and the original zip format installs and loads OK. I'd prefer to run R on my Mac. Sorry if this is a stupid question: I read the R-exts and it doesn't say if you can or cannot do this. Thanks, Alan Hi, Just as an FYI, there is a Mac specific SIG list: https://stat.ethz.ch/mailman/listinfo/r-sig-mac Next, the Windows .zip file is a *binary*, not a source package, specifically compiled for Windows, as you hint at above. If the package contains any C/C++/FORTRAN code, then that code is also compiled for Windows and is not portable. The source package would/should have a .tar.gz extension and you would want your friend to provide that version of his/her package, presuming that he/she created this package and that it is not otherwise available (e.g. from CRAN or a third party location). If you can get that version of the package, then you may be able to install it on OS X, using: install.packages("PackageFileName", repos = NULL, type = "source") That presumes that there is no C/C++/FORTRAN code that requires compilation. If there is, you would also need to install the required development related tools, which are referenced in the R FAQ for OS X and the Installation and Admin manual. Regards, Marc Schwartz
Re: [R] Change error bar length in barplot2
On Feb 17, 2015, at 10:46 AM, Joule Madinga jmadi...@yahoo.fr wrote: Hi, I'm new to R. I would like to make a barplot of parasite infection prevalence (with 95% confidence interval) by age group. I have 4 parasite species and 5 age-groups, and the example by Marc Schwartz (barplot2) fits very well to my data. However, I would like to plot my own 95% CI (as calculated with my own data) instead of the faked 95% CI provided in the example. How can I proceed? Thank you in advance. Joule Joule, Please see my reply to your offlist e-mail to me this morning. It looks like there was a delay in your post here coming through, perhaps as a result of moderation. Regards, Marc Schwartz
Re: [R] Database connection query
On Feb 9, 2015, at 4:33 AM, Lalitha Kristipati lalitha.kristip...@techmahindra.com wrote: Hi, I would like to know when to use drivers and when to use packages to connect to databases in R Regards, Lalitha Kristipati Associate Software Engineer In general, you will need both. There is more information in the R Data Import/Export manual: http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases and there is a SIG list for R and DB specific subject matter: https://stat.ethz.ch/mailman/listinfo/r-sig-db Regards, Marc Schwartz
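As one concrete illustration of the package-plus-driver split Marc describes (this example is mine, not from the thread): the DBI package provides the generic R interface, while a backend package supplies or talks to the actual driver. RSQLite is convenient for a self-contained demonstration because it bundles its own driver; backends such as RODBC or RPostgreSQL follow the same pattern but need a separately installed system driver:

```r
library(DBI)
library(RSQLite)  # backend package; bundles the SQLite driver

# In-memory database, so nothing is written to disk
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a built-in data frame into the database and query it back
dbWriteTable(con, "mtcars", mtcars)
dbGetQuery(con, "SELECT cyl, COUNT(*) AS n FROM mtcars GROUP BY cyl")

dbDisconnect(con)
```

With an ODBC- or server-based backend, only the dbConnect() line changes; the rest of the DBI calls stay the same, which is the point of separating the interface package from the driver.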
Re: [R] odbcConnectAccess2007 errors with Access databases on new PC
On Feb 2, 2015, at 12:00 PM, utz.ryan utz.r...@gmail.com wrote: Hello, I've connected R to Microsoft Access databases for years now using odbcConnectAccess2007. I recently got a new computer and R is absolutely refusing to connect to any Access database with the following error message: Warning messages: 1: In odbcDriverConnect(con, ...) : [RODBC] ERROR: state IM002, code 0, message [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified 2: In odbcDriverConnect(con, ...) : ODBC connection failed It's definitely not a path name problem; I've checked a dozen times. A few things online have mentioned something about 32-bit and 64-bit systems causing problems. I've tried opening both the 64-bit and 32-bit versions of R with zero luck. My Office is running a 32-bit system. Is there anything else I can try? I really would hate to lose the ability to connect R to my Access databases due to some intractable problem. Thanks, Ryan Take a look at the RODBC vignette: vignette("RODBC") or http://cran.r-project.org/web/packages/RODBC/vignettes/RODBC.pdf and see footnote 16 at the bottom of page 22 regarding the creation of 32 bit DSNs, along with the following from page 20: "32-bit Windows drivers for Access 2007 and Excel 2007 are bundled with Office 2007 but can be installed separately via the installer AccessDatabaseEngine.exe available from http://www.microsoft.com/en-us/download/details.aspx?id=23734." The entire tool chain needs to be of the same architecture. So 32 bit Office, 32 bit ODBC drivers, 32 bit DSN and 32 bit R. BTW, as you may be aware, there is a DB SIG list specifically for these types of questions: https://stat.ethz.ch/mailman/listinfo/r-sig-db Regards, Marc Schwartz
Re: [R] Problems installing jpeg package
On Jan 27, 2015, at 6:05 AM, Jorge Fernández García jorfeg...@hotmail.com wrote: I need help installing the jpeg package. The simple command install.packages("jpeg") produces the following result. My OS is Fedora 21. Thanks in advance for your help.

install.packages("jpeg")
Installing package into ‘/home/cgg/R/x86_64-redhat-linux-gnu-library/3.1’ (as ‘lib’ is unspecified)
trying URL 'http://cran.rstudio.com/src/contrib/jpeg_0.1-8.tar.gz'
Content type 'application/x-gzip' length 18046 bytes (17 Kb)
opened URL
downloaded 17 Kb
* installing *source* package ‘jpeg’ ...
** package ‘jpeg’ successfully unpacked and MD5 sums checked
** libs
gcc -m64 -std=gnu99 -I/usr/include/R -DNDEBUG -I/usr/local/include -fpic -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -c read.c -o read.o
In file included from read.c:1:0:
rjcommon.h:11:21: fatal error: jpeglib.h: No such file or directory
 #include <jpeglib.h>
compilation terminated.
/usr/lib64/R/etc/Makeconf:133: recipe for target 'read.o' failed
make: *** [read.o] Error 1
ERROR: compilation failed for package ‘jpeg’
* removing ‘/home/cgg/R/x86_64-redhat-linux-gnu-library/3.1/jpeg’
Warning in install.packages : installation of package ‘jpeg’ had non-zero exit status
The downloaded source packages are in ‘/tmp/RtmpYpmDcb/downloaded_packages’

Hi, You are missing the header file jpeglib.h, which is required for compiling the package from source. On Fedora, such files are typically contained in a *-devel RPM, where the '*' is the prefix for the Fedora RPM that provides the binary and related files. Specifically in this case, libjpeg is contained in the libjpeg-turbo RPM, thus you need, as root: yum install libjpeg-turbo-devel or sudo yum install libjpeg-turbo-devel from the CLI.
The R Installation and Administration manual covers this in: http://cran.r-project.org/doc/manuals/r-release/R-admin.html#Essential-programs-and-libraries As an aside, there is a SIG list specifically for R on RH/Fedora based Linux distros: https://stat.ethz.ch/mailman/listinfo/r-sig-fedora Regards, Marc Schwartz
Re: [R] nonmonotonic glm?
On Jan 11, 2015, at 4:00 PM, Ben Bolker bbol...@gmail.com wrote: Stanislav Aggerwal stan.aggerwal at gmail.com writes: I have the following problem. DV is binomial p. IV is a quantitative variable that goes from negative to positive values. The data look like this (need nonproportional font to view): [snip to make gmane happy] If these data were symmetrical about zero, I could use abs(IV) and do glm(p ~ absIV). I suppose I could fit two glms, one to positive and one to negative IV values. Seems a rather ugly approach. [snip] What's wrong with a GLM with quadratic terms in the predictor variable? This is perfectly respectable, well-defined, and easy to implement: glm(y ~ poly(x, 2), family = binomial, data = ...) or y ~ x + I(x^2) or y ~ poly(x, 2, raw = TRUE) (To complicate things further, this is a within-subjects design) glmer, glmmPQL, glmmML, etc. should all support this just fine. As an alternative to Ben's recommendation, consider using a piecewise cubic spline on the IV. This can be done using glm():

# splines is part of the base R distribution
# I am using 'df = 5' below, but this can be adjusted up or down as may be apropos
require(splines)
glm(DV ~ ns(IV, df = 5), family = binomial, data = YourDataFrame)

and, as Ben notes, is more generally supported in mixed models. If this were not a mixed model, another logistic regression implementation is in Frank's rms package on CRAN, using his lrm() instead of glm() and rcs() instead of ns():

# after installing rms from CRAN
require(rms)
lrm(DV ~ rcs(IV, 5), data = YourDataFrame)

Regards, Marc Schwartz
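To show the two suggestions side by side, here is a self-contained sketch on simulated data; the data-generating mechanism (a symmetric, nonmonotonic response in x, as the original poster describes) is invented for illustration:

```r
require(splines)

# Simulate a binomial response that is nonmonotonic in x:
# probability is lowest near zero and rises toward both extremes
set.seed(42)
x <- runif(200, -3, 3)
p <- plogis(-1 + 2 * x^2)
y <- rbinom(200, 1, p)

# Ben's quadratic logistic regression
fit.quad <- glm(y ~ poly(x, 2), family = binomial)

# Marc's natural-spline alternative, which does not force a parabola
fit.spline <- glm(y ~ ns(x, df = 5), family = binomial)

# The fits can be compared on, e.g., AIC
AIC(fit.quad, fit.spline)
```

For a truly quadratic mechanism the poly(x, 2) fit is the more parsimonious; the spline earns its extra degrees of freedom when the curvature is asymmetric or otherwise non-quadratic.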
Re: [R] i need help for var.test()
On Jan 8, 2015, at 5:12 AM, sait k sa...@hotmail.de wrote: Dear Sir or Madam, I want to use var.test() (an F test) for n samples. But in R, var.test() can only be used for the variances of two samples. The documentation states: Performs an F test to compare the variances of two samples from normal populations. I need a variance test for n samples. It would be great if you could tell me which test I can use in R for this problem. Thank you for the help. Yours sincerely, Sait Polat You can take a look at ?bartlett.test (which is listed in the See Also section of ?var.test) or perhaps ?fligner.test for a non-parametric method. Regards, Marc Schwartz
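A small sketch of both suggested k-sample tests on simulated data (the samples and their standard deviations are invented for illustration):

```r
# Three normal samples with deliberately unequal variances
set.seed(1)
x <- c(rnorm(30, sd = 1), rnorm(30, sd = 2), rnorm(30, sd = 4))
g <- factor(rep(1:3, each = 30))

bartlett.test(x, g)  # parametric k-sample test of homogeneity of variances
fligner.test(x, g)   # rank-based alternative, robust to non-normality
```

Both generalize the two-sample F test to n groups; Bartlett's test is the more powerful under normality, while the Fligner-Killeen test is the safer choice when normality is in doubt.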
Re: [R] How to group by then count?
On Jan 6, 2015, at 3:29 PM, Monnand monn...@gmail.com wrote: Thank you, all! Your replies are very useful, especially Don's explanation! One complaint I have is: the function name (table) is really not very informative. Why not? You used the word 'table' in your original post, except as Don noted, you were overthinking the problem. The basic concept is a tabulation of discrete values in a vector, which is a basic analytic method. Using commands like: ??table ??frequency would have led you to the table() function, as well as others. Believe it or not, taking a few minutes to have read/searched An Introduction to R, which is the basic R manual, would have led you to the same solution: http://cran.r-project.org/doc/manuals/r-release/R-intro.html#Frequency-tables-from-factors Regards, Marc Schwartz On Sun Jan 04 2015 at 5:03:47 PM MacQueen, Don macque...@llnl.gov wrote: This seems to me to be a case where thinking in terms of computer programming concepts is getting in the way a bit. Approach it as a data analysis task; the S language (upon which R is based) is designed in part for data analysis, so there is a function that does most of the job for you. (I changed your vector of strings to make the result more easily interpreted)

x = c('1', '1', '2', '1', '5', '2', '3', '5', '5', '2', '2')
tmp <- table(x)       ## counts the number of appearances of each element
tmp[tmp == max(tmp)]  ## finds which one occurs most often
2
4

Meaning that the element '2' appears 4 times. The table() function should be fast even with long vectors. Here's an example with a vector of length 1 million:

foo <- table( sample(letters, 1e6, replace=TRUE) )

One of the seminal books on the S language is John M Chambers' Programming with Data -- and I would emphasize the "with Data" part of that title.
-- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062 On 1/4/15, 1:02 AM, Monnand monn...@gmail.com wrote: Hi all, I thought this was a very naive problem but I have not found any solution which is idiomatic to R. The problem is like this: Assuming we have a vector of strings: x = c("1", "1", "2", "1", "5", "2") We want to count the number of appearances of each string, i.e. in vector x, string "1" appears 3 times; "2" appears twice and "5" appears once. Then I want to know which string is the majority. In this case, it is "1". For imperative languages like C, C++, Java and Python, I would use a hash table to count each string, where the keys are the strings and the values are the numbers of appearances. For functional languages like Clojure, there are higher order functions like group-by. However, for R, I can hardly find a good solution to this simple problem. I found a hash package, which implements a hash table. However, installing a package simply for a hash table is really annoying for me. I did find aggregate and other functions which operate on data frames. But in my case, it is a simple vector. Converting it to a data frame may be not desirable. (Or is it?) Could anyone suggest an idiomatic way of doing such a job in R? I would appreciate your help! -Monnand
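Don's tmp[tmp == max(tmp)] idiom can also be written with which.max() when only a single winner is needed; a compact sketch using the full vector from this thread:

```r
x   <- c("1", "1", "2", "1", "5", "2", "3", "5", "5", "2", "2")
tab <- table(x)

tab[which.max(tab)]          # the first value attaining the maximum count
names(tab)[tab == max(tab)]  # all values tied for the maximum; here "2" (4 times)
```

The names()/max() form is the safer idiom when ties are possible, since which.max() silently reports only the first maximum.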
Re: [R] move date-values from one line to several lines
On Dec 2, 2014, at 9:29 AM, Matthias Weber matthias.we...@fntsoftware.com wrote: Hello together, i have a data.frame with date-values. What I want is a data.frame with several lines for each date. My current data.frame looks like this one:

ID FROM TO REASON
1 2015-02-27 2015-02-28 Holiday
1 2015-03-15 2015-03-20 Illness
2 2015-05-20 2015-02-23 Holiday
2 2015-06-01 2015-06-03 Holiday
2 2015-07-01 2015-07-01 Illness

The result looks like this one:

ID DATE REASON
1 2015-02-27 Holiday
1 2015-02-28 Holiday
1 2015-03-15 Illness
1 2015-03-16 Illness
1 2015-03-17 Illness
1 2015-03-18 Illness
1 2015-03-19 Illness
1 2015-03-20 Illness
2 2015-05-20 Holiday
2 2015-05-21 Holiday
2 2015-05-22 Holiday
2 2015-05-23 Holiday
2 2015-06-01 Holiday
2 2015-06-02 Holiday
2 2015-06-02 Holiday
2 2015-07-01 Illness

Maybe anyone can help me, how I can do this. Thank you. Best regards. Mat

A quick and dirty approach. First, note that in your source data frame, the TO value in the third row is incorrect. I changed it here:

DF
  ID       FROM         TO  REASON
1  1 2015-02-27 2015-02-28 Holiday
2  1 2015-03-15 2015-03-20 Illness
3  2 2015-05-20 2015-05-23 Holiday
4  2 2015-06-01 2015-06-03 Holiday
5  2 2015-07-01 2015-07-01 Illness

With that in place, you can use R's recycling of values to create multiple data frame rows from the date sequences and the single ID and REASON entries:

i <- 1
data.frame(ID = DF$ID[i], DATE = seq(DF$FROM[i], DF$TO[i], by = "day"), REASON = DF$REASON[i])
  ID       DATE  REASON
1  1 2015-02-27 Holiday
2  1 2015-02-28 Holiday

So just put that into an lapply() based loop, which returns a list:

DF.TMP <- lapply(seq(nrow(DF)), function(i) data.frame(ID = DF$ID[i], DATE = seq(DF$FROM[i], DF$TO[i], by = "day"), REASON = DF$REASON[i]))

DF.TMP
[[1]]
  ID       DATE  REASON
1  1 2015-02-27 Holiday
2  1 2015-02-28 Holiday

[[2]]
  ID       DATE  REASON
1  1 2015-03-15 Illness
2  1 2015-03-16 Illness
3  1 2015-03-17 Illness
4  1 2015-03-18 Illness
5  1 2015-03-19 Illness
6  1 2015-03-20 Illness

[[3]]
  ID       DATE  REASON
1  2 2015-05-20 Holiday
2  2 2015-05-21 Holiday
3  2
2015-05-22 Holiday
4  2 2015-05-23 Holiday

[[4]]
  ID       DATE  REASON
1  2 2015-06-01 Holiday
2  2 2015-06-02 Holiday
3  2 2015-06-03 Holiday

[[5]]
  ID       DATE  REASON
1  2 2015-07-01 Illness

Then use do.call() on the result:

do.call(rbind, DF.TMP)
   ID       DATE  REASON
1   1 2015-02-27 Holiday
2   1 2015-02-28 Holiday
3   1 2015-03-15 Illness
4   1 2015-03-16 Illness
5   1 2015-03-17 Illness
6   1 2015-03-18 Illness
7   1 2015-03-19 Illness
8   1 2015-03-20 Illness
9   2 2015-05-20 Holiday
10  2 2015-05-21 Holiday
11  2 2015-05-22 Holiday
12  2 2015-05-23 Holiday
13  2 2015-06-01 Holiday
14  2 2015-06-02 Holiday
15  2 2015-06-03 Holiday
16  2 2015-07-01 Illness

See ?seq.Date for the critical step. Regards, Marc Schwartz
Re: [R] install R without texlive dependencies
On Nov 20, 2014, at 8:00 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 20/11/2014 3:31 AM, Muraki Kazutaka wrote: Hi all, I'm trying to install R from the EPEL repo on Scientific Linux. It pulls in the texlive rpm dependencies from the repo as well, but I have already installed TeX Live with tlmgr from a CTAN mirror and I don't want the texlive rpms from the Linux repo. So... the question is... how can I install R without these deps and at the same time have R and TeX Live work without any problems? You might get an answer to your question here, but this is more a question about your Linux distribution, and you will likely have better luck asking on a forum dedicated to it. R should be perfectly happy working with the TeX Live you already have. Duncan Murdoch As Duncan noted, since this is Linux distro specific, you would be better off posting to R-SIG-Fedora, which is for R on RH based distros: https://stat.ethz.ch/mailman/listinfo/r-sig-fedora A number of the R related RH/Fedora package maintainers monitor that list and you can discuss the nuances of some of the dependencies for R there. That being said, and while it has been a number of years for me on Linux, a Google search did not turn up any CLI options for 'yum' to be able to install without the hard coded RPM dependencies. However, there would be an option using 'rpm' at the command line, along the lines of: rpm -ivh --nodeps RPMName.rpm where you can either download the R RPM and install it locally, or include the full URL to the RPM on the EPEL server. Of course, the above incantation can leave you without other needed dependencies, so use with caution. Regards, Marc Schwartz
Re: [R] Adapt sweave function to produce an automatic pdf.
On Nov 20, 2014, at 4:03 AM, Frederic Ntirenganya ntfr...@gmail.com wrote: Hi All, I want to make a climate method (sweave_function). The aim is to be able to adapt Sweave code that produces an automatic pdf so that it can work for my climate object, i.e. instead of compiling the pdf, I call: data_obj$sweave_function() and get the pdf. For instance, I have boxplot_method() and I want to output it in this way. Thanks for the help!!! Ex: This is how I started, but I don't understand how I can proceed:

climate$methods(sweave_function = function(climate_data_objs_str, climate_data_objs_ind) {
  #-
  # This function returns the pdf using the sweave function in R
  # The required arguments are:
  #   climate_data_objs_str : list of the names of climate data objects in the climate object
  #   climate_data_objs_ind : list of the indices of climate data objects in the climate object
  # Note that one of the above arguments is enough.
  #
  # get_climate_data_objects returns a list of the climate_data objects specified
  # in the arguments.
  # If no objects are specified then all climate_data objects will be taken by default
  climate_data_objs = get_climate_data_objects(climate_data_objs_str, climate_data_objs_ind)
})

You need to either dynamically generate the entire final .tex file, including the preamble and so forth, via your function(s), or alternatively use a .Rnw Sweave master file template that contains the static content and then, in that file, insert the dynamic content using one or more directives along the lines of \input{InsertContentHere.tex}, where InsertContentHere.tex contains raw TeX content to be inserted at that point in the master file when processed by Sweave. The content of the *.tex files can be generated via various R functions, including ?cat for raw text, and others like the xtable() function in the CRAN package of the same name or the latex() function in Hmisc, which can be used to create tables, etc.
Note that the content of InsertContentHere.tex will not itself be processed by Sweave, as if it were a child .Rnw file. If you want that type of functionality, you would need to use \SweaveInput{InsertContentHere.Rnw}. Once you have your final .tex file, you can then run pdflatex on that file via ?system. Regards, Marc Schwartz
Re: [R] sweave package for R version 3.02
On Nov 11, 2014, at 5:25 AM, Frederic Ntirenganya ntfr...@gmail.com wrote: Hi All, I would like to install the package sweave but got the following warning: install.packages("sweave") Installing package into ‘/home/fredo/R/x86_64-pc-linux-gnu-library/3.0’ (as ‘lib’ is unspecified) Warning in install.packages : package ‘sweave’ is not available (for R version 3.0.2) I tried to download its zip file but could not get it. Can anyone help me with how I can do it? Thanks. Regards, Frederic. Sweave is part of the 'utils' package, which is a part of base R and does not need to be installed; it already is. BTW, 3.1.2 is the current version of R. 3.0.2 is over a year old already. Regards, Marc Schwartz
Re: [R] Inverse Student t-value
FWIW, I get 4.117456652 in Excel 2011 on OS X and 4.117457 in R 3.1.1 on OS X. There is a KB article on the TINV function here, suggesting that the threshold for the iterative algorithm in Excel has been tightened in recent versions: http://support.microsoft.com/kb/828340 Regards, Marc Schwartz

On Sep 30, 2014, at 1:49 PM, jlu...@ria.buffalo.edu wrote: My Excel (2013) returns exactly what R does. I used both T.INV and T.INV.2T. There is no TINV. Has Excel been updated?

Duncan Murdoch murdoch.dun...@gmail.com Sent by: r-help-boun...@r-project.org 09/30/2014 02:36 PM To Andre geomodel...@gmail.com, cc r-help@r-project.org Subject Re: [R] Inverse Student t-value

On 30/09/2014 2:26 PM, Andre wrote: Hi Duncan, Actually, I am trying to trace the formula for the critical value of Z, and the manual formula is

=(I7-1)/SQRT(I7)*SQRT((TINV(0.05/I7,I7-2))^2/(I7-2+TINV(0.05/I7,I7-2)))

So, I have a new problem with the TINV formula. I just need a manual equation for TINV.

Sorry, can't help. I'm not sure I understand what you want, but if it's a simple formula for quantiles of the t distribution, it doesn't exist. Duncan Murdoch

Hope this solves the problem. Cheers!

On Wed, Oct 1, 2014 at 1:20 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 30/09/2014 2:11 PM, Andre wrote: Hi Duncan, No, that's correct. Actually, I have the data set below:

Then it seems Excel is worse than I would have expected. I confirmed R's value in two other pieces of software, OpenOffice and some software I wrote a long time ago based on an algorithm published in 1977 in Applied Statistics. (They are probably all using the same algorithm. I wonder what Excel is doing?)

N = 1223, alpha = 0.05. Then probability = 0.05/1223 = 0.0000408831 and degrees of freedom = 1223 - 2 = 1221. So, TINV(0.0000408831, 1221) returns 4.0891672. Could you show me a manual equation in more detail? I really appreciate it if you may give more detail.

I already gave you the expression: abs(qt(0.0000408831/2, df=1221)).
For more detail, I suppose you could look at the help page for the qt function, using help(qt). Duncan Murdoch

Cheers! On Wed, Oct 1, 2014 at 1:01 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 30/09/2014 1:31 PM, Andre wrote: Dear Sir/Madam, I am trying to use a calculation for the two-tailed inverse of Student's t-distribution function presented by Excel functions like =TINV(probability, deg_freedom). For instance: the Excel function =TINV(0.0000408831, 1221) returns 4.0891672. Would you like to show me a manual calculation for this? Appreciate your help in advance.

That number looks pretty far off the true value. Have you got a typo in your example? You can compute the answer to your question as abs(qt(0.0000408831/2, df=1221)), but you'll get 4.117. Duncan Murdoch
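Duncan's expression can be wrapped into a small helper that mirrors Excel's two-tailed TINV(); the function name tinv is made up for this sketch:

```r
## Two-tailed inverse of Student's t, equivalent to Excel's TINV(p, df):
## TINV returns the positive t such that P(|T| > t) = p.
tinv <- function(p, df) abs(qt(p / 2, df = df))

## The value discussed in the thread (N = 1223, alpha = 0.05):
p <- 0.05 / 1223       # 4.08831e-05
tinv(p, df = 1221)     # ~4.117457, versus 4.0891672 from older Excel versions
```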
Re: [R] Error : '.path.package' is defunct.
On Sep 25, 2014, at 1:02 PM, Yuan, Rebecca rebecca.y...@bankofamerica.com wrote: Hello all, After this reinstallation of R 3.1.1 and RStudio 0.98.1028, I have the following error messages whenever I try to load a library:

library('zoo')
Error : '.path.package' is defunct. Use 'path.package' instead. See help("Defunct")
Attaching package: 'zoo'
The following objects are masked from 'package:base': as.Date, as.Date.numeric

Could you please help on this? Thanks! Rebecca

Check the version of 'zoo' that you have installed by using: library(help = zoo). More than likely, you have an old version of the package installed (current is 1.7-11) that still uses the now defunct function, hence the error message. You can run: update.packages(checkBuilt = TRUE) to update all of your installed packages and be sure that they are built for the current version of R that you now have running. There may be other nuances here, such as OS, having Admin access and where the CRAN packages are installed, but at least checking the version of zoo will be a good starting point. Regards, Marc Schwartz
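The version check Marc suggests can also be done with packageVersion(), which returns a comparable version object. A base package is used here so the snippet is self-contained; substitute "zoo":

```r
## packageVersion() returns a 'package_version' object that supports
## natural comparisons, handy for checking whether a package predates a fix:
v <- packageVersion("utils")
inherits(v, "package_version")   # TRUE
v >= "1.0"                       # TRUE for any modern R

## To rebuild packages compiled under an older R (interactive, so not run):
# update.packages(checkBuilt = TRUE)
```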
Re: [R] How to combine character month and year columns into one column
On Sep 23, 2014, at 10:41 AM, Kuma Raj pollar...@gmail.com wrote: Dear R users, I have data with month and year columns, which are both characters, and wanted to create a new column like "Jan-1999" with the following code. The result is all NA for the month part. What is wrong with this and what is the right way to combine the two?

ddf$MonthDay <- paste(month.abb[ddf$month], ddf$Year, sep="-")

Thanks

> dput(ddf)
structure(list(month = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"), Year = c("1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999"), views = c(42, 49, 44, 38, 37, 35, 38, 39, 38, 39, 38, 46), MonthDay = c("NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999")), .Names = c("month", "Year", "views", "MonthDay"), row.names = 109:120, class = "data.frame")

Since you are trying to use ddf$month as an index into month.abb, you will either need to coerce ddf$month to numeric in your code, or adjust how the data frame is created. In the case of the former approach:

paste(month.abb[as.numeric(ddf$month)], ddf$Year, sep="-")
 [1] "Jan-1999" "Feb-1999" "Mar-1999" "Apr-1999" "May-1999" "Jun-1999"
 [7] "Jul-1999" "Aug-1999" "Sep-1999" "Oct-1999" "Nov-1999" "Dec-1999"

Regards, Marc Schwartz
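The failure mode is worth seeing directly: month.abb is an unnamed character vector, so a character subscript like "01" looks for a name and finds none, while a numeric subscript selects by position.

```r
## month.abb has no names, so character indexing returns NA:
month.abb["01"]              # NA -- no element *named* "01"

## Coercing the character month to numeric indexes by position instead:
month.abb[as.numeric("01")]  # "Jan" -- element in position 1
```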
Re: [R] How to combine character month and year columns into one column
Two things: 1. You need to convert the result of the paste() to a Date related class. 2. R's standard Date classes require a full date, so you would have to add in some default day of the month. See ?as.Date

NewDate <- as.Date(paste(month.abb[as.numeric(ddf$month)], "01", ddf$Year, sep="-"), format = "%b-%d-%Y")

or without using month.abb, which is not really needed. Note the difference in the format argument:

NewDate <- as.Date(paste(as.numeric(ddf$month), "01", ddf$Year, sep="-"), format = "%m-%d-%Y")

> class(NewDate)
[1] "Date"

> str(NewDate)
 Date[1:12], format: "1999-01-01" "1999-02-01" "1999-03-01" "1999-04-01" ...

You can then format the output of NewDate as you might require:

format(NewDate, format = "%b-%d-%Y")
 [1] "Jan-01-1999" "Feb-01-1999" "Mar-01-1999" "Apr-01-1999"
 [5] "May-01-1999" "Jun-01-1999" "Jul-01-1999" "Aug-01-1999"
 [9] "Sep-01-1999" "Oct-01-1999" "Nov-01-1999" "Dec-01-1999"

Note that the output of the last step is a character vector:

> str(format(NewDate, format = "%b-%d-%Y"))
 chr [1:12] "Jan-01-1999" "Feb-01-1999" "Mar-01-1999" ...

which is fine for formatting/printing, even though NewDate is a Date class object. Alternatively, I believe that Gabor's 'zoo' package on CRAN has a 'yearmon' class for this type of partial date. Regards, Marc

On Sep 23, 2014, at 12:04 PM, Kuma Raj pollar...@gmail.com wrote: Many thanks for your quick answer, which has created what I wished. May I ask a followup question on the same issue? I failed to convert the new column into date format with this code; the class of MonthDay is still character:

df$MonthDay <- format(df$MonthDay, format = "%b %Y")

I would appreciate it if you could suggest a working solution. Thanks

On 23 September 2014 18:03, Marc Schwartz marc_schwa...@me.com wrote: On Sep 23, 2014, at 10:41 AM, Kuma Raj pollar...@gmail.com wrote: Dear R users, I have data with month and year columns, which are both characters, and wanted to create a new column like "Jan-1999" with the following code. The result is all NA for the month part.
What is wrong with this and what is the right way to combine the two?

ddf$MonthDay <- paste(month.abb[ddf$month], ddf$Year, sep="-")

Thanks

> dput(ddf)
structure(list(month = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"), Year = c("1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999"), views = c(42, 49, 44, 38, 37, 35, 38, 39, 38, 39, 38, 46), MonthDay = c("NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999")), .Names = c("month", "Year", "views", "MonthDay"), row.names = 109:120, class = "data.frame")

Since you are trying to use ddf$month as an index into month.abb, you will either need to coerce ddf$month to numeric in your code, or adjust how the data frame is created. In the case of the former approach:

paste(month.abb[as.numeric(ddf$month)], ddf$Year, sep="-")
 [1] "Jan-1999" "Feb-1999" "Mar-1999" "Apr-1999" "May-1999" "Jun-1999"
 [7] "Jul-1999" "Aug-1999" "Sep-1999" "Oct-1999" "Nov-1999" "Dec-1999"

Regards, Marc Schwartz
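As a footnote to the 'yearmon' pointer above, here is a guarded sketch; 'zoo' is a CRAN package and may not be installed, so the example checks for it first:

```r
## zoo::as.yearmon represents year+month without fabricating a day:
if (requireNamespace("zoo", quietly = TRUE)) {
  ym <- zoo::as.yearmon("1999-01")
  print(format(ym, "%b-%Y"))   # "Jan-1999" in an English locale
}
```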
Re: [R] How to combine character month and year columns into one column
Hi David, My initial reaction (not that the decision is mine to make) is that, from a technical perspective, indexing by name is obviously common. There are two considerations, off the top of my head:

1. There would be a difference, of course, between:

> month.abb["1"]
<NA>
  NA

and

> month.abb["01"]
   01
"Jan"

Thus, is this approach overly fragile and potentially going to create more problems (bugs, head scratching, etc.) than it solves?

2. From a consistency standpoint, I don't see an indication that other built-in constants have similar name attributes, not that I did an exhaustive review. So I suspect that if there were reasonable justification for it here, it would also need to at least be considered for other constants, which increases the scope of work a good bit.

If there is a desire for this, one could file an RFE at https://bugs.r-project.org to gauge the reactions from R Core, unless they comment here first. Regards, Marc

On Sep 23, 2014, at 12:47 PM, David Winsemius dwinsem...@comcast.net wrote: Marc; Feature request: Would it make sense to construct month.abb as a named vector so that the operation that was attempted would have succeeded? Adding alphanumeric names c("01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12") would allow character extraction from substring or regex extracted month values, which are always character-class. Example:

> names(month.abb) <- c("01", "02", "03", "04", "05", "06",
+                       "07", "08", "09", "10", "11", "12")
> month.abb
   01    02    03    04    05    06    07    08    09    10    11    12
"Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

> month.abb[ substr(Sys.Date(), 6, 7) ]
   09
"Sep"

-- David.

On Sep 23, 2014, at 9:03 AM, Marc Schwartz wrote: On Sep 23, 2014, at 10:41 AM, Kuma Raj pollar...@gmail.com wrote: Dear R users, I have data with month and year columns, which are both characters, and wanted to create a new column like "Jan-1999" with the following code. The result is all NA for the month part. What is wrong with this and what is the right way to combine the two?
ddf$MonthDay <- paste(month.abb[ddf$month], ddf$Year, sep="-")

Thanks

> dput(ddf)
structure(list(month = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"), Year = c("1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999"), views = c(42, 49, 44, 38, 37, 35, 38, 39, 38, 39, 38, 46), MonthDay = c("NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999")), .Names = c("month", "Year", "views", "MonthDay"), row.names = 109:120, class = "data.frame")

Since you are trying to use ddf$month as an index into month.abb, you will either need to coerce ddf$month to numeric in your code, or adjust how the data frame is created. In the case of the former approach:

paste(month.abb[as.numeric(ddf$month)], ddf$Year, sep="-")
 [1] "Jan-1999" "Feb-1999" "Mar-1999" "Apr-1999" "May-1999" "Jun-1999"
 [7] "Jul-1999" "Aug-1999" "Sep-1999" "Oct-1999" "Nov-1999" "Dec-1999"

Regards, Marc Schwartz

David Winsemius Alameda, CA, USA
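David's suggestion can be tried without touching the built-in constant by making a named local copy; setNames() is base R, and the zero-padded names are the point of the proposal:

```r
## A locally named copy of month.abb, keyed by zero-padded month strings:
m <- setNames(month.abb, sprintf("%02d", 1:12))

m["09"]                        # named element: "Sep"
m[substr("1999-09-23", 6, 7)]  # works directly on a substring of a date
```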
Re: [R] To Add a variable from Df1 to Df2 which have a same common variable
Cadre, 61-Homme-Non Cadre, 61-Homme-Non Cadre, 61-Homme-Non Cadre, 61-Homme-Non Cadre, 61-Homme-Non Cadre, 61-Homme-Non Cadre, 62-Femme-Non Cadre, 62-Femme-Non Cadre, 62-Femme-Non Cadre, 62-Femme-Non Cadre, 62-Femme-Non Cadre, 62-Femme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 62-Homme-Non Cadre, 63-Femme-Non Cadre, 63-Femme-Non Cadre, 63-Femme-Non Cadre, 63-Femme-Non Cadre, 63-Femme-Non Cadre, 63-Femme-Non Cadre, 63-Homme-Non Cadre, 63-Homme-Non Cadre, 63-Homme-Non Cadre, 63-Homme-Non Cadre, 63-Homme-Non Cadre, 63-Homme-Non Cadre, 63-Homme-Non Cadre, 63-Homme-Non Cadre, 63-Homme-Non Cadre, 63-Homme-Non Cadre, 64-Femme-Non Cadre, 64-Femme-Non Cadre, 64-Femme-Non Cadre, 64-Femme-Non Cadre, 64-Femme-Non Cadre, 64-Femme-Non Cadre, 64-Femme-Non Cadre, 64-Femme-Non Cadre, 64-Femme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre, 64-Homme-Non Cadre, 65-Homme-Non Cadre, 65-Homme-Non Cadre, 65-Homme-Non Cadre, 65-Homme-Non Cadre, 65-Homme-Non Cadre, 65-Homme-Non Cadre, 65-Homme-Non Cadre, 65-Homme-Non Cadre, 65-Homme-Non Cadre, 65-Homme-Non Cadre, 65-Homme-Non Cadre, 66-Femme-Non Cadre, 66-Femme-Non Cadre, 66-Femme-Non Cadre, 66-Homme-Non Cadre, 66-Homme-Non Cadre, 66-Homme-Non Cadre, 66-Homme-Non Cadre, 66-Homme-Non Cadre, 66-Homme-Non Cadre, 66-Homme-Non Cadre, 66-Homme-Non Cadre, 66-Homme-Non Cadre, 66-Homme-Non 
Cadre, 66-Homme-Non Cadre, 67-Homme-Non Cadre, 67-Homme-Non Cadre, 67-Homme-Non Cadre, 68-Homme-Non Cadre, 68-Homme-Non Cadre, 68-Homme-Non Cadre, 68-Homme-Non Cadre, 68-Homme-Non Cadre, 68-Homme-Non Cadre, 69-Homme-Non Cadre, 69-Homme-Non Cadre )), .Names = c(Matricule, AgeSexeCadNCad), class = data.frame, row.names = c(37, 58, 79, 104, 163, 220, 263, 276, 333, 422, 442, 587, 653, 684, 21, 25, 35, 42, 45, 47, 73, 76, 93, 100, 118, 133, 137, 138, 158, 174, 176, 179, 204, 208, 231, 249, 254, 312, 325, 439, 491, 500, 825, 928, 954, 1093, 1116, 1128, 1136, 1141, 1143, 1212, 1232, 1270, 1396, 14, 56, 66, 106, 148, 153, 226, 308, 717, 720, 1046, 1287, 36, 41, 54, 124, 144, 188, 197, 198, 201, 206, 242, 262, 377, 598, 611, 633, 683, 714, 742, 919, 980, 993, 1000, 1071, 1073, 1127, 1223, 32, 121, 456, 458, 462, 1013, 27, 43, 53, 59, 65, 67, 75, 77, 83, 97, 103, 107, 109, 110, 328, 412, 516, 698, 715, 740, 1122, 1267, 1824, 16, 452, 540, 557, 870, 1086, 5, 82, 94, 115, 123, 209, 339, 341, 862, 2211, 20, 61, 152, 358, 685, 760, 803, , 1134, 11, 22, 33, 49, 92, 193, 241, 394, 396, 463, 522, 595, 896, 1097, 1129, 1302, 7, 9, 18, 26, 81, 85, 185, 728, 884, 1029, 1155, 297, 479, 842, 3, 13, 15, 23, 51, 55, 63, 199, 574, 655, 1119, 48, 668, 1125, 1, 6, 10, 24, 40, 154, 2, 117)) Hi, Thanks for including data. See ?merge, which performs an SQL-like join. Since you have non-matching values between Df1 and Df2, you will need to decide if you want non-matching rows included in the resultant data frame or not (eg. a right/left outer or inner join). See the all, all.x and all.y arguments to merge(). The default (all = FALSE) is an inner join on matching rows only: Df3 - merge(Df1, Df2, by = AgeSexeCadNCad) If you include non-matching values in the resultant data frame (eg. all = TRUE), Pourcent will contains NA's in those rows. 
Regards, Marc Schwartz
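A small self-contained sketch of the join behaviour described above; the data frames here are made up, and only the column names echo the thread:

```r
## Toy data frames sharing the key column from the thread:
Df1 <- data.frame(AgeSexeCadNCad = c("a", "b", "c"), Matricule = 1:3)
Df2 <- data.frame(AgeSexeCadNCad = c("b", "c", "d"), Pourcent = c(10, 20, 30))

## Inner join (default, all = FALSE): matching rows only -> "b", "c"
inner <- merge(Df1, Df2, by = "AgeSexeCadNCad")

## Left outer join: keeps all of Df1; Pourcent is NA where unmatched
left <- merge(Df1, Df2, by = "AgeSexeCadNCad", all.x = TRUE)

## Full outer join: all rows from both sides
full <- merge(Df1, Df2, by = "AgeSexeCadNCad", all = TRUE)

nrow(inner); nrow(left); nrow(full)   # 2, 3, 4
```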
Re: [R] plot
On Sep 19, 2014, at 10:48 AM, IZHAK shabsogh ishaqb...@yahoo.com wrote: Hi, kindly give me some guidance on how to plot the following data in a single line graph, that is (y1, y2, y3, y4 against x), including title and key:

y1 <- c(0.84, 1.03, 0.96)
y2 <- c(1.30, 1.46, 1.48)
y3 <- c(1.32, 1.47, 1.5)
y4 <- c(0.07, 0.07, 0.07)
x <- c(500, 1000, 2000)

Thanks Ishaq

See ?matplot and ?legend

matplot(x, cbind(y1, y2, y3, y4), type = "l", main = "Plot Title", ylab = "Y Vals", xlab = "X Vals")
legend("right", lty = 1:4, col = 1:4, legend = c("y1", "y2", "y3", "y4"))

Regards, Marc Schwartz
Re: [R] X11/Intrinsic.h preventing build on rhel
On Sep 19, 2014, at 1:28 PM, Gaurav Chakravorty gc...@circulumvite.com wrote: I am trying to build R-3.1.0 on RHEL, but configure returns with an error due to X11/Intrinsic.h missing. Is there a workaround? In most Linuxen, the header files are contained in *-dev[el] packages. For RHEL, this is likely to be libX11-devel, so you will need to install that RPM. Note that a pre-compiled binary RPM for R is available from the EPEL for RHEL: https://fedoraproject.org/wiki/EPEL Also note that there is the R-SIG-Fedora list, which covers support for RH based distros specifically: https://stat.ethz.ch/mailman/listinfo/r-sig-fedora Regards, Marc Schwartz
Re: [R] Using R in our commercial business application
On Sep 18, 2014, at 4:36 AM, Pasu pasupat...@gmail.com wrote: Hi, I would like to know how to use R in our commercial business application, which we plan to host in the cloud or deploy on customers' premises. 1. Using R and its packages, does it enforce that my commercial business application should be distributed under the GPL, as the statistical derivation (output) from using R will be presented to the end users as part of our commercial business application? 2. Whom should we contact to get a commercial license, if required, for using R? Rgds Pasupathy

You will not get a definitive legal opinion here, and my comments below do not represent any formal opinion on the part of any organization. There is nothing preventing you or your company from using R as an end user. There are many of us who use R in commercial settings and in general, the output of a GPL'd application (text or binary) is not considered to be also GPL'd. The subtleties get into the distribution of R (which you seem to plan to do), the nature of any additional functionality/code that you or your company may write/distribute, how that code interacts with R and/or modifies R source code copyrighted by the R Foundation and others. If you distribute R to clients, you will need to make R's source code available to them in some manner, along with any modifications to that same code, while preserving appropriate copyrights. A proprietary (closed source) application cannot be licensed under the GPL, but your company's application/code may be forced to be GPL (the so-called viral aspect of the GPL) depending upon how your application is implemented, as I noted in the prior paragraph. Thus, you may be forced to make your source code available to your clients as well. If you plan to move forward, you should consult with an attorney well educated in software licensing and distribution issues, especially as they pertain to the GPL. The risks of falling on the wrong side of the GPL are not inconsequential.
The official R distribution is not available via a commercial or developer license, but there are commercial vendors of R and a Google search will point you in their direction, if desired. However, since their products are founded upon the official R distribution and the GPL, they will have similar issues with respect to any enhancements that they have created and therefore, your concerns do not necessarily go away. They will have also consulted legal counsel on these issues because the viability of their business depends upon it. Regards, Marc Schwartz
Re: [R] Using R in our commercial business application
On Sep 18, 2014, at 3:42 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 18/09/2014 2:35 PM, Marc Schwartz wrote: On Sep 18, 2014, at 4:36 AM, Pasu pasupat...@gmail.com wrote: Hi, I would like to know how to use R in our commercial business application, which we plan to host in the cloud or deploy on customers' premises. 1. Using R and its packages, does it enforce that my commercial business application should be distributed under the GPL, as the statistical derivation (output) from using R will be presented to the end users as part of our commercial business application? 2. Whom should we contact to get a commercial license, if required, for using R? Rgds Pasupathy

You will not get a definitive legal opinion here, and my comments below do not represent any formal opinion on the part of any organization. There is nothing preventing you or your company from using R as an end user. There are many of us who use R in commercial settings and in general, the output of a GPL'd application (text or binary) is not considered to be also GPL'd. The subtleties get into the distribution of R (which you seem to plan to do), the nature of any additional functionality/code that you or your company may write/distribute, how that code interacts with R and/or modifies R source code copyrighted by the R Foundation and others. If you distribute R to clients, you will need to make R's source code available to them in some manner, along with any modifications to that same code, while preserving appropriate copyrights. A proprietary (closed source) application cannot be licensed under the GPL, but your company's application/code may be forced to be GPL (the so-called viral aspect of the GPL) depending upon how your application is implemented, as I noted in the prior paragraph. Thus, you may be forced to make your source code available to your clients as well.
If you plan to move forward, you should consult with an attorney well educated in software licensing and distribution issues, especially as they pertain to the GPL. The risks of falling on the wrong side of the GPL are not inconsequential. The official R distribution is not available via a commercial or developer license, but there are commercial vendors of R and a Google search will point you in their direction, if desired. However, since their products are founded upon the official R distribution and the GPL, they will have similar issues with respect to any enhancements that they have created and therefore, your concerns do not necessarily go away. They will have also consulted legal counsel on these issues because the viability of their business depends upon it.

I agree with all of that but for one thing: not all distributions are built on the GPL'd original. I believe Tibco is selling an independent implementation. Duncan Murdoch

Thanks Duncan, I stand corrected. A quick Google search supports the point that the Tibco TERR system is an independent, closed-source re-implementation of R, not based upon GPL R. Regards, Marc
Re: [R] frequencies of a discrete numeric variable, including zeros
, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 9L, 9L, 10L, 11L, 12L, 12L, 16L, 19L)

Micheal, Coerce the vector to be tabulated to a factor that contains all of the levels 0:19, then use barplot():

art.fac <- factor(art, levels = 0:19)
table(art.fac)
art.fac
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
275 246 178  84  67  27  17  12   1   2   1   1   2   0   0   0   1   0
 18  19
  0   1

barplot(table(art.fac), cex.names = 0.5)

Thanks for providing the data above. Regards, Marc Schwartz
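The same pattern in a fully self-contained form, with a small made-up vector, since the thread's data is truncated in the archive:

```r
## Declaring the full range of levels makes table() report zero counts
## for values that never occur, so barplot() shows every category:
x <- c(0, 0, 1, 3, 3, 3, 7)
x.fac <- factor(x, levels = 0:7)
table(x.fac)   # levels 2, 4, 5 and 6 appear with count 0
# barplot(table(x.fac), cex.names = 0.5)  # plotting left commented out here
```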
Re: [R] split a string a keep the last part
On Aug 28, 2014, at 12:41 PM, Jun Shen jun.shen...@gmail.com wrote: Hi everyone, I believe I am not the first one to have this problem but couldn't find a relevant thread on the list. Say I have a string (actually it is the whole column in a data frame) in a format like this:

test <- 'AF14-485-502-89-00235'

I would like to split the test string and keep the last part. I think I can do the following:

sub('.*-.*-.*-.*-(.*)', '\\1', test)

to keep the fifth part of the string. But this won't work if other strings have more or fewer parts separated by '-'. Is there a general way to do it? Thanks. Jun

Try this:

test <- 'AF14-485-502-89-00235'
sub("^.*-(.*)$", "\\1", test)
[1] "00235"

test <- 'AF14-485-502-89-00235-1234'
sub("^.*-(.*)$", "\\1", test)
[1] "1234"

Another option:

tail(unlist(strsplit(test, "-")), 1)
[1] "1234"

Regards, Marc Schwartz
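Since the OP's values live in a data frame column, it is worth noting that both approaches vectorize; the sample vector here is made up:

```r
## sub() is vectorized, so it works directly on a column of strings:
ids <- c("AF14-485-502-89-00235", "AF14-485-502-89-00235-1234", "X-7")
sub("^.*-(.*)$", "\\1", ids)          # "00235" "1234" "7"

## strsplit() returns a list, one element per input string;
## sapply() then takes the last piece of each:
sapply(strsplit(ids, "-"), tail, 1)   # same result
```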
Re: [R] Installing RODBC
On Aug 20, 2014, at 5:43 PM, William Deese williamde...@gmail.com wrote: I tried installing RODBC but got the following message (checks were yes until the following):

checking sql.h usability... no
checking sql.h presence... no
checking for sql.h... no
checking sqlext.h usability... no
checking sqlext.h presence... no
checking for sqlext.h... no
configure: error: ODBC headers sql.h and sqlext.h not found
ERROR: configuration failed for package ‘RODBC’
* removing ‘/home/bill/R/x86_64-pc-linux-gnu-library/3.1/RODBC’

Apparently RODBC was there when R was installed, but library() shows it is not there now, although the DBI package is. Best ideas for installing RODBC? Bill

You are missing the indicated header files, which are required if you are building the package from source. As per the extensive vignette that Prof. Ripley has provided: http://cran.r-project.org/web/packages/RODBC/vignettes/RODBC.pdf in Appendix A, which describes installation, you will find: "For other systems the driver manager of choice is likely to be unixODBC, part of almost all Linux distributions and with sources downloadable from http://www.unixODBC.org. In Linux binary distributions it is likely that package unixODBC-devel or unixodbc-dev or some such will be needed." Thus, for whatever Linux distribution you are using, install the relevant RPMs or Debs or ... Also, for future reference, there is a specific mailing list for DB related queries: https://stat.ethz.ch/mailman/listinfo/r-sig-db and a search of the list archives, for example using rseek.org, would likely result in your finding queries and answers to this same issue over the years. Regards, Marc Schwartz
Re: [R] regex pattern assistance
On Aug 15, 2014, at 11:18 AM, Tom Wright t...@maladmin.com wrote: Hi, Can anyone please assist? Given the string

x <- "/mnt/AO/AO Data/S01-012/120824/"

I would like to extract S01-012.

require(stringr)
str_match(x, "\\/mnt\\/AO\\/AO Data\\/(.+)\\/+")
str_match(x, "\\/mnt\\/AO\\/AO Data\\/(\\w+)\\/+")

both nearly work. I expected I would use something like:

str_match(x, "\\/mnt\\/AO\\/AO Data\\/([\\w -]+)\\/+")

but I don't seem able to get the square bracket grouping to work correctly. Can someone please show me where I am going wrong? Thanks, Tom

Is the desired substring always in the same relative position in the path? If so:

strsplit(x, "/")
[[1]]
[1] ""        "mnt"     "AO"      "AO Data" "S01-012" "120824"

unlist(strsplit(x, "/"))[5]
[1] "S01-012"

Alternatively, again presuming the same position:

gsub("/mnt/AO/AO Data/([^/]+)/.+", "\\1", x)
[1] "S01-012"

You don't need all of the double backslashes in your regex above. The '/' character is not a special regex character, whereas '\' is and needs to be escaped. Regards, Marc Schwartz
Re: [R] regex pattern assistance
On Aug 15, 2014, at 11:56 AM, Tom Wright t...@maladmin.com wrote: WOW!!! What can I say, 4 answers in less than 4 minutes. Thank you everyone. If I can't make it work now I don't deserve to. BTW, the strsplit approach wouldn't work for me as: a) I wanted to play with regex and b) the location isn't consistent.

Tom, If not in the same relative position, is the substring pattern always the same? That is, 3 characters, a hyphen, then 3 characters? If so, would any other part of the path follow the same pattern or is it unique? If the pattern is the same and is unique in the path:

gsub(".*([[:alnum:]]{3}-[[:alnum:]]{3}).*", "\\1", x)
[1] "S01-012"

is another possible alternative and more flexible:

y <- "/mnt/AO/AO Data/Another Level/Yet Another Level/S01-012/120824/"
gsub(".*([[:alnum:]]{3}-[[:alnum:]]{3}).*", "\\1", y)
[1] "S01-012"

z <- "/mnt/AO/AO Data/Another Level/Yet Another Level/S01-012/One More Level/120824/"
gsub(".*([[:alnum:]]{3}-[[:alnum:]]{3}).*", "\\1", z)
[1] "S01-012"

Nice to see email support still works; not everything has moved to LinkedIn and Stack Overflow. Stackoverflow? ;-) Regards, Marc

Thanks again, Tom

On Fri, 2014-08-15 at 12:18 -0400, Tom Wright wrote: Hi, Can anyone please assist? Given the string

x <- "/mnt/AO/AO Data/S01-012/120824/"

I would like to extract S01-012.

require(stringr)
str_match(x, "\\/mnt\\/AO\\/AO Data\\/(.+)\\/+")
str_match(x, "\\/mnt\\/AO\\/AO Data\\/(\\w+)\\/+")

both nearly work. I expected I would use something like:

str_match(x, "\\/mnt\\/AO\\/AO Data\\/([\\w -]+)\\/+")

but I don't seem able to get the square bracket grouping to work correctly. Can someone please show me where I am going wrong? Thanks, Tom
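An alternative extraction idiom, for the record: regexpr() locates the pattern and regmatches() pulls out just the matched text, which avoids the replace-everything-else gymnastics of gsub():

```r
## Extract the S01-012 style token from paths of varying depth:
paths <- c("/mnt/AO/AO Data/S01-012/120824/",
           "/mnt/AO/AO Data/Deeper/S01-012/More/120824/")
m <- regexpr("[[:alnum:]]{3}-[[:alnum:]]{3}", paths)
regmatches(paths, m)   # "S01-012" "S01-012"
```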
Re: [R] generating a sequence of seconds
On Aug 12, 2014, at 1:51 PM, Erin Hodgess erinm.hodg...@gmail.com wrote:

Hello! If I would like to generate a sequence of seconds for a date, I would do the following:

x <- seq(from = as.POSIXct("2014-08-12 00:00:00"), to = as.POSIXct("2014-08-12 23:59:59"), by = "secs")

What if I just want the seconds vector without the date, please? Is there a convenient way to create such a vector, please? thanks, Erin

Erin,

Do you want just the numeric vector of seconds, with the first value being 0, incrementing by 1 to the final value?

x <- seq(from = as.POSIXct("2014-08-12 00:00:00"),
         to = as.POSIXct("2014-08-12 23:59:59"), by = "secs")

head(x)
[1] "2014-08-12 00:00:00 CDT" "2014-08-12 00:00:01 CDT"
[3] "2014-08-12 00:00:02 CDT" "2014-08-12 00:00:03 CDT"
[5] "2014-08-12 00:00:04 CDT" "2014-08-12 00:00:05 CDT"

tail(x)
[1] "2014-08-12 23:59:54 CDT" "2014-08-12 23:59:55 CDT"
[3] "2014-08-12 23:59:56 CDT" "2014-08-12 23:59:57 CDT"
[5] "2014-08-12 23:59:58 CDT" "2014-08-12 23:59:59 CDT"

head(as.numeric(x - x[1]))
[1] 0 1 2 3 4 5

tail(as.numeric(x - x[1]))
[1] 86394 86395 86396 86397 86398 86399

Regards, Marc Schwartz
Re: [R] generating a sequence of seconds
Erin,

Is a sequential resolution of seconds required, as per your original post? If so, then using my approach and specifying the start and end dates and times will work, with the coercion of the resultant vector to numeric as I included. The method I used (subtracting the first value) will also give you the starting second as 0, or you can alter the math to adjust the origin of the vector as you desire.

As Bill notes, there will be some days where the number of seconds in the day will be something other than 86,400. In Bill's example, it is due to his choosing the start and end dates of daylight saving time in a relevant time zone. Thus, his second date is short an hour, while the third has an extra hour.

Regards, Marc

On Aug 12, 2014, at 2:26 PM, Erin Hodgess erinm.hodg...@gmail.com wrote:

What I would like to do is to look at several days and determine activities that happened at times on those days. I don't really care which days, I just care about what time. Thank you!

On Tue, Aug 12, 2014 at 3:14 PM, William Dunlap wdun...@tibco.com wrote:

What if I just want the seconds vector without the date, please? Is there a convenient way to create such a vector, please?

Why do you want such a thing? E.g., do you want it to print the time of day without the date? Or are you trying to avoid numeric problems when you do regressions with the seconds-since-1970 numbers around 1414918800? Or is there another problem you want solved?

Note that the number of seconds in a day depends on the day and the time zone. In US/Pacific time I get:

length(seq(from=as.POSIXct("2014-08-12 00:00:00"), to=as.POSIXct("2014-08-12 23:59:59"), by="secs"))
[1] 86400
length(seq(from=as.POSIXct("2014-03-09 00:00:00"), to=as.POSIXct("2014-03-09 23:59:59"), by="secs"))
[1] 82800
length(seq(from=as.POSIXct("2014-11-02 00:00:00"), to=as.POSIXct("2014-11-02 23:59:59"), by="secs"))
[1] 90000

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Aug 12, 2014 at 11:51 AM, Erin Hodgess erinm.hodg...@gmail.com wrote: Hello!
If I would like to generate a sequence of seconds for a date, I would do the following:

x <- seq(from = as.POSIXct("2014-08-12 00:00:00"), to = as.POSIXct("2014-08-12 23:59:59"), by = "secs")

What if I just want the seconds vector without the date, please? Is there a convenient way to create such a vector, please? thanks, Erin
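Since Erin ultimately wants times of day pooled across several dates, one approach (a sketch under that assumption, not code from the thread itself) is to drop the date with format(), or to compute numeric seconds since each timestamp's own local midnight:

```r
x <- seq(from = as.POSIXct("2014-08-12 00:00:00"),
         to   = as.POSIXct("2014-08-12 23:59:59"), by = "secs")

# Character time of day, date dropped:
head(format(x, "%H:%M:%S"))

# Numeric seconds since that day's local midnight, so observations from
# different days can be pooled by time of day; trunc(x, "days") is each
# timestamp's own midnight:
head(as.numeric(difftime(x, trunc(x, "days"), units = "secs")))
```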
Re: [R] generating a sequence of seconds
On Aug 12, 2014, at 2:49 PM, John McKown john.archie.mck...@gmail.com wrote:

And some people wonder why I absolutely abhor daylight saving time. I'm not really fond of leap years and leap seconds either. Somebody needs to fix the Earth's rotation and orbit!

I have been a longtime proponent of slowing the rotation of the Earth on its axis, so that we could have longer days to be more productive. Unfortunately, so far, my wish has gone unfulfilled...at least as it is relevant within human lifetimes. ;-)

Regards, Marc

On Tue, Aug 12, 2014 at 2:14 PM, William Dunlap wdun...@tibco.com wrote:

What if I just want the seconds vector without the date, please? Is there a convenient way to create such a vector, please?

Why do you want such a thing? E.g., do you want it to print the time of day without the date? Or are you trying to avoid numeric problems when you do regressions with the seconds-since-1970 numbers around 1414918800? Or is there another problem you want solved?

Note that the number of seconds in a day depends on the day and the time zone. In US/Pacific time I get:

length(seq(from=as.POSIXct("2014-08-12 00:00:00"), to=as.POSIXct("2014-08-12 23:59:59"), by="secs"))
[1] 86400
length(seq(from=as.POSIXct("2014-03-09 00:00:00"), to=as.POSIXct("2014-03-09 23:59:59"), by="secs"))
[1] 82800
length(seq(from=as.POSIXct("2014-11-02 00:00:00"), to=as.POSIXct("2014-11-02 23:59:59"), by="secs"))
[1] 90000

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Aug 12, 2014 at 11:51 AM, Erin Hodgess erinm.hodg...@gmail.com wrote:

Hello! If I would like to generate a sequence of seconds for a date, I would do the following:

x <- seq(from = as.POSIXct("2014-08-12 00:00:00"), to = as.POSIXct("2014-08-12 23:59:59"), by = "secs")

What if I just want the seconds vector without the date, please? Is there a convenient way to create such a vector, please?
thanks, Erin
Re: [R] Better use with gsub
On Aug 1, 2014, at 9:46 AM, Doran, Harold hdo...@air.org wrote:

I have done an embarrassingly bad job using a mixture of gsub and strsplit to solve a problem. Below is sample code showing what I have to start with (the vector xx), and I want to end up with two vectors x and y that contain only the digits found in xx. Any regex users with advice most welcome. Harold

xx <- c("S24:57", "S24:86", "S24:119", "S24:129", "S24:138", "S24:163")
yy <- gsub("S", "", xx)
a1 <- gsub(":", " ", yy)
a2 <- sapply(a1, function(x) strsplit(x, " "))
x <- as.numeric(sapply(a2, function(x) x[1]))
y <- as.numeric(sapply(a2, function(x) x[2]))

If a matrix is a satisfactory result, rather than two separate vectors:

sapply(strsplit(gsub("S", "", xx), split = ":"), as.numeric)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   24   24   24   24   24   24
[2,]   57   86  119  129  138  163

Regards, Marc Schwartz
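If the two separate vectors are still wanted, the matrix from the one-liner above can simply be split by row; a small sketch using the same xx (this example is not from the thread itself):

```r
xx <- c("S24:57", "S24:86", "S24:119", "S24:129", "S24:138", "S24:163")

# One pass: strip the leading "S", split on ":", coerce to numeric.
# sapply() simplifies the per-element length-2 results into a 2-row matrix.
m <- sapply(strsplit(gsub("S", "", xx), split = ":"), as.numeric)

x <- m[1, ]  # 24 24 24 24 24 24
y <- m[2, ]  # 57 86 119 129 138 163
```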
Re: [R] How to randomly extract a number of rows in a data frame
On Aug 1, 2014, at 1:58 PM, Stephen HK Wong hon...@stanford.edu wrote:

Dear All, I have a dataframe that contains 4 columns and several tens of millions of rows, like below! I want to extract randomly, say, 1 million rows. Can you tell me how to do that in R using base packages? Many thanks.

Col_1 Col_2   Col_3   Col_4
chr1  3000215 3000250 -
chr1  3000909 3000944 +
chr1  3001025 3001060 +
chr1  3001547 3001582 +
chr1  3002254 3002289 +
chr1  3002324 3002359 -
chr1  3002833 3002868 -
chr1  3004565 3004600 -
chr1  3004945 3004980 +
chr1  3004974 3005009 -
chr1  3005115 3005150 +
chr1  3005124 3005159 +
chr1  3005240 3005275 -
chr1  3005558 3005593 -
chr1  3005890 3005925 +
chr1  3005929 3005964 +
chr1  3005913 3005948 -
chr1  3005913 3005948 -

Stephen HK Wong

If your data frame is called 'DF':

DF.Rand <- DF[sample(nrow(DF), 100), ]

See ?sample, which will generate a random sample from a uniform distribution. In the above, nrow(DF) returns the number of rows in DF and defines the sample space of 1:nrow(DF), from which 100 random integer values will be selected and used as indices to return the rows.
Using the built in 'iris' dataset, select 20 random rows from the 150 total:

iris[sample(nrow(iris), 20), ]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
122          5.6         2.8          4.9         2.0  virginica
79           6.0         2.9          4.5         1.5 versicolor
109          6.7         2.5          5.8         1.8  virginica
106          7.6         3.0          6.6         2.1  virginica
49           5.3         3.7          1.5         0.2     setosa
125          6.7         3.3          5.7         2.1  virginica
1            5.1         3.5          1.4         0.2     setosa
68           5.8         2.7          4.1         1.0 versicolor
84           6.0         2.7          5.1         1.6 versicolor
110          7.2         3.6          6.1         2.5  virginica
113          6.8         3.0          5.5         2.1  virginica
64           6.1         2.9          4.7         1.4 versicolor
102          5.8         2.7          5.1         1.9  virginica
71           5.9         3.2          4.8         1.8 versicolor
69           6.2         2.2          4.5         1.5 versicolor
65           5.6         2.9          3.6         1.3 versicolor
74           6.1         2.8          4.7         1.2 versicolor
99           5.1         2.5          3.0         1.1 versicolor
135          6.1         2.6          5.6         1.4  virginica
41           5.0         3.5          1.3         0.3     setosa

Regards, Marc Schwartz
Re: [R] separate numbers from chars in a string
On Jul 31, 2014, at 3:17 AM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote:

On 31.07.2014 04:46, carol white wrote: There is some level of variation, either chars followed by numbers, or chars, numbers, chars. Perhaps I should use gsub as you suggested and if the string is composed of chars followed by numbers, it will return the 3rd part empty?

Please read about regular expressions and describe your problem accurately. If the last strings are not always present, use * rather than + at the very end of the regular expression. Best, Uwe Ligges

Carol,

As Uwe notes, reviewing the documentation for ?regex and the examples in ?gsub can be helpful. There are also online regex resources such as: http://www.regular-expressions.info

The question is how much variation might be present. If it will always be up to 3 possible components, then as Uwe indicated, using the '*' instead of '+' will allow for the possibility that one or more patterns will not be present. '*' means that 0 or more of the patterns must be present, whereas '+' requires that at least one or more matches are present.

strsplit(gsub("([a-z]*)([0-9]*)([a-z]*)", "\\1 \\2 \\3", "absdfds0213451ab"), " ")
[[1]]
[1] "absdfds" "0213451" "ab"

strsplit(gsub("([a-z]*)([0-9]*)([a-z]*)", "\\1 \\2 \\3", "absdfds0213451"), " ")
[[1]]
[1] "absdfds" "0213451"

strsplit(gsub("([a-z]*)([0-9]*)([a-z]*)", "\\1 \\2 \\3", "0213451ab"), " ")
[[1]]
[1] ""        "0213451" "ab"

Using the 3 back references in the regex above will limit the parsing to up to 3 possible components. If you may have more than 3, you can increase the back reference sequence to some maximum number. However, that can get tedious, so you may want to consider multiple passes using strsplit() to extract letters during one pass and then numbers during a second, or write a function to encapsulate that process.
Here are examples using strsplit():

# Get the numbers, using letters as the split
strsplit("absdfds0213451ab", split = "[a-z]+")
[[1]]
[1] ""        "0213451"

strsplit("absdfds0213451ab4567", split = "[a-z]+")
[[1]]
[1] ""        "0213451" "4567"

# Get the letters, using numbers as the split
strsplit("absdfds0213451ab", split = "[0-9]+")
[[1]]
[1] "absdfds" "ab"

strsplit("0213451ab", split = "[0-9]+")
[[1]]
[1] ""   "ab"

strsplit("0213451ab123xyz789lmn", split = "[0-9]+")
[[1]]
[1] ""    "ab"  "xyz" "lmn"

Regards, Marc

Regards, Carol

On Wednesday, July 30, 2014 10:52 PM, Marc Schwartz marc_schwa...@me.com wrote:

On Jul 30, 2014, at 3:13 PM, carol white wht_...@yahoo.com wrote:

Hi, If I have a string of consecutive chars followed by consecutive numbers and then chars, like "absdfds0213451ab", how to separate the consecutive chars from consecutive numbers? grep doesn't seem to be helpful:

grep("[a-z]", "absdfds0213451ab", ignore.case = T)
[1] 1
grep("[0-9]", "absdfds0213451ab", ignore.case = T)
[1] 1

Thanks, Carol

grep() will only tell you that a pattern is present. You want to use gsub() or similar with back references to return parts of the vector. Will they ALWAYS appear in that pattern (letters, numbers, letters) or is there some level of variation? If they will always appear as in your example, then one approach is:

strsplit(gsub("([a-z]+)([0-9]+)([a-z]+)", "\\1 \\2 \\3", "absdfds0213451ab"), " ")
[[1]]
[1] "absdfds" "0213451" "ab"

The initial gsub() returns the 3 parts separated by a space, which is then used as the split argument to strsplit(). If there will be some variation, you can use multiple calls to gsub() or similar, each getting either the letters or the numbers.

Regards, Marc Schwartz
Re: [R] separate numbers from chars in a string
On Jul 30, 2014, at 3:13 PM, carol white wht_...@yahoo.com wrote:

Hi, If I have a string of consecutive chars followed by consecutive numbers and then chars, like "absdfds0213451ab", how to separate the consecutive chars from consecutive numbers? grep doesn't seem to be helpful:

grep("[a-z]", "absdfds0213451ab", ignore.case = T)
[1] 1
grep("[0-9]", "absdfds0213451ab", ignore.case = T)
[1] 1

Thanks, Carol

grep() will only tell you that a pattern is present. You want to use gsub() or similar with back references to return parts of the vector. Will they ALWAYS appear in that pattern (letters, numbers, letters) or is there some level of variation? If they will always appear as in your example, then one approach is:

strsplit(gsub("([a-z]+)([0-9]+)([a-z]+)", "\\1 \\2 \\3", "absdfds0213451ab"), " ")
[[1]]
[1] "absdfds" "0213451" "ab"

The initial gsub() returns the 3 parts separated by a space, which is then used as the split argument to strsplit(). If there will be some variation, you can use multiple calls to gsub() or similar, each getting either the letters or the numbers.

Regards, Marc Schwartz
Re: [R] SASxport function read.xport gives error object 'w' not found
On Jul 25, 2014, at 8:26 AM, Jocelyn Ireson-Paine p...@j-paine.org wrote:

The subject line says it. I've just tried converting a SAS .xpt file with this call:

read.xport('formats.xpt')

I get the message

Error in read.xport("formats.xpt") : object 'w' not found

but there's no explanation about what 'w' is or how I should make it known to R. I certainly wasn't expecting to have to provide such a variable, unless I've overlooked something in the SASxport documentation. This is using R version 3.1.0 under Windows 7, and SASxport installed this morning: version 1.5.0 (2014-07-21). Any idea what this 'w' is that read.xport wants me to give it?

The .xpt file is confidential, so I can't make it available, and I don't know exactly what's in it. I suspect, however, that it may contain quite a bit of text. However, I'm pretty sure that its author used SAS correctly when generating it. Other files containing purely numeric data from the same author converted OK, using analogous calls to read.xport. Googling the error didn't find anything. Thanks, Jocelyn Ireson-Paine 07768 534 091 http://www.jocelyns-cartoons.co.uk

Have you tried reading the file with the read.xport() function in the foreign package, which is part of the default R installation?

require(foreign)
?read.xport

I would start a fresh R session, just to be sure that the SASxport version is not used unintentionally. There might be a bug in the SASxport version of the function, which apparently uses some of the foreign package version's code, or there might be something about your xpt file that is causing a problem. You may need to contact Greg, who is the package maintainer for SASxport, to get a sense from him as to what would trigger the error you are experiencing. Otherwise, you may have to trace through the code (see ?debug, for example) with your file and see if you can identify a trigger.
Regards, Marc Schwartz
Re: [R] Norton Virus program indicates that R3.1.1 is not reliable
Jim,

You can file a Type I error report on the file here:

https://submit.symantec.com/false_positive/

rather than waiting. I had seen similar reports, not on R, but elsewhere with this particular Symantec community-based detection. I am not a user, but since it appears to be a community-based system for this detection, it will take the Symantec user community to file reports and get it removed from detection.

Regards, Marc Schwartz

On Jul 13, 2014, at 10:30 AM, jim holtman jholt...@gmail.com wrote:

Glad to see that I am not the only one seeing the error. I was getting it on my other (company) computer that has Symantec and it will not allow me to override and do the install. Guess I will check again in a couple of days and see if it clears up. Will also check to see if I can contact Norton and Symantec about the problem.

Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.

On Sun, Jul 13, 2014 at 11:17 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

Have seen it. Had to override Norton and tell it to ignore the threat. Been awhile, don't off the top of my head remember how I did that.

Jeff Newmiller, jdnew...@dcn.davis.ca.us
Sent from my phone. Please excuse my brevity.

On July 13, 2014 8:03:37 AM PDT, jim holtman jholt...@gmail.com wrote:

I was downloading the latest version of R from the CMU mirror and got the following message (also tried the MTU mirror and got the same). Has anyone else seen this?
Filename: r-3.1.1-win.exe
Threat name: WS.Reputation.1
Full Path: c:\users\owner\downloads\r-3.1.1-win.exe

Details: Unknown Community Usage, Unknown Age, Risk Medium
Origin: Downloaded from http://lib.stat.cmu.edu/R/CRAN/bin/windows/base/R-3.1.1-win.exe
Activity: Actions performed: 1
On computers as of: Not Available
Last Used: 7/13/14 at 10:56:56
Startup Item: No
Launched: No

Unknown: It is unknown how many users in the Norton Community have used this file.
Unknown: This file release is currently not known.
Medium: This file risk is medium.

Threat type: Insight Network Threat. There are many indications that this file is untrustworthy and therefore not safe.
http://lib.stat.cmu.edu/R/CRAN/bin/windows/base/R-3.1.1-win.exe
Downloaded File: r-3.1.1-win.exe
Threat name: WS.Reputation.1 from cmu.edu
Source: External Media
File Actions: File: c:\users\owner\downloads\r-3.1.1-win.exe Removed
File Thumbprint - SHA: ce6fb76612aefc482583fb92f4f5c3cb8e8e3bf1a8dda97df7ec5caf746e53fe
File Thumbprint - MD5: Not available
Re: [R] Problems with read.table and data structure
On Jul 11, 2014, at 9:15 AM, Tim Richter-Heitmann trich...@uni-bremen.de wrote:

Hi there! I have a huge datafile of 600 columns x 360 samples:

data <- read.table("small.txt", header = TRUE, sep = "\t", dec = ".", row.names = 1)

The txt file (compiled with Excel) is showing me only numbers, however R gives me the structure of ANY column as factor. When I try stringsAsFactors=FALSE in the read command, the structure of the dataset becomes character. When I try as.numeric(data), I get

Error: (list) object cannot be coerced to type 'double'

even if I try to subset columns with []. When I try as.numeric on single columns with $, I am successful, but the numbers don't make any sense at all, as the factors are not converted by their levels:

Factor w/ 358 levels "0,123111694",..: 11 14 50 12 38 44 13 76 31 30

becomes

num 11 14 50 12 38 44 13 76 31 30

whereas I would need the levels, though! I suspect Excel of messing up the "save as tab-delimited text", but the text file seems fine to me on the surface (I don't know how the numbers are stored internally). I just see correct numbers, and the View command also yields the correct content. Anyone knows help? It's pretty annoying. Thank you!

Hi,

See: http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f

Regards, Marc Schwartz
Re: [R] Problems with read.table and data structure
On Jul 11, 2014, at 2:36 PM, Marc Schwartz marc_schwa...@me.com wrote:

On Jul 11, 2014, at 9:15 AM, Tim Richter-Heitmann trich...@uni-bremen.de wrote:

Hi there! I have a huge datafile of 600 columns x 360 samples:

data <- read.table("small.txt", header = TRUE, sep = "\t", dec = ".", row.names = 1)

The txt file (compiled with Excel) is showing me only numbers, however R gives me the structure of ANY column as factor. When I try stringsAsFactors=FALSE in the read command, the structure of the dataset becomes character. When I try as.numeric(data), I get

Error: (list) object cannot be coerced to type 'double'

even if I try to subset columns with []. When I try as.numeric on single columns with $, I am successful, but the numbers don't make any sense at all, as the factors are not converted by their levels:

Factor w/ 358 levels "0,123111694",..: 11 14 50 12 38 44 13 76 31 30

becomes

num 11 14 50 12 38 44 13 76 31 30

whereas I would need the levels, though! I suspect Excel of messing up the "save as tab-delimited text", but the text file seems fine to me on the surface (I don't know how the numbers are stored internally). I just see correct numbers, and the View command also yields the correct content. Anyone knows help? It's pretty annoying. Thank you!

Hi,

See: http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f

Regards, Marc Schwartz

Sorry, I just noted that you defined dec = "." in your call to read.table(), whereas it appears that a comma (",") is being used as the decimal separator in your source data. Modify dec = "." to dec = "," and that should obviate the need to convert the numeric values to factors during import. They should be converted to numerics right away. For example:

str(read.table(textConnection("0,1234"), dec = "."))
'data.frame': 1 obs. of 1 variable:
 $ V1: Factor w/ 1 level "0,1234": 1

str(read.table(textConnection("0,1234"), dec = ","))
'data.frame': 1 obs. of 1 variable:
 $ V1: num 0.123

Regards, Marc
Re: [R] table over a matrix dimension...
On Jul 10, 2014, at 12:03 PM, Jonathan Greenberg j...@illinois.edu wrote:

R-helpers: I'm trying to determine the frequency of characters for a matrix applied to a single dimension, and generate a matrix as an output. I've come up with a solution, but it appears inelegant -- I was wondering if there is an easier way to accomplish this task:

# Create a matrix of factors (characters):
random_characters = matrix(sample(letters[1:4], 1000, replace = TRUE), 100, 10)

# Applying with the table() function doesn't work properly, because not all rows
# have ALL of the factors, so I get a list output:
apply(random_characters, 1, table)

# Hacked solution:
unique_values = letters[1:4]
countsmatrix <- t(apply(random_characters, 1, function(x, unique_values) {
  counts = vector(length = length(unique_values))
  for (i in seq(unique_values)) {
    counts[i] = sum(x == unique_values[i])
  }
  return(counts)
}, unique_values = unique_values))

# Gets me the output I want but requires two nested loops (apply and for()), so
# not efficient for very large datasets.

### Is there a more elegant solution to this? --j

If I am correctly understanding your issue, you simply need to coerce the input to table() to a factor with a common set of levels, since the matrix will be 'character' by default:

set.seed(1)
random_characters <- matrix(sample(factor(letters[1:4]), 1000, replace = TRUE), 100, 10)

random_characters
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,] "b"  "c"  "b"  "c"  "c"  "c"  "d"  "d"  "d"  "d"
 [2,] "b"  "b"  "a"  "a"  "a"  "c"  "d"  "d"  "a"  "d"
 [3,] "c"  "b"  "c"  "b"  "d"  "c"  "a"  "d"  "d"  "b"
 [4,] "d"  "d"  "b"  "b"  "d"  "c"  "c"  "c"  "c"  "a"
 [5,] "a"  "c"  "a"  "b"  "d"  "b"  "d"  "c"  "b"  "a"
 [6,] "d"  "a"  "c"  "d"  "c"  "d"  "d"  "a"  "c"  "a"
 [7,] "d"  "a"  "c"  "a"  "b"  "b"  "b"  "b"  "b"  "a"
 [8,] "c"  "b"  "a"  "d"  "d"  "d"  "b"  "c"  "d"  "a"
 [9,] "c"  "d"  "b"  "a"  "a"  "d"  "d"  "d"  "b"  "a"
[10,] "a"  "c"  "c"  "b"  "d"  "c"  "a"  "c"  "a"  "a"
[11,] "a"  "d"  "d"  "a"  "d"  "d"  "d"  "c"  "b"  "c"
[12,] "a"  "c"  "a"  "a"  "b"  "b"  "b"  "b"  "b"  "d"
[13,] "c"  "b"  "d"  "d"  "c"  "a"  "c"  "a"  "b"  "c"
[14,] "b"  "b"  "d"  "c"  "d"  "c"  "c"  "d"  "d"  "a"
[15,] "d"  "a"  "d"  "b"  "c"  "c"  "c"  "b"  "b"  "a"
[16,] "b"  "a"  "b"  "b"  "b"  "a"  "b"  "b"  "c"  "b"
[17,] "c"  "c"  "c"  "a"  "b"  "c"  "a"  "a"  "d"  "a"
[18,] "d"  "a"  "d"  "b"  "b"  "c"  "b"  "a"  "d"  "c"
...
RES <- t(apply(random_characters, 1, function(x) table(factor(x, levels = letters[1:4]))))

RES
      a b c d
 [1,] 0 2 4 4
 [2,] 4 2 1 3
 [3,] 1 3 3 3
 [4,] 1 2 4 3
 [5,] 3 3 2 2
 [6,] 3 0 3 4
 [7,] 3 5 1 1
 [8,] 2 2 2 4
 [9,] 3 2 1 4
[10,] 4 1 4 1
[11,] 2 1 2 5
[12,] 3 5 1 1
[13,] 2 2 4 2
[14,] 1 2 3 4
[15,] 2 3 3 2
[16,] 2 7 1 0
[17,] 4 1 4 1
[18,] 2 3 2 3
...

Regards, Marc Schwartz
Re: [R] symbols in a data frame
On Jul 9, 2014, at 12:19 PM, Sam Albers tonightstheni...@gmail.com wrote:

Hello, I have recently received a dataset from a metal analysis company. The dataset is filled with less-than ("<") symbols. What I am looking for is an efficient way to subset for any whole numbers from the dataset. The column is automatically formatted as a factor because of the symbols, making it difficult to deal with the numbers in a useful way. So, in sum, any ideas on how I could subset the example below for only whole numbers? Thanks in advance! Sam

# code
metals <- structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
  8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L),
  .Label = c("Antimony", "Arsenic", "Barium", "Beryllium",
  "Boron (Hot Water Soluble)", "Cadmium", "Chromium", "Cobalt",
  "Copper", "Lead", "Mercury", "Molybdenum", "Nickel", "pH 1:2",
  "Selenium", "Silver", "Thallium", "Tin", "Vanadium", "Zinc"),
  class = "factor"), Cedar.Creek = structure(c(3L, 3L, 7L, 3L, 2L,
  4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L, 4L, 4L, 3L),
  .Label = c("<1", "<10", "<100", "<1000", "<200", "<5", "<500",
  "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4", "1.5", "137",
  "154", "163", "165", "169", "178", "2.3", "2.4", "22", "24",
  "244", "27.2", "274", "3", "3.1", "40.2", "43", "50", "516",
  "53.3", "550", "569", "65", "66.1", "68", "7.6", "72", "77",
  "89", "951"), class = "factor")), .Names = c("Parameter",
  "Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")

Sam,

You can use ?gsub to remove the '<' characters from the column and then use ?subset to select the records you wish. Note that gsub() returns a character vector, so you want to coerce to numeric.
as.numeric(gsub("<", "", metals$Cedar.Creek))
 [1]  100  100  500  100   10 1000  100  516  550   10  200  500  100
[14]  500  100  951 1000 1000  100

For example:

subset(metals, as.numeric(gsub("<", "", Cedar.Creek)) == 100)
   Parameter Cedar.Creek
1   Antimony        <100
2    Arsenic        <100
4  Beryllium        <100
7     Cobalt        <100
13  Selenium        <100
15  Thallium        <100
19  Antimony        <100

subset(metals, as.numeric(gsub("<", "", Cedar.Creek)) <= 500)
    Parameter Cedar.Creek
1    Antimony        <100
2     Arsenic        <100
3      Barium        <500
4   Beryllium        <100
5     Cadmium         <10
7      Cobalt        <100
10    Mercury         <10
11 Molybdenum        <200
12     Nickel        <500
13   Selenium        <100
14     Silver        <500
15   Thallium        <100
19   Antimony        <100

You can also just create a new column that is numeric and go from there:

metals$CC.Num <- as.numeric(gsub("<", "", metals$Cedar.Creek))

str(metals)
'data.frame': 19 obs. of 3 variables:
 $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
 $ Cedar.Creek: Factor w/ 45 levels "<1","<10","<100",..: 3 3 7 3 2 4 3 34 36 2 ...
 $ CC.Num     : num 100 100 500 100 10 1000 100 516 550 10 ...

metals
    Parameter Cedar.Creek CC.Num
1    Antimony        <100    100
2     Arsenic        <100    100
3      Barium        <500    500
4   Beryllium        <100    100
5     Cadmium         <10     10
6    Chromium       <1000   1000
7      Cobalt        <100    100
8      Copper         516    516
9        Lead         550    550
10    Mercury         <10     10
11 Molybdenum        <200    200
12     Nickel        <500    500
13   Selenium        <100    100
14     Silver        <500    500
15   Thallium        <100    100
16        Tin         951    951
17   Vanadium       <1000   1000
18       Zinc       <1000   1000
19   Antimony        <100    100

Regards, Marc Schwartz
Re: [R] odd behavior of seq()
On Jul 3, 2014, at 1:28 PM, Matthew Keller mckellerc...@gmail.com wrote:

Hi all, A bit stumped here.

z <- seq(.05, .85, by = .1)

z == .05  # good
[1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

z == .15  # huh
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

More generally:

sum(z == .25)
[1] 1
sum(z == .35)
[1] 0
sum(z == .45)
[1] 1
sum(z == .55)
[1] 1
sum(z == .65)
[1] 0
sum(z == .75)
[1] 0
sum(z == .85)
[1] 1

Does anyone have any ideas what is going on here?

See the MFAQ[1]:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

Regards, Marc Schwartz

[1] Most Frequently Asked Question
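The practical fix the FAQ suggests is to compare within a tolerance rather than with ==; a small sketch using the same z (this code is not from the thread itself):

```r
z <- seq(.05, .85, by = .1)

# seq() accumulates tiny floating point error, so exact comparison fails
# for some elements:
sum(z == .15)

# Compare within a tolerance instead:
sum(abs(z - .15) < 1e-8)

# or element-wise via all.equal(), which uses a default numeric tolerance:
which(sapply(z, function(v) isTRUE(all.equal(v, .15))))
```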
Re: [R] access an element of a list without looping
On Jul 3, 2014, at 2:35 PM, carol white wht_...@yahoo.com wrote:

Hi, Is there any way to access an element of a list without looping over the list nor using unlist? Just to avoid parsing a very long list. For example, how to find a vector of length 2 in a list without using a loop?

l <- list(c(1), c(2, 3), c(1, 2, 3))
for (i in 1:length(l)) {
  if (length(l[[i]]) == 2) {
    print(i)
    break
  }
}

Thanks, Carol

You can use one of the *apply() functions, albeit it is still effectively looping through the list. It may or may not be faster in some cases than an explicit for() loop, but it can be easier to read, depending upon the complexity of the function being utilized within the call. For example:

which(sapply(l, function(x) length(x) == 2))
[1] 2

This presumes that you only have a single level of list elements to scan. If you have sub-levels within the list, you might want to look at ?rapply, which is a recursive version.

Regards, Marc Schwartz
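Since Carol's loop breaks at the first hit, base R's Position() may be closer to her intent than which(sapply(...)), which always scans the entire list; a small sketch (not from the thread itself):

```r
l <- list(c(1), c(2, 3), c(1, 2, 3))

# Position() returns the index of the first element satisfying the
# predicate and stops scanning there, like the break in the for() loop:
Position(function(x) length(x) == 2, l)
# [1] 2
```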
Re: [R] Output levels of categorical data to Excel using with()
On Jun 18, 2014, at 11:31 PM, Daniel Schwartz das2...@gmail.com wrote: I have coded qualitative data with many (20+) different codes from a survey in an excel file. I am using the with() function to output the codes so we know what's there. Is it possible to direct the output from with() to an excel file? If not, what's another function that has the same, er, functionality?! Thanks, R World! It is not clear from your description, that the use of with() is really relevant here. with() is typically used as a convenience wrapper to be able to evaluate the names of data frame columns in the environment of the data frame, rather than having to repeat the 'DataFrameName$' prefix over and over. To export data from R to Excel files, there are various options which are listed both in the R wiki: http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows and in the R Data Import/Export manual: http://cran.r-project.org/doc/manuals/r-release/R-data.html#Reading-Excel-spreadsheets Worst case, you can use ?write.csv to dump the data to a CSV file, which can then be opened with Excel. The option you may prefer will depend upon your operating system, how comfortable you may or may not be relative to installing additional software, do you want to create a new Excel file with each export or be able to append to existing worksheets and how you may want to structure or format the worksheet(s) in Excel. Regards, Marc Schwartz P.S. I have a cousin Daniel, but different gmail e-mail address.
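As a minimal sketch of the worst-case write.csv() option mentioned above (the data frame and column names here are hypothetical stand-ins, not from the thread):

```r
# A toy stand-in for the coded survey data; 'codes' is a hypothetical name.
codes <- data.frame(id = 1:3, code = c("A1", "B2", "A1"))

# table() summarizes which codes are present and how often:
code_counts <- as.data.frame(table(codes$code))

# CSV files open directly in Excel:
write.csv(code_counts, file = "code_counts.csv", row.names = FALSE)
```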
Re: [R] glm.fit: fitted probabilities numerically 0 or 1 occurred for a continuous variable?
On Jun 16, 2014, at 2:34 PM, Nwinters nicholas.wint...@mail.mcgill.ca wrote:

I have gotten this error before: glm.fit: fitted probabilities numerically 0 or 1 occurred and the problem was usually solved by combining one or more categories where there were no observations. I am now having this error show up for a variable that is continuous (not categorical). What could be the cause of this for a continuous variable?? Thanks, Nick

Presuming that this is logistic regression (family = binomial), the error is suggestive of complete or near complete separation in the association between your continuous IV and your binary response. This can occur if there is a breakpoint within the range of your IV where the dichotomous event is present on one side of the break and is absent on the other side of the break.

The resolution for the problem will depend upon first confirming the etiology of it and then, within the context of subject matter expertise, making some decisions on how to proceed. If you Google "logistic regression separation", you will get some resources that can be helpful.

Regards, Marc Schwartz
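[Editorial note: a small constructed example (not from the thread) of the separation scenario described above, where the event is absent on one side of a breakpoint in a continuous predictor and present on the other:]

```r
# Perfectly separated data: y is 0 for x in (0,1) and 1 for x in (2,3)
set.seed(42)
x <- c(runif(50, 0, 1), runif(50, 2, 3))
y <- rep(c(0, 1), each = 50)

fit <- glm(y ~ x, family = binomial)
# Typically warns: "glm.fit: fitted probabilities numerically 0 or 1 occurred"
# and the coefficient for x is huge, with an enormous standard error
summary(fit)$coefficients["x", ]
```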
Re: [R] Help with arrays
On May 29, 2014, at 11:02 AM, Olivier Charansonney olivier.charanson...@orange.fr wrote:

Hello, I would like to extract the value in row 1 corresponding to the maximum in row 2.

Array W
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]
[1,]    651.0    651.0    651.0    651.0    651.0    651.0    651.0    119.0     78.0     78.0
[2,] 13.24184 13.24184 13.24184 13.24184 13.24184 13.24184 13.24184 16.19418 15.47089 15.47089

valinit <- max(W[2,])
valinit
[1] 16.19418

How to obtain '119'? Thanks,

Hi, Using ?dput can help make it easier for others to recreate your object to test code:

dput(W)
structure(c(651, 13.24184, 651, 13.24184, 651, 13.24184, 651, 13.24184, 651, 13.24184, 651, 13.24184, 651, 13.24184, 119, 16.19418, 78, 15.47089, 78, 15.47089), .Dim = c(2L, 10L))

W <- structure(c(651, 13.24184, 651, 13.24184, 651, 13.24184, 651, 13.24184, 651, 13.24184, 651, 13.24184, 651, 13.24184, 119, 16.19418, 78, 15.47089, 78, 15.47089), .Dim = c(2L, 10L))

See ?which.max, which returns the index of the *first* maximum in the vector passed to it:

W[1, which.max(W[2, ])]
[1] 119

You should consider what happens if there is more than one of the maximum value in the first row and if it might correspond to non-unique values in the second row.

Regards, Marc Schwartz
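[Editorial note: a constructed illustration (hypothetical data, not from the thread) of the caveat at the end of that reply: with ties in the row being maximized, which.max() silently picks the first match, while a logical comparison recovers all of them:]

```r
W2 <- matrix(c(1, 5, 2, 5), nrow = 2)    # row 2 is c(5, 5): a tied maximum
W2[1, which.max(W2[2, ])]                # first match only: 1
W2[1, W2[2, ] == max(W2[2, ])]           # all matches: 1 2
```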
Re: [R] Help with arrays
On May 29, 2014, at 11:22 AM, Marc Schwartz marc_schwa...@me.com wrote:

On May 29, 2014, at 11:02 AM, Olivier Charansonney olivier.charanson...@orange.fr wrote:

snip

You should consider what happens if there is more than one of the maximum value in the first row and if it might correspond to non-unique values in the second row.

Correction in the above sentence, it should be:

You should consider what happens if there is more than one of the maximum value in the second row and if it might correspond to non-unique values in the first row.

Marc
Re: [R] looking at C code from the stats package
On May 29, 2014, at 7:12 PM, Erin Hodgess erinm.hodg...@gmail.com wrote:

Dear R People: How are you? I would like to look at the underlying C code from the program C_ARIMA_Like in the stats package. However, since that is a base package, I'm not entirely sure how to access this. When I used .C("C_ARIMA_Like") it says that the C_ARIMA_Like cannot be found. This is on Windows 7, R version 3.0.2. Thank you for any help! Sincerely, Erin

Hi Erin,

If you are working from a binary install of R, you won't be able to see the sources for C or FORTRAN based functions. If it is a base package in the '../library' tree like 'stats', in the source tarball from CRAN or in the R SVN repo, there will be a 'src' directory for the package where relevant C and/or FORTRAN code will be contained.

As an example for arima.c, in the SVN repo for the 3.0 branch tree:

https://svn.r-project.org/R/branches/R-3-0-branch/src/library/stats/src/arima.c

For R-Devel, it will be in 'trunk':

https://svn.r-project.org/R/trunk/src/library/stats/src/arima.c

Scroll down or search in the arima.c source for the function name ARIMA_Like. If you know something about SVN repo trees, the path will make sense.

Other common C and/or FORTRAN code that is not part of the base packages may be in the ../src/main directory:

https://svn.r-project.org/R/branches/R-3-0-branch/src/main/

and there is a file 'names.c' that can be helpful in locating specific C functions and their associated declared C names.

For Recommended packages, there is also a separate SVN repo at:

https://svn.r-project.org/R-packages/

but it may be easier to download the tarball for each package from CRAN.

Regards, Marc Schwartz
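[Editorial note: as a complement to reading the C sources, the registered native symbol can be inspected from a running R session. This assumes a reasonably recent R in which the routine is registered under this name; stats:::C_ARIMA_Like is an internal object, so its structure may vary across R versions:]

```r
# Inspect the registered C routine behind arima()'s likelihood computation
sym <- stats:::C_ARIMA_Like   # internal object; not part of the public API
sym$name                       # the declared C symbol name
sym$numParameters              # number of arguments the C routine takes
```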
Re: [R] reference category in binomal glm
On May 27, 2014, at 3:51 AM, Xebar Saram zelt...@gmail.com wrote:

Hi all, I know this is probably a silly question but I'm wondering what is the 'reference' category when you run a binomial glm. That is, my outcome/DV is 0,1 and I run a regression and get coefficients. Do the coefficients refer to the probability to get 0 or 1? Thanks so much in advance, Z

As per the Details section of ?glm:

A typical predictor has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. For binomial and quasibinomial families the response can also be specified as a factor (when the first level denotes failure and all others success) or as a two-column matrix with the columns giving the numbers of successes and failures.

Thus, if you have a numeric 0/1 response, you are predicting 1's, and if you use a two level factor, you are predicting the second level of the factor.

Regards, Marc Schwartz
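[Editorial note: a small illustration (not in the original reply) of how the first factor level acts as the 'failure' category, and how relevel() changes which outcome is modeled:]

```r
y <- factor(c("no", "yes", "no", "yes"))
levels(y)    # "no" "yes": "no" is failure, so glm() would model Pr(yes)
# [1] "no"  "yes"

y2 <- relevel(y, ref = "yes")   # make "yes" the first (failure) level
levels(y2)
# [1] "yes" "no"
```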
Re: [R] Login
On May 27, 2014, at 4:46 AM, Andy Siddaway andysidda...@googlemail.com wrote: Dear R help, I cannot login to my account. I am keen to remove the posting I made to R help from google web searches - see http://r.789695.n4.nabble.com/R-software-installation-problem-td4659556.html Thanks, Andy You cannot. From the bottom of the R Posting Guide (http://www.r-project.org/posting-guide.html): Posters should be aware that the R lists are public discussion lists and anything you post will be archived and accessible via several websites for many years. There are a plethora of list archives on the web and there is no provision for removing specific posts from all sites that might possibly have a copy of your post. Google, by the way, is not the only search engine that will include and archive your post in searches. Regards, Marc Schwartz
Re: [R] Summary to data frame in R!!
On May 7, 2014, at 5:15 AM, Abhinaba Roy abhinabaro...@gmail.com wrote:

Hi R-helpers,

sumx <- summary(mtcars[, c("mpg", "disp")])
sumx
      mpg             disp
 Min.   :10.40   Min.   : 71.1
 1st Qu.:15.43   1st Qu.:120.8
 Median :19.20   Median :196.3
 Mean   :20.09   Mean   :230.7
 3rd Qu.:22.80   3rd Qu.:326.0
 Max.   :33.90   Max.   :472.0

I want a dataframe as

          mpg  disp
Min.    10.40  71.1
1st Qu. 15.43 120.8
Median  19.20 196.3
Mean    20.09 230.7
3rd Qu. 22.80 326.0
Max.    33.90 472.0

How can it be done in R? -- Regards, Abhinaba Roy

summary(), when applied to multiple columns, as you are doing, returns a character table object:

str(sumx)
 'table' chr [1:6, 1:2] "Min.   :10.40" "1st Qu.:15.43" ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:6] ...
  ..$ : chr [1:2] "mpg" "disp"

Note that the actual table elements contain both character and numeric values that have been formatted. If you use:

sapply(mtcars[, c("mpg", "disp")], summary)
          mpg  disp
Min.    10.40  71.1
1st Qu. 15.42 120.8
Median  19.20 196.3
Mean    20.09 230.7
3rd Qu. 22.80 326.0
Max.    33.90 472.0

this applies the summary() function to each column separately, returning a numeric matrix:

str(sapply(mtcars[, c("mpg", "disp")], summary))
 num [1:6, 1:2] 10.4 15.4 19.2 20.1 22.8 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:6] "Min." "1st Qu." "Median" "Mean" ...
  ..$ : chr [1:2] "mpg" "disp"

If you actually want a data frame, you can coerce the result:

as.data.frame(sapply(mtcars[, c("mpg", "disp")], summary))
          mpg  disp
Min.    10.40  71.1
1st Qu. 15.42 120.8
Median  19.20 196.3
Mean    20.09 230.7
3rd Qu. 22.80 326.0
Max.    33.90 472.0

str(as.data.frame(sapply(mtcars[, c("mpg", "disp")], summary)))
'data.frame': 6 obs. of 2 variables:
 $ mpg : num 10.4 15.4 19.2 20.1 22.8 ...
 $ disp: num 71.1 120.8 196.3 230.7 326 ...

Regards, Marc Schwartz
Re: [R] get element of list with default?
On Apr 15, 2014, at 10:53 AM, Spencer Graves spencer.gra...@structuremonitoring.com wrote:

Hello: Do you know of a simple function to return the value of a named element of a list if that exists, and return a default value otherwise? It's an easy function to write (e.g., below). I plan to add this to the Ecfun package unless I find it in another CRAN package. Thanks, Spencer

getElement <- function(element, default, list){
  # get element of list; return default if absent
  El <- list[[element]]
  if(is.null(El)){
    El <- default
  }
  El
}

Hi Spencer,

I don't know of a function elsewhere, but you can probably simplify the above with:

getElement <- function(element, default, list) {
  ifelse(is.null(list[[element]]), default, list[[element]])
}

MyList <- list(L1 = 1, L2 = 2)

MyList
$L1
[1] 1

$L2
[1] 2

getElement("L1", 5, MyList)
[1] 1

getElement("L2", 5, MyList)
[1] 2

getElement("L3", 5, MyList)
[1] 5

You might want to think about the ordering of the function arguments, given typical use, for ease of calling it. For example:

getElement <- function(list, element, default = SomeValue)

Another consideration is that the above function will only get the element if it is a 'first level' element in the list. If it is in a sub-list of the main list, you would need to think about a recursive approach of some type, along the lines of what ?rapply does.

Regards, Marc Schwartz
Re: [R] get element of list with default?
On Apr 15, 2014, at 11:22 AM, Marc Schwartz marc_schwa...@me.com wrote:

snip

Spencer,

A quick heads up here. I forgot that there is already a function called getElement() in base R, which appears to be designed to handle S4 objects and slots, lacking the default return value however, where it returns NULL if the 'name' element is not present:

getElement
function (object, name)
{
    if (isS4(object)) slot(object, name) else object[[name, exact = TRUE]]
}
<bytecode: 0x100905870>
<environment: namespace:base>

Thus, I would suggest calling your variant something else, or wrap the default function in your version, if you need/want to handle S4 objects and slots.

Regards, Marc
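[Editorial note: a sketch of the wrap-the-base-function suggestion above, under the assumption that a NULL result means the element is absent; the name getElementDefault is hypothetical:]

```r
getElementDefault <- function(object, name, default = NULL) {
  # base::getElement handles both list elements and S4 slots
  res <- getElement(object, name)
  if (is.null(res)) default else res
}

MyList <- list(L1 = 1, L2 = 2)
getElementDefault(MyList, "L3", 5)
# [1] 5
```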
Re: [R] Error logistic analysis
On Apr 8, 2014, at 7:20 AM, lghansse lise.hanss...@ugent.be wrote:

I'm trying to conduct a single level logistic analysis (as a beginning step for a more advanced multi-level analysis). However, when I try to run it, I get the following error:

Warning messages:
1: In model.response(mf, "numeric") : using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : - not meaningful for factors

I haven't got a clue why I'm getting this because I used the exact same syntax (same data preparation etc...) for a similar analysis (same data structure, different country).

Syntax:

Single_model1 <- lm(openhrs1 ~ genhealt1 + age + sexpat1 + hhincome1 + edupat1 + etniciteit1, data = Slovakije)

My missing data are coded as such; I already tried to run the analysis in a data frame without the missing cases, but that didn't work either.

You are using the lm() function above, which is a regular least squares linear regression for a continuous response variable. If you want to run a logistic regression, you need to use glm() with 'family = binomial':

Single_model1 <- glm(openhrs1 ~ genhealt1 + age + sexpat1 + hhincome1 + edupat1 + etniciteit1, family = binomial, data = Slovakije)

Regards, Marc Schwartz
Re: [R] moses extreme reaction test
On Apr 8, 2014, at 12:37 PM, José Trujillo Carmona truji...@unex.es wrote: Is there a package that contains moses extreme reaction test? Thank's A search using rseek.org indicates that the DescTools package on CRAN contains a function called MosesTest() that appears to implement it. http://cran.r-project.org/web/packages/DescTools/ Regards, Marc Schwartz
Re: [R] Strange sprintf Behavior
On Apr 2, 2014, at 6:32 AM, Michael Smith my.r.h...@gmail.com wrote:

All, I'm getting this:

sprintf("%.17f", 0.8)
[1] "0.80000000000000004"

Where does the `4` at the end come from? Shouldn't it be zero at the end? Maybe I'm missing something.

Hi,

First, please start a new thread when posting, do not just reply to an existing thread and change the subject line. Your post gets lost in the archive and is improperly linked to other posts.

Second, see the Most Frequently Asked Question:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

Regards, Marc Schwartz
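[Editorial note: a footnote illustrating the FAQ entry linked above. Binary floating point cannot represent 0.8, 0.1, or 0.3 exactly, so exact comparisons fail and a tolerance should be used:]

```r
0.1 + 0.2 == 0.3                   # exact comparison of inexact values
# [1] FALSE

isTRUE(all.equal(0.1 + 0.2, 0.3))  # comparison with a numeric tolerance
# [1] TRUE
```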
Re: [R] CORDIF test
On Apr 2, 2014, at 8:09 AM, Elizabeth Caron-Gamache babeth_...@icloud.com wrote:

Hi, I searched on your website for a definition of the CORDIF test, but it wasn't successful. I'm analyzing an article that uses that test and it's not really documented on the net. The article refers to your website, so I presume that you will be able to give me a brief explanation of this test. Here is the quote that talks about this test in my article:

"To compare these regressions and to see which—either body height or LLL—is best related to performance (Pearson correlation coefficients comparison), a CORDIF test (R software [www.r-project.org], multilevel package, version 2.12.1) was performed."

Does it use parametric or non-parametric values? Is it a test to compare 2 groups only or can it be used for a comparison of more than two groups? Why is it so hard to find information on that test on the net? Thanks for your time. Have a nice day. Elizabeth Caron, Physical therapist student, Laval University, Qc, Canada

Thanks for including the citation, which indicates that the CORDIF test is part of the 'multilevel' package, which is on CRAN:

http://cran.r-project.org/web/packages/multilevel/index.html

The reason that it is likely difficult is that 'cordif' is an abbreviation for correlation difference, not the proper name for a test. If you review the provided documentation for the package:

http://cran.r-project.org/web/packages/multilevel/multilevel.pdf

you will see that there is a description of the cordif() function and a reference given:

Cohen, J. & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd Ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

I would review the package documentation and reference and if you have further questions, contact the authors of the paper.
Regards, Marc Schwartz
Re: [R] &lt;NA&gt; to NA
On Mar 31, 2014, at 1:29 PM, eliza botto eliza_bo...@hotmail.com wrote:

Dear useRs, Sorry for such a ridiculous question but I really need to know what is the difference between <NA> and NA and how to convert <NA> to NA. Thank you very much in advance, Eliza

<NA> is the printed output that you would typically get when NA is an element in a factor:

factor(NA)
[1] <NA>

is.na(factor(NA))
[1] TRUE

NA
[1] NA

is.na(NA)
[1] TRUE

See ?factor for additional details. It is, other than the displayed output, the same as a 'normal' NA, which is to say that the value is missing and otherwise undefined. The behavior appears to evolve from the use of ?encodeString, which is called within print.factor:

encodeString(NA)
[1] "NA"

encodeString(NA, na.encode = FALSE)
[1] NA

The default for the 'na.encode' argument is TRUE, so you get the formatting of the NA as you observe for factors.

Regards, Marc Schwartz
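[Editorial note: a short illustration (not in the original reply) that the <NA> shown for factors is purely a display artifact; coercing the factor to character yields an ordinary NA:]

```r
f <- factor(c("a", NA))
f                  # the missing element displays as <NA>

as.character(f)    # coercion returns a plain NA
# [1] "a" NA

is.na(as.character(f))
# [1] FALSE  TRUE
```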
Re: [R] numeric to factor via lookup table
On Mar 28, 2014, at 3:38 PM, Jonathan Greenberg j...@illinois.edu wrote:

R-helpers: Hopefully this is an easy one. Given a lookup table:

mylevels <- data.frame(ID = 1:10, code = letters[1:10])

And a set of values (note these do not completely cover the mylevels range):

values <- c(1, 2, 5, 5, 10)

How do I convert values to a factor object, using the mylevels to define the correct levels (ID matches the values), and code is the label? --j

One approach would be to use ?merge and specify the 'by.*' arguments using column indices, where 'values' is column 1 and you want to match that with mylevels$ID, which is also column 1. Hence:

merge(values, mylevels, by.x = 1, by.y = 1)
   x code
1  1    a
2  2    b
3  5    e
4  5    e
5 10    j

Regards, Marc Schwartz
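[Editorial note: an alternative not mentioned in the reply, since the question asked for a factor object: factor() can apply the lookup table directly through its levels/labels arguments, which also preserves the original order of 'values':]

```r
mylevels <- data.frame(ID = 1:10, code = letters[1:10])
values <- c(1, 2, 5, 5, 10)

# Map each value to its code; unused codes remain as levels
factor(values, levels = mylevels$ID, labels = mylevels$code)
# [1] a b e e j
# Levels: a b c d e f g h i j
```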
Re: [R] R mail list archive Google search function not work
On Mar 27, 2014, at 11:58 AM, Martin Maechler maech...@stat.math.ethz.ch wrote: Marc Schwartz marc_schwa...@me.com on Wed, 26 Mar 2014 16:25:08 -0500 writes: On Mar 26, 2014, at 4:14 PM, David Winsemius dwinsem...@comcast.net wrote: On Mar 25, 2014, at 5:31 PM, Rolf Turner wrote: On 26/03/14 12:51, David Winsemius wrote: On Mar 25, 2014, at 9:52 AM, Luo Weijun wrote: Dear Robert and R project team, I notice that the Google search function on the R mail list archives page has stopped working for quite a while, http://tolstoy.newcastle.edu.au/R/. Is there any solution on this or this has been move to another webpage? I know Google advance search can be used but the query is more complicated. This simple function could help R users greatly. Thank you! Weijun Why not use MarkMail: http://markmail.org/search/?q=list%3Aorg.r-project.r-help Or Gmane http://dir.gmane.org/gmane.comp.lang.r.general/search/list:org.r-project.r-help Why not? Well, for one thing the first link that the R web site points at is the tolstoy.newcastle.edu.au/R/ link. Which is *still there* but terminates at 31 March 2012. I'm a bit confused about what is meant by the R web site. Are you pointing out deficiencies in MarkMail or GMane? David, If you go to: http://www.r-project.org and look at the left hand navigation frame, there is a Search link there, which brings up, in the right hand frame, a list of search sites (http://www.r-project.org/search.html), which still includes Robert's site at http://tolstoy.newcastle.edu.au/R/. The archives there seem to stop at 2012 and the Google search box there seems to be non-functional from a quick check. I have not used his site in years and will use rseek.org these days. It seems to me that I recall discussion in the past about the status of Robert's site, but cannot seem to locate anything at the moment. I am not sure who maintains the R web site these days, but presumably that link should be removed if the search engine is no longer actively maintained. 
Regards, Marc Schwartz

Thank you, Marc (and the other posters). R core has always been responsible for that, it is also an svn repos, mirrored daily to the web server. I have commented Robert King's mirror (and also added a bit about Nabble). You should be able to see the result within 24 hours. Martin

Thanks Martin!

Regards, Marc
Re: [R] Clinical significance - Equivalence test
On Mar 27, 2014, at 8:53 AM, Manuel Carona unku...@gmail.com wrote:

Hi, I have implemented a therapeutic intervention on two groups (one is a control group) and tested them in two moments using some assessment tools (with normative data). Now I want to compare the experimental group with the control group using clinical equivalence testing. To do this I need to specify a range of closeness (one for each assessment tool, according to the specificity of this same tool) and do two one-tailed tests to test if the two groups are considered clinically equivalent in the first moment, and in the end I want to compare the experimental group with the normative data (here I have to add the mean and standard deviation of the normative sample because I don't have the normative sample). I know that R has a package named equivalence but I don't know how to do this kind of calculations with it. Is it even possible with the actual packages? Thanks in advance

I have not used it, but a quick review of the documentation for the 'equivalence' package would suggest that at least the tost() function might be what you need.

That being said, you may need to seek the assistance of a local statistician familiar with the methods and underlying theory to guide you, beyond simply performing the analyses. It is not clear from your description above if the two time points are a baseline and post-treatment pair for each subject, or if they represent 2 time points beyond baseline (3 measures per subject), which would make this a repeated measures scenario and more complicated. In addition, with multiple assessment tools, what multiple testing adjustments may be required to control the likelihood of Type I errors?

If this is a formal study, with a powered a priori hypothesis, all of this should have been pre-specified in the study protocol in the statistical analysis section (and possibly in a stand alone statistical analysis plan) by someone familiar with study designs of this type and the appropriate analytic methods. There are also regulatory guidance documents (e.g., FDA) and books that cover the design and analysis of bioequivalence studies and those should have served as a reference for such a study design. Again, seeking local expertise would seem apropos here.

Regards, Marc Schwartz
Re: [R] completely different results for shapiro.test and ks.test
On Mar 27, 2014, at 8:14 AM, Hermann Norpois hnorp...@gmail.com wrote:

Hello, My main question is whether my data is distributed normally. As the shapiro.test doesn't work for large data sets I prefer the ks.test. But I have some problems to understand the completely different p-values:

ks.test(test, pnorm, mean(test), sd(test))

One-sample Kolmogorov-Smirnov test
data: test
D = 0.0434, p-value = 0.1683
alternative hypothesis: two-sided

Warning message:
In ks.test(test, pnorm, mean(test), sd(test)) : ties should not be present for the Kolmogorov-Smirnov test

shapiro.test(test)

Shapiro-Wilk normality test
data: test
W = 0.9694, p-value = 1.778e-10

Generating some random data the difference is acceptable:

nt <- rnorm(200, mean = 5, sd = 1)
ks.test(nt, pnorm, mean = 5, sd = 1)

One-sample Kolmogorov-Smirnov test
data: nt
D = 0.0641, p-value = 0.3841
alternative hypothesis: two-sided

shapiro.test(nt)

Shapiro-Wilk normality test
data: nt
W = 0.9933, p-value = 0.5045

Thanks, Hermann

snip

The discussion here (and other similar ones) might be helpful:

http://stats.stackexchange.com/questions/362/what-is-the-difference-between-the-shapiro-wilk-test-of-normality-and-the-kolmog

You may also be served by searching the R-Help list archives for prior discussions on using normality tests and why they are essentially useless in practice.

Regards, Marc Schwartz
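[Editorial note: an illustration of the "doesn't work for large data sets" remark in the question: shapiro.test() only accepts sample sizes of 3 to 5000, and with samples near that limit even trivial departures from normality can yield tiny p-values:]

```r
set.seed(1)
x <- rnorm(6000)
# shapiro.test(x)       # errors: sample size must be between 3 and 5000
shapiro.test(x[1:5000])  # runs, but at this n the test is extremely
                         # sensitive to departures of no practical import
```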
Re: [R] R mail list archive Google search function not work
On Mar 26, 2014, at 4:14 PM, David Winsemius dwinsem...@comcast.net wrote: On Mar 25, 2014, at 5:31 PM, Rolf Turner wrote: On 26/03/14 12:51, David Winsemius wrote: On Mar 25, 2014, at 9:52 AM, Luo Weijun wrote: Dear Robert and R project team, I notice that the Google search function on the R mail list archives page has stopped working for quite a while, http://tolstoy.newcastle.edu.au/R/. Is there any solution on this or this has been move to another webpage? I know Google advance search can be used but the query is more complicated. This simple function could help R users greatly. Thank you! Weijun Why not use MarkMail: http://markmail.org/search/?q=list%3Aorg.r-project.r-help Or Gmane http://dir.gmane.org/gmane.comp.lang.r.general/search/list:org.r-project.r-help Why not? Well, for one thing the first link that the R web site points at is the tolstoy.newcastle.edu.au/R/ link. Which is *still there* but terminates at 31 March 2012. I'm a bit confused about what is meant by the R web site. Are you pointing out deficiencies in MarkMail or GMane? David, If you go to: http://www.r-project.org and look at the left hand navigation frame, there is a Search link there, which brings up, in the right hand frame, a list of search sites (http://www.r-project.org/search.html), which still includes Robert's site at http://tolstoy.newcastle.edu.au/R/. The archives there seem to stop at 2012 and the Google search box there seems to be non-functional from a quick check. I have not used his site in years and will use rseek.org these days. It seems to me that I recall discussion in the past about the status of Robert's site, but cannot seem to locate anything at the moment. I am not sure who maintains the R web site these days, but presumably that link should be removed if the search engine is no longer actively maintained. 
Regards, Marc Schwartz
Re: [R] Duplicate of columns when merging two data frames
On Mar 13, 2014, at 10:19 AM, Stefano Sofia stefano.so...@regione.marche.it wrote: Dear list users, I have two data frames df1 and df2, where the columns of df1 are Sensor_RM Place_RM Station_RM Y_init_RM M_init_RM D_init_RM Y_fin_RM M_fin_RM D_fin_RM and the columns of df2 are Sensor_RM Station_RM Place_RM Province_RM Region_RM Net_init_RM GaussBoaga_EST_RM GaussBoaga_NORD_RM Gradi_Long_RM Primi_Long_RM Secondi_Long_RM Gradi_Lat_RM Primi_Lat_RM Secondi_Lat_RM Long_Cent_RM Lat_Cent_RM Height_RM When I merge the two data frames through df3 - merge(df1, df2, by=c(Sensor_RM, Station_RM)) I get a new data frame with columns Sensor_RM Station_RM Place_RM.x Y_init_RM M_init_RM D_init_RM Y_fin_RM M_fin_RM D_fin_RM Place_RM.y Province_RM Region_RM Net_init_RM GaussBoaga_EST_RM GaussBoaga_NORD_RM Gradi_Long_RM Primi_Long_RM Secondi_Long_RM Gradi_Lat_RM Primi_Lat_RM Secondi_Lat_RM Long_Cent_RM Lat_Cent_RM Height_RM I am sure that df1$Place_RM and df2$Place_RM are equal. I checked it from the shell using awk and diff. Why then I have a duplicate of Place_RM, i.e. Place_RM.x and Place_RM.y, and only of them? Thank you for your help Stefano From the Details section of ?merge: If the columns in the data frames not used in merging have any common names, these have suffixes (.x and .y by default) appended to try to make the names of the result unique. If this is not possible, an error is thrown. If you don't want both columns in the resultant data frame, use them in the 'by' argument or remove one of them prior to merge()ing. If you use them in the 'by' argument, be sure that they will be compared as exactly equal, which can be problematic if they are floating point values. 
If so, you would be better off subsetting one of the source data frames to remove the column first: df3 <- merge(df1, subset(df2, select = -Place_RM), by = c("Sensor_RM", "Station_RM")) Regards, Marc Schwartz
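Both of Marc's fixes can be sketched on small toy data frames (the values below are invented for illustration; only Sensor_RM, Station_RM and Place_RM mirror the poster's column names):

```r
## Toy data frames: Place_RM is shared but not part of the default 'by'
df1 <- data.frame(Sensor_RM = 1:3, Station_RM = c("A", "B", "C"),
                  Place_RM = c("P1", "P2", "P3"), Y_init_RM = c(2000, 2001, 2002))
df2 <- data.frame(Sensor_RM = 1:3, Station_RM = c("A", "B", "C"),
                  Place_RM = c("P1", "P2", "P3"), Height_RM = c(10, 20, 30))

## Default merge: Place_RM comes back twice, as Place_RM.x and Place_RM.y
names(merge(df1, df2, by = c("Sensor_RM", "Station_RM")))

## Fix 1: include it in 'by' (safe here, since the values are discrete strings)
names(merge(df1, df2, by = c("Sensor_RM", "Station_RM", "Place_RM")))

## Fix 2: drop the column from one source before merging
names(merge(df1, subset(df2, select = -Place_RM),
            by = c("Sensor_RM", "Station_RM")))
```

With either fix the result contains a single Place_RM column.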
Re: [R] Generalizing a regex for retrieving numbers with and without scientific notation
On Feb 19, 2014, at 12:26 PM, Morway, Eric emor...@usgs.gov wrote: I'm trying to extract all of the values from edm in the example below. However, the first attempt only retrieves the final number in the sequence since it is recorded using scientific notation. The second attempt retrieves all of the numbers, but omits the scientific notation component of the final number. How can I make the regular expression more general such that I get every value AND its corresponding E-value (i.e., ...E-06), where pertinent? I've spent time reading through ?regex, but my attempts to use the * character, where the preceding item will be matched zero or more times, have so far proven syntactically incorrect or generally unsuccessful. Appreciate the help, Eric edm <- c("", "param_value", "6.301343", "6.366305", "6.431268", "6.496230", "6.561192", "6.626155", "9.091117E-06") param_values <- strapply(edm, "\\d+\\.\\d+E[-+]?\\d+", as.numeric, simplify = cbind) param_values #[1,] 9.091117e-06 param_values <- strapply(edm, "\\d+\\.\\d+", as.numeric, simplify = cbind) param_values #[1,] 6.301343 6.366305 6.431268 6.49623 6.561192 6.626155 9.091117 If the individual elements of the vector are either numeric or non-numeric, why not just use: as.numeric(edm) [1] NA NA 6.301343e+00 6.366305e+00 6.431268e+00 [6] 6.496230e+00 6.561192e+00 6.626155e+00 9.091117e-06 Warning message: NAs introduced by coercion The non-numeric elements are returned as NA's, which you can remove by using ?na.omit. The only reason to use a regex would be if the individual elements themselves contained both numeric and non-numeric characters. If you then want to explicitly format numeric output (which would yield a character vector), you can use ?sprintf or ?format. Keep in mind the difference between how R *PRINTS* a numeric value and how R *STORES* a numeric value internally. 
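For completeness, the generalization Eric asked for is to make the exponent group optional with ?. A base R sketch (regmatches()/regexpr() stand in here for gsubfn's strapply(), which should accept the same pattern):

```r
edm <- c("", "param_value", "6.301343", "6.366305", "6.431268",
         "6.496230", "6.561192", "6.626155", "9.091117E-06")

## '([eE][-+]?\\d+)?' makes the scientific-notation tail optional,
## so the pattern matches both plain and E-notation numbers
pat <- "\\d+\\.\\d+([eE][-+]?\\d+)?"
as.numeric(regmatches(edm, regexpr(pat, edm)))
## Non-matching elements ("", "param_value") are simply dropped, so the
## seven plain values plus 9.091117e-06 are all returned
```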
Regards, Marc Schwartz
Re: [R] grep for multiple pattern?
On Feb 13, 2014, at 8:43 AM, Rainer M Krug rai...@krugs.de wrote: Hi I want to search for multiple patterns as grep does for a single pattern, but this obviously does not work: grep("an", month.name) [1] 1 grep("em", month.name) [1] 9 11 12 grep("eb", month.name) [1] 2 grep(c("an", "em", "eb"), month.name) [1] 1 Warning message: In grep(c("an", "em", "eb"), month.name) : argument 'pattern' has length > 1 and only the first element will be used Is there an equivalent which returns the positions as grep does, but without the strict full-string matching of match()? I could obviously do: unlist( sapply(pat, grep, month.name ) ) an em1 em2 em3 eb 1 9 11 12 2 but is there a more compact command I am missing? Thanks, Rainer The vertical bar '|' acts as a logical 'or' operator in regex expressions: grep("an|em|eb", month.name) [1] 1 2 9 11 12 grep("an|em|eb", month.name, value = TRUE) [1] "January" "February" "September" "November" "December" Regards, Marc Schwartz
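If the patterns already sit in a vector (as Rainer's pat does), the alternation can be built rather than typed, which gives the compact single call he was after:

```r
pat <- c("an", "em", "eb")
## Collapse the vector into the single pattern "an|em|eb"
grep(paste(pat, collapse = "|"), month.name)
## [1]  1  2  9 11 12
```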
Re: [R] Sensitivity analysis - minimum effect size detectable by a binomial test
On Feb 5, 2014, at 10:05 AM, Simone misen...@hotmail.com wrote: Hi all, I have performed a binomial test to verify if the number of males in a study is significantly different from a null hypothesis (say, H0: p of being a male = 0.5). For instance: binom.test(10, 30, p = 0.5, alternative = "two.sided", conf.level = 0.95) Exact binomial test data: 10 and 30 number of successes = 10, number of trials = 30, p-value = 0.09874 alternative hypothesis: true probability of success is not equal to 0.5 95 percent confidence interval: 0.1728742 0.5281200 sample estimates: probability of success 0.3333333 This way I get the estimated proportion of males (in this case p of success), which is equal to 0.33, and an associated p-value (not significant at alpha = 0.05 with respect to H0: P = 0.5). Now, I want to know, given a power of, say, 0.8, alpha = 0.05 and the above sample size (30), what is the minimum proportion of males, as low or as high (two sided), likely to be significantly detected with respect to a H0 (not necessarily H0: P = 0.5 - I am interested also in other null hypotheses). In other words, I would have been able to detect a significant deviation from the H0 for a given power, alpha and sample size if the proportion of males had been more than Xhigh or less than Xlow. I have had a look at the pwr package but it seems to me it doesn't allow one to calculate this. I would appreciate very much any suggestion. Take a look at ?power.prop.test, where you can specify that one of the proportions is NULL, yielding the value you seek: power.prop.test(n = 30, p1 = 0.5, p2 = NULL, power = 0.8, sig.level = 0.05) Two-sample comparison of proportions power calculation n = 30 p1 = 0.5 p2 = 0.834231 sig.level = 0.05 power = 0.8 alternative = two.sided NOTE: n is number in *each* group The value for 'p2' is your high value for the detectable difference from a proportion of 0.5, given the other parameters. 1 - p2 would be your low value. 
Regards, Marc Schwartz
Re: [R] creating an equivalent of r-help on r.stackexchange.com ?
On Feb 3, 2014, at 8:54 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: On Mon, Feb 3, 2014 at 8:41 PM, Marc Schwartz marc_schwa...@me.com wrote: Hi All, As I have noted in a prior reply in this thread, which began last November, I don't post in SO, but I do keep track of the traffic there via RSS feeds. However, the RSS feeds are primarily for new posts and do not seem to update with follow-ups to the initial post. I do wish that they would provide an e-mail interface, which would help to address some of the issues raised here today. They do provide notifications on comments to posts, as do many other online fora. However, there is no routine mailing of new posts with a given tag (eg. 'R'), at least as far as I can see, as I had searched there previously for that functionality. That would be a nice push based approach, as opposed to having to go to the web site. You can set up email subscriptions for specific tags. See the preferences section of your account. I get regular emails of the r_filter. Here are the first few lines of an email I just received (I have pasted it into this plain text email, but they are received as HTML and there are links to the specific questions). snip Thanks for the pointer Gabor. I did not have an account on SE/SO and had only searched the various help resources there attempting to find out what kind of e-mail push functionality was available. A number of posts had suggested a non real time e-mail ability, which indeed seems to be the case. I went ahead and created an account to get a sense of what was available. As you note, you can sign up for e-mail subscriptions based upon various tag criteria. However, it would seem that you need to specify time intervals for the frequency of the e-mails. These can be daily, every 3 hours or every 15 minutes. So there seems to be a polling/digest based process going on. I created an e-mail subscription last evening and selected every 15 minutes. 
What appears to be happening is that the frequency of the e-mails actually varies. Overnight and this morning, I have e-mails coming in every 20 to 30 minutes or more apart. It is not entirely clear what the trigger is, given the inconsistency in frequency. Perhaps the infrastructure is not robust enough to support a more consistent polling/digest e-mail capability yet. The e-mails contain snippets of new questions only and not responses (paralleling the RSS feed content). I need to actually go to the web site to see the full content of the question and to see if the question has been answered. In most cases, by the time that I get to the site, even right away after getting the e-mail, there are numerous replies already present. There is, of course, no way to respond via e-mail. I would say that if one is looking for an efficient e-mail based interface to SE/SO, it does not exist at present. It is really designed as a web site only interaction, where you are likely going to need to have a browser continuously open to the respective site or sites in order to be able to interact effectively, if it is your intent to monitor and to respond in a timely fashion to queries. Alternatively, perhaps a real-time or near real-time updating RSS feed reader might make more sense for the timeliness of knowing about new questions. It is not clear to me how those who respond quickly (eg. within minutes) are interacting otherwise. There appear to be some browser extensions to support notifications (eg. for Chrome), but again, you need to have your browser open. There also appear to be some desktop apps in alpha/beta stages that might be helpful. However, they seem to track new comments to questions that are specifically being followed (eg. questions that you have posted), rather than all new questions, thus paralleling the SE/SO Inbox content. That being said, obviously, a lot of people are moving in that direction given the traffic decline here and the commensurate increase there. 
Regards, Marc
Re: [R] creating an equivalent of r-help on r.stackexchange.com ?
Hi All, As I have noted in a prior reply in this thread, which began last November, I don't post in SO, but I do keep track of the traffic there via RSS feeds. However, the RSS feeds are primarily for new posts and do not seem to update with follow ups to the initial post. I do wish that they would provide an e-mail interface, which would help to address some of the issues raised here today. They do provide notifications on comments to posts, as do many other online fora. However, there is no routine mailing of new posts with a given tag (eg. 'R'), at least as far as I can see, as I had searched there previously for that functionality. That would be a nice push based approach, as opposed to having to go to the web site. I appreciate Don's comments regarding too many web site logins and too many passwords. Slight digression. The reality of constant security breaches of web sites has led me to use 1Password, such that I have a unique, randomly generated, strong password for almost every site that I login to (where I can control the password and login). I don't have to remember user IDs and passwords. With the multiple browser plug-ins for the application on the desktop and mobile app support with cross platform syncing, this has become, operationally, a non-issue for me. I think that Barry makes a good distinction here. Notwithstanding the gamification of posting on SO, the formalisms on SO are pretty well ingrained. I do also think that the marketplace (aka R users) in many respects, is speaking with its fingers, in that traffic on R-Help continues to decline. I am attaching an updated PDF of the list traffic from 1997-2013, which at the time that I posted it last year, was not yet complete for 2013, albeit, my projection for the year was fairly close. You can see that since the peak in 2010 of 41,048 posts for the year, traffic in 2013 declined to 20,538, or roughly a 50% decline. 
Much of that decline was from 2012 to 2013, which, I postulate, is a direct outcome of the snowballing use of SO primarily. Not in the plot for this year, January of 2014 had 1,129 posts, as compared to January of 2013 with 2,182 posts, or roughly a 50% decline. So the trend continues this year. If January's relative decline holds for the remainder of the year, or worse, perhaps accelerates, we could end the year at a level of activity (~10k posts) on R-Help not seen since circa 2002. I honestly don't know the answer to the question and don't know that SO is the singular solution, as Barry has noted. However, as a long time member of the community, I do feel that discussion of the future of these lists is warranted. Perhaps Duncan's prophecy of R-Help just passively fading away will indeed happen. If the current rate of decline in posts here continues, it will become a self-fulfilling prophecy, or at minimum, R-Help will be supporting a declining minority of R users. Is it then worth the time, energy and costs to maintain and host, or are those resources better directed elsewhere to yield greater value to the community? Should this simply continue to be a passive process as the marketplace moves elsewhere, or should there be a proactive discussion and plan put in place to modify infrastructure and behavior to retain traffic here? I suspect that this year may very well be important temporally to the implications for whatever decisions are made. Regards, Marc Schwartz R-Help-Annual.pdf Description: Adobe PDF document On Feb 3, 2014, at 6:34 PM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote: As one of the original ranters of "hey, let's move to StackOverflow" a few years back (see my UseR! lightning talk from Warwick) I should probably stick my oar in. I don't think the SO model is a good model for all the discussions that go on on R-help. I think SO is a good model for questions that have fairly precise answers that are demonstrably 'correct'. 
I think a mailing list is a bad model for questions that have answers. Reasons? Well, I see an email thread, start reading it, eight messages in, somewhere in a mix of top-posted and bottom-posted content, I discover the original poster has said "Yes thanks Rolf that works!". Maybe I've learnt something in that process, but maybe I had the answer too and I've just wasted my time reading that thread. With StackOverflow questioners accept an answer and you needn't waste time reading it. I've given up reading R-help messages with interesting question titles if there's more than two contributors and six messages, since it's either wandered off-topic or been answered. I suspect that heuristic is less efficient than SO's 'answer accepted' flag. SO questions are tagged. I can look at only the ggplot-tagged questions, or the 'spatial'-tagged questions, or ignore anything with 'finance' in it. Mailing lists are a bit coarse-grained and rigid for that, and subject lines are often uninformative of the content.
Re: [R] Handling large SAS file in R
Dennis, The key difference is that with R, you are, as always, dependent upon volunteers providing software at no charge to you, most of whom have full time (and then some) jobs. Those jobs (and in many cases, family) will be their priority, as I am sure is the case with Matt. Unless they are in a position where their employer specifically allows them to allocate a percentage of their work time to voluntary projects, like R, you are at the inevitable mercy of that volunteer's time and priorities. In the case of Stat/Transfer, they are a profit motivated business with revenue tied directly to the sales of the application. Thus, they have a very different perspective on serving their paying customers and can allocate dedicated resources to the functionality in their application. An alternative here would be for one of the for-profit companies that sell and support R versions to take on the task of providing some of these facilities and providing them back to the community as a service. But, that is up to them to consider in their overall business plan and the value that they perceive it brings to their products. Regards, Marc Schwartz On Jan 28, 2014, at 9:59 AM, Dennis Fisher fis...@plessthan.com wrote: Colleagues Frank Harrell wrote that "you need to purchase Stat/Transfer", which I did many years ago and continue to use. But I don't understand why the sas7bdat package (or something equivalent) cannot reverse engineer the SAS procedures so that R users can read sas7bdat files as well as StatTransfer. I have been in contact with the maintainer, Matt Shotwell, regarding bugs in the present version (0.4) and he wrote: "it tends to languish just one or two items from the top of my TODO... I hope to get back to it soon." I have also written to this bulletin board about the foreign package not being able to process certain SAS XPT files (which StatTransfer handled without any problem). 
I am a strong advocate of R and I have arranged work-arounds (using StatTransfer) in these cases. However, R users would benefit from the ability of R to read any SAS file without intermediate software. I would offer to participate in any efforts to accomplish this but I think that it is beyond my capabilities. Dennis Message: 23 Date: Mon, 27 Jan 2014 13:25:54 -0800 (PST) From: Frank Harrell f.harr...@vanderbilt.edu To: r-help@r-project.org Subject: Re: [R] Handling large SAS file in R Message-ID: 1390857954542-4684250.p...@n4.nabble.com Content-Type: text/plain; charset=us-ascii For that you need to purchase Stat/Transfer. Frank hans012 wrote Hey Guys I have a .sas7bdat file of 1.79gb that I want to read. I am using the sas7bdat package to read the file and after I typed the command read.sas7bdat('filename.sas7bdat') it has been 3 hours with no result so far. Is there a way that I can see the progress of the read? Or is there another way to read the file with less computing time? I do not have access to SAS; the file was sent to me. Let me know what you guys think KR Hans Dennis Fisher MD P (The P Less Than Company) Phone: 1-866-PLessThan (1-866-753-7784) Fax: 1-866-PLessThan (1-866-753-7784) www.PLessThan.com
Re: [R] How to read this data correctly
Hi, I don't know that it is a problem in R reading the file per se. It is more of an issue, as far as I can see, that read.xls() is not written to deal with some aspects of cell formatting of certain types. In this case, the cell is formatted using a financial format with Japanese Yen. I did not take the time to look through the Perl script included. The intermediate CSV file that is created by the Perl script that opens and reads the Excel file contains: -0.419547704894512 -[$¥-411]0.42 I captured this while running read.xls() under debug(), since the CSV is created as a temp file that is deleted upon function exit. It would seem that the financial cell data is not simply read as a numeric value. The CSV file will then be directly converted to a data frame in R as is, using read.csv(), to result in: read.xls("Book1.xlsx", 1, header = FALSE) V1 1 -0.419547704894512 2 -[$¥-411]0.42 You may need to use alternative Excel file importing functions, such as XLConnect or similar, that provide more robust functionality. Of course, R itself does not have financial data types, thus there may yet need to be some form of post import data clean up, even with the other options, depending upon how they function. Regards, Marc Schwartz On Jan 24, 2014, at 2:49 PM, Christofer Bogaso bogaso.christo...@gmail.com wrote: Hi Rui, Thanks for your reply. However, why did you say 'shouldn't read properly in R'? Basically I was looking for some way to get the value -0.419547704894512 in R for cells F4 and F7, because F7 is linked to F4. Of course I can open the Excel file and then format that cell accordingly; however, I am looking for some way in R to avoid any manual process. Thanks and regards, On Sat, Jan 25, 2014 at 1:21 AM, Rui Barradas ruipbarra...@sapo.pt wrote: Hello, Cell F7 has a formula, =F4, and when I open the file in Excel, I get -¥0.42, which shouldn't read properly in R. The problem seems to be in the file, not in read.xls. 
Hope this helps, Rui Barradas On 24-01-2014 19:22, Christofer Bogaso wrote: Hi again, I need to read the xlsx file below correctly (available here: http://snk.to/f-ch3exae5), and used the following code (say, the file is saved in the F: drive) library(gdata) read.xls("f:/Book1.xlsx", 1, header = F) V1 1 -0.419547704894512 2 -[$¥-411]0.42 However please notice that, in my original Excel file, the cells F4 and F7 have essentially the same values. Therefore I should get -0.419547704894512 in either case above. Any idea on how to achieve that, without opening the xlsx file manually and then formatting the cell before reading it in R? Thanks for your help
Re: [R] summary() and the mode
On Jan 23, 2014, at 2:27 PM, Ruhil, Anirudh ru...@ohio.edu wrote: A student asked: Why does R's summary() command yield the Mean and the Median, quartiles, min, and max but was written to exclude the Mode? I said I had no clue, googled the question without much luck, and am now posting it to see if anybody knows why. Ani It has been discussed various times over the years. Presuming that there is interest in knowing it, the problem is how to estimate the mode, depending upon the nature of the data. That is, if the data are discrete (eg. a factor), a simple tabulation using table() can yield the one, or perhaps more than one, most frequently occurring value. In this case: set.seed(1) x <- sample(letters, 500, replace = TRUE) tab <- table(x) # Get the first maximum value tab[which.max(tab)] If the data are continuous, then strictly speaking the mode is not well defined and you need to utilize something along the lines of a density estimation. In that case: set.seed(1) x <- rnorm(500) # Get the density estimates dx <- density(x) # Which value is at the peak dx$x[which.max(dx$y)] Visual inspection is also helpful in this case: plot(dx) abline(v = dx$x[which.max(dx$y)]) See ?table, ?density and ?which.max Regards, Marc Schwartz
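Marc's two cases can be wrapped into a small helper. This Mode() function is my own sketch, not a base R function, and the length(unique(x)) > 30 cut-off for treating numeric data as continuous is an arbitrary assumption:

```r
Mode <- function(x) {
  ## Heuristic (assumption): many distinct numeric values => continuous
  if (is.numeric(x) && length(unique(x)) > 30) {
    dx <- density(x)
    dx$x[which.max(dx$y)]           # peak of the density estimate
  } else {
    tab <- table(x)
    names(tab)[tab == max(tab)]     # all values tied for most frequent
  }
}

set.seed(1)
Mode(sample(letters, 500, replace = TRUE))  # discrete case
Mode(rnorm(500))                            # continuous case
```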
Re: [R] Doubt in simple merge
On Jan 16, 2014, at 11:14 PM, kingsly ecoking...@yahoo.co.in wrote: Thank you dear friends. You have cleared my first doubt. My second doubt: I have the same data sets Elder and Younger. Elder <- data.frame(ID = c("ID1", "ID2", "ID3"), age = c(38, 35, 31)) Younger <- data.frame(ID = c("ID4", "ID5", "ID3"), age = c(29, 21, "NA")) Row ID3 comes in both data sets. It has a value (31) in Elder while NA in Younger. I need output like this. ID age ID1 38 ID2 35 ID3 31 ID4 29 ID5 21 Kindly help me. First, there is a problem with the way in which you created Younger, where you have the NA as "NA", which is a character and coerces the entire column to a factor, rather than a numeric: str(Younger) 'data.frame': 3 obs. of 2 variables: $ ID : Factor w/ 3 levels "ID3","ID4","ID5": 2 3 1 $ age: Factor w/ 3 levels "21","29","NA": 2 1 3 It then causes problems in the default merge(): DF <- merge(Elder, Younger, by = c("ID", "age"), all = TRUE) str(DF) 'data.frame': 6 obs. of 2 variables: $ ID : Factor w/ 5 levels "ID1","ID2","ID3",..: 1 2 3 3 4 5 $ age: chr "38" "35" "31" "NA" ... Note that 'age' becomes a character vector, again rather than numeric. Thus: Younger <- data.frame(ID = c("ID4", "ID5", "ID3"), age = c(29, 21, NA)) Now, when you merge as before, you get: str(merge(Elder, Younger, by = c("ID", "age"), all = TRUE)) 'data.frame': 6 obs. of 2 variables: $ ID : Factor w/ 5 levels "ID1","ID2","ID3",..: 1 2 3 3 4 5 $ age: num 38 35 31 NA 29 21 merge(Elder, Younger, by = c("ID", "age"), all = TRUE) ID age 1 ID1 38 2 ID2 35 3 ID3 31 4 ID3 NA 5 ID4 29 6 ID5 21 Presuming that you want to consistently remove any NA values that may arise from either data frame: na.omit(merge(Elder, Younger, by = c("ID", "age"), all = TRUE)) ID age 1 ID1 38 2 ID2 35 3 ID3 31 5 ID4 29 6 ID5 21 See ?na.omit Regards, Marc Schwartz
Re: [R] barplot: segment-wise shading
On Jan 16, 2014, at 9:09 PM, Martin Weiser weis...@natur.cuni.cz wrote: Jim Lemon wrote on Fri, 17 Jan 2014 at 13:21 +1100: On 01/17/2014 10:59 AM, Marc Schwartz wrote: ... Arggh. No, this is my error for not actually looking at the plot and presuming that it would work. Turns out that it does work for a non-stacked barplot: barplot(VADeaths, angle = 1:20 * 10, density = 10, beside = TRUE) However, internally within barplot(), actually barplot.default(), the manner in which the matrix is passed to an internal function called xyrect() to draw the segments, is that entire columns are passed, rather than the individual segments (counts), when the bars are stacked. As a result, due to the vector based approach used, only the first 5 values of 'angle' are actually used, since there are 5 columns, rather than all 20. The same impact will be observed when using the default legend that is created. Thus, I don't believe that there will be an easy (non kludgy) way to do what you want, at least with the default barplot() function. You could fairly easily create/build your own function using ?rect, which is what barplot() uses to draw the segments. I am not sure if lattice based graphics can do this or perhaps using Hadley's ggplot based approach would offer a possibility. Apologies for the confusion. Regards, Marc Hi Marc and Martin, When I saw the original message I tried to look at the code for the barplot function to see if I could call the rectFill function from plotrix in it. Unfortunately barplot is one of those internal functions that are not at all easy to hack and I have never gotten around to adding stacked bars to the barp function. I thought that rectFill would allow you to use more easily discriminated fills than angles that only differ by 18 degrees. Jim Hi, after Marc pointed out to me where to look, I hacked barplot.default a bit, so now it does what I want (I added a segmentwise argument). 
Unfortunately, it works well with segmentwise = TRUE, but not with segmentwise = FALSE (the default). With segmentwise = FALSE, the density argument works in only 1/n-th of the segments, where n is the number of columns (it seems like it refuses to auto-multiplicate, but I do not know why). Any ideas? Martin Here is my hack of barplot: code snipped Martin, This would be a good time to learn how to use the ?debug function and related tools to step through your code to see where it is failing. Roger Peng also has some good notes here: http://www.biostat.jhsph.edu/~rpeng/docs/R-debug-tools.pdf Note that when 'segmentwise = TRUE' and there are no 'angle' or 'density' arguments provided, it also does not work correctly. You may want to set some defaults in that case, or issue an error message. I suspect that something in the indexing/expansion code that you added is not working as desired, but you will need to step through the code to see where. One thing that you might want to consider, if the situation that you have is rather specialized: create your own function as you have done, but if 'segmentwise = FALSE', then pass the arguments to barplot() so that the default function is used in that situation. Regards, Marc
Re: [R] Doubt in simple merge
Not quite: rbind(Elder, Younger) ID age 1 ID1 38 2 ID2 35 3 ID3 31 4 ID4 29 5 ID5 21 6 ID3 31 Note that ID3 is duplicated. Should be: merge(Elder, Younger, by = c("ID", "age"), all = TRUE) ID age 1 ID1 38 2 ID2 35 3 ID3 31 4 ID4 29 5 ID5 21 He wants to do a join on both ID and age to avoid duplication of rows when the same ID and age occur in both data frames. If the same column name (eg. Var) appears in both data frames and is not part of the 'by' argument, you end up with Var.x and Var.y in the result. In the case of two occurrences of the same ID but two different ages, if that is possible, both rows would be added to the result using the above code. Regards, Marc Schwartz On Jan 16, 2014, at 9:04 AM, Frede Aakmann Tøgersen fr...@vestas.com wrote: Ups, sorry, that should have been mer <- rbind(Elder, Younger) /frede Original message From: Frede Aakmann Tøgersen Date: 16/01/2014 15.54 (GMT+01:00) To: Adams, Jean, kingsly Cc: R help Subject: Re: [R] Doubt in simple merge No, I think the OP wants mer <- merge(Elder, Younger) Br. Frede Original message From: Adams, Jean Date: 16/01/2014 15.45 (GMT+01:00) To: kingsly Cc: R help Subject: Re: [R] Doubt in simple merge You are telling it to merge by ID only, but it sounds like you would like it to merge by both ID and age. merge(Elder, Younger, all = TRUE) Jean On Thu, Jan 16, 2014 at 6:25 AM, kingsly ecoking...@yahoo.co.in wrote: Dear R community I have two data sets called Elder and Younger. This is my code for a simple merge. Elder <- data.frame(ID = c("ID1", "ID2", "ID3"), age = c(38, 35, 31)) Younger <- data.frame(ID = c("ID4", "ID5", "ID3"), age = c(29, 21, 31)) mer <- merge(Elder, Younger, by = "ID", all = T) Output I am expecting: ID age ID1 38 ID2 35 ID3 31 ID4 29 ID5 21 It looks very simple. But I need help. When I run the code it gives me age.x and age.y. 
thank you
Re: [R] barplot: segment-wise shading
On Jan 16, 2014, at 12:45 PM, Martin Weiser weis...@natur.cuni.cz wrote: Dear listers, I would like to make a stacked barplot, and to be able to define shading (density or angle) segment-wise, i.e. NOT like here: # Bar shading example barplot(VADeaths, angle = 15+10*1:5, density = 20, col = "black", legend = rownames(VADeaths)) The example has 5 different angles of shading; I would like to have as many possible angle values as there are segments (i.e. 20 in the VADeaths example). I was not successful using web search. Any advice? Thank you for your patience. With the best regards, Martin Weiser You could do something like this: # Get the dimensions of VADeaths dim(VADeaths) [1] 5 4 # How many segments? prod(dim(VADeaths)) [1] 20 Then use that value in the barplot() arguments as you desire, for example: barplot(VADeaths, angle = 15 + 10 * 1:prod(dim(VADeaths)), density = 20, col = "black", legend = rownames(VADeaths)) or wrap the barplot() function in your own, which pre-calculates the values and then passes them to the barplot() call in the function. See ?dim and ?prod Be aware that a vector (eg. 1:5) will be 'dim-less', thus if you are going to use this approach for a vector based data object, you would want to use ?length Regards, Marc Schwartz
Re: [R] barplot: segment-wise shading
On Jan 16, 2014, at 5:03 PM, Martin Weiser weis...@natur.cuni.cz wrote: On Thu, 16 Jan 2014 at 16:46 -0600, Marc Schwartz wrote:

snip

Hello, thank you for your attempt, but this does not work (for me). This produces 5 angles of shading, not 20. Maybe because of my R version (R version 2.15.1 (2012-06-22); Platform: i486-pc-linux-gnu (32-bit))? Thank you. Regards, Martin Weiser

Arggh. No, this is my error for not actually looking at the plot and presuming that it would work.
Turns out that it does work for a non-stacked barplot:

barplot(VADeaths, angle = 1:20 * 10, density = 10, beside = TRUE)

However, internally within barplot() (actually barplot.default()), when the bars are stacked, entire columns of the matrix are passed to an internal function called xyrect() to draw the segments, rather than the individual segments (counts). As a result, due to the vector based approach used, only the first 5 values of 'angle' are actually used, since there are 5 columns, rather than all 20. The same impact will be observed when using the default legend that is created. Thus, I don't believe that there will be an easy (non-kludgy) way to do what you want, at least with the default barplot() function. You could fairly easily create/build your own function using ?rect, which is what barplot() uses to draw the segments. I am not sure if lattice based graphics can do this, or perhaps Hadley's ggplot based approach would offer a possibility. Apologies for the confusion. Regards, Marc
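The roll-your-own route with rect() that Marc mentions can be sketched as follows. This is only an illustrative sketch, not a drop-in barplot() replacement: the bar positions and widths are hard-coded assumptions.

```r
# Hand-rolled stacked barplot with a distinct shading angle per
# segment, using rect() just as barplot() does internally.
m <- VADeaths                       # built-in 5 x 4 matrix
nseg <- prod(dim(m))                # 20 segments in total
angles <- 15 + 10 * seq_len(nseg)   # one angle per segment

plot(NULL, xlim = c(0, ncol(m)), ylim = c(0, max(colSums(m))),
     xlab = "", ylab = "Deaths per 1000", xaxt = "n")
axis(1, at = seq_len(ncol(m)) - 0.5, labels = colnames(m), tick = FALSE)

k <- 0
for (j in seq_len(ncol(m))) {       # one column of m per bar
  tops <- cumsum(m[, j])
  bottoms <- c(0, tops[-nrow(m)])
  for (i in seq_len(nrow(m))) {     # one rect() per stacked segment
    k <- k + 1
    rect(j - 0.9, bottoms[i], j - 0.1, tops[i],
         angle = angles[k], density = 20)
  }
}
```

Each of the 20 segments gets its own entry from 'angles', which is exactly what the stacked barplot() path does not allow.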
Re: [R] Different output for lm Mac vs PC
Devin, You should find out when and how that option was altered from the default, lest you find that virtually any modeling that you do on the Mac will be affected by that change, fundamentally altering the interpretation of the model results. Regards, Marc

On Jan 15, 2014, at 7:17 AM, CASENHISER, DEVIN M de...@uthsc.edu wrote: Yes that's it! My mac has:

options('contrasts')
$contrasts
[1] "contr.sum"  "contr.poly"

whereas the PC has

$contrasts
        unordered           ordered
"contr.treatment"      "contr.poly"

I've changed the mac with options(contrasts=c('contr.treatment','contr.poly')) and that has solved the issue. Thanks Greg and Marc! Cheers! Devin

On 1/14/14 5:35 PM, Marc Schwartz marc_schwa...@me.com wrote: Good catch Greg. The Mac output observed can result from either:

options(contrasts = c("contr.helmert", "contr.poly"))

or

options(contrasts = c("contr.sum", "contr.poly"))

being run first, before calling the model code. I checked the referenced tutorial and did not see any steps pertaining to altering the default contrasts. So either code along the lines of the above was manually entered on the Mac at some point, or perhaps there is a change to the defaults on Devin's Mac system? The latter perhaps in ~/.Rprofile to mimic S-PLUS' behavior, in the case of Helmert contrasts? Devin, note that the model output lines for both the intercept and sex, beyond the way in which 'sex' is displayed (sex1 versus sexmale), are rather different and are consistent with the use of non-default contrasts on the Mac, as Greg noted. Regards, Marc

On Jan 14, 2014, at 3:55 PM, Greg Snow 538...@gmail.com wrote: I would suggest running the code: options('contrasts') on both machines to see if there is a difference. Having the default contrasts set differently would be one explanation.
On Tue, Jan 14, 2014 at 2:28 PM, Marc Schwartz marc_schwa...@me.com wrote: On Jan 14, 2014, at 2:23 PM, CASENHISER, DEVIN M de...@uthsc.edu wrote: I've noticed that I get different output when running a linear model on my Mac versus on my PC. Same effect, but the Mac assumes the predictor as a 0 level whereas the PC uses the first category (alphabetically). So for example (using Bodo Winter's example from his online linear models tutorial):

pitch = c(233, 204, 242, 130, 112, 142)
sex = c(rep("female", 3), rep("male", 3))
summary(lm(pitch ~ sex))

My Mac, running R 3.0.2, outputs:

Residuals:
      1       2       3       4       5       6
  6.667 -22.333  15.667   2.000 -16.000  14.000

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  177.167      7.201  24.601 1.62e-05 ***
sex1          49.167      7.201   6.827  0.00241 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.64 on 4 degrees of freedom
Multiple R-squared: 0.921, Adjusted R-squared: 0.9012
F-statistic: 46.61 on 1 and 4 DF, p-value: 0.002407

But my PC, running R 3.0.2, outputs:

Residuals:
      1       2       3       4       5       6
  6.667 -22.333  15.667   2.000 -16.000  14.000

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   226.33      10.18  22.224 2.43e-05 ***
sexmale       -98.33      14.40  -6.827  0.00241 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.64 on 4 degrees of freedom
Multiple R-squared: 0.921, Adjusted R-squared: 0.9012
F-statistic: 46.61 on 1 and 4 DF, p-value: 0.002407

I understand that these are the same (correct) answer, but it does make it a little more challenging to follow examples (when learning or teaching), given that the coefficient outputs are calculated differently. I don't suppose that there is a way to easily change either output so that they correspond (some setting I've overlooked perhaps)? Thanks and Cheers! Devin

On my Mac with R 3.0.2, I get the same output as you get on your Windows machine.
Something on your Mac is amiss, resulting in the recoding of 'sex' into a factor with presumably 0/1 levels rather than the default textual factor levels. If you try something like: model.frame(pitch ~ sex) the output should give you an indication of the actual data that is being used for your model in each case. Either you have other code on your Mac that you did not include above, which is modifying the contents of 'sex', or you have some other behavior going on in the default workspace. I would check for other objects in your current workspace on the Mac, using ls() for example, that might be conflicting. If you are running some type of GUI on your Mac (eg. the default R.app or perhaps RStudio), try running R from a terminal session, using 'R --vanilla' from the command line, to be sure that you are not loading a default workspace containing objects that are resulting in the altered behavior.
Re: [R] Subsetting on multiple criteria (AND condition) in R
On Jan 14, 2014, at 1:38 PM, Jeff Johnson mrjeffto...@gmail.com wrote: I'm running the following to get what I would expect is a subset of countries that are not equal to US AND COUNTRY is not in one of my validcountries values.

non_us <- subset(mydf, (COUNTRY %in% validcountries) & COUNTRY != "US",
                 select = COUNTRY, na.rm = TRUE)

however, when I then do table(non_us) I get:

table(non_us)
non_us
AE AN AR AT AU BB BD BE BH BM BN BO BR BS CA CH CM CN CO CR CY DE DK DO EC ES 0 3 0 2 1 31 4 1 1 1 45 1 1 4 5 86 3 1 8 1 2 1 8 2 1 2 4
FI FR GB GR GU HK ID IE IL IN IO IT JM JP KH KR KY LU LV MO MX MY NG NL NO NZ PA 2 4 35 3 3 14 3 5 2 5 1 2 1 15 1 11 2 2 1 1 23 7 1 6 1 3 1
PE PG PH PR PT RO RU SA SE SG TC TH TT TW TZ US ZA 2 1 1 8 1 1 1 1 1 18 1 1 2 11 1 0 3

Notice US appears as the second to last. I expected it to NOT appear. Do you know if I'm using incorrect syntax? Is the & symbol equivalent to AND (notice I have 2 criteria for subsetting)? Also, is COUNTRY != "US" valid syntax? I don't get errors, but then again I don't get what I expect back. Thanks in advance! -- Jeff

Review the Details section of ?subset, where you will find the following: "Factors may have empty levels after subsetting; unused levels are not automatically removed. See droplevels for a way to drop all unused levels from a data frame." Your syntax is fine and the behavior is as expected. Regards, Marc Schwartz
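A minimal sketch of the behavior and the droplevels() fix, using a small invented data frame rather than the poster's data:

```r
# Subsetting keeps unused factor levels, so table() reports them as 0
df <- data.frame(COUNTRY = factor(c("US", "DE", "FR", "US")))
non_us <- subset(df, COUNTRY != "US")

"US" %in% levels(non_us$COUNTRY)              # TRUE: level survives the subset
table(droplevels(non_us)$COUNTRY)             # DE and FR only
"US" %in% levels(droplevels(non_us)$COUNTRY)  # FALSE: level dropped
```

The US rows really are gone after subset(); it is only the factor *level* (and hence the zero-count cell in table()) that lingers until droplevels() is applied.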
Re: [R] Different output for lm Mac vs PC
On Jan 14, 2014, at 2:23 PM, CASENHISER, DEVIN M de...@uthsc.edu wrote: I've noticed that I get different output when running a linear model on my Mac versus on my PC. Same effect, but the Mac assumes the predictor as a 0 level whereas the PC uses the first category (alphabetically). So for example (using Bodo Winter's example from his online linear models tutorial):

pitch = c(233, 204, 242, 130, 112, 142)
sex = c(rep("female", 3), rep("male", 3))
summary(lm(pitch ~ sex))

My Mac, running R 3.0.2, outputs:

Residuals:
      1       2       3       4       5       6
  6.667 -22.333  15.667   2.000 -16.000  14.000

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  177.167      7.201  24.601 1.62e-05 ***
sex1          49.167      7.201   6.827  0.00241 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.64 on 4 degrees of freedom
Multiple R-squared: 0.921, Adjusted R-squared: 0.9012
F-statistic: 46.61 on 1 and 4 DF, p-value: 0.002407

But my PC, running R 3.0.2, outputs:

Residuals:
      1       2       3       4       5       6
  6.667 -22.333  15.667   2.000 -16.000  14.000

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   226.33      10.18  22.224 2.43e-05 ***
sexmale       -98.33      14.40  -6.827  0.00241 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.64 on 4 degrees of freedom
Multiple R-squared: 0.921, Adjusted R-squared: 0.9012
F-statistic: 46.61 on 1 and 4 DF, p-value: 0.002407

I understand that these are the same (correct) answer, but it does make it a little more challenging to follow examples (when learning or teaching), given that the coefficient outputs are calculated differently. I don't suppose that there is a way to easily change either output so that they correspond (some setting I've overlooked perhaps)? Thanks and Cheers! Devin

On my Mac with R 3.0.2, I get the same output as you get on your Windows machine. Something on your Mac is amiss, resulting in the recoding of 'sex' into a factor with presumably 0/1 levels rather than the default textual factor levels.
If you try something like: model.frame(pitch ~ sex) the output should give you an indication of the actual data that is being used for your model in each case. Either you have other code on your Mac that you did not include above, which is modifying the contents of 'sex', or you have some other behavior going on in the default workspace. I would check for other objects in your current workspace on the Mac, using ls() for example, that might be conflicting. If you are running some type of GUI on your Mac (eg. the default R.app or perhaps RStudio), try running R from a terminal session, using 'R --vanilla' from the command line, to be sure that you are not loading a default workspace containing objects that are resulting in the altered behavior. Then re-try the example code. If that resolves the issue, you may want to delete, or at least rename/move the .RData file contained in your default working directory. Regards, Marc Schwartz
Re: [R] Different output for lm Mac vs PC
Good catch Greg. The Mac output observed can result from either:

options(contrasts = c("contr.helmert", "contr.poly"))

or

options(contrasts = c("contr.sum", "contr.poly"))

being run first, before calling the model code. I checked the referenced tutorial and did not see any steps pertaining to altering the default contrasts. So either code along the lines of the above was manually entered on the Mac at some point, or perhaps there is a change to the defaults on Devin's Mac system? The latter perhaps in ~/.Rprofile to mimic S-PLUS' behavior, in the case of Helmert contrasts? Devin, note that the model output lines for both the intercept and sex, beyond the way in which 'sex' is displayed (sex1 versus sexmale), are rather different and are consistent with the use of non-default contrasts on the Mac, as Greg noted. Regards, Marc

On Jan 14, 2014, at 3:55 PM, Greg Snow 538...@gmail.com wrote: I would suggest running the code: options('contrasts') on both machines to see if there is a difference. Having the default contrasts set differently would be one explanation.

On Tue, Jan 14, 2014 at 2:28 PM, Marc Schwartz marc_schwa...@me.com wrote: On Jan 14, 2014, at 2:23 PM, CASENHISER, DEVIN M de...@uthsc.edu wrote: I've noticed that I get different output when running a linear model on my Mac versus on my PC. Same effect, but the Mac assumes the predictor as a 0 level whereas the PC uses the first category (alphabetically). So for example (using Bodo Winter's example from his online linear models tutorial):

pitch = c(233, 204, 242, 130, 112, 142)
sex = c(rep("female", 3), rep("male", 3))
summary(lm(pitch ~ sex))

My Mac, running R 3.0.2, outputs:

Residuals:
      1       2       3       4       5       6
  6.667 -22.333  15.667   2.000 -16.000  14.000

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  177.167      7.201  24.601 1.62e-05 ***
sex1          49.167      7.201   6.827  0.00241 **
---
Signif.
codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 17.64 on 4 degrees of freedom Multiple R-squared: 0.921, Adjusted R-squared: 0.9012 F-statistic: 46.61 on 1 and 4 DF, p-value: 0.002407 But my PC, running R 3.0.2, outputs: Residuals: 1 2 3 4 5 6 6.667 -22.333 15.667 2.000 -16.000 14.000 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 226.33 10.18 22.224 2.43e-05 *** sexmale -98.33 14.40 -6.827 0.00241 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 17.64 on 4 degrees of freedom Multiple R-squared: 0.921, Adjusted R-squared: 0.9012 F-statistic: 46.61 on 1 and 4 DF, p-value: 0.002407 I understand that these are the same (correct) answer, but it does make it a little more challenging to follow examples (when learning or teaching), given that the coefficient outputs are calculated differently. I don't suppose that there is a way to easily change either output so that they correspond (some setting I've overlooked perhaps)? Thanks and Cheers! Devin On my Mac with R 3.0.2, I get the same output as you get on your Windows machine. Something on your Mac is amiss, resulting in the recoding of 'sex' into a factor with presumably 0/1 levels rather than the default textual factor levels. If you try something like: model.frame(pitch ~ sex) the output should give you an indication of the actual data that is being used for your model in each case. Either you have other code on your Mac that you did not include above, which is modifying the contents of 'sex', or you have some other behavior going on in the default workspace. I would check for other objects in your current workspace on the Mac, using ls() for example, that might be conflicting. If you are running some type of GUI on your Mac (eg.
the default R.app or perhaps RStudio), try running R from a terminal session, using 'R --vanilla' from the command line, to be sure that you are not loading a default workspace containing objects that are resulting in the altered behavior. Then re-try the example code. If that resolves the issue, you may want to delete, or at least rename/move the .RData file contained in your default working directory. Regards, Marc Schwartz
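The whole thread can be reproduced in a few lines. This sketch switches the global contrasts option both ways and shows the two coefficient namings; the numbers match the Mac and PC outputs quoted above:

```r
# Reproducing the thread: the contrasts option changes lm() coefficient
# naming and values (data from Bodo Winter's tutorial example)
pitch <- c(233, 204, 242, 130, 112, 142)
sex <- factor(c(rep("female", 3), rep("male", 3)))

op <- options(contrasts = c("contr.treatment", "contr.poly"))  # the default
coef_trt <- coef(lm(pitch ~ sex))  # (Intercept) = 226.33, sexmale = -98.33

options(contrasts = c("contr.sum", "contr.poly"))  # as on Devin's Mac
coef_sum <- coef(lm(pitch ~ sex))  # (Intercept) = 177.17, sex1 = 49.17

options(op)  # restore whatever was set before

names(coef_trt)  # "(Intercept)" "sexmale"
names(coef_sum)  # "(Intercept)" "sex1"
```

Saving the old options into 'op' and restoring them afterwards keeps the experiment from silently changing every subsequent model fit in the session, which is precisely the trap Marc warns Devin about.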
Re: [R] replace to NA.
On Jan 6, 2014, at 5:57 AM, vikram ranga babuaw...@gmail.com wrote: Dear All, I am a bit stuck on a problem of replacing values with NA. I have a big data set, but here is a toy example:

test <- data.frame(test1 = c("", "Hi", "Hello"),
                   test2 = c("Hi", "", "Bye"),
                   test3 = c("Hello", "", ""))

If the data are as above, I can change all "" to NA with this code:

for (i in 1:3) {
  for (j in 1:3) {
    if (test[j, i] == "") {
      test[j, i] <- NA
    }
  }
}

but the problem arises if the data frame has NA at some places:

test <- data.frame(test1 = c("", "Hi", "Hello"),
                   test2 = c("Hi", NA, "Bye"),
                   test3 = c("Hello", "", ""))

The above loop does not work on this data frame, as NA has logical class and the comparison does not return TRUE/FALSE. Can anyone provide some help?

snip

See ?is.na, which is used to test for NA values and is the canonical way to replace values with NA:

test
  test1 test2 test3
1          Hi Hello
2    Hi
3 Hello   Bye

# Where test == "", replace with NA
is.na(test) <- test == ""

test
  test1 test2 test3
1  <NA>    Hi Hello
2    Hi  <NA>  <NA>
3 Hello   Bye  <NA>

Regards, Marc Schwartz
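A runnable version of the exchange, using the poster's second data frame (the one that already contains an NA). The guard on the comparison is my addition, not part of the original reply; it keeps the pre-existing NA cell out of the replacement index:

```r
test <- data.frame(test1 = c("", "Hi", "Hello"),
                   test2 = c("Hi", NA, "Bye"),
                   test3 = c("Hello", "", ""),
                   stringsAsFactors = FALSE)

# Blank out empty strings; cells that are already NA are left untouched
# (the !is.na(test) guard is an assumption added for safety)
is.na(test) <- !is.na(test) & test == ""

sum(is.na(test))  # 4: three "" cells plus the pre-existing NA
```

No loops are needed; the logical matrix from the comparison indexes the whole data frame at once.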
Re: [R] why there is no quarters?
On Dec 15, 2013, at 6:11 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 13-12-15 6:43 AM, 水静流深 wrote:

seq(as.Date("2001/1/1"), as.Date("2010/1/1"), "years")
seq(as.Date("2001/1/1"), as.Date("2010/1/1"), "weeks")
seq(as.Date("2001/1/1"), as.Date("2010/1/1"), "days")

why is there no seq(as.Date("2001/1/1"), as.Date("2010/1/1"), "quarters")?

There's no need for it. Just use months, and take every 3rd one:

x <- seq(as.Date("2001/1/1"), as.Date("2010/1/1"), "months")
x[seq_along(x) %% 3 == 1]

Alternatively, ?cut.Date has "quarter" for the 'breaks' argument:

x <- seq(as.Date("2001/1/1"), as.Date("2010/1/1"), "months")
xq <- cut(x, breaks = "quarter")
head(xq, 10)
 [1] 2001-01-01 2001-01-01 2001-01-01 2001-04-01 2001-04-01 2001-04-01
 [7] 2001-07-01 2001-07-01 2001-07-01 2001-10-01
37 Levels: 2001-01-01 2001-04-01 2001-07-01 2001-10-01 ... 2010-01-01

If you want to change the values to use 2001-Q2 or variants, you can do something like:

S <- c("01-01", "04-01", "07-01", "10-01")
xqq <- paste(substr(xq, 1, 5), "Q", match(substr(xq, 6, 10), S), sep = "")
head(xqq, 10)
 [1] "2001-Q1" "2001-Q1" "2001-Q1" "2001-Q2" "2001-Q2" "2001-Q2"
 [7] "2001-Q3" "2001-Q3" "2001-Q3" "2001-Q4"

See ?match, ?substr and ?paste. Regards, Marc Schwartz
Re: [R] why there is no quarters?
That will only work if your starting date happens to be the first day of the year:

x <- seq(as.Date("2001/1/1"), as.Date("2010/1/1"), "3 months")
head(x)
[1] "2001-01-01" "2001-04-01" "2001-07-01" "2001-10-01" "2002-01-01"
[6] "2002-04-01"

Compare that to:

x2 <- seq(as.Date("2001/2/3"), as.Date("2010/1/1"), "3 months")
head(x2, 10)
 [1] "2001-02-03" "2001-05-03" "2001-08-03" "2001-11-03" "2002-02-03"
 [6] "2002-05-03" "2002-08-03" "2002-11-03" "2003-02-03" "2003-05-03"

The "3 months" is literally 3 months from the defined start date, not 3 months from the first of the year. So you are not going to get calendar quarter starting dates in that case. On the other hand:

cut(x2, breaks = "quarter")
 [1] 2001-01-01 2001-04-01 2001-07-01 2001-10-01 2002-01-01 2002-04-01
 [7] 2002-07-01 2002-10-01 2003-01-01 2003-04-01 2003-07-01 2003-10-01
[13] 2004-01-01 2004-04-01 2004-07-01 2004-10-01 2005-01-01 2005-04-01
[19] 2005-07-01 2005-10-01 2006-01-01 2006-04-01 2006-07-01 2006-10-01
[25] 2007-01-01 2007-04-01 2007-07-01 2007-10-01 2008-01-01 2008-04-01
[31] 2008-07-01 2008-10-01 2009-01-01 2009-04-01 2009-07-01 2009-10-01
36 Levels: 2001-01-01 2001-04-01 2001-07-01 2001-10-01 ... 2009-10-01

Regards, Marc Schwartz

On Dec 16, 2013, at 6:35 AM, Dániel Kehl ke...@ktk.pte.hu wrote: Hi, try x <- seq(as.Date("2001/1/1"), as.Date("2010/1/1"), "3 months") best, daniel

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Pancho Mulongeni [p.mulong...@namibia.pharmaccess.org] Sent: 16 December 2013 13:05 To: 1248283...@qq.com Cc: r-help@r-project.org Subject: Re: [R] why there is no quarters? Hi, I also would like to use quarters. I think a workaround would be to just label each record in the data frame by its quarter, i.e. you add a factor called 'Quarter' with four levels (Q1 to Q4) for each row, and you assign the level based on the month of the date. You can easily do this with as.Date and as.character.
Pancho Mulongeni Research Assistant PharmAccess Foundation 1 Fouché Street Windhoek West Windhoek Namibia Tel: +264 61 419 000 Fax: +264 61 419 001/2 Mob: +264 81 4456 286
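Base R also ships a quarters() function, so Pancho's "label each record by its quarter" idea needs no lookup table at all; a sketch:

```r
# Quarterly dates that do not fall on quarter boundaries
x <- seq(as.Date("2001/2/3"), as.Date("2010/1/1"), "3 months")

# quarters() returns "Q1".."Q4" for Date objects; prepend the year
labs <- paste0(format(x, "%Y"), "-", quarters(x))
head(labs, 4)  # "2001-Q1" "2001-Q2" "2001-Q3" "2001-Q4"

# As a factor, ready for tables or models
qf <- factor(labs)
```

This gives the same "YYYY-Qn" labels as the substr()/match() recipe earlier in the thread, in one line.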
Re: [R] Exporting R graphics into Word without losing graph quality
On Dec 16, 2013, at 8:39 AM, David Carlson dcarl...@tamu.edu wrote: This will create a simple plot using Windows enhanced metafile format:

win.metafile("TestFigure.emf")
plot(rnorm(25), rnorm(25))
dev.off()
null device
          1

Windows does not read pdf. This is correct for Office on Windows, not for Office on OSX. However, if you share an Office document created on OSX that has a PDF embedded with Windows Office users, they will see a bitmapped version of the graphic, rather than the PDF. It will offer to import an eps (encapsulated postscript) file, but it only imports the bitmap thumbnail image of the figure, so it is completely useless. Regarding EPS imports, this is NOT correct. Word and the other Office apps will import the EPS file. It cannot render the postscript, however, thus it will **display** a bitmapped preview image. If you print the Word document using a PS compatible printer driver, you will get the full high quality vector based graphic output. If you print to a non-PS compatible printer, the bitmapped preview is what will be printed. You may need to install EPS import filters for Office if they were not installed during the initial Office installation. That being said, while it has been years since I was on Windows, I used to use the WMF/EMF format to import or just copy/paste into Word when I needed a document containing an R plot that could be shared with others. In most cases, the image quality was fine. Regards, Marc Schwartz

You can edit a metafile in Word, but different versions seem to have different issues. Earlier versions would lose clipping if you tried to edit the file, but Word 2013 works reasonably well. Text labels can jump if you edit the figure in Word (especially rotated text), although it is simple to drag them back to where you want them. I haven't tried 2010 or 2007 recently.
- David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Duncan Murdoch Sent: Sunday, December 15, 2013 5:24 PM To: david hamer; r-help@r-project.org Subject: Re: [R] Exporting R graphics into Word without losing graph quality

On 13-12-15 6:00 PM, david hamer wrote: Hello, My x-y scatterplot produces a very ragged best-fit line when imported into Word.

Don't use a bitmap format (png). Don't produce your graph in one format (screen display), then convert to another (png). Open the device in the format you want for the final file. Use a vector format for output. I don't know what kinds Word supports, but EPS or PDF would likely be best; if it can't read those, then Windows metafile (via windows() to open the device) would be best. (Don't trust the preview to tell you the quality of the graph; try printing the document. Word isn't quite as bad as it appears.) Don't use Word. Duncan Murdoch

plot(data.file$x, data.file$y, type = "p", las = 1, pch = 20,
     ylab = expression("Cover of Species y" ~ (m^2 ~ ha^-1)),
     xlab = expression("Cover of Species x" ~ (m^2 ~ ha^-1)))
lines(data.file$x, fitted(model.x))

A suggestion from the internet is to use .png at high (1200) resolution:

dev.print(device = png, file = "R.graph.png", width = 1200, height = 700)

This gives a high-quality graph, but the titles and tick-mark labels become very tiny when exported into Word. I therefore increased the size of the titles and tick-mark labels with cex:

plot(..., cex = 1.8, cex.lab = 1.8, cex.axis = 1.25, ...)

But this causes the x-axis title to lie on top of the tick-mark labels. (This problem does not occur with the y-axis, where the title lies well away from the y-axis tick-mark labels.) Changing margins

par(mai = c(1.3, 1.35, 1, .75))

does not seem to have any effect on this.
A suggestion from the internet is to delete the titles from plot, and use mtext with line = 4 to drop the title lower on the graph:

plot(..., ylab = "", xlab = "", ...)
mtext(side = 1, "Cover of Species x (superscripts??)", line = 4)

This works, but with mtext I have now lost the ability to have the superscripts in the axis title. And I am back full circle, having to lower the resolution of the graph to keep the x-axis title away from the axis, and thus reverting to a ragged, segmented line when exported to Word. Final note: The R graphics window version of the graph becomes very distorted, even though the graph may be of high quality (other than the problem of the x-axis title overlaying the x-axis tick-mark labels) once in Word. I guess this is because of using tricks to try to get a desired end-product in Word. Thanks for any suggestions, David.
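Duncan's advice ("open the device in the format you want for the final file") might look like the following sketch. The file names are arbitrary, and the png() branch is guarded because bitmap support depends on how R was built:

```r
# Vector output: scales without degradation; imports into Word via PDF/EPS
pdf("scatter.pdf", width = 6, height = 4)
plot(rnorm(25), rnorm(25), pch = 20,
     ylab = expression("Cover of Species y" ~ (m^2 ~ ha^-1)))
dev.off()

# Bitmap output: set res (dpi) with physical units, rather than drawing at
# screen size and inflating cex afterwards
if (capabilities("png")) {
  png("scatter.png", width = 6, height = 4, units = "in", res = 300)
  plot(rnorm(25), rnorm(25), pch = 20)
  dev.off()
}
```

Because the device is opened at its final physical size, text and margins are laid out correctly from the start, avoiding the cex/margin juggling described above.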
Re: [R] Should there be an R-beginners list?
On Nov 25, 2013, at 7:56 AM, PIKAL Petr petr.pi...@precheza.cz wrote: Hi, I doubt people will start to search for answers if they often do not search the help pages and documentation provided. I must agree with Duncan that if Stackoverflow were far better than this help list, most people would seek advice there rather than here. Is there any evidence of decreasing traffic here? Anyway, a similar discussion took place in 2003, with an outcome that was not in favour of a separate beginner list: http://tolstoy.newcastle.edu.au/R/help/03b/7944.html Petr BTW it is a pity that the r-help archive does not extend past year 2012. I found that *Last message date: Tue 31 Jan 2012 - 12:19:21 GMT*

Petr, I may be confusing your final statement above, but the **main** R-Help archive is current to today: https://stat.ethz.ch/pipermail/r-help/ That being said, as one who has been interacting on R-Help (and other R-* lists) for a dozen years or so, I would have to say that one would need to have their head in the sand to not be cognizant of the dramatic decline in the traffic on R-Help in recent years. Simply keeping subjective track of the declining daily traffic ought to be sufficient. Due to work related time constraints, my posting here in recent times has dropped notably. I do still read many of the R-Help posts and, along with Martin, am co-moderator on R-Devel, so am still involved in that capacity. I do follow SO and SE via RSS feed, so am aware of the increasing traffic there, albeit I have not posted there. In addition, there are a multitude of other online locations where R related posts have begun to accumulate. These include various LinkedIn groups, R related blogs, ResearchGate and others. I do believe, however, that SO is the dominant force in the shift of traffic.
To answer Petr's question above, I updated and re-ran some code that I had used some years ago to estimate the traffic on various lists/fora: https://stat.ethz.ch/pipermail/r-help/2009-January/184196.html To that end, I am attaching a PDF file that contains a barplot of the annual R-Help traffic volume since 1997, through this month. The grey bars represent the actual annual traffic volumes of posts to R-Help. For 2013, I added a red segment to the bar, which shows the projected number of posts for the full year, albeit, it is simply based upon the mean number of posts per day, averaged over the YTD volume, projected over the remaining days in the year, without any seasonal adjustments. So it may be optimistic, as we are coming into the holiday season for many. Bottom line, while the trend was dramatically positive through 2010, peaking at a little over 41,000 total posts, the volume has just as dramatically declined in 2013 to a projected ~21,400. This means that the volume for 2013 has dropped back to the approximate volume of 2005. Only time will tell if the dramatic decline will continue, or reach some new reasonable asymptote that is simply reflective of the distribution of traffic on various other online resources. To the original query posted by Bert, I would say no, there is not a need for a beginner's list. Regards, Marc Schwartz R-Help.pdf Description: Adobe PDF document
Re: [R] convert one digit numbers to two digits one
On Nov 6, 2013, at 10:25 AM, Alaios ala...@yahoo.com wrote: Hi all, the following returns the hour and the minutes:

paste(DataSet$TimeStamps[selectedInterval$start, 4],
      DataSet$TimeStamps[selectedInterval$start, 5], sep = ":")
[1] "12:3"

the problem is that from these two I want to create a time stamp, so 12:03. The problem is that the number 3 is not converted to 03. Is there an easy way, when I have a one digit integer, to add a zero in front? Two digit integers are working fine so far; 12:19 or 12:45 would appear correctly. I would like to thank you in advance for your help. Regards, Alex

This is an example where using ?sprintf gives you more control:

sprintf("%02d:%02d", 12, 3)
[1] "12:03"

sprintf("%02d:%02d", 9, 3)
[1] "09:03"

The syntax '%02d' tells sprintf to print the integer and pad with leading zeroes to two characters where needed. Regards, Marc Schwartz
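For completeness, a runnable sketch showing that sprintf() vectorizes over whole columns, plus formatC() as an alternative with the same zero-padding flag (the hours/minutes vectors are invented stand-ins for the poster's TimeStamps columns):

```r
hours   <- c(12L, 9L, 12L)
minutes <- c(3L, 3L, 45L)

# sprintf recycles over its vector arguments, one stamp per element
stamps <- sprintf("%02d:%02d", hours, minutes)
stamps  # "12:03" "09:03" "12:45"

# Equivalent zero padding with formatC()
paste(formatC(hours,   width = 2, flag = "0"),
      formatC(minutes, width = 2, flag = "0"), sep = ":")
```

Either form replaces the bare paste(..., sep = ":") call from the question in one vectorized step.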
Re: [R] Basic question: why does a scatter plot of a variable against itself works like this?
On Nov 6, 2013, at 10:40 AM, Tal Galili tal.gal...@gmail.com wrote: Hello all, I just noticed the following behavior of plot:

x <- c(1, 2, 9)
plot(x ~ x)      # this is just like doing: plot(x)
# when maybe we would like it to give this:
plot(x ~ c(x))   # the same as: plot(x ~ I(x))

I was wondering if there is some reason for this behavior. Thanks, Tal

Hi Tal, In your example, plot(x ~ x), the formula method of plot() is called, which essentially does the following internally:

model.frame(x ~ x)
  x
1 1
2 2
3 9

Note that there is only a single column in the result. Thus, the plot is based upon 'y' = c(1, 2, 9), while 'x' = 1:3, which is NOT the row names for the resultant data frame, but the indices of the vector elements in the 'x' column. This is just like: plot(c(1, 2, 9)). On the other hand:

model.frame(x ~ c(x))
  x c(x)
1 1    1
2 2    2
3 9    9

model.frame(x ~ I(x))
  x I(x)
1 1    1
2 2    2
3 9    9

In both of the above cases, you get two columns of data back, thus the result is essentially: plot(c(1, 2, 9), c(1, 2, 9)). Regards, Marc Schwartz
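The collapse can be checked without plotting at all, by counting model-frame columns directly:

```r
x <- c(1, 2, 9)

# The duplicated term collapses to a single model-frame column...
ncol(model.frame(x ~ x))     # 1

# ...while I() (or c()) keeps the right-hand side as its own column
ncol(model.frame(x ~ I(x)))  # 2
ncol(model.frame(x ~ c(x)))  # 2
```

With one column, plot()'s formula method falls back to plotting the values against their indices, which is why plot(x ~ x) looks like plot(x).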
Re: [R] Questions about R
On Nov 6, 2013, at 11:09 AM, Silvia Espinoza siles...@gmail.com wrote:

Good morning. I am interested in downloading R. I would appreciate it if you could help me with the following questions, please.

1. Is R free, or do I have to pay for support/maintenance, or does it depend on the version? Is there a paid version?

Yes, it is free, although there are commercial versions of R available if you decide that you do need/want commercial support. Some additional info on commercial versions here:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#What-is-R_002dplus_003f

None of this has any effect on your ability to use R in a commercial setting, though there are some CRAN packages that do have such limitations:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Can-I-use-R-for-commercial-purposes_003f

2. How safe is it to work with data using R? Is there any risk that someone else can have access to the information?

That is outside the scope of R and is dependent upon the security of the computer system(s), and possibly networks, upon and over which R is running and where your data is stored and managed.

Regards, Marc Schwartz

Thanks in advance for your attention and for any help you can provide me. Silvia Espinoza
Re: [R] fail to install packages in R3.0.2 running in Redhat linux
On Nov 5, 2013, at 4:38 AM, Mao Jianfeng jianfeng@gmail.com wrote:

Dear R-helpers, Glad to write to you. I would like to have your help to install packages through the internet, on a Linux computer. Could you please share any of your expertise with me on this problem? Thanks in advance. Best, Jian-Feng

# check the problem here.
install.packages(pkgs = "ggplot2", repos = "http://ftp.ctex.org/mirrors/CRAN/")
Installing package into '/checkpoints/home/jfmao/bin/R_library' (as 'lib' is unspecified)
Error: Line starting 'html ...' is malformed!

The error suggests that there is a problem with the CRAN mirror that you have specified. I would try a different CRAN mirror and see if that resolves the problem.

Regards, Marc Schwartz
Re: [R] fail to install packages in R3.0.2 running in Redhat linux
Can you use those programs to get to the package tar file directly:

http://ftp.ctex.org/mirrors/CRAN/src/contrib/ggplot2_0.9.3.1.tar.gz

If so, you might want to download it and then install it as a local package on the remote server from the CLI (e.g. using R CMD INSTALL ...).

You might also want to look at ?download.file for some additional hints on download methods, which can be specified in the install.packages() call (the 'method' argument), and possible options to deal with proxies if that is the issue.

If you cannot use those programs to get to the tar file directly, it is possible that the remote Linux server is blocked from accessing the CRAN mirror network. If so, check with a SysAdmin to see if there is something on the remote server that needs to be configured to allow you access to CRAN mirrors.

Regards, Marc

On Nov 5, 2013, at 6:59 AM, Mao Jianfeng jianfeng@gmail.com wrote:

Hi Marc, Thanks a lot for your reply. In fact, I am running R on a remote Linux server. I am wondering whether there are some special settings for internet access on this server. I have tried different CRAN mirrors, and failed. I can use lftp, wget and curl to reach the internet on this server. So, do you have any ideas/tools/scripts on how to track down the real problem in my case? Best, Jian-Feng

-- Jian-Feng, Mao Post doc Forest Sciences Center University of British Columbia Vancouver, Canada
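Since wget and curl do reach the internet from that server, the local-install route Marc suggests can also be driven from within R; a hedged sketch, wrapped in a helper so nothing is downloaded until it is called (the URL is the one from the thread; `install_from_tarball` is a name invented here):

```r
# Fetch a source tarball with an external downloader, then install it from
# the local file, bypassing install.packages()'s own repository lookup.
install_from_tarball <- function(url, lib = .libPaths()[1]) {
  dest <- file.path(tempdir(), basename(url))
  download.file(url, destfile = dest, method = "wget")  # "curl" also works
  # repos = NULL tells install.packages() this is a local source package
  install.packages(dest, repos = NULL, type = "source", lib = lib)
  invisible(dest)
}

# Usage (not run here):
# install_from_tarball("http://ftp.ctex.org/mirrors/CRAN/src/contrib/ggplot2_0.9.3.1.tar.gz")
```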
Re: [R] alternative for shell() in Mac
On Nov 4, 2013, at 11:52 AM, Nicolas Gutierrez nicolas.gutier...@msc.org wrote:

Hi All, I'm trying to run an ADMB function in R on a Mac and need to find a substitute for the Windows command shell(). I tried system() but I get the following message:

system(ADMBFile)
/bin/sh: /Users/nicolas/Desktop/SPE/LBSPR_ADMB/L_AFun.exe: cannot execute binary file

Any hints please? Cheers, N

Why would you expect a Windows executable file (L_AFun.exe) to run on a non-Windows operating system? This is not related to the system call, but to the fact that you are trying to run the wrong executable. ADMB is presumably AD Model Builder, and you may be better off posting to the r-sig-mixed-models list:

https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

Regards, Marc Schwartz
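Once a native macOS ADMB binary has been built, system() or system2() replaces Windows' shell(); a hedged sketch (the executable path is a hypothetical stand-in for a Mac build of L_AFun):

```r
# shell() exists only on Windows; on macOS/Linux use system() or system2()
# to run external programs. A Windows .exe will never execute here -- the
# binary must be compiled for the Mac.
exe <- "/Users/nicolas/Desktop/SPE/LBSPR_ADMB/L_AFun"  # hypothetical native build

if (file.exists(exe)) {
  out <- system2(exe, stdout = TRUE)  # capture the program's output as a character vector
}

# A quick sanity check that external calls work at all:
ok <- system2("echo", "ok", stdout = TRUE)
```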
Re: [R] 'yum install R' failing with tcl/tk issue
On Oct 25, 2013, at 1:29 AM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:

On 25/10/2013 02:33, Michael Stauffer wrote: Hi, I'm trying to install R on CentOS 6.4.

This is not the right list. But:

- As the posting guide says, we only support current R here. R 2.10.0 is ancient, and other people seem to have found 3.0.1 RPMs for CentOS 6.3.
- It seems your RPM is linked against Tcl/Tk 8.4, also ancient. Tcl/Tk 8.6 is current.

I suggest you install R 3.0.2 from the sources, in which case R-devel would be the right list. For binary installations on CentOS, R-sig-Fedora is.

There are several inconsistencies in the output, as 3.0.1 is available as an RPM from the EPEL repos:

http://dl.fedoraproject.org/pub/epel/6/x86_64/repoview/R.html

In addition, the output below shows that the R rpm being installed is from 'el5', rather than 'el6'. If this was CentOS 5, rather than 6, R 2.15.2 is available:

http://dl.fedoraproject.org/pub/epel/5/x86_64/repoview/R.html

Something seems to be amiss with the configuration not getting the right yum repo paths. A Google search came up with this link:

http://lancegatlin.org/tech/centos-6-clear-the-yum-cache

which might be helpful, as it suggests a similar issue of yum picking up incorrect versions. You may need to reinstall the EPEL repo RPM after these steps.

Regards, Marc Schwartz

Following some instructions online, I've done this:

rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
yum install R

But yum fails with this (full output below):

Error: Package: R-core-2.10.0-2.el5.x86_64 (Rocks-6.1)
       Requires: libtcl8.4.so()(64bit)
Error: Package: R-core-2.10.0-2.el5.x86_64 (Rocks-6.1)
       Requires: libtk8.4.so()(64bit)

I have tcl/tk 8.5 already installed. Does anyone have any suggestion? Thanks!
Full output:

[root@picsl-cluster ~]# yum install R
Repository base is listed more than once in the configuration
Rocks-6.1 | 1.9 kB 00:00
base | 3.7 kB 00:00
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package R.x86_64 0:2.10.0-2.el5 will be installed
--> Processing Dependency: libRmath-devel = 2.10.0-2.el5 for package: R-2.10.0-2.el5.x86_64
--> Processing Dependency: R-devel = 2.10.0-2.el5 for package: R-2.10.0-2.el5.x86_64
--> Running transaction check
---> Package R-devel.x86_64 0:2.10.0-2.el5 will be installed
--> Processing Dependency: R-core = 2.10.0-2.el5 for package: R-devel-2.10.0-2.el5.x86_64
---> Package libRmath-devel.x86_64 0:2.10.0-2.el5 will be installed
--> Processing Dependency: libRmath = 2.10.0-2.el5 for package: libRmath-devel-2.10.0-2.el5.x86_64
--> Running transaction check
---> Package R-core.x86_64 0:2.10.0-2.el5 will be installed
--> Processing Dependency: libtk8.4.so()(64bit) for package: R-core-2.10.0-2.el5.x86_64
--> Processing Dependency: libtcl8.4.so()(64bit) for package: R-core-2.10.0-2.el5.x86_64
--> Processing Dependency: libgfortran.so.1()(64bit) for package: R-core-2.10.0-2.el5.x86_64
---> Package libRmath.x86_64 0:2.10.0-2.el5 will be installed
--> Running transaction check
---> Package R-core.x86_64 0:2.10.0-2.el5 will be installed
--> Processing Dependency: libtk8.4.so()(64bit) for package: R-core-2.10.0-2.el5.x86_64
--> Processing Dependency: libtcl8.4.so()(64bit) for package: R-core-2.10.0-2.el5.x86_64
---> Package compat-libgfortran-41.x86_64 0:4.1.2-39.el6 will be installed
--> Finished Dependency Resolution
Error: Package: R-core-2.10.0-2.el5.x86_64 (Rocks-6.1)
       Requires: libtcl8.4.so()(64bit)
Error: Package: R-core-2.10.0-2.el5.x86_64 (Rocks-6.1)
       Requires: libtk8.4.so()(64bit)
You could try using --skip-broken to work around the problem
** Found 57 pre-existing rpmdb problem(s), 'yum check' output follows:
foundation-git-1.7.11.4-0.x86_64 has missing requires of perl(SVN::Client)
foundation-git-1.7.11.4-0.x86_64 has missing requires of perl(SVN::Core)
foundation-git-1.7.11.4-0.x86_64 has missing requires of perl(SVN::Delta)
foundation-git-1.7.11.4-0.x86_64 has missing requires of perl(SVN::Ra)
1:guestfish-1.7.17-26.el6.x86_64 has missing requires of libguestfs = ('1', '1.7.17', '26.el6')
opt-perl-AcePerl-1.92-0.el6.x86_64 has missing requires of perl(Ace::Browser::LocalSiteDefs)
opt-perl-BioPerl-1.6.901-0.el6.noarch has missing requires of perl(Apache::DBI)
opt-perl-BioPerl-1.6.901-0.el6.noarch has missing requires of perl(Bio::ASN1::EntrezGene)
opt-perl-BioPerl-1.6.901-0.el6.noarch has missing requires of perl(Bio::Expression::Contact)
opt-perl-BioPerl-1.6.901-0.el6.noarch has missing requires of perl(Bio::Expression::DataSet)
opt-perl-BioPerl-1.6.901-0.el6.noarch has missing requires of perl(Bio::Expression::Platform)
opt-perl-BioPerl-1.6.901-0.el6.noarch has missing requires of perl(Bio::Expression::Sample)
opt-perl-BioPerl-1.6.901-0.el6.noarch has
Re: [R] Use R to plot a directory tree
One R package that might be of interest would be 'diagram':

http://cran.r-project.org/web/packages/diagram/

I would also agree with Bert here and would point you in the direction of PSTricks, which can handle these sorts of complex figures. It would of course require learning LaTeX, but that is a good thing. :-) More info here:

http://tug.org/PSTricks/main.cgi/

and lots of examples with code here:

http://tug.org/PSTricks/main.cgi?file=examples

I use PSTricks for creating things like subject disposition flow charts for clinical study reports.

Regards, Marc Schwartz

On Oct 24, 2013, at 8:47 AM, Bert Gunter gunter.ber...@gene.com wrote:

A wild guess -- take a look at the CRAN phylogenetics task view, as that sounds like the sort of thing that might have tree generation and manipulation functions. ... but you may do better with some non-R tool out there. (Hopefully, you'll get a better response, though.) Cheers, Bert

On Thu, Oct 24, 2013 at 6:13 AM, Thaler,Thorn,LAUSANNE,Applied Mathematics thorn.tha...@rdls.nestle.com wrote:

Dear all, I was wondering whether (or better: how) I can use R to read a directory recursively, to get all the sub-folders and files located in the root folder, and put them into a tree-like structure where the leaves are files and the intermediate nodes are the directories. The idea is that I'd like to plot the structure of a certain root folder to be able to restructure the file system. Any ideas on that? I was googling a lot but apparently I did not use the right terms ("R tree folder" or "R tree directory" takes me mainly to pages about the R-tree, a structure for spatial access methods [at least I learnt something new ;)]). Any pointer to the right function is highly appreciated. Cheers, Thorn Thaler NRC Lausanne Applied Mathematics
-- Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374
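For the recursive-read part of Thorn's question, base R needs no extra package; a minimal sketch (the root path is a placeholder):

```r
# Walk a folder recursively and print a crude indented tree.
root  <- "."                                        # placeholder root folder
paths <- list.files(root, recursive = TRUE, include.dirs = TRUE)

# list.files() always separates components with "/", so split on that,
# e.g. "src/utils.R" -> c("src", "utils.R")
parts <- strsplit(paths, "/", fixed = TRUE)

# Indent each entry by its depth:
for (p in parts) cat(strrep("  ", length(p) - 1L), p[length(p)], "\n", sep = "")
```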
Re: [R] installing package from source
On Oct 24, 2013, at 11:38 AM, David Winsemius dwinsem...@comcast.net wrote:

On Oct 23, 2013, at 7:53 PM, Long Vo wrote:

Hi R users, Currently I want to fit a FIGARCH model to a dataset. The only package that I could find that allows for it is fGarch. However, it seems that the FIGARCH model class fitting of this package has been moved to OxMetrics. I tried to install the old versions of it using the 'tar.gz' files from the CRAN archive:

http://cran.r-project.org/src/contrib/Archive/fGarch/

but I am not sure how it works. I tried:

install.packages("myfilepath\fGarch_260.71.tar.gz", repos = NULL, type = "source")

And received this error:

Warning: invalid package './I:_R filesGarch_260.71.tar.gz'
Error: ERROR: no packages specified
Warning messages:
1: running command 'I:/01_RFI~1/INSTAL~1/R-30~1.1/bin/i386/R CMD INSTALL -l I:\01_R files\installment\R-3.0.1\library ./I:_R files Garch_260.71.tar.gz' had status 1
2: In install.packages("I:\001_R files\fGarch_260.71.tar.gz", repos = NULL, : installation of package './I:_R filesGarch_260.71.tar.gz' had non-zero exit status

Any help on this?

I've always specified the package names and their locations separately in my call to install.packages, but I don't know if that is always needed. It also appears that you have no / separator between your path and the file name.

Long is trying to install a rather old version of the source R package, that contains FORTRAN code, on Windows.
Besides the immediate error in the way the path was constructed in the install.packages() call (using a single backslash, which needs to be escaped):

http://cran.r-project.org/bin/windows/base/rw-FAQ.html#R-can_0027t-find-my-file

there are likely to be issues from trying to install an old version of the package on a newer version of R, perhaps the lack of the requisite development tools for compiling FORTRAN:

http://cran.r-project.org/bin/windows/base/rw-FAQ.html#Can-I-install-packages-into-libraries-in-this-version_003f

and other issues as well. Depending upon how far back you need to go in package versions, there may be pre-compiled Windows binaries (.zip files) available in the directories here:

http://cran.r-project.org/bin/windows/contrib/

Regards, Marc Schwartz

Regards, Long
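A hedged sketch of the path fix behind Marc's first link (the drive and file name echo the thread but are placeholders):

```r
# In R string literals, a backslash must be doubled; using forward slashes
# avoids the issue entirely, and R accepts them on Windows too.
path_fwd  <- "I:/01_R files/fGarch_260.71.tar.gz"    # forward slashes (simplest)
path_back <- "I:\\01_R files\\fGarch_260.71.tar.gz"  # escaped backslashes

# Both literals name the same file:
identical(gsub("\\", "/", path_back, fixed = TRUE), path_fwd)
# [1] TRUE

# Then install the local source package:
# install.packages(path_fwd, repos = NULL, type = "source")
```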