Re: [R] Option "-shared" not being passed to gcc when installing packages
Thanks Ivan. I do have things in /etc/R. I have copied all files in /etc/R them out of a freshly pulled r-base:latest docker container. This seems to have worked. Thank you! How would I generally replace those or get them updated when installing a new version of version of R (which I do using R apt package repo)? Thanks again, Simon On Thu, 21 Sept 2023 at 17:09, Ivan Krylov wrote: > On Thu, 21 Sep 2023 10:09:56 +1000 > Simon Knapp wrote: > > > I am using R version 4.3.1 (2023-06-16) -- "Beagle Scouts" on ubuntu > > 20.04 > > > gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG-fopenmp -fpic > > -g -O2 -fdebug-prefix-map=/build/r-base-jbaK_j/r-base-3.6.3=. > > It seems to be picking up compiler flags from a much older version of > R. Do you have anything in ~/.R/Makevars*? What about /etc/R/Make*? > > https://cran.r-project.org/doc/manuals/R-admin.html#Customizing-package-compilation > > -- > Best regards, > Ivan > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Option "-shared" not being passed to gcc when installing packages
oth.c -o sparse-smooth.o gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG-fopenmp -fpic -g -O2 -fdebug-prefix-map=/build/r-base-jbaK_j/r-base-3.6.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c sparse.c -o sparse.o gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG-fopenmp -fpic -g -O2 -fdebug-prefix-map=/build/r-base-jbaK_j/r-base-3.6.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c tprs.c -o tprs.o gcc -std=gnu99 -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o mgcv.so coxph.o davies.o discrete.o gdi.o init.o magic.o mat.o matrix.o mgcv.o misc.o mvn.o ncv.o qp.o soap.o sparse-smooth.o sparse.o tprs.o -llapack -lblas -lgfortran -lm -lquadmath -fopenmp -L/usr/lib/R/lib -lR /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/Scrt1.o: in function `_start': (.text+0x24): undefined reference to `main' collect2: error: ld returned 1 exit status make: *** [/usr/share/R/share/make/shlib.mk:10: mgcv.so] Error 1 ERROR: compilation failed for package ‘mgcv’ * removing ‘/home//R/x86_64-pc-linux-gnu-library/4.3/mgcv’ The downloaded source packages are in ‘/tmp/RtmpI7uMoa/downloaded_packages’ Warning message: In install.packages("mgcv") : installation of package ‘mgcv’ had non-zero exit status > Kind regards, Simon Knapp [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem reading from NetCDF.
Hi List, I'm not sure what I'm doing wrong here and would appreciate some help. I create a NetCDF dataset as follows: library(ncdf) combined.ncdf - with(new.env(), { nLat - 7 nLon - 8 nTime - 9 missing.val - -999 filename - file.path(tempdir(), 'combined_d6fg4s64s6g4l.nc') lat.dim - dim.def.ncdf('lat', 'degrees', 1:nLat) lon.dim - dim.def.ncdf('lon', 'degrees', 1:nLon) time.dim - dim.def.ncdf(time, days since 1970-01-01, as.integer(1:nTime), unlim=T) var.def - var.def.ncdf('Data', 'mm', list(lon.dim, lat.dim, time.dim), missing.val, prec='single') nc - create.ncdf(filename, var.def) for(i in 1:nLon) for(j in 1:nLat) for(k in 1:nTime) put.var.ncdf(nc, var.def, i + j*10 + k*100, c(i, j, k), rep(1, 3)) nc }) the following do not work: get.var.ncdf(combined.ncdf) get.var.ncdf(combined.ncdf, varid='Data') get.var.ncdf(combined.ncdf, varid=4) all of which I thought would be equivalent and would give me the same as: get.var.ncdf(combined.ncdf, forcevarid=4) Could someone please point out the error of my ways. Thanx in advance, Simon Knapp R version: 2.14.1, ncdf built with version: 2.14.2 os: windows 7 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Question on callNextMethod
I don't understand why I get the following results. I define two classes 'Base' and 'Derived', the latter of which 'contains' the first. I then define a generic method 'test' and overload it for each of these classes. I call 'callNextMethod()' in the overload for Derived. From the output, it appears that the overload for Base gets called twice. Why is this? Test code follows: setClass('Base') setClass('Derived', contains='Base') setGeneric('test', function(x) standardGeneric('test')) setMethod('test', signature(x='Base'), function(x) print('base called')) setMethod('test', signature(x='Derived'), function(x) {print('derived called'); callNextMethod()}) d = new('Derived') test(d) Produces the output: [1] derived called [1] base called [1] base called and I was expecting: [1] derived called [1] base called Thanx in advance, Simon Knapp [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Logistic regression/Cut point? predict ??
What do you mean by at x equal zero? On Sun, Oct 21, 2012 at 8:37 AM, Adel Powell powella...@gmail.com wrote: I am new to R and I am trying to do a monte carlo simulation where I generate data and interject error then test various cut points; however, my output was garbage (at x equal zero, I did not get .50) I am basically testing the performance of classifiers. Here is the code: n - 1000; # Sample size fitglm - function(sigma,tau){ x - rnorm(n,0,sigma) intercept - 0 beta - 5 * ystar - intercept+beta*x* * z - rbinom(n,1,plogis(ystar))**# I believe plogis accepts the a +bx augments and return the e^x/(1+e^x) which is then used to generate 0 and 1 data* xerr - x + rnorm(n,0,tau)# error is added here model-glm(z ~ xerr, family=binomial(logit)) int-coef(model)[1] slope-coef(model)[2] pred-predict(model) #this gives me the a+bx data for new error? I know I can add type= response to get the probab. but only e^x not *e^x/(1+e^x) * pi1hat-length(z[which(z==1)]/length(z)) My cut point is calculated is the proportion of 0s to 1. pi0hat-length(z[which(z==0)]/length(z)) cutmid - log(pi0hat/pi1hat) pred-ifelse(predcutmid,1,0) * I am not sure if I need to compare these two. I think this is an error. * accuracy-length(which(pred==z))/length(z) accuracy rocpreds-prediction(pred,z) auc-performance(rocpreds,auc)@y.values output-c(int,slope,cutmid,accuracy,auc) names(output)-c(Intercept,Slope,CutPoint,Accuracy,AUC) return(output) } y-fitglm(.05,1) y nreps - 500; output-data.frame(matrix(rep(NA),nreps,6,ncol=6)) mysigma-.5 mytau-.1 i-1 for(j in 1:nreps) { output[j,1:5]-fitglm(mysigma,mytau) output[j,6]-j } names(output)-c(Intercept,Slope,CutPoint,Accuracy,AUC,Iteration) apply(output,2, mean) apply(output,2, var) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] time extraction and normalization
Whoops... tried to get all descriptive with the argument names before I posted but after I tested. Try this toDaySecond - function(date.string, format.string='%d%b%Y:%T', tz='') { d - as.POSIXct(date.string, format=format.string, tz=tz) sum(mapply(function(f, l) as.numeric(format(d, format=f)) * l, c('%H', '%M', '%S'), c(3600, 60, 1))) } On Tue, Oct 16, 2012 at 11:22 PM, arun smartpink...@yahoo.com wrote: Hi Simon, I tried your code, but I am getting some error message: toDaySecond('04MAY2011:08:19:00') #Error in as.POSIXct(d, format = format.string, tz = tz) : #object 'd' not found A.K. - Original Message - From: Simon Knapp sleepingw...@gmail.com To: Jeff Newmiller jdnew...@dcn.davis.ca.us Cc: r-help@r-project.org; york8866 yu_y...@hotmail.com Sent: Tuesday, October 16, 2012 1:32 AM Subject: Re: [R] time extraction and normalization toDaySecond - function(date.string, format.string='%d%b%Y:%T', tz='') { d - as.POSIXct(d, format=format.string, tz=tz) sum(mapply(function(f, l) as.numeric(format(d, format=f)) * l, c('%H', '%M', '%S'), c(3600, 60, 1))) } toDaySecond('04MAY2011:08:19:00') Will calculate the second of the day. divide by 3600 to convert to hours. On Tue, Oct 16, 2012 at 4:21 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote: There are multiple ways. For lack of your example, I would suggest multiplying by 24. Please post the previous context of the thread if you must post from Nabble. --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. york8866 yu_y...@hotmail.com wrote: Thanks, The code gives the numbers in days, how can I adjust the code to directly get the numbers in hours? I tried units but it did not work. Thanks1 -- View this message in context: http://r.789695.n4.nabble.com/time-extraction-and-normalization-tp4646275p4646298.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] vectors comparison
Your question does not seem to make sense - there is no value of -500 in Y (did you mean -10?). Anyway, I think this might work: which(y==-10 (x==1 | c(0, x[-length(x)]) == 1 | c(x[-1], 0) == 1)) ... though one would think there is a more elegant way On Wed, Oct 17, 2012 at 10:07 AM, Francesco Nutini nutini.france...@gmail.com wrote: Dear R-Users, I'd like to have your help on this problem: I have two vectors:x- c(0,1,0,0,0,0,0,0,0,0,0,1,0)y- c(0,0,-10,0,0,-10,0,-10,0,0,0,0,0) And I want to know where the value -500 in y have a correspondence value 1 in x.Considering a buffer of one position before and after in x.i.e. in this example only the -10 in position y[3] satisfies the criteria, because x has in position x[2]the figure 1. Thank in advance for you help,Francesco [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rJava install - %1 is not a valid Win32 application.
My guess would be that your running the 32 bit version of R - and rJava is looking for the 64 bit dll. I'd suggest starting the 64 bit version of R explicitly (e.g. the 64 bit version of Rgui lives at R_HOME/bin/x64/Rgui.exe, whereas the 32 bit version lives at R_HOME/bin/i386/Rgui.exe). On Tue, Oct 16, 2012 at 2:37 AM, Steven Ranney steven.ran...@gmail.com wrote: All – I’m having a problem with the rJava package. I can download it to my machine (Win 7 64-bit) but when I try to load the package into R (2.15.1, 64-bit version), I get the following error: require(rJava) Loading required package: rJava Error : .onLoad failed in loadNamespace() for 'rJava', details: call: inDL(x, as.logical(local), as.logical(now), ...) error: unable to load shared object 'C:/Users/sranney/Documents/R/win-library/2.15/rJava/libs/x64/rJava.dll': LoadLibrary failure: %1 is not a valid Win32 application. I have verified that the file R is looking for is in the appropriate place, but I continue to get the error. I have tried to download rJava from another source, but still get the error. I have not been able to find another user with this same issue. Thanks for your help – Steven Ranney [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] warning message
The second and third arguments to ifelse are evaluated for all elements of x (producing your warnings), then the appropriate elements of each result are combined based on the (logical) values of the first argument to ifelse. On Tue, Oct 16, 2012 at 10:08 AM, Shi, Tao shida...@yahoo.com wrote: Hi list, Can somebody explain why there are these warning messages? I just don't get it. I'm using R 2.15.1 on WinXP. Thanks! Tao x [1] -2.143510 -1.157450 -1.315581 1.033562 -1.225440 -1.179909 ifelse(x0, log2(x), -log2(-x)) [1] -1.099975 -0.210950 -0.395700 0.047625 -0.293300 -0.238675 Warning messages: 1: In ifelse(x 0, log2(x), -log2(-x)) : NaNs produced 2: In ifelse(x 0, log2(x), -log2(-x)) : NaNs produced __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Calling R from C (Windows)
Yes. Your looking at R_HOME (the installation directory), not the directory of the sources. On Tue, Oct 16, 2012 at 10:08 AM, BayesForever andrew.mor...@verizon.net wrote: I am trying to follow the instructions in section 8.2.2 of Writing R extensions, to learn how to call R from C. In this manual and at other sites I see continual references to examples in the tests/Embedding directory of the sources. There is no directory named Embedding anywhere in my 2.13.1 install of R. Am I missing components that I need to call R from C, or am I misunderstanding the meaning of the tests/Embedding directory of the sources? -- View this message in context: http://r.789695.n4.nabble.com/Calling-R-from-C-Windows-tp4646285.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Building a package with an existing dll
Anything you put in the folder 'inst' of a package gets copied, as is, to the installed package once 'everything else' is done (see section 1.1.3 of Writing R Extensions - which you read for caveats). Hence you can create inst/libs/arch/libs/package_name.dll where: - arch is either 'x64' or 'i386' (or both if you can compile for both architectures) for 64 or 32 bit R respectively, and - package_name is the name of the package. Then as per section 1.6.3 of Writing R Extensions You can then add the line useDynLib(package_name) to your NAMESPACE file and it should then work as before (noting that you should use the PACKAGE argument in your .C, .Call and .External calls to the routines therein). Optionally, you can list the symbol names in the call to useDynLib as per section 1.6.3 of Writing R Extensions. On Tue, Oct 16, 2012 at 1:24 PM, GlennManion glenn.man...@environment.nsw.gov.au wrote: I have setup a package directory structure with all the relevent files in the src, R, man and data directories. Also have the correct DESCRIPTION and NAMESPACE files in the root directory. I don't wisk to recompile the .dll file in the src directory as it currently works fine if I locate it in the same workdirectory as my R source code that calls it. This is a little tedious having to move source and dll into every directory that I wish to work in, thus the need to convert into a package. Is there a way to create a package that does not need the dll re-compiled? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] time extraction and normalization
toDaySecond - function(date.string, format.string='%d%b%Y:%T', tz='') { d - as.POSIXct(d, format=format.string, tz=tz) sum(mapply(function(f, l) as.numeric(format(d, format=f)) * l, c('%H', '%M', '%S'), c(3600, 60, 1))) } toDaySecond('04MAY2011:08:19:00') Will calculate the second of the day. divide by 3600 to convert to hours. On Tue, Oct 16, 2012 at 4:21 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote: There are multiple ways. For lack of your example, I would suggest multiplying by 24. Please post the previous context of the thread if you must post from Nabble. --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. york8866 yu_y...@hotmail.com wrote: Thanks, The code gives the numbers in days, how can I adjust the code to directly get the numbers in hours? I tried units but it did not work. Thanks1 -- View this message in context: http://r.789695.n4.nabble.com/time-extraction-and-normalization-tp4646275p4646298.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Keep rows in a dataset if one value in a column is duplicated
#By using cbind in: PairIDs-cbind(PairID, PairIDDuplicates) #You create a numeric matrix (the logical #vector PairIDDuplicates gets converted #to numeric - note that your second column #contains 1s and 0s, not Trues and Falses). #Matricies are not subsetable using $, #they are basically a vector with #a dimension attribute - hence your error). #Two ways you could have avoided your error are: # 1) changing the cbind to data.frame PairIDs - data.frame(PairID, PairIDDuplicates) names(PairIDs) - c(Pairid,Pairiddups) Health2PairsOnly - PairIDs[PairIDs$Pairiddups,] # 2) using the dimensions name like: PairIDs-cbind(PairID, PairIDDuplicates) colnames(PairIDs) - c(Pairid,Pairiddups) Health2PairsOnly - PairIDs[PairIDs[,'Pairiddups']==1,] #In the latter you can save a line of code with PairIDs - data.frame(Pairid=PairID, Pairiddups=PairIDDuplicates) #Note that there is a fair bit of redundancy throughout #your code. A neater way of subsetting your original #data, for instance, would be: PairIDdup - unique(PairID[duplicated(PairID)]) Health2[PairID %in% PairIDdup,] Have Fun! Simon Knapp On Fri, Sep 28, 2012 at 5:46 AM, GradStudentDD dd...@virginia.edu wrote: Hi, I have a data set of observations by either one person or a pair of people. I want to only keep the pair observations, and was using the code below until it gave me the error $ operator is invalid for atomic vectors. I am just beginning to learn R, so I apologize if the code is really rough. Basically I want to keep all the rows in the data set for which the value of Pairiddups is TRUE. How do I do it? And how do I get past the error? Thank you so much, Diana PairID-c(Health2$pairid) duplicated(PairID, incomparables=TRUE, fromLast=TRUE) PairIDdup=duplicated(PairID) cbind(PairID, PairIDdup) PairID[which(PairIDdup)] PairIDDuplicates-PairID%in%PairID[which(PairIDdup)] PairIDs-cbind(PairID, PairIDDuplicates) colnames(PairIDs)-c(Pairid,Pairiddups) Health2PairsOnly-PairIDs[ which(PairIDs$Pairiddups=='TRUE'), ] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] combining numeric vector and column in a data frame
combined - data.frame(mnv=my.numeric.vec[df1$fileName], type=df1$type) sorted - combined[order(rownames(combined)),] On Wed, Jul 11, 2012 at 8:38 AM, Adrian Johnson oriolebaltim...@gmail.com wrote: Hi: I am trying to map column names of a matrix to another data frame and get values in a different column of data frame I have a matrix m. my.numeric vec - m[1,] my.numeric.vec a b cd ef 2 1 49 10 3 ## my data frame = df1 df1 fileNametype status b N alive a Tdead d N alive c T dead f Nalive e Tdead I want to combine as.numeric.vec and df1 and create j with correct filename and type and numeric value j my.numeric.vectype a 2 T b 1 N c 4 T d 9 N e 10 T f 3 N How can I combine my.numeric.vec and df1$type when I try: df1[df1$fileName %in% names(my.numeric.vec),2] I get wrong answer. thanks in advance. Adrian __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fractional Factorial - Wrong values using lm-function
They are coding the variables as factors and using orthogonal polynomial contrasts. This: data.catapult - data.frame(data.catapult$Distance, do.call(data.frame, lapply(data.catapult[-1], factor, ordered=T))) contrasts(data.catapult$h) - contrasts(data.catapult$s) - contrasts(data.catapult$l) - contrasts(data.catapult$e) - contr.poly(3, contrasts=F) contrasts(data.catapult$b) - contr.poly(2, contrasts=F) lm1 - lm(Distance ~ .^2, data=data.catapult) summary(lm1) gets you closer (same intercept at least), but I can't explain the remaining differences. I'm not even sure why the results to look like they do (interaction terms like a*b not a:b and one level for each interaction). Hope that helps, Simon Knapp __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fractional Factorial - Wrong values using lm-function
... but this is tantalisingly close: dat1 - with(data.catapult, data.frame( Distance, h=C(h, poly, 1), s=C(s, poly, 1), l=C(l, poly, 1), e=C(e, poly, 1), b=C(b, poly, 1) ) ) lm4 - lm(Distance ~ .^2, data = dat1) summary(lm4) ... wish I knew what it meant. On Tue, Jun 26, 2012 at 12:18 AM, Simon Knapp sleepingw...@gmail.com wrote: They are coding the variables as factors and using orthogonal polynomial contrasts. This: data.catapult - data.frame(data.catapult$Distance, do.call(data.frame, lapply(data.catapult[-1], factor, ordered=T))) contrasts(data.catapult$h) - contrasts(data.catapult$s) - contrasts(data.catapult$l) - contrasts(data.catapult$e) - contr.poly(3, contrasts=F) contrasts(data.catapult$b) - contr.poly(2, contrasts=F) lm1 - lm(Distance ~ .^2, data=data.catapult) summary(lm1) gets you closer (same intercept at least), but I can't explain the remaining differences. I'm not even sure why the results to look like they do (interaction terms like a*b not a:b and one level for each interaction). Hope that helps, Simon Knapp __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fractional Factorial - Wrong values using lm-function
and finally... thingy - function(x) { x - C(x, poly, 1) tmp - contrasts(x) contrasts(x, 1) - 2 * tmp / sum(abs(tmp)) x } dat2 - with(data.catapult, data.frame( Distance, h=thingy(h), s=thingy(s), l=thingy(l), e=thingy(e), b=thingy(b) ) ) lm5 - lm(Distance ~ .^2, data = dat2) summary(lm5) On Tue, Jun 26, 2012 at 12:35 AM, Simon Knapp sleepingw...@gmail.com wrote: ... but this is tantalisingly close: dat1 - with(data.catapult, data.frame( Distance, h=C(h, poly, 1), s=C(s, poly, 1), l=C(l, poly, 1), e=C(e, poly, 1), b=C(b, poly, 1) ) ) lm4 - lm(Distance ~ .^2, data = dat1) summary(lm4) ... wish I knew what it meant. On Tue, Jun 26, 2012 at 12:18 AM, Simon Knapp sleepingw...@gmail.com wrote: They are coding the variables as factors and using orthogonal polynomial contrasts. This: data.catapult - data.frame(data.catapult$Distance, do.call(data.frame, lapply(data.catapult[-1], factor, ordered=T))) contrasts(data.catapult$h) - contrasts(data.catapult$s) - contrasts(data.catapult$l) - contrasts(data.catapult$e) - contr.poly(3, contrasts=F) contrasts(data.catapult$b) - contr.poly(2, contrasts=F) lm1 - lm(Distance ~ .^2, data=data.catapult) summary(lm1) gets you closer (same intercept at least), but I can't explain the remaining differences. I'm not even sure why the results to look like they do (interaction terms like a*b not a:b and one level for each interaction). Hope that helps, Simon Knapp __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loop Help
First, what was going wrong: #this creates a character vector, not an object containing your data sets TOWERS - c(TOWER1,TOWER2,TOWER3,TOWER4,TOWER5,TOWER6,TOWER7) for(i in 1:7){ # this creates a variable named TOWER.i (not TOWER.1, TOWER.2, ..., TOWER.7) # which will get overwritten each time through the loop TOWER.i - TOWERS[i] # This is what causes your error, as TOWER.i is a string (TOWER1, TOWER2, ...), and hence nrow(TOWER.i) return NULL TOWER - rep(i,nrow(TOWER.i)) # if you got to here, you would have got more errors as TOWER.i is cannot be subsetted # (it is a string, not a 'dataset' with columns) TOWER.i - cbind(TOWER.i[1:2], TOWER, TOWER.i[2:length(TOWER.i)]) } # Note that to create variables like 'TOWER.1', ..., you would have to say something like assign(paste('TOWER', i, sep='.'), TOWERS[[i]]) # inside the loop. However, this is not necessary here and you would keep needing to # access the variable with similar ugliness. Here is the way I might approach this: TOWERS - list(TOWER1,TOWER2,TOWER3,TOWER4,TOWER5,TOWER6,TOWER7) # create a list of things that look like what you wanted TOWERS.i to look like # (still with column two repeated TOWERS.modified - mapply(function(x, i) cbind(x[,1:2], i, x[,2:ncol(x)]), TOWERS, 1:length(TOWERS), SIMPLIFY=F) # you can then 'merge' them (though that term would mean something else to most # R programmers - see ?merge) with: TOWERS.combined - do.call(rbind, TOWERS.modified) #Now what I would actually do is: TOWERS.combined - do.call(rbind, lapply(1:7, function(i) { x - get(paste('TOWER', i, sep='')) cbind(x[,1:2], i, x[,2:ncol(x)]) })) Hopefully that makes sense, if not I can provide more explanation, but please try ?do.call, ?get, ?lapply and ?mapply first Simon On Sat, May 19, 2012 at 9:14 AM, bdossman bdoss...@gmail.com wrote: Hi all, I am a beginner R user and need some help with a simple loop function. Currently, I have seven datasets (TOWER1,TOWER2...TOWER7) that are all in the same format (same # of col and headers). I am trying to add a new column (factor) to each dataset that simply identifies the dataset. Ultimately, I would like to merge all 7 datasets and need that column to identify what rows came from what dataset. Using the code below, I get the error message Error in rep(i, nrow(TOWER.i)) : invalid 'times' argument but it doesn't make sense to me since nrow should give an integer value. Any help will be really appreciated. TOWERS-c(TOWER1,TOWER2,TOWER3,TOWER4,TOWER5,TOWER6,TOWER7) for(i in 1:7){ TOWER.i-TOWERS[i] TOWER-rep(i,nrow(TOWER.i)) TOWER.i-cbind(TOWER.i[1:2],TOWER, TOWER.i[2:length(TOWER.i)]) } -- View this message in context: http://r.789695.n4.nabble.com/Loop-Help-tp4630555.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] converting csv to image file
provided you get the call to read.table (or perhaps read.csv) right and presuming that file contains only the image data, you should be able to say: r - raster(as.matrix(read.csv(file.csv))) extent(r) - extent(xmin, xmax, ymin, ymax) and not worry about the projection (if it is plain old decimal degrees) Regards, Simon On Sat, May 19, 2012 at 7:49 PM, Belay Gebregiorgis belay...@gmail.com wrote: Hello everyone, I want to get a 1km by lkm grid raster image using my csv data. If I call latitude=a, longitude=b and preciptation=c. a-(1,2,3,4,5) b-(6,7,8,9,10) c-(10,20, 30,40, 50) Then I found an example in r help which goes like pts = read.table(file.csv,..) library(sp) library(rgdal) proj4string(pts)=CRS(+init=epsg:4326) # set it to lat-long pts = spTransform(pts,CRS(insert your proj4 string here)) gridded(pts) = TRUE r = raster(pts) projection(r) = CRS(insert your proj4 string here) Because I am new to R, I have no idea what to put into the proj4 string and all that. Can anyone help me on this? If there is a different way of doing this, that is also fine. Kind Regards, Belay [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Q - scatterplot, plot function trellis linear regressions???
I presume you mean car::scatterplot A1) plot is a generic function for plotting 'things' and scatterplot is a specific tool provided by that library A2) from the documentation one infers that 1) the straight line is a regression line 2) the other lines are produced using loess a) the solid line is fitted to the data b) the dotted lines are created by: a smoother is applied to the root-mean-square positive and negative residuals from the loess line to display conditional spread and asymmetry A3) you forgot the data argument (at least). Cheers, Simon On Sat, May 19, 2012 at 11:34 AM, Jhope jeanwaij...@gmail.com wrote: Hi R-listers, Q1) What is the difference between the scatterplot and plot function? Q2) I am able to make a graph with the scatterplot function: scatterplot(DevelopIndex ~ Veg, + data = Turtle, + xlab = Vegetation border (m), + ylab = Embryonic development index) And have been successful. But I do not know if the lines are for: linear, non-linear, mean, upper and lower limits or percentile? This shows side boxplots and 4 lines. Yet this plot only has one regression line and no side boxplots?: EDIHTL - lm(DevelopIndex~ HTL, data=Turtle) plot(DevelopIndex~HTL, data=Turtle, pch=16, xlab=High tide line (m), ylab=Embryonic development index) abline(EDIHTL) Q-3 I am trying to make a trellis of linear regressions. I want 3 windows displaying 3 events of linear regression (two-way interactions). I have been trying this: require(lattice) library(lattice) trellis.par.set(col.whitebg()) scatterplot(DevelopIndex~Veg|Aeventexhumed, + xlab = Vegetation border, + ylab = Embryonic development index) Error in eval(expr, envir, enclos) : object 'DevelopIndex' not found Please advise, many thanks. Jean -- View this message in context: http://r.789695.n4.nabble.com/Q-scatterplot-plot-function-trellis-linear-regressions-tp4630562.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fwd: Help please
Hi All, This is not really an R question but a statistical one. If someone could either give me the brief explanation or point me to a reference that might help, I'd appreciate it. I want to estimate the mean of a log-normal distribution, given the (log scale normal) parameters mu and sigma squared (sigma2). I understood this should simply be: exp(mu + sigma2) ... but I the following code gives me something strange: R - 1000 mu - -400 sigma2 - 200 tmp - rlnorm(R, mu, sqrt(sigma2)) # a sample from the desired log-normal distribution muh - mean(log(tmp)) sigma2h - var(log(tmp)) #by my understanding, all of the the following vectors should then contain very similar numbers c(mu, muh) c(sigma2, sigma2h) c(exp(mu + sigma2/2), exp(muh + sigma2h/2), mean(tmp)) I get the following (for one sample): c(mu, muh) [1] -400. -400.0231 c(sigma2, sigma2h) [1] 200. 199.5895 c(exp(mu + sigma2/2), exp(muh + sigma2h/2), mean(tmp)) [1] 5.148200e-131 4.097249e-131 5.095888e-150 so they do all contain similar numbers, with the exception of the last vector, which is out by a factor of 10^19. Is this likely to be because one needs **very** large samples to get a reasonable estimate of the mean... or am I missing something? Regards, Simon [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] RWinEdt problem
Hi R Helpers, I am a long time RWinEdt user and have just acquired a new laptop. I have installed RWinEdt and things are going smoothly except for one small glitch - file names are not appearing on the document tabs. When I use WinEdt (as opposed to RWinEdt), they are appearing. Can anyone offer any advice on this? Thanks in advance, Simon Knapp OS: windows7 Arch: 64 bit R version: 2.13.0 (2011-04-13) WinEdt version: 5.14 (build 20050701) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RWinEdt problem
... I just tried fiddling with the appearance settings, and when I uncheck Custom Colors under Document Tabs, the file names reappear, though I don't get the coloring I am used to (red for modified, green for unmodified). Thanks again, Simon Knapp On Mon, Jul 4, 2011 at 2:48 PM, Simon Knapp sleepingw...@gmail.com wrote: Hi R Helpers, I am a long time RWinEdt user and have just acquired a new laptop. I have installed RWinEdt and things are going smoothly except for one small glitch - file names are not appearing on the document tabs. When I use WinEdt (as opposed to RWinEdt), they are appearing. Can anyone offer any advice on this? Thanks in advance, Simon Knapp OS: windows7 Arch: 64 bit R version: 2.13.0 (2011-04-13) WinEdt version: 5.14 (build 20050701) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lapply and list indexing basics (after realizing I wasn't previously subscribed...sorry)
library(MASS) dat - data.frame( col1=as.factor(sample(1:4, 100, T)), col2=as.factor(sample(1:4, 100, T)), col3=as.factor(sample(1:4, 100, T)), isi=rnorm(100) ) dat - split(dat, as.factor(sample(1:3, 100, T))) lapply(dat, function(x, densfun) fitdistr(x$isi, densfun), 'normal') Not used fitdistr before, and without data, I can't help with the error you note below (which may well still occur) Hope this helps. On Mon, Mar 8, 2010 at 1:29 PM, Dgnn sharkbrain...@gmail.com wrote: I have split my original dataframe to generate a list of dataframes each of which has 3 columns of factors and a 4th column of numeric data. I would like to use lapply to apply the fitdistr() function to only the 4th column (x$isi) of the dataframes in the list. Is there a way to do this or am I misusing lapply? As a second solution I tried splitting only the numeric data column to yield a list of vectors and then using lapply(myList, fitdistr, densfun='gamma',start=list(scale=1, shape=2)) returns the error: Error in optim(x = c(305, 290, 283, 363, 331, 293, 304, 312, 286, 339, : non-finite finite-difference value [2] In addition: Warning message: In dgamma(x, shape, scale, log) : NaNs produced However, if I use fitdistr(myList[[i]]) on each element of the list, there are no errors. Thanks in advance for any comments. -jason -- View this message in context: http://n4.nabble.com/lapply-and-list-indexing-basics-after-realizing-I-wasn-t-previously-subscribed-sorry-tp1584053p1584053.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sub-matrixes that are linked to the Base matrix
The following code gives you functionality close to what you are looking for (I think)... but someone more familar with proto might be able to make it sexier. You will need to install the package 'proto' (though with a bit of work you might be able to do this without proto, using environments alone. library(proto) my.mat - function(mat) { structure(proto( expr = { mat - mat set.vals - function(., rs, cs, vals) { if(max(rs) nrow(.$mat) || min(rs) 1) stop(row out of range) if(max(cs) ncol(.$mat) || min(cs) 1) stop(column out of range) .$mat[rs, cs] - vals } get.mat.ref - function(., rows, cols) { if(max(rows) nrow(.$mat) || min(rows) 1) stop(row out of range) if(max(cols) ncol(.$mat) || min(cols) 1) stop(column out of range) tmp - proto(., expr = { set.vals - function(., rs, cs, vals) { if(max(rs) length(.$rows) || min(rs) 1) stop(row out of range) if(max(cs) length(.$cols) || min(cs) 1) stop(column out of range) .super$mat[.$rows[rs], .$cols[cs]] - vals } }) tmp$rows - rows tmp$cols - cols structure(tmp, class=c('my.sub.mat', 'proto', 'environment')) } }), class=c('my.mat', 'proto', 'environment')) } [.my.mat - function(x, ...) get('mat', x)[...] [-.my.sub.mat - [-.my.mat - function(x, i, j, value) {x$set.vals(i, j, value); as.proto(x)} print.my.sub.mat - function(x) print(get('mat',x)[get('rows', x), get('cols', x)]) print.my.mat - function(x) print(get('mat',x)) #You can use this code like: a - my.mat(matrix(1:(4*3),4,3)) b - a$get.mat.ref(2:3, 2:3) b[1,1] - 42 # equivalent to b$set.vals(1,1,42) b[1:2,2] - 43 a[1,1] - 44 a[rep(4, 2), 2:3] - 45 b # print the sub matrix a # print the matrix #test passing sub-matrix to a function test - function(x) x[2,1] - 46 test(b) a # print the matrix again (it is as we would like) # extract the original matrix dat - a$mat Hope this helps, Simon Knapp On Mon, Jan 18, 2010 at 12:36 PM, Friedrich goti...@gmail.com wrote: Hello, I'm am in the process of trying to port a RATS functions to R and have the problem, that RATS allows for the creation of submatrixes that are linked to their basematrix. Basically it should work like this: a = matrix(1:(4*3),4,3) a # [,1] [,2] [,3] # [1,]159 # [2,]26 10 # [3,]37 11 # [4,]48 12 # This of course doesnt work :) b = submatrix(a,fromx=1,tox=2,fromy=1,toy=2) b # [,1] [,2] # [1,]0 10 # [2,]7 11 b[1,1] = 42 a[1,1] # [1] 42 so changes in the sub-matrix propagate to the base matrix. Does such a feature exist? Thanks, __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] change codes into loops
You can do this... # Some random data: b_1 - b_2 - b_3 - matrix(,2,3) for(i in 1:3) eval(substitute(A - matrix(rnorm(6), 2), list(A=paste('a', i, sep='' # the loop for (i in 1:2) { for (j in 1:3) { for(k in 1:3) { eval(substitute(A[i,j] - rank(c(a1[i,j],a2[i,j],a3[i,j]))[k], list(A=as.symbol(paste('b_', k, sep='') } } } # For my own interest, I would have thought that: thingy - expression(A[i,j] - rank(c(a1[i,j],a2[i,j],a3[i,j]))[k]) for (i in 1:2) { for (j in 1:3) { for(k in 1:3) { eval(do.call('substitute', list(expr=thingy, env=list(A=as.symbol(paste('b_', k, sep='')) } } } # would have also worked, but it does not. Can someone please explain why not? Simon Knapp On Tue, Jan 19, 2010 at 1:41 PM, David Winsemius dwinsem...@comcast.netwrote: On Jan 18, 2010, at 9:21 PM, rusers.sh wrote: If the number of datasets for a* is small (here is 3), it is ok for creating b_ijn[i, j, nn] and make assignments to it. But it will be a little bit impossible for a larger number of datasets for a*, say 999. We may need 999 lines to do this. Maybe there are other alternatives. Read more carefully. The b_ijn[ , , ] array can be pre-dimensioned to any size. You must know the size since you are specifying a loop range. 2010/1/18 David Winsemius dwinsem...@comcast.net On Jan 18, 2010, at 7:19 PM, rusers.sh wrote: Hi, See example. for (i in 1:2) { for (j in 1:3) { b_1[i,j]-rank(c(a1[i,j],a2[i,j],a3[i,j]))[1] b_2[i,j]-rank(c(a1[i,j],a2[i,j],a3[i,j]))[2] b_3[i,j]-rank(c(a1[i,j],a2[i,j],a3[i,j]))[3] } } The inner codes is really repeated, so i want to change the inner codes into loops. Take nn is from 1 to 3, something like, for (nn in 1:3) { b_nn[i,j]-rank(c(a1[i,j]:a3[i,j]))[nn] } Anybody can tell me the correct method to specify the above codes? There is no correct method. You cannot index on the object name b_nn that way. R has not been developing using a syntax with that much flexibility. If you want a 3D array of values, then you could create b_ijn[i, j, nn] and make assignments to it. But if you tried to do this with paste and assign, you will spending considerably more time degbugging it than it is worth and it would likely be more inefficient than what you have. -- David. Thanks. -- - Jane Chang Queen's [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD Heritage Laboratories West Hartford, CT -- - Jane Chang Queen's [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help using Cast (Text) Version
Hi Steve, These might help. #generate some example data similar to your original email, with one extra line containing all NAs (first col included to provide similarity with your data) labs - c('4er66', '4gcyi', '5d3hh', '5d3wt', 'v3st5', 'a22g5', 'b5dd3', 'g44d2', 'z') dat - scan() 1 NA 1 0 NA 0 0 0 1 0 0 0 0 0 0 NA 0 0 0 0 0 0 0 0 NA NA 1 NA NA NA NA 0 NA NA NA NA NA 0 NA NA NA NA NA 0 NA NA NA NA NA NA NA NA NA NA data - data.frame(labs=labs, dat=matrix(dat, byrow=T, ncol=6)) # I think this line of code gives what you want (number of rows that contain at least 1 non-na value). sum(apply(data[,-1], 1, function(x) any(!is.na(x # to produce the line (all) in your original output (assuming that this line counts the number of non-na entries in each col): apply(data[,-1], 2, function(x) sum(!is.na(x))) Regards, Simon Knapp On Mon, Jan 18, 2010 at 8:37 AM, Steve Sidney sbsid...@mweb.co.za wrote: Well now I am totally baffled !! Using sum( !is.na(b[,3])) I get the total of all col 3 except those that are NA - Great solves the first problem What I can't seem to do is use the same logic to count all the 1's in that col, which are there before I use the cast with margins. So it seems to me that somehow sum((b[,3]) == 1) is wrong and is the part of my understanding that's missing. My guess is that that before using margins and sum in the cast statement the col is a character type and in order for == 1 to work I need to convert this to an integer. Hope this helps you to understand the problem. Regards Steve Your help is much appreciated - Original Message - From: David Winsemius dwinsem...@comcast.net To: Steve Sidney sbsid...@mweb.co.za Cc: r-help@r-project.org Sent: Sunday, January 17, 2010 7:36 PM Subject: Re: [R] Help using Cast (Text) Version On Jan 17, 2010, at 11:56 AM, Steve Sidney wrote: David Thanks, I'll try that..but no what I need is the total (1's) for each of the rows, labelled 1-6 at the top of each col in the table provided. Part of my confusion with your request (which remains unaddressed) is what you mean by valid. The melt-cast operation has turned a bunch of NA's into 0's which are now indistinguishable from the original 0's. So I don't see any way that operating on b could tell you the numbers you are asking for. If you were working on the original data, res, you might have gotten the column-wise valid counts of column 2 with something like: sum( !is.na(res[,2]) ) What I guess I am not sure of is how to identify the col after the melt and cast. The cast object represents columns as a list of vectors. The i-th column is b[[i]] which could be further referenced as a vector. So the j-th row entry for the i-th column would be b[[i]][j]. Steve - Original Message - From: David Winsemius dwinsem...@comcast.net To: Steve Sidney sbsid...@mweb.co.za Cc: r-help@r-project.org Sent: Sunday, January 17, 2010 4:39 PM Subject: Re: [R] Help using Cast (Text) Version On Jan 17, 2010, at 5:31 AM, Steve Sidney wrote: Sorry to repeat the meassage, not sure if the HTML version has been received - Apologies for duplication Dear list I am trying to count the no of occurances in a column of a data frame and there is missing data identifed by NA. I am able to melt and cast the data correctly as well as sum the occurances using margins and sum. Here are the melt and cast commands bw = melt(res, id=c(lab,r), pf_zbw) b = cast(bw, lab ~ r, sum, margins = T) Sample Data (before using sum and margins) lab 1 2 3 4 5 6 1 4er66 1 NA 1 0 NA 0 2 4gcyi 0 0 1 0 0 0 3 5d3hh 0 0 0 NA 0 0 4 5d3wt 0 0 0 0 0 0 . . lines deleted to save space . 69 v3st5 NA NA 1 NA NA NA 70 a22g5 NA 0 NA NA NA NA 71 b5dd3 NA 0 NA NA NA NA 72 g44d2 NA 0 NA NA NA NA Data after using sum and margins lab 1 2 3 4 5 6 (all) 1 4er66 1 0 1 0 0 0 2 2 4gcyi 0 0 1 0 0 0 1 3 5d3hh 0 0 0 0 0 0 0 4 5d3wt 0 0 0 0 0 0 0 5 6n44r 0 0 0 0 0 0 0 . .lines deleted to save space . 70 a22g5 0 0 0 0 0 0 0 71 b5dd3 0 0 0 0 0 0 0 72 g44d2 0 0 0 0 0 0 0 73 (all) 5 2 4 3 5 726 Uisng length just tells me how many total rows there are. What I need to do is count how many rows there is valid data, in this case either a one (1) or a zero (0) in b I'm guessing that you mean to apply that test to the column in b labeled (all) . If that's the case, then something like (obviously untested): sum( b$'(all)' == 1 | b$'(all)' == 0) I have a report to construct for tomorrow Mon so any help would be appreciated Regards Steve David Winsemius, MD Heritage Laboratories West Hartford, CT David Winsemius, MD Heritage Laboratories West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
Re: [R] coplots with missing values
Never used coplot, but why not try removing the rows with missing variables? Perhaps with: dat - cbind(E2=resid(baea.ancova6, type = normalized), year, population) coplot(E2~year|population, data=dat[apply(dat, 1, function(x) !any(is.na (x))),]) Regards, Simon Knapp On Mon, Jan 18, 2010 at 6:54 AM, LisaB lisaba...@hotmail.com wrote: Hello - I'm trying to construct a coplot to check the residuals conditioned on a factor using the following formula, however I keep getting an error message. I've used the same formula for another data set and it worked fine - the only difference is that now I have missing values for each level (2 levels) of the factor. Is this the problem and how can I fix it. Thank you. E2 - resid(baea.ancova6, type = normalized) coplot(E2~year|population) Error in bad.lengths() : incompatible variable lengths -- View this message in context: http://n4.nabble.com/coplots-with-missing-values-tp1016157p1016157.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] some help regarding combining columns from different files
Hi Hari, You have not given examples of 'list1.bp.files.names' or 'list2.bp.files.names' and hence it is difficult to determine what the code is trying to achieve. I will assume you want to create all pairwise merges of the attached files. If this is indeed what you are doing, the problem is not a problem at all, it is just that the intersection of values of the variables geneDescription between any two files is the empty set, as is demonstrated by the following code snippet (for each merge, TRUE is printed if there is some overlap and FALSE is printed if there is not - a FALSE implies concatenation in the code you have provided) list1.bp.files.names - list2.bp.files.names - c('colNamesgenesplitList1_alcoholMetobolic_genes.txt', 'colNamesgenesplitList1_cellcycle_genes.txt', 'colNamesgenesplitList2_alcoholMetabolism_genes.txt', 'colNamesgenesplitList2_cellcycle_genes.txt') x.col.names - y.col.names - c(genesymbol,geneDescription,orgSymbol,orgName) for (i in 1:length(list1.bp.files.names)){ temp1 - read.table(list1.bp.files.names[i],sep=\t,header=T,stringsAsFactors=F,quote='') for (j in (i+1):length(list2.bp.files.names)){ if(i==length(list2.bp.files.names)) break temp2 - read.table(list2.bp.files.names[j],sep=\t,header=T,stringsAsFactors=F,quote='') cat(any(apply(temp1, 1, function(x, y) x['geneDescription'] %in% y, temp2['geneDescription'])), '\n') } } A tip - there are trailing and/or leading whitespace on the variables 'geneDescription' and 'orgSymbol'. (in the latter case RG is a different value to RG). This is probably going to give different results to what you were expecting. Hope this helps, Simon Knapp On Wed, Jan 13, 2010 at 8:48 AM, Harikrishnadhar hari.bom...@gmail.comwrote: Hi Jim, I am want to merge two files into one file : Here is my code . But the problem with this is that I am getting the 2nd file appended to the first when i write temp3 in my code to the text file. I am not sure what mistake I am doing . also find the test files to run the code . Please help me with this !!! temp1 - NULL temp2 - NULL x.col.names -c(genesymbol,geneDescription,orgSymbol,orgName) y.col.names - c(genesymbol,geneDescription,orgSymbol,orgName) for (i in 1:length(list1.bp.files.names)){ temp1 - read.table(list1.bp.files.names[i],sep=\t,header=T,stringsAsFactors=F,quote=\) for (j in 1:length(list2.bp.files.names)){ temp2 - read.table(list2.bp.files.names[j],sep=\t,header=T,stringsAsFactors=F,quote=\) temp3 - merge(temp1,temp2,by.x = x.col.names,by.y=y.col.names,all=T) myfile-gsub(( ), , paste(1_,merge.bp.files.names[i],.txt)) write.table(temp3,file=myfile,sep=\t,quote=FALSE,row.names=F) } } Thanks --Hari-- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Calculate the percentages of the numbers in every column.
tmp - scan() 0 2 1 0 1 0 2 1 2 3 0 0 0 0 1 0 0 2 3 1 dat - matrix(tmp, byrow=T, ncol=4) apply(dat, 2, function(x, min.val, max.val) { tmp - table(x)/length(x) res - rep(0, max.val - min.val + 1) res[as.numeric(names(tmp)) - min.val + 1] - tmp res }, 0, 3) Should do it (but I bet there is a more elegant way). Regards, Simon Knapp On Wed, Jan 13, 2010 at 5:25 AM, Kelvin 6kelv...@gmail.com wrote: Dear friends, I have a table like this, I have A B C D ... levels, the first column you see is just the index, and there are different numbers in the table. A B C D ... 10 2 1 0 21 0 2 1 32 3 0 0 40 0 1 0 50 2 3 1 ... I want to calculate the frequencies or the percentages of the numbers in every column. How do I get a table like this, the first column is the levels of numbers, and the numbers inside the table are the percentages. All the percentages should add up to 1 in every column. A B C D ... 0 0.2 0.3 0.1 0.1 1 0.1 0.1 0.2 0.1 2 0.1 0.2 0.2 0.2 3 0.2 0.1 0.1 0 ... Thanks your help! Kelvin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Recommended visualization for hierarchical data
Not sure if there are any 'recommended' visualisations, but the following is a start (you will need to tune the scaling of the x-axis label and state identifiers). In the plot produced, the widths of the 'columns' are proportional of purchases occurring in the state, and the height of the 'boxes' are proportional to the purchases occurring in the County - hence the area of each box represents the proportion of total (i.e. national) sales occurring in the County. #generate some random data n - 100 dat - data.frame( State=(tmp - sample(LETTERS, n, T)), County=paste(tmp, 1:n, sep=''), Purchases=floor(rnorm(n, 100, 20)) ) # draw the plot split.dat - split(dat, dat$State) widths - sapply(split.dat, function(x) sum(x$Purchases)) widths - widths / sum(widths) cum.widths - c(0.0, cumsum(widths)[-length(widths)]) plot(0:1, 0:1, type='n', xaxt='n', yaxt='n', xlab='', ylab='') par(usr=c(0,1,0,1), xpd=NA) mapply(function(x, cum.tot, width) { heights - x$Purchases / sum(x$Purchases) cum.heights - c(0.0, cumsum(heights)[-length(heights)]) mapply(function(y, height, x, width) rect(x, y, x+width, y+height), cum.heights, heights, MoreArgs=list(cum.tot, width)) text(cum.tot + width/2, heights/2 + cum.heights, x$County) }, split.dat, cum.widths, widths) axis(1, (widths / 2 + cum.widths), names(split.dat)) Hope this helps, Simon Knapp On Wed, Jan 13, 2010 at 2:46 PM, Rex C. Eastbourne rex.eastbou...@gmail.com wrote: On Tue, Jan 12, 2010 at 5:26 PM, Rex C. Eastbourne rex.eastbou...@gmail.com wrote: Let's say I have data in the following schema that describes the number of purchases a company has received from each County in the US: State | County | Purchases --- NJ | Mercer | 550 CA | Orange | 23 I would like to visualize what states contribute the most to the overall total, and furthermore within those states, what Counties contribute the most. What are some recommended R visualizations for this type of data? I created a treemap using map.market from the portfolio library, like the following: http://zoonek2.free.fr/UNIX/48_R/g126.png Although this is an attractive visual, I want something that makes it easier to compare the relative sizes of components at a glance (hard with a treemap because rectangles have different aspect ratios). Does anyone have a recommended alternate visualization? Thanks! Just to clarify: I made up the above example for simplicity's sake to illustrate what I meant by hierarchical data. My actual data is not related to maps or geography, so a map-based visualization wouldn't work. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] proto question
Dear R Users, I have a couple of proto objects like: wedge - proto(expr={ start.year - 2008 end.year - 2050 }) star.rating - wedge$proto( star = c(4, 5, 8, 10), gain = c(0, .3, .5, .7), cost = c(0, 2100, 4000, 7500), star.rating - function(., year) 6.0, setup = function(.){ .$cost.for.star - approxfun(.$star, .$cost) .$gain.for.star - approxfun(.$star, .$gain) }, test = function(., year) { gs - .$with(gain.for.star)(.$star.rating(year)) } ) And a function to create and modify 'instances': create.star.rating - function(switch.years, star.ratings) { res - star.rating$proto(switch.years = switch.years, star.ratings = star.ratings) res$setup() res } When I use them as follows: ee - create.star.rating(ratings, switch.years) ee$test(2009) The second line gives me the error message: Error in .$with(gain.for.star)(.$star.rating(year)) : object y not found I understand why this happens (when objects are inserted into a proto their environment is set to that of the proto)... but I can't figure out how to get around it! I have tried setting the environments of the functions created in setup as follows: setup = function(.){ t1 - approxfun(.$star, .$cost) t2 - approxfun(.$star, .$gain) .$cost.for.star - t1 .$gain.for.star - t2 environment(.$cost.for.star) - environment(t1) environment(.$gain.for.star) - environment(t2) } But then I get the error: Error in get(gain.for.star, env = `*tmp*`, inherits = TRUE) : object *tmp* not found Which I think occurs because the last two lines in 'setup' only create references to their respective environments which disappear on exit of 'setup'. Any ideas? Regards, Simon Knapp. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] using jackknife in linear models
The definition of jackknife implies that the result of theta must be a numeric scalar. So, presuming you wanted to jackknife a coefficient, theta would need to be something like: theta - function(x, xdata, coefficient) coef(lm(model.lm, data=xdata[x,]))[coefficient] and would be called something like: results - jackknife(1:n,theta,xdata=DF,coefficient=(Intercept)) To do all coefficients, you could write a function like like: jackknife.apply - function(x, xdata, coefs) sapply(coefs, function(coefficient) jackknife(x, theta, xdata=xdata, coefficient=coefficient), simplify=F) and call it like: results - jackknife.apply(1:n, DF, c((Intercept), V1, V2, V3)) which would give you a list with the jackknife output for each coefficient. note that you don't need to attach DF for your code to work :-) Regards, Simon Knapp. On Fri, Dec 19, 2008 at 7:02 AM, Mark Heckmann mark.heckm...@gmx.de wrote: Hi R-experts, I want to use the jackknife function from the bootstrap package onto a linear model. I can't figure out how to do that. The manual says the following: ... So I tried: DF - as.data.frame(matrix(rnorm(250), ncol=5)) attach(DF) model.lm - formula(V1 ~ V2 + V3 + V4) n - 50 theta - function(x, xdata){ lm(model.lm, data=xdata) } results - jackknife(1:n,theta,xdata=DF) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How do I generate one vector for every row of a data frame?
Your code will always generate the same number of samples from each of the normals specified on every call, where the number of samples from each is (roughly) proportional to the weights column. If the weights column in your data frame represents probabilities of draws coming from each distribution, then this behaviour is not correct. Further, it does not guarantee that the sample size is actually n. This definition will work with arbitrary numbers of rows: gmm_data - function(n, data){ rows - sample(1:nrow(data), n, T, dat$weight) rnorm(n, data$mean[rows], data$sd[rows]) } and this one enforces a bit more sanity :-) gmm_data - function(n, data, tol=1e-8){ if(any(data$sd 0)) stop(all of data$sd must be 0) if(any(data$weight 0)) stop(all of data$weight must be 0) wgts - if(abs(sum(data$weight) - 1) tol) { warning(data$weight does not sum to 1 - rescaling) data$weight/sum(data$weight) } else data$weight rows - sample(1:nrow(data), n, T, wgts) rnorm(n, data$mean[rows], data$sd[rows]) } Regards, Simon Knapp. On Fri, Dec 19, 2008 at 4:14 PM, Bill McNeill (UW) bill...@u.washington.edu wrote: I am trying to generate a set of data points from a Gaussian mixture model. My mixture model is represented by a data frame that looks like this: gmm weight mean sd 10.30 1.0 20.2 -2 0.5 30.44 0.7 40.15 0.3 I have written the following function that generates the appropriate data: gmm_data - function(n, gmm) { c(rnorm(n*gmm[1,]$weight, gmm[1,]$mean, gmm[1,]$sd), rnorm(n*gmm[2,]$weight, gmm[2,]$mean, gmm[2,]$sd), rnorm(n*gmm[3,]$weight, gmm[3,]$mean, gmm[3,]$sd), rnorm(n*gmm[4,]$weight, gmm[4,]$mean, gmm[4,]$sd)) } However, the fact that my mixture has four components is hard-coded into this function. A better implementation of gmm_data() would generate data points for an arbitrary number of mixture components (i.e. an arbitrary number of rows in the data frame). How do I do this? I'm sure it's simple, but I can't figure it out. Thanks. -- Bill McNeill http://staff.washington.edu/billmcn/index.shtml __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How do I generate one vector for every row of a data frame?
... actually, the scaling of the weights was not required as it is done by sample anyway. On Fri, Dec 19, 2008 at 5:16 PM, Simon Knapp sleepingw...@gmail.com wrote: Your code will always generate the same number of samples from each of the normals specified on every call, where the number of samples from each is (roughly) proportional to the weights column. If the weights column in your data frame represents probabilities of draws coming from each distribution, then this behaviour is not correct. Further, it does not guarantee that the sample size is actually n. This definition will work with arbitrary numbers of rows: gmm_data - function(n, data){ rows - sample(1:nrow(data), n, T, dat$weight) rnorm(n, data$mean[rows], data$sd[rows]) } and this one enforces a bit more sanity :-) gmm_data - function(n, data, tol=1e-8){ if(any(data$sd 0)) stop(all of data$sd must be 0) if(any(data$weight 0)) stop(all of data$weight must be 0) wgts - if(abs(sum(data$weight) - 1) tol) { warning(data$weight does not sum to 1 - rescaling) data$weight/sum(data$weight) } else data$weight rows - sample(1:nrow(data), n, T, wgts) rnorm(n, data$mean[rows], data$sd[rows]) } Regards, Simon Knapp. On Fri, Dec 19, 2008 at 4:14 PM, Bill McNeill (UW) bill...@u.washington.edu wrote: I am trying to generate a set of data points from a Gaussian mixture model. My mixture model is represented by a data frame that looks like this: gmm weight mean sd 10.30 1.0 20.2 -2 0.5 30.44 0.7 40.15 0.3 I have written the following function that generates the appropriate data: gmm_data - function(n, gmm) { c(rnorm(n*gmm[1,]$weight, gmm[1,]$mean, gmm[1,]$sd), rnorm(n*gmm[2,]$weight, gmm[2,]$mean, gmm[2,]$sd), rnorm(n*gmm[3,]$weight, gmm[3,]$mean, gmm[3,]$sd), rnorm(n*gmm[4,]$weight, gmm[4,]$mean, gmm[4,]$sd)) } However, the fact that my mixture has four components is hard-coded into this function. A better implementation of gmm_data() would generate data points for an arbitrary number of mixture components (i.e. an arbitrary number of rows in the data frame). How do I do this? I'm sure it's simple, but I can't figure it out. Thanks. -- Bill McNeill http://staff.washington.edu/billmcn/index.shtml __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fwd: Categorial Response Questions
Not the same data your using but... # get something like a reasonable # age dist for dummy data: data('mw.ages', package='UsingR') mw.age - mw.ages[2:80,] # size of dummy dataset n.obs - 1000 #pr of survival pr.s - 0.80 # dummy data dat - data.frame( pclass=sample(c('1st', '2nd', '3rd'), n.obs, T), age=sample(1:79, n.obs, T, apply(mw.age, 1, sum)/sum(mw.age)), sex=sample(c('male', 'female'), n.obs, T, apply(mw.age, 2, sum)/sum(mw.age)), survived=sample(0:1, n.obs, T, c(1-pr.s, pr.s)) ) dat$age.group - cut(dat$age, seq(0, 80, 10), right=F) #- # get the logits attach(dat) logits - by(dat, list(age.group, sex, pclass), function(x){ p - sum(x$survived)/nrow(x) p/(1-p) }) detach(dat) #- #- # the class of the object logits is 'by' - # this function munges it into a data frame # (note that 'as.data.frame' is generic) # ARGS # x an object of class 'by' # out.names names you want to use # in the output data frame. as.data.frame.by - function(x, out.names=NULL) { nms - dimnames(x) no.levs - sapply(nms, length) res - as.numeric(x) for(i in 1:length(nms)){ res - data.frame( if(i==1) rep(nms[[i]], times=prod(no.levs[(i+1):length(nms)])) else if(i==length(nms)) rep(nms[[i]], each=prod(no.levs[1:(i-1)])) else rep(nms[[i]], each=prod(no.levs[1:(i-1)]), times=prod(no.levs[(i+1):length(nms)])) , res) } names(res) - out.names res } as.data.frame(logits, c('age.group', 'sex', 'pclass', 'logit')) #- 2008/10/18 andyer weng [EMAIL PROTECTED]: hi all, For my question in the first email below, I found I made a mistake on my coding in the previous email, the one I was trying to type should be grouped.titanic.df-data.frame(group.age.group=sort(unique(titanic.df$age.group)), + expand.grid(sex=sort(unique(titanic.df$sex)),pclass=sort(unique(titanic.df$pclass))), + r=as.vector(tapply(titanic.df$survived,titanic.df$age.group,sum)), + n=as.vector(tapply(titanic.df$survived,titanic.df$age.group,length))) Error in data.frame(group.age.group = sort(unique(titanic.df$age.group)), : arguments imply differing number of rows: 8, 6 please advise what I have done wrong? why the error message come up. Am I doing the right thing to fix the question i mentioned in the first email (the bottom email)? Cheers. Andyer -- Forwarded message -- From: andyer weng [EMAIL PROTECTED] Date: 2008/10/18 Subject: Fwd: Categorial Response Questions To: r-help@r-project.org hi all, me again. i try to type the following coding for my question below, but it comes up a error messgae. please advise whether the way i was trying to do will solve my question stated in the previous email. If so , please advise what is wrong with my coding. (p.s. all the data are stored in xxx.df) grouped.xxx.df-data.frame(group.age.group=sort(unique(xxx.df$age.group)), + expand.grid(sex=c(female,male),age.group=c(0-9,10-19,20-29,30-39,40-49,50-59,60-69,70-79)), + r=tapply(xxx.df$survived,titanic.df$age.group,sum), + n=tapply(xxx.df$survived,titanic.df$age.group,length)) Error in data.frame(group.age.group = sort(unique(xxx.df$age.group)), : arguments imply differing number of rows: 8, 16 In addition: Warning messages: 1: In Ops.factor(left) : + not meaningful for factors 2: In Ops.factor(left) : + not meaningful for factors thanks millions. Regards, Andyer -- Forwarded message -- From: andyer weng [EMAIL PROTECTED] Date: 2008/10/18 Subject: Fwd: Categorial Response Questions To: r-help@r-project.org Sorry Guys, i press the wrong button to send out the uncompleted message. let me do it again. my purpose for below questions is to assess the effect of class, age and sex on the survival. I have a data set containing : pclass: A factor giving the class of the passenger: one of 1st, 2nd, 3rd. age: The age of the passenger in years. sex: Passenger's gender: female or male age.group:Passengers age group, one of 0‐9 , 10‐19, 20‐29, 30‐39, 40‐49, 50‐59, 60‐69,70‐79 survived:Passenger's survival (1=survived, 0=did not survive) Ignoring the variable age, - I need to group the data into groups corresponding to each age‐group/sex/class combination, - I need to compute the logits for each combination. - Make a data frame containing the logits, and the categorical variables. I need to have one line in the data frame for each combination of the factor levels. Can someone please help with the R code for above???!!! Thanks millions!! Cheers Andyer. __ R-help@r-project.org mailing list
Re: [R] R script from Python
'Rterm --help' shows the usage as: Rterm [options] [ infile] [outfile] [EnvVars] just in case you didn't understand what the angle brackets meant: the term [ infile] means read input from 'infile', and the term [ outfile] means write output to 'outfile'. Though your code works at the command line, the spacing appeared to imply that you thought the angle brackets simply 'wrapped' the input file. This syntax doesn't work from python as the angle brackets are OS shell operators but won't be interpreted that way when used to construct a Popen object. the following will do what you want: import subprocess command = 'c:\\Program Files\\R\\R-2.7.2\\bin\\Rterm.exe --vanilla -q -f d:\\test\\run\\geneBank.r' subprocess.Popen(command, stdout=open('d:\\output.out', 'w')).wait() Regards, Simon On Sat, Oct 18, 2008 at 10:28 AM, Tomislav Puđa [EMAIL PROTECTED] wrote: Hi, I'm trying to execute R-script from Python. I'm using R 2.7.2, Python 2.5 and WinXP. I don't won't to use Python/R interface because of nature of project. Python code : import subprocess command = 'c:\\Program Files\\R\\R-2.7.2\\bin\\Rterm.exe --vanilla -q d:\\test\\run\\geneBank.r d:\\output.out' subprocess.Popen(command).wait() After that, I get error messages on Rterm.exe terminal : ARGUMENT 'd:\lloydMax.r' __ignored__ ARGUMENT 'd:\output.txt' __ignored__ Why ? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Please help me in Converting this from C# to R
# bit hard to provide a simple conversion without definitions of the class 'Node', the template 'DirectedGraph' and the function 'Writed'! # I've used the package 'igraph' as a drop in - hope it is still clear. # # by the way: # - your curly braces don't match, # - not all elements of P are initialised before they are used. # # original code (cleaned to make comparison easier). # #Random r = new Random(); #DirectedGraphSimpleNode graph = GetGraph(); #decimal B = 0.1m; #decimal D = 0.05m; #int nodes = graph.NodesCount; #decimal[] E = new decimal[nodes]; #decimal[] P = new decimal[nodes]; # #for (int i = 7; i = 10; ++i) P[i] = (decimal)r.NextDouble(); # #for (int t = 0; t 100; ++t){ #Writed(P, P); # #foreach (SimpleNode n in graph.Nodes) { #int id = graph.index[n]; # #decimal product = 1; #foreach (var item in graph.GetAdjacentNodes(n)){ #int j = graph.index[item]; #product *= (1 - B * P[j]); #} # #E[id] = product; #} # #foreach (SimpleNode n in graph.Nodes){ #int i = graph.index[n]; #P[i] = 1 - ((1 - P[i]) * E[i] + D * (1 - P[i]) * E[i] + 0.5m * D * P[i] * (1 - E[i])); #if (P[i] 0) P[i] = 0; #} #} # #} # # drop-in for your method getGraph (produces a 10 'random' node directed graph). I only assign to temporary so I can use the same 'grph' and 'P' in both implementations. # library(igraph) GetGraph - function() graph.adjacency(matrix(sample(0:1, size=100, replace=T), nrow=10)) grph.t - GetGraph() P.t - runif(nodes) # assume you meant to initialise all elements of P # # IMPLEMENTATON 1. # A 'mirror' implementation. Some of the code relies # on the specifics of package igraph, but I've tried to # be as similar as possible. Hope it still makes sense! # B - 0.1 D - 0.05 grph - grph.t nodes - vcount(grph) E - numeric(nodes) P - P.t for(t in 0:99){ cat('P:', P, '\n')# is this equivalent to 'Writed(P, P)' ??? graph.Nodes - get.adjlist(grph) # returns a list of vectors, where each vector is the nodes a node is connected to. id - 0 # we loop over the vectors and so must index separately for(n in graph.Nodes){ # n is a vector containing the verticies the vertex at index id+1 is connected to. id - id+1 product - 1; for(item in n){ product - product * (1 - B * P[item+1]); # verticies are indexed from 0. no operator*= in R. } E[id] - product; } at - 0 for(i in 1:nodes){ P[i] - 1 - ((1 - P[i]) * E[i] + D * (1 - P[i]) * E[i] + 0.5 * D * P[i] * (1 - E[i])); # we are accessing nodes in order so the indexes are also ordered. if (P[i] 0) P[i] - 0; } } P # print the result # # IMPLEMENTATION 2. # a more 'R-ish' implementation. # B - 0.1 D - 0.05 P - P.t grph - grph.t for(t in 0:99){ E - sapply(get.adjlist(grph), function(node) prod(1-B*P[node+1])) P - 1 - ((1 - P) * E + D * (1 - P) * E + 0.5 * D * P * (1 - E)) } P # print the result __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with misclass function on tree classification
Did you say: library(tree) at the top of your script? On Sun, Sep 14, 2008 at 5:47 PM, Meir Preiszler [EMAIL PROTECTED] wrote: I am working through Tom Minka's lectures on Data Mining and am now on Day 32. The following is the link: http://alumni.media.mit.edu/~tpminka/courses/36-350.2001/lectures/day32/ In order to use the functions cited I followed the instructions as follows: Installed tree package from CRAN mirror (Ca-1) Downloaded and sourced the file tree.r Downloaded the function clus1.r Having defined a tree tr, when I write misclass(tr,x$test) as shown in the link I get an error message that R does not find the function pred1.tree. Is this function included in the tree package? If so it was not in my download. Is this a bug? Do you know of a fix? Thanks for your help Meir clus1.r tree.r Meir Preiszler - Research Engineer I t a m a r M e d i c a l Ltd. Caesarea, Israel: Tel: +(972) 4 617 7000 ext 232 Fax: +(972) 4 627 5598 Cell: +(972) 54 699 9630 Email: [EMAIL PROTECTED] Web: www.Itamar-medical.com * 88---8--- This E-mail is confidential information of Itamar medical Ltd. It may also be legally privileged. If you are not the addressee you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return E-mail. Internet communications cannot be guaranteed to be timely, secure, error or virus-free. The sender does not accept liability for any errors or omissions. Before printing this email , kindly think about the environment. Itamar Medical Ltd. MIS Yan Malgin. 88---8--- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Join data by minimum distance
I am wondering if there is a function which will do a join between 2 data.frames by minimum distance, as it is done in ArcGIS for example. For people who are not familiar with ArcGIS here it is an explanation: Suppose you have a data.frame with x, y, coordinates called track, and a second data frame with different x, y coordinates and some other attributes called classif. The track data.frame has a different number of rows than classif. I want to join the rows from classif to track in such a way that for each row in track I add only the row from classif that has coordinates closest to the coordinates in the track row (and hence minimum distance in between the 2 rows), and also add a new column which will record this minimum distance. Even if the coordinates in the 2 data.frames have same name, the values are not identical between the data.frames, so a merge by column is not what I am after. #--- # get the distance between two points on the globe. # # args: # lat1 - latitude of first point. # long1 - longitude of first point. # lat2 - latitude of first point. # long2 - longitude of first point. # radius - average radius of the earth in km # # see: http://en.wikipedia.org/wiki/Great_circle_distance #--- greatCircleDistance - function(lat1, long1, lat2, long2, radius=6372.795){ sf - pi/180 lat1 - lat1*sf lat2 - lat2*sf long1 - long1*sf long2 - long2*sf lod - abs(long1-long2) radius * atan2( sqrt((cos(lat1)*sin(lod))**2 + (cos(lat2)*sin(lat1)-sin(lat2)*cos(lat1)*cos(lod))**2), sin(lat2)*sin(lat1)+cos(lat2)*cos(lat1)*cos(lod) ) } #--- # Calculate the nearest point using latitude and longitude. # and attach the other args and nearest distance from the # other data.frame. # # args: # x as you describe 'track' # y as you describe 'classif' # xlongnme name of longitude variable in x # xlatnme name of latitude location variable in x # ylongnme name of longitude location variable on y # ylatnme name of latitude location variable on y #--- dist.merge - function(x, y, xlongnme, xlatnme, ylongnme, ylatnme){ tmp - t(apply(x[,c(xlongnme, xlatnme)], 1, function(x, y){ dists - apply(y, 1, function(x, y) greatCircleDistance(x[2], x[1], y[2], y[1]), x) cbind(1:nrow(y), dists)[dists == min(dists),,drop=F][1,] } , y[,c(ylongnme, ylatnme)])) tmp - cbind(x, min.dist=tmp[,2], y[tmp[,1],-match(c(ylongnme, ylatnme), names(y))]) row.names(tmp) - NULL tmp } # demo track - data.frame(xt=runif(10,0,360), yt=rnorm(10,-90, 90)) classif - data.frame(xc=runif(10,0,360), yc=rnorm(10,-90, 90), v1=letters[1:20], v2=1:20) dist.merge(track, classif, 'xt', 'yt', 'xc', 'yc') __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Question on substitute.
I have a data frame of around 4500 rows which contain file name prefixes and other interesting variables. each prefix corresponds to multiple files and a I only have 4300 complete sets of files. If any one of the files is missing for a given prefix, then the function I use fails. The effect is emulated by the function f and data dummy.data: f - function(x) if(runif(1) 0.1) stop('oops') else rnorm(10) dummy.data - data.frame(cbind(paste('a', 1:100, sep=''), matrix(rnorm(9000),nrow=100, ncol=9))) I would like the results as a data.frame and if it weren't for the errors would do: res - apply(dummy.data, 1, function(x) try(f(x))) #... though I wouldn't really need the try would I. #dimnames(res)[[2]] - as.character(dummy.data[,1]) but the errors make res a list and the second line fails. I can do: res - NULL for(i in 1:nrow(dummy.data)) if(class(tmp - try(f(dummy.data[i,]))) != 'try-error') res - cbind(res, tmp) which gives me the correct result, but I don't know which column of res belongs to which row of dummy.data. I can get the desired result with with: #--- desired result res - NULL for(i in 1:nrow(dummy.data)) if(class(tmp - try(f(dummy.data[i,]))) != 'try-error') eval(parse(text=paste(res - cbind(res,, dummy.data[i,1], =tmp), sep=''))) #-- but it is sort of ugly and I was hoping I could do something like: res - NULL for(i in 1:nrow(dummy.data)) if(class(tmp - try(f(dummy.data[i,]))) != 'try-error') res - eval(substitute(cbind(res, a=b), list(a=dummy.data[i,1], b=tmp))) but the a is not being substituted (not that this is prettier, but I'm trying to come to grips with substitute/do.call/calls/expressions/...). I guess that argument names are a different beast to argument values in expressions. My question (finally) is: is there a way of achieving 'desired result', but using 'quote', 'substitute' or any other similar functions/idioms I've not run into yet? Alternatively, is there a completely different way? Any help appreciated, Simon Knapp. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.