Re: [R] glm with variance = mu+theta*mu^2?
Spencer Graves wrote:

> How might you fit a generalized linear model (glm) with variance = mu+theta*mu^2 (where mu = mean of the exponential family random variable and theta is a parameter to be estimated)?
>
> This appears in Table 2.7 of Fahrmeir and Tutz (2001) Multivariate Statistical Modelling Based on Generalized Linear Models, 2nd ed. (Springer, p. 60), where they compare "log-linear model fits to cellular differentiation data based on quasi-likelihoods" between variance = phi*mu (quasi-Poisson), variance = phi*mu^2 (quasi-exponential), and variance = mu+theta*mu^2. The "quasi" function accepted for the family argument in "glm" generates functions "variance", "validmu", and "dev.resids". I can probably write functions to mimic the "quasi" function. However, I have two questions in regard to this:
>
> (1) I don't know what to use for "dev.resids". This may not matter for fitting. I can try a couple of different things to see if it matters.
>
> (2) Might someone else suggest something different, e.g., using something like optim to solve an appropriate quasi-score function?
>
> Thanks,
> spencer graves

Since nobody has answered this, I will try. The variance function mu+theta*mu^2 is the variance function of the negative binomial family. If this variance function is used to construct a quasi-likelihood, the resulting quasi-likelihood is identical to the negative binomial likelihood, so for fitting we can simply use glm.nb from MASS, which will give the correct estimated values. However, in a quasi-likelihood setting the (co)variance estimation from glm.nb is not appropriate, and from the book (Fahrmeir and Tutz) it seems that the estimation method used is a sandwich estimator, so we can try the sandwich package. This works, but the numerical results are somewhat different from the book. Any comments on this?
my code follows:

> library(Fahrmeir)
> library(help=Fahrmeir)
> library(MASS)
> cells.negbin <- glm(y~TNF+IFN+TNF:IFN, data=cells, family=negative.binomial(1/0.215))
> summary(cells.negbin)

Call:
glm(formula = y ~ TNF + IFN + TNF:IFN, family = negative.binomial(1/0.215),
    data = cells)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6714  -0.8301  -0.2153   0.4802   1.4282

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.39874495 0.18791125  18.087  4.5e-10 ***
TNF         0.01616136 0.00360569   4.482  0.00075 ***
IFN         0.00935690 0.00359010   2.606  0.02296 *
TNF:IFN        -0.5910     0.7002  -0.844  0.41515
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(4.6512) family taken to be 1.012271)

    Null deviance: 46.156  on 15  degrees of freedom
Residual deviance: 12.661  on 12  degrees of freedom
AIC: 155.49

Number of Fisher Scoring iterations: 5

> confint(cells.negbin)
Waiting for profiling to be done...
                    2.5 %       97.5 %
(Intercept)  3.0383197319 3.7890206510
TNF          0.0091335087 0.0238915483
IFN          0.0023292566 0.0170195707
TNF:IFN     -0.0001996824 0.960427

> library(sandwich)
Loading required package: zoo
> vcovHC( cells.negbin )
               (Intercept)              TNF              IFN       TNF:IFN
(Intercept)  0.01176249372 -0.0001279740135 -0.0001488223001  0.0212541999
TNF         -0.00012797401  0.039017282      0.021242875     -0.0019793137
IFN         -0.00014882230  0.021242875      0.054314079     -0.0013277626
TNF:IFN      0.0212542     -0.001979314     -0.001327763      0.0002370104

> cov2cor(vcovHC( cells.negbin ))
            (Intercept)        TNF        IFN    TNF:IFN
(Intercept)   1.0000000 -0.5973702 -0.5887923  0.1272950
TNF          -0.5973702  1.0000000  0.4614542 -0.6508822
IFN          -0.5887923  0.4614542  1.0000000 -0.3700671
TNF:IFN       0.1272950 -0.6508822 -0.3700671  1.0000000

> cells.negbin2 <- glm.nb( y~TNF+IFN+TNF:IFN, data=cells)
> summary(cells.negbin)

Call:
glm(formula = y ~ TNF + IFN + TNF:IFN, family = negative.binomial(1/0.215),
    data = cells)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6714  -0.8301  -0.2153   0.4802   1.4282

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.39874495 0.18791125  18.087  4.5e-10 ***
TNF         0.01616136 0.00360569   4.482  0.00075 ***
IFN         0.00935690 0.00359010   2.606  0.02296 *
TNF:IFN        -0.5910     0.7002  -0.844  0.41515
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(4.6512) family taken to be 1.012271)

    Null deviance: 46.156  on 15  degrees of freedom
Residual deviance: 12.661  on 12  degrees of freedom
AIC: 155.49

Number of Fisher Scoring iterations: 5

> confint( cells.negbin2 )
Waiting for profiling to be done...
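The same idea can be sketched on simulated data for readers without the Fahrmeir package's cells data (the data below are invented for illustration). Note the parameterization: glm.nb writes the variance as mu + mu^2/theta, so the theta in the thread's variance = mu + theta*mu^2 corresponds to 1/theta in glm.nb's output; vcovHC from sandwich then gives the robust covariance, as in the post above.

```r
library(MASS)       # glm.nb
library(sandwich)   # vcovHC

## simulated counts (an assumption, standing in for the 'cells' data)
set.seed(1)
x    <- runif(200, 0, 100)
mu   <- exp(1 + 0.02 * x)
size <- 5                        # glm.nb's theta: variance = mu + mu^2/size,
                                 # i.e. the thread's theta is 1/size = 0.2
y <- rnbinom(200, size = size, mu = mu)

fit <- glm.nb(y ~ x)
fit$theta                        # estimated size; 1/fit$theta estimates the thread's theta
sqrt(diag(vcov(fit)))            # model-based standard errors
sqrt(diag(vcovHC(fit)))          # sandwich standard errors, as tried in the post
```

The two sets of standard errors agree only if the negative binomial variance function is correct; under a pure quasi-likelihood view the sandwich column is the safer one.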
Re: [R] Performance difference between 32-bit build and 64-bit build on Solaris 8
> From: Peter Dalgaard
[snip]
> While your setup is in place, you might want to play around with the
> higher optimization levels. GCC on AMD64 sees a quite substantial
> speedup from -O2 to -O3.

On our SLES8 amd64 boxes, I had trouble with g77 -O3 (the build failed). I have not tried with a newer GCC.

Andy

> --
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - ([EMAIL PROTECTED])        FAX: (+45) 35327907
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Error with function lda in package MASS (dimnames not equal?)
I fully understand that this is a volunteer project; I'm a Debian user (not a developer... yet). I have read the posting guide, but I forgot the protocol. First offence, won't happen again. Thanks.

On 6/11/05, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:
> The code will be changed to give a more informative error message in a future release. However, do remember that this is volunteer code, and it is not reasonable to expect volunteers to anticipate that a user will apply it in extremely unlikely circumstances.
>
> If you have a suggestion about code in a contributed package, please send it to the package maintainer (as the posting guide asks).
>
> On Sat, 11 Jun 2005, Joshua Gilbert wrote:
>
> > This is true, they are equal. I hadn't noticed that. Thank you.
> >
> > Now, if lda fails on this given input (equal means), shouldn't we catch it and give a slightly better error message? I've spent a good while going through the debugging process with lda.default. From that perspective it appears that there is a simple change to remove the problem. I am not saying that it is correct in any shape or form, but there is a point where a single transpose would silence the error.
> >
> > So, from a usability standpoint, could we add a check for equal means between classes and throw an error for that specific condition? Yes, the user should not do that. But users may become more interested in making the code run than checking on whether it's doing anything sane.
> >
> > If this isn't the place to do so, tell me. But, I'd like to petition to alter the code of lda.default.
> >
> > On 6/10/05, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:
> >> lda.default <- MASS:::lda.default and proceed.
> >>
> >> Look at the group means of your data: they are identical to machine accuracy.
> >>
> >> The question has to be `why are you trying to use lda to separate two groups with identical means'? Lda is not protected against that and it is rather unlikely unless you failed to inspect your data in any way.
> >>
> >> On Fri, 10 Jun 2005, Joshua Gilbert wrote:
> >>
> >>> This question appears to have been asked previously, but not answered. The last response I can find to this previous thread is here: http://tolstoy.newcastle.edu.au/R/help/04/07/0126.html. The answer was to provide debugging info, not an answer.
> >>>
> >>> So the problem is that I'm trying to use lda on my dataset. You can download my data here: http://northstar-www.dartmouth.edu/~jgilbert/nolda. I used R's save function to save objects data and classes (yes, I realize that I name-stomped the data function in package utils). To replicate my results, simply enter the following:
> >>> > library(MASS)
> >>> > load('nolda')
> >>> > lda(data, classes)
> >>> Error in lda.default(x, grouping, ...) : length of 'dimnames' [2] not equal to array extent
> >>>
> >>> Now, I don't know what that means.
> >>> > dimnames(data)
> >>> NULL
> >>> > dimnames(classes)
> >>> NULL
> >>>
> >>> As for debugging, I don't know how. I cannot debug lda.default as I get the following:
> >>> > debug(lda.default)
> >>> Error: Object "lda.default" not found
> >>>
> >>> I think that that's pretty much it. Can anyone help me?
>
> --
> Brian D. Ripley, [EMAIL PROTECTED]
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road,       +44 1865 272866 (PA)
> Oxford OX1 3TG, UK   Fax: +44 1865 272595
Re: [R] Performance difference between 32-bit build and 64-bit build on Solaris 8
On 6/11/05, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:
> For the record, I timed 1000x1000 not 3000x3000 (and said so). I was not proposing to spend several hours running timings at ca 2200s each, not least as I used a public machine with a ban on running long jobs (we have other much faster machines for that purpose).

My mistake. For the 1000x1000 problem on my system, I got the following:

32-bit
[1] 23.37  0.03 23.41  0.00  0.00
64-bit
[1] 80.99  0.02 81.03  0.00  0.00

So it looks like I've still got some work to do. Thanks for the suggestions.
Re: [R] Performance difference between 32-bit build and 64-bit build on Solaris 8
On Sat, 11 Jun 2005, Peter Dalgaard wrote:

> Scott Gilpin <[EMAIL PROTECTED]> writes:
>
>> Andy, Prof. Ripley - thanks for your replies.
>>
>>> CFLAGS was not specified, so should default to "-g -O2". It is definitely worth checking, although almost all the time in these tests will be spent in Fortran code.
>> Yes - I verified that's the default.
>>
>>>> neither build uses a BLAS.
>>>
>>> Well, they do, the one supplied with R (which is in Fortran).
>>
>> I guess I should have said that neither build is using an optimized BLAS (which I am planning to install - I just haven't had the chance yet). I also get a message in config.log "ld: fatal: library -lblas: not found". I need to investigate this more.
>
> Actually, you did say so...
>
> Don't worry about error messages like that in config.log; they only mean that there is no system BLAS and the only way to find out is by trial and failure. The configure script is full of that sort of code.
>
>>> and for gcc 3.4.3 and the internal BLAS (which I had to build for these tests) I get
>>>
>>> 32-bit
>>> [1] 9.96 0.09 10.12 0.00 0.00
>>> 64-bit
>>> [1] 9.93 0.04 10.04 0.00 0.00
>>>
>>> so I am not seeing anything like the same performance difference between 32- and 64-bit builds (but it could well depend on the particular Sparc chip).
>>
>> These timings are much, much less than what I reported (~700s and 2200s for 32 bit and 64 bit). I read the admin manual and didn't see anything specifically that needs to be set to use the internal BLAS. I guess I'll go back and do some more investigation.

For the record, I timed 1000x1000 not 3000x3000 (and said so). I was not proposing to spend several hours running timings at ca 2200s each, not least as I used a public machine with a ban on running long jobs (we have other much faster machines for that purpose).

> Yes. If you have hardcore linear algebra needs, those fast-BLAS speedups can be impressive (Brian might also have a faster machine than you, mind you). For code that is mainly interpreter-bound, it is much less so.
>
> While your setup is in place, you might want to play around with the higher optimization levels. GCC on AMD64 sees a quite substantial speedup from -O2 to -O3.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road,       +44 1865 272866 (PA)
Oxford OX1 3TG, UK   Fax: +44 1865 272595
[R] predict function for GLMM
I use "predict" for predictions from glm. I am wondering if there is a "predict" function for predictions from the results of a GLMM model? Thanks in advance!

Weihong Li
Undergraduate Student in Statistics
University of Alberta
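It depends on which GLMM fitter was used. A minimal sketch, assuming the model was fitted with glmmPQL() from MASS (which does provide a predict method; other GLMM fitters have their own, if any), using the bacteria data shipped with MASS:

```r
library(MASS)   # glmmPQL and the bacteria data
library(nlme)   # glmmPQL fits via lme

## binomial GLMM with a random intercept per subject
fit <- glmmPQL(y ~ trt + I(week > 2), random = ~ 1 | ID,
               family = binomial, data = bacteria, verbose = FALSE)

## population-level predictions (random effects set to zero): level = 0
p.pop <- predict(fit, type = "response", level = 0)

## subject-specific predictions including the random intercepts: level = 1
p.ind <- predict(fit, type = "response", level = 1)

head(p.pop)
```

Supplying newdata works as for predict.lme; for population-level predictions on new subjects, level = 0 avoids needing their random effects.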
Re: [R] Error with function lda in package MASS (dimnames not equal?)
The code will be changed to give a more informative error message in a future release. However, do remember that this is volunteer code, and it is not reasonable to expect volunteers to anticipate that a user will apply it in extremely unlikely circumstances.

If you have a suggestion about code in a contributed package, please send it to the package maintainer (as the posting guide asks).

On Sat, 11 Jun 2005, Joshua Gilbert wrote:

> This is true, they are equal. I hadn't noticed that. Thank you.
>
> Now, if lda fails on this given input (equal means), shouldn't we catch it and give a slightly better error message? I've spent a good while going through the debugging process with lda.default. From that perspective it appears that there is a simple change to remove the problem. I am not saying that it is correct in any shape or form, but there is a point where a single transpose would silence the error.
>
> So, from a usability standpoint, could we add a check for equal means between classes and throw an error for that specific condition? Yes, the user should not do that. But users may become more interested in making the code run than checking on whether it's doing anything sane.
>
> If this isn't the place to do so, tell me. But, I'd like to petition to alter the code of lda.default.
>
> On 6/10/05, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:
>> lda.default <- MASS:::lda.default and proceed.
>>
>> Look at the group means of your data: they are identical to machine accuracy.
>>
>> The question has to be `why are you trying to use lda to separate two groups with identical means'? Lda is not protected against that and it is rather unlikely unless you failed to inspect your data in any way.
>>
>> On Fri, 10 Jun 2005, Joshua Gilbert wrote:
>>
>>> This question appears to have been asked previously, but not answered. The last response I can find to this previous thread is here: http://tolstoy.newcastle.edu.au/R/help/04/07/0126.html. The answer was to provide debugging info, not an answer.
>>>
>>> So the problem is that I'm trying to use lda on my dataset. You can download my data here: http://northstar-www.dartmouth.edu/~jgilbert/nolda. I used R's save function to save objects data and classes (yes, I realize that I name-stomped the data function in package utils). To replicate my results, simply enter the following:
>>> library(MASS)
>>> load('nolda')
>>> lda(data, classes)
>>> Error in lda.default(x, grouping, ...) : length of 'dimnames' [2] not equal to array extent
>>>
>>> Now, I don't know what that means.
>>> dimnames(data)
>>> NULL
>>> dimnames(classes)
>>> NULL
>>>
>>> As for debugging, I don't know how. I cannot debug lda.default as I get the following:
>>> debug(lda.default)
>>> Error: Object "lda.default" not found
>>>
>>> I think that that's pretty much it. Can anyone help me?

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road,       +44 1865 272866 (PA)
Oxford OX1 3TG, UK   Fax: +44 1865 272595
Re: [R] Problem with multinom ?
Dear Marc,

> -----Original Message-----
> From: Marc Girondot [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, June 11, 2005 2:16 PM
> To: John Fox
> Cc: [EMAIL PROTECTED]
> Subject: RE: [R] Problem with multinom ?
>
> > Dear Marc,
> >
> > I get the same results -- same coefficients, standard errors, and fitted probabilities -- from multinom() and glm(). It's true that the deviances differ, but they, I believe, are defined only up to an additive constant:
> >
> > > predict(dt.b, type="response")
> >         1         2         3
> > 0.4524946 0.5032227 0.5538845
> >
> > > predict(dt.m, type="probs")
> >         1         2         3         4         5         6
> > 0.4524948 0.4524948 0.5032229 0.5032229 0.5538846 0.5538846
> >
> > These fitted probabilities *are* correct: since the members of each pair (1,2), (3,4), and (5,6) have identical values of factor, they have identical fitted probabilities.
>
> I expected rather to obtain (note the 1- before some terms):
>
>         1           2         3           4         5           6
> 0.4524948 1-0.4524948 0.5032229 1-0.5032229 0.5538846 1-0.5538846
>
> ...

But the fitted probabilities are for each observation in the data set; in your data, these have identical values of factor for each pair and hence identical fitted probabilities. When the response is dichotomous, you get only one of the two fitted probabilities for each observation; for a polytomous response, you get a matrix of fitted probabilities which sums to 1 row-wise.

Regards,
John

> Marc
> --
> Marc Girondot, Pr
> Laboratoire Ecologie, Systématique et Evolution
> Equipe de Conservation des Populations et des Communautés
> CNRS, ENGREF et Université Paris-Sud 11, UMR 8079
> Bâtiment 362, 91405 Orsay Cedex, France
> Tel: 33 1 (0)1.69.15.72.30   Fax: 33 1 (0)1 69 15 56 96
> e-mail: [EMAIL PROTECTED]
> Web: http://www.ese.u-psud.fr/epc/conservation/Marc.html
> Skype: girondot   Fax in US: 1-425-732-6934
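John's point about the shape of multinom() fitted probabilities can be checked on any polytomous example; a minimal sketch using the built-in iris data (an illustration, not the poster's dt.m model):

```r
library(nnet)   # multinom

## three-class response: predict(..., type = "probs") returns a matrix
fit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris,
                trace = FALSE)
p <- predict(fit, type = "probs")

dim(p)            # one row per observation, one column per class
rowSums(p)[1:3]   # each row of fitted probabilities sums to 1
```

With only two classes, multinom() collapses this to a single vector of probabilities for the second class, which is the dichotomous behaviour described above.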
Re: [R] Performance difference between 32-bit build and 64-bit build on Solaris 8
Scott Gilpin <[EMAIL PROTECTED]> writes:

> Andy, Prof. Ripley - thanks for your replies.
>
> > CFLAGS was not specified, so should default to "-g -O2". It is definitely worth checking, although almost all the time in these tests will be spent in Fortran code.
> Yes - I verified that's the default.
>
> >> neither build uses a BLAS.
> >
> > Well, they do, the one supplied with R (which is in Fortran).
>
> I guess I should have said that neither build is using an optimized BLAS (which I am planning to install - I just haven't had the chance yet). I also get a message in config.log "ld: fatal: library -lblas: not found". I need to investigate this more.

Actually, you did say so...

Don't worry about error messages like that in config.log; they only mean that there is no system BLAS and the only way to find out is by trial and failure. The configure script is full of that sort of code.

> > and for gcc 3.4.3 and the internal BLAS (which I had to build for these tests) I get
> >
> > 32-bit
> > [1] 9.96 0.09 10.12 0.00 0.00
> > 64-bit
> > [1] 9.93 0.04 10.04 0.00 0.00
> >
> > so I am not seeing anything like the same performance difference between 32- and 64-bit builds (but it could well depend on the particular Sparc chip).
>
> These timings are much, much less than what I reported (~700s and 2200s for 32 bit and 64 bit). I read the admin manual and didn't see anything specifically that needs to be set to use the internal BLAS. I guess I'll go back and do some more investigation.

Yes. If you have hardcore linear algebra needs, those fast-BLAS speedups can be impressive (Brian might also have a faster machine than you, mind you). For code that is mainly interpreter-bound, it is much less so.

While your setup is in place, you might want to play around with the higher optimization levels. GCC on AMD64 sees a quite substantial speedup from -O2 to -O3.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])        FAX: (+45) 35327907
Re: [R] combination which limited
On 6/11/05, Marc Schwartz <[EMAIL PROTECTED]> wrote:
> On Sat, 2005-06-11 at 20:44 +0200, Muhammad Subianto wrote:
> > Dear R-helpers,
> > I am learning about combination in R. I want to combine all possible
> > variables, but only a limited set of them. I am sorry I could not
> > explain exactly. To be useful I give an example:
> >
> > interface <- c("usb","fireware","infra","bluetooth")
> > screen    <- c("lcd","cube")
> > computer  <- c("pc","server","laptop")
> > available <- c("yes","no")
> >
> > What the result I need, something like this below:
> > usb       lcd  pc     yes
> > fireware  lcd  pc     yes
> > infra     lcd  pc     yes
> > bluetooth lcd  pc     yes
> > usb       cube pc     yes
> > usb       lcd  server yes
> > usb       lcd  laptop yes
> > usb       lcd  pc     no
> >
> > How can I do that?
> > I was wondering if someone can help me.
> > Thank you for your time and best regards,
> > Muhammad Subianto
>
> Use:
>
> > expand.grid(interface, screen, computer, available)
>         Var1 Var2   Var3 Var4
> 1        usb  lcd     pc  yes
> 2   fireware  lcd     pc  yes
> 3      infra  lcd     pc  yes
> 4  bluetooth  lcd     pc  yes
> 5        usb cube     pc  yes
> 6   fireware cube     pc  yes
> 7      infra cube     pc  yes
> 8  bluetooth cube     pc  yes
> 9        usb  lcd server  yes
> 10  fireware  lcd server  yes
> 11     infra  lcd server  yes
> 12 bluetooth  lcd server  yes
> 13       usb cube server  yes
> 14  fireware cube server  yes
> 15     infra cube server  yes
> 16 bluetooth cube server  yes
> 17       usb  lcd laptop  yes
> 18  fireware  lcd laptop  yes
> 19     infra  lcd laptop  yes
> 20 bluetooth  lcd laptop  yes
> 21       usb cube laptop  yes
> 22  fireware cube laptop  yes
> 23     infra cube laptop  yes
> 24 bluetooth cube laptop  yes
> 25       usb  lcd     pc   no
> 26  fireware  lcd     pc   no
> 27     infra  lcd     pc   no
> 28 bluetooth  lcd     pc   no
> 29       usb cube     pc   no
> 30  fireware cube     pc   no
> 31     infra cube     pc   no
> 32 bluetooth cube     pc   no
> 33       usb  lcd server   no
> 34  fireware  lcd server   no
> 35     infra  lcd server   no
> 36 bluetooth  lcd server   no
> 37       usb cube server   no
> 38  fireware cube server   no
> 39     infra cube server   no
> 40 bluetooth cube server   no
> 41       usb  lcd laptop   no
> 42  fireware  lcd laptop   no
> 43     infra  lcd laptop   no
> 44 bluetooth  lcd laptop   no
> 45       usb cube laptop   no
> 46  fireware cube laptop   no
> 47     infra cube laptop   no
> 48 bluetooth cube laptop   no
>
> See ?expand.grid for more information.

After you do the above you will still want to cut it down to just the rows you need. As explained, use expand.grid. Let's assume you used this statement:

dd <- expand.grid(interface = interface, screen = screen,
                  computer = computer, available = available)

There are several possibilities now:

1. You could list out dd on the console and note the numbers of the rows you want to keep:

   idx <- c(1, 5, 7)
   dd2 <- dd[idx, ]

   or, if you want most of them, it may be easier to record which ones you do not want:

   ndix <- c(2, 4, 7)
   dd2 <- dd[-ndix, ]

   (Note that the row index goes before the comma.)

2. Another possibility is to export it to a spreadsheet and visually delete the rows you don't want.

3. A third possibility is to install JGR (which is a free Java GUI front end to R). First download and install JGR from: http://stats.math.uni-augsburg.de/JGR/
   In JGR (I am using Windows and it's possible that the instructions vary slightly on other platforms):
   1. Create dd as explained.
   2. Bring up the object browser using the menu Tools | Object Browser, or just ctrl-B.
   3. Select dd from the object browser.
   4. This will put you into a spreadsheet in which you can select the rows you want to delete (hold down ctrl for the second and subsequent selections to have a non-contiguous multi-row selection).
   5. Select Tools | Remove Rows.
   6. Click on Apply in the lower right of the spreadsheet.
   7. Click on X on the upper right of the spreadsheet.
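Option 1 above can be sketched as a self-contained session, using the poster's vectors ("fireware" kept as spelled in the original). Row selection by condition with subset() is also shown, since in practice the rows wanted usually share a pattern rather than arbitrary numbers:

```r
interface <- c("usb", "fireware", "infra", "bluetooth")
screen    <- c("lcd", "cube")
computer  <- c("pc", "server", "laptop")
available <- c("yes", "no")

dd <- expand.grid(interface = interface, screen = screen,
                  computer = computer, available = available)
nrow(dd)                         # 4 * 2 * 3 * 2 = 48 combinations

## keep selected rows by number (row index goes before the comma)
dd2 <- dd[c(1, 5, 9), ]

## or keep rows satisfying a condition instead of picking numbers by eye
dd3 <- subset(dd, screen == "lcd" & available == "yes")
nrow(dd3)                        # 4 interfaces * 3 computers = 12
```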
Re: [R] Problem with multinom ?
> Dear Marc,
>
> I get the same results -- same coefficients, standard errors, and fitted probabilities -- from multinom() and glm(). It's true that the deviances differ, but they, I believe, are defined only up to an additive constant:
>
> > predict(dt.b, type="response")
>         1         2         3
> 0.4524946 0.5032227 0.5538845
>
> > predict(dt.m, type="probs")
>         1         2         3         4         5         6
> 0.4524948 0.4524948 0.5032229 0.5032229 0.5538846 0.5538846
>
> These fitted probabilities *are* correct: since the members of each pair (1,2), (3,4), and (5,6) have identical values of factor, they have identical fitted probabilities.

I expected rather to obtain (note the 1- before some terms):

        1           2         3           4         5           6
0.4524948 1-0.4524948 0.5032229 1-0.5032229 0.5538846 1-0.5538846

...

Marc
--
Marc Girondot, Pr
Laboratoire Ecologie, Systématique et Evolution
Equipe de Conservation des Populations et des Communautés
CNRS, ENGREF et Université Paris-Sud 11, UMR 8079
Bâtiment 362, 91405 Orsay Cedex, France
Tel: 33 1 (0)1.69.15.72.30   Fax: 33 1 (0)1 69 15 56 96
e-mail: [EMAIL PROTECTED]
Web: http://www.ese.u-psud.fr/epc/conservation/Marc.html
Skype: girondot   Fax in US: 1-425-732-6934
Re: [R] Problem with multinom ?
> On Sat, 11 Jun 2005, John Fox wrote:
>
> > Dear Marc,
> >
> > I get the same results -- same coefficients, standard errors, and fitted probabilities -- from multinom() and glm(). It's true that the deviances differ, but they, I believe, are defined only up to an additive constant:
>
> Yes. There are many variations on the definition of (residual) deviance, but it compares -2 log likelihood with a `saturated' model. For grouped data you have a choice: a separate term for each group or for each observation. A binomial GLM uses the first but the second is more normal in logistic regression (since it has a direct interpretation via log-probability scoring).
>
> multinom() is support software for a book (which the R posting guide does ask you to consult): this is discussed with a worked example on pp 203-4.

Dear Prof. Ripley,

I have your book... but I don't find the answer to my questions... You propose that the difference in residual deviance between two versions of the same model (0.001841823 for glm and 106.2304 for multinom()) is due to a difference in the specification of the saturated model. However, as RD = -2 ln L(model) + 2 ln L(saturated), and as -2 ln L(model) = 11.1146..., it seems impossible to me that RD > -2 ln L(model)...

Marc Girondot

Sorry to be so close-minded!

--
Marc Girondot, Pr
Laboratoire Ecologie, Systématique et Evolution
Equipe de Conservation des Populations et des Communautés
CNRS, ENGREF et Université Paris-Sud 11, UMR 8079
Bâtiment 362, 91405 Orsay Cedex, France
Tel: 33 1 (0)1.69.15.72.30   Fax: 33 1 (0)1 69 15 56 96
e-mail: [EMAIL PROTECTED]
Web: http://www.ese.u-psud.fr/epc/conservation/Marc.html
Skype: girondot   Fax in US: 1-425-732-6934
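The grouped-versus-ungrouped distinction behind this disagreement can be seen in a small binomial simulation (invented data, not the poster's model): fitting the same logistic regression to binary responses and to the grouped counts gives identical coefficients, but the residual deviances differ by a data-dependent constant, because the two saturated models differ (one term per observation versus one per covariate pattern).

```r
set.seed(42)
x <- rep(1:3, each = 20)
y <- rbinom(60, 1, plogis(-1 + 0.5 * x))

## ungrouped fit: saturated model has one term per observation
fit.binary <- glm(y ~ x, family = binomial)

## grouped fit: saturated model has one term per covariate pattern
succ <- tapply(y, x, sum)
tot  <- tapply(y, x, length)
xg   <- as.numeric(names(succ))
fit.grouped <- glm(cbind(succ, tot - succ) ~ xg, family = binomial)

## same fit, different deviance baselines
all.equal(unname(coef(fit.binary)), unname(coef(fit.grouped)),
          tolerance = 1e-6)
deviance(fit.binary) - deviance(fit.grouped)   # the additive constant
```

The difference is the entropy-like term 2*sum over groups of [s*log(n/s) + (n-s)*log(n/(n-s))], which is nonnegative, so the ungrouped deviance is never smaller than the grouped one.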
Re: [R] combination which limited
On Sat, 2005-06-11 at 20:44 +0200, Muhammad Subianto wrote:
> Dear R-helpers,
> I am learning about combination in R. I want to combine all possible
> variables, but only a limited set of them. I am sorry I could not
> explain exactly. To be useful I give an example:
>
> interface <- c("usb","fireware","infra","bluetooth")
> screen    <- c("lcd","cube")
> computer  <- c("pc","server","laptop")
> available <- c("yes","no")
>
> What the result I need, something like this below:
> usb       lcd  pc     yes
> fireware  lcd  pc     yes
> infra     lcd  pc     yes
> bluetooth lcd  pc     yes
> usb       cube pc     yes
> usb       lcd  server yes
> usb       lcd  laptop yes
> usb       lcd  pc     no
>
> How can I do that?
> I was wondering if someone can help me.
> Thank you for your time and best regards,
> Muhammad Subianto

Use:

> expand.grid(interface, screen, computer, available)
        Var1 Var2   Var3 Var4
1        usb  lcd     pc  yes
2   fireware  lcd     pc  yes
3      infra  lcd     pc  yes
4  bluetooth  lcd     pc  yes
5        usb cube     pc  yes
6   fireware cube     pc  yes
7      infra cube     pc  yes
8  bluetooth cube     pc  yes
9        usb  lcd server  yes
10  fireware  lcd server  yes
11     infra  lcd server  yes
12 bluetooth  lcd server  yes
13       usb cube server  yes
14  fireware cube server  yes
15     infra cube server  yes
16 bluetooth cube server  yes
17       usb  lcd laptop  yes
18  fireware  lcd laptop  yes
19     infra  lcd laptop  yes
20 bluetooth  lcd laptop  yes
21       usb cube laptop  yes
22  fireware cube laptop  yes
23     infra cube laptop  yes
24 bluetooth cube laptop  yes
25       usb  lcd     pc   no
26  fireware  lcd     pc   no
27     infra  lcd     pc   no
28 bluetooth  lcd     pc   no
29       usb cube     pc   no
30  fireware cube     pc   no
31     infra cube     pc   no
32 bluetooth cube     pc   no
33       usb  lcd server   no
34  fireware  lcd server   no
35     infra  lcd server   no
36 bluetooth  lcd server   no
37       usb cube server   no
38  fireware cube server   no
39     infra cube server   no
40 bluetooth cube server   no
41       usb  lcd laptop   no
42  fireware  lcd laptop   no
43     infra  lcd laptop   no
44 bluetooth  lcd laptop   no
45       usb cube laptop   no
46  fireware cube laptop   no
47     infra cube laptop   no
48 bluetooth cube laptop   no

See ?expand.grid for more information.

Using:

> help.search("combinations")

would guide you to that function based upon a keyword search.

HTH,

Marc Schwartz
Re: [R] Error with function lda in package MASS (dimnames not equal?)
--- Joshua Gilbert <[EMAIL PROTECTED]> wrote:

> If this isn't the place to do so, tell me. But, I'd like to petition
> to alter the code of lda.default.

It seems to me that if you want to alter the code of lda.default, you have everything you need to do so. The code is there, it is GPL'd, and you can change it to your heart's content. But if you are suggesting that the lda function be changed for everyone, I'd be personally opposed to doing so. It behaves the way it should behave, and if you're looking for a way to save yourself from the way it works, then make the change on a local copy of lda and you'll have solved the problem for you. For me, I vote for the status quo.

Dr. Marc R Feldesman
Professor & Chair Emeritus
Department of Anthropology
Portland State University
Portland, OR 97207

Please respond to all emails at: [EMAIL PROTECTED]

"I've proven who I am so many times, the magnetic strip is wearing thin"
[R] combination which limited
Dear R-helpers,
I am learning about combinations in R. I want to generate all possible
combinations of several variables. I am sorry I could not explain it
exactly, so to be useful I give an example:
interface <- c("usb","fireware","infra","bluetooth")
screen <- c("lcd","cube")
computer <- c("pc","server","laptop")
available <- c("yes","no")

The result I need is something like this:
usb       lcd  pc     yes
fireware  lcd  pc     yes
infra     lcd  pc     yes
bluetooth lcd  pc     yes
usb       cube pc     yes
usb       lcd  server yes
usb       lcd  laptop yes
usb       lcd  pc     no

How can I do that?
I was wondering if someone could help me.
Thank you for your time and best regards,
Muhammad Subianto
Re: [R] Error with function lda in package MASS (dimnames not equal?)
This is true, they are equal. I hadn't noticed that. Thank you.

Now, if lda fails on this given input (equal means), shouldn't we catch it and give a slightly better error message? I've spent a good while going through the debugging process with lda.default. From that perspective it appears that there is a simple change to remove the problem. I am not saying that it is correct in any shape or form, but there is a point where a single transpose would silence the error. So, from a usability standpoint, could we add a check for equal means between classes and throw an error for that specific condition? Yes, the user should not do that. But users may become more interested in making the code run than in checking whether it's doing anything sane.

If this isn't the place to do so, tell me. But I'd like to petition to alter the code of lda.default.

On 6/10/05, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:
> lda.default <- MASS:::lda.default and proceed.
>
> Look at the group means of your data: they are identical to machine
> accuracy.
>
> The question has to be `why are you trying to use lda to separate
> two groups with identical means'? lda() is not protected against that,
> and it is rather unlikely unless you failed to inspect your data in any
> way.
>
> On Fri, 10 Jun 2005, Joshua Gilbert wrote:
>
> > This question appears to have been asked previously, but not answered.
> > The last response I can find to this previous thread is here:
> > http://tolstoy.newcastle.edu.au/R/help/04/07/0126.html. The answer was
> > to provide debugging info, not an answer.
> >
> > So the problem is that I'm trying to use lda on my dataset. You can
> > download my data here:
> > http://northstar-www.dartmouth.edu/~jgilbert/nolda, I used R's save
> > function to save objects data and classes (yes, I realize that I name
> > stomped the data function in package utils).
To replicate my results,
> > simply enter the following:
> >> library(MASS)
> >> load('nolda')
> >> lda(data,classes)
> > Error in lda.default(x, grouping, ...) : length of 'dimnames' [2] not
> > equal to array extent
> >
> > Now, I don't know what that means.
> >> dimnames(data)
> > NULL
> >> dimnames(classes)
> > NULL
> >
> > As for debugging, I don't know how. I cannot debug lda.default as I
> > get the following:
> >> debug(lda.default)
> > Error: Object "lda.default" not found
> >
> > I think that's pretty much it. Can anyone help me?
>
> --
> Brian D. Ripley, [EMAIL PROTECTED]
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,     Tel: +44 1865 272861 (self)
> 1 South Parks Road,            +44 1865 272866 (PA)
> Oxford OX1 3TG, UK        Fax: +44 1865 272595
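Prof. Ripley's diagnosis — group means identical to machine accuracy — can be checked before ever calling lda(). A minimal base-R sketch; the data here are fabricated to stand in for the poster's 'nolda' objects:

```r
# Fabricated stand-in for the poster's data: two groups whose column means
# are identical to machine accuracy (the situation diagnosed in this thread).
set.seed(1)
x <- matrix(rnorm(40), 20, 2)
classes <- rep(c("a", "b"), each = 10)
x[classes == "b", ] <- x[classes == "a", ]   # force identical group means

# Pre-flight check: compare per-class column means before running lda()
grp.means <- apply(x, 2, function(col) tapply(col, classes, mean))
sep <- max(abs(grp.means["a", ] - grp.means["b", ]))
if (sep < sqrt(.Machine$double.eps))
  warning("group means are (numerically) identical; lda() cannot separate them")
```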
Re: [R] Performance difference between 32-bit build and 64-bit build on Solaris 8
Andy, Prof. Ripley - thanks for your replies. > CFLAGS was not specified, so should default to "-g -O2". It is definitely > worth checking, although almost all the time in these tests will be spent > in Fortran code. Yes - I verified that's the default. > >> neither build uses a BLAS. > > Well, they do, the one supplied with R (which is in Fortran). I guess I should have said that neither build is using an optimized BLAS. (which I am planning to install - I just haven't had the chance yet.) I also get a message in config.log "ld: fatal: library -lblas: not found". I need to investigate this more. > and for gcc 3.4.3 and the internal BLAS (which I had to build for these > tests) I get > > 32-bit > [1] 9.96 0.09 10.12 0.00 0.00 > 64-bit > [1] 9.93 0.04 10.04 0.00 0.00 > > so I am not seeing anything like the same performance difference between > 32- and 64-bit builds (but it could well depend on the particular Sparc > chip). These timings are much, much less than what I reported (~700s and 2200s for 32 bit and 64 bit). I read the admin manual and didn't see anything specifically that needs to be set to use the internal BLAS. I guess I'll go back and do some more investigation. Thanks for your help. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
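For anyone wanting a quick sanity check of a build's linear-algebra speed, a small matrix-product timing is easy to run. The matrix size below is arbitrary, and this is not the benchmark behind the ~700s/2200s figures quoted in this thread:

```r
# Rough BLAS speed probe: time a symmetric crossproduct of a random matrix.
# crossprod() goes through whatever BLAS R was linked against, so the
# elapsed time shifts noticeably once an optimized BLAS is installed.
n <- 500
m <- matrix(rnorm(n * n), n, n)
tm <- system.time(crossprod(m))
tm["elapsed"]
```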
Re: [R] Calculating moments of a distribution
Mohammad Ehsanul Karim wrote: > Dear All, > > Is there any existing package or direct function in R > / S-plus that calculates moments (raw or central); the > usual measures of skewness and kurtosis, and/or > pearsonian k criterion? See package e1071. Uwe Ligges > Thank you for your time. > > > -- > > Mohammad Ehsanul Karim > > Web: http://snipurl.com/ehsan > ISRT, University of Dhaka, BD > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Calculating moments of a distribution
Have you tried RSiteSearch("skewness") and RSiteSearch("kurtosis")? The "fBasics" package has functions for that, as do other packages. spencer graves Mohammad Ehsanul Karim wrote: > Dear All, > > Is there any existing package or direct function in R > / S-plus that calculates moments (raw or central); the > usual measures of skewness and kurtosis, and/or > pearsonian k criterion? > > Thank you for your time. > > > -- > > Mohammad Ehsanul Karim > > Web: http://snipurl.com/ehsan > ISRT, University of Dhaka, BD > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
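For reference, raw and central moments (and the moment-based skewness and kurtosis built on them) are one-liners in base R; packages such as e1071 and fBasics supply ready-made skewness() and kurtosis(), and the helper names below are my own:

```r
# Raw moment E[X^k], central moment E[(X - mean)^k], and the usual
# moment-based skewness and excess kurtosis (helper names are mine;
# e1071/fBasics provide equivalents).
raw.moment     <- function(x, k) mean(x^k)
central.moment <- function(x, k) mean((x - mean(x))^k)
skew <- function(x) central.moment(x, 3) / central.moment(x, 2)^1.5
kurt <- function(x) central.moment(x, 4) / central.moment(x, 2)^2 - 3

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
raw.moment(x, 1)       # 5 (the mean)
central.moment(x, 2)   # 4 (the population variance)
skew(x)
```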
Re: [R] R-help Digest, Vol 28, Issue 11
dwfu wrote: > Dear all, > I'm new using R and in (geo)statistics. I have a problem with solving > my homework questions. We are working with variograms and trying to > write down basic equations for different models (spherical, > exponential, Gaussian). I tried to use the 'gstat' and 'geoR' packages > to solve the questions but as I said before I'm new in R and always > encountered with some syntax errors (I can send some specific examples > later). If one of you used this packages and could help me, I will be > very glad. Please read the posting guide which tells you: - "Use an informative subject line." - "Basic statistics and classroom homework: R-help is not intended for these." - Provide small reproducible examples. Uwe Ligges > Best Wishes, > Emre Duran > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Calculating moments of a distribution
Dear All, Is there any existing package or direct function in R / S-plus that calculates moments (raw or central); the usual measures of skewness and kurtosis, and/or pearsonian k criterion? Thank you for your time. -- Mohammad Ehsanul Karim Web: http://snipurl.com/ehsan ISRT, University of Dhaka, BD __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] R-help Digest, Vol 28, Issue 11
> -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] Behalf Of dwfu > Sent: 11 June 2005 13:59 > To: r-help@stat.math.ethz.ch > Subject: Re: [R] R-help Digest, Vol 28, Issue 11 > > > Dear all, > I'm new using R and in (geo)statistics. I have a problem with solving > my homework questions. We are working with variograms and trying to > write down basic equations for different models (spherical, > exponential, Gaussian). I tried to use the 'gstat' and 'geoR' packages > to solve the questions but as I said before I'm new in R and always > encountered with some syntax errors (I can send some specific examples > later). If one of you used this packages and could help me, I will be > very glad. > Best Wishes, > Emre Duran For geoR, go here http://www.est.ufpr.br/geoR/ and study the illustrative session. Ruben __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] R-help Digest, Vol 28, Issue 11
Dear all, I'm new using R and in (geo)statistics. I have a problem with solving my homework questions. We are working with variograms and trying to write down basic equations for different models (spherical, exponential, Gaussian). I tried to use the 'gstat' and 'geoR' packages to solve the questions but as I said before I'm new in R and always encountered with some syntax errors (I can send some specific examples later). If one of you used this packages and could help me, I will be very glad. Best Wishes, Emre Duran __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Calling R from C/C++
This is discussed in detail in the R-admin manual for R 2.1.0 (the current version). There is no point in repeating the whole discussion (several pages) here: that is the definitive account and you do need the whole picture.

On Sat, 11 Jun 2005, SUMIT MALHOTRA wrote:

> hi ppl,
> this somethin very urgent. plz anybody tell me for my problem:-
>
> how to make a module/lib that will allow to call easily R code/functions
> from C++ (C++ Builder 6). Is it possible without using any intermediate
> things.
>
> plz help
>
> srry no time for RTFM. Any idea or hint is appreciated.
>
> n I can help u a lot in projects sort of things.
>
> plz do reply.
>
> byee

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,     Tel: +44 1865 272861 (self)
1 South Parks Road,            +44 1865 272866 (PA)
Oxford OX1 3TG, UK        Fax: +44 1865 272595
Re: [R] Problem with multinom ?
On Sat, 11 Jun 2005, John Fox wrote:

Dear Marc, I get the same results -- same coefficients, standard errors, and fitted probabilities -- from multinom() and glm(). It's true that the deviances differ, but they, I believe, are defined only up to an additive constant:

Yes. There are many variations on the definition of (residual) deviance, but it compares -2 log likelihood with a `saturated' model. For grouped data you have a choice: a separate term for each group or for each observation. A binomial GLM uses the first, but the second is more usual in logistic regression (since it has a direct interpretation via log-probability scoring).

multinom() is support software for a book (which the R posting guide does ask you to consult): this is discussed with a worked example on pp. 203-4.

dt
  output factor  n
1      m    1.2 10
2      f    1.2 12
3      m    1.3 14
4      f    1.3 14
5      m    1.4 15
6      f    1.4 12
dt.m <- multinom(output ~ factor, data=dt, weights=n)
# weights:  3 (2 variable)
initial value 53.372333
iter 10 value 53.115208
iter 10 value 53.115208
iter 10 value 53.115208
final value 53.115208
converged
dt2
   m  f factor
1 10 12    1.2
2 14 14    1.3
3 15 12    1.4
dt.b <- glm(cbind(m,f) ~ factor, data=dt2, family=binomial)
summary(dt.m)
Call:
multinom(formula = output ~ factor, data = dt, weights = n)

Coefficients:
               Values Std. Err.
(Intercept) -2.632443  3.771265
factor       2.034873  2.881479

Residual Deviance: 106.2304
AIC: 110.2304
Correlation of Coefficients:
[1] -0.9981598

summary(dt.b)
Call:
glm(formula = cbind(m, f) ~ factor, family = binomial, data = dt2)

Deviance Residuals:
       1        2        3
 0.01932 -0.03411  0.01747

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -2.632      3.771  -0.698    0.485
factor         2.035      2.881   0.706    0.480

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 0.5031047  on 2  degrees of freedom
Residual deviance: 0.0018418  on 1  degrees of freedom
AIC: 15.115

Number of Fisher Scoring iterations: 2

predict(dt.b, type="response")
        1         2         3
0.4524946 0.5032227 0.5538845
predict(dt.m, type="probs")
        1         2         3         4         5         6
0.4524948 0.4524948 0.5032229 0.5032229 0.5538846 0.5538846

These fitted probabilities *are* correct: since the members of each pair (1,2), (3,4), and (5,6) have identical values of factor, they have identical fitted probabilities. (Note, by the way, the large negative correlation between the coefficients, produced by the configuration of factor values.)

I hope this helps,
John

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140 x23604
http://socserv.mcmaster.ca/jfox

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Girondot
Sent: Saturday, June 11, 2005 3:06 AM
To: John Fox
Cc: r-help@stat.math.ethz.ch
Subject: [R] Problem with multinom ?

Thanks for your response. OK, multinom() is more logical in this context. But a similar problem occurs.

Let these data be analyzed using a classical glm with binomial error:

 m  f factor  m theo       f theo       -Ln L model  -Ln L full   intercept    f
10 12    1.2  0.452494473  0.547505527  1.778835688  1.778648963  2.632455675  -2.034882223
14 14    1.3  0.503222759  0.496777241  1.901401922  1.900820284
15 12    1.4  0.553884782  0.446115218  1.877062369  1.876909821

Sum -Ln L:  5.557299979 (model)  5.556379068 (full)
Deviance:  11.11459996 (model)  11.11275814 (full);  residual deviance 0.001841823

If I try to use multinom() to analyze these data, the fitted parameters are correct but the residual deviance is not.

dt <- read.table('/try.txt', header=T)
dt
  output factor  n
1      m    1.2 10
2      f    1.2 12
3      m    1.3 14
4      f    1.3 14
5      m    1.4 15
6      f    1.4 12
dt.plr <- multinom(output ~ factor, data=dt, weights=n, maxit=1000)
# weights:  3 (2 variable)
initial value 53.372333
iter 10 value 53.115208
iter 10 value 53.115208
iter 10 value 53.115208
final value 53.115208
converged
dt.plr
Call:
multinom(formula = output ~ factor, data = dt, weights = n, maxit = 1000)

Coefficients:
(Intercept)      factor
  -2.632443    2.034873

Residual Deviance: 106.2304
AIC: 110.2304

dt.pr1 <- predict(dt.plr, type="probs")
dt.pr1
        1         2         3         4         5         6
0.4524948 0.4524948 0.5032229 0.5032229 0.5538846 0.5538846

The probabilities for 2, 4 and 6 are not what I expected, and that explains the residual deviance I obtained in R. Probably the problem I have is due to an incorrect data format... could someone help me?
Thanks
(I know there is a simple way to analyze binomial data, but ultimately I want to use multinom() for
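Prof. Ripley's point about the additive constant can be checked directly: the saturated-model term cancels when comparing two nested models, so deviance *differences* from multinom() and from the binomial glm() agree even though the deviances themselves do not. A sketch using the data from this thread (multinom() lives in the nnet package):

```r
# The deviances from multinom() and glm(..., family = binomial) differ by an
# additive constant (the saturated-model term), so deviance differences
# between nested models agree. Data reconstructed from this thread.
library(nnet)

dt  <- data.frame(output = factor(rep(c("m", "f"), 3)),
                  factor = rep(c(1.2, 1.3, 1.4), each = 2),
                  n      = c(10, 12, 14, 14, 15, 12))
dt2 <- data.frame(m = c(10, 14, 15), f = c(12, 14, 12),
                  factor = c(1.2, 1.3, 1.4))

m1 <- multinom(output ~ factor, data = dt, weights = n, trace = FALSE)
m0 <- multinom(output ~ 1,      data = dt, weights = n, trace = FALSE)
b1 <- glm(cbind(m, f) ~ factor, data = dt2, family = binomial)
b0 <- glm(cbind(m, f) ~ 1,      data = dt2, family = binomial)

deviance(m0) - deviance(m1)   # likelihood-ratio statistic
deviance(b0) - deviance(b1)   # the same value: the constant cancels
```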
[R] Calling R from C/C++
hi ppl, this is something very urgent. please, anybody, advise me on my problem:

how to make a module/lib that will allow R code/functions to be called easily from C++ (C++ Builder 6). Is it possible without using any intermediate layer?

Please help. Sorry, no time for reading the manual. Any idea or hint is appreciated.

And I can help you a lot with project-type things.

Please do reply.

bye
[R] TaskView for ecological and environmental data
Dear list, Following off-list discussion with Graham Smith (who initiated the initial request for a task view on this topic) and Philippe Grosjean I have produced a Task View for ecological and environmental data analysis. Graham and Philippe commented on a earlier draft and it is now in a shape suitable for public scrutiny. I have placed a mock-up of the html version of a task view page at the URI below and would welcome your comments and suggestions before I submit it to Achim Zeileis for inclusion in the ctv package. http://ecrc3.geog.ucl.ac.uk/download/taskview/ I will now be out of the office for a week and am not sure what email availability I will have so it may take a little while for me to reply to and act on any suggestions you might make. I would like to thank Graham and Philippe for encouraging me to maintain this view and for their helpful suggestions. All the best, Gavin __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Problem with multinom ?
Re: [R] Sum up the values of a function
Hi

On 10 Jun 2005 at 13:58, Stefan Wagner wrote:

> I am really sorry that the code appears in a bad way, but it is really
> not my fault (at least I think so). I think my mail program is
> responsible for it, because I sent it in a "normal" way without having
> trouble with my space bar. Here are the missing values (but please
> remember, this is only a short example; normally there are more than
> 400 values)
> x1 <- c(34.67,19.91,47.48,22.48,17.34,24.42)
> m1 <- c(1,0,0,0,0,0)
> n1 <- c(1,1,1,1,1,1)
> dec1 <- c(0,0,0,0,1,0)
> x2 <- c(50.98,30.63,31.37,21.17,21.49,39.41,46.85,38.08)
> m2 <- c(1,0,0,0,1,0,2,1)
> n2 <- c(3,3,3,3,3,3,3,3)
> dec2 <- c(0,0,0,0,0,0,0,0)
>
> I hope this is all you need,

Well, actually I am not sure why you do all the computations in so many functions. It hides everything, and one has to go through all of them step by step without much knowledge of why they are constructed this way. Maybe some other list member can give you reasonable advice. Sorry I could not give you a positive answer.

Petr

> Stefan
>
> Zitat von Petr Pikal <[EMAIL PROTECTED]>:
>
> > Hi
> >
> > Your example is not reproducible as we do not know x1.
> > Do you by chance seek something like aggregate? If yes, see
> >
> > ?aggregate
> > or
> > ?by
> >
> > BTW, do you have some trouble with your space bar?
> >
> > HTH
> >
> > Petr
> >
> > On 10 Jun 2005 at 13:08, Stefan Wagner wrote:
> >
> >> Dear R-Users,
> >>
> >> I have to do a maximum-likelihood estimation and now have a problem
> >> concerning how to sum up my function values in a smart way. I don't
> >> know how to explain it easily, so I give you the code and show you
> >> where my problem is.
> >> I have shortened the code a little bit, so that you only get the
> >> necessary facts:
> >>
> >> ws12 <- function (z, i)
> >>   1/(1+exp(z[1]*(z[3]-x1[i]-z[4]*(m1[i]/n1[i]-0.5))))
> >> ws37 <- function (z, i)
> >>   1/(1+exp(z[2]*(z[3]-x2[i]-z[5]*(m2[i]/n2[i]-0.5))))
> >> wsAttack12 <- function (z,i) (ws12(z,i)*dec1[i]+(1-ws12(z,i))*(1-dec1[i]))
> >> wsAttack37 <- function (z,i) (ws37(z,i)*dec2[i]+(1-ws37(z,i))*(1-dec2[i]))
> >> logwsAttack12 <- function (z,i) (log(wsAttack12(z,i)))
> >> logwsAttack37 <- function (z,i) (log(wsAttack37(z,i)))
> >> ws12sum <- function (z)
> >>   (logwsAttack12(z,i=1)+logwsAttack12(z,i=2)+logwsAttack12(z,i=3)+
> >>    logwsAttack12(z,i=4)+logwsAttack12(z,i=5)+logwsAttack12(z,i=6))
> >> ws37sum <- function (z)
> >>   (logwsAttack37(z,i=1)+logwsAttack37(z,i=2)+logwsAttack37(z,i=3)+
> >>    logwsAttack37(z,i=4)+logwsAttack37(z,i=5)+logwsAttack37(z,i=6)+
> >>    logwsAttack37(z,i=7)+logwsAttack37(z,i=8))
> >> wsLOG <- function (z) (ws12sum(z) + ws37sum(z))
> >> LogSum <- function (z) (-sum(wsLOG(z)))
> >> SP <- c(0.16, 0.10, 44, 0.80, 46)
> >> out <- nlm(LogSum, p=SP)
> >> out
> >>
> >> For explanation: x1[i], x2[i], m1[i], m2[i], n1[i], n2[i] are given
> >> data and z[1:5] are my estimates. My problem is that I have more
> >> than one session, with a different number of data points each, so I
> >> am searching for a general way of summing up my logwsAttack12 and
> >> logwsAttack37. The program should recognize how many data points my
> >> table contains for ws12 and how many for ws37 and should, in
> >> dependency of z, sum them up. I hope you understand my problem and
> >> hopefully someone is able to solve it. Many thanks for the moment.
> >>
> >> Best regards,
> >>
> >> Stefan Wagner
> >>
> >> __
> >> R-help@stat.math.ethz.ch mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide!
> >> http://www.R-project.org/posting-guide.html > > > > Petr Pikal > > [EMAIL PROTECTED] > > > > > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html Petr Pikal [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
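Stefan's hand-written sums can be collapsed into one vectorized helper, so the number of observations per session no longer matters. A sketch of his model with the data he posted; the restructuring (and the closing parentheses missing from the mangled original) are my additions:

```r
# Stefan's two-session log-likelihood, vectorized over observations so it
# adapts to any number of data points (data from his follow-up message).
x1 <- c(34.67, 19.91, 47.48, 22.48, 17.34, 24.42)
m1 <- c(1, 0, 0, 0, 0, 0); n1 <- rep(1, 6); dec1 <- c(0, 0, 0, 0, 1, 0)
x2 <- c(50.98, 30.63, 31.37, 21.17, 21.49, 39.41, 46.85, 38.08)
m2 <- c(1, 0, 0, 0, 1, 0, 2, 1); n2 <- rep(3, 8); dec2 <- rep(0, 8)

# One session's contribution; za, zb, zc play the roles of z[1], z[3], z[4]
# (or z[2], z[3], z[5]) in the original ws12/ws37 functions.
loglik.session <- function(za, zb, zc, x, m, n, dec) {
  p <- 1 / (1 + exp(za * (zb - x - zc * (m / n - 0.5))))
  sum(log(p * dec + (1 - p) * (1 - dec)))
}

negLogLik <- function(z)
  -(loglik.session(z[1], z[3], z[4], x1, m1, n1, dec1) +
    loglik.session(z[2], z[3], z[5], x2, m2, n2, dec2))

out <- nlm(negLogLik, p = c(0.16, 0.10, 44, 0.80, 46))
```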
Re: [R] CRAN task view for genetics
Hi Gregor, The compiled list of genetics packages is a great idea, I'll certainly find it useful as it took me quite some time to track them all down through CRAN. I've a couple of things that could be added... qvalue - A procedure for false discovery rate control. General, but developed for the general area of genetics (see Sotrey & Tibshirani 2003 PNAS 100:9440-9445) SimHap - still in development (Beta testing), this is a Java GUI front end for performing a range of statistical genetics analyses, very useful for occasional users who can not get their heads round CLI's. Registration with the author is currently required. Regards Neil -- Neil Shephard Genetics Statistician, ARC Epidemiology Unit [EMAIL PROTECTED] [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Replacing for loop with tapply!?
Hi

On 10 Jun 2005 at 20:05, Sander Oom wrote:

> Dear all,
>
> Dimitris and Andy, thanks for your great help. I have progressed to
> the following code, which runs very fast and effectively:
>
> mat <- matrix(sample(-15:50, 15 * 10, TRUE), 15, 10)
> mat[mat>45] <- NA
> mat<-NA

By this you redefine mat as

> str(mat)
 logi NA

and your code gives an error, because the argument has to have dimensions:

+ apply(mat, 1, max, na.rm=TRUE))
Error in rowSums(mat > temp, na.rm = TRUE) :
        'x' must be an array of at least two dimensions

If your matrix has one row full of NA's, it only complains but still computes a value:

> mat[3,] <- NA
> temps <- c(35, 37, 39)
> ind <- rbind(
+   t(sapply(temps, function(temp)
+     rowSums(mat > temp, na.rm=TRUE) )),
+   rowSums(!is.na(mat), na.rm=FALSE),
+   apply(mat, 1, max, na.rm=TRUE))
Warning message:
no finite arguments to max; returning -Inf
> ind <- t(ind)
> ind
     [,1] [,2] [,3] [,4] [,5]
[1,]    5    5    3    9   48
[2,]    1    1    1    9   42
[3,]    0    0    0    0 -Inf

> mat
> temps <- c(35, 37, 39)
> ind <- rbind(
>   t(sapply(temps, function(temp)
>     rowSums(mat > temp, na.rm=TRUE) )),
>   rowSums(!is.na(mat), na.rm=FALSE),
>   apply(mat, 1, max, na.rm=TRUE))
> ind <- t(ind)
> ind
>
> However, some weather stations have missing values for the whole year.
> Unfortunately, the code breaks down (when uncommenting mat<-NA).
>
> I have tried 'ifelse' statements in the functions, but it becomes even
> more of a mess. I could subset the matrix beforehand, but this would
> mean merging with a complete matrix afterwards to make it compatible
> with other years. That would slow things down.
>
> How can I make the code robust for rows containing all missing values?

which(rowSums(!is.na(mat))==0)

This gives you the indices of the rows of your matrix that are all NA, and you can use it for fine tuning of your code. What you need to do depends on what results you want, i.e. how the ind matrix should look after processing a mat with one or more rows full of NA's.

HTH
Petr

> Thanks for your help,
>
> Sander.
> > Dimitris Rizopoulos wrote: > > for the maximum you could use something like: > > > > ind[, 1] <- apply(mat, 2, max) > > > > I hope it helps. > > > > Best, > > Dimitris > > > > > > Dimitris Rizopoulos > > Ph.D. Student > > Biostatistical Centre > > School of Public Health > > Catholic University of Leuven > > > > Address: Kapucijnenvoer 35, Leuven, Belgium > > Tel: +32/16/336899 > > Fax: +32/16/337015 > > Web: http://www.med.kuleuven.ac.be/biostat/ > > http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm > > > > > > > > - Original Message - > > From: "Sander Oom" <[EMAIL PROTECTED]> > > To: "Dimitris Rizopoulos" <[EMAIL PROTECTED]> Cc: > > Sent: Friday, June 10, 2005 12:10 PM > > Subject: Re: [R] Replacing for loop with tapply!? > > > > > >>Thanks Dimitris, > >> > >>Very impressive! Much faster than before. > >> > >>Thanks to new found R.basic, I can simply rotate the result with > >>rotate270{R.basic}: > >> > >>>mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000) > >>>temps <- c(37, 39, 41) > >>># > >>>#ind <- matrix(0, length(temps), ncol(mat)) > >>>ind <- matrix(0, 4, ncol(mat)) > >>>(startDate <- date()) > >>[1] "Fri Jun 10 12:08:01 2005" > >>>for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i]) > >>>ind[4, ] <- colMeans(max(mat)) > >>Error in colMeans(max(mat)) : 'x' must be an array of at least two > >>dimensions > >>>(endDate <- date()) > >>[1] "Fri Jun 10 12:08:02 2005" > >>>ind <- rotate270(ind) > >>>ind[1:10,] > >> V4 V3 V2 V1 > >>1 0 56 75 80 > >>2 0 46 53 60 > >>3 0 50 58 67 > >>4 0 60 72 80 > >>5 0 59 68 76 > >>6 0 55 67 74 > >>7 0 62 77 93 > >>8 0 45 57 67 > >>9 0 57 68 75 > >>10 0 61 66 76 > >> > >>However, I have not managed to get the row maximum using your > >>method? It > >>should be 50 for most rows, but my first guess code gives an error! > >> > >>Any suggestions? 
> >> > >>Sander > >> > >> > >> > >>Dimitris Rizopoulos wrote: > >>>maybe you are looking for something along these lines: > >>> > >>>mat <- matrix(sample(-15:50, 365 * 15000, TRUE), 365, 15000) > >>>temps <- c(37, 39, 41) > >>># > >>>ind <- matrix(0, length(temps), ncol(mat)) > >>>for(i in seq(along = temps)) ind[i, ] <- colSums(mat > temps[i]) > >>>ind > >>> > >>> > >>>I hope it helps. > >>> > >>>Best, > >>>Dimitris > >>> > >>> > >>>Dimitris Rizopoulos > >>>Ph.D. Student > >>>Biostatistical Centre > >>>School of Public Health > >>>Catholic University of Leuven > >>> > >>>Address: Kapucijnenvoer 35, Leuven, Belgium > >>>Tel: +32/16/336899 > >>>Fax: +32/16/337015 > >>>Web: http://www.med.kuleuven.ac.be/biostat/ > >>> http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm > >>> > >>> > >>>- Original Message - > >>>From: "Sander Oom" <[EMAIL PROTECTED]> > >>>To: > >>>Sent: Friday, June 10, 2005 10:50 AM > >
Re: [R] us zipcode data map
On Fri, 10 Jun 2005, Mike R wrote: > i've search the email archives, searched the documention > of various map packages and done an R-site search, but > have been unable to find direct resources for creating maps > of the US that are colored or annotated or ... by zipcode > data. > > For example, create a map of the US and color each zipcode > region by its population using two vectors z,p containing the > zipcode and population, respectively. I'm not looking for data > to fill z, and p. I'm looking for some highend functions to > display on a map the data that I already have. I'm replying to this original message, because your followups have been dropping the original query (please don't assume that everybody uses the e-mail system you have chosen yourself). With the points you have found, you can delineate surrounding polygons and clip to the coastline using functions in the tripack, gpclib, and maps packages (being very careful to find out how to match the output polygon IDs to your data). But unless you are aiming for A0 postscript output, for the continental US viewed as a whole, almost all raster image processors will merge the interesting metropolitan zip polygons, leaving the viewer looking at a mess with only the areas with lowest population density legible. Even zip code "maps" of small states are very difficult to use for look-up (reading values from the polygon fill), one of the main functions of thematic cartography. So like most tasks, it can be done (this _is_ R, after all), but do you really need to do it? Roger > > Cheers, > Mike > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > -- Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. 
voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: [EMAIL PROTECTED]
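To make the first step of Roger's recipe concrete, a minimal sketch using only tripack (the gpclib/maps coastline clipping he mentions is omitted, and the coordinates below are made up, not real zip-code centroids):

```
library(tripack)                  # assumed installed from CRAN
set.seed(42)
x <- runif(50); y <- runif(50)    # stand-ins for zip-code centroid coordinates
vm <- voronoi.mosaic(x, y)        # Dirichlet/Voronoi tessellation of the points
plot(vm)                          # the surrounding polygon boundaries
points(x, y, pch = 20)            # the original point locations
```

The polygons would then need clipping to the coastline and careful matching of polygon IDs to the data, as Roger warns.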
[R] Problem with multinom ?
Thanks for your response. OK, multinom() is more logical in this context. But a similar problem occurs. Let these data be analyzed using a classical glm with binomial error:

 m    f   factor   m theo        f theo        -Ln L model   -Ln L full
10   12   1.2      0.452494473   0.547505527   1.778835688   1.778648963
14   14   1.3      0.503222759   0.496777241   1.901401922   1.900820284
15   12   1.4      0.553884782   0.446115218   1.877062369   1.876909821
                                 Sum -Ln L     5.557299979   5.556379068
                                 Deviance      11.11459996   11.11275814

Fitted parameters: intercept 2.632455675, factor -2.034882223; residual deviance 11.11459996 - 11.11275814 = 0.001841823.

If I try to use the multinom() function to analyze these data, the fitted parameters are correct but the residual deviance is not:

> dt <- read.table('/try.txt', header=T)
> dt
  output factor  n
1      m    1.2 10
2      f    1.2 12
3      m    1.3 14
4      f    1.3 14
5      m    1.4 15
6      f    1.4 12
> dt.plr <- multinom(output ~ factor, data=dt, weights=n, maxit=1000)
# weights:  3 (2 variable)
initial  value 53.372333
iter  10 value 53.115208
final  value 53.115208
converged
> dt.plr
Call:
multinom(formula = output ~ factor, data = dt, weights = n, maxit = 1000)

Coefficients:
(Intercept)     factor
  -2.632443   2.034873

Residual Deviance: 106.2304
AIC: 110.2304
> dt.pr1 <- predict(dt.plr, , type="probs")
> dt.pr1
        1         2         3         4         5         6
0.4524948 0.4524948 0.5032229 0.5032229 0.5538846 0.5538846

The probabilities for rows 2, 4 and 6 are not correct, which explains the incorrect residual deviance obtained in R. Probably the problem I have is due to an incorrect data format... could someone help me? Thanks. (I know there is a simple way to analyze binomial data, but ultimately I want to use multinom() for 5 categories of outputs.)
Thanks a lot Marc -- Marc Girondot, Pr Laboratoire Ecologie, Systématique et Evolution Equipe de Conservation des Populations et des Communautés CNRS, ENGREF et Université Paris-Sud 11, UMR 8079 Bâtiment 362, 91405 Orsay Cedex, France Tel: +33 (0)1.69.15.72.30 Fax: +33 (0)1 69 15 56 96 e-mail: [EMAIL PROTECTED] Web: http://www.ese.u-psud.fr/epc/conservation/Marc.html Skype: girondot Fax in US: 1-425-732-6934
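A hedged sketch of the comparison (my own reconstruction from the numbers in the post, not Marc's code): the two fits agree on the coefficients, up to sign depending on which outcome level is the baseline, but multinom()'s "Residual Deviance" is -2*logLik of the multinomial likelihood, whereas glm()'s residual deviance is measured against the saturated model, so the two figures are not directly comparable:

```
library(nnet)   # for multinom()

## grouped counts, as in the table above
m   <- c(10, 14, 15)
f   <- c(12, 14, 12)
fac <- c(1.2, 1.3, 1.4)

## binomial glm on the grouped two-column response
fit.glm <- glm(cbind(m, f) ~ fac, family = binomial)

## multinom() on the expanded, weighted layout from the post
dt <- data.frame(output = rep(c("m", "f"), 3),
                 factor = rep(fac, each = 2),
                 n      = c(10, 12, 14, 14, 15, 12))
fit.mn <- multinom(output ~ factor, data = dt, weights = n, trace = FALSE)

coef(fit.glm)          # same line as coef(fit.mn), possibly with flipped signs
coef(fit.mn)
deviance(fit.glm)      # deviance relative to the saturated model (tiny here)
-2 * logLik(fit.mn)    # what multinom() reports as "Residual Deviance"
```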
Re: [R] Performance difference between 32-bit build and 64-bit build on Solaris 8
On Fri, 10 Jun 2005, Liaw, Andy wrote: > I'm not familiar with Solaris, so take this with appropriate dose of NaCl... > > For the 64-bit build, why not have the -O2 for gcc, since you have it for > g77 and g++? If you just run vanilla configure for the 32-bit build, I > believe it uses -O2 for all three compilers by default. If that's the > difference, perhaps it's sufficient to explain the performance difference. CFLAGS was not specified, so should default to "-g -O2". It is definitely worth checking, although almost all the time in these tests will be spent in Fortran code. > The other thing is that solve(), and to some extent lm(), can benefit from > an optimized BLAS. You might want to check if the two builds are using > equivalent BLAS. He said: >> neither build uses a BLAS. Well, they do, the one supplied with R (which is in Fortran). To supplement my timings earlier, with an optimized BLAS I gave 32-bit [1] 4.99 0.03 5.02 0.00 0.00 64-bit [1] 5.25 0.03 5.29 0.00 0.00 and for gcc 3.4.3 and the internal BLAS (which I had to build for these tests) I get 32-bit [1] 9.96 0.09 10.12 0.00 0.00 64-bit [1] 9.93 0.04 10.04 0.00 0.00 so I am not seeing anything like the same performance difference between 32- and 64-bit builds (but it could well depend on the particular Sparc chip). There are some problems in which the 64-bit builds _are_ much slower: complex matrix arithmetic is one (presumably because less effort has been spent on optimizing that). > > Andy > >> From: Scott Gilpin >> >> Hi everyone - >> >> I'm seeing a 32-bit build perform significantly faster (up to 3x) than >> a 64 bit build on Solaris 8. I'm running R version 2.1.0. 
Here are >> some of my system details, and some resulting timings: >> >>> uname -a >> SunOS lonetree 5.8 Generic_117350-16 sun4u sparc SUNW,Sun-Fire-V440 >> >> lonetree /home/sgilpin >gcc -v >> Reading specs from /usr/local/lib/gcc/sparc-sun-solaris2.8/3.4.2/specs >> Configured with: ../configure --with-as=/usr/ccs/bin/as >> --with-ld=/usr/ccs/bin/ld --disable-nls >> Thread model: posix >> gcc version 3.4.2 >> >> I built the 32 bit version of R with no changes to config.site. I >> built the 64 bit version with the following in config.site: >> >> CC="gcc -m64" >> FFLAGS="-m64 -g -02" >> LDFLAGS="-L/usr/local/lib/sparcv9 -L/usr/local/lib" >> CXXFLAGS="-m64 -g -02" >> >> neither build uses a BLAS. Both builds are installed on the same >> machine, and the same disk. The machine has virtually no load; R is >> one of the only processes running during these timings: -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK; Fax: +44 1865 272595
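One detail worth flagging for the archive: the quoted config.site sets "-02" (digit zero) in FFLAGS and CXXFLAGS, almost certainly a typo for "-O2" (capital letter O), so the 64-bit Fortran code, where Prof. Ripley notes almost all the time in these tests is spent, was likely compiled without optimization. A corrected fragment might look like this (my guess, matching the toolchain in the post):

```
CC="gcc -m64"
CFLAGS="-g -O2"
FFLAGS="-m64 -g -O2"       # was "-m64 -g -02" (digit zero)
LDFLAGS="-L/usr/local/lib/sparcv9 -L/usr/local/lib"
CXXFLAGS="-m64 -g -O2"     # likewise
```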
Re: [R] Performance difference between 32-bit build and 64-bit build on Solaris 8
Your tests are of problems where you really should be using an optimized BLAS. But because those pointers are twice the size, the L1 cache will hold half as many and so I am not surprised at a factor of three on a naive implementation. For linear algebra on large matrices the key to good performance is to keep L1 cache misses to a minimum. Using SunPerf and a 1000x1000 problem I got 32-bit [1] 4.99 0.03 5.02 0.00 0.00 64-bit [1] 5.25 0.03 5.29 0.00 0.00 and for your regression problem 32-bit [1] 24.97 0.96 26.15 0.00 0.00 64-bit [1] 26.25 1.06 27.52 0.00 0.00 So the moral appears to be to take the advice in the R-admin manual and tune your linear algebra system. On Fri, 10 Jun 2005, Scott Gilpin wrote: > Hi everyone - > > I'm seeing a 32-bit build perform significantly faster (up to 3x) than > a 64 bit build on Solaris 8. I'm running R version 2.1.0. Here are > some of my system details, and some resulting timings: > >> uname -a > SunOS lonetree 5.8 Generic_117350-16 sun4u sparc SUNW,Sun-Fire-V440 > > lonetree /home/sgilpin >gcc -v > Reading specs from /usr/local/lib/gcc/sparc-sun-solaris2.8/3.4.2/specs > Configured with: ../configure --with-as=/usr/ccs/bin/as > --with-ld=/usr/ccs/bin/ld --disable-nls > Thread model: posix > gcc version 3.4.2 > > I built the 32 bit version of R with no changes to config.site. I > built the 64 bit version with the following in config.site: > > CC="gcc -m64" > FFLAGS="-m64 -g -02" > LDFLAGS="-L/usr/local/lib/sparcv9 -L/usr/local/lib" > CXXFLAGS="-m64 -g -02" > > neither build uses a BLAS. Both builds are installed on the same > machine, and the same disk. 
The machine has virtually no load; R is > one of the only processes running during these timings: > > First comparison: solve on a large matrix > >> echo 'set.seed(1);M<-matrix(rnorm(9e6),3e3);system.time(solve(M))' | > /disk/loneres01/R-2.1.0-32bit/bin/R -q --vanilla >> set.seed(1);M<-matrix(rnorm(9e6),3e3);system.time(solve(M)) > [1] 713.45 0.38 713.93 0.00 0.00 >> > >> echo 'set.seed(1);M<-matrix(rnorm(9e6),3e3);system.time(solve(M))' | > /disk/loneres01/R-2.1.0-64bit/bin/R -q --vanilla >> set.seed(1);M<-matrix(rnorm(9e6),3e3);system.time(solve(M)) > [1] 2277.05 0.31 2278.38 0.00 0.00 >> > > Second comparison: linear regression > > lonetree /home/sgilpin/R >echo 'set.seed(1); > y<-matrix(rnorm(1*500),500); > x<-matrix(runif(500*100),500); > system.time(fit<-lm(y~x))' | /disk/loneres01/R-2.1.0-32bit/bin/R -q --vanilla >> set.seed(1);y<-matrix(rnorm(1*500),500);x<-matrix(runif(500*100),500);system.time(fit<-lm(y~x)) > [1] 23.34 0.80 24.17 0.00 0.00 >> > > lonetree /home/sgilpin/R >echo 'set.seed(1); > y<-matrix(rnorm(1*500),500); > x<-matrix(runif(500*100),500); > system.time(fit<-lm(y~x))' | /disk/loneres01/R-2.1.0-64bit/bin/R -q --vanilla >> set.seed(1);y<-matrix(rnorm(1*500),500);x<-matrix(runif(500*100),500);system.time(fit<-lm(y~x)) > [1] 55.34 0.70 56.21 0.00 0.00 >> > > Final comparison: stats-Ex.R (from R-devel) > lonetree /home/sgilpin/R >time /disk/loneres01/R-2.1.0-32bit/bin/R -q > --vanilla CMD BATCH stats-Ex.R > > real 1m4.042s > user 0m47.400s > sys 0m10.390s > lonetree /home/sgilpin/R >time /disk/loneres01/R-2.1.0-64bit/bin/R -q > --vanilla CMD BATCH stats-Ex.R > > real 1m20.017s > user 1m3.590s > sys 0m10.130s > > I've seen Prof. Ripley and others comment that a 64 bit build will be > a little slower because the pointers are larger, and gc() will take > longer, but these timings seem out of this range. > > Any thoughts? > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html > -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK; Fax: +44 1865 272595
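For anyone finding this in the archive: the advice in the R-admin manual amounts to pointing configure at an optimized BLAS rather than the internal reference BLAS. Two examples adapted from that manual (library names and paths will vary by system and R version):

```
## Sun Performance Library with the Sun compilers:
./configure --with-blas='-xlic_lib=sunperf'

## ATLAS, where installed:
./configure --with-blas='-lf77blas -latlas'
```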