Re: [R] glm with family binomial and link identity
On Wed, 27 Apr 2022 10:27:33 +0200, Ralf Goertz wrote:

> > Hi,
> >
> > I just noticed that (with my version 4.2.0) it is no longer possible
> > to use glm with family=binomial(link=identity). Why is that? It was
> > possible with 4.0.x as a colleague of mine just confirmed. After all
> > it is useful to compute risk differences with factors.
>
> Sorry about the noise. It still works with "identity" but it has to be
> quoted (whereas "log" and "logit" also work unquoted).

On the other hand, why do I get the following error at all?

Error in binomial(link = identity) :
  link "identity" not available for binomial family;
  available links are ‘logit’, ‘probit’, ‘cloglog’, ‘cauchit’, ‘log’

This is a bit misleading, since identity is a legitimate link function and
works perfectly when quoted. The results of that call are in agreement with
what you get with the risk difference approach (for convenience computed
using the meta package):

> d=data.frame(y=c(0, 1, 0, 1), trt=c(0, 0, 1, 1), w=c(97, 3, 90, 10))
> d
  y trt  w
1 0   0 97
2 1   0  3
3 0   1 90
4 1   1 10
> res=glm(y~trt,data=d,weights=w,family=binomial(link="identity"))
> summary(res)

Call:
glm(formula = y ~ trt, family = binomial(link = "identity"), data = d,
    weights = w)

Deviance Residuals:
     1       2       3       4
-2.431   4.587  -4.355   6.786

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.03000    0.01706   1.759   0.0786 .
trt          0.07000    0.03451   2.028   0.0425 *
…
> confint.default(res)[2,]
      2.5 %      97.5 %
0.002359942 0.137640058

This is exactly the same as

> meta::metabin(10,100,3,100,sm="RD")
Number of observations: o = 200
Number of events: e = 13

     RD           95%-CI    z p-value
 0.0700 [0.0024; 0.1376] 2.03  0.0425

So why the error?

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] glm with family binomial and link identity
On Wed, 27 Apr 2022 10:27:33 +0200, Ralf Goertz wrote:

> Hi,
>
> I just noticed that (with my version 4.2.0) it is no longer possible
> to use glm with family=binomial(link=identity). Why is that? It was
> possible with 4.0.x as a colleague of mine just confirmed. After all
> it is useful to compute risk differences with factors.

Sorry about the noise. It still works with "identity" but it has to be
quoted (whereas "log" and "logit" also work unquoted).
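The quoting requirement is easy to check directly. A minimal sketch reproducing the risk-difference fit discussed in this thread (data taken from the message above):

```r
# Two-arm data: 3/100 events in the control group, 10/100 under treatment,
# encoded as one 0-row and one 1-row per group with case weights.
d <- data.frame(y   = c(0, 1, 0, 1),
                trt = c(0, 0, 1, 1),
                w   = c(97, 3, 90, 10))

# The link must be passed as a string; binomial(link = identity) with an
# unquoted name is rejected by recent R versions.
fit <- glm(y ~ trt, data = d, weights = w,
           family = binomial(link = "identity"))

# With the identity link the intercept is the baseline risk (0.03) and
# the trt coefficient is the risk difference (0.07).
coef(fit)
```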
[R] glm with family binomial and link identity
Hi,

I just noticed that (with my version 4.2.0) it is no longer possible to use
glm with family=binomial(link=identity). Why is that? It was possible with
4.0.x as a colleague of mine just confirmed. After all it is useful to
compute risk differences with factors.
Re: [R] RMariaDB returns a query without microseconds
On Mon, 26 Jul 2021 16:07:24 +0000 (UTC), Baki UNAL via R-help wrote:

> Hi
>
> I can query a table from a mysql database with RMariaDB. One of the
> table's columns indicates "trade_time" and contains values such as
> "09:55:02.113000". When I query this table I can not get fractional
> seconds. I get a value such as "09:55:02". Also I get a variable
> class such as "hms" and "difftime" for this column. Not character or
> POSIX* format. I tried both "datetime" and "varchar(25)" as column
> type of "trade_time" in mysql. How can I solve this problem?

Did you tell mariadb to include microseconds? If you just do

> create table dt (d TIME);
> insert into dt values("09:55:02.113000");
> select * from dt;
+----------+
| d        |
+----------+
| 09:55:02 |
+----------+
1 row in set (0.000 sec)

the fractional part is gone. But if you instead say

> create table dt (d DATETIME(6));

you get

> select * from dt;
+----------------------------+
| d                          |
+----------------------------+
| 2020-07-23 09:55:02.113000 |
+----------------------------+
1 row in set (0.001 sec)

And I also see this in R:

> dbGetQuery(con,"select * from dt")
                           d
1 2020-07-23 09:55:02.113000
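On the R side there is a second trap worth knowing: base R stores sub-second precision in date-time objects but hides it when printing. A small sketch, independent of the database, using only base R:

```r
# Parse a timestamp with microseconds; %OS reads fractional seconds.
x <- strptime("2020-07-23 09:55:02.113000", format = "%Y-%m-%d %H:%M:%OS")

# By default R prints whole seconds only; digits.secs controls the display.
options(digits.secs = 6)

# %OS6 formats six decimal places, typically "09:55:02.113000"
# (note that %OSn truncates rather than rounds, so the last digit
# can wobble due to floating-point representation).
format(x, "%H:%M:%OS6")
```

So even when the database delivers microseconds, check that the display options are set before concluding they were lost.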
Re: [R] How can certain variables be automatically excluded from being saved?
On Thu, 1 Apr 2021 08:39:45 -0400, Duncan Murdoch wrote:

> On 01/04/2021 4:05 a.m., Ralf Goertz wrote:
> > Hi,
> >
> > after having read here about the "seed problem" I wonder if there
> > is a way to automatically exclude certain variables from being
> > saved when the workspace image is saved at the end of an
> > interactive session. I have been using .First() and .Last() for
> > ages but apparently they are of no help as .First() gets called
> > before loading the workspace and .Last() after it has been saved.
> > At least the line
> >
> > if (".Random.seed" %in% ls(all.names=T)) rm(.Random.seed, pos=1)
> >
> > in either of those functions doesn't have the desired effect.
>
> Jim suggested a way to do that, but I don't think it's really a good
> idea: it just fixes one aspect of the problem, it doesn't solve the
> whole thing.
>
> The real problem is saving the workspace occasionally, but always
> loading it. The "always loading" part is automatic, so I think the
> real solution should address the "occasionally saving" part.

Yes, that is exactly what I quite often do. I have to work in different
projects and I usually start R in the project directory. When I do serious
stuff I save afterwards. But quite often I want to check something quickly,
and then I don't want to clutter up my workspace.

> If you always save the workspace, things are fine. You'll save the
> seed at the end of one session, and load it at the beginning of the
> next.
>
> If you never save the workspace, things are also fine. You'll always
> generate a new seed in each session that needs one.
>
> Personally, I believe in the "never save it" workflow. I think it
> has lots of benefits besides the random seed issue: you won't get a
> more-and-more cluttered workspace over time, you end up with more
> reproducible results, etc. However, I can understand that some
> people use a different workflow, so "always save it" is sometimes a
> reasonable choice.
>
> So the real problem is the "sometimes save it" workflow, which is
> *encouraged* by the default q(save = "default") option, which asks
> when interactive. Changing the default to act like q(save = "no")
> would be my preference (and that's how I configure things), but
> changing it to act like q(save = "yes") would be an improvement over
> the current choice.

I would prefer to be able to save only the history, since that is where the
work is done. Usually, my data is easily restored using commands from the
history. I could probably accomplish that by linking .RData to /dev/null or
making it an empty read-only file. However, I would have to do that in
every directory I happen to use R in.
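The "history but no data" idea from the last paragraph can be sketched without touching .RData at all, assuming a platform where `.Last()` runs before exit (the threads below show this ordering is platform-dependent):

```r
# Sketch for ~/.Rprofile: write the history explicitly on exit, then
# always quit with q("no") (or start R with --no-save) so .RData is
# never written while .Rhistory still is.
.Last <- function() {
  # savehistory() errors in non-interactive sessions, hence try().
  try(savehistory(file.path("~", ".Rhistory")), silent = TRUE)
}
```

This keeps the per-directory workflow (each project directory gets its own .Rhistory) without the /dev/null symlink trick.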
[R] How can certain variables be automatically excluded from being saved?
Hi,

after having read here about the "seed problem" I wonder if there is a way
to automatically exclude certain variables from being saved when the
workspace image is saved at the end of an interactive session. I have been
using .First() and .Last() for ages but apparently they are of no help, as
.First() gets called before loading the workspace and .Last() after it has
been saved. At least the line

if (".Random.seed" %in% ls(all.names=T)) rm(.Random.seed, pos=1)

in either of those functions doesn't have the desired effect.
Re: [R] random numbers with constraints
On Wed, 27 Jan 2021 09:03:15 +0100, Denis Francisci wrote:

> Hi,
> I would like to generate random numbers in R with some constraints:
> - my vector of numbers must contain 410 values;
> - min value must be 9.6 and max value must be 11.6;
> - sum of vector's values must be 4200.
> Is there a way to do this in R?
> And is it possible to generate this series in such a way that it
> follows a specific distribution form (for example exponential)?
> Thank you in advance,

In principle it should be possible. But I guess you are asking too much
with three given values, considering that you only have one parameter for
the exponential distribution. For instance, if you had only given min and
max and wanted a normal distribution, then you could have just taken 410
random values from a standard normal:

x=rnorm(410)

then centered it:

x=x-mean(x)

then scaled it so its span equals the one for your given max (M) and min
(m) values:

x=x*(M-m)/(max(x)-min(x))

and finally shifted it such that the minimum becomes m:

x=x-min(x)+m

Note, however, that the things you are allowed to do with your vector of
random numbers depend on the distribution if you want the result to still
follow that type of distribution.
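The centre-scale-shift recipe in the reply can be written out as a short, self-contained sketch. It hits the minimum and maximum exactly; the additional sum constraint from the original question would need a further adjustment step and is deliberately not handled here:

```r
set.seed(1)                            # for reproducibility
m <- 9.6; M <- 11.6                    # required minimum and maximum

x <- rnorm(410)                        # 410 standard normal draws
x <- x - mean(x)                       # centre at zero
x <- x * (M - m) / (max(x) - min(x))   # scale the span to M - m
x <- x - min(x) + m                    # shift so the minimum is exactly m

c(n = length(x), min = min(x), max = max(x))  # 410, 9.6, 11.6
```

As the reply warns, after this affine transformation the values are still (shifted, scaled) normal draws; the same trick applied to exponential draws would no longer yield an exponential sample, since the exponential family is not closed under shifting.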
Re: [R] loading edited functions already in saved workspace automatically
On Tue, 09 May 2017 10:00:17 -0700, Jeff Newmiller wrote:

> This boils down to the fact that some "my ways" are more effective in
> the long run than others... but I really want to address the complaint
>
> "... sometimes tedious to rebuild my environment by reexecuting
> commands in the history"
>
> by asserting that letting R re-run a script that loads my functions
> and packages (though perhaps not the data analysis steps) is always
> very fast and convenient to do explicitly. I almost never use the
> history file feature, because I type nearly every R instruction I use
> into a script file and execute/edit it until it does what I want. I
> keep functions in a separate file or package, and steps dealing with
> a particular data set in their own file (that uses source() to load
> the functions) even when I am executing the lines interactively. My
> goal is to regularly re-execute the whole script so that
> tomorrow/next year/whenever someone notices something was wrong then
> I can re-execute the sequence without following the dead ends I went
> down the first time (as using a history file does) and I don't have a
> separate clean-up-the-history-file step to go through to create it.
> When I have confirmed that the script still works as it did before
> then I can find where the analysis/data problem went wrong and fix it.

My usual work with R is probably a bit different from yours. As I said
before, I work on many projects (often simultaneously) but I do routine
work. For that I have my super function, the one I want to reload every
time R starts, at the moment about 250 lines of code. This is always work
in progress. In almost every project there is something that makes me edit
this function. But in order to apply my function I need to prepare the
data, e.g. getting it from a database or csv files, renaming the columns
of data.frames, etc.

This is all tedious and not worth putting in scripts, because these steps
are very specific to the project and are rarely needed more than once.
Sometimes one or two data records in a project on which I worked a few
days before turn out to be wrong and need to be changed. That's why I want
to keep the data: changing the data.frame directly is much easier than
starting from scratch. Meanwhile my function has evolved, but in the
.RData file there is still the old version, which is bad.

However, I found a solution! .Last() gets executed before saving here,
too. I simply had forgotten that I need to use rm() with pos=1, i.e.
rm(myfun,pos=1), because otherwise rm wants to delete myfun from within
the context of the function .Last(), where it doesn't live. I changed my
.Rprofile to:

.First=function(){
  assign("myfun",eval(parse(file=("~/R/myfun.R"))),pos=1)
}
.Last=function(){
  rm(.First,pos=1)
  rm(myfun,pos=1)
  rm(.Last,pos=1)
}

and everything works as I want it to. So no design flaw, but still way too
complicated in my opinion. Thanks to everybody who came up with
suggestions.

Ralf
Re: [R] loading edited functions already in saved workspace automatically
On Sat, 6 May 2017 11:17:42 -0400, Michael Friendly wrote:

> On 5/5/2017 10:23 AM, Ralf Goertz wrote:
> > On Fri, 05 May 2017 07:14:36 -0700, Jeff Newmiller wrote:
> >
> >> R normally prompts you to save .RData, but it just automatically
> >> saves .Rhistory... the two are unrelated.
> >
> > Not here. If I say "n" to the prompted question "Save workspace
> > image? [y/n/c]: " my history doesn't get saved.
> >
> > Version:
> >
> > R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> > Copyright (C) 2016 The R Foundation for Statistical Computing
> > Platform: x86_64-suse-linux-gnu (64-bit)
>
> On Windoze, here's what I use in my .Rprofile, which runs every time
> I start an RGUI console. The key is .First & .Last to load/save
> history automagically.

Hi Michael,

thanks. This helps with saving the history without saving the data. But
actually I'd really like to save both and still be able to load functions
automatically from .Rprofile. Not saving the data, as Jeff suggested, is
not a good option because it is sometimes tedious to rebuild my
environment by re-executing commands from the history. And I explained in
my OP why I can't use .First() to achieve my goal.

But let me try again to explain the problem, because I think not everybody
understood what I was trying to say. For simplicity I use the plain
variable "a" instead of a function. Start a fresh session, remove all
variables, define one variable, and quit with saving:

> rm(list=ls())
> a=17
> quit(save="yes")

Now, before opening a new session, edit .Rprofile so that it contains just
the two lines:

print("Hello from .Rprofile")
a=42

Start a new session, where your saved environment will be loaded. Observe
that you see the line

[1] "Hello from .Rprofile"

proving that the commands in .Rprofile have been executed.

Now look at "a":

> a
[1] 17

You would expect to see this, because *after* your "Hello" line you find

[Previously saved workspace restored]

So you have set "a" to 42 in .Rprofile, but it gets overwritten from the
previously saved and now restored workspace. On the other hand, .First()
gets executed after the restoring of the workspace. Therefore, I could
edit .Rprofile to read

.First=function(){
  assign("a",42,pos=1)
}

Now, after starting, I see that "a" is indeed 42. But then it turns out
that from now on I need "a" to be 11. After editing .Rprofile accordingly,
I am quite hopeful, but after starting a new session I see that "a" is
still 42. Why is that? Because .First() was saved, and when I started a
new session it got a new function body (setting "a" to 11), but before it
could be executed it was again overwritten by the old one (setting "a" to
42), and I am chasing my own tail. Sigh.

.Last() doesn't help. Apparently (at least on my linux system) it is
executed *after* saving the environment, so too late to remove anything
you don't want saved. In that regard linux doesn't seem to be typical,
since in "?.Last" the reverse order is described as typical:

  Exactly what happens at termination of an R session depends on the
  platform and GUI interface in use. A typical sequence is to run
  '.Last()' and '.Last.sys()' (unless 'runLast' is false), to save the
  workspace if requested (and in most cases also to save the session
  history: see 'savehistory'), then run any finalizers (see
  'reg.finalizer') that have been set to be run on exit, close all open
  graphics devices, remove the session temporary directory and print any
  remaining warnings (e.g., from '.Last()' and device closure).

IMHO this is a design flaw.

Ralf
Re: [R] loading edited functions already in saved workspace automatically
On Fri, 05 May 2017 07:14:36 -0700, Jeff Newmiller wrote:

> R normally prompts you to save .RData, but it just automatically
> saves .Rhistory... the two are unrelated.

Not here. If I say "n" to the prompted question "Save workspace image?
[y/n/c]: " my history doesn't get saved.

Version:

R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-suse-linux-gnu (64-bit)
Re: [R] loading edited functions already in saved workspace automatically
On Fri, 05 May 2017 06:30:01 -0700, Jeff Newmiller wrote:

> The answer most people seem to use is to avoid depending on functions
> in RData files, and in particular avoiding ever saving the
> "automatic" ".RData" files at all. (Some people avoid using any RData
> files, but the automatic loading of functions by ".RData" files is a
> particularly pernicious source of evil as you have already
> discovered.)
>
> That is, always work toward building scripts that you run to restore
> your workspace rather than depending on save files. Don't depend on
> save files to keep track of what you do interactively. This also
> usually means that there should be little if anything in
> your .Rprofile because that tends to build non-reproducibility into
> your scripts.

Hi Jeff,

thanks for your answer. Actually, I don't use the workspace saving feature
primarily for the data but for the command line history. Is there a way to
just save .Rhistory?

Ralf
[R] loading edited functions already in saved workspace automatically
Hi,

In short: Is it possible to have the previously saved workspace restored
and nevertheless load a function already existing in this workspace via
.Rprofile anyway?

In detail: I use different directories for different projects. In all
those projects I use a function which I therefore try to get into the
session by

myfunc=eval(parse(file=("~/R/myfunc.R")))

in ~/.Rprofile. Once I leave the session, thereby saving the workspace,
this function gets saved in ./.RData as well. In a subsequent session in
that directory it gets loaded back. However, in the meantime I might have
edited ~/R/myfunc.R. I don't seem to be able to automatically load the new
function into the session. The workspace gets loaded *after* the execution
of ~/.Rprofile, so the new definition of myfunc() gets overwritten by the
old one.

I can't use .First() – which is executed after loading the workspace –
because this would load myfunc() into the environment of .First() instead
of the global environment. I could use .Last() to remove the function
before saving the workspace. But then .Last() gets saved to the workspace,
which is also not convenient: when I add another function the same way and
edit the definition of .Last() in ~/.Rprofile to also remove that
function, this does not work because I don't get the new .Last() into the
session automatically. And no, removing .Last() from within .Last()
doesn't work.
Re: [R] readline issue with 3.3.1
On Mon, 25 Jul 2016 08:57:17 +0200, Martin Maechler wrote:

> > Ralf Goertz on Fri, 22 Jul 2016 10:15:36 +0200 writes:
> >
> > It would be great if – while fixing this – you also took care of
> > the SIGWINCH problem described in bug report
> > https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16604
>
> Well, that *has* been fixed in 'R-devel' and 'R 3.3.1 patched' ... but
> again only for readline >= 6.3 [snip]
>
> One reason we have not closed the bug is that the fix is only for
> readline >= 6.3, ... and also that I don't think we got much of user
> confirmation of the form "yes, the bug goes away once I compile
> R-devel or R-patched"

Thanks for the explanation. I will provide confirmation as soon as
R-patched hits my repositories. ;-)

Ralf
Re: [R] readline issue with 3.3.1
On Thu, 21 Jul 2016 18:07:43 +0200, Martin Maechler wrote:

> > Ralf Goertz on Wed, 20 Jul 2016 16:37:53 +0200 writes:
> >
> > I installed readline version 6.3 and the issue is gone. So probably
> > some of the recent changes in R's readline code are incompatible
> > with readline version 6.2.
>
> Yes, it seems so, unfortunately.
>
> Thank you for reporting!

It would be great if – while fixing this – you also took care of the
SIGWINCH problem described in bug report
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16604

Thanks, Ralf
Re: [R] readline issue with 3.3.1
On Wed, 20 Jul 2016 11:35:31 +0200, Ralf Goertz wrote:

> Hi,
>
> after a recent update to version 3.3.1 on Opensuse Leap I have
> problems with command lines longer than the terminal width. E.g. when
> I do this

I installed readline version 6.3 and the issue is gone. So probably some
of the recent changes in R's readline code are incompatible with readline
version 6.2.
[R] readline issue with 3.3.1
Hi,

after a recent update to version 3.3.1 on Opensuse Leap I have problems
with command lines longer than the terminal width. E.g. when I do this

> print("This is a very long line which is in fact so long that it gets wrapped while writing it")

and then hit enter I end up with:

> print("This is a very long line which is in fact
[1] "This is a very long line which is in fact so l
ong that it gets wrapped while writing it"

So the output overwrites the second line of input. This does not happen
when I start R without readline support using "R --no-readline". That's
why I thought it could be a readline problem. But my current readline6
version (6.2) was installed way before the update of R, and I had no
problems with the previous R version. Furthermore, no other program using
readline seems to have that problem. E.g. in bash:

me@host:~/some/dir> echo This is a very long line which is in fact so long that it gets wrapped while writing it
This is a very long line which is in fact so long that it gets wrapped whi
le writing it
[R] cygwin utf-8 problem with "help"
Hi,

In R under cygwin I have trouble displaying help correctly in text format.
One problem arises when there are single quotes in the text, like in
"?help", which looks like this:

help                   package:utils                   R Documentation

Documentation

Description:

Usage:

     help(topic, package = NULL, lib.loc = NULL,
          verbose = getOption("verbose"),
          try.all.packages = getOption("help.try.all.packages"),
          help_type = getOption("help_type"))

Arguments:

Details:

     The following types of help are available:

     Plain text help
...

The whole paragraph containing typographic single quotes is not displayed.
I can work around that a bit by issuing

tools::Rd2txt_options(code_quote=FALSE)

but there are still those single quotes like in "See `Details´ for what
happens if this is omitted." Furthermore, lines with bullets don't get
indented and no bullets are displayed (the "Plain text help" line above).
I am using a UTF-8 locale. When setting LANG to "C" or a latin1 locale,
help is displayed correctly, but then there are all sorts of problems with
non-ASCII characters. What can I do?
[R] using R's svd from outside R
Hi,

I have to compute the singular value decomposition of rather large
matrices. My test matrix is 10558 by 4255, and it takes about three
minutes in R to decompose on a 64-bit quad-core linux machine. (R is
running svd in parallel; all four cores are at their maximum load while
doing this.) I tried several blas and lapack libraries as well as the gnu
scientific library in my C++ program. Apart from being unable to have them
do svd in parallel mode (although I thought I did everything to make them
do it in parallel), execution time always exceeds 25 minutes, which is
still way more than the expected 12 minutes for the non-parallel R code.
I am now going to call R from within my program, but this is not very
elegant. So my questions are: Does R use a special svd routine, and is it
possible to use it directly by linking in the relevant libraries? (Sorry,
but I couldn't figure that out by looking at the source code.) If that is
possible, can I have the code run in parallel mode?

Thanks, Ralf
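For reference, R's `svd()` is a thin wrapper around the LAPACK routine `dgesdd` (see `?La.svd`), executed through whatever BLAS/LAPACK R was built against; the observed parallelism comes from a threaded BLAS, not from R itself, so linking a C++ program against the same optimized libraries and calling `dgesdd` should reproduce R's speed. A small sketch of the R-level call with a reconstruction check (matrix size shrunk for illustration):

```r
set.seed(42)
A <- matrix(rnorm(200 * 80), 200, 80)  # small stand-in for the 10558 x 4255 case

# La.svd() calls LAPACK dgesdd and returns d (singular values),
# u (200 x 80) and vt (80 x 80, already transposed).
s <- La.svd(A)

# Reconstruct A = U diag(d) V^T; in R, d * s$vt scales row i of vt by d[i].
A_rec <- s$u %*% (s$d * s$vt)

max(abs(A - A_rec))  # numerically zero
```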
[R] deleting variables
How can I automatically exclude one variable from being saved in the
workspace when I quit an R session? The problem is I don't know how to
erase a variable once it has been created.

Background: I open a connection called "con" to a database server in my
~/.Rprofile. Obviously, the connection expires when quitting the R
session. Unfortunately, the workspace is loaded after ~/.Rprofile is run,
so "con" gets overwritten by the old workspace. I thought of using
.First() or .Last() but as these are functions I don't know how to modify
global variables.

Ralf
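A sketch of the approach that comes up in the later threads of this archive: functions like `.Last()` can reach the global environment explicitly via `pos = 1`, so the stale connection can be dropped there before the workspace image is written (note, also per those threads, that on some platforms `.Last()` only runs *after* saving, so this needs testing on your system):

```r
# For ~/.Rprofile: remove the connection object from the global
# environment (pos = 1) on exit. Without pos = 1, rm() would look for
# 'con' inside .Last()'s own frame, where it doesn't exist.
.Last <- function() {
  if (exists("con", where = 1)) rm("con", pos = 1)
}
```

The mirror image works for `.First()`: use `assign("con", ..., pos = 1)` so the connection lands in the global environment rather than in `.First()`'s frame.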
Re: [R] completion doesn't work anymore
Peter Dalgaard, Monday, 28 April 2008:

> Peter Dalgaard wrote:
> The 10.2 version that I just installed does seem to work though:
>
> viggo:~/> R
>
> R version 2.7.0 (2008-04-22)
> Copyright (C) 2008 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> ...
>
> > air
> airmiles      airquality
> > airquality$
> airquality$Day      airquality$Ozone    airquality$Temp
> airquality$Month    airquality$Solar.R  airquality$Wind

Confirmed. It works on an openSUSE 10.2 x64 machine but not on 10.3 i386.
It failed on two computers with that system/version combination, one of
which never had R installed before. I can't say whether it is version
related or system related. What machine did you use for your test, x64 or
i386?

Ralf
[R] completion doesn't work anymore
Hi,

after updating to 2.7.0, command line completion doesn't work anymore. I
understand that the package rcompgen is now part of utils. I hadn't used
rcompgen before, but completion worked without it (I double-checked using
a 2.6.2 version on another machine). Now it doesn't work, even after
switching on all options via "rc.settings". I updated R using the opensuse
10.3 repository. For instance, I have a data frame d. When I enter d$ I
see a list of files in the current directory instead of the names in the
data frame. What can I do to turn it back on?

[EMAIL PROTECTED]:~> R --version
R version 2.7.0 (2008-04-22)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License version 2.
For more information about these matters see
http://www.gnu.org/licenses/.
[R] Ransac implementation
Hi,

is there an R implementation of the RANSAC algorithm?

Thanks, Ralf
Re: [R] R-squared value for linear regression passing through origin using lm()
Berwin A Turlach, Freitag, 19. Oktober 2007: > G'day Ralf, Hi Berwin, > On Fri, 19 Oct 2007 09:51:37 +0200 Ralf Goertz <[EMAIL PROTECTED]> > wrote: > > Why should either of those formula yield the output of > summary(lm(y~x+0)) ? The R-squared output of that command is > documented in help(summary.lm): > > r.squared: R^2, the 'fraction of variance explained by the model', > > R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2), yes I know. But you know why I chose those formulas, right? > where y* is the mean of y[i] if there is an intercept and > zero otherwise. > > And, indeed: > > > 1-sum(residuals(lm(y~x+0))^2)/sum((y-0)^2) > [1] 0.9796238 > > confirms this. > > Note: if you do not have an intercept in your model, the residuals do > not have to add to zero; and, typically, they will not. Hence, > var(residuals(lm(y~x+0)) does not give you the residual sum of squares. Yes I am right, you know why. > > In order to save the role of R^2 as a goodness-of-fit indicator > > R^2 is no goodness-of-fit indicator, neither in models with intercept > nor in models without intercept. So I do not see how you can save its > role as a goodness-of-fit indicator. :) Okay, I surrender. > Since you are posting from a .de domain, I assume you will understand > the following quote from Tutz (2000), "Die Analyse kategorialer Daten", > page 18: > > R^2 misst *nicht* die Anpassungsguete des linearen Modelles, es sagt > nichts darueber aus, ob der lineare Ansatz wahr oder falsch ist, sondern > nur ob durch den linearen Ansatz individuelle Beobachtungen > vorhersagbar sind. R^2 wird wesentlich vom Design, d.h. den Werten, > die x annimmt bestimmt (vgl. Kockelkorn (1998)). Danke schön. > > But I assume that this has probably been discussed at length > > somewhere more appropriate than r-help. > > I am sure about that, but it was also discussed here on r-help (long > ago). 
> The problem is that this compares two models that are not nested in
> each other, which is a quite controversial thing to do; some might
> even go so far as saying that it makes no sense at all. The other
> problem with this approach is illustrated by my example:
>
> > set.seed(20070807)
> > x <- runif(100)*2+10
> > y <- 4+rnorm(x, sd=1)
> > 1-var(residuals(lm(y~x+0)))/var(y)
> [1] -0.04848273
>
> How do you explain that a quantity that is called R-squared, implying
> that it is the square of something, hence always non-negative, can
> become negative?

Because the correlation coefficient is either 0.2201879424i or
-0.2201879424i ;)

Thanks for your time, and yours as well, Steve. You've been very
helpful.

Ralf
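[Editorial illustration: Berwin's example above, spelled out as a runnable snippet. It shows that the variance-based "R^2" can go negative for a no-intercept fit when the true intercept is far from zero, because the through-origin model explains less of y than its plain mean does.]

```r
# y does not depend on x at all; its true intercept (4) is far from 0.
set.seed(20070807)
x <- runif(100) * 2 + 10
y <- 4 + rnorm(x, sd = 1)

fit0 <- lm(y ~ x + 0)                 # regression forced through the origin
1 - var(residuals(fit0)) / var(y)     # negative: the no-intercept fit is
                                      # worse than just using mean(y)
```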
Re: [R] R-squared value for linear regression passing through origin using lm()
Berwin A Turlach, Donnerstag, 18. Oktober 2007:

> G'day all,
>
> I must admit that I have not read the previous e-mails in this
> thread, but why should that stop me from commenting? ;-)

Your comments are very welcome.

> On Thu, 18 Oct 2007 16:17:38 +0200
> Ralf Goertz <[EMAIL PROTECTED]> wrote:
>
> > But in that case the numerator is very large, too, isn't it?
>
> Not necessarily.
>
> > I don't want to argue, though.
>
> Good, you might lose the argument. :)

Yes, I admit I lost. :-(

> > But so far, I have not managed to create a dataset where R^2 is
> > larger for the model with forced zero intercept (although I have
> > not tried very hard). It would be very convincing to see one
> > (Etienne?)
>
> Indeed, you haven't tried hard. It is not difficult. Here are my
> canonical commands to convince people why regression through the
> origin is evil; the pictures should illustrate what is going on:
> [example snipped]

Thanks to Thomas Lumley there is another convincing example. But I
still have a problem with it:

> x<-c(2,3,4);y<-c(2,3,3)
> 1-2*var(residuals(lm(y~x+1)))/sum((y-mean(y))^2)
[1] 0.75

That's okay, but neither

> 1-3*var(residuals(lm(y~x+0)))/sum((y-0)^2)
[1] 0.97076

nor

> 1-2*var(residuals(lm(y~x+0)))/sum((y-0)^2)
[1] 0.9805066

gives the result of summary(lm(y~x+0)), which is 0.9796.

> > IIRC, I have not been told so. Perhaps my teachers were not as good
> > as they should have been. So what is R^2 good for, if not to
> > indicate the goodness of fit?
>
> I am wondering about that too sometimes. :) I was always wondering
> that R^2 was described to me by my lecturers as the square of the
> correlation between the x and the y variate. But on the other hand,
> they pretended that x was fixed and selected by the experimenter (or
> should be regarded as such). If x is fixed and y is random, then it
> does not make sense to me to speak about a correlation between x and
> y (at least not on the population level).

I see the point.
But I was raised with that description, too, and it's hard to drop the
idea.

> My best guess at the moment is that R^2 was adopted by users of
> statistics before it was properly understood; and by the time it was
> properly understood, it was too much entrenched to abandon it. Try
> not to teach it these days and see what your "client faculties" will
> tell you.

In order to save the role of R^2 as a goodness-of-fit indicator in
zero-intercept models, one could use the same formula as in models
with a constant. I mean, if R^2 is the proportion of variance
explained by the model, we should use the a priori variance of y[i]:

> 1-var(residuals(lm(y~x+0)))/var(y)
[1] 0.3567182

But I assume that this has probably been discussed at length somewhere
more appropriate than r-help.

Thanks, Ralf
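[Editorial illustration: why neither var()-based attempt in this thread reproduces the 0.9796 reported by summary(lm(y~x+0)). summary.lm uses raw sums of squares; for the no-intercept model the residuals are not centred, so var() (which subtracts their mean and divides by n-1) is the wrong ingredient.]

```r
# Thomas Lumley's toy data from the thread
x <- c(2, 3, 4); y <- c(2, 3, 3)
fit0 <- lm(y ~ x + 0)

# The documented formula: y* is zero for a no-intercept model
r2_doc <- 1 - sum(residuals(fit0)^2) / sum((y - 0)^2)

all.equal(r2_doc, summary(fit0)$r.squared)   # TRUE: both equal 0.9796238
```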
Re: [R] R-squared value for linear regression passing through origin using lm()
S Ellison, Donnerstag, 18. Oktober 2007:

> > I think there is reason to be surprised, I am, too. ...
> > What am I missing?
>
> Read the formula and ?summary.lm more closely. The denominator,
>
>   Sum((y[i]- y*)^2)
>
> is very large if the mean value of y is substantially nonzero and y*
> is set to 0, as the calculation implies for a forced zero intercept.

But in that case the numerator is very large, too, isn't it? I don't
want to argue, though. You might very well be right. But so far, I
have not managed to create a dataset where R^2 is larger for the model
with forced zero intercept (although I have not tried very hard). It
would be very convincing to see one (Etienne?)

> In effect, the calculation provides the fraction of sum of squared
> deviations from the mean for the case with intercept, but the
> fraction of sum of squared y ('about' zero) for the non-intercept
> case.

I understand the mathematics behind it. But as I said, I thought the
growth of the denominator is more than fully balanced by the growth of
the numerator.

> This is surprising if you automatically assume that better R^2 means
> better fit. I guess that explains why statisticians tell you not to
> use R^2 as a goodness-of-fit indicator.

IIRC, I have not been told so. Perhaps my teachers were not as good as
they should have been. So what is R^2 good for, if not to indicate the
goodness of fit?

Ralf
Re: [R] R-squared value for linear regression passing through origin using lm()
Achim Zeileis, Donnerstag, 18. Oktober 2007:

> On Thu, 18 Oct 2007, Toffin Etienne wrote:
>
> > Hi,
> > I have a small technical question about the calculation of
> > R-squared using lm().
> > In a study case with experimental values, it seems more logical to
> > force the regression line to pass through the origin with
> > lm(y ~ x + 0). However, R-squared values are higher in this case
> > than when I compute the linear regression with lm(y ~ x).
> > This is surprising to me: is this result normal? Is there any
> > problem in the R-squared value calculated in this case?
>
> Have you considered reading the documentation? ?summary.lm has
>
>   r.squared: R^2, the 'fraction of variance explained by the model',
>
>     R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2),
>
>   where y* is the mean of y[i] if there is an intercept and
>   zero otherwise.

I think there is reason to be surprised; I am, too. The fraction of
variance explained should never be smaller when there are two
parameters to fit the data with. Of course, if mean(y)=0 anyway, there
should be no difference in R^2 (except that the error df of the two
models differ). What am I missing?

Ralf
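[Editorial illustration of the effect Etienne observed: when mean(y) is far from zero, the reported R^2 of the through-origin fit uses sum(y^2) as its denominator and comes out near 1 even though the fit is worse. The data-generating values below are assumptions chosen to make the effect visible.]

```r
set.seed(42)                        # illustrative seed, an assumption
x <- runif(50) * 2 + 10             # x (and hence y) far from the origin
y <- 5 + 0.1 * x + rnorm(50, sd = 1)

summary(lm(y ~ x))$r.squared        # small: weak signal about mean(y)
summary(lm(y ~ x + 0))$r.squared    # near 1: denominator is sum(y^2)
```

The two numbers are not comparable: the first measures improvement over mean(y), the second improvement over predicting 0 for every observation.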
Re: [R] library(car): Anova and repeated measures without between subjects factors
John Fox, Dienstag, 16. Oktober 2007:

> Dear Ralf,
>
> Unfortunately, Anova.mlm(), and indeed Anova() more generally, won't
> handle a model with only a constant. As you point out, this isn't
> reasonable for repeated-measures ANOVA, where it should be possible
> to have only within-subjects factors. When I have a chance, I'll see
> what I can do to fix the problem -- my guess is that it shouldn't be
> too hard.
>
> Thanks for pointing out this limitation in Anova.mlm()

Dear John,

I am looking forward to your having a chance. There is one thing that
I would like to request, though. Greenhouse-Geisser and Huynh-Feldt
epsilon corrections have already been implemented, but how about
Mauchly's sphericity test? I know this can be done with mauchly.test(),
but it would be nice to have it in the summary of Anova().

However, there is one more thing. Look at the following data:

> c1<-c(-6.0,-10.3,-2.9,-8.3,-10.0,5.3,-7.7,-0.8,9.1,-6.2)
> mat<-matrix(c(c1,c1),10,2)
> mat
       [,1]  [,2]
 [1,]  -6.0  -6.0
 [2,] -10.3 -10.3
 [3,]  -2.9  -2.9
 [4,]  -8.3  -8.3
 [5,] -10.0 -10.0
 [6,]   5.3   5.3
 [7,]  -7.7  -7.7
 [8,]  -0.8  -0.8
 [9,]   9.1   9.1
[10,]  -6.2  -6.2
> bf<-ordered(rep(1:2,5))
> bf
 [1] 1 2 1 2 1 2 1 2 1 2
Levels: 1 < 2

Since the two columns of mat are equal:

> t.test(mat[,1],mat[,2],paired=T)

        Paired t-test

data:  mat[, 1] and mat[, 2]
t = NaN, df = 9, p-value = NA
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 NaN NaN
sample estimates:
mean of the differences
                      0

I would assume either a warning or an F value of 0 for the repeated
factor zeit, but actually:

> Anova(lm(mat~bf),idata=data.frame(zeit=ordered(1:2)),idesign=~zeit)

Type II Repeated Measures MANOVA Tests: Pillai test statistic
        Df test stat approx F num Df den Df Pr(>F)
bf       1    0.0020   0.0163      1      8 0.9016
zeit     1    0.2924   3.3059      1      8 0.1065
bf:zeit  1    0.0028   0.0221      1      8 0.8854

whereas

> anova.mlm(lm(mat~bf),X=~1,idata=data.frame(zeit=ordered(1:2)))
Error in anova.mlm(...) : residuals have rank 1 < 2

This is quite dangerous. In a real data situation I accidentally used
the same column twice and got a significant effect for the factor
zeit! I hope it wouldn't be too hard to fix this, too.

Ralf
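[Editorial illustration: a quick defensive check for the duplicated-column accident described above. Before trusting repeated-measures output, one can verify that the residual matrix of the multivariate fit has full column rank; the check itself is an assumption, not a car feature.]

```r
c1 <- c(-6.0, -10.3, -2.9, -8.3, -10.0, 5.3, -7.7, -0.8, 9.1, -6.2)
mat <- matrix(c(c1, c1), 10, 2)     # second column duplicates the first
bf <- ordered(rep(1:2, 5))

fit <- lm(mat ~ bf)                 # multivariate fit, matrix response
qr(residuals(fit))$rank             # 1, i.e. less than ncol(mat): danger
```

A rank lower than the number of response columns is exactly the condition anova.mlm() refuses, and a sensible thing to abort on before calling Anova().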
[R] library(car): Anova and repeated measures without between subjects factors
Hi,

sorry if this is explained somewhere, but I didn't find anything. How
can I use "Anova" from the car package to test a model without
between-subjects factors? Suppose I have the following data:

   mat.1 mat.2 mat.3 di ex
1     85    85    88  1  1
2     90    92    93  1  1
3     97    97    94  1  1
4     80    82    83  1  1
5     91    92    91  1  1
6     83    83    84  2  1
7     87    88    90  2  1
8     92    94    95  2  1
9     97    99    96  2  1
10   100    97   100  2  1
11    86    86    84  1  2
12    93   103   104  1  2
13    90    92    93  1  2
14    95    96   100  1  2
15    89    96    95  1  2
16    84    86    89  2  2
17   103   109    90  2  2
18    92    96   101  2  2
19    97    98   100  2  2
20   102   104   103  2  2
21    93    98   110  1  3
22    98   104   112  1  3
23    98   105    99  1  3
24    87   132   120  1  3
25    94   110   116  1  3
26    95   126   143  2  3
27   100   126   140  2  3
28   103   124   140  2  3
29    94   135   130  2  3
30    99   111   150  2  3

Using

> Anova(lm(mat~di*ex,data=data),idata=data.frame(zeit=ordered(1:3)),idesign=~zeit)

Type II Repeated Measures MANOVA Tests: Pillai test statistic
           Df test stat approx F num Df den Df    Pr(>F)
di          1     0.377   14.524      1     24 0.0008483 ***
ex          2     0.800   47.915      2     24 4.166e-09 ***
di:ex       2     0.281    4.695      2     24 0.0190230 *
zeit        1     0.782   41.209      2     23 2.491e-08 ***
di:zeit     1     0.252    3.865      2     23 0.0357258 *
ex:zeit     2     0.836    8.611      4     48 2.538e-05 ***
di:ex:zeit  2     0.518    4.189      4     48 0.0054586 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

works as expected. But every once in a while I have a model without
between-subjects factors. So I thought of

> Anova(lm(mat~1,data=data),idata=data.frame(zeit=factor(1:3)),idesign=~zeit)
Error in L %*% B : non-conformable arguments

On the other hand, using anova.mlm I get

> anova.mlm(lm(mat~1,data),idata=data.frame(zeit=factor(1:3)),X=~1,test="Spherical")
Analysis of Variance Table

Contrasts orthogonal to ~1

Greenhouse-Geisser epsilon: 0.7464
Huynh-Feldt epsilon:        0.

            Df      F num Df den Df     Pr(>F)     G-G Pr     H-F Pr
(Intercept)  1 11.767      2     58 5.1375e-05 3.1183e-04 2.4939e-04
Residuals   29

How can I achieve this with Anova?
Thanks in advance,

Ralf
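[Editorial illustration: one workaround while car::Anova lacks the intercept-only case is a univariate repeated-measures ANOVA via aov() with an Error() stratum on long-format data. Note this assumes sphericity and, unlike the anova.mlm output above, applies no epsilon correction. The data below are synthetic stand-ins, not Ralf's table.]

```r
# Synthetic 30-subjects x 3-occasions response matrix (assumption)
set.seed(1)
mat <- matrix(rnorm(90, mean = 100, sd = 10), nrow = 30, ncol = 3)

# Reshape wide -> long: one row per subject/occasion
long <- data.frame(
  y       = as.vector(mat),
  zeit    = factor(rep(1:3, each = nrow(mat))),
  subject = factor(rep(seq_len(nrow(mat)), times = 3))
)

# Within-subjects F test for zeit, subjects as the error stratum
fit <- aov(y ~ zeit + Error(subject/zeit), data = long)
summary(fit)
```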