Re: [R] glm with family binomial and link identity

2022-04-28 Thread Ralf Goertz
> On Wed, 27 Apr 2022 10:27:33 +0200, Ralf Goertz wrote:
> 
> > Hi,
> >
> > I just noticed that (with my version 4.2.0) it is no longer possible
> > to use glm with family=binomial(link=identity). Why is that? It was
> > possible with 4.0.x as a colleague of mine just confirmed. After all
> > it is useful to compute risk differences with factors.  
> 
> Sorry about the noise. It still works with "identity" but it has to be
> quoted (whereas "log" and "logit" also work unquoted).

On the other hand, why do I get the following error at all?

Error in binomial(link = identity) : 
  link "identity" not available for binomial family; available links are 
‘logit’, ‘probit’, ‘cloglog’, ‘cauchit’, ‘log’

This is a bit misleading since identity is a legitimate link function
and works perfectly when quoted. The results of that call are in
agreement with what you get with the risk difference approach (for
convenience computed using the meta package)

> d=data.frame(y=c(0, 1, 0, 1), trt=c(0, 0, 1, 1), w=c(97, 3, 90, 10))
> d
  y trt  w
1 0   0 97
2 1   0  3
3 0   1 90
4 1   1 10

> res=glm(y~trt,data=d,weights=w,family=binomial(link="identity"))
> summary(res)

Call:
glm(formula = y ~ trt, family = binomial(link = "identity"), data = d,
    weights = w)

Deviance Residuals: 
     1       2       3       4  
-2.431   4.587  -4.355   6.786  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.03000    0.01706   1.759   0.0786 .
trt          0.07000    0.03451   2.028   0.0425 *
…

> confint.default(res)[2,]
  2.5 %  97.5 % 
0.002359942 0.137640058 

This is exactly the same as 

> meta::metabin(10,100,3,100,sm="RD")
Number of observations: o = 200
Number of events: e = 13

     RD           95%-CI    z p-value
 0.0700 [0.0024; 0.1376] 2.03  0.0425
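
As a cross-check, the same risk difference and Wald interval can be
reproduced by hand from the 2x2 counts. This is only a minimal sketch in
base R, assuming the usual unpooled standard error for a difference of two
proportions; the variable names are made up for illustration.

```r
# Counts from the table above: 10/100 events with trt=1, 3/100 with trt=0
e1 <- 10; n1 <- 100
e0 <- 3;  n0 <- 100

p1 <- e1 / n1
p0 <- e0 / n0

rd <- p1 - p0                                        # risk difference
se <- sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)  # unpooled Wald standard error
ci <- rd + c(-1, 1) * qnorm(0.975) * se              # 95% Wald interval

round(c(RD = rd, SE = se, lower = ci[1], upper = ci[2]), 4)
```

The rounded values should line up with the estimate, standard error and
interval limits reported by glm and metabin.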

So why the error?
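
A small sketch that isolates the difference between the quoted and the
unquoted form without fitting anything. It only uses binomial() and
make.link() from the stats package. Whether the unquoted call fails may
depend on the R version, so the condition is caught rather than assumed.

```r
# Quoted link name: accepted
fam_quoted <- binomial(link = "identity")
fam_quoted$link

# Unquoted name: catch a possible error instead of stopping the script
res_unquoted <- tryCatch(binomial(link = identity), error = function(e) e)
if (inherits(res_unquoted, "error")) conditionMessage(res_unquoted)

# A link object built explicitly with make.link() is another way in
fam_object <- binomial(link = make.link("identity"))
fam_object$link
```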

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm with family binomial and link identity

2022-04-27 Thread Ralf Goertz
On Wed, 27 Apr 2022 10:27:33 +0200,
Ralf Goertz wrote:

> Hi,
>
> I just noticed that (with my version 4.2.0) it is no longer possible
> to use glm with family=binomial(link=identity). Why is that? It was
> possible with 4.0.x as a colleague of mine just confirmed. After all
> it is useful to compute risk differences with factors.

Sorry about the noise. It still works with "identity" but it has to be
quoted (whereas "log" and "logit" also work unquoted).



[R] glm with family binomial and link identity

2022-04-27 Thread Ralf Goertz
Hi,

I just noticed that (with my version 4.2.0) it is no longer possible to
use glm with family=binomial(link=identity). Why is that? It was
possible with 4.0.x as a colleague of mine just confirmed. After all it
is useful to compute risk differences with factors.



Re: [R] RMariaDB returns a query without microseconds

2021-07-27 Thread Ralf Goertz
On Mon, 26 Jul 2021 16:07:24 +0000 (UTC),
Baki UNAL via R-help wrote:

> Hi
> 
> I can query a table from a mysql database with RMariaDB. One of the
> table's column indicates "trade_time" and contains values such as
> "09:55:02.113000". When I query this table I can not get fractional
> seconds. I get a value such as "09:55:02". Also I get a variable
> class such as "hms" and "difftime" for this column. Not character or
> POSIX* format. I tried both "datetime" and "varchar(25)" as column
> type of "trade_time" in mysql. How can I solve this problem?

Did you tell MariaDB to include microseconds? If you just do

> create table dt (d TIME);
> insert into dt values("09:55:02.113000");
> select * from dt;
+----------+
| d        |
+----------+
| 09:55:02 |
+----------+
1 row in set (0.000 sec)

the fractional part is gone. But if you instead say

> create table dt (d DATETIME(6));

you get

> select * from dt;
+----------------------------+
| d                          |
+----------------------------+
| 2020-07-23 09:55:02.113000 |
+----------------------------+
1 row in set (0.001 sec)

And I also see this in R:
> dbGetQuery(con,"select * from dt")
                           d
1 2020-07-23 09:55:02.113000
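
On the R side, fractional seconds can also be hidden by the default print
format even when they are stored. A minimal sketch in base R, assuming the
column arrives as POSIXct; the timestamp below is made up, and .5 s is
used because it is exactly representable as a double.

```r
t <- as.POSIXct("2020-07-23 09:55:02.5", tz = "UTC")

format(t, "%H:%M:%S")     # whole seconds only
format(t, "%H:%M:%OS3")   # seconds with three decimal places

# options(digits.secs = 6) would make print() show fractional digits by default
```

Note that %OSn truncates rather than rounds, so other fractions may show
up one unit lower in the last digit than expected.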



Re: [R] How can certain variables be automatically excluded from being saved?

2021-04-01 Thread Ralf Goertz
On Thu, 1 Apr 2021 08:39:45 -0400,
Duncan Murdoch wrote:

> On 01/04/2021 4:05 a.m., Ralf Goertz wrote:
> > Hi,
> >
> > after having read here about the "seed problem" I wonder if there
> > is a way to automatically exclude certain variables from being
> > saved when the workspace image is saved at the end of an
> > interactive session. I have been using .First() and .Last() for
> > ages but apparently they are of no help as .First() gets called
> > before loading the workspace and .Last() after it has been saved.
> > At least the line
> >
> > if (".Random.seed" %in% ls(all.names=T)) rm(.Random.seed, pos=1)
> >
> > in either of those functions doesn't have the desired effect.
>
> Jim suggested a way to do that, but I don't think it's really a good
> idea:  it just fixes one aspect of the problem, it doesn't solve the
> whole thing.
>
> The real problem is saving the workspace occasionally, but always
> loading it.  The "always loading" part is automatic, so I think the
> real solution should address the "occasionally saving" part.

Yes, that is exactly what I quite often do. I have to work in different
projects and I usually start R in the project directory. When I do
serious stuff I save afterwards. But quite often I want to check
something quickly and then I don't want to clutter up my workspace.

> If you always save the workspace, things are fine.  You'll save the
> seed at the end of one session, and load it at the beginning of the
> next.
>
> If you never save the workspace, things are also fine.  You'll always
> generate a new seed in each session that needs one.
>
> Personally, I believe in the "never save it" workflow.  I think it
> has lots of benefits besides the random seed issue:  you won't get a
> more-and-more cluttered workspace over time, you end up with more
> reproducible results, etc.  However, I can understand that some
> people use a different workflow, so "always save it" is sometimes a
> reasonable choice.
>
> So the real problem is the "sometimes save it" workflow, which is
> **encouraged** by the default q(save = "default") option, which asks
> when interactive.  Changing the default to act like q(save = "no")
> would be my preference (and that's how I configure things), but
> changing it to act like q(save = "yes") would be an improvement over
> the current choice.

I would prefer to be able to only save the history since that is where
the work is done. Usually, my data is easily restored using commands
from the history. I could probably accomplish that by linking .RData to
/dev/null or making it an empty readonly file. However, I would have to
do that in every directory I happen to use R in.
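
One possible way to get "history only" without touching every directory
might be a .Last() in ~/.Rprofile combined with never saving the workspace
(e.g. starting R with --no-save). This is only a sketch under those
assumptions: the file name ".Rhistory" is a placeholder, and savehistory()
is not supported on every platform or front end, hence the try().

```r
# Hypothetical ~/.Rprofile fragment: write the command history on exit,
# independently of whether the workspace image is saved.
.Last <- function() {
  if (interactive()) {
    # savehistory() can fail where no history mechanism is available
    try(utils::savehistory(file = ".Rhistory"), silent = TRUE)
  }
}
```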



[R] How can certain variables be automatically excluded from being saved?

2021-04-01 Thread Ralf Goertz
Hi,

after having read here about the "seed problem" I wonder if there is a
way to automatically exclude certain variables from being saved when the
workspace image is saved at the end of an interactive session. I have
been using .First() and .Last() for ages but apparently they are of no
help as .First() gets called before loading the workspace and .Last()
after it has been saved. At least the line

if (".Random.seed" %in% ls(all.names=T)) rm(.Random.seed, pos=1)

in either of those functions doesn't have the desired effect.



Re: [R] random numbers with constraints

2021-01-27 Thread Ralf Goertz
On Wed, 27 Jan 2021 09:03:15 +0100,
Denis Francisci wrote:

> Hi,
> I would like to generate random numbers in R with some constraints:
> - my vector of numbers must contain 410 values;
> - min value must be 9.6 and max value must be 11.6;
> - sum of vector's values must be 4200.
> Is there a way to do this in R?
> And is it possible to generate this series in such a way that it
> follows a specific distribution form (for example exponential)?
> Thank you in advance,

In principle it should be possible. But I guess you are asking too much
with three given values, considering that you only have one parameter for
the exponential distribution. For instance, if you had only been given min
and max and wanted a normal distribution, you could have just taken
410 random values from a standard normal: x=rnorm(410), then centered
them: x=x-mean(x), then scaled them so that their span equals the one for
your given max (M) and min (m) values: x=x*(M-m)/(max(x)-min(x)), and
finally shifted them such that the minimum becomes m: x=x-min(x)+m. Note,
however, that the things you are allowed to do with your vector of random
numbers depend on the distribution if you want the result to still follow
that type of distribution.
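
The steps for the normal case, put together as a runnable sketch. The seed
is arbitrary and only there for reproducibility; m and M are the min and
max from the question. The sum constraint (4200) is deliberately not
handled here.

```r
set.seed(1)              # arbitrary seed, for reproducibility only
m <- 9.6                 # required minimum
M <- 11.6                # required maximum

x <- rnorm(410)                          # 410 standard normal draws
x <- x - mean(x)                         # center
x <- x * (M - m) / (max(x) - min(x))     # scale the span to M - m
x <- x - min(x) + m                      # shift so that the minimum is m

range(x)
sum(x)                                   # generally not equal to 4200
```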



Re: [R] loading edited functions already in saved workspace automatically

2017-05-09 Thread Ralf Goertz
On Tue, 09 May 2017 10:00:17 -0700,
Jeff Newmiller wrote:

> This boils down to the fact that some "my ways" are more effective in
> the long run than others.. but I really want to address the complaint
> 
> "... sometimes tedious to rebuild my environment by reexecuting
> commands in the history"
> 
> by asserting that letting R re-run a script that loads my functions
> and packages (though perhaps not the data analysis steps) is always
> very fast and convenient to do explicitly. I almost never use the
> history file feature, because I type nearly every R instruction I use
> into a script file and execute/edit it until it does what I want. I
> keep functions in a separate file or package, and steps dealing with
> a particular data set in their own file (that uses source() to load
> the functions) even when I am executing the lines interactively. My
> goal is to regularly re-execute the whole script so that
> tomorrow/next year/whenever someone notices something was wrong then
> I can re-execute the sequence without following the dead ends I went
> down the first time (as using a history file does) and I don't have a
> separate clean-up-the-history-file step to go through to create it.
> When I have confirmed that the script still works as it did before
> then I can find where the analysis/data problem went wrong and fix it. 

My usual work with R is probably a bit different from yours. As I said
before I work on many projects (often simultaneously) but I do routine
work. For that I have my super function, the one I want to reload every
time R starts, at the moment about 250 lines of code. This is always
work in progress. In almost every project there is something that makes
me edit this function. But in order to apply my function I need to
prepare the data, e.g. getting them from a database or csv files,
renaming the columns of data.frames etc. This is all tedious and not
worth putting in scripts because these steps are very specific to the
project and are rarely needed more than once. Sometimes one or two data
records in a project on which I worked a few days before turn out to be
wrong and need to be changed. That's why I want to keep the data, because
changing the data.frame directly is much easier than starting from
scratch. Meanwhile my function has evolved. But in the .RData file is
still the old version, which is bad.

However, I found a solution! .Last() gets executed before saving here,
too. I simply had forgotten that I need to use rm() with pos=1, i.e.
rm(myfun,pos=1) because otherwise rm wants to delete myfun from within
the context of the function .Last() where it doesn't live. I changed my
.Rprofile to:

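.First=function(){
  # load the current definition into the global environment (pos=1)
  assign("myfun",eval(parse(file=("~/R/myfun.R"))),pos=1)
}
.Last=function(){
  # remove these from the global environment so they don't end up in .RData
  rm(.First,pos=1)
  rm(myfun,pos=1)
  rm(.Last,pos=1)
}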

and everything works as I want it. So no design flaw but still way too
complicated in my opinion. Thanks to everybody who came up with
suggestions.

Ralf



Re: [R] loading edited functions already in saved workspace automatically

2017-05-09 Thread Ralf Goertz
On Sat, 6 May 2017 11:17:42 -0400,
Michael Friendly wrote:

> On 5/5/2017 10:23 AM, Ralf Goertz wrote:
> > On Fri, 05 May 2017 07:14:36 -0700,
> > Jeff Newmiller wrote:
> >  
> >> R normally prompts you to save .RData, but it just automatically
> >> saves .Rhistory... the two are unrelated.  
> >
> > Not here. If I say "n" to the prompted question "Save workspace
> > image? [y/n/c]: " my history doesn't get saved.
> >
> > Version:
> >
> > R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> > Copyright (C) 2016 The R Foundation for Statistical Computing
> > Platform: x86_64-suse-linux-gnu (64-bit)
> >  
> 
> On Windoze, here's what I use in my .Rprofile, which runs every time
> I start an RGUI console.  The key is .First & .Last to load/save
> history automagically.

Hi Michael,

thanks. This helps with saving the history without saving the data. But
actually I'd really like to save both and still be able to load
functions automatically from .Rprofile. Not saving the data as Jeff
suggested is not a good option because it is sometimes tedious to
rebuild my environment by reexecuting commands in the history. And I
explained in my OP why I can't use .First() to achieve my goal.

But let me try again to explain the problem because I think not
everybody understood what I was trying to say. For simplicity I use the
plain variable "a" instead of a function. Start a fresh session and
remove all variables, define one variable and quit with saving:

> rm(list=ls())
> a=17
> quit(save="yes")

Now, before opening a new session edit .Rprofile such that it contains
just the two lines:

print("Hello from .Rprofile")
a=42

Start a new session where your saved environment will be loaded.
Observe that you see the line 

[1] "Hello from .Rprofile"

proving that the commands in .Rprofile have been executed. Now look at
"a":

> a
[1] 17


You would expect to see this because *after* your "Hello" line you find

[Previously saved workspace restored]

So you have set "a" to 42 in .Rprofile but it gets overwritten from the
previously saved and now restored workspace. On the other hand, .First()
gets executed after the restoring of the workspace. Therefore, I could
edit .Rprofile to read

.First=function(){ assign("a",42,pos=1) }

Now, after starting I see that "a" is indeed 42. But then it turns out
that from now on I need "a" to be 11. After editing .Rprofile
accordingly, I am quite hopeful but after starting a new session I see
that "a" is still 42. Why is that? Because .First() was saved and when I
started a new session it got a new function body (setting "a" to 11) but
before it could be executed it was again overwritten by the old value
(setting "a" to 42) and I am chasing my own tail. Sigh.

.Last() doesn't help. Apparently (at least on my linux system) it is
executed *after* saving the environment so too late to remove anything
you don't want saved. In that regard linux doesn't seem to be typical,
since in "?.Last" the reverse order is described as typical:

 Exactly what happens at termination of an R session depends on the
 platform and GUI interface in use.  A typical sequence is to run
 ‘.Last()’ and ‘.Last.sys()’ (unless ‘runLast’ is false), to save
 the workspace if requested (and in most cases also to save the
 session history: see ‘savehistory’), then run any finalizers (see
 ‘reg.finalizer’) that have been set to be run on exit, close all
 open graphics devices, remove the session temporary directory and
 print any remaining warnings (e.g., from ‘.Last()’ and device
 closure).


IMHO this is a design flaw.

Ralf


Re: [R] loading edited functions already in saved workspace automatically

2017-05-05 Thread Ralf Goertz
On Fri, 05 May 2017 07:14:36 -0700,
Jeff Newmiller wrote:

> R normally prompts you to save .RData, but it just automatically
> saves .Rhistory... the two are unrelated. 

Not here. If I say "n" to the prompted question "Save workspace image?
[y/n/c]: " my history doesn't get saved.

Version:

R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-suse-linux-gnu (64-bit)



Re: [R] loading edited functions already in saved workspace automatically

2017-05-05 Thread Ralf Goertz
On Fri, 05 May 2017 06:30:01 -0700,
Jeff Newmiller wrote:

> The answer most people seem to use is to avoid depending on functions
> in RData files, and in particular avoiding ever saving the
> "automatic" ".RData" files at all. (Some people avoid using any RData
> files, but the automatic loading of functions by ".RData" files is a
> particularly pernicious source of evil as you have already
> discovered.)
> 
> That is,  always work toward building scripts that you run to restore
> your workspace rather than depending on save files. Don't depend on
> save files to keep track of what you do interactively. This also
> usually means that there should be little if anything in
> your .Rprofile because that tends to build non-reproducibility into
> your scripts.

Hi Jeff,

thanks for your answer. Actually, I don't use the workspace saving
feature primarily for the data but for the command line history. Is
there a way to just save .Rhistory?

Ralf



[R] loading edited functions already in saved workspace automatically

2017-05-05 Thread Ralf Goertz
Hi,

In short: Is it possible to have the previously saved workspace restored
and nevertheless load a function already existing in this workspace via
.Rprofile anyway?

In detail: I use different directories for different projects. In all
those projects I use a function which I therefore try to get into the
session by `myfunc=eval(parse(file=("~/R/myfunc.R")))' in ~/.Rprofile.
Once I leave the session thereby saving the workspace this function gets
saved in ./.RData as well. In a subsequent session in that directory it
gets loaded back. However, in the meantime I might have edited
~/R/myfunc.R. I don't seem to be able to automatically load the new
function into the session. The workspace gets loaded *after* the
execution of ~/.Rprofile. So the new definition of myfunc() gets
overwritten by the old one. I can't use .First() – which is executed
after loading the workspace – because this would load myfunc() into the
environment of .First() instead of the global environment. I could use
.Last() to remove the function before saving the workspace. But then
.Last() gets saved to the workspace which is also not convenient since
when I add another function the same way and edit the definition of
.Last() in ~/.Rprofile to also remove that function this does not work
because I don't get the new .Last() into the session automatically. And
no, removing .Last() from within .Last() doesn't work.
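
For reference, a function can create objects in the global environment
rather than in its own frame by naming the target environment explicitly.
The sketch below uses a temporary file in place of ~/R/myfunc.R and the
made-up names loader() and myfunc_demo; it illustrates only the scoping
part, not the start-up order.

```r
# Stand-in for ~/R/myfunc.R: a file containing a single function expression
f <- tempfile(fileext = ".R")
writeLines("function(x) x + 1", f)

loader <- function() {
  # envir = globalenv() puts the result into the global environment,
  # not into the environment of loader()
  assign("myfunc_demo", eval(parse(file = f)), envir = globalenv())
}

loader()
exists("myfunc_demo", envir = globalenv(), inherits = FALSE)
myfunc_demo(1)
```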


Re: [R] readline issue with 3.3.1

2016-07-25 Thread Ralf Goertz
On Mon, 25 Jul 2016 08:57:17 +0200,
Martin Maechler wrote:

>> Ralf Goertz  on Fri, 22 Jul 2016 10:15:36 +0200
>> writes:  
> 
>> It would be great if – while fixing this – you also took care of the
>> SIGWINCH problem described in bug report
>> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16604  

> Well, that *has* been fixed in 'R-devel' and 'R 3.3.1 patched' ... but
> again only for readline >= 6.3

[snip]

> One reason we have not closed the bug is that the fix is only for
> readline >= 6.3, ... and also that I don't think we got much of user
> confirmation of the form " yes, the bug goes away once I compile
> R-devel or R-patched "

Thanks for the explanation. I will provide confirmation as soon as
R-patched hits my repositories. ;-)

Ralf


Re: [R] readline issue with 3.3.1

2016-07-22 Thread Ralf Goertz
On Thu, 21 Jul 2016 18:07:43 +0200,
Martin Maechler wrote:

> Ralf Goertz  on Wed, 20 Jul 2016 16:37:53 +0200
> writes:
 
>> I installed readline version 6.3 and the issue is gone. So probably
>> some of the recent changes in R's readline code are incompatible with
>> readline version 6.2.
> 
> Yes, it seems so, unfortunately.
> 
> Thank you for reporting !

It would be great if – while fixing this – you also took care of the
SIGWINCH problem described in bug report
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16604 

Thanks, Ralf


Re: [R] readline issue with 3.3.1

2016-07-20 Thread Ralf Goertz
On Wed, 20 Jul 2016 11:35:31 +0200,
Ralf Goertz wrote:

> Hi,
> 
> after a recent update to version 3.3.1 on Opensuse Leap I have
> problems with command lines longer than the terminal width. E.g. when
> I do this

I installed readline version 6.3 and the issue is gone. So probably some
of the recent changes in R's readline code are incompatible with readline
version 6.2.



[R] readline issue with 3.3.1

2016-07-20 Thread Ralf Goertz
Hi,

after a recent update to version 3.3.1 on Opensuse Leap I have problems
with command lines longer than the terminal width. E.g. when I do this

> print("This is a very long line which is in fact 
so long that it gets wrapped while writing it")

and then hit enter I end up with:

> print("This is a very long line which is in fact 
[1] "This is a very long line which is in fact so l
ong that it gets wrapped while writing it"


So the output overwrites the second line of input. This does not happen
when I start R without readline support using "R --no-readline". That's
why I thought it could be a readline problem. But my current readline6
version (6.2) was installed way before the update of R and I had no
problems with the previous R version. Furthermore no other program using
readline seems to have that problem. E.g. in bash:

me@host:~/some/dir> echo This is a very long line which is in fact 
so long that it gets wrapped while writing it
This is a very long line which is in fact so long that it gets wrapped whi
le writing it



[R] cygwin utf-8 problem with "help"

2012-08-24 Thread Ralf Goertz
Hi,

In R under Cygwin I have trouble displaying help correctly in text
format. One problem arises when there are single quotes in the text, as
in "?help", which looks like this

help                   package:utils                   R Documentation

Documentation

Description:


Usage:

 help(topic, package = NULL, lib.loc = NULL,
  verbose = getOption("verbose"),
  try.all.packages = getOption("help.try.all.packages"),
  help_type = getOption("help_type"))
 
Arguments:











Details:

 The following types of help are available:

Plain text help


...

The whole paragraph containing typographic single quotes is not displayed. I
can work around that a bit by issuing

tools::Rd2txt_options(code_quote=FALSE)

but there are still those single quotes like in 

"See `Details´ for what happens if this is omitted."


Furthermore, lines with bullets don't get indented and no bullets are
displayed (the "Plain text help" line above). I am using a UTF-8 locale.
When setting LANG to "C" or a latin1 locale, help is displayed
correctly, but then there are all sorts of problems with non ASCII
characters.

What can I do?



[R] using R's svd from outside R

2010-09-02 Thread Ralf Goertz
Hi,

I have to compute the singular value decomposition of rather large
matrices. My test matrix is 10558 by 4255 and it takes about three
minutes in R to decompose on a 64-bit quad-core Linux machine. (R is
running svd in parallel, all four cores are at their maximum load while
doing this.) I tried several BLAS and LAPACK libraries as well as the
GNU Scientific Library in my C++ program. Apart from being unable to
have them do svd in parallel mode (although I thought I did everything
to make them do it in parallel) execution time always exceeds 25 minutes
which is still way more than the expected 12 minutes for the
non-parallel R code.

I am now going to call R from within my program, but this is not very
elegant. So my questions are: Does R use a special svd-routine and is it
possible to use it directly by linking in the relevant libraries? (Sorry
but I couldn't figure that out by looking at the source code.) If that
is possible, can I have the code run in parallel mode?
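
As a pointer for the first question: ?svd states that the main routines
used are the LAPACK routines DGESDD and ZGESDD, so timings will depend on
which LAPACK and BLAS a given R build is linked against. The sketch below
only checks a decomposition on a small made-up matrix and prints the
LAPACK version R reports; it says nothing about the 10558 by 4255 case.

```r
set.seed(42)                        # arbitrary seed, made-up test matrix
A <- matrix(rnorm(60), nrow = 10, ncol = 6)

s <- svd(A)                         # list with components d, u and v
recon <- s$u %*% diag(s$d) %*% t(s$v)

max(abs(A - recon))                 # reconstruction error, should be tiny
La_version()                        # version of the LAPACK in use
```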

Thanks,

Ralf

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] deleting variables

2008-04-29 Thread Ralf Goertz
How can I automatically exclude one variable from being saved in the
workspace when I quit an R session? The problem is I don't know how to
erase a variable once it has been created.

Background: I open a connection called "con" to a database server in my
~/.Rprofile. Obviously, the connection expires when quitting the R
session. Unfortunately, the workspace is loaded after ~/.Rprofile is
run. So "con" gets overwritten by the old workspace. I thought of using
.First() or .Last() but as these are functions I don't know how to
modify global variables.
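
For what it is worth, functions can create and remove global variables if
the environment is given explicitly. A minimal sketch with made-up names
(con_demo stands in for the real connection object, and the string is just
a placeholder value):

```r
make_con <- function() {
  # assign() with envir = globalenv() creates or replaces a global variable
  assign("con_demo", "placeholder for a connection object", envir = globalenv())
}

drop_con <- function() {
  # rm() with envir = globalenv() erases it again, from inside a function
  if (exists("con_demo", envir = globalenv(), inherits = FALSE)) {
    rm("con_demo", envir = globalenv())
  }
}

make_con()
exists("con_demo", envir = globalenv(), inherits = FALSE)
drop_con()
exists("con_demo", envir = globalenv(), inherits = FALSE)
```

At top level, rm(con_demo) is enough to erase a variable; the envir
argument only matters when the call sits inside a function such as
.First() or .Last().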

Ralf



Re: [R] completion doesn't work anymore

2008-04-28 Thread Ralf Goertz
Peter Dalgaard, Monday, 28 April 2008:
> Peter Dalgaard wrote:

> The 10.2 version that I just installed does seem to work though:
> viggo:~/>R
> 
> R version 2.7.0 (2008-04-22)
> Copyright (C) 2008 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> ...
> > air
> airmiles    airquality
> > airquality$
> airquality$Day      airquality$Ozone    airquality$Temp
> airquality$Month    airquality$Solar.R  airquality$Wind

Confirmed. It works on an openSUSE 10.2 x64 machine but not on 10.3
i386. It failed on two computers with that system/version combination,
one of them never had R installed before. I can't say whether it is
version related or system related. What machine did you use for your
test, x64 or i386?

Ralf



[R] completion doesn't work anymore

2008-04-28 Thread Ralf Goertz
Hi,

after updating to 2.7.0, command line completion doesn't work anymore. I
understand that the package rcompgen is now part of utils. I hadn't used
rcompgen before but completion worked without it (I double checked it
using a 2.6.2 version on another machine). Now, it doesn't work, even
after switching on all options via "rc.settings". I updated R using the
opensuse 10.3 repository. 

For instance I have a data frame d. When I enter

d$

I see a list of files in the current directory instead of the names in
the data frame.

What can I do to turn it back on?

[EMAIL PROTECTED]:~> R --version
R version 2.7.0 (2008-04-22)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License version 2.
For more information about these matters see
http://www.gnu.org/licenses/.



[R] Ransac implementation

2008-01-09 Thread Ralf Goertz
Hi,

is there an R-implementation of the RANSAC-algorithm?

Thanks,

Ralf



Re: [R] R-squared value for linear regression passing through origin using lm()

2007-10-19 Thread Ralf Goertz
Berwin A Turlach, Friday, 19 October 2007:
> G'day Ralf,

Hi Berwin,

 
> On Fri, 19 Oct 2007 09:51:37 +0200 Ralf Goertz <[EMAIL PROTECTED]>
> wrote:
> 
> Why should either of those formulas yield the output of
> summary(lm(y~x+0)) ?  The R-squared output of that command is
> documented in help(summary.lm):
> 
> r.squared: R^2, the 'fraction of variance explained by the model',
> 
>   R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2),

yes I know. But you know why I chose those formulas, right?

>   where y* is the mean of y[i] if there is an intercept and
>   zero otherwise.
> 
> And, indeed:
> 
> > 1-sum(residuals(lm(y~x+0))^2)/sum((y-0)^2)
> [1] 0.9796238
> 
> confirms this.
> 
> Note: if you do not have an intercept in your model, the residuals do
> not have to add to zero; and, typically, they will not.  Hence,
> var(residuals(lm(y~x+0))) does not give you the residual sum of squares.

Yes I am right, you know why.
 
> > In order to save the role of R^2 as a goodness-of-fit indicator 
> 
> R^2 is no goodness-of-fit indicator, neither in models with intercept
> nor in models without intercept.  So I do not see how you can save its
> role as a goodness-of-fit indicator. :)

Okay, I surrender.

> Since you are posting from a .de domain, I assume you will understand
> the following quote from Tutz (2000), "Die Analyse kategorialer Daten",
> page 18:
> 
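> R^2 misst *nicht* die Anpassungsguete des linearen Modelles, es sagt
> nichts darueber aus, ob der lineare Ansatz wahr oder falsch ist, sondern
> nur ob durch den linearen Ansatz individuelle Beobachtungen
> vorhersagbar sind.  R^2 wird wesentlich vom Design, d.h. den Werten,
> die x annimmt bestimmt (vgl. Kockelkorn (1998)).
>
> [In English: R^2 does *not* measure the goodness of fit of the linear
> model; it says nothing about whether the linear approach is true or
> false, only whether individual observations can be predicted by the
> linear approach.  R^2 is largely determined by the design, i.e. the
> values that x takes (cf. Kockelkorn (1998)).]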
 
Danke schön.

> > But I assume that this has probably been discussed at length
> > somewhere more appropriate than r-help.
> 
> I am sure about that, but it was also discussed here on r-help (long
> ago).  The problem is that this compares two models that are not nested
> in each other which is a quite controversial thing to do; some might
> even go so far as saying that it makes no sense at all.  The other
> problem with this approach is illustrated by my example:
> 
> > set.seed(20070807)
> > x <- runif(100)*2+10
> > y <- 4+rnorm(x, sd=1)
> > 1-var(residuals(lm(y~x+0)))/var(y)
> [1] -0.04848273
> 
> How do you explain that a quantity that is called R-squared, implying
> that it is the square of something, hence always non-negative, can
> become negative?
 
because the correlation coefficient is either 0.2201879424i or
-0.2201879424i ;)
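
The negative value can also be reproduced without random numbers. The
following is only a sketch with made-up data (not Berwin's simulated
example): whenever x and y are negatively correlated and the slope of the
no-intercept fit is positive, the residual variance exceeds var(y), so the
variance-based quantity drops below zero, while the value reported by
summary() stays between 0 and 1.

```r
# Hypothetical data: y decreases with x but sits far above zero
x <- c(10, 11, 12)
y <- c(4.5, 4.0, 3.5)

fit0 <- lm(y ~ x + 0)   # regression forced through the origin

# Variance-based "R-squared" as proposed in the thread
pseudo_r2 <- 1 - var(residuals(fit0)) / var(y)

# What summary.lm() reports for a model without intercept
r2_summary <- summary(fit0)$r.squared

pseudo_r2 < 0                          # TRUE: not the square of anything
r2_summary >= 0 && r2_summary <= 1     # TRUE
```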

Thanks for your time, and yours as well, Steve. You've been very
helpful.

Ralf



Re: [R] R-squared value for linear regression passing through origin using lm()

2007-10-19 Thread Ralf Goertz
Berwin A Turlach, Thursday, 18 October 2007:
> G'day all,
> 
> I must admit that I have not read the previous e-mails in this thread,
> but why should that stop me from commenting? ;-)

Your comments are very welcome.
 
> On Thu, 18 Oct 2007 16:17:38 +0200
> Ralf Goertz <[EMAIL PROTECTED]> wrote:
> 
> > But in that case the numerator is very large, too, isn't it? 
> 
> Not necessarily.
> 
> > I don't want to argue, though.
> 
> Good, you might lose the argument. :)

Yes, I admit I lost. :-(
 
> > But so far, I have not managed to create a dataset where R^2 is
> > larger for the model with forced zero intercept (although I have not
> > tried very hard). It would be very convincing to see one (Etienne?)
> 
> Indeed, you haven't tried hard.  It is not difficult.  Here are my
> canonical commands to convince people why regression through the
> origin is evil; the pictures should illustrate what is going on:

> [example snipped] 

Thanks to Thomas Lumley there is another convincing example. But still
I've got a problem with it:

> x<-c(2,3,4);y<-c(2,3,3)

> 1-2*var(residuals(lm(y~x+1)))/sum((y-mean(y))^2)

[1] 0.75

That's okay, but neither

> 1-3*var(residuals(lm(y~x+0)))/sum((y-0)^2)
[1] 0.97076

nor

> 1-2*var(residuals(lm(y~x+0)))/sum((y-0)^2)
[1] 0.9805066

give the result of summary(lm(y~x+0)), which is 0.9796. 
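
A sketch of where the discrepancy comes from, using the same x and y as
above: without an intercept the residuals do not sum to zero, so var()
subtracts a non-zero mean and no multiple of it equals the residual sum of
squares. Using sum(r^2) directly gives the value that summary() prints.
The object names (fit0, r, rss) are only chosen for this illustration.

```r
x <- c(2, 3, 4)
y <- c(2, 3, 3)

fit0 <- lm(y ~ x + 0)
r <- residuals(fit0)
n <- length(y)

sum(r)                       # not zero for a fit through the origin

# Residual sum of squares, and its relation to var(r)
rss <- sum(r^2)
all.equal(rss, (n - 1) * var(r) + n * mean(r)^2)

# R^2 as documented in ?summary.lm for a model without intercept
r2 <- 1 - rss / sum((y - 0)^2)
all.equal(r2, summary(fit0)$r.squared)
round(r2, 4)
```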

> > IIRC, I have not been told so. Perhaps my teachers were not as good
> > as they should have been. So what is R^2 good for if not to indicate
> > the goodness of fit?
> 
> I am wondering about that too sometimes. :)   I was always wondering
> that R^2 was described to me by my lecturers as the square of the
> correlation between the x and the y variate.  But on the other hand,
> they pretended that x was fixed and selected by the experimenter (or
> should be regarded as such). If x is fixed and y is random, then it
> does not make sense to me to speak about a correlation between x and y
> (at least not on the population level). 

I see the point. But I was raised with that description, too, and it's
hard to drop that idea. 

> My best guess at the moment is that R^2 was adopted by users of
> statistics before it was properly understood; and by the time it was
> properly understood, it was too much entrenched to abandon it.  Try not
> to teach it these days and see what your "client faculties" will tell
> you.

In order to save the role of R^2 as a goodness-of-fit indicator in zero
intercept models one could use the same formula as in models with a
constant. I mean, if R^2 is the proportion of variance explained by the
model we should use the a priori variance of y[i].

> 1-var(residuals(lm(y~x+0)))/var(y)
[1] 0.3567182

But I assume that this has probably been discussed at length somewhere
more appropriate than r-help.
 
Thanks,

Ralf



Re: [R] R-squared value for linear regression passing through origin using lm()

2007-10-18 Thread Ralf Goertz
S Ellison, Thursday, 18 October 2007:
> >I think there is reason to be surprised, I am, too. ...
> >What am I missing?
> 
> Read the formula and ?summary.lm more closely. The denominator,
> 
> Sum((y[i]- y*)^2) 
> 
> is very large if the mean value of y is substantially nonzero and y*
> set to 0 as the calculation implies for a forced zero intercept.

But in that case the numerator is very large, too, isn't it? I don't
want to argue, though. You might very well be right. But so far, I have
not managed to create a dataset where R^2 is larger for the model with
forced zero intercept (although I have not tried very hard). It would be
very convincing to see one (Etienne?)

> In effect, the calculation provides the fraction of sum of squared
> deviations from the mean for the case with intercept, but the fraction
> of sum of squared y ('about' zero) for the non-intercept case. 

I understand the mathematics behind it. But as I said, I thought the
growth of the denominator is more than fully balanced by the growth of
the numerator.

> This is surprising if you automatically assume that better R^2 means
> better fit. I guess that explains why statisticians tell you not to use
> R^2 as a goodness-of-fit indicator.

IIRC, I have not been told so. Perhaps my teachers were not as good as
they should have been. So what is R^2 good for if not to indicate the
goodness of fit?

Ralf



Re: [R] R-squared value for linear regression passing through origin using lm()

2007-10-18 Thread Ralf Goertz
Achim Zeileis, Thursday, 18 October 2007:
> On Thu, 18 Oct 2007, Toffin Etienne wrote:
> 
> > Hi,
> > I have a small technical question about the calculation of R-squared
> > using lm().
> > In a study case with experimental values, it seems more logical to
> > force the regression line to pass through origin with lm(y ~ x +0).
> > However, R-squared  values are higher in this case than when I
> > compute the linear regression with lm(y ~ x).
> > It seems to be surprising to me: is this result normal ? Is there any
> > problem in the R-squared value calculated in this case ?
> 
> Have you considered reading the documentation? ?summary.lm has
> 
>   r.squared: R^2, the 'fraction of variance explained by the model',
> 
> R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2),
> 
> where y* is the mean of y[i] if there is an intercept and
> zero otherwise.

I think there is reason to be surprised, I am, too. The fraction of
variance explained should never be smaller when there are two parameters
to fit to the data. Of course, if mean(y)=0 anyway there should be no
difference in R^2 (except that the error df of the two models differ).

What am I missing?
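
A small sketch of the two computations side by side, using the three-point
data set x = (2, 3, 4), y = (2, 3, 3) that appears elsewhere in this
thread. The residual sum of squares is indeed never larger with an
intercept, but the two R^2 values divide by different totals (deviations
from mean(y) versus deviations from zero), so they are not comparable.

```r
x <- c(2, 3, 4)
y <- c(2, 3, 3)

fit1 <- lm(y ~ x)       # with intercept
fit0 <- lm(y ~ x + 0)   # through the origin

rss1 <- sum(residuals(fit1)^2)
rss0 <- sum(residuals(fit0)^2)
rss1 <= rss0            # the extra parameter never hurts the fit

tss1 <- sum((y - mean(y))^2)   # denominator used with an intercept
tss0 <- sum(y^2)               # denominator used without an intercept

c(with_intercept = 1 - rss1 / tss1,
  through_origin = 1 - rss0 / tss0)
```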

Ralf



Re: [R] library(car): Anova and repeated measures without between subjects factors

2007-10-17 Thread Ralf Goertz
John Fox, Tuesday, 16 October 2007:
> Dear Ralf,
> 
> Unfortunately, Anova.mlm(), and indeed Anova() more generally, won't
> handle a model with only a constant. As you point out, this isn't
> reasonable for repeated-measures ANOVA, where it should be possible to
> have only within-subjects factors. When I have a chance, I'll see what
> I can do to fix the problem -- my guess is that it shouldn't be too
> hard.
>
> Thanks for pointing out this limitation in Anova.mlm()

Dear John,

I am looking forward to your having a chance. There is one thing that I
would like to request, though. Greenhouse-Geisser and Huynh-Feldt eps
corrections have already been implemented but how about Mauchly's
sphericity test? I know this can be done with mauchly.test() but it
would be nice to have it in the summary of Anova().
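
For reference, a minimal sketch of the mauchly.test() route in base R for
a design with only a within-subjects factor. The data are simulated purely
for illustration (12 hypothetical subjects, 3 time points), and the factor
name zeit is simply reused from the examples in this thread.

```r
set.seed(1)

# Hypothetical data: 12 subjects (rows) measured at 3 time points (columns)
mat <- matrix(rnorm(36, mean = 100, sd = 10), nrow = 12, ncol = 3)

fit <- lm(mat ~ 1)                          # multivariate linear model (mlm)
idata <- data.frame(zeit = factor(1:3))     # within-subjects design

# Sphericity test for the contrasts orthogonal to the constant
mt <- mauchly.test(fit, X = ~1, idata = idata)
mt
```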

However, there is one more thing. Look at the following data

> c1<-c(-6.0,-10.3,-2.9,-8.3,-10.0,5.3,-7.7,-0.8,9.1,-6.2)
> mat<-matrix(c(c1,c1),10,2)
> mat
   [,1]  [,2]
 [1,]  -6.0  -6.0
 [2,] -10.3 -10.3
 [3,]  -2.9  -2.9
 [4,]  -8.3  -8.3
 [5,] -10.0 -10.0
 [6,]   5.3   5.3
 [7,]  -7.7  -7.7
 [8,]  -0.8  -0.8
 [9,]   9.1   9.1
[10,]  -6.2  -6.2

> bf<-ordered(rep(1:2,5))
> bf
 [1] 1 2 1 2 1 2 1 2 1 2
Levels: 1 < 2

Since the two columns of mat are equal:

> t.test(mat[,1],mat[,2],paired=T)

Paired t-test

data:  mat[, 1] and mat[, 2] 
t = NaN, df = 9, p-value = NA
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 NaN NaN 
sample estimates:
mean of the differences 
  0 

I would expect to get either a warning or an F-value of 0 for the
repeated factor zeit, but actually:

> Anova(lm(mat~bf),idata=data.frame(zeit=ordered(1:2)),idesign=~zeit)

Type II Repeated Measures MANOVA Tests: Pillai test statistic
        Df test stat approx F num Df den Df Pr(>F)
bf       1    0.0020   0.0163      1      8 0.9016
zeit     1    0.2924   3.3059      1      8 0.1065
bf:zeit  1    0.0028   0.0221      1      8 0.8854

whereas

> anova.mlm(lm(mat~bf),X=~1,idata=data.frame(zeit=ordered(1:2)))

Error in anova.mlm(...) :
 residuals have rank 1 < 2

This is quite dangerous. In a real data situation I accidentally used
the same column twice and I got a significant effect for the factor
zeit! I hope it wouldn't be too hard to fix this, too.
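
Until then, a guard along the following lines could be run before calling
Anova(). This is only a sketch in base R, not something car provides: it
checks the rank of the residual matrix (the condition anova.mlm complains
about) and looks for duplicated response columns.

```r
c1 <- c(-6.0, -10.3, -2.9, -8.3, -10.0, 5.3, -7.7, -0.8, 9.1, -6.2)
mat <- matrix(c(c1, c1), 10, 2)     # same column used twice by accident
bf <- ordered(rep(1:2, 5))

fit <- lm(mat ~ bf)

# Rank of the residual matrix; full rank would be ncol(mat)
res_rank <- qr(residuals(fit))$rank

# Direct check for identical response columns
dup_cols <- any(duplicated(t(mat)))

if (res_rank < ncol(mat) || dup_cols) {
  message("Response matrix is rank deficient (rank ", res_rank, " < ",
          ncol(mat), "); repeated-measures tests are not meaningful.")
}
```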

Ralf



[R] library(car): Anova and repeated measures without between subjects factors

2007-10-16 Thread Ralf Goertz
Hi,

sorry if this is explained somewhere but I didn't find anything.

How can I use "Anova" from the car package to test a model without
between-subjects factors? Suppose I have the following data

   mat.1 mat.2 mat.3 di ex
1     85    85    88  1  1
2     90    92    93  1  1
3     97    97    94  1  1
4     80    82    83  1  1
5     91    92    91  1  1
6     83    83    84  2  1
7     87    88    90  2  1
8     92    94    95  2  1
9     97    99    96  2  1
10   100    97   100  2  1
11    86    86    84  1  2
12    93   103   104  1  2
13    90    92    93  1  2
14    95    96   100  1  2
15    89    96    95  1  2
16    84    86    89  2  2
17   103   109    90  2  2
18    92    96   101  2  2
19    97    98   100  2  2
20   102   104   103  2  2
21    93    98   110  1  3
22    98   104   112  1  3
23    98   105    99  1  3
24    87   132   120  1  3
25    94   110   116  1  3
26    95   126   143  2  3
27   100   126   140  2  3
28   103   124   140  2  3
29    94   135   130  2  3
30    99   111   150  2  3

Using

> Anova(lm(mat~di*ex,data=data),idata=data.frame(zeit=ordered(1:3)),idesign=~zeit)

Type II Repeated Measures MANOVA Tests: Pillai test statistic
            Df test stat approx F num Df den Df    Pr(>F)
di           1     0.377   14.524      1     24 0.0008483 ***
ex           2     0.800   47.915      2     24 4.166e-09 ***
di:ex        2     0.281    4.695      2     24 0.0190230 *
zeit         1     0.782   41.209      2     23 2.491e-08 ***
di:zeit      1     0.252    3.865      2     23 0.0357258 *
ex:zeit      2     0.836    8.611      4     48 2.538e-05 ***
di:ex:zeit   2     0.518    4.189      4     48 0.0054586 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

works as expected. But every once in a while I have a model without
between-subjects factors. So I thought of

> Anova(lm(mat~1,data=data),idata=data.frame(zeit=factor(1:3)),idesign=~zeit)
Fehler in L %*% B : nicht passende Argumente

(Error in L %*% B : non matching arguments)

On the other hand using anova.mlm I get

> anova.mlm(lm(mat~1,data),idata=data.frame(zeit=factor(1:3)),X=~1,test="Spherical")
Analysis of Variance Table


Contrasts orthogonal to
~1

Greenhouse-Geisser epsilon: 0.7464
Huynh-Feldt epsilon:0.

            Df      F num Df den Df     Pr(>F)     G-G Pr     H-F Pr
(Intercept)  1 11.767      2     58 5.1375e-05 3.1183e-04 2.4939e-04
Residuals   29


How can I achieve this with Anova?
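
To make the base-R part of the example reproducible, here is a sketch that
enters the table and reruns the within-subjects analysis through the
anova() generic (which dispatches to the mlm method). The layout of `data`
(responses stored as a matrix column `mat`, with `di` and `ex` as factors)
is an assumption inferred from the calls above, and the values are
transcribed from the table as read.

```r
# Responses at the three time points, rows 1..30 of the table
m1 <- c(85, 90, 97, 80, 91, 83, 87, 92, 97, 100,
        86, 93, 90, 95, 89, 84, 103, 92, 97, 102,
        93, 98, 98, 87, 94, 95, 100, 103, 94, 99)
m2 <- c(85, 92, 97, 82, 92, 83, 88, 94, 99, 97,
        86, 103, 92, 96, 96, 86, 109, 96, 98, 104,
        98, 104, 105, 132, 110, 126, 126, 124, 135, 111)
m3 <- c(88, 93, 94, 83, 91, 84, 90, 95, 96, 100,
        84, 104, 93, 100, 95, 89, 90, 101, 100, 103,
        110, 112, 99, 120, 116, 143, 140, 140, 130, 150)

# Between-subjects factors (assumed to be factors, as the Df suggest)
data <- data.frame(di = factor(rep(rep(1:2, each = 5), times = 3)),
                   ex = factor(rep(1:3, each = 10)))
data$mat <- cbind(m1, m2, m3)    # matrix column holding the responses

# Model without between-subjects factors
fit <- lm(mat ~ 1, data = data)

# Within-subjects test with Greenhouse-Geisser / Huynh-Feldt corrections
tab <- anova(fit, X = ~1, idata = data.frame(zeit = factor(1:3)),
             test = "Spherical")
tab
```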


Thanks in advance,

Ralf
