Re: [R] Manually calculating values from aov() result

2024-08-07 Thread John Fox

Dear Brian,

As Duncan mentioned, the terms type-I, II, and III sums of squares 
originated in SAS. The type-II and III SSs computed by the Anova() 
function in the car package take a different computational approach than 
in SAS, but in almost all cases produce the same results. (I slightly 
regret using the "type-*" terminology for car::Anova() because of the 
lack of exact correspondence to SAS.) The standard R anova() function 
computes type-I (sequential) SSs.


The focus, however, shouldn't be on the SSs, or how they're computed, 
but on the hypotheses that are tested. Briefly, the hypotheses for 
type-I tests assume that all terms later in the sequence are 0 in the 
population; type-II tests assume that interactions to which main effects 
are marginal (and higher-order interactions to which lower-order 
interactions are marginal) are 0. Type-III tests don't, e.g., assume 
that interactions to which a main effect is marginal are 0 in testing 
the main effect, which then represents an average over the levels of the 
factor(s) with which the factor in the main effect interacts. The 
description of the hypotheses for type-III tests is even more complex if 
there are covariates. In my opinion, researchers are usually interested 
in the hypotheses for type-II tests.
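
To make the order-dependence of type-I (sequential) tests concrete, here is a small sketch in base R, using the unbalanced data from the thread quoted below; car::Anova() (shown commented out) would give the type-II tests:

```r
# Sketch: with an unbalanced design, the sequential (type-I) SS for a
# term depends on where it enters the model (data from the quoted thread).
dat <- data.frame(
  A = c(-0.3960025, -0.3492880, -1.5893792, -1.4579074, -4.9214873,
        -0.8575018, -2.5551363, -0.9366557, -1.4307489, -0.3943704),
  B = c(2, 1, 2, 2, 1, 2, 2, 2, 2, 2),
  C = c(0, 1, 1, 1, 1, 1, 1, 0, 1, 1))

anova(lm(A ~ B * C, dat))  # type-I: SS for C adjusted for B
anova(lm(A ~ C * B, dat))  # type-I: SS for C unadjusted (C enters first)

# Type-II tests (each main effect adjusted for the other main effect,
# but not for the interaction), e.g.:
# car::Anova(lm(A ~ B * C, dat), type = 2)
```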


These matters are described in detail, for example, in my applied 
regression text <https://www.john-fox.ca/AppliedRegression/index.html>.


I hope this helps,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/
--
On 2024-08-07 8:27 a.m., Brian Smith wrote:



Hi,

Thanks for this information. Is there any way to force R to use Type-I
SS? I think most textbooks use only this.

Thanks and regards,

On Wed, 7 Aug 2024 at 17:00, Duncan Murdoch  wrote:


On 2024-08-07 6:06 a.m., Brian Smith wrote:

Hi,

I have performed ANOVA as below

dat = data.frame(
'A' = c(-0.3960025, -0.3492880, -1.5893792, -1.4579074, -4.9214873,
-0.8575018, -2.5551363, -0.9366557, -1.4307489, -0.3943704),
'B' = c(2,1,2,2,1,2,2,2,2,2),
'C' = c(0,1,1,1,1,1,1,0,1,1))

summary(aov(A ~ B * C, dat))

However now I also tried to calculate SSE for factor C

Mean = sapply(split(dat, dat$C), function(x) mean(x$A))
N = sapply(split(dat, dat$C), function(x) dim(x)[1])

N[1] * (Mean[1] - mean(dat$A))^2 + N[2] * (Mean[2] - mean(dat$A))^2
#1.691

But in ANOVA table the sum-square for C is reported as 0.77.

Could you please help how exactly this C = 0.77 is obtained from aov()


Your design isn't balanced, so there are several ways to calculate the
SS for C.  What you have calculated looks like the "Type I SS" in SAS
notation, if I remember correctly, assuming that C enters the model
before B.  That's not what R uses; I think it is Type II SS.

For some details about this, see
https://mcfromnz.wordpress.com/2011/03/02/anova-type-ii-ss-explained/



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Regression performance when using summary() twice

2024-06-21 Thread John Fox

Dear Christian,

You're apparently using the glm.nb() function in the MASS package.

Your function is peculiar in several respects. For example, you specify 
the model formula as a character string and then convert it into a 
formula, but you could just pass the formula to the function -- the 
conversion seems unnecessary. Similarly, you compute the summary for the 
model twice rather than just saving it in a local variable in your 
function. And the form of the function output is a bit strange, but I 
suppose you have reasons for that.
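
A minimal sketch of how the function could be restructured along these lines (keeping Christian's names, which are hypothetical), so that the summary is computed only once and the formula is passed directly:

```r
library(MASS)

my_function <- function(formula, data) {
  model <- glm.nb(formula, data = data)
  s <- summary(model)  # computed once, reused twice below
  result <- as.data.frame(cbind(s$coefficients, confint(model)))
  string_result <- capture.output(print(s))
  list(result, string_result)
}
```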


The primary reason that your function is slow, however, is that the 
confidence intervals computed by confint() profile the likelihood, which 
requires refitting the model a number of times. If you're willing to use 
possibly less accurate Wald-based rather than likelihood-based 
confidence intervals, computed, e.g., by the Confint() function in the 
car package, then you could speed up the computation considerably.


Using a model fit by example(glm.nb),

library(MASS)
example(glm.nb)
microbenchmark::microbenchmark(
  Wald = car::Confint(quine.nb1, vcov.=vcov(quine.nb1),
   estimate=FALSE),
  LR = confint(quine.nb1)
)

which produces

Unit: microseconds
 expr       min       lq       mean    median       uq        max neval
 Wald   136.366   161.13   222.0872   184.541   283.72    386.466   100
   LR 87223.031 88757.09 95162.8733 95761.568 97672.23 182734.048   100


I hope this helps,
 John
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/
--
On 2024-06-21 10:38 a.m., c.bu...@posteo.jp wrote:


Hello,

I am not a regular R user; I come from Python, but I use R for
several special tasks.

Doing a regression analysis does cost some compute time. But I wonder
when this big time-consuming algorithm is executed, and whether it is done
twice in my special case.

It seems that calling "glm()" or similar does not execute the time
consuming part of the regression code.
It seems it is done when calling "summary(model)".
Am I right so far?

If this is correct, I would say that in my case the regression is done
twice with the identical formula and data, which of course is
inefficient. See this code:

my_function <- function(formula_string, data) {
    formula <- as.formula(formula_string)
    model <- glm.nb(formula, data = data)

    result <- cbind(summary(model)$coefficients, confint(model))
    result <- as.data.frame(result)

    string_result <- capture.output(summary(model))

    return(list(result, string_result))
}

I do call summary() once to get the "$coefficients" and a second time
when capturing its output as a string.

If this really results in computing the regression twice, I ask myself
whether there is an R way to make this more efficient?

Best regards,
Christian Buhtz





Re: [R] Column names of model.matrix's output with contrast.arg

2024-06-17 Thread John Fox

Dear Christophe and Ben,

Also see the car package for replacements for contr.treatment(), 
contr.sum(), and contr.helmert() -- e.g., help("contr.Sum", package="car").


These functions have been in the car package for more than two decades, 
and AFAIK, no one uses them (including myself). I didn't write a 
replacement for contr.poly() because the current coefficient labeling 
seemed reasonably transparent.
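
For illustration, a hedged sketch of the difference in column labeling between the base and car contrast functions (the exact labels depend on the package version):

```r
f <- factor(c("a", "b", "c"))

# Base R: contr.sum labels columns by index, so coefficients print as f1, f2:
colnames(model.matrix(~ f, contrasts.arg = list(f = "contr.sum")))

# car's replacement labels the columns with the level names instead, e.g.:
# colnames(model.matrix(~ f, contrasts.arg = list(f = "contr.Sum")))
```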


Best,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

--
On 2024-06-17 4:29 p.m., Ben Bolker wrote:


   It's sorta-kinda-obliquely-partially documented in the examples:

zapsmall(cP <- contr.poly(3)) # Linear and Quadratic

output:

             .L         .Q
[1,] -0.7071068  0.4082483
[2,]  0.0000000 -0.8164966
[3,]  0.7071068  0.4082483

FWIW the faux package provides better-named alternatives.


On 2024-06-17 4:25 p.m., Christophe Dutang wrote:

Thanks for your reply.

It might be good to document the naming convention in ?contrasts. It is 
hard to understand .L for linear, .Q for quadratic, .C for cubic, and 
^n for other degrees.


For contr.sum, we could have used .Sum, .Sum…

Maybe the examples in ?model.matrix should use names in the dd objects so 
that we can observe when names are dropped.

Kind regards, Christophe



On 14 Jun 2024, at 11:45, peter dalgaard  wrote:

You're at the mercy of the various contr.XXX functions. They may or 
may not set the colnames on the matrices that they generate.


The rationales for (not) setting them are not perfectly transparent, 
but you obviously cannot use level names with contr.poly, so it uses 
.L, .Q, etc.


In MASS, contr.sdif is careful about labeling the columns with the 
levels that are being diff'ed.


For contr.treatment, there is a straightforward connection to 0/1 
dummy variables, so level names there are natural.


One could use levels in contr.sum and contr.helmert, but it might 
confuse users that comparisons are with the average of all levels or 
preceding levels. (It can be quite confusing when coding is +1 for 
male and -1 for female, so that the gender difference is twice the 
coefficient.)
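
Peter's +1/-1 remark can be illustrated in a couple of lines (the data are made up for the example):

```r
# Made-up data: male mean 11, female mean 7, so the difference is 4.
g <- factor(c("male", "female", "male", "female"))
y <- c(10, 6, 12, 8)

coef(lm(y ~ g, contrasts = list(g = "contr.sum")))
# (Intercept) = 9 (the grand mean); g1 = -2, half the 4-unit difference
```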


-pd


On 14 Jun 2024, at 08:12 , Christophe Dutang  wrote:

Dear list,

Changing the default contrasts used in glm() makes me aware of how 
model.matrix() sets column names.


With the default contrasts, model.matrix() uses the level values to name 
the columns. However, with other contrasts, model.matrix() uses the 
level indexes. I don't see anything in the documentation related to 
this. It does not seem natural to have such behavior.


Any comment is welcome.

An example is below.

Kind regards, Christophe


#example from ?glm
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- paste0("O", gl(3,1,9))
treatment <- paste0("T", gl(3,3))

X3 <- model.matrix(counts ~ outcome + treatment)
X4 <- model.matrix(counts ~ outcome + treatment, contrasts = 
list("outcome"="contr.sum"))
X5 <- model.matrix(counts ~ outcome + treatment, contrasts = 
list("outcome"="contr.helmert"))


#check with original factor
cbind.data.frame(X3, outcome)
cbind.data.frame(X4, outcome)
cbind.data.frame(X5, outcome)

#same issue with glm
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
glm.D94 <- glm(counts ~ outcome + treatment, family = poisson(), 
contrasts = list("outcome"="contr.sum"))
glm.D95 <- glm(counts ~ outcome + treatment, family = poisson(), 
contrasts = list("outcome"="contr.helmert"))


coef(glm.D93)
coef(glm.D94)
coef(glm.D95)

#check linear predictor
cbind(X3 %*% coef(glm.D93), predict(glm.D93))
cbind(X4 %*% coef(glm.D94), predict(glm.D94))

-
Christophe DUTANG
LJK, Ensimag, Grenoble INP, UGA, France
ILB research fellow
Web: http://dutangc.free.fr



--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com





--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics

Re: [R] Listing folders on One Drive

2024-05-20 Thread John Fox

Dear Nick,

See list.dirs(), which is documented in the same help file as list.files().
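
A hedged sketch (the paths are hypothetical) of listing the catchment folders and cross-checking the two lists:

```r
# List immediate subdirectory names (not full paths, not recursive)
# in the synced OneDrive folder and in the local copy, then compare.
onedrive <- list.dirs("OneDrive/Scotland", recursive = FALSE,
                      full.names = FALSE)
laptop   <- list.dirs("Laptop/Scotland", recursive = FALSE,
                      full.names = FALSE)

setdiff(laptop, onedrive)  # catchments missing from OneDrive
setdiff(onedrive, laptop)  # catchments missing from the laptop
```

Note that this assumes the OneDrive folder is synced to the local file system; list.dirs() cannot reach a purely cloud-side folder.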

I hope this helps,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/
--
On 2024-05-20 9:36 a.m., Nick Wray wrote:



Hello I have lots of folders of individual Scottish river catchments on my
uni One Drive.  Each folder is labelled with the river name eg "Tay" and
they are all in a folder named "Scotland"
I want to list the folders on One Drive so that I can cross check that I
have them all against a list of folders on my laptop.
Can I somehow use list.files() - I've tried various things but none seem to
work...
Any help appreciated
Thanks Nick Wray






Re: [R] x[0]: Can '0' be made an allowed index in R?

2024-04-23 Thread John Fox

Hello Peter,

Unless I too misunderstand your point, negative indices for removal do 
work with the Oarray package (though -0 doesn't work to remove the 0th 
element, since -0 == 0 -- perhaps what you meant):


> library(Oarray)

> v <- Oarray(1:10, offset=0)

> v
     [0,] [1,] [2,] [3,] [4,] [5,] [6,] [7,] [8,] [9,]
        1    2    3    4    5    6    7    8    9   10

> dim(v)
[1] 10

> v[-1]
[1]  1  3  4  5  6  7  8  9 10

> v[-0]
[1] 1

Best,
 John

On 2024-04-23 9:03 a.m., Peter Dalgaard via R-help wrote:



Doesn't sound like you got the point. x[-1] normally removes the first element. 
With 0-based indices, this cannot work.

- pd


On 22 Apr 2024, at 17:31 , Ebert,Timothy Aaron  wrote:

You could have negative indices. There are two ways to do this.
1) Provide a large offset:
Offset <- 30
for (i in -29:120) { print(df[i + Offset]) }

2) Use absolute values if all indices are negative:
for (i in -200:-1) { print(df[abs(i)]) }

Tim



-Original Message-
From: R-help  On Behalf Of Peter Dalgaard via 
R-help
Sent: Monday, April 22, 2024 10:36 AM
To: Rolf Turner 
Cc: R help project ; Hans W 
Subject: Re: [R] x[0]: Can '0' be made an allowed index in R?


Heh. Did anyone bring up negative indices yet?

-pd


On 22 Apr 2024, at 10:46 , Rolf Turner  wrote:


See fortunes::fortune(36).

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Stats. Dep't. (secretaries) phone:
+64-9-373-7599 ext. 89622
Home phone: +64-9-480-4619



--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 
Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com




--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com





Re: [R] Use of geometric mean .. in good data analysis

2024-01-22 Thread John Fox

Dear Martin,

Helpful general advice, although it's perhaps worth mentioning that the 
geometric mean, defined e.g. naively as prod(x)^(1/length(x)), is 
necessarily 0 if there are any 0 values in x. That is, the geometric 
mean "works" in this case but isn't really informative.
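
A quick base-R illustration of the point:

```r
x <- c(0, 5, 10, 20)

prod(x)^(1/length(x))  # 0: a single zero drives the naive geometric mean to 0
exp(mean(log(x)))      # also 0, since log(0) = -Inf
mean(x)                # the arithmetic mean is unaffected: 8.75
```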


Best,
 John
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2024-01-22 12:18 p.m., Martin Maechler wrote:



Rich Shepard, on Mon, 22 Jan 2024 07:45:31 -0800 (PST), writes:


 > A statistical question, not specific to R.  I'm asking for
 > a pointer for a source of definitive descriptions of what
 > types of data are best summarized by the arithmetic,
 > geometric, and harmonic means.

In spite of being off-topic:

I think it is a good question, not really only about
geo-chemistry, but about statistics in applied sciences (and
engineering for that matter).

Something I am sure good applied statisticians in the 1980s and
1990s would all have known the answer to:

To use the geometric mean instead of the arithmetic mean
is basically  *equivalent* to  first log-transform the data
and then work with that transformed data:
Not just for computing average, but for more relevant modelling,
inference, etc.

John W Tukey (and several other of the grands of the time)
had the log transform among the  "First aid transformations":

If the data for a continuous variable must all be positive it is
also typically the case that the distribution is considerably
skewed to the right.
In such a case behave as a good human who sees another human in
health distress: apply First Aid -- do the things you learned to
do quickly without too much thought, because things must happen
fast ---to hopefully save the other's life.

Here: Do log-transform all such variables without further ado,
and only afterwards start your (exploratory and more) data analysis.

Now,  mean(log(y)) = log(geometricmean(y)),
where mean() is the arithmetic mean as in R
{mathematically; on the computer you need all.equal(), not '==' !!}

I.e., according to Tukey and all the other experienced applied
statisticians of the past, the geometric mean is the "best thing"
to do for such positive right-skewed data, in the same sense
that the log-transform is the best "a priori" transformation for
such data -- with the one advantage that the geometric mean already
works for zeroes, whereas you need to fiddle with zeroes when
log-transforming.
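
The identity (and the all.equal() caveat) in code:

```r
y <- c(2.3, 0.7, 11, 4.5)
gm <- prod(y)^(1/length(y))  # the geometric mean

all.equal(mean(log(y)), log(gm))  # TRUE -- but '==' may fail numerically
```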

Martin


 > As an aquatic ecologist I see regulators apply the
 > geometric mean to geochemical concentrations rather than
 > using the arithmetic mean. I want to know whether the
 > geometric mean of a set of chemical concentrations (e.g.,
 > in mg/L) is an appropriate representation of the expected
 > value. If not, I want to explain this to non-technical
 > decision-makers; if so, I want to understand why my
 > assumption is wrong.

 > TIA,

 > Rich






Re: [R] Is there any design based two proportions z test?

2024-01-18 Thread John Fox

Dear Md Kamruzzaman,

I've copied this response to the r-help list, where you originally asked 
your question. That way, other people can follow the conversation if 
they're interested, and there will be a record of the solution. Please 
keep r-help in the loop.


See below:

On 2024-01-17 9:47 p.m., Md. Kamruzzaman wrote:




Dear John
Thank you so much for your reply.

I have calculated the 95%CI of the separate two proportions by using the 
survey package. The code is given below.


svyby(~Diabetes_Cate, ~Year, nhc, svymean, na.rm=TRUE)

Here: nhc is the weighted survey data.


I understand your point that it is possible to calculate the 95%CI of 
the proportional difference manually.  It is time consuming, that's why 
I was looking for a function with a design effect to calculate this 
easily.  I couldn't find this kind of function.



However, it will be okay for me to calculate this manually, if there are 
no functions like this.


If you intend to do this computation once, it's not terribly time 
consuming. If you intend to do it repeatedly, you can write a simple 
function to do the calculation, probably in less time than it takes to 
search for one.





For manual calculation, could you please share the formula? to calculate 
the 95%CI of proportional difference.


Here's a simple function to compute the confidence interval, assuming 
that the normal distribution is used. The formula is based on the 
elementary result that the variance of the difference of two independent 
random variables is the sum of their variances, plus the observation 
that the width of the confidence interval is 2*z*SE, where z is the 
normal quantile corresponding to the confidence level (e.g., 1.96 for a 
95% CI).


ciDiff <- function(ci1, ci2, level=0.95){
  p1 <- mean(ci1)
  p2 <- mean(ci2)
  z <- qnorm((1 - level)/2, lower.tail=FALSE)
  se1 <- (ci1[2] - ci1[1])/(2*z)
  se2 <- (ci2[2] - ci2[1])/(2*z)
  seDiff <- sqrt(se1^2 + se2^2)
  (p1 - p2) + c(-z, z)*seDiff
}




Example: Prevalence of Diabetes:
    2011: 11.0 (95%CI 10.1-11.9)
    2017: 10.1 (95%CI 9.4-10.9)
    Diff: 0.9% (95%CI: ??)


These are percentages, not proportions, but you can use either:

> ciDiff(c(10.1, 11.9), c(9.4, 10.9))
[1] -0.3215375  2.0215375

> ciDiff(c(.101, .119), c(.094, .109))
[1] -0.003215375  0.020215375

You'll want more significant digits in the inputs to get sufficiently 
precise results.


Since I did this quickly, if I were you I'd check the results manually.

Best,
 John


With Kind Regards

---------

*/Md Kamruzzaman/*



On Thu, Jan 18, 2024 at 12:44 AM John Fox <j...@mcmaster.ca> wrote:


Dear Md Kamruzzaman,

To answer your second question first, you could just use the svychisq()
function. The difference-of-proportion test is equivalent to a
chisquare
test for the 2-by-2 table.

You don't say how you computed the confidence intervals for the two
separate proportions, but if you have their standard errors (and if
not,
you should be able to infer them from the confidence intervals) you can
compute the variance of the difference as the sum of the variances
(squared standard errors), because the two proportions are independent,
and from that the confidence interval for their difference.

I hope this helps,
John
-- 
John Fox, Professor Emeritus

McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2024-01-16 10:21 p.m., Md. Kamruzzaman wrote:
 >
 > Hello Everyone,
 > I was analysing big survey data using survey packages on RStudio.
Survey
 > package allows survey data analysis with the design effect.The survey
 > package included functions for all other statistical analysis except
 > two-proportion z tests.
 >
 > I was trying to calculate the difference in prevalence of
Diabetes and
 > Prediabetes between the year 2011 and 2017 (with 95%CI). I was
able to
 > calculate the weighted prevalence of diabetes and prediabetes in
the Year
 > 2011 and 2017 and just subtracted the prevalence of 2011 from the
 > prevalence of 2017 to get the difference in prevalence. But I
could not
 > calculate the 95%CI of the difference in prevalence considering
the weight
 > of the survey data.
 >
 >

Re: [R] Is there any design based two proportions z test?

2024-01-17 Thread John Fox

Dear Md Kamruzzaman,

To answer your second question first, you could just use the svychisq() 
function. The difference-of-proportion test is equivalent to a chisquare 
test for the 2-by-2 table.
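
A hedged sketch of that test (not run here; `nhc` is assumed to be the weighted survey design object from the question):

```r
# Not run: requires the survey package and the questioner's design object.
# library(survey)
# svychisq(~ Diabetes_Cate + Year, design = nhc)
```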


You don't say how you computed the confidence intervals for the two 
separate proportions, but if you have their standard errors (and if not, 
you should be able to infer them from the confidence intervals) you can 
compute the variance of the difference as the sum of the variances 
(squared standard errors), because the two proportions are independent, 
and from that the confidence interval for their difference.


I hope this helps,
John
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2024-01-16 10:21 p.m., Md. Kamruzzaman wrote:



Hello Everyone,
I was analysing big survey data using the survey package in RStudio. The
survey package allows survey data analysis with the design effect. The
survey package includes functions for all other statistical analyses except
two-proportion z tests.

I was trying to calculate the difference in prevalence of Diabetes and
Prediabetes between the year 2011 and 2017 (with 95%CI). I was able to
calculate the weighted prevalence of diabetes and prediabetes in the Year
2011 and 2017 and just subtracted the prevalence of 2011 from the
prevalence of 2017 to get the difference in prevalence. But I could not
calculate the 95%CI of the difference in prevalence considering the weight
of the survey data.

I was also trying to see if this difference in prevalence is statistically
significant. I could do it using the simple two-proportion z test without
considering the weight of the sample. But I want to do it considering the
weight of the sample.


Example: Prevalence of Diabetes:
    2011: 11.0 (95%CI 10.1-11.9)
    2017: 10.1 (95%CI 9.4-10.9)
    Diff: 0.9% (95%CI: ??)
    Proportion Z test P Value: ??
Your cooperation will be highly appreciated.

Thanks in advance.

With Regards

**

*Md Kamruzzaman*

*PhD **Research Fellow (**Medicine**)*
Discipline of Medicine and Centre of Research Excellence in Translating
Nutritional Science to Good Health
Adelaide Medical School | Faculty of Health and Medical Sciences
The University of Adelaide
Adelaide SA 5005






Re: [R] car::deltaMethod() fails when a particular combination of categorical variables is not present

2023-09-26 Thread John Fox

Dear Michael,

My previous response was inaccurate: First, linearHypothesis() *is* able 
to accommodate aliased coefficients by setting the argument singular.ok 
= TRUE:


> linearHypothesis(minimal_model, "bt2 + csent + bt2:csent = 0",
+  singular.ok=TRUE)

Linear hypothesis test:
bt2  + csent  + bt2:csent = 0

Model 1: restricted model
Model 2: a ~ b * c

  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     16 9392.1
2     15 9266.4  1    125.67 0.2034 0.6584

Moreover, when there is an empty cell, this F-test is (for a reason that 
I haven't worked out, but is almost surely due to how the rank-deficient 
model is parametrized) *not* equivalent to the t-test for the 
corresponding coefficient in the raveled version of the two factors:


> df$bc <- factor(with(df, paste(b, c, sep=":")))
> m <- lm(a ~ bc, data=df)
> summary(m)

Call:
lm(formula = a ~ bc, data = df)

Residuals:
Min  1Q  Median  3Q Max
-57.455 -11.750   0.439  14.011  37.545

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)20.50  17.57   1.166   0.2617
bct1:unsent37.50  24.85   1.509   0.1521
bct2:other 32.00  24.85   1.287   0.2174
bct2:sent  17.17  22.69   0.757   0.4610  <<< cf. F = 0.2034, p 
= 0.6584

bct2:unsent38.95  19.11   2.039   0.0595

Residual standard error: 24.85 on 15 degrees of freedom
Multiple R-squared:  0.2613,Adjusted R-squared:  0.06437
F-statistic: 1.327 on 4 and 15 DF,  p-value: 0.3052

In the full-rank case, however, what I said is correct -- that is, the 
F-test for the 1 df hypothesis on the three coefficients is equivalent 
to the t-test for the corresponding coefficient when the two factors are 
raveled:


> linearHypothesis(minimal_model_fixed, "bt2 + csent + bt2:csent = 0")

Linear hypothesis test:
bt2  + csent  + bt2:csent = 0

Model 1: restricted model
Model 2: a ~ b * c

  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     15 9714.5
2     14 9194.4  1    520.08 0.7919 0.3886

> df_fixed$bc <- factor(with(df_fixed, paste(b, c, sep=":")))
> m <- lm(a ~ bc, data=df_fixed)
> summary(m)

Call:
lm(formula = a ~ bc, data = df_fixed)

Residuals:
Min  1Q  Median  3Q Max
-57.455 -11.750   0.167  14.011  37.545

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   64.000 25.627   2.497   0.0256
bct1:sent-43.500 31.387  -1.386   0.1874
bct1:unsent  -12.000 36.242  -0.331   0.7455
bct2:other   -11.500 31.387  -0.366   0.7195
bct2:sent-26.333 29.591  -0.890   0.3886 << cf.
bct2:unsent   -4.545 26.767  -0.170   0.8676

Residual standard error: 25.63 on 14 degrees of freedom
Multiple R-squared:  0.2671,Adjusted R-squared:  0.005328
F-statistic:  1.02 on 5 and 14 DF,  p-value: 0.4425

So, to summarize:

(1) You can use linearHypothesis() with singular.ok=TRUE to test the 
hypothesis that you specified, though I suspect that this hypothesis 
probably isn't testing what you think in the rank-deficient case. I 
suspect that the hypothesis that you want to test is obtained by 
raveling the two factors.


(2) There is no reason to use deltaMethod() for a linear hypothesis, but 
there is also no intrinsic reason that deltaMethod() shouldn't be able 
to handle a rank-deficient model. We'll probably fix that.
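In the meantime, the delta-method variance of a *linear* combination can be computed directly from coef() and vcov(). The following is a minimal sketch of my own (not car's implementation, and subject to the caveat above that dropping aliased coefficients may change the hypothesis actually tested); lincombSE and the example L are hypothetical names:

```r
## Sketch: estimate and SE of a linear combination L'beta from a
## (possibly rank-deficient) lm fit. Aliased coefficients are NA in
## coef() but absent from vcov(), so they are dropped before forming
## the quadratic form L' V L.
lincombSE <- function(model, L) {
  b  <- coef(model)
  ok <- !is.na(b)                 # drop aliased (NA) coefficients
  b  <- b[ok]
  L  <- L[ok]
  est <- sum(L * b)
  se  <- sqrt(drop(t(L) %*% vcov(model) %*% L))
  c(estimate = est, se = se)
}

## e.g., for the thread's model a ~ b*c, pick out bt2 + csent + bt2:csent:
## L <- as.numeric(names(coef(minimal_model)) %in%
##                   c("bt2", "csent", "bt2:csent"))
## lincombSE(minimal_model, L)
```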


My apologies for the confusion,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2023-09-26 9:49 a.m., John Fox wrote:

Caution: External email.


Dear Michael,

You're testing a linear hypothesis, so there's no need to use the delta
method, but the linearHypothesis() function in the car package also
fails in your case:

 > linearHypothesis(minimal_model, "bt2 + csent + bt2:csent = 0")
Error in linearHypothesis.lm(minimal_model, "bt2 + csent + bt2:csent = 0") :

there are aliased coefficients in the model.

One work-around is to ravel the two factors into a single factor with 5
levels:

 > df$bc <- factor(with(df, paste(b, c, sep=":")))
 > df$bc
  [1] t2:unsent t2:unsent t2:unsent t2:unsent t2:sent   t2:unsent
  [7] t2:unsent t1:sent   t2:unsent t2:unsent t2:other  t2:unsent
[13] t1:unsent t1:sent   t2:unsent t2:other  t1:unsent t2:sent
[19] t2:sent   t2:unsent
Levels: t1:sent t1:unsent t2:other t2:sent t2:unsent

 > m <- lm(a ~ bc, data=df)
 > summary(m)

Call:
lm(formula = a ~ bc, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-57.455 -11.750   0.439  14.011  37.545

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    20.50      17.57   1.166   0.2617
bct1:unsent    37.50      24.85   1.509   0.1521
bct2:other     32.00      24.85   1.287   0.2174
bct2:sent      17.17      22.69   0.757   0.4610
bct2:unsent    38.95      19.11   2.039   0.0595

Residual sta

Re: [R] car::deltaMethod() fails when a particular combination of categorical variables is not present

2023-09-26 Thread John Fox

Dear Michael,

You're testing a linear hypothesis, so there's no need to use the delta 
method, but the linearHypothesis() function in the car package also 
fails in your case:


> linearHypothesis(minimal_model, "bt2 + csent + bt2:csent = 0")
Error in linearHypothesis.lm(minimal_model, "bt2 + csent + bt2:csent = 0") :
there are aliased coefficients in the model.

One work-around is to ravel the two factors into a single factor with 5 
levels:


> df$bc <- factor(with(df, paste(b, c, sep=":")))
> df$bc
 [1] t2:unsent t2:unsent t2:unsent t2:unsent t2:sent   t2:unsent
 [7] t2:unsent t1:sent   t2:unsent t2:unsent t2:other  t2:unsent
[13] t1:unsent t1:sent   t2:unsent t2:other  t1:unsent t2:sent
[19] t2:sent   t2:unsent
Levels: t1:sent t1:unsent t2:other t2:sent t2:unsent

> m <- lm(a ~ bc, data=df)
> summary(m)

Call:
lm(formula = a ~ bc, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-57.455 -11.750   0.439  14.011  37.545

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    20.50      17.57   1.166   0.2617
bct1:unsent    37.50      24.85   1.509   0.1521
bct2:other     32.00      24.85   1.287   0.2174
bct2:sent      17.17      22.69   0.757   0.4610
bct2:unsent    38.95      19.11   2.039   0.0595

Residual standard error: 24.85 on 15 degrees of freedom
Multiple R-squared:  0.2613,    Adjusted R-squared:  0.06437
F-statistic: 1.327 on 4 and 15 DF,  p-value: 0.3052

Then the hypothesis is tested directly by the t-value for the 
coefficient bct2:sent.


I hope that this helps,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2023-09-26 1:12 a.m., Michael Cohn wrote:

Caution: External email.


I'm running a linear regression with two categorical predictors and their
interaction. One combination of levels does not occur in the data, and as
expected, no parameter is estimated for it. I now want to significance test
a particular combination of levels that does occur in the data (ie, I want
to get a confidence interval for the total prediction at given levels of
each variable).

In the past I've done this using car::deltaMethod() but in this dataset
that does not work, as shown in the example below: The regression model
gives the expected output, but deltaMethod() gives this error:

error in t(gd) %*% vcov. : non-conformable arguments

I believe this is because there is no parameter estimate for when the
predictors have the values 't1' and 'other'. In the df_fixed dataframe,
putting one person into that combination of categories causes deltaMethod()
to work as expected.

I don't know of any theoretical reason that missing one interaction
parameter estimate should prevent getting a confidence interval for a
different combination of predictors. Is there a way to use deltaMethod() or
some other function to do this without changing my data?

Thank you,

- Michael Cohn
Vote Rev (http://voterev.org)


Demonstration:
--

library(car)
# create dataset with outcome and two categorical predictors
outcomes <- c(91,2,60,53,38,78,48,33,97,41,64,84,64,8,66,41,52,18,57,34)
persontype <-
c("t2","t2","t2","t2","t2","t2","t2","t1","t2","t2","t2","t2","t1","t1","t2","t2","t1","t2","t2","t2")
arm_letter <-
c("unsent","unsent","unsent","unsent","sent","unsent","unsent","sent","unsent","unsent","other","unsent","unsent","sent","unsent","other","unsent","sent","sent","unsent")
df <- data.frame(a = outcomes, b=persontype, c=arm_letter)

# note: there are no records with the combination 't1' + 'other'
table(df$b,df$c)


#regression works as expected
minimal_formula <- formula("a ~ b*c")
minimal_model <- lm(minimal_formula, data=df)
summary(minimal_model)

#use deltaMethod() to get a prediction for individuals with the combination
'b2' and 'sent'
# deltaMethod() fails with "error in t(gd) %*% vcov. : non-conformable
arguments."
deltaMethod(minimal_model, "bt2 + csent + `bt2:csent`", rhs=0)

# duplicate the dataset and change one record to be in the previously empty
cell
df_fixed <- df
df_fixed[c(13),"c"] <- 'other'
table(df_fixed$b,df_fixed$c)

#deltaMethod() now works
minimal_model_fixed <- lm(minimal_formula, data=df_fixed)
deltaMethod(minimal_model_fixed, "bt2 + csent + `bt2:csent`", rhs=0)

 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Print hypothesis warning- Car package

2023-09-18 Thread John Fox

Hi Peter,

On 2023-09-18 10:08 a.m., peter dalgaard wrote:

Caution: External email.


Also, I would guess that the code precedes the use of backticks in non-syntactic names. 


Indeed, by more than a decade (though modified in the interim).


Could they be deployed here?


I don't think so, at least not without changing how the function works.

The problem doesn't occur when the hypothesis is specified symbolically 
as a character vector, including in equation form, only when the 
hypothesis matrix is given directly, in which case linearHypothesis() 
tries to construct the equation-form representation, again as character 
vectors. Its inability to do so when the coefficient names include 
arithmetic operators doesn't, I think, require a warning or even a 
message: the symbolic representation of the hypothesis can simply be 
omitted. The numeric results reported are entirely unaffected.
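Until that change is released, the warning is cosmetic and can simply be silenced. A minimal sketch using base R, assuming the mod object from this thread:

```r
## The warning comes from printing the hypothesis, not from the
## computations, so the numeric table is safe to use;
## suppressWarnings() just hides the noise.
res <- suppressWarnings(Anova(mod, type = 2))
res
```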


I've made this change and will commit it to the next version of the car 
package.


Thank you for the suggestion,
 John




- Peter


On 17 Sep 2023, at 16:43, John Fox wrote:

Dear Robert,

Anova() calls linearHypothesis(), also in the car package, to compute sums of 
squares and df, supplying appropriate hypothesis matrices. linearHypothesis() 
usually tries to express the hypothesis matrix in symbolic equation form for 
printing, but won't do this if coefficient names include arithmetic operators, 
in your case - and +, which can confuse it.

The symbolic form of the hypothesis isn't really relevant for Anova(), which 
doesn't use the printed representation of each hypothesis, and so, despite the 
warnings, you get the correct ANOVA table. In your case, where the data are 
balanced, with 4 cases per cell, Anova(mod) and summary(mod) are equivalent, 
which makes me wonder why you would use Anova() in the first place.

To elaborate a bit, linearHypothesis() does tolerate arithmetic operators in 
coefficient names if you specify the hypothesis symbolically rather than as a 
hypothesis matrix. For example, to test, the interaction:

--- snip 


linearHypothesis(mod,

+  c("TreatmentDabrafenib:ExpressionCD271+ = 0",
+"TreatmentTrametinib:ExpressionCD271+ = 0",
+"TreatmentCombination:ExpressionCD271+ = 0"))
Linear hypothesis test

Hypothesis:
TreatmentDabrafenib:ExpressionCD271+ = 0
TreatmentTrametinib:ExpressionCD271+ = 0
TreatmentCombination:ExpressionCD271+ = 0

Model 1: restricted model
Model 2: Viability ~ Treatment * Expression

  Res.Df   RSS Df Sum of Sq     F Pr(>F)
1     27 18966
2     24 16739  3    2226.3 1.064 0.3828

--- snip 

Alternatively:

--- snip 


H <- matrix(0, 3, 8)
H[1, 6] <- H[2, 7] <- H[3, 8] <- 1
H

 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    0    0    0    0    0    1    0    0
[2,]    0    0    0    0    0    0    1    0
[3,]    0    0    0    0    0    0    0    1


linearHypothesis(mod, H)

Linear hypothesis test

Hypothesis:


Model 1: restricted model
Model 2: Viability ~ Treatment * Expression

  Res.Df   RSS Df Sum of Sq     F Pr(>F)
1     27 18966
2     24 16739  3    2226.3 1.064 0.3828
Warning message:
In printHypothesis(L, rhs, names(b)) :
  one or more coefficients in the hypothesis include
 arithmetic operators in their names;
  the printed representation of the hypothesis will be omitted

--- snip 

There's no good reason that linearHypothesis() should try to express each 
hypothesis symbolically for Anova(), since Anova() doesn't use that 
information. When I have some time, I'll arrange to avoid the warning.

Best,
John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/
On 2023-09-16 4:39 p.m., Robert Baer wrote:

Caution: External email.
When doing Anova using the car package, I get a print warning that is
unexpected. It seemingly involves having my flow cytometry factor levels
named CD271+ and CD271-. But I am not sure this warning should be
intended behavior. Any explanation about whether I'm doing something
wrong? Why can't I have CD271+ and CD271- as factor levels? It's legal
text, isn't it?
library(car)
mod = aov(Viability ~ Treatment*Expression, data = dat1)
Anova(mod, type = 2)

Anova Table (Type II tests)

Response: Viability
                      Sum Sq Df F value    Pr(>F)
Treatment            19447.3  3  9.2942 0.0002927 ***
Expression            2669.8  1  3.8279 0.0621394 .
Treatment:Expression  2226.3  3  1.0640 0.3828336
Residuals            16739.3 24
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Warning messages:
1: In printHypothesis(L, rhs, names(b)) :
  one or more coefficients in the hypothesis include arithmetic operators
  in their names; the printed representation of the hypothesis will be omitted
2: In printHypothesis(L, rhs, names(b)) :
  one or more coefficients in the hypothesis include arithmetic operators in the

Re: [R] Print hypothesis warning- Car package

2023-09-17 Thread John Fox

Dear Robert,

Anova() calls linearHypothesis(), also in the car package, to compute 
sums of squares and df, supplying appropriate hypothesis matrices. 
linearHypothesis() usually tries to express the hypothesis matrix in 
symbolic equation form for printing, but won't do this if coefficient 
names include arithmetic operators, in your case - and +, which can 
confuse it.


The symbolic form of the hypothesis isn't really relevant for Anova(), 
which doesn't use the printed representation of each hypothesis, and so, 
despite the warnings, you get the correct ANOVA table. In your case, 
where the data are balanced, with 4 cases per cell, Anova(mod) and 
summary(mod) are equivalent, which makes me wonder why you would use 
Anova() in the first place.


To elaborate a bit, linearHypothesis() does tolerate arithmetic 
operators in coefficient names if you specify the hypothesis 
symbolically rather than as a hypothesis matrix. For example, to test, 
the interaction:


--- snip 

> linearHypothesis(mod,
+  c("TreatmentDabrafenib:ExpressionCD271+ = 0",
+"TreatmentTrametinib:ExpressionCD271+ = 0",
+"TreatmentCombination:ExpressionCD271+ = 0"))
Linear hypothesis test

Hypothesis:
TreatmentDabrafenib:ExpressionCD271+ = 0
TreatmentTrametinib:ExpressionCD271+ = 0
TreatmentCombination:ExpressionCD271+ = 0

Model 1: restricted model
Model 2: Viability ~ Treatment * Expression

  Res.Df   RSS Df Sum of Sq     F Pr(>F)
1     27 18966
2     24 16739  3    2226.3 1.064 0.3828

--- snip 

Alternatively:

--- snip 

> H <- matrix(0, 3, 8)
> H[1, 6] <- H[2, 7] <- H[3, 8] <- 1
> H
 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    0    0    0    0    0    1    0    0
[2,]    0    0    0    0    0    0    1    0
[3,]    0    0    0    0    0    0    0    1

> linearHypothesis(mod, H)
Linear hypothesis test

Hypothesis:


Model 1: restricted model
Model 2: Viability ~ Treatment * Expression

  Res.Df   RSS Df Sum of Sq     F Pr(>F)
1     27 18966
2     24 16739  3    2226.3 1.064 0.3828
Warning message:
In printHypothesis(L, rhs, names(b)) :
  one or more coefficients in the hypothesis include
 arithmetic operators in their names;
  the printed representation of the hypothesis will be omitted

--- snip 

There's no good reason that linearHypothesis() should try to express 
each hypothesis symbolically for Anova(), since Anova() doesn't use that 
information. When I have some time, I'll arrange to avoid the warning.


Best,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/
On 2023-09-16 4:39 p.m., Robert Baer wrote:

Caution: External email.


When doing Anova using the car package, I get a print warning that is
unexpected. It seemingly involves having my flow cytometry factor levels
named CD271+ and CD271-. But I am not sure this warning should be
intended behavior. Any explanation about whether I'm doing something
wrong? Why can't I have CD271+ and CD271- as factor levels? It's legal
text, isn't it?

library(car)
mod = aov(Viability ~ Treatment*Expression, data = dat1)
Anova(mod, type = 2)

Anova Table (Type II tests)

Response: Viability
                      Sum Sq Df F value    Pr(>F)
Treatment            19447.3  3  9.2942 0.0002927 ***
Expression            2669.8  1  3.8279 0.0621394 .
Treatment:Expression  2226.3  3  1.0640 0.3828336
Residuals            16739.3 24
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Warning messages:
1: In printHypothesis(L, rhs, names(b)) :
  one or more coefficients in the hypothesis include arithmetic operators
  in their names; the printed representation of the hypothesis will be omitted
2: In printHypothesis(L, rhs, names(b)) :
  one or more coefficients in the hypothesis include arithmetic operators
  in their names; the printed representation of the hypothesis will be omitted
3: In printHypothesis(L, rhs, names(b)) :
  one or more coefficients in the hypothesis include arithmetic operators
  in their names; the printed representation of the hypothesis will be omitted


The code to reproduce:

```


dat1 <- structure(list(
  Treatment = structure(c(1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
                          2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
                          4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L),
                        levels = c("Control", "Dabrafenib", "Trametinib",
                                   "Combination"), class = "factor"),
  Expression = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
                           2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
                           2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L),
                         levels = c("CD271-", "CD271+"), class = "factor"),

Re: [R] Determining Starting Values for Model Parameters in Nonlinear Regression

2023-08-19 Thread John Fox

Dear John, John, and Paul,

In this case, one can start values by just fitting

> lm(1/y ~ x1 + x2 + x3 - 1, data=mydata)

Call:
lm(formula = 1/y ~ x1 + x2 + x3 - 1, data = mydata)

Coefficients:
 x1   x2   x3
0.00629  0.00868  0.00803

Of course, the errors enter this model differently, so this isn't the 
same as the nonlinear model, but the regression coefficients are very 
close to the estimates for the nonlinear model.
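For completeness, a sketch of my own (assuming the thread's mydata data frame with columns x1, x2, x3, and y) showing how those lm() coefficients can seed nls() on the actual nonlinear specification:

```r
## Fit the linearized model 1/y ~ x1 + x2 + x3 - 1, then use its
## coefficients as starting values for nls() on the nonlinear model.
start_fit <- lm(1/y ~ x1 + x2 + x3 - 1, data = mydata)
strt <- as.list(setNames(coef(start_fit), c("Beta1", "Beta2", "Beta3")))

nls_fit <- nls(y ~ 1/(Beta1*x1 + Beta2*x2 + Beta3*x3),
               data = mydata, start = strt)
coef(nls_fit)  # should be close to the nlxb() estimates elsewhere in this thread
```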


Best,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2023-08-19 6:39 p.m., Sorkin, John wrote:

Caution: External email.


Colleagues,

At the risk of starting a forest fire, or perhaps a brush fire, while it is 
good to see that nlxb can find a solution from arbitrary starting values, I 
think Paul’s question has merit despite Professor Nash’s excellent and helpful 
observation.

Although non-linear algorithms can converge, they can converge to a false 
solution if starting values are sub-optimally specified. When possible, I try 
to specify thought-out starting values. Would it make sense to plot y as a 
function of (x1, x2) at different values of x3 to get a sense of possible 
starting values? Or, perhaps using median values of x1, x2, and x3 as starting 
values. Comparing results from different starting values can give some
confidence that the solution obtained using arbitrary starting values is
likely “correct”.

I freely admit that my experience (and thus expertise) using non-linear 
solutions is limited. Please do not flame me, I am simply urging caution.

John

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric 
Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to 
faxing)

On Aug 19, 2023, at 4:35 PM, J C Nash wrote:

Why bother? nlsr can find a solution from a very crude start.

Mixture <- c(17, 14, 5, 1, 11, 2, 16, 7, 19, 23, 20, 6, 13, 21, 3, 18, 15, 26, 
8, 22)
x1 <- c(69.98, 72.5, 77.6, 79.98, 74.98, 80.06, 69.98, 77.34, 69.99, 67.49, 
67.51, 77.63,
72.5, 67.5, 80.1, 69.99, 72.49, 64.99, 75.02, 67.48)
x2 <- c(29, 25.48, 21.38, 19.85, 22, 18.91, 29.99, 19.65, 26.99, 29.49, 32.47,
20.35, 26.48, 31.47, 16.87, 27.99, 24.49, 31.99, 24.96, 30.5)
x3 <- c(1, 2, 1, 0, 3, 1, 0, 2.99, 3, 3, 0, 2, 1, 1, 3, 2,
3, 3, 0, 2)
y <- c(1.4287, 1.4426, 1.4677, 1.4774, 1.4565,
   1.4807, 1.4279, 1.4684, 1.4301, 1.4188, 1.4157, 1.4686, 1.4414,
   1.4172, 1.4829, 1.4291, 1.4438, 1.4068, 1.4524, 1.4183)
mydata<-data.frame(Mixture, x1, x2, x3, y)
mydata
mymod <- y ~ 1/(Beta1*x1 + Beta2*x2 + Beta3*x3)
library(nlsr)
strt<-c(Beta1=1, Beta2=2, Beta3=3)
trysol<-nlxb(formula=mymod, data=mydata, start=strt, trace=TRUE)
trysol
# or pshort(trysol)


Output is

residual sumsquares = 1.5412e-05 on 20 observations
after 29 Jacobian and 43 function evaluations
  name       coeff         SE    tstat       pval    gradient  JSingval
Beta1   0.00629212  5.997e-06     1049  2.425e-42   4.049e-08     721.8
Beta2   0.00867741  1.608e-05    539.7  1.963e-37  -2.715e-08     56.05
Beta3   0.00801948  8.809e-05    91.03  2.664e-24   1.497e-08     10.81

J Nash


On 2023-08-19 16:19, Paul Bernal wrote:
Dear friends,
Hope you are all doing well and having a great weekend.  I have data that
was collected on specific gravity and spectrophotometer analysis for 26
mixtures of NG (nitroglycerine), TA (triacetin), and 2 NDPA (2 -
nitrodiphenylamine).
In the dataset, x1 = %NG,  x2 = %TA, and x3 = %2 NDPA.
The response variable is the specific gravity, and the rest of the
variables are the predictors.
This is the dataset:
dput(mod14data_random)
structure(list(Mixture = c(17, 14, 5, 1, 11, 2, 16, 7, 19, 23,
20, 6, 13, 21, 3, 18, 15, 26, 8, 22), x1 = c(69.98, 72.5, 77.6,
79.98, 74.98, 80.06, 69.98, 77.34, 69.99, 67.49, 67.51, 77.63,
72.5, 67.5, 80.1, 69.99, 72.49, 64.99, 75.02, 67.48), x2 = c(29,
25.48, 21.38, 19.85, 22, 18.91, 29.99, 19.65, 26.99, 29.49, 32.47,
20.35, 26.48, 31.47, 16.87, 27.99, 24.49, 31.99, 24.96, 30.5),
 x3 = c(1, 2, 1, 0, 3, 1, 0, 2.99, 3, 3, 0, 2, 1, 1, 3, 2,
 3, 3, 0, 2), y = c(1.4287, 1.4426, 1.4677, 1.4774, 1.4565,
 1.4807, 1.4279, 1.4684, 1.4301, 1.4188, 1.4157, 1.4686, 1.4414,
 1.4172, 1.4829, 1.4291, 1.4438, 1.4068, 1.4524, 1.4183)), row.names =
c(NA,
-20L), class = "data.frame")
The model is the following:
y = 1/(Beta1x1 + Beta2x2 + Beta3x3)
I need to determine starting (initial) values for the model parameters for
this nonlinear regression model, any ideas on how to accomplish this using
R?
Cheers,
Paul
[[alternative HTML version deleted]]

Re: [R] Getting an error calling MASS::boxcox in a function

2023-07-08 Thread John Fox

Hi Bert,

On 2023-07-08 3:42 p.m., Bert Gunter wrote:

Caution: This email may have originated from outside the organization. Please 
exercise additional caution with any links and attachments.


Thanks John.

?boxcox says:

*
Arguments

object

a formula or fitted model object. Currently only lm and aov objects are handled.
*
I read that as saying that

boxcox(lm(z+1 ~ 1),...)

should run without error. But it didn't. And perhaps here's why:
BoxCoxLambda <- function(z){
b <- MASS:::boxcox.lm(lm(z+1 ~ 1), lambda = seq(-5, 5, length.out =
61), plotit = FALSE)
b$x[which.max(b$y)]# best lambda
}


lambdas <- apply(dd,2 , BoxCoxLambda)

Error in NextMethod() : 'NextMethod' called from an anonymous function

and, indeed, ?UseMethod says:
"NextMethod should not be called except in methods called by UseMethod
or from internal generics (see InternalGenerics). In particular it
will not work inside anonymous calling functions (e.g.,
get("print.ts")(AirPassengers))."

BUT 
BoxCoxLambda <- function(z){
   b <- MASS:::boxcox(z+1 ~ 1, lambda = seq(-5, 5, length.out = 61),
plotit = FALSE)
   b$x[which.max(b$y)]# best lambda
}


lambdas <- apply(dd,2 , BoxCoxLambda)
lambdas

[1] 0.167 0.167


As it turns out, it's the update() step in boxcox.lm() that fails, and 
the update takes place because $y is missing from the lm object, so the 
following works:


BoxCoxLambda <- function(z){
b <- boxcox(lm(z + 1 ~ 1, y=TRUE),
lambda = seq(-5, 5, length.out = 101),
plotit = FALSE)
b$x[which.max(b$y)]
}



The identical lambdas do not seem right to me; 


I think that's just an accident of the example (using the BoxCoxLambda() 
above):


> apply(dd, 2, BoxCoxLambda, simplify = TRUE)
[1] 0.2 0.2

> dd[, 2]  <- dd[, 2]^3
> apply(dd, 2, BoxCoxLambda, simplify = TRUE)
[1] 0.2 0.1

Best,
 John


nor do I understand why
boxcox.lm apparently throws the error while boxcox.formula does not
(it also calls NextMethod()) So I would welcome clarification to clear
my clogged (cerebral) sinuses. :-)


Best,
Bert


On Sat, Jul 8, 2023 at 11:25 AM John Fox  wrote:


Dear Ron and Bert,

First (and without considering why one would want to do this, e.g.,
adding a start of 1 to the data), the following works for me:

-- snip --

  > library(MASS)

  > BoxCoxLambda <- function(z){
+   b <- boxcox(z + 1 ~ 1,
+   lambda = seq(-5, 5, length.out = 101),
+   plotit = FALSE)
+   b$x[which.max(b$y)]
+ }

  > mrow <- 500
  > mcol <- 2
  > set.seed(12345)
  > dd <- matrix(rgamma(mrow*mcol, shape = 2, scale = 5), nrow = mrow, ncol =
+mcol)

  > dd1 <- dd[, 1] # 1st column of dd
  > res <- boxcox(lm(dd1 + 1 ~ 1), lambda = seq(-5, 5, length.out = 101),
  +               plotit = FALSE)
  > res$x[which.max(res$y)]
[1] 0.2

  > apply(dd, 2, BoxCoxLambda, simplify = TRUE)
[1] 0.2 0.2

-- snip --

One could also use the powerTransform() function in the car package,
which in this context transforms towards *multi*normality:

-- snip --

  > library(car)
Loading required package: carData

  > powerTransform(dd + 1)
Estimated transformation parameters
 Y1Y2
0.1740200 0.2089925

I hope this helps,
   John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2023-07-08 12:47 p.m., Bert Gunter wrote:

Caution: This email may have originated from outside the organization. Please 
exercise additional caution with any links and attachments.


No, I'm afraid I'm wrong. Something went wrong with my R session and gave
me incorrect answers. After restarting, I continued to get the same error
as you did with my supposed "fix." So just ignore what I said and sorry for
the noise.

-- Bert

On Sat, Jul 8, 2023 at 8:28 AM Bert Gunter  wrote:


Try this for your function:

BoxCoxLambda <- function(z){
 y <- z
 b <- boxcox(y + 1 ~ 1,lambda = seq(-5, 5, length.out = 61), plotit =
FALSE)
 b$x[which.max(b$y)]# best lambda
}

***I think*** (corrections and clarification strongly welcomed!) that `~`
(the formula function) is looking for 'z' in the GlobalEnv, the caller of
apply(), and not finding it. It finds 'y' here explicitly in the
BoxCoxLambda environment.
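Whether or not that is the precise failure mode here, the underlying rule is easy to demonstrate: a formula captures the environment in which it was created, and modelling functions look variables up there. A standalone sketch of my own (not from the thread):

```r
## A formula records the environment where it was built, so model
## variables need not live in the global environment.
make_formula <- function() {
  y <- rnorm(5)
  y ~ 1          # this formula remembers make_formula()'s evaluation frame
}

form <- make_formula()
## y does not exist in the global environment, yet lm() still finds it,
## because it evaluates variables in environment(form):
fit <- lm(form)
coef(fit)
```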

Cheers,
Bert



On Sat, Jul 8, 2023 at 4:28 AM Ron Crump via R-help 
wrote:


Hi,

Firstly, apologies as I have posted this on community.rstudio.com too.

I want to optimise a Box-Cox transformation on columns of a matrix (ie, a
unique lambda for each column). So I wrote a function that includes the
call to MASS::boxcox in order that it can be applied to each column easily.
Except that I'm getting an error when calling the function. If I just
extract a column of the matrix 

Re: [R] Getting an error calling MASS::boxcox in a function

2023-07-08 Thread John Fox

Dear Ron and Bert,

First (and without considering why one would want to do this, e.g., 
adding a start of 1 to the data), the following works for me:


-- snip --

> library(MASS)

> BoxCoxLambda <- function(z){
+   b <- boxcox(z + 1 ~ 1,
+   lambda = seq(-5, 5, length.out = 101),
+   plotit = FALSE)
+   b$x[which.max(b$y)]
+ }

> mrow <- 500
> mcol <- 2
> set.seed(12345)
> dd <- matrix(rgamma(mrow*mcol, shape = 2, scale = 5), nrow = mrow, ncol =
+mcol)

> dd1 <- dd[, 1] # 1st column of dd
> res <- boxcox(lm(dd1 + 1 ~ 1), lambda = seq(-5, 5, length.out = 101),
+               plotit = FALSE)
> res$x[which.max(res$y)]
[1] 0.2

> apply(dd, 2, BoxCoxLambda, simplify = TRUE)
[1] 0.2 0.2

-- snip --

One could also use the powerTransform() function in the car package, 
which in this context transforms towards *multi*normality:


-- snip --

> library(car)
Loading required package: carData

> powerTransform(dd + 1)
Estimated transformation parameters
   Y1Y2
0.1740200 0.2089925

I hope this helps,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2023-07-08 12:47 p.m., Bert Gunter wrote:

Caution: This email may have originated from outside the organization. Please 
exercise additional caution with any links and attachments.


No, I'm afraid I'm wrong. Something went wrong with my R session and gave
me incorrect answers. After restarting, I continued to get the same error
as you did with my supposed "fix." So just ignore what I said and sorry for
the noise.

-- Bert

On Sat, Jul 8, 2023 at 8:28 AM Bert Gunter  wrote:


Try this for your function:

BoxCoxLambda <- function(z){
y <- z
b <- boxcox(y + 1 ~ 1,lambda = seq(-5, 5, length.out = 61), plotit =
FALSE)
b$x[which.max(b$y)]# best lambda
}

***I think*** (corrections and clarification strongly welcomed!) that `~`
(the formula function) is looking for 'z' in the GlobalEnv, the caller of
apply(), and not finding it. It finds 'y' here explicitly in the
BoxCoxLambda environment.

Cheers,
Bert



On Sat, Jul 8, 2023 at 4:28 AM Ron Crump via R-help 
wrote:


Hi,

Firstly, apologies as I have posted this on community.rstudio.com too.

I want to optimise a Box-Cox transformation on columns of a matrix (ie, a
unique lambda for each column). So I wrote a function that includes the
call to MASS::boxcox in order that it can be applied to each column easily.
Except that I'm getting an error when calling the function. If I just
extract a column of the matrix and run the code not in the function, it
works. If I call the function either with an extracted column (ie dd1 in
the reprex below) or in a call to apply I get an error (see the reprex
below).

I'm sure I'm doing something silly, but I can't see what it is. Any help
appreciated.

library(MASS)

# Find optimised Lambda for Boc-Cox transformation
BoxCoxLambda <- function(z){
 b <- boxcox(lm(z+1 ~ 1), lambda = seq(-5, 5, length.out = 61), plotit
= FALSE)
 b$x[which.max(b$y)]# best lambda
}

mrow <- 500
mcol <- 2
set.seed(12345)
dd <- matrix(rgamma(mrow*mcol, shape = 2, scale = 5), nrow = mrow, ncol =
mcol)

# Try it not using the BoxCoxLambda function:
dd1 <- dd[,1] # 1st column of dd
bb <- boxcox(lm(dd1+1 ~ 1), lambda = seq(-5, 5, length.out = 101), plotit
= FALSE)
print(paste0("1st column's lambda is ", bb$x[which.max(bb$y)]))
#> [1] "1st column's lambda is 0.2"

# Calculate lambda for each column of dd
lambdas <- apply(dd, 2, BoxCoxLambda, simplify = TRUE)
#> Error in eval(predvars, data, env): object 'z' not found

Created on 2023-07-08 with reprex v2.0.2

Thanks for your time and help.

Ron
 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] loess plotting problem

2023-03-23 Thread John Fox

Dear ,

On 2023-03-23 11:08 a.m., Anupam Tyagi wrote:

Thanks, John.

However, loess.smooth() is producing a very different curve compared to 
the one that results from applying predict() on a loess(). I am guessing 
they are using different defaults. Correct?


No need to guess. Just look at the help pages ?loess and ?loess.smooth. 
If you don't like the default for loess.smooth(), just specify the 
arguments you want.
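To make the comparison concrete, here is a small sketch of my own on the built-in cars data. As I recall the help pages, loess.smooth() defaults to span = 2/3, degree = 1, and family = "symmetric", while loess() defaults to span = 0.75, degree = 2, and family = "gaussian", which is why the two curves differ; matching the arguments brings the routes into agreement:

```r
plot(speed ~ dist, data = cars)

## Route 1: loess() + predict(), with the settings matched to
## loess.smooth()'s defaults, and x sorted so lines() connects in order.
m <- loess(speed ~ dist, data = cars,
           span = 2/3, degree = 1, family = "symmetric")
ord <- order(cars$dist)
lines(cars$dist[ord], predict(m)[ord], col = "blue")

## Route 2: loess.smooth() with its own defaults.
with(cars, lines(loess.smooth(dist, speed), col = "red", lty = 2))
```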


Best,
 John




On Thu, 23 Mar 2023 at 20:20, John Fox wrote:


Dear Anupam Tyagi,

You didn't include your data, so it's not possible to see exactly what
happened, but I think that you misunderstand the object that loess()
returns. It returns a "loess" object with several components, including
the original data in x and y. So if you pass the object to lines(), you'll
simply connect the points, and if x isn't sorted, the points won't be in
order. Try, e.g.,

plot(speed ~ dist, data=cars)
m <- loess(speed ~ dist, data=cars)
names(m)
lines(m)

You'd do better to use loess.smooth(), which is intended for adding a
loess regression to a scatterplot; for example,

plot(speed ~ dist, data=cars)
with(cars, lines(loess.smooth(dist, speed)))

Other points: You don't have to load the stats package which is
available by default when you start R. It's best to avoid attach(), the
use of which can cause confusion.

I hope this helps,
   John

-- 
* preferred email: john.david@proton.me
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2023-03-23 10:18 a.m., Anupam Tyagi wrote:
 > For some reason the following code is not plotting as I want it to. I want
 > to plot a "loess" line plotted over a scatter plot. I get a jumble, with
 > lines connecting all the points. I had a similar problem with "lowess". I
 > solved that by dropping "NA" rows from the data columns. Please help.
 >
 > library(stats)
 > attach(gini_pci_wdi_narm)
 > plot(ny_gnp_pcap_pp_kd, si_pov_gini)
 > lines(loess(si_pov_gini ~ ny_gnp_pcap_pp_kd, gini_pci_wdi_narm))
 > detach(gini_pci_wdi_narm)
 >



--
Anupam.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loess plotting problem

2023-03-23 Thread John Fox

Dear Anupam Tyagi,

You didn't include your data, so it's not possible to see exactly what 
happened, but I think that you misunderstand the object that loess() 
returns. It returns a "loess" object with several components, including 
the original data in x and y. So if you pass the object to lines(), you'll 
simply connect the points, and if x isn't sorted, the points won't be in 
order. Try, e.g.,


plot(speed ~ dist, data=cars)
m <- loess(speed ~ dist, data=cars)
names(m)
lines(m)

You'd do better to use loess.smooth(), which is intended for adding a 
loess regression to a scatterplot; for example,


plot(speed ~ dist, data=cars)
with(cars, lines(loess.smooth(dist, speed)))

Other points: You don't have to load the stats package, which is 
available by default when you start R. It's best to avoid attach(), the 
use of which can cause confusion.


I hope this helps,
 John

--
* preferred email: john.david@proton.me
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2023-03-23 10:18 a.m., Anupam Tyagi wrote:

For some reason the following code is not plotting as I want it to. I want
to plot a "loess" line plotted over a scatter plot. I get a jumble, with
lines connecting all the points. I had a similar problem with "lowess". I
solved that by dropping "NA" rows from the data columns. Please help.

library(stats)
attach(gini_pci_wdi_narm)
plot(ny_gnp_pcap_pp_kd, si_pov_gini)
lines(loess(si_pov_gini ~ ny_gnp_pcap_pp_kd, gini_pci_wdi_narm))
detach(gini_pci_wdi_narm)



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Good Will Legal Question

2023-03-21 Thread John Fox

Dear Timothy,

On 2023-03-21 1:38 p.m., Ebert,Timothy Aaron wrote:

My guess: It is clear from the link that they can use the R logo for commercial purposes. The issue 
is what to do about the "appropriate credit" and "link to the license." How 
would I do that on a hoodie? Would they need a web address or something?


That's a good question, and one that I missed -- the implicit focus is 
on using the logo, e.g., in software.


With the caveat that I'm not speaking for the R Foundation, I think that 
it would be sufficient to provide credit and a link to the license on 
the webpage that sells the hoodie. FWIW, I (and I expect you) have seen 
many t-shirts, etc., with R logos, some from companies, and I even have 
a few. I doubt that anyone will care.


Best,
 John



-Original Message-----
From: R-help  On Behalf Of John Fox
Sent: Tuesday, March 21, 2023 1:19 PM
To: Coding Hoodies 
Cc: r-help@r-project.org
Subject: Re: [R] Good Will Legal Question

[External Email]

Dear Arid Sweeting,

R-help is probably not the place to ask this question, although perhaps since 
you're seeking moral advice, people might want to say something. I would 
normally expect to see a query like this addressed to the R website webmasters, 
of which I'm one -- with the caveat that the R Foundation doesn't give legal 
advice.

Just to be sure, you say that you read the rules for use of the R logo, so I assume that you've 
seen <https://www.r-project.org/logo/>, which seems entirely clear to me. I think that it's safe 
to say that if the R Foundation wanted to limit commercial use of the R logo, it wouldn't have 
released it under the CC-BY-SA 4.0 license. I'm not sure what moral issues concern you.

I hope this helps,
   John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2023-03-21 6:18 a.m., Coding Hoodies wrote:

Hi R Team!,

We are opening a new start-up soon, codinghoodies.com; we want to make coders 
feel stylish.

Out of goodwill I wanted to ask you formally if I can have permission to use 
the standard R logo on the front of hoodies to sell? I have read your rules, but 
wanted to ask as I feel morally obliged to email you to show support and 
respect for the R project.

If it makes it easier, I could send a picture of the hoodie with the logo 
on it to you to see if this is acceptable.

Arid Sweeting




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html

Re: [R] DOUBT

2023-03-21 Thread John Fox

Dear Nandiniraj,

Please cc r-help in your emails so that others can see what happened 
with your problem.


You don't provide enough information to know what exactly is the source 
of your problem  -- you're more likely to get effective help if you 
provide a minimal reproducible example of the problem -- but it's a good 
guess that the variable (HHsize or perhaps some other variable) isn't in 
the newdata data frame.


Best,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2023-03-21 1:24 p.m., Nandini raj wrote:

I removed the space, but it is still showing an error, i.e., "Variable not found".

Nandiniraj

On Tue, Mar 21, 2023, 10:36 PM John Fox <mailto:j...@mcmaster.ca> wrote:


Dear Nandini raj,

You have a space in the variable name "HH size".

    I hope this helps,
   John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/
<https://socialsciences.mcmaster.ca/jfox/>

On 2023-03-20 1:16 p.m., Nandini raj wrote:
 > Respected sir/madam
 > can you please suggest what is an unexpected symbol in the below code
 > for running a multinomial logistic regression
 >
 > model <- multinom(adoption ~ age + education + HH size + landholding +
 >   Farmincome + nonfarmincome + creditaccesibility + LHI, data=newdata)
 >
 >   [[alternative HTML version deleted]]
 >
 > __
 > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide
 > http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Good Will Legal Question

2023-03-21 Thread John Fox

Dear Arid Sweeting,

R-help is probably not the place to ask this question, although perhaps 
since you're seeking moral advice, people might want to say something. I 
would normally expect to see a query like this addressed to the R 
website webmasters, of which I'm one -- with the caveat that the R 
Foundation doesn't give legal advice.


Just to be sure, you say that you read the rules for use of the R logo, 
so I assume that you've seen <https://www.r-project.org/logo/>, which 
seems entirely clear to me. I think that it's safe to say that if the R 
Foundation wanted to limit commercial use of the R logo, it wouldn't 
have released it under the CC-BY-SA 4.0 license. I'm not sure what moral 
issues concern you.


I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2023-03-21 6:18 a.m., Coding Hoodies wrote:

Hi R Team!,

We are opening a new start-up soon, codinghoodies.com; we want to make coders feel stylish.

Out of goodwill I wanted to ask you formally if I can have permission to use the standard R logo on the front of hoodies to sell? I have read your rules, but wanted to ask as I feel morally obliged to email you to show support and respect for the R project.

If it makes it easier, I could send a picture of the hoodie with the logo on it to you to see if this is acceptable.

Arid Sweeting


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] DOUBT

2023-03-21 Thread John Fox

Dear Nandini raj,

You have a space in the variable name "HH size".

I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2023-03-20 1:16 p.m., Nandini raj wrote:

Respected sir/madam
can you please suggest what is an unexpected symbol in the below code for
running a multinomial logistic regression

model <- multinom(adoption ~ age + education + HH size + landholding +
Farmincome + nonfarmincome + creditaccesibility + LHI, data=newdata)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tcl tk: set the position button

2023-03-13 Thread John Fox

Dear Rodrigo,

Try tkwm.geometry(win1, "-0+0"), which should position win1 at the top 
right.
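
For context, a minimal sketch combining the original example with the suggested call (this requires a graphical display, and assumes the tcltk2 package is installed; tkwm.geometry() itself comes from the base tcltk package):

```r
library(tcltk2)                # also loads tcltk

win1 <- tktoplevel()           # create a top-level window
butOK <- tk2button(win1, text = "TEST", width = 77)
tkgrid(butOK)

# "-0+0" means: 0 pixels from the RIGHT edge of the screen, 0 from the top.
# A leading "-" on the x offset anchors the window to the right edge.
tkwm.geometry(win1, "-0+0")
```

Similarly, "+0-0" would anchor the window to the bottom-left corner.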


I hope this helps,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2023-03-12 8:41 p.m., Rodrigo Badilla wrote:

Hi all,
I am using the tcltk2 library to show buttons and messages. Everything
works fine, but I would like to set the tk2button to the right of my screen; by
default it displays at the left of my screen.
my script example:
library(tcltk2)
win1 <- tktoplevel()
butOK <- tk2button(win1, text = "TEST", width = 77)
tkgrid(butOK)
Thanks in advance
Saludos
Rodrigo


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] MFA variables graph, filtered by separate.analyses

2023-02-21 Thread John Fox

Dear gavin,

I think that it's likely that Jim meant the hetcor() function in the 
polycor package.
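
For archive readers, a minimal sketch of what a call to polycor::hetcor() looks like (this assumes the polycor package is installed; the data are simulated purely for illustration):

```r
library(polycor)

set.seed(123)
x <- rnorm(200)                                     # a numeric variable
o <- cut(x + rnorm(200), 3, ordered_result = TRUE)  # an ordinal factor
d <- data.frame(x = x, o = o)

# hetcor() picks the appropriate correlation for each pair:
# Pearson (numeric/numeric), polyserial (numeric/ordinal),
# polychoric (ordinal/ordinal)
h <- hetcor(d)
h$correlations   # the heterogeneous correlation matrix
```

The returned object also carries standard errors and the type of correlation used for each pair, which is useful when mixing variable types as in the MFA setting discussed above.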


Best,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2023-02-21 5:42 p.m., gavin duley wrote:

Hi Jim,

On Tue, 21 Feb 2023 at 22:17, Jim Lemon  wrote:

I can't work through this right now, but I would start by looking at
the 'hetcor' package to get the correlations, or if they are already
in the return object, build a plot from these.


Thanks for the suggestion. I'll read up on the 'hetcor' package.

Thanks,
gavin,



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unexpected 'else' in " else"

2022-10-21 Thread John Fox

Dear Jinsong,

When you enter these code lines at the R command prompt, the interpreter 
evaluates an expression when it's syntactically complete, which occurs 
before it sees the else clause. The interpreter can't read your mind and 
know that an else clause will be entered on the next line. When the code 
lines are in a function, the function body is enclosed in braces and so 
the interpreter sees the else clause.


As I believe was already pointed out, you can similarly use braces at 
the command prompt to signal incompleteness of an expression, as in


> {if (FALSE) print(1)
+ else print(2)}
[1] 2

I hope this helps,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/
On 2022-10-21 8:06 a.m., Jinsong Zhao wrote:

Thanks a lot!

I know the first and third ways to correct the error. The second way 
helps me understand why the code is correct in the function 
stats::weighted.residuals.


On 2022/10/21 17:36, Andrew Simmons wrote:
The error comes from the expression not being wrapped with braces. You 
could change it to


if (is.matrix(r)) {
    r[w != 0, , drop = FALSE]
} else r[w != 0]

or

{
    if (is.matrix(r))
        r[w != 0, , drop = FALSE]
    else r[w != 0]
}

or

if (is.matrix(r)) r[w != 0, , drop = FALSE] else r[w != 0]


On Fri., Oct. 21, 2022, 05:29 Jinsong Zhao,  wrote:

    Hi there,

    The following code would cause R error:

     > w <- 1:5
     > r <- 1:5
     >         if (is.matrix(r))
    +             r[w != 0, , drop = FALSE]
     >         else r[w != 0]
    Error: unexpected 'else' in "        else"

    However, the code:
             if (is.matrix(r))
                 r[w != 0, , drop = FALSE]
             else r[w != 0]
    is extracted from stats::weighted.residuals.

    My question is why the code in the function does not cause error?

    Best,
    Jinsong

    __
    R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide
    http://www.R-project.org/posting-guide.html
    <http://www.R-project.org/posting-guide.html>
    and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to obtain a consistent estimator with a binary response model with endogenous explanatory variables?

2022-09-28 Thread John Fox

Dear John (again),

I was surprised that you were unable to find an existing R function that 
estimates a probit model by IV and so I tried a Google search for 
"probit instrumental variables R", which turned up the ivprobit package 
as the first hit. That package is also mentioned in the Econometrics 
CRAN taskview <https://cran.r-project.org/web/views/Econometrics.html>.


The model fit by the ivprobit() function in the ivprobit package is a 
bit more general than the one in Wikipedia (at least by my quick reading 
of both) in that it permits more than one endogenous explanatory variable.


Best,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2022-09-28 3:47 p.m., John Fox wrote:

Dear John,

The Wikipedia page to which you refer appears to have all the 
information you need to write your own straightforward R program for the 
2SLS or ML estimator for a probit model.


I hope this helps,
  John

On 2022-09-28 8:50 a.m., Sun, John wrote:

Dear All,

I stumbled on a Wikipedia page describing two-stage least squares with 
a probit model for obtaining a consistent estimator in a binary-response 
regression.
How do I implement this method in R? It is related to the instrumental-
variables estimator. I looked in the ivreg and plm packages and found 
nothing that I think is related.

https://en.wikipedia.org/wiki/Binary_response_model_with_continuous_endogenous_explanatory_variables

Best regards,
John

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to obtain a consistent estimator with a binary response model with endogenous explanatory variables?

2022-09-28 Thread John Fox

Dear John,

The Wikipedia page to which you refer appears to have all the 
information you need to write your own straightforward R program for the 
2SLS or ML estimator for a probit model.
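
The two-stage procedure described on that Wikipedia page can be sketched in a few lines of R. This is a minimal illustration with simulated data (all variable names here are invented for the example), not the ivprobit package's implementation:

```r
set.seed(1)
n <- 1000
z <- rnorm(n)                              # instrument
u <- rnorm(n)                              # unobserved confounder
x <- z + u + rnorm(n)                      # endogenous explanatory variable
y <- as.numeric(x - 2 * u + rnorm(n) > 0)  # binary response

# Stage 1: regress the endogenous variable on the instrument
stage1 <- lm(x ~ z)
xhat <- fitted(stage1)

# Stage 2: probit of y on the first-stage fitted values
stage2 <- glm(y ~ xhat, family = binomial(link = "probit"))
coef(stage2)
```

Note that the naive second-stage standard errors from this plug-in approach are not correct without adjustment; a package such as ivprobit, which implements the estimator properly, is preferable in practice.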


I hope this helps,
 John

On 2022-09-28 8:50 a.m., Sun, John wrote:

Dear All,

I stumbled on a Wikipedia page describing two-stage least squares with a 
probit model for obtaining a consistent estimator in a binary-response 
regression.
How do I implement this method in R? It is related to the instrumental-
variables estimator. I looked in the ivreg and plm packages and found nothing 
that I think is related.
https://en.wikipedia.org/wiki/Binary_response_model_with_continuous_endogenous_explanatory_variables

Best regards,
John

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Correlate

2022-08-22 Thread John Fox
I hope this helps,
 John



Thank you,

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to add count to pie chart legend

2022-08-16 Thread John Fox

Dear Jim and Ana,

Why not skip the legend and put the counts in the labels?

with(df, pie(n, paste0(V1, " (", n, ")"),
col=c(3, 2), main="Yes and No", radius=1))

Best,
 John

On 2022-08-15 9:43 p.m., Jim Lemon wrote:

Hi Ana,
A lot of work for a little pie.

df<-read.table(text="V1 n
  Yes 8
  No 14",
  header=TRUE,
  stringsAsFactors=FALSE)
par(mar=c(5,4,4,4))
pie(df$n,df$V1,col=c(3,2),main="Yes and No",
  xlab="",ylab="",radius=1)
legend(0.75,-0.8,paste(df$V1,df$n),fill=c(3,2),
  xpd=TRUE)

Jim

On Tue, Aug 16, 2022 at 1:59 AM Ana Marija  wrote:


Hi All,

I have df like this:


df
# A tibble: 2 × 4
  V1        n  perc labels
1 Yes       8 0.364 36%
2 No       14 0.636 64%

I am making pie chart like this:

library(ggplot2)

ggplot(df, aes(x = "", y = perc, fill = V1)) +
   geom_col(color = "black") +
   geom_label(aes(label = labels),
  position = position_stack(vjust = 0.5),
  show.legend = FALSE) +
   guides(fill = guide_legend(title = "Answer")) +
   coord_polar(theta = "y") +
   theme_void()

How would I add in the legend beside Answer "Yes" count 8 (just number
8) and beside "No" count 14?

Thanks

Ana

 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Odd behavior of a function within apply

2022-08-08 Thread John Fox

Dear Erin,

The problem is that the data frame gets coerced to a character matrix, 
and the only column with "" entries is the 9th (the second one you 
supplied):


as.matrix(test1.df)
   X1_1_HZP1 X1_1_HBM1_mon X1_1_HBM1_yr
1  "48160"   "December"    "2014"
2  "48198"   "June"        "2018"
3  "80027"   "August"      "2016"
4  "48161"   ""            NA
5  NA        ""            NA
6  "48911"   "August"      "1985"
7  NA        "April"       "2019"
8  "48197"   "February"    "1993"
9  "48021"   ""            NA
10 "11355"   "December"    "1990"

(Here, test1.df only contains the three columns you provided.)

A solution is to use sapply:

> sapply(test1.df, count1a)
    X1_1_HZP1 X1_1_HBM1_mon  X1_1_HBM1_yr
            2             3             3


I hope this helps,
 John


On 2022-08-08 1:22 p.m., Erin Hodgess wrote:

Hello!

I have the following data.frame
  dput(test1.df[1:10,8:10])
structure(list(X1_1_HZP1 = c(48160L, 48198L, 80027L, 48161L,
NA, 48911L, NA, 48197L, 48021L, 11355L), X1_1_HBM1_mon = c("December",
"June", "August", "", "", "August", "April", "February", "",
"December"), X1_1_HBM1_yr = c(2014L, 2018L, 2016L, NA, NA, 1985L,
2019L, 1993L, NA, 1990L)), row.names = c(NA, 10L), class = "data.frame")

And the following function:

dput(count1a)

function (x)
{
 if (typeof(x) == "integer")
 y <- sum(is.na(x))
 if (typeof(x) == "character")
 y <- sum(x == "")
 return(y)
}
When I use the apply function with count1a, I get the following:
  apply(test1.df[1:10,8:10],2,count1a)
    X1_1_HZP1 X1_1_HBM1_mon  X1_1_HBM1_yr
           NA             3            NA
However, when I do use columns 8 and 10, I get the correct response:
  apply(test1.df[1:10,c(8,10)],2,count1a)
   X1_1_HZP1 X1_1_HBM1_yr
           2            3



I am really baffled.  If I use count1a on a single column, it works fine.

Any suggestions much appreciated.
Thanks,
Sincerely,
Erin


Erin Hodgess, PhD
mailto: erinm.hodg...@gmail.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Predicted values from glm() when linear predictor is NA.

2022-07-28 Thread John Fox

Dear Jeff,

On 2022-07-28 11:12 a.m., Jeff Newmiller wrote:

No, in this case I think I needed the "obvious" breakdown. Still digesting, 
though... I would prefer that, if an arbitrary selection had been made, it be 
explicit: the NA should be replaced with zero if the singular.ok argument is 
TRUE, rather than making that interpretation in predict.glm.


That's one way to think about, but another is that the model matrix X 
has 10 columns but is of rank 9. Thus 9 basis vectors are needed to span 
the column space of X, and a simple way to provide a basis is to 
eliminate a redundant column, hence the NA. The fitted values y-hat in a 
linear model are the orthogonal projection of y onto the space spanned 
by the columns of X, and are thus independent of the basis chosen. A GLM 
is a little more complicated, but it's still the column space of X 
that's important.
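
This invariance of the fitted values to the choice of basis is easy to illustrate with a small simulated example (not the poster's data), mimicking the structure of the problem where one column is 12 times another:

```r
set.seed(2)
x1 <- rep(c(0, 1), each = 5)
x2 <- 12 * x1                 # perfectly collinear with x1
y <- rnorm(10)

m1 <- lm(y ~ x1)              # basis: drop x2
m2 <- lm(y ~ x2)              # basis: drop x1

# Same column space, hence the same projection of y:
all.equal(fitted(m1), fitted(m2))  # TRUE
```

The coefficients of m1 and m2 differ (x2's slope is 1/12 of x1's), but the fitted values, residuals, and deviance are identical.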


Best,
 John



On July 28, 2022 5:45:35 AM PDT, John Fox  wrote:

Dear Jeff,

On 2022-07-28 1:31 a.m., Jeff Newmiller wrote:

But "disappearing" is not what NA is supposed to do normally. Why is it being 
treated that way here?


NA has a different meaning here than in data.

By default, in glm() the argument singular.ok is TRUE, and so estimates are 
provided even when there are singularities, and even though the singularities 
are resolved arbitrarily.

In this model, the columns of the model matrix labelled LifestageL1 and 
TrtTime:LifestageL1 are perfectly collinear -- the second is 12 times the first 
(both have 0s in the same rows and either 1 or 12 in three of the rows) -- and 
thus both can't be estimated simultaneously, but the model can be estimated by 
eliminating one or the other (effectively setting its coefficient to 0), or by 
taking any linear combination of the two regressors (i.e., using any regressor 
with 0s and some other value). The fitted values under the model are invariant 
with respect to this arbitrary choice.

My apologies if I'm stating the obvious and misunderstand your objection.

Best,
John



On July 27, 2022 7:04:20 PM PDT, John Fox  wrote:

Dear Rolf,

The coefficient of TrtTime:LifestageL1 isn't estimable (as you explain) and by 
setting it to NA, glm() effectively removes it from the model. An equivalent 
model is therefore


fit2 <- glm(cbind(Dead,Alive) ~ TrtTime + Lifestage +

+   I((Lifestage == "Egg + L1")*TrtTime) +
+   I((Lifestage == "L1 + L2")*TrtTime) +
+   I((Lifestage == "L3")*TrtTime),
+ family=binomial, data=demoDat)
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred


cbind(coef(fit, complete=FALSE), coef(fit2))

                                   [,1]         [,2]
(Intercept)                 -0.91718302  -0.91718302
TrtTime                      0.88846195   0.88846195
LifestageEgg + L1          -45.36420974 -45.36420974
LifestageL1                 14.27570572  14.27570572
LifestageL1 + L2            -0.30332697  -0.30332697
LifestageL3                 -3.58672631  -3.58672631
TrtTime:LifestageEgg + L1    8.10482459   8.10482459
TrtTime:LifestageL1 + L2     0.05662651   0.05662651
TrtTime:LifestageL3          1.66743472   1.66743472

There is no problem computing fitted values for the model, specified either way. That the 
fitted values when Lifestage == "L1" all round to 1 on the probability scale is 
coincidental -- that is, a consequence of the data.

I hope this helps,
John

On 2022-07-27 8:26 p.m., Rolf Turner wrote:


I have a data frame with a numeric ("TrtTime") and a categorical
("Lifestage") predictor.

Level "L1" of Lifestage occurs only with a single value of TrtTime,
explicitly 12, whence it is not possible to estimate a TrtTime "slope"
when Lifestage is "L1".

Indeed, when I fitted the model

   fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
  data=demoDat)

I got:


as.matrix(coef(fit))
 [,1]
(Intercept)-0.91718302
TrtTime 0.88846195
LifestageEgg + L1 -45.36420974
LifestageL114.27570572
LifestageL1 + L2   -0.30332697
LifestageL3-3.58672631
TrtTime:LifestageEgg + L1   8.10482459
TrtTime:LifestageL1 NA
TrtTime:LifestageL1 + L20.05662651
TrtTime:LifestageL3 1.66743472


That is, TrtTime:LifestageL1 is NA, as expected.

I would have thought that fitted or predicted values corresponding to
Lifestage = "L1" would thereby be NA, but this is not the case:


predict(fit)[demoDat$Lifestage=="L1"]
 26   65  131
24.02007 24.02007 24.02007

fitted(fit)[demoDat$Lifestage=="L1"]
26  65 131
 1   1   1


That is, the predicted values on the scale of the linear predictor are
large and positive, rather than being NA.

What this amounts to, it see

Re: [R] Predicted values from glm() when linear predictor is NA.

2022-07-28 Thread John Fox

Dear Jeff,

On 2022-07-28 1:31 a.m., Jeff Newmiller wrote:

But "disappearing" is not what NA is supposed to do normally. Why is it being 
treated that way here?


NA has a different meaning here than in data.

By default, in glm() the argument singular.ok is TRUE, and so estimates 
are provided even when there are singularities, and even though the 
singularities are resolved arbitrarily.


In this model, the columns of the model matrix labelled LifestageL1 and 
TrtTime:LifestageL1 are perfectly collinear -- the second is 12 times 
the first (both have 0s in the same rows and either 1 or 12 in three of 
the rows) -- and thus both can't be estimated simultaneously, but the 
model can be estimated by eliminating one or the other (effectively 
setting its coefficient to 0), or by taking any linear combination of 
the two regressors (i.e., using any regressor with 0s and some other 
value). The fitted values under the model are invariant with respect to 
this arbitrary choice.
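
[Editor's illustration, not part of the original thread: a minimal sketch with toy data (demoDat itself isn't shown) of the point above -- with two exactly collinear regressors, glm() aliases one coefficient (NA), and the fitted values do not depend on which of the two is dropped.]

```r
x1 <- rep(c(0, 1), each = 5)
x2 <- 12 * x1                        # exactly collinear with x1
y  <- c(0, 1, 0, 1, 1, 1, 0, 1, 1, 0)

f1 <- glm(y ~ x1 + x2, family = binomial)  # x2 is aliased (coef NA)
f2 <- glm(y ~ x2 + x1, family = binomial)  # x1 is aliased instead

all.equal(fitted(f1), fitted(f2))    # TRUE: same column space, same fit
```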


My apologies if I'm stating the obvious and misunderstand your objection.

Best,
 John



On July 27, 2022 7:04:20 PM PDT, John Fox  wrote:

Dear Rolf,

The coefficient of TrtTime:LifestageL1 isn't estimable (as you explain) and by 
setting it to NA, glm() effectively removes it from the model. An equivalent 
model is therefore


fit2 <- glm(cbind(Dead,Alive) ~ TrtTime + Lifestage +

+   I((Lifestage == "Egg + L1")*TrtTime) +
+   I((Lifestage == "L1 + L2")*TrtTime) +
+   I((Lifestage == "L3")*TrtTime),
+ family=binomial, data=demoDat)
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred


cbind(coef(fit, complete=FALSE), coef(fit2))

  [,1] [,2]
(Intercept)-0.91718302  -0.91718302
TrtTime 0.88846195   0.88846195
LifestageEgg + L1 -45.36420974 -45.36420974
LifestageL114.27570572  14.27570572
LifestageL1 + L2   -0.30332697  -0.30332697
LifestageL3-3.58672631  -3.58672631
TrtTime:LifestageEgg + L1   8.10482459   8.10482459
TrtTime:LifestageL1 + L20.05662651   0.05662651
TrtTime:LifestageL3 1.66743472   1.66743472

There is no problem computing fitted values for the model, specified either way. That the 
fitted values when Lifestage == "L1" all round to 1 on the probability scale is 
coincidental -- that is, a consequence of the data.

I hope this helps,
John

On 2022-07-27 8:26 p.m., Rolf Turner wrote:


I have a data frame with a numeric ("TrtTime") and a categorical
("Lifestage") predictor.

Level "L1" of Lifestage occurs only with a single value of TrtTime,
explicitly 12, whence it is not possible to estimate a TrtTime "slope"
when Lifestage is "L1".

Indeed, when I fitted the model

  fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
 data=demoDat)

I got:


as.matrix(coef(fit))
[,1]
(Intercept)-0.91718302
TrtTime 0.88846195
LifestageEgg + L1 -45.36420974
LifestageL114.27570572
LifestageL1 + L2   -0.30332697
LifestageL3-3.58672631
TrtTime:LifestageEgg + L1   8.10482459
TrtTime:LifestageL1 NA
TrtTime:LifestageL1 + L20.05662651
TrtTime:LifestageL3 1.66743472


That is, TrtTime:LifestageL1 is NA, as expected.

I would have thought that fitted or predicted values corresponding to
Lifestage = "L1" would thereby be NA, but this is not the case:


predict(fit)[demoDat$Lifestage=="L1"]
26   65  131
24.02007 24.02007 24.02007

fitted(fit)[demoDat$Lifestage=="L1"]
   26  65 131
1   1   1


That is, the predicted values on the scale of the linear predictor are
large and positive, rather than being NA.

What this amounts to, it seems to me, is saying that if the linear
predictor in a Binomial glm is NA, then "success" is a certainty.
This strikes me as being a dubious proposition.  My gut feeling is that
misleading results could be produced.

Can anyone explain to me a rationale for this behaviour pattern?
Is there some justification for it that I am not currently seeing?
Any other comments?  (Please omit comments to the effect of "You are as
thick as two short planks!". :-) )

I have attached the example data set in a file "demoDat.txt", should
anyone want to experiment with it.  The file was created using dput() so
you should access it (if you wish to do so) via something like

  demoDat <- dget("demoDat.txt")

Thanks for any enlightenment.

cheers,

Rolf Turner


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Predicted values from glm() when linear predictor is NA.

2022-07-27 Thread John Fox

Dear Rolf,

The coefficient of TrtTime:LifestageL1 isn't estimable (as you explain) 
and by setting it to NA, glm() effectively removes it from the model. An 
equivalent model is therefore


> fit2 <- glm(cbind(Dead,Alive) ~ TrtTime + Lifestage +
+   I((Lifestage == "Egg + L1")*TrtTime) +
+   I((Lifestage == "L1 + L2")*TrtTime) +
+   I((Lifestage == "L3")*TrtTime),
+ family=binomial, data=demoDat)
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

> cbind(coef(fit, complete=FALSE), coef(fit2))
  [,1] [,2]
(Intercept)-0.91718302  -0.91718302
TrtTime 0.88846195   0.88846195
LifestageEgg + L1 -45.36420974 -45.36420974
LifestageL114.27570572  14.27570572
LifestageL1 + L2   -0.30332697  -0.30332697
LifestageL3-3.58672631  -3.58672631
TrtTime:LifestageEgg + L1   8.10482459   8.10482459
TrtTime:LifestageL1 + L20.05662651   0.05662651
TrtTime:LifestageL3 1.66743472   1.66743472

There is no problem computing fitted values for the model, specified 
either way. That the fitted values when Lifestage == "L1" all round to 1 
on the probability scale is coincidental -- that is, a consequence of 
the data.


I hope this helps,
 John

On 2022-07-27 8:26 p.m., Rolf Turner wrote:


I have a data frame with a numeric ("TrtTime") and a categorical
("Lifestage") predictor.

Level "L1" of Lifestage occurs only with a single value of TrtTime,
explicitly 12, whence it is not possible to estimate a TrtTime "slope"
when Lifestage is "L1".

Indeed, when I fitted the model

 fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
data=demoDat)

I got:


as.matrix(coef(fit))
   [,1]
(Intercept)-0.91718302
TrtTime 0.88846195
LifestageEgg + L1 -45.36420974
LifestageL114.27570572
LifestageL1 + L2   -0.30332697
LifestageL3-3.58672631
TrtTime:LifestageEgg + L1   8.10482459
TrtTime:LifestageL1 NA
TrtTime:LifestageL1 + L20.05662651
TrtTime:LifestageL3 1.66743472


That is, TrtTime:LifestageL1 is NA, as expected.

I would have thought that fitted or predicted values corresponding to
Lifestage = "L1" would thereby be NA, but this is not the case:


predict(fit)[demoDat$Lifestage=="L1"]
   26   65  131
24.02007 24.02007 24.02007

fitted(fit)[demoDat$Lifestage=="L1"]
  26  65 131
   1   1   1


That is, the predicted values on the scale of the linear predictor are
large and positive, rather than being NA.

What this amounts to, it seems to me, is saying that if the linear
predictor in a Binomial glm is NA, then "success" is a certainty.
This strikes me as being a dubious proposition.  My gut feeling is that
misleading results could be produced.

Can anyone explain to me a rationale for this behaviour pattern?
Is there some justification for it that I am not currently seeing?
Any other comments?  (Please omit comments to the effect of "You are as
thick as two short planks!". :-) )

I have attached the example data set in a file "demoDat.txt", should
anyone want to experiment with it.  The file was created using dput() so
you should access it (if you wish to do so) via something like

 demoDat <- dget("demoDat.txt")

Thanks for any enlightenment.

cheers,

Rolf Turner


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grep

2022-07-10 Thread John Fox

Dear Steven,

Beyond ?regex, the Wikipedia article on regular expressions 
<https://en.wikipedia.org/wiki/Regular_expression> is quite helpful and 
not too long.


I hope this helps,
 John
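
[Editor's side note, not from the thread: the dot in patterns like "z." is a regex metacharacter matching any character, so grep("z.", jj) would also match strings such as "zatest". Anchoring and escaping, as in Jeff's answer, avoids that -- a sketch with hypothetical names:]

```r
jj <- c("z.one", "z.liberal", "x.one", "x.realinc", "mu1_1", "rho", "zatest")

## "^[zx]\\." requires a leading z or x followed by a literal dot
grep("^[zx]\\.", jj, value = TRUE)
# "z.one" "z.liberal" "x.one" "x.realinc" -- "zatest" is excluded,
# whereas the unanchored pattern "z." would have matched it
```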

On 2022-07-10 9:43 p.m., Steven T. Yen wrote:
Thanks Jeff. It works. If there is a good reference I should read 
(besides ? grep) I's be glad to have it.


On 7/11/2022 9:30 AM, Jeff Newmiller wrote:

grep( "^(z|x)\\.", jj, value = TRUE )

or

grep( r"(^(z|x)\.)", jj, value = TRUE )


On July 10, 2022 6:08:45 PM PDT, "Steven T. Yen"  
wrote:
Dear, Below, jj contains character strings starting with “z.” and 
“x.”. I want to grep all that contain either “z.” or “x.”. I had to 
grep “z.” and “x.” separately and then tack the result together. Is 
there a convenient grep option that would grep strings with either 
“z.” or “x.”. Thank you!



jj<-names(v$est); jj
  [1] "z.one" "z.liberal" "z.conserv" "z.dem" "z.rep" 
"z.realinc"
  [7] "x.one" "x.liberal" "x.conserv" "x.dem" "x.rep" 
"x.realinc"

[13] "mu1_1" "mu2_1" "rho"

j1<-grep("z.",jj,value=TRUE); j1

[1] "z.one" "z.liberal" "z.conserv" "z.dem" "z.rep" "z.realinc"

j2<-grep("x.",jj,value=TRUE); j2

[1] "x.one" "x.liberal" "x.conserv" "x.dem" "x.rep" "x.realinc"

j<-c(j1,j2); j
  [1] "z.one" "z.liberal" "z.conserv" "z.dem" "z.rep" 
"z.realinc"
  [7] "x.one" "x.liberal" "x.conserv" "x.dem" "x.rep" 
"x.realinc"


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R for linear algebra

2022-06-25 Thread John Fox

Dear Avi,

On 2022-06-25 2:09 p.m., Avi Gross via R-help wrote:

John,

I am not in any way disparaging the matlib package and it seems well-built for 
the limited purpose of teaching Linear Algebra rather than R. It is probably a 
better answer to a question about how to teach linear algebra while making some 
more complex tasks doable. I recall the frustration of multiplying matrices by 
hand as well as other operations, necessarily on smaller matrices.

My comments were more along the lines of the charter of this group which seems 
far narrower. Yes, people can ask for suggestions for a package that does 
something that may interest them but getting help on any one of thousands of 
such packages here would get overwhelming.
 From what you said though, others looking for a package to use for real 
projects might well beware as it may indeed not be particularly fast or in some 
cases perhaps not as flexible.


As would be clear from the documentation -- e.g., from the Details 
section of ?Inverse: "The method is purely didactic: The identity 
matrix, I, is appended to X, giving [X | I]. Applying Gaussian 
elimination gives [I | X^{-1}], and the portion corresponding to X^{-1} 
is returned."
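
[Editor's sketch, not from the thread: the didactic method just quoted -- append I to X, reduce [X | I] to [I | X^{-1}] -- can be written in a few lines of base R. This is a bare-bones Gauss-Jordan with simple partial pivoting, for illustration only; in practice use solve().]

```r
gjInverse <- function(X) {
  n <- nrow(X)
  A <- cbind(X, diag(n))                    # form [X | I]
  for (j in 1:n) {
    p <- which.max(abs(A[j:n, j])) + j - 1  # partial pivot
    if (p != j) A[c(j, p), ] <- A[c(p, j), ]
    A[j, ] <- A[j, ] / A[j, j]              # scale pivot row to 1
    for (i in seq_len(n)[-j])               # eliminate column j elsewhere
      A[i, ] <- A[i, ] - A[i, j] * A[j, ]
  }
  A[, (n + 1):(2 * n)]                      # right block of [I | X^{-1}]
}

A <- matrix(c( 2,  1, -1,
              -3, -1,  2,
              -2,  1,  2), 3, 3, byrow = TRUE)
gjInverse(A)   # matches the Inverse(A) result shown above
```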


Best,
 John




-Original Message-----
From: John Fox 
To: Avi Gross 
Cc: r-help@r-project.org 
Sent: Sat, Jun 25, 2022 1:34 pm
Subject: Re: [R] R for linear algebra

Dear Avi,

The purpose of the matlib package is to *teach* linear algebra and
related topics, not to replace or even compete with similar
functionality in base R.

Consider, e.g., the following example for matlib::Inverse(), which
computes matrix inverses by Gaussian elimination (I've elided most of
the steps):

  > example("Inverse")

Invers>  A <- matrix(c(2, 1, -1,
Invers+                -3, -1, 2,
Invers+                -2,  1, 2), 3, 3, byrow=TRUE)

Invers>  Inverse(A)
       [,1] [,2] [,3]
[1,]    4    3  -1
[2,]  -2  -2    1
[3,]    5    4  -1

Invers>  Inverse(A, verbose=TRUE, fractions=TRUE)

Initial matrix:
       [,1] [,2] [,3] [,4] [,5] [,6]
[1,]  2    1  -1    1    0    0
[2,] -3  -1    2    0    1    0
[3,] -2    1    2    0    0    1

row: 1

   exchange rows 1 and 2
       [,1] [,2] [,3] [,4] [,5] [,6]
[1,] -3  -1    2    0    1    0
[2,]  2    1  -1    1    0    0
[3,] -2    1    2    0    0    1

   multiply row 1 by -1/3
       [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1  1/3 -2/3    0 -1/3    0
[2,]    2    1  -1    1    0    0
[3,]  -2    1    2    0    0    1

   multiply row 1 by 2 and subtract from row 2
       [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1  1/3 -2/3    0 -1/3    0
[2,]    0  1/3  1/3    1  2/3    0
[3,]  -2    1    2    0    0    1


. . .


   multiply row 3 by 2/5 and subtract from row 2
       [,1] [,2] [,3] [,4] [,5] [,6]
[1,]  1    0    0    4    3  -1
[2,]  0    1    0  -2  -2    1
[3,]  0    0    1    5    4  -1
       [,1] [,2] [,3]
[1,]    4    3  -1
[2,]  -2  -2    1
[3,]    5    4  -1

And similarly for the other functions in the package. Moreover, the
functions in the package are transparently programmed in R rather than
calling (usually more efficient but relatively inaccessible) compiled
code, e.g., in a BLAS.

Best,
   John

On 2022-06-24 9:57 p.m., Avi Gross via R-help wrote:

Yes, Michael, packages like matlib will extend the basic support within base R 
and I was amused at looking at what the package supported that I had not 
thought about in years!
https://www.rdocumentation.org/packages/matlib/versions/0.9.5
However, once you throw in packages/modules/libraries and other add-ons, I 
suggest many languages share such gifts that then allow a serious amount of 
what we call linear Algebra to be done, albeit with some work.
In some ways my review of Linear Algebra in recent years showed me that some 
things have changed from when I took it as part of a math degree in college but 
a big change has been in finding so many uses of it now that computers can do 
complicated things fast and even faster when using vectors and matrices as a 
way to organize and consolidate operations, sometimes even in parallel.
Packages can be nice and especially if they gather together lots of related 
functions to teach a subject but anyone doing serious work should first make 
sure they know what is in the base.
If your matrix is A, you could load a package like psych to get a trace of A:
psych::tr(A)
Or install  matlib with oodles of dependencies

matlib::tr(A)
Or without worrying if the user had installed and made available a library, use 
built-ins like diag() and sum():
sum(diag(A))
And what does  matlib::Det(A) gain you most of the time that det(A) does not 
tell you?
A policy of this forum is to point out mostly what is part of standard R and 
obviously there can  be specialized functionality in packages, albeit some 
functions  in that package are unlikely to be taught in a first undergraduate 
Linear Algebra course. I note matlib is billed as also being for learning 
M

Re: [R] Obtaining the source code

2022-06-19 Thread John Fox

Dear Cristofer,

> stats:::rstandard.lm
function (model, infl = lm.influence(model, do.coef = FALSE),
sd = sqrt(deviance(model)/df.residual(model)), type = c("sd.1",
"predictive"), ...)
{
type <- match.arg(type)
res <- infl$wt.res/switch(type, sd.1 = c(outer(sqrt(1 - infl$hat),
sd)), predictive = 1 - infl$hat)
res[is.infinite(res)] <- NaN
res
}



More generally, use ::: for an object that's hidden in a package namespace.
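
[Editor's addition: two further base-R options for locating hidden S3 methods, shown as a sketch.]

```r
## getAnywhere() searches all loaded namespaces, including
## unexported (hidden) S3 methods:
getAnywhere("rstandard.lm")

## getS3method() works when you know the generic and the class:
getS3method("rstandard", "lm")
```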

I hope this helps,
 John

On 2022-06-19 1:23 p.m., Christofer Bogaso wrote:

Hi,

I am trying to see the source code of rstandard function. I tried below,


methods('rstandard')


[1] rstandard.glm* rstandard.lm*

What do I need to do if I want to see the source code of rstandard.lm*?

Thanks for your help.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rbind of multiple data frames by column name, when each data frames can contain different columns

2022-06-03 Thread John Fox
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Placement of legend in base plot()

2022-05-31 Thread John Fox

Dear Helmut,

I'm not sure why you're seeing an error, but replacing the last several 
commands in your code with


	legend(loc, legend = lgd, x.intersp = 0, title = paste("n =", 
n.pts[1]), bg = bg)


works perfectly fine for me. I suspect that the example you posted 
differs in some respect from the code that produces the error.
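
[Editor's sketch, not from the thread: one way to guarantee that loc has length 1 -- and to resolve the tie mentioned in Problem 1 below -- is which.min(), which always returns a single (first) index. Names mirror the example code; n.pts here is a stand-in.]

```r
## which.min() returns the index of the first minimum, so loc always
## has length 1 and legend(loc, ...) satisfies match.arg():
n.pts <- data.frame(topleft = 3, topright = 3,
                    bottomleft = 9, bottomright = 10)
loc <- names(n.pts)[which.min(unlist(n.pts))]
loc   # "topleft" -- a single keyword, even though two sections tie
```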


I hope this helps,
 John

On 2022-05-31 9:07 a.m., Helmut Schütz wrote:

Dear all,

I try to figure out where to automatically place the legend in a scatter 
plot.
If there is large variability, points may cover the legend. Hence, I 
assess in which section the fewest points are.

Example:

set.seed(27) # for reproducibility
n  <- 25
slope  <- +1
sd <- 10
x  <- 1:n
mean.x <- mean(x)
y  <- slope * x + rnorm(n = n, mean = mean.x, sd = sd)
mean.y <- mean(y)
top    <- which(y >= mean.y)
bottom <- which(y < mean.y)
left   <- which(x <= mean.x)
right  <- which(x > mean.x)
n.pts  <- data.frame("topleft" = sum(top %in% left),
  "topright"    = sum(top %in% right),
  "bottomleft"  = sum(bottom %in% left),
  "bottomright" = sum(bottom %in% right))
loc    <- names(n.pts)[n.pts == min(n.pts)]
if (length(loc) > 1) loc <- loc[1] # arbitrary selection (better 
approaches?)

bg <- "transparent"
lgd    <- paste("Pretty long legend line number #", 1:3)
plot(x, y, type ="n", pch = 19, xlab = "", ylab = "", axes = FALSE, 
frame.plot = TRUE)

abline(h = mean.y, v = mean.x)
mtext(text = paste0("top left: n = ", n.pts[1], ", right: n = ", n.pts[2]),
   side = 3, line = 1)
mtext(text = paste0("bottom left: n = ", n.pts[3], ", right: n = ", 
n.pts[4]),

   side = 1, line = 1)
mtext(text = paste0("bottom: n = ", sum(n.pts[3:4]),
     ", top = ", sum(n.pts[1:2])), side = 2, line = 1)
points(x, y, pch = 19, col = "red", cex = 1.25)
print(n.pts); loc
if (loc == "topleft") legend("topleft", legend = lgd, x.intersp = 0,
  title = paste("n =", n.pts[1]), bg = bg)
if (loc == "topright")    legend("topright", legend = lgd, x.intersp = 0,
  title = paste("n =", n.pts[2]), bg = bg)
if (loc == "bottomleft")  legend("bottomleft", legend = lgd, x.intersp = 0,
  title = paste("n =", n.pts[3]), bg = bg)
if (loc == "bottomright") legend("bottomright", legend = lgd, x.intersp 
= 0,

  title = paste("n =", n.pts[4]), bg = bg)

Unfortunately, one of the keywords in legend() instead of x, y cannot be 
a variable.

Hence, legend(loc, ...) throws an error...
Error in match.arg(x, c("bottomright", "bottom", "bottomleft", "left",  :
'arg' must be of length 1
... and I had to resort to conditionally specifying all 4.

Problems:
1. If there are the same number of points in sections, I select the 
first though another might lead to fewer overlapping points. Is there a 
better approach?
2. I know how to get the width/height of the legend box with (..., plot 
= FALSE) but couldn't figure out how to squeeze it between points where 
enough space might exist.


Best,
Helmut

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Complex Survey Data and EFA

2022-04-30 Thread John Fox

Dear Lybrya,

I don't have personal experience with it, but to do parallel analyses, 
you'd have to simulate data according to the sampling design. That 
shouldn't be too hard but would require custom programming and you may 
be able to adapt existing code, such as that in the psych package.


You already have, in base R and the packages that you reference, what's 
necessary for computing scree plots and tetrachoric correlations.


Here, using an example in ?svyfactanal, is one way to get scree plots 
based on the correlation matrix with either 1's or communalities on the 
main diagonal:


library(survey)
example("factanal")
fa <- factanal(~api99+api00+hsg+meals+ell+emer, data=apipop, factors=2)
RR <- R <- fa$correlation
(u <- fa$uniquenesses)
diag(RR) <- 1 - u

plot(eigen(R, only.values=TRUE)$values, type="b",
 ylab=expression(lambda[i]), main="Scree Plot (correlations)")

plot(eigen(RR, only.values=TRUE)$values, type="b",
 ylab=expression(lambda[i]),
 main="Scree Plot (correlations with communalities)")

Here is an example of computing a polychoric correlation based on an 
example in ?svytable:


example("svytable")
library(polycor)
polychor(tbl)

The example is nonsense in that the levels of stype in the table are out 
of order -- I show it here just to demonstrate how to do the 
computation. As well, stype has three levels, but polychor() computes 
tetrachoric correlations when both variables are binary. I know that you 
want a correlation matrix for several binary variables, but it would be 
simple to compute them in a double for loop.
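
[Editor's sketch of that double for loop, with a hypothetical data frame `dat` of dichotomous factors; polycor::polychor() accepts two factors and returns a tetrachoric correlation when both are binary.]

```r
library(polycor)

## Pairwise polychoric (tetrachoric, for binary items) correlations
## for the columns of a hypothetical data frame `dat`:
p <- ncol(dat)
R <- diag(p)
dimnames(R) <- list(names(dat), names(dat))
for (i in seq_len(p - 1)) {
  for (j in (i + 1):p) {
    R[i, j] <- R[j, i] <- polychor(dat[[i]], dat[[j]])
  }
}
R   # symmetric correlation matrix with 1s on the diagonal
```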


I hope this helps,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2022-04-29 1:31 p.m., Lybrya Kebreab wrote:

Hello,

Thank you for the help already received in conducting EFAs with complex 
samples. I have successfully generated the EFA with svyfactanal. I have been 
unsuccessful in using the survey weighted data to generate the extra bells and 
whistles of EFA such as scree plots and parallel analyses. I noticed there is a 
way to create scree plots in the ggplots (or ggplots2) package, but am 
wondering if the svyfactanal  function (or another function) in the survey 
package can generate these plots and subsequent parallel analyses.

I also have binary variables.  I can generate the EFA with the binary variables 
using the hetcor function within the polycor package--but without the complex 
sampling design. Is there a way to conduct the EFA  with binary items that also 
allows me to apply the design weights?


Thank you kindly,

Lybrya Kebreab
Doctoral Candidate
Education-Mathematics Education Track
School of Teacher Education
College of Community Innovation and Education
University of Central Florida


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Package nlme

2022-04-07 Thread John Fox

Dear Rohan,

Bert Gunter has already made several general useful suggestions.

In addition, why did you make the variable on the left-hand side of the 
model a factor? Shouldn't it be a numeric variable?
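
[Editor's side note on undoing such a conversion, as a generic sketch: as.numeric() applied directly to a factor returns the internal level codes, not the original values.]

```r
x <- factor(c(0.42, 0.57, 0.42))

as.numeric(x)                 # 1 2 1 -- level codes, usually not wanted
as.numeric(as.character(x))   # 0.42 0.57 0.42 -- the original values
```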


I hope this helps,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 4/7/2022 6:30 AM, Rohan Richard via R-help wrote:

Dear Help Desk,

I am trying to perform a non-linear regression (Sigmoid curves) using the R 
package nlme. My field trial is a randomised complete block design (RCBD) with 
3 blocks and I would like to assess the block effect in the model. Do you know 
how I can incorporate the block term in nlme function?

So far I tried the following code and it did not work and I an error error 
message:

# code
minitab$NDVI<-as.factor(minitab$NDVI)
modnlme1 <- nlme(NDVI ~ a + d / (1 + exp(-b * (DegreeDay - m)) ), data = 
minitab,
  random =a + d + b + m~ 1|Block,
  fixed = list(a ~ Lines1, d~Lines1,b ~ Lines1, m ~ Lines1),
  weights = varPower(),
  start=c(b=0.5,c=3,d=0.4, e=700), control = list(msMaxIter = 
200))

#Error message
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
   contrasts can be applied only to factors with 2 or more levels

Could you please kindly help me?

Thank you in advance,

Best wishes,

Rohan

Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows


Rothamsted Research is a company limited by guarantee, registered in England at 
Harpenden, Hertfordshire, AL5 2JQ under the registration number 2393175 and a 
not for profit charity number 802038.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] print only a few lines of an object

2022-03-23 Thread John Fox

Dear Jeff,

On 2022-03-23 3:36 p.m., Jeff Newmiller wrote:

After-thought...

Why not just use head() and tail() like normal R users do?


head() and tail() are reasonable choices if there are many rows, but not 
if there are many columns.
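
[Editor's note: since R 4.0.0, head() and tail() accept a length-2 n for matrices and data frames, which addresses the many-columns case directly -- a sketch with a small stand-in matrix:]

```r
X <- matrix(rnorm(10 * 8), nrow = 10)

## n = c(rows, cols): limit both dimensions at once (R >= 4.0.0)
head(X, c(3, 4))   # first 3 rows, first 4 columns
tail(X, c(3, 4))   # last 3 rows, last 4 columns
```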


My first thought was your previous suggestion to redefine print() 
methods (although I agree with you that this isn't a good idea), but 
though I could get that to work for data frames, I couldn't for matrices.


Adapting my preceding examples using car::brief():

> X <- matrix(rnorm(2*200), nrow = 2)

> library("car")
Loading required package: carData

> print.data.frame <- function(x, ...){ # not recommended!
+   brief(x, ...)
+   invisible(x)
+ }

> as.data.frame(X)
2 x 200 data.frame (19995 rows and 195 columns omitted)
          V1         V2         V3  . . .        V199       V200
         [n]        [n]        [n]              [n]        [n]
1 -1.1810658 -0.6090037  1.0057908  . . .  1.23860428  0.6265465
2 -1.6395909 -0.2828005 -0.6418150  . . .  1.12875894 -0.7594760
3  0.2751099  0.2268473  0.2267713  . . .  0.64305445  1.1951732
. . .
1  1.2744054  1.0170934 -1.0172511  . . . -0.02997537  0.7645707
2 -0.4798590 -1.8248293 -1.4664622  . . . -0.06359483  0.7671203


> print.matrix <- function(x, ...){ # not recommended (and doesn't work)!
+   brief(x, ...)
+   invisible(x)
+ }

> X
              [,1]          [,2]          [,3]          [,4]          [,5]
[1,] -1.181066e+00 -6.090037e-01  1.005791e+00  3.738742e+00 -6.986169e-01
[2,] -1.639591e+00 -2.828005e-01 -6.418150e-01 -7.424275e-01 -1.415092e-01
[3,]  2.751099e-01  2.268473e-01  2.267713e-01 -6.308073e-01  7.042624e-01
[4,] -9.210181e-01 -4.617637e-01  1.523291e+00  4.003071e-01 -2.792705e-01
[5,] -6.047414e-01  1.976075e-01  6.065795e-01 -8.074581e-01 -4.089352e-01


. . . [many lines elided]

            [,196]        [,197]        [,198]        [,199]        [,200]
[1,] -1.453015e+00  1.347678e+00  1.189217e+00  1.238604e+00  0.6265465033
[2,] -1.693822e+00  2.689917e-01 -1.703176e-01  1.128759e+00 -0.7594760299
[3,]  1.260585e-01  6.589839e-01 -7.928987e-01  6.430545e-01  1.1951731814
[4,] -1.890582e+00  7.614779e-01 -5.726204e-01  1.090881e+00  0.9570510645
[5,] -8.667687e-01  5.365750e-01 -2.079445e+00  1.209543e+00 -0.2697400234

 [ reached getOption("max.print") -- omitted 19995 rows ]

So, something more complicated that I don't understand is going on with 
matrices.
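A likely explanation, which is my reading rather than something stated in the thread: a plain matrix carries no "class" attribute, and auto-printing at the prompt only performs S3 dispatch for objects that have one, so a user-defined print.matrix() is simply never reached when you type X:

```r
# A plain matrix has only an *implicit* class -- no "class" attribute --
# so auto-printing at the prompt does not dispatch to print.matrix().
X <- matrix(1:4, nrow = 2)
attr(X, "class")       # NULL: nothing for auto-printing to dispatch on
inherits(X, "matrix")  # TRUE, but only via the implicit class
# An explicit call print(X), by contrast, does dispatch on the implicit
# class, so a user-defined print.matrix() would be used there.
```

The data-frame case works because as.data.frame() attaches a real class attribute, making auto-printing go through the S3 print() generic.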


Best,
 John



On March 23, 2022 12:31:46 PM PDT, Jeff Newmiller  
wrote:

Sure. Re-define the print method for those objects. Can't say I recommend this, 
but it can be done.

On March 23, 2022 11:44:01 AM PDT, Naresh Gurbuxani 
 wrote:

In an R session, when I type the name of an object, R prints the entire object 
(for example, a 2 x 5 data.frame).  Is it possible to change the default 
behavior so that only the first five and last five rows are printed?

Similarly, if the object is a 2 x 200 matrix, the default behavior will be 
to print first five and last five columns, combined with first five and last 
five rows.

Thanks,
Naresh
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.








Re: [R] Cook's distance for least absolute deviation (lad) regressions

2022-03-21 Thread John Fox

Dear Kelly and Jim,

On 2022-03-20 9:40 p.m., Jim Lemon wrote:

Hi Kelly,
Perhaps the best place to look is the "car" package. There is a
somewhat confusing reference in the "cookd" function help page to the
"cooks.distance" function in the "base" package that doesn't seem to
be there.  Whether this is the case or not, I think you can still use
the "cookd" alias.


cookd() in the car package has been defunct for some time.

To address the original question: One can compute Cook's distances for 
*any* regression model by brute-force, omitting each case i in turn and 
computing the Wald F or chisquare test statistic for the "hypothesis" 
that the deleted estimate of the regression coefficients b_{-i} is equal 
to the estimate b for all of the data. In a linear model, D can be 
computed much more efficiently based on the hatvalues, etc., without 
having to refit the model n times, but that's not generally the case, 
unless the model can be linearized (as for a GLM fit by IWLS).


I'm insufficiently familiar with the computational details of LAD 
regression (or quantile regression more generally) to know whether a 
more efficient computation is possible there, but unless the data set is 
very large, in which case it's highly unlikely that influence of 
individual cases is an issue, the brute-force approach should be 
feasible and very easy to program.
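As a concrete illustration of the brute-force recipe John describes, here is a sketch using quantreg::rq() (tau = 0.5 fits LAD) on the built-in stackloss data; the Wald-type statistic and the se = "nid" choice for the coefficient covariance are my assumptions, not from the thread:

```r
# Brute-force case-deletion influence for LAD regression: refit the
# model n times, each time dropping one case, and compare coefficients
# via a Wald-type (generalized Cook's distance) statistic.
library(quantreg)

fit <- rq(stack.loss ~ ., tau = 0.5, data = stackloss)
b   <- coef(fit)
V   <- summary(fit, se = "nid", covariance = TRUE)$cov  # cov of b
n   <- nrow(stackloss)
p   <- length(b)

D <- numeric(n)
for (i in seq_len(n)) {
  b_i  <- coef(rq(stack.loss ~ ., tau = 0.5, data = stackloss[-i, ]))
  d    <- b_i - b
  D[i] <- c(t(d) %*% solve(V, d)) / p  # scaled distance for case i
}
which.max(D)  # index of the most influential case
```

For a large data set the n refits would be slow, but as noted above, influence of single cases is rarely a concern there.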


I hope this helps,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/



Jim

On Mon, Mar 21, 2022 at 11:57 AM Kelly Thompson  wrote:


I'm wanting to calculate Cook's distance for least absolute deviation
(lad) regressions.

Which R packages and functions offer this?

Thanks!

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Problem with data distribution

2022-02-17 Thread John Fox

Dear Nega gupta,

In the last point, I meant to say, "Finally, it's better to post to the 
list in plain-text email, rather than html (as the posting guide 
suggests)." (I accidentally inserted a "not" in this sentence.)


Sorry,
 John

On 2022-02-17 2:21 p.m., John Fox wrote:

Dear Nega gupta,

On 2022-02-17 1:54 p.m., Neha gupta wrote:

Hello everyone

I have a dataset with output variable "bug" having the following 
values (at
the bottom of this email). My advisor asked me to provide data 
distribution

of bugs with 0 values and bugs with more than 0 values.

data = readARFF("synapse.arff")
data2 = readARFF("synapse.arff")
data$bug
library(tidyverse)
data %>%
   filter(bug == 0)
data2 %>%
   filter(bug >= 1)
boxplot(data2$bug, data$bug, range=0)

But both the graphs are exactly the same, how is it possible? Where I am
doing wrong?


As it turns out, you're doing several things wrong.

First, you're not using pipes and filter() correctly. That is, you don't 
do anything with the filtered versions of the data sets. You're 
apparently under the incorrect impression that filtering modifies the 
original data set.
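What the poster presumably intended can be sketched as follows: filter() returns a new data frame, so its result must be assigned to be used (the toy data here stand in for the ARFF file, which I don't have):

```r
# filter() does not modify `data` in place; assign its result.
library(dplyr)
data <- data.frame(bug = c(0, 1, 0, 2, 0, 3))  # toy stand-in
zero_bugs    <- filter(data, bug == 0)
nonzero_bugs <- filter(data, bug >= 1)
nrow(zero_bugs)     # 3
nrow(nonzero_bugs)  # 3
```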


Second, you're greatly complicating a simple problem. You don't need to 
read the data twice and keep two versions of the data set. As well, 
processing the data with pipes and filter() is entirely unnecessary. The 
following code works:


    with(data, boxplot(bug[bug == 0], bug[bug >= 1], range=0))

Third, and most fundamentally, the parallel boxplots you're apparently 
trying to construct don't really make sense. The first "boxplot" is just 
a horizontal line at 0 and so conveys no information. Why not just plot 
the nonzero values if that's what you're interested in?


Fourth, you didn't share your data in a convenient form. I was able to 
reconstruct them via


   bug <- scan()
   0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0
   0 4 1 0
   0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0
   0 0 0 0
   1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0
   7 0 0 1
   0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0
   0 1 0 0
   0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0
   0 0 0 1
   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0

   data <- data.frame(bug)

Finally, it's better not to post to the list in plain-text email, rather 
than html (as the posting guide suggests).


I hope this helps,
  John




data$bug
   [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 
0 0 0

0 4 1 0
  [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 
1 0 0

0 0 0 0
  [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 
0 0 0

7 0 0 1
[118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 
0 0 0

0 1 0 0
[157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 
1 1 0

0 0 0 1
[196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with data distribution

2022-02-17 Thread John Fox

Dear Neha gupta,

I hope that I'm not overstepping my role when I say that googling 
solutions to specific problems is an inefficient way to learn a 
programming language, and will probably waste your time in the long run. 
There are many good introductions to R.


Best,
 John

On 2022-02-17 2:27 p.m., Neha gupta wrote:

Dear John, thanks a lot for the detailed answer.

Yes, I am not an expert in R language and when a problem comes in, I 
google it or post it on these forums. (I have just a little bit 
experience of ML in R).




On Thu, Feb 17, 2022 at 8:21 PM John Fox <j...@mcmaster.ca> wrote:


Dear Nega gupta,

On 2022-02-17 1:54 p.m., Neha gupta wrote:
 > Hello everyone
 >
 > I have a dataset with output variable "bug" having the following
values (at
 > the bottom of this email). My advisor asked me to provide data
distribution
 > of bugs with 0 values and bugs with more than 0 values.
 >
 > data = readARFF("synapse.arff")
 > data2 = readARFF("synapse.arff")
 > data$bug
 > library(tidyverse)
 > data %>%
 >    filter(bug == 0)
 > data2 %>%
 >    filter(bug >= 1)
 > boxplot(data2$bug, data$bug, range=0)
 >
 > But both the graphs are exactly the same, how is it possible?
Where I am
 > doing wrong?

As it turns out, you're doing several things wrong.

First, you're not using pipes and filter() correctly. That is, you
don't
do anything with the filtered versions of the data sets. You're
apparently under the incorrect impression that filtering modifies the
original data set.

Second, you're greatly complicating a simple problem. You don't need to
read the data twice and keep two versions of the data set. As well,
processing the data with pipes and filter() is entirely unnecessary.
The
following code works:

     with(data, boxplot(bug[bug == 0], bug[bug >= 1], range=0))

Third, and most fundamentally, the parallel boxplots you're apparently
trying to construct don't really make sense. The first "boxplot" is
just
a horizontal line at 0 and so conveys no information. Why not just plot
the nonzero values if that's what you're interested in?

Fourth, you didn't share your data in a convenient form. I was able to
reconstruct them via

    bug <- scan()
    0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0
0 0 0
    0 4 1 0
    0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1
1 0 0
    0 0 0 0
    1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0
0 0 0
    7 0 0 1
    0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0
0 0 0
    0 1 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4
1 1 0
    0 0 0 1
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0

    data <- data.frame(bug)

Finally, it's better not to post to the list in plain-text email,
rather
than html (as the posting guide suggests).

I hope this helps,
   John

 >
 >
 > data$bug
 >    [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0
1 0 0 0 0 0
 > 0 4 1 0
 >   [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0
0 1 1 1 0 0
 > 0 0 0 0
 >   [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5
0 0 0 0 0 0
 > 7 0 0 1
 > [118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0
0 0 0 0 0
 > 0 1 0 0
 > [157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1
0 4 1 1 0
 > 0 0 0 1
 > [196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
 >
 >
 > __
 > R-help@r-project.org <mailto:R-help@r-project.org> mailing list
-- To UNSUBSCRIBE and more, see
 > https://stat.ethz.ch/mailman/listinfo/r-help
<https://stat.ethz.ch/mailman/listinfo/r-help>
 > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
<http://www.R-project.org/posting-guide.html>
 > and provide commented, minimal, self-contained, reproducible code.
-- 
John Fox, Professor Emeritus

McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/
<https://socialsciences.mcmaster.ca/jfox/>


--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Constructing confidence interval ellipses with R

2022-01-25 Thread John Fox

Dear Paul,

This looks like a version of the question you asked a couple of weeks 
ago. As I explained then, I'm pretty sure that you want concentration 
(i.e., data) ellipses and not confidence ellipses, which pertain to 
parameters (e.g., regression coefficients). Also, the hand-drawn 
concentration contours in your example graph don't look elliptical, so 
I'm not sure that you really want ellipses, but I'll assume that you do.


Since as far as I can see you didn't share your data, here's a similar 
example using the scatterplot() function in the car package:


library("car")
scatterplot(prestige ~ income | type, data=Prestige, ellipse=TRUE, 
smooth=FALSE, regLine=FALSE)


By default, this draws 50% and 95% concentration ellipses assuming 
bivariate normality in each group, but that and other aspects of the 
graph can be customized -- see ?scatterplot.


I hope this helps,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2022-01-24 4:24 p.m., Paul Bernal wrote:

Dear friends,

I will be sharing a dataset which has the following columns:
1. Scenario
2. Day of Transit date
3. Canal Ampliado
4. Canal Original

Basically, I need to create a scatter plot diagram, with the Canal Ampliado
column values in the x-axis, and the Canal Original column values in the
y-axis, but also, I need to create confidence interval ellipses grouping
the points on the scatterplot, based on the different scenarios.

So I need to have in one graph, the scatterplot of Canal Ampliado vs Canal
Original and then, on the same graph, construct the confidence interval
ellipses.

I will attach an image depicting what I need to accomplish, as well as the
dataset, for your reference.

Any help and/or guidance will be greatly appreciated.

Cheers,
Paul


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] How to create density ellipses with R

2022-01-15 Thread John Fox

Dear Chris,

I took a quick look at your document. You might be interested in 
Friendly, Monette, and Fox, Elliptical Insights: Understanding 
Statistical Methods through Elliptical Geometry, Statistical Science
2013, 28: 1–39, which is available at 
<https://arxiv.org/pdf/1302.4881.pdf>. I probably should cite that paper 
in ?car::ellipse.


Best,
John


On 2022-01-15 12:00 p.m., Chris Evans wrote:

This spurred me on to clarify my own understanding of these ellipses (and of
ellipsoid hulls) and led to:

https://www.psyctc.org/Rblog/posts/2022-01-15-data-ellipses-and-confidence-ellipses/

And I am, decidedly nervously, putting it here in case it's useful to you Paul 
or
to anyone else.  I think I have the basic ideas correct but of course, if any 
proper
statisticians have corrections, I would love to receive them, off list probably
unless the errors are terrible.  I am entirely self-taught as a statistician, 
much
of what I've learned has come from probably over 10, perhaps nearer 20 years on 
this
list.  Thanks to all for all the work to maintain the list and contribute to it.

Chris

----- Original Message -----

From: "John Fox" 
To: "Paul Bernal" 
Cc: "R" 
Sent: Friday, 14 January, 2022 18:53:55
Subject: Re: [R] How to create density ellipses with R



Dear Paul,

On 2022-01-14 1:17 p.m., Paul Bernal wrote:

Dear John and R community friends,

To be a little bit more specific, what I need to accomplish is the
creation of a confidence interval ellipse over a scatterplot at
different percentiles. The confidence interval ellipses should be drawn
over the scatterplot.


I'm not sure what you mean. Confidence ellipses are for regression
coefficients and so are on the scale of the coefficients; data
(concentration) ellipses are for and on the scale of the explanatory
variables. As it turns out, for a linear model, the former is the
rescaled 90 degree rotation of the latter.

Because the scatterplot of the (two) variables has the variables on the
axes, a data ellipse but not a confidence ellipse makes sense (i.e., is
in the proper units). Data ellipses are drawn by car::dataEllipse() and
(as explained by Martin Maechler) cluster::ellipsoidPoints(); confidence
ellipses are drawn by car::confidenceEllipse() and the various methods
of ellipse::ellipse().

I hope this helps,
  John



Any other guidance will be greatly appreciated.

Cheers,

Paul

On Fri, 14 Jan 2022 at 11:27, John Fox (<j...@mcmaster.ca>) wrote:

 Dear Paul,

 As I understand it, the ellipse package is meant for drawing confidence
 ellipses, not density (i.e., data) ellipses. You should be able to use
 ellipse::ellipse() to draw a bivariate-normal density ellipse (assuming
 that's what you want), but you'll have to do some computation first.

 You might find the dataEllipse() function in the car package more
 convenient (again assuming that you want bivariate-normal density
 contours).

 I hope this helps,
    John

 --
 John Fox, Professor Emeritus
 McMaster University
 Hamilton, Ontario, Canada
 web: https://socialsciences.mcmaster.ca/jfox/
 <https://socialsciences.mcmaster.ca/jfox/>

 On 2022-01-14 10:12 a.m., Paul Bernal wrote:
  > Dear R friends,
  >
  > Happy new year to you all. Not quite sure if this is the proper
 place to
  > ask about this, so I apologize if it is not, and if it isn´t,
 maybe you can
  > point me to the right place.
  >
  > I would like to know if there is any R package that allows me to
 produce
  > density ellipses. Searching through the net, I came across a
 package called
  > ellipse, but I'm not sure if this is the one I should use.
  >
  > Any help and/or guidance will be greatly appreciated.
  >
  > Best regards,
  >
  > Paul
  >
  >
  > __
  > R-help@r-project.org <mailto:R-help@r-project.org> mailing list
 -- To UNSUBSCRIBE and more, see
  > https://stat.ethz.ch/mailman/listinfo/r-help
 <https://stat.ethz.ch/mailman/listinfo/r-help>
  > PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 <http://www.R-project.org/posting-guide.html>
  > and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] How to create density ellipses with R

2022-01-14 Thread John Fox

Dear Paul,

On 2022-01-14 1:17 p.m., Paul Bernal wrote:

Dear John and R community friends,

To be a little bit more specific, what I need to accomplish is the 
creation of a confidence interval ellipse over a scatterplot at 
different percentiles. The confidence interval ellipses should be drawn 
over the scatterplot.


I'm not sure what you mean. Confidence ellipses are for regression 
coefficients and so are on the scale of the coefficients; data 
(concentration) ellipses are for and on the scale of the explanatory 
variables. As it turns out, for a linear model, the former is the 
rescaled 90 degree rotation of the latter.


Because the scatterplot of the (two) variables has the variables on the 
axes, a data ellipse but not a confidence ellipse makes sense (i.e., is 
in the proper units). Data ellipses are drawn by car::dataEllipse() and 
(as explained by Martin Maechler) cluster::ellipsoidPoints(); confidence 
ellipses are drawn by car::confidenceEllipse() and the various methods 
of ellipse::ellipse().


I hope this helps,
 John



Any other guidance will be greatly appreciated.

Cheers,

Paul

On Fri, 14 Jan 2022 at 11:27, John Fox (<j...@mcmaster.ca>) wrote:


Dear Paul,

As I understand it, the ellipse package is meant for drawing confidence
ellipses, not density (i.e., data) ellipses. You should be able to use
ellipse::ellipse() to draw a bivariate-normal density ellipse (assuming
that's what you want), but you'll have to do some computation first.

You might find the dataEllipse() function in the car package more
convenient (again assuming that you want bivariate-normal density
contours).

I hope this helps,
   John

-- 
John Fox, Professor Emeritus

McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/
<https://socialsciences.mcmaster.ca/jfox/>

On 2022-01-14 10:12 a.m., Paul Bernal wrote:
 > Dear R friends,
 >
 > Happy new year to you all. Not quite sure if this is the proper
place to
 > ask about this, so I apologize if it is not, and if it isn't,
maybe you can
 > point me to the right place.
 >
 > I would like to know if there is any R package that allows me to
produce
 > density ellipses. Searching through the net, I came across a
package called
 > ellipse, but I'm not sure if this is the one I should use.
 >
 > Any help and/or guidance will be greatly appreciated.
 >
 > Best regards,
 >
 > Paul
 >
 >
 > __
 > R-help@r-project.org <mailto:R-help@r-project.org> mailing list
-- To UNSUBSCRIBE and more, see
 > https://stat.ethz.ch/mailman/listinfo/r-help
<https://stat.ethz.ch/mailman/listinfo/r-help>
 > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
<http://www.R-project.org/posting-guide.html>
 > and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to create density ellipses with R

2022-01-14 Thread John Fox

Dear Paul,

As I understand it, the ellipse package is meant for drawing confidence 
ellipses, not density (i.e., data) ellipses. You should be able to use 
ellipse::ellipse() to draw a bivariate-normal density ellipse (assuming 
that's what you want), but you'll have to do some computation first.


You might find the dataEllipse() function in the car package more 
convenient (again assuming that you want bivariate-normal density contours).
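A minimal sketch of that suggestion, using the Duncan data set that ships with carData (the choice of variables is mine, for illustration only):

```r
# Scatterplot with 50% and 95% bivariate-normal data ellipses
# superimposed, via car::dataEllipse().
library(car)  # loads carData, which provides the Duncan data
with(Duncan, dataEllipse(income, education, levels = c(0.5, 0.95)))
```

The levels argument controls which probability contours are drawn; see ?car::dataEllipse for robust and grouped variants.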


I hope this helps,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2022-01-14 10:12 a.m., Paul Bernal wrote:

Dear R friends,

Happy new year to you all. Not quite sure if this is the proper place to
ask about this, so I apologize if it is not, and if it isn't, maybe you can
point me to the right place.

I would like to know if there is any R package that allows me to produce
density ellipses. Searching through the net, I came across a package called
ellipse, but I'm not sure if this is the one I should use.

Any help and/or guidance will be greatly appreciated.

Best regards,

Paul


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Time for a companion mailing list for R packages?

2022-01-13 Thread John Fox

Dear Avi et al.,

Rather than proliferating R mailing lists, why not just allow questions 
on non-standard packages on the r-help list?


(1) If people don't want to answer these questions, they don't have to.

(2) Users won't necessarily find the new email list and so may post to 
r-help anyway, only to be told that they should have posted to another list.


(3) Many of the questions currently posted to the list concern 
non-standard packages and most of them are answered.


(4) If people prefer other sources of help (as listed on the R website 
"getting help" page) then they are free to use them.


(5) As I read the posting guide, questions about non-standard packages 
aren't actually disallowed; the posting guide suggests, however, that 
the package maintainer be contacted first. But answers can be helpful to 
other users, and so it may be preferable for at least some of these 
questions to be asked on the list.


(6) Finally, the instruction concerning non-standard packages is buried 
near the end of the posting guide, and users, especially new users, may 
not understand what the term "standard packages" means even if they find 
their way to the posting guide.


Best,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2022-01-12 10:27 p.m., Avi Gross via R-help wrote:

Respectfully, this forum gets lots of questions that include non-base R 
components and especially packages in the tidyverse. Like it or not, the 
extended R language is far more useful and interesting for many people and 
especially those who do not wish to constantly reinvent the wheel.

And repeatedly, we get people reminding (and sometimes chiding) others for 
daring to post questions or supply answers on what they see as a pure R list. 
They have a point.

Yes, there are other places (many not being mailing lists like this one) where 
we can direct the questions, but why can't there be an official mailing list 
alongside this one specifically focused on helping or just discussing R issues 
related partially to the use of packages? I don't mean for people making a 
package to share, just users who may be searching for an appropriate package or 
using a common package, especially the ones in the tidyverse that are NOT GOING 
AWAY just because some purists ...

I prefer a diverse set of ways to do things and base R is NOT enough for me, 
nor frankly is R with all packages included, as I find other languages suit my 
needs at times for doing various things. If this group is for purists, fine. 
Can we have another for the rest of us? Live and let live.


-Original Message-
From: Duncan Murdoch 
To: Kai Yang ; R-help Mailing List 
Sent: Wed, Jan 12, 2022 3:22 pm
Subject: Re: [R] how to find the table in R studio

On 12/01/2022 3:07 p.m., Kai Yang via R-help wrote:

Hi all,
I created a function in R. It will generate a table "temp". I can view it in 
R studio, but I cannot find it on the top right window in R studio. Can someone tell me 
how to find it in there? Same thing for f_table.
Thank you,
Kai
library(tidyverse)

f1 <- function(indata , subgrp1){
     subgrp1 <- enquo(subgrp1)
     indata0 <- indata
     temp    <- indata0 %>% select(!!subgrp1) %>% arrange(!!subgrp1) %>%
       group_by(!!subgrp1) %>%
       mutate(numbering =row_number(), max=max(numbering))
     view(temp)
     f_table <- table(temp$Species)
     view(f_table)
}

f1(iris, Species)



Someone is sure to point out that this isn't an RStudio support list,
but your issue is with R, not with RStudio.  You created the table in
f1, but you never returned it.  The variable f_table is local to the
function.  You'd need the following code to do what you want:

f1 <- function(indata , subgrp1){
   subgrp1 <- enquo(subgrp1)
   indata0 <- indata
   temp    <- indata0 %>% select(!!subgrp1) %>% arrange(!!subgrp1) %>%
     group_by(!!subgrp1) %>%
     mutate(numbering =row_number(), max=max(numbering))
   view(temp)
   f_table <- table(temp$Species)
   view(f_table)
   f_table
}

f_table <- f1(iris, Species)

It's not so easy to also make temp available.  You can do it with
assign(), but I think you'd be better off splitting f1 into two
functions, one to create temp, and one to create f_table.
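One way to make both objects available without assign() is to return them together in a named list; this is my sketch of Duncan's suggestion, not code from the thread:

```r
# Return both the intermediate grouped data and the table from f1(),
# instead of relying on assign() or side effects.
library(dplyr)
f1 <- function(indata, subgrp1) {
  subgrp1 <- enquo(subgrp1)
  temp <- indata %>%
    select(!!subgrp1) %>%
    arrange(!!subgrp1) %>%
    group_by(!!subgrp1) %>%
    mutate(numbering = row_number(), max = max(numbering))
  list(temp = temp, f_table = table(pull(temp, !!subgrp1)))
}

res <- f1(iris, Species)
res$f_table  # the frequency table; res$temp holds the grouped data
```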

Duncan Murdoch

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] [EXTERNAL] Re: bug in Windows implementation of nlme::groupedData

2022-01-07 Thread John Fox

Dear Melissa,

Normally, in evaluating a formula an R modeling function follows the 
scoping rules in ?formula; that is,


"A formula object has an associated environment, and this environment 
(rather than the parent environment) is used by model.frame to evaluate 
variables that are not found in the supplied data argument.


"Formulas created with the ~ operator use the environment in which they 
were created. Formulas created with as.formula will use the env argument 
for their environment."
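A minimal illustration of the scoping rule quoted above (my own example): with no data argument, lm() finds the formula's variables in the environment where the formula was created, here inside f().

```r
# Variables absent from `data` are looked up in the formula's
# environment -- the frame of f(), where x and y are defined.
f <- function() {
  x <- 1:10
  y <- 2 * x
  lm(y ~ x)
}
coef(f())  # slope 2, intercept (numerically) 0
```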


So, for example, if the variables in the formula live in the environment 
of the calling function and if the data argument isn't used, then the 
variables should be found. Modifying your example a bit and calling 
lme() works, for example:


--- snip ---

> analyze_this <- function(df) {
+
+   mean.x <- mean(df$age)
+   mean.y <- mean(df$height)
+   sd.x <- sd(df$age)
+   sd.y <- sd(df$height)
+
+   x <- (df$age - mean.x) / sd.x
+   y <- (df$height - mean.y) / sd.y
+   X <- model.matrix(~ x * male * black, data = df)
+   dummyID <- rep(1:2, times=c(floor(nrow(X)/2), ceiling(nrow(X)/2)))
+
+   lme(y ~ X[, -1], random= ~ 1 | dummyID)
+
+   # groupedData(y ~ X[, -1] | dummyID)
+
+ }

> analyze_this(growthIndiana)
Linear mixed-effects model fit by REML
  Data: NULL
  Log-restricted-likelihood: -2546.266
  Fixed: y ~ X[, -1]
        (Intercept)            X[, -1]x         X[, -1]male        X[, -1]black
        -0.16086555          0.73401464          0.26303561         -0.04761425
      X[, -1]x:male      X[, -1]x:black   X[, -1]male:black X[, -1]x:male:black
         0.27517924         -0.10318100          0.21899350          0.03048160


Random effects:
 Formula: ~1 | dummyID
 (Intercept)  Residual
StdDev: 3.688461e-05 0.4462984

Number of Observations: 4123
Number of Groups: 2

--- snip ---

Note that model.matrix() finds x and y in the environment of 
analyze_this(), and male and black in df.


But if you unquote the line groupedData(y ~ X[, -1] | dummyID), the 
function fails:


--- snip ---

> analyze_this(growthIndiana)
 Error in data.frame(y = y, X = X, dummyID = dummyID) :
  object 'X' not found

--- snip ---

This suggests that groupedData() is doing something unusual (which I 
don't have the inclination to figure out).


I'm not sure why one needs to manipulate the model matrix directly like 
this, but I assume that there is some coherent reason or you wouldn't be 
asking. Also isn't the formula for groupedData() supposed to have a 
*single* covariate on the right, like y ~ x | g (where y, x, and g are 
individual variables)?
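A minimal sketch of the scoping-safe pattern implied here (illustrative data of my own; the point is that every variable referenced in the formula lives in the data frame passed to data=, and the right-hand side is a single covariate):

```r
library(nlme)

set.seed(1)
df <- data.frame(
  x = rep(1:10, 2),
  g = factor(rep(1:2, each = 10))
)
df$y <- 0.5 * df$x + rnorm(20)

# All variables are columns of df, so no environment lookup is needed,
# and the formula has a single covariate on the right, as documented.
gd  <- groupedData(y ~ x | g, data = df)
fit <- lme(y ~ x, random = ~ 1 | g, data = gd)
```

Because groupedData() evaluates its formula against the supplied data frame, constructing the columns first sidesteps the "object 'X' not found" problem entirely.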


Best,
 John

On 2022-01-07 5:29 p.m., Key, Melissa wrote:

John,

Thanks for your response.  I agree that the definition of the data frame is 
poor (in my defense it came directly from the demo code, but I should have 
checked it more thoroughly).  The good news is that your comments caused me to 
take a closer look at where X was defined, and I found the reason I wasn't 
getting the same results on my Mac and PC - that error was between keyboard and 
chair.

There is still something funny going on though (at least relative to my 
previous experience with how R searches environments):

If X is defined in the global environment, groupedData can find it there and 
use it.  (This is what I'm used to.)
If X is defined within a function, groupedData cannot find it, even if 
groupedData is called within the same function. (This seems strange to me -- 
usually parent.frame() captures information within the function environment, or 
so I thought.)

My solution at the bottom still works - and unlike groupedData, nlme allows a 
list as input to the data argument (or at least, doesn't check to make sure 
it's a data frame), so I have a working (albeit hacky) solution that actually 
makes more sense to me than using groupedData, but it still seems strange that 
the function cannot find X in its search path.

Thanks again!
Melissa

-----Original Message-----
From: John Fox 
Sent: Friday, January 7, 2022 4:35 PM
To: Key, Melissa 
Cc: r-help@r-project.org
Subject: [EXTERNAL] Re: [R] bug in Windows implementation of nlme::groupedData

Dear Melissa,

It seems strange to me that your code would work on any platform (it doesn't on my Mac) 
because the data frame you create shouldn't contain a matrix named "X" but 
rather columns including those originating from X.
To illustrate:

  > X <- matrix(1:12, 4, 3)
  > colnames(X) <- c("a", "b", "c")
  > X
   a b  c
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12

  > y <- 1:4

  > (D <- data.frame(y, X))
y a b  c
1 1 1 5  9
2 2 2 6 10
3 3 3 7 11
4 4 4 8 12

  > str(D)
'data.frame':   4 obs. of  4 variables:
   $ y: int  1 2 3 4
   $ a: int  1 2 3 4
   $ b: int  5 6 7 8
   $ c: int  9 10 11 12


Re: [R] bug in Windows implementation of nlme::groupedData

2022-01-07 Thread John Fox

Dear Melissa,

It seems strange to me that your code would work on any platform (it 
doesn't on my Mac) because the data frame you create shouldn't contain a 
matrix named "X" but rather columns including those originating from X. 
To illustrate:


> X <- matrix(1:12, 4, 3)
> colnames(X) <- c("a", "b", "c")
> X
 a b  c
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12

> y <- 1:4

> (D <- data.frame(y, X))
  y a b  c
1 1 1 5  9
2 2 2 6 10
3 3 3 7 11
4 4 4 8 12

> str(D)
'data.frame':   4 obs. of  4 variables:
 $ y: int  1 2 3 4
 $ a: int  1 2 3 4
 $ b: int  5 6 7 8
 $ c: int  9 10 11 12

My session info:

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.1

Matrix products: default
LAPACK: 
/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib


locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] nlme_3.1-153 HRW_1.0-5

loaded via a namespace (and not attached):
[1] compiler_4.1.2     tools_4.1.2        KernSmooth_2.23-20 splines_4.1.2     
[5] grid_4.1.2         lattice_0.20-45

I hope this helps,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2022-01-07 11:23 a.m., Key, Melissa wrote:

I am trying to replicate a semi-parametric analysis described in Harezlak, 
Jaroslaw, David Ruppert, and Matt P. Wand. Semiparametric regression with R. 
New York, NY: Springer, 2018. 
(https://link.springer.com/book/10.1007%2F978-1-4939-8853-2).

I can successfully run the analysis, but now I'm trying to move it into my 
workflow, which requires that the analysis be conducted within a function 
(using targets package), and the `groupedData` function now fails with an error 
that it cannot find the `X` matrix (see reprex below).  I've tried the reprex 
on both my personal Mac (where it works??) and on windows machines (where it 
does not) - so the problem is likely specific to Windows computers (yes, this 
seems weird to me too).
All packages have been updated, and I'm running the latest version of R on all 
machines.

Reprex:

library(HRW) # contains example data and ZOSull function
library(nlme)

data(growthIndiana)


analyze_this <- function(df) {

   mean.x <- mean(df$age)
   mean.y <- mean(df$height)
   sd.x <- sd(df$age)
   sd.y <- sd(df$height)

   df$x <- (df$age - mean.x) / sd.x
   df$y <- (df$height - mean.y) / sd.y

   X <- model.matrix(~ x * male * black, data = df)
   dummyID <- rep(1, length(nrow(X)))

   grouped_data <- groupedData(y ~ X[,-1]|rep(1, length = nrow(X)), data = 
data.frame(y = df$y, X, dummyID))

}


# doesn't work on Windows machine, does work on the Mac
analyze_this(growthIndiana)
#> Error in eval(aux[[2]], object): object 'X' not found

# does work

df <- growthIndiana

mean.x <- mean(df$age)
mean.y <- mean(df$height)
sd.x <- sd(df$age)
sd.y <- sd(df$height)

df$x <- (df$age - mean.x) / sd.x
df$y <- (df$height - mean.y) / sd.y

X <- model.matrix(~ x * male * black, data = df)
dummyID <- rep(1, length(nrow(X)))

grouped_data <- groupedData(y ~ X[,-1]|rep(1, length = nrow(X)), data = 
data.frame(y = df$y, X, dummyID))


# attempted work-around.

analyze_this2 <- function(df) {
   num.global.knots = 20
   num.subject.knots = 10

   mean.x <- mean(df$age)
   mean.y <- mean(df$height)
   sd.x <- sd(df$age)
   sd.y <- sd(df$height)

   df$x <- (df$age - mean.x) / sd.x
   df$y <- (df$height - mean.y) / sd.y

   X <- model.matrix(~ x * male * black, data = df)
   dummyID <- rep(1, length(nrow(X)))

   # grouped_data <- groupedData(y ~ X[,-1]|rep(1, length = nrow(X)), data = 
data.frame(y = df$y, X, dummyID))

   global.knots = quantile(unique(df$x), seq(0, 1, length = num.global.knots + 
2)[-c(1, num.global.knots + 2)])
   subject.knots = quantile(unique(df$x), seq(0, 1, length = num.subject.knots 
+ 2)[-c(1, num.subject.knots + 2)])

   Z.global <- ZOSull(df$x, range.x = range(df$x), global.knots)
   Z.group <- df$black * Z.global
   Z.subject <- ZOSull(df$x, range.x = range(df$x), subject.knots)

   Zblock <- list(
 dummyID = pdIdent(~ 0 + Z.global),
 dummyID = pdIdent(~ 0 + Z.group),
 idnum = pdSymm(~ x),
 idnum = pdIdent(~ 0 + Z.subject)
   )

   df$dummyID <- dummyID
   tmp_data <- c(
 df,
 X = list(X),
 Z.global = list(Z.global),
 Z.group = list(Z.global),
 Z.subject = list(Z.subject)
   )

   fit <- lme(y ~ 0 + X,
 data = tmp_data,
 random = Zblock
   )

}

# this works (warning - lme takes awhile to fit)
analyze_this2(growthIndiana)

sessionInfo()
#> R version 4.1.2 (2021-11-01)

Re: [R] How to use ifelse without invoking warnings

2021-10-08 Thread John Fox

Dear Ravi,

On 2021-10-08 8:21 a.m., Ravi Varadhan wrote:
Thank you to Bert, Sarah, and John. I did consider suppressing warnings, 
but I felt that there must be a more principled approach.  While John's 
solution is what I would prefer, I cannot help but wonder why `ifelse' 
was not constructed to avoid this behavior.


The conditional if () else, which works on an individual logical value, 
uses lazy evaluation and so can avoid the problem you encountered. My 
guess is that implementing lazy evaluation for the vectorized ifelse() 
would incur too high a computational overhead for large arguments.


Best,
 John



Thanks & Best regards,
Ravi
----
*From:* John Fox 
*Sent:* Thursday, October 7, 2021 2:00 PM
*To:* Ravi Varadhan 
*Cc:* R-Help 
*Subject:* Re: [R] How to use ifelse without invoking warnings




Dear Ravi,

It's already been suggested that you could disable warnings, but that's
risky in case there's a warning that you didn't anticipate. Here's a
different approach:

  > kk <- k[k >= -1 & k <= n]
  > ans <- numeric(length(k))
  > ans[k > n] <- 1
  > ans[k >= -1 & k <= n] <- pbeta(p, kk + 1, n - kk, lower.tail=FALSE)
  > ans
[1] 0.0 0.006821826 0.254991551 1.0

BTW, I don't think that you mentioned that p = 0.3, but that seems
apparent from the output you showed.

I hope this helps,
   John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/


On 2021-10-07 12:29 p.m., Ravi Varadhan via R-help wrote:

Hi,
I would like to execute the following vectorized calculation:

    ans <- ifelse (k >= -1 & k <= n, pbeta(p, k+1, n-k, lower.tail = FALSE), 
ifelse (k < -1, 0, 1) )

For example:



k <- c(-1.2,-0.5, 1.5, 10.4)
n <- 10
ans <- ifelse (k >= -1 & k <= n, pbeta(p,k+1,n-k,lower.tail=FALSE), ifelse (k < 
-1, 0, 1) )

Warning message:
In pbeta(p, k + 1, n - k, lower.tail = FALSE) : NaNs produced

print(ans)

[1] 0.0 0.006821826 0.254991551 1.0

The answer is correct.  However, I would like to eliminate the annoying 
warnings.  Is there a better way to do this?

Thank you,
Ravi



Re: [R] How to use ifelse without invoking warnings

2021-10-07 Thread John Fox

Dear Ravi,

It's already been suggested that you could disable warnings, but that's 
risky in case there's a warning that you didn't anticipate. Here's a 
different approach:


> kk <- k[k >= -1 & k <= n]
> ans <- numeric(length(k))
> ans[k > n] <- 1
> ans[k >= -1 & k <= n] <- pbeta(p, kk + 1, n - kk, lower.tail=FALSE)
> ans
[1] 0.0 0.006821826 0.254991551 1.0

BTW, I don't think that you mentioned that p = 0.3, but that seems 
apparent from the output you showed.
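Assembled into a self-contained script (with p <- 0.3, the value inferred above; all other inputs are from the original post), John's masked-indexing approach runs warning-free because pbeta() is only ever called on in-range values of k:

```r
p <- 0.3   # inferred from the output in the original post
k <- c(-1.2, -0.5, 1.5, 10.4)
n <- 10

kk  <- k[k >= -1 & k <= n]          # only the in-range values of k
ans <- numeric(length(k))           # entries for k < -1 stay 0
ans[k > n] <- 1
ans[k >= -1 & k <= n] <- pbeta(p, kk + 1, n - kk, lower.tail = FALSE)
ans
# matches the output in the thread:
# [1] 0.000000000 0.006821826 0.254991551 1.000000000
```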


I hope this helps,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-10-07 12:29 p.m., Ravi Varadhan via R-help wrote:

Hi,
I would like to execute the following vectorized calculation:

   ans <- ifelse (k >= -1 & k <= n, pbeta(p, k+1, n-k, lower.tail = FALSE), ifelse 
(k < -1, 0, 1) )

For example:



k <- c(-1.2,-0.5, 1.5, 10.4)
n <- 10
ans <- ifelse (k >= -1 & k <= n, pbeta(p,k+1,n-k,lower.tail=FALSE), ifelse (k < 
-1, 0, 1) )

Warning message:
In pbeta(p, k + 1, n - k, lower.tail = FALSE) : NaNs produced

print(ans)

[1] 0.0 0.006821826 0.254991551 1.0

The answer is correct.  However, I would like to eliminate the annoying 
warnings.  Is there a better way to do this?

Thank you,
Ravi




Re: [R] Error msg trying to load R Commander with an older R edition...

2021-09-13 Thread John Fox

Dear Brian,

On 2021-09-13 9:33 a.m., Brian Lunergan wrote:

Hi folks:

I'm running Linux Mint 19.3 on my machine. Tried to install a more
recent edition of R but I couldn't seem to get it working so I pulled it
off and went with a good, basic install of the edition available through
the software manager. So... I'm running version 3.4.4.

Mucking about with the attempt at a newer edition seems to have left
some excess baggage behind. When I loaded R Commander and attempted to
run it I received the following error message.

Error: package or namespace load failed for ‘car’ in readRDS(pfile):
  cannot read workspace version 3 written by R 3.6.2; need R 3.5.0 or newer
During startup - Warning message:
package ‘Rcmdr’ in options("defaultPackages") was not found

I get a similar message in Rkward when I try to load any more packages.

Is there any solution for this? Any "leftovers" I can track down and
delete? Any assistance would be greatly appreciated.


It's hard to know exactly how many things are wrong here, but one 
problem seems to be that you saved the R workspace in the newer version 
of R, and that the older version is trying to load the saved workspace, 
which is an incompatible format.


The workspace is probably saved in the file .RData in your R home 
directory. If that's the case, then you should see a message to this 
effect when R starts up. I'd begin by simply deleting this file.


Then, if the Rcmdr package fails to load with an error indicating that 
car or another package is missing, I'd try installing the missing 
package(s).


Finally, you might be better off persevering in your attempt to install 
the current version of R rather than the quite old version that you're 
trying get working.


I hope this helps,
 John



Kind regards...




Re: [R] Can't add error bars to existing line graph

2021-06-16 Thread John Fox

Dear Bruno,

There are (at least) two errors here:

(1) I think that you misunderstand how interaction.plot(), and more 
generally R base graphics, work. interaction.plot() doesn't return a 
graphics object, but rather draws on a graphics device as a side effect. 
Of course, interaction.plot() is a function and so it must return 
something -- it invisibly returns NULL.


(2) I assume that you independently computed Scaphmeans and Scaphse, 
although you didn't include the corresponding code in your message. In 
any event, the arrows() function generally takes 4 arguments (x0, y0, 
x1, y1), specifying the x and y coordinates of the endpoints of the 
arrows. It's true that because your "arrows" are intended to be 
vertical, you need not specify x1, which defaults to x0, but the other 3 
arguments are necessary. See ?arrows for details.
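A self-contained sketch of the fix (using a hypothetical subset of Bruno's data for brevity): compute the cell means and standard errors yourself, then call arrows() with explicit x0, y0, x1, y1, remembering that interaction.plot() draws factor levels at x = 1, 2, ...:

```r
# Hypothetical subset of the posted data, just to make this runnable
df <- data.frame(
  EXPERIMENT = factor(rep(c(2021, 1939), each = 6)),
  TEMP       = factor(rep(rep(c(12, 17), each = 3), 2)),
  SCAPH.BPM  = c(82, 58, 78, 100, 70, 83, 158, 148, 152, 300, 302, 291)
)

# Means and standard errors per EXPERIMENT x TEMP cell
means <- tapply(df$SCAPH.BPM, list(df$EXPERIMENT, df$TEMP), mean)
ses   <- tapply(df$SCAPH.BPM, list(df$EXPERIMENT, df$TEMP),
                function(v) sd(v) / sqrt(length(v)))

interaction.plot(df$TEMP, df$EXPERIMENT, df$SCAPH.BPM,
                 ylim = c(0, 320), trace.label = "Year",
                 type = "b", pch = c(19, 17), fixed = TRUE)

# interaction.plot() places the levels of TEMP at x = 1, 2, ...;
# col(means) reproduces those x positions for every trace.
xpos <- col(means)
arrows(xpos, means - ses, xpos, means + ses,
       code = 3, angle = 90, length = 0.1)
```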


I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-06-15 9:02 a.m., Bruno.Salonen wrote:


Hi all,

I'm trying to add error bars to an existing line graph in base R.

The basic line graph comes up just fine, but it does not show my error
bars...

Data frame = readscaphfileNEW
Plot name = SCAPHLINEGRAPHNEW
x axis = TEMP
y axis = SCAPH.BPM
Tracer = Year (SAME AS 'EXPERIMENT)
Scaphmeans = means of SCAPH.BPM
Scaphse = standard error of SCAPH.BPM

Here is the code..

SCAPHLINEGRAPHNEW <- interaction.plot(readscaphfileNEW$TEMP,
readscaphfileNEW$EXPERIMENT, readscaphfileNEW$SCAPH.BPM,
xlab = "Temperature (°C)", ylab = "Scaphognathite Rate (BPM)",
main = "Scaphognathite",
ylim = c(0,300), trace.label = "Year",
type = "b", pch = c(19,17), fixed = TRUE)
arrows(SCAPHLINEGRAPHNEW,Scaphmeans+Scaphse,SCAPHLINEGRAPHNEW,Scaphmeans-Scaphse,code=3,
angle=90, length=0.1)

Why are my error bars not showing? Is the 'arrows' line wrong?

Thanks a million for your help, everybody.

Here is my data set:


readscaphfileNEW

   EXPERIMENT TEMP SCAPH.BPM
1        2021   12        82
2        2021   12        58
3        2021   12        78
4        2021   12        59
5        2021   12        80
6        2021   12       100
7        2021   12        61
8        2021   12       103
9        2021   12        61
10       2021   17       100
11       2021   17        70
12       2021   17        83
13       2021   17        73
14       2021   17       143
15       2021   17       103
16       2021   17        73
17       2021   17       158
18       2021   17        95
19       2021   17        80
20       1939   12       158
21       1939   12       148
22       1939   12       152
23       1939   12       148
24       1939   12       160
25       1939   12       168
26       1939   12       152
27       1939   12       150
28       1939   12       187
29       1939   17       300
30       1939   17       302
31       1939   17       291
32       1939   17       240
33       1939   17       253
34       1939   17       207
35       1939   17       184
36       1939   17       224
37       1939   17       242
38       1939   17       236

Bruno



--
Sent from: https://r.789695.n4.nabble.com/R-help-f789696.html



Re: [R] calculating area of ellipse

2021-05-11 Thread John Fox

Dear Jeff,

I don't think that it would be sensible to claim that it *never* makes 
sense to multiply quantities measured in different units, but rather 
that this would rarely make sense for regression coefficients. James 
might have a justification for finding the area, but it is still, I 
think, reasonable to point out that doing so may be problematic.


With respect to ratios of areas: I apologize if my examples were 
cryptic. Imagine, for example, that the same regression model is fit to 
two groups and joint-confidence ellipse for two coefficients computed 
for each. The ratio of the two areas would reflect the relative 
precision of the estimates in the two groups, which is unaffected by the 
units of measurement of the coefficients. This is also the idea behind 
generalized variance inflation, where the comparison is to a "utopian" 
situation in which the parameters are uncorrelated. For details, see 
help("vif", package="car") and in particular Fox, J. and Monette, G. 
(1992) Generalized collinearity diagnostics. JASA, 87, 178–183.
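The generalized variance-inflation factors mentioned here are easy to see in action. A small illustration (mtcars is just a stand-in model with a multi-df factor term; it has nothing to do with the thread's data, and the car package must be installed):

```r
library(car)

# A linear model with one multi-df term, so vif() reports the
# generalized VIF, its Df, and GVIF^(1/(2*Df)) for each term.
fit <- lm(mpg ~ disp + wt + factor(cyl), data = mtcars)
v <- vif(fit)
v
```

GVIF^(1/(2*Df)) is the quantity comparable across terms with different degrees of freedom, per Fox and Monette (1992).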


Best,
 John


On 2021-05-11 10:48 a.m., Jeff Newmiller wrote:

The area is a product, not a ratio. There are certainly examples out there of 
meaningful products of different units, such as distance * force (work) or 
power * time (work).

If you choose to form a ratio with the area as numerator, you could conceivably 
obtain the numerator with force snd distance and then meaningfully form a ratio 
with time (power). So this asserted requirement as to homogeneous units seems 
inaccurate. But without context I don't know if any of this will aid in 
interpretation of variance for the OP.

On May 11, 2021 7:30:22 AM PDT, John Fox  wrote:

Dear Stephen,

On 2021-05-11 10:20 a.m., Stephen Ellison wrote:

In doing meta-analysis of diagnostic accuracy I produce ellipses of
confidence and prediction intervals in two dimensions.  How can I
calculate the area of the ellipse in ggplot2 or base R?

There are established formulae for ellipse area, but I am curious: in
a 2-d ellipse with different quantities (eg coefficients for salary and
age) represented by the different dimensions, what does 'area' mean?

I answered James's question narrowly, but the point you raise is
correct -- the area isn't directly interpretable unless the
coefficients are measured in the same units.

It still may be possible to compare areas of ellipsoids for, say,
different regressions with the same predictors, as ratios, however,
since these ratios would be unaffected by rescaling the coefficients.
The generalization of this idea to ellipsoids of any dimension is the
basis for the generalized variance-inflation factors computed by the
vif() function in the car package.

Best,
  John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/



S










Re: [R] calculating area of ellipse

2021-05-11 Thread John Fox

Dear Stephen,

On 2021-05-11 10:20 a.m., Stephen Ellison wrote:
>> In doing meta-analysis of diagnostic accuracy I produce ellipses of
>> confidence and prediction intervals in two dimensions.  How can I
>> calculate the area of the ellipse in ggplot2 or base R?
>
> There are established formulae for ellipse area, but I am curious: in
> a 2-d ellipse with different quantities (eg coefficients for salary and
> age) represented by the different dimensions, what does 'area' mean?


I answered James's question narrowly, but the point you raise is correct 
-- the area isn't directly interpretable unless the coefficients are 
measured in the same units.


It still may be possible to compare areas of ellipsoids for, say, 
different regressions with the same predictors, as ratios, however, 
since these ratios would be unaffected by rescaling the coefficients. 
The generalization of this idea to ellipsoids of any dimension is the 
basis for the generalized variance-inflation factors computed by the 
vif() function in the car package.


Best,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

>
> S
>
>


Re: [R] calculating area of ellipse

2021-05-07 Thread John Fox

Dear David and Jim,

As I explained yesterday, a confidence ellipse is based on a quadratic 
form in the inverse of the covariance matrix of the estimated 
coefficients. When the coefficients are uncorrelated, the axes of the 
ellipse are parallel to the parameter axes, and the radii of the ellipse 
are just a constant times the inverses of the standard deviations of the 
coefficients. The constant is typically the square root of twice a 
corresponding quantile (say, 0.95) of an F distribution with 2 numerator 
df, or a quantile of the chi-square distribution with 2 df.


In the more general case, the confidence ellipse is tilted, and the 
radii correspond to the square roots of the eigenvalues of the 
coefficient covariance matrix, again multiplied by a constant. That 
explains the result I gave yesterday based on the determinant of the 
coefficient covariance matrix, which is the product of its eigenvalues.


These results generalize readily to ellipsoids in higher dimensions, and 
to degenerate cases, such as perfectly correlated coefficients.


For more on the statistics of ellipses, see 
<http://euclid.psych.yorku.ca/datavis/papers/ellipses-STS402.pdf>.


Best,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-05-06 10:31 p.m., David Winsemius wrote:


On 5/6/21 6:29 PM, Jim Lemon wrote:

Hi James,
If the result contains the major (a) and minor (b) axes of the
ellipse, it's easy:

area<-pi*a*b



ITYM semi-major and semi-minor axes.






Re: [R] calculating area of ellipse

2021-05-06 Thread John Fox

Dear James,

To mix notation a bit, presumably the (border of the) confidence ellipse 
is of the form (b - beta)' V(b)^-1 (b - beta) = c, where V(b) is the 
covariance matrix of b and c is a constant. Then the area of the ellipse 
is pi*c*sqrt(det(V(b))). It shouldn't be hard to translate that into R 
code.


I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/


On 2021-05-06 7:24 a.m., james meyer wrote:

In doing meta-analysis of diagnostic accuracy I produce ellipses of confidence
and prediction intervals in two dimensions.  How can I calculate the area of
the ellipse in ggplot2 or base R?

thank you
James Meyer



Re: [R] Contrasts in coxph

2021-04-06 Thread John Fox

Dear John,

It's not clear to me exactly what you have in mind, but 
car::linearHypothesis(), multcomp::glht(), and the emmeans package work 
with Cox models. I expect there are functions in other packages that 
will work too.


Here's an example, surely simpler than what you have in mind, but you 
can probably adapt it:


-- snip -

> library("survival")
> library("car")
Loading required package: carData
> mod.allison <- coxph(Surv(week, arrest) ~
+fin + age + race + wexp + mar + paro + prio,
+  data=Rossi)
> mod.allison
Call:
coxph(formula = Surv(week, arrest) ~ fin + age + race + wexp +
mar + paro + prio, data = Rossi)

   coef exp(coef) se(coef)  z   p
finyes -0.37942   0.68426  0.19138 -1.983 0.04742
age-0.05744   0.94418  0.02200 -2.611 0.00903
raceother  -0.31390   0.73059  0.30799 -1.019 0.30812
wexpyes-0.14980   0.86088  0.21222 -0.706 0.48029
marnot married  0.43370   1.54296  0.38187  1.136 0.25606
paroyes-0.08487   0.91863  0.19576 -0.434 0.66461
prio0.09150   1.09581  0.02865  3.194 0.00140

Likelihood ratio test=33.27  on 7 df, p=2.362e-05
n= 432, number of events= 114
>
> linearHypothesis(mod.allison, "finyes")
Linear hypothesis test

Hypothesis:
finyes = 0

Model 1: restricted model
Model 2: Surv(week, arrest) ~ fin + age + race + wexp + mar + paro + prio

  Res.Df Df  Chisq Pr(>Chisq)
1426
2425  1 3.93060.04742 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> library("multcomp")
Loading required package: mvtnorm
Loading required package: TH.data
Loading required package: MASS

Attaching package: ‘TH.data’

The following object is masked from ‘package:MASS’:

geyser

> summary(glht(mod.allison, "finyes=0"))

 Simultaneous Tests for General Linear Hypotheses

Fit: coxph(formula = Surv(week, arrest) ~ fin + age + race + wexp +
mar + paro + prio, data = Rossi)

Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
finyes == 0  -0.3794 0.1914  -1.983   0.0474 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

>
> library(emmeans)
> pairs(emmeans(mod.allison, ~ fin))
 contrast estimate    SE  df z.ratio p.value
 no - yes    0.379 0.191 Inf 1.983   0.0474

Results are averaged over the levels of: race, wexp, mar, paro
Results are given on the log (not the response) scale.

-- snip -

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-04-05 11:28 p.m., Sorkin, John wrote:

I would like to define contrasts on the output of a coxph function. It appears 
that the contrast function from the contrast library does not have a method 
defined that will allow computation of contrasts on a coxph object.

How does one define and evaluate contrasts for a cox model?

Thank you,
John

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric 
Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Plotting adjusted KM curve

2021-04-04 Thread John Fox

On 2021-04-04 10:45 p.m., John Fox wrote:

Dear John,

I think that what you're looking for is

plot(survfit(fit1Cox, newdata=data.frame(age=rep(65, 2),
sex=factor("female", "male"))))


Whoops, that should be

plot(survfit(fit1Cox, newdata=data.frame(age=rep(65, 2),
sex=factor(c("female", "male")))))


John




assuming, of course, that sex is a factor with levels "female" and "male".
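For readers without the poster's data, the same recipe can be tried on the
lung data that ships with the survival package. This is an editorial sketch,
not part of the original reply; the variable coding comment reflects the
lung data's documentation.

```r
# Editorial sketch, not part of the original reply: the same recipe
# applied to the lung data that ships with the survival package,
# drawing covariate-adjusted survival curves for both sexes at age 65.
library(survival)
fit <- coxph(Surv(time, status) ~ age + factor(sex), data = lung)
nd <- data.frame(age = rep(65, 2), sex = 1:2)  # in lung: 1 = male, 2 = female
sf <- survfit(fit, newdata = nd)
plot(sf, lty = 1:2, xlab = "Days", ylab = "Adjusted survival probability")
legend("topright", legend = c("male", "female"), lty = 1:2)
```

One curve is drawn per row of newdata, which is what makes the "men vs.
women at age 65" comparison possible.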

I hope this helps,
John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-04-04 9:34 p.m., Sorkin, John wrote:

Colleagues,
I am using coxph to model survival time. How do I plot an adjusted
Kaplan-Meier plot resulting from coxph? The code I would like to run
would start with:


# run cox model
fit1Cox <- coxph(surv_object ~age+sex,data=mydata)

I have no idea what would follow.

I would like to plot adjusted KM curves for men vs. women at age 65.

Thank you,
John




Re: [R] Plotting adjusted KM curve

2021-04-04 Thread John Fox

Dear John,

I think that what you're looking for is

plot(survfit(fit1Cox, newdata=data.frame(age=rep(65, 2),
sex=factor("female", "male"))))


assuming, of course, that sex is a factor with levels "female" and "male".

I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-04-04 9:34 p.m., Sorkin, John wrote:

Colleagues,
I am using coxph to model survival time. How do I plot an adjusted Kaplan-Meier
plot resulting from coxph? The code I would like to run would start with:

# run cox model
fit1Cox <- coxph(surv_object ~age+sex,data=mydata)

I have no idea what would follow.

I would like to plot adjusted KM curves for men vs. women at age 65.

Thank you,
John




Re: [R] using eigen function in MAP and purr

2021-03-29 Thread John Fox

Dear V. K. Chetty,

Perhaps I'm missing something but why wouldn't you just use a list of 
matrices, as in the following?


-- snip -

> set.seed(123) # for reproducibility

> (Matrices <- lapply(1:3, function(i) matrix(sample(1:50, 4), 2, 2)))

[[1]]
     [,1] [,2]
[1,]   31   14
[2,]   15    3

[[2]]
     [,1] [,2]
[1,]   42   37
[2,]   43   14

[[3]]
     [,1] [,2]
[1,]   25   27
[2,]   26    5

> (Eigenvalues <- lapply(Matrices, function(x) eigen(x, 
only.values=TRUE)$values))


[[1]]
[1] 37.149442 -3.149442

[[2]]
[1]  70.27292 -14.27292

[[3]]
[1]  43.3196 -13.3196

-- snip -
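As an editorial aside (not from John's reply), the same row-wise computation
can be done directly on the poster's column layout with base Map();
purrr::pmap() would work the same way. Columns a, b, c, d are assumed to
hold each 2x2 matrix column-wise, as in the poster's example data.

```r
# Editorial sketch: row-wise eigenvalues from a data frame whose columns
# a, b, c, d store each 2x2 matrix column-wise (poster's layout assumed).
test.dat <- data.frame(ID = c(1, 2), a = c(1, 1), b = c(1, 1),
                       c = c(2, 2), d = c(4, 3))

eigenvalues <- with(test.dat,
  Map(function(a, b, c, d)
        eigen(matrix(c(a, b, c, d), 2, 2), only.values = TRUE)$values,
      a, b, c, d))
eigenvalues[[1]]
```

Scaling this to 3000 rows is no different; Map() (or purrr::pmap()) iterates
over the rows regardless of their number.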

I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-03-29 5:28 p.m., Veerappa Chetty wrote:

I want to use map and purr functions to compute eigen values for 3000
matrices. Each matrix has 2 rows and 2 columns. The following code does not
work.

test.dat<- tibble(ID=c(1,2),a=c(1,1),b=c(1,1),c=c(2,2),d=c(4,3))

test.out<-test.dat %>% nest(-ID) %>% mutate(fit = purrr::map(data,~
function(x) eigen(matrix(x,2,2)), data=.))

This must be a trivial question for current young practitioners. (In my
9th decade, I am having fun using R Markdown and I am trying to continue
my research!) I would greatly appreciate any help.
Thanks.
V.K.Chetty





Re: [R] Error using nls function

2021-03-27 Thread John Fox

Dear David,

I'm afraid that this doesn't make much sense -- that is, I expect that 
you're not doing what you intended.


First, sin(2*pi*t) and cos(2*pi*t) are each invariant:

> sin(2*pi*t)
 [1] -2.449294e-16 -4.898587e-16 -7.347881e-16 -9.797174e-16 -1.224647e-15 -1.469576e-15
 [7] -1.714506e-15 -1.959435e-15 -2.204364e-15 -2.449294e-15 -9.799650e-15 -2.939152e-15

> cos(2*pi*t)
 [1] 1 1 1 1 1 1 1 1 1 1 1 1

Second, as formulated the model is linear in the parameters.
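An editorial sketch, not from John's reply, that makes both points concrete:
with monthly data a period of 12 is presumably intended, and since the model
is then linear in a, b, and c it can be fit with lm() rather than nls().
The period-12 assumption is the editor's, not the original poster's.

```r
# Editorial sketch: sin(2*pi*t) is ~0 and cos(2*pi*t) is 1 at integer t,
# so use period 12 (assumed) and fit the now-linear model with lm().
y <- c(20.91676, 20.65219, 20.39272, 20.58692, 21.64712, 23.30965,
       23.35657, 24.22724, 24.83439, 24.34865, 23.13173, 21.96117)
t <- 1:12
fit <- lm(y ~ sin(2*pi*t/12) + cos(2*pi*t/12))
coef(fit)  # a (intercept), b, c
```

With a full period of evenly spaced points, the intercept equals mean(y),
which is a quick sanity check on the fit.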

I hope this helps,
John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-03-26 8:31 a.m., David E.S. wrote:


I'm trying to fit a harmonic equation to my data, but when I'm applying the
nls function, R gives me the following error:

Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at
initial parameter estimates.

All posts I've seen, related to this error, are of exponential functions,
where a linearization is used to fix this error, but in this case, I'm not
able to solve it in this way. I tried to use other starting points but it
still not working.

y <- c(20.91676, 20.65219, 20.39272, 20.58692, 21.64712, 23.30965, 23.35657,
24.22724, 24.83439, 24.34865, 23.13173, 21.96117)
t <- c(1, 2, 3, 4 , 5 , 6, 7, 8, 9, 10, 11, 12)


# Fitting function

fit <- function(x, a, b, c) {a+b*sin(2*pi*x)+c*cos(2*pi*x)}

res <- nls(y ~ fit(t, a, b, c), data=data.frame(t,y), start = list(a=1,b=0,
c=1))



Can you help me? Thanks!

David



--
Sent from: https://r.789695.n4.nabble.com/R-help-f789696.html



Re: [R] library(hms)

2021-03-17 Thread John Fox

Dear Greg,

As I explained to you in a private email, and as others have told you, 
there is no Install.libraries() command, nor is there an 
install.libraries() command, but there is an install.packages() command.


So install.packages("hms") should work, on a Mac or on any other 
internet-connected computer on which R runs -- as you've also been told 
by others, this is not a Mac-specific issue. Note that the argument to 
install.packages must be quoted. See ?install.packages for details.


I'll also repeat the advice that I gave you privately to learn something 
about R before you try to use it, possibly starting with the "An 
Introduction to R" manual that ships with the standard R distribution.


Best,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-03-17 1:07 p.m., Gregory Coats wrote:

On my MacBook, I do not have, and do not know how to install, library(hms).
Greg Coats


library(hms)

Error in library(hms) : there is no package called ‘hms’

Install.libraries(“hms”)

Error: unexpected input in "Install.libraries(“"






Re: [R] How to plot dates

2021-03-16 Thread John Fox

Dear Greg,

Coordinate plots typically have a horizontal (x) and vertical (y) axis. 
The command


ggplot(myDat, aes(x=datetime, y = datetime)) + geom_point()

works, but I doubt that it produces what you want.

You have only one variable in your data set -- datetime -- so it's not 
obvious what you want to do. If you can't clearly describe the structure 
of the plot you intend to draw, it's doubtful that I or anyone else can 
help you.


Best,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-03-16 2:56 p.m., Gregory Coats via R-help wrote:

I need a plot that shows the date and time that each event started.
This ggplot command was publicly given to me via this R Help Mailing LIst.
But the result of issuing the ggplot command is an Error in FUN message.
ggplot(myDat, aes(x=datetime, y = Y_Var)) + geom_point()
Error in FUN(X[[i]], ...) : object 'Y_Var' not found
Greg Coats


On Mar 16, 2021, at 2:18 PM, John Fox  wrote:

There is no variable named Y_Var in your data set. I suspect that it's intended 
to be a generic specification in the recipe you were apparently given. In fact, 
there appears to be only one variable in myDat and that's datetime. What is it 
that you're trying to do?





Re: [R] How to plot dates

2021-03-16 Thread John Fox

Dear Greg,

There is no variable named Y_Var in your data set. I suspect that it's 
intended to be a generic specification in the recipe you were apparently 
given. In fact, there appears to be only one variable in myDat and 
that's datetime. What is it that you're trying to do?


A more general comment: If I'm correct and you're just following a 
recipe, that's a recipe for problems. You'd probably be more successful 
if you tried to learn how ggplot(), etc., work. My apologies if I'm 
misinterpreting the source of your difficulties.


I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-03-16 12:21 p.m., Gregory Coats via R-help wrote:

Sarah, Thank you. Yes, now as.POSIXct works.
But the ggplot command I was told to use yields an Error message, and there is 
no output plot.
Please help me. Greg

library(ggplot2)
myDat <- read.table(text =

+ "datetime
+ 2021-03-11 10:00:00
+ 2021-03-11 14:17:00
+ 2021-03-12 05:16:46
+ 2021-03-12 09:17:02
+ 2021-03-12 13:31:43
+ 2021-03-12 22:00:32
+ 2021-03-13 09:21:43",
+ sep = ",", header = TRUE)

head(myDat)

  datetime
1 2021-03-11 10:00:00
2 2021-03-11 14:17:00
3 2021-03-12 05:16:46
4 2021-03-12 09:17:02
5 2021-03-12 13:31:43
6 2021-03-12 22:00:32

myDat$datetime <- as.POSIXct(myDat$datetime, tz = "", format ="%Y-%M-%d 
%H:%M:%OS")
ggplot(myDat, aes(x=datetime, y = Y_Var)) + geom_point()

Error in FUN(X[[i]], ...) : object 'Y_Var' not found


On Mar 16, 2021, at 9:36 AM, Sarah Goslee  wrote:

Hi,

It doesn't have anything to do with having a Mac - you have POSIX.

It's because something is wrong with your data import. Looking at the
head() output you provided, it looks like your data file does NOT have
a header, because there's no datetime column, and the column name is
actually X2021.03.11.10.00.0

So you specified a nonexistent column, and got a zero-length answer.

With correct specification, the as.POSIXct function works as expected on Mac:

myDat <- read.table(text =
"datetime
2021-03-11 10:00:00
2021-03-11 14:17:00
2021-03-12 05:16:46
2021-03-12 09:17:02
2021-03-12 13:31:43
2021-03-12 22:00:32
2021-03-13 09:21:43",
sep = ",", header = TRUE)

myDat$datetime <- as.POSIXct(myDat$datetime, tz = "", format =
"%Y-%M-%d %H:%M:%OS")

Sarah
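An editorial aside, not part of the thread: the format string used above
contains "%Y-%M-%d", where %M is minutes; the month specifier is %m, so
"%Y-%m-%d %H:%M:%OS" is almost certainly what was intended (with %M the
unspecified month defaults per strptime's rules and the dates misparse).

```r
# Editorial sketch: %m (month) vs. %M (minutes) in the format string.
x <- as.POSIXct("2021-03-11 10:00:00", format = "%Y-%m-%d %H:%M:%OS")
format(x, "%m")  # month parses correctly as "03"
format(x, "%H")  # hour is "10"
```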

On Tue, Mar 16, 2021 at 9:26 AM Gregory Coats via R-help
 wrote:


My computer is an Apple MacBook. I do not have POSIX.
The command
myDat$datetime <- as.POSIXct(myDat$datetime, tz = "", format = "%Y-%M-%d 
%H:%M:%OS")
yields the error
Error in `$<-.data.frame`(`*tmp*`, datetime, value = numeric(0)) :
  replacement has 0 rows, data has 13
Please advise, How to proceed?
Greg Coats


library(ggplot2)
# Read a txt file on the Desktop, named "myDat.txt"
myDat <- read.delim("~/Desktop/myDat.txt", header = TRUE, sep = ",")
head(myDat)

  X2021.03.11.10.00.00
1  2021-03-11 14:17:00
2  2021-03-12 05:16:46
3  2021-03-12 09:17:02
4  2021-03-12 13:31:43
5  2021-03-12 22:00:32
6  2021-03-13 09:21:43

# convert data to date time object
myDat$datetime <- as.POSIXct(myDat$datetime, tz = "", format = "%Y-%M-%d 
%H:%M:%OS")

Error in `$<-.data.frame`(`*tmp*`, datetime, value = numeric(0)) :
  replacement has 0 rows, data has 13




--
Sarah Goslee (she/her)
http://www.numberwright.com








Re: [R] mpfr function in Rmpfr crashes R

2021-03-07 Thread John Fox

Dear Roger,

This works perfectly fine for me on an apparently similar system, with 
the exceptions that I'm running R 4.0.4, have many fewer packages 
loaded, and am in a slightly different locale:


--- snip 

> Rmpfr::mpfr(pi, 120)
1 'mpfr' number of precision  120   bits
[1] 3.1415926535897931159979634685441851616

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: 
/Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib


locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] Rmpfr_0.8-2 gmp_0.6-2

loaded via a namespace (and not attached):
 [1] compiler_4.0.4    htmltools_0.5.1.1 tools_4.0.4    yaml_2.2.1     rmarkdown_2.6
 [6] knitr_1.31        xfun_0.21         digest_0.6.27  packrat_0.5.0  rlang_0.4.10
[11] evaluate_0.14

--- snip 

You might try updating R or running Rmpfr in a cleaner session.

I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-03-06 7:07 p.m., Roger Bos wrote:

All,

The following code crashes by R on my mac with a message "R session
aborted.  A fatal error occured".

```
library(Rmpfr)
Rmpfr::mpfr(pi, 120)
```

Does anyone have any suggestions?   My session info is below:

R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK:
/Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] datasets  utils stats graphics  grDevices methods   base

other attached packages:
  [1] alphavantager_0.1.2 googlesheets4_0.2.0 googledrive_1.0.1
clipr_0.7.1
  [5] jsonlite_1.7.2  stringi_1.5.3   dtplyr_1.0.1
  data.table_1.13.6
  [9] dplyr_1.0.4 plyr_1.8.6  testthat_3.0.1
  lubridate_1.7.9.2
[13] timeDate_3043.102   sendmailR_1.2-1 rmarkdown_2.6
devtools_2.3.2
[17] usethis_2.0.0   xts_0.12.1  zoo_1.8-8
MASS_7.3-53
[21] fortunes_1.5-4

loaded via a namespace (and not attached):
  [1] tinytex_0.29  tidyselect_1.1.0  xfun_0.20 remotes_2.2.0
   purrr_0.3.4
  [6] gargle_0.5.0  lattice_0.20-41   generics_0.1.0vctrs_0.3.6
   htmltools_0.5.1.1
[11] base64enc_0.1-3   rlang_0.4.10  pkgbuild_1.2.0pillar_1.4.7
  glue_1.4.2
[16] withr_2.4.1   DBI_1.1.1 sessioninfo_1.1.1 lifecycle_0.2.0
   cellranger_1.1.0
[21] evaluate_0.14 memoise_2.0.0 knitr_1.31callr_3.5.1
   fastmap_1.1.0
[26] ps_1.5.0  curl_4.3  Rcpp_1.0.6openssl_1.4.3
   cachem_1.0.1
[31] desc_1.2.0pkgload_1.1.0 fs_1.5.0  askpass_1.1
   digest_0.6.27
[36] processx_3.4.5grid_4.0.3rprojroot_2.0.2   cli_2.3.0
   tools_4.0.3
[41] magrittr_2.0.1tibble_3.0.6  crayon_1.4.0  pkgconfig_2.0.3
   ellipsis_0.3.1
[46] prettyunits_1.1.1 httr_1.4.2assertthat_0.2.1  R6_2.5.0
  compiler_4.0.3
19:05:52  >

Thanks,

Roger



Re: [R] Out from an R package

2021-02-25 Thread John Fox

Dear Goran,

It's not clear from your question what you want to do, but my guess is 
that you simply what a "printout" of your results. The usual way to 
obtain that is via the summary() function. In your case summary(Output).


That's typical of statistical modeling functions in R: They return 
objects, which can be used for further computing, rather than directly 
producing printouts.
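
John's point -- that R modeling functions return objects and summary()
produces the printout -- can be seen with base R's lm() and the built-in
mtcars data. This is a generic editorial illustration, not the poster's
lme4 model.

```r
# Editorial sketch: modeling functions return objects, not printouts.
fit <- lm(mpg ~ wt, data = mtcars)
class(fit)    # "lm" -- an object usable for further computation
coef(fit)     # named vector of estimates
summary(fit)  # the familiar printed report
```

Exactly the same pattern applies to lmer(): summary(Output) prints the
model, and class(Output) reveals the kind of object returned.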


If my guess is correct, then you probably should learn more about 
statistical modeling in R, and about R in general, before using it in 
your work.


One more thing: I doubt whether the command

Output <- lmer(G10ln ~ v191_ms + (1 | couno), data = 'G10R')

actually works. The data argument should be a data frame, not the *name* 
of a data frame, i.e., data = G10R .


I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/


On 2021-02-25 10:24 a.m., Göran Djurfeldt wrote:

Help! I am going crazy for a very simple reason. I can’t access the output from 
for instance the lme4 package in R. I have been able to import an SPSS file 
into an R data frame. I have downloaded and installed the Lme4 package and I 
think I have also learnt how to produce a mixed model with lmer:

Output <- lmer(G10ln ~ v191_ms + (1 | couno), data = 'G10R')

How shall I define the output from lmer? What kind of object is it? How do I 
define it?

Goran



Re: [R] Different results on running Wilcoxon Rank Sum test in R and SPSS

2021-01-20 Thread John Fox

Dear Bharat Rawlley,

On 2021-01-20 1:45 p.m., bharat rawlley via R-help wrote:

  Dear Professor John,
Thank you very much for your reply!
I agree with you that the non-parametric tests I mentioned in my previous email 
(Moods median test and Median test) do not make sense in this situation as they 
treat PFD_n and drug_code as different groups. As you correctly said, I want to 
use PFD_n as a vector of scores and drug_code to make two groups out of it. 
This is exactly what the Independent samples median test does in SPSS. I wish 
to perform the same test in R and am unable to do so.
Simply put, I am asking how to perform the Independent samples median test in R 
just like it is performed in SPSS?


I'm afraid that I'm the wrong person to ask, since I haven't used SPSS 
in perhaps 30 years and have no idea what it does to test for 
differences in medians. A Google search for "independent samples median 
test in R" turns up a number of hits.
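
As an editorial aside, one common hand-rolled version of an
independent-samples (Mood's) median test classifies each score as above
the grand median or not and applies a chi-square test to the resulting
2x2 table. This sketch uses toy data and is not from the thread; SPSS's
implementation may differ in details (e.g., continuity correction,
handling of ties at the median).

```r
# Editorial sketch of a median test: above-grand-median indicator
# cross-tabulated with group, then chisq.test(). Toy data assumed.
median_test <- function(scores, groups) {
  above <- scores > median(scores, na.rm = TRUE)
  chisq.test(table(groups, above))
}
set.seed(1)
res <- median_test(c(rnorm(20), rnorm(20, 2)), rep(c("a", "b"), each = 20))
res$p.value
```

With many ties at the median (as in the PFD_n data), how ties are assigned
materially affects the result, which is one reason packages disagree.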




Secondly, for the question you are asking about the test statistic, I have not 
performed the Wilcoxon Rank sum test in SPSS for the PFD_n and drug_code data. 
I have said something to the contrary in my first email, I apologize for that.


For continuous data, the Wilcoxon test is, I believe, a reasonable 
choice, but not when there are so many ties. If SPSS doesn't perform a 
Wilcoxon test for a difference in medians, then there's of course no 
reason to expect that the p-values would be the same.


Best,
 John


Thank you very much for your time!
Yours sincerely,
Bharat Rawlley

On Wednesday, 20 January, 2021, 04:47:21 am IST, John Fox  wrote:
  
  Dear Bharat Rawlley,


What you tried to do appears to be nonsense. That is, you're treating
PFD_n and drug_code as if they were scores for two different groups.

I assume that what you really want to do is to treat PFD_n as a vector
of scores and drug_code as defining two groups. If that's correct, and
with your data into Data, you can try the following:

--snip --

  > wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE)

     Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
   -2.14e+00  5.037654e-05
sample estimates:
difference in location
               -1.19

Warning messages:
1: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
   cannot compute exact p-value with ties
2: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
   cannot compute exact confidence intervals with ties

--snip --

You can get an approximate confidence interval by specifying exact=FALSE:

--snip --

  > wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE, exact=FALSE)

     Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
   -2.14e+00  5.037654e-05
sample estimates:
difference in location
               -1.19

--snip --

As it turns out, your data are highly discrete and have a lot of ties
(see in particular PFD_n = 28):

--snip --

  > xtabs(~ PFD_n + drug_code, data=Data)

       drug_code
PFD_n  0  1
     0  2  0
     16  1  1
     18  0  1
     19  0  1
     20  2  0
     22  0  1
     24  2  0
     25  1  2
     26  5  2
     27  4  2
     28  5 13
     30  1  2

--snip --

I'm no expert in nonparametric inference, but I doubt whether the
approximate p-value will be very accurate for data like these.

I don't know why wilcox.test() (correctly used) and SPSS are giving you
slightly different results -- assuming that you're actually doing the
same thing in both cases. I couldn't help but notice that most of your
data are missing. Are you getting the same value of the test statistic
and different p-values, or is the test statistic different as well?

I hope this helps,
   John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-01-19 5:46 a.m., bharat rawlley via R-help wrote:

   Thank you for the reply and suggestion, Michael!
I used dput() and this is the output I can share with you. Simply explained, I 
have 3 columns namely, drug_code, freq4w_n and PFD_n. Each column has 132 
values (including NA). The problem with the Wilcoxon Rank Sum test has been 
described in my first email.
Please do let me know if you need any further clarification from my side! 
Thanks a lot for your time!
structure(list(drug_code = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 
0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 
1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 

Re: [R] Different results on running Wilcoxon Rank Sum test in R and SPSS

2021-01-19 Thread John Fox

Dear Bharat Rawlley,

What you tried to do appears to be nonsense. That is, you're treating 
PFD_n and drug_code as if they were scores for two different groups.


I assume that what you really want to do is to treat PFD_n as a vector 
of scores and drug_code as defining two groups. If that's correct, and 
with your data into Data, you can try the following:


--snip --

> wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE)

Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -2.14e+00  5.037654e-05
sample estimates:
difference in location
 -1.19

Warning messages:
1: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
  cannot compute exact p-value with ties
2: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
  cannot compute exact confidence intervals with ties

--snip --

You can get an approximate confidence interval by specifying exact=FALSE:

--snip --

> wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE, exact=FALSE)

Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -2.14e+00  5.037654e-05
sample estimates:
difference in location
 -1.19

--snip --

As it turns out, your data are highly discrete and have a lot of ties 
(see in particular PFD_n = 28):


--snip --

> xtabs(~ PFD_n + drug_code, data=Data)

 drug_code
PFD_n  0  1
   0   2  0
   16  1  1
   18  0  1
   19  0  1
   20  2  0
   22  0  1
   24  2  0
   25  1  2
   26  5  2
   27  4  2
   28  5 13
   30  1  2

--snip --

I'm no expert in nonparametric inference, but I doubt whether the 
approximate p-value will be very accurate for data like these.


I don't know why wilcox.test() (correctly used) and SPSS are giving you 
slightly different results -- assuming that you're actually doing the 
same thing in both cases. I couldn't help but notice that most of your 
data are missing. Are you getting the same value of the test statistic 
and different p-values, or is the test statistic different as well?


I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-01-19 5:46 a.m., bharat rawlley via R-help wrote:

  Thank you for the reply and suggestion, Michael!
I used dput() and this is the output I can share with you. Simply explained, I 
have 3 columns namely, drug_code, freq4w_n and PFD_n. Each column has 132 
values (including NA). The problem with the Wilcoxon Rank Sum test has been 
described in my first email.
Please do let me know if you need any further clarification from my side! 
Thanks a lot for your time!
structure(list(drug_code = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 
0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 
1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 
1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0), freq4w_n 
= c(1, NA, NA, 0, NA, 4, NA, 10, NA, 0, 6, NA, NA, NA, NA, NA, 10, NA, 0, NA, NA, NA, NA, 0, NA, 0, NA, NA, 
NA, 0, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 12, 0, NA, 1, 2, 1, 2, 2, NA, 28, 0, NA, 4, NA, 1, NA, 
NA, NA, NA, NA, 0, 3, 1, NA, NA, NA, NA, 4, 28, NA, NA, 0, 2, 12, 0, NA, NA, NA, 0, NA, 0, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 3, NA, NA, NA, NA, NA, NA, 6, 1, NA, NA, NA, 0, NA, NA, NA, 0, 0, NA, 0, NA, 2, 8, 3, NA, 
NA, NA, 0, NA, NA, NA, 9, NA, NA, NA, NA, NA, NA, NA, NA), PFD_n = c(27, NA, NA, 28, NA, 26, NA, 20, NA, 30, 
24, NA, NA, NA, NA, NA, 18, NA, 28, NA, NA, NA, NA, 28, NA, 28, NA, NA, NA, 28, NA, 28, NA, NA, NA, NA, NA, 
NA, NA, NA, 28, 28, 16, 28, NA, 27, 26, 27, 26, 26, NA, 0, 30, NA, 24, NA, 27, NA, NA, NA, NA, NA, 28, 25, 
27, NA, NA, NA, NA, 26, 0, NA, NA, 28, 26, 16, 28, NA, NA, NA, 28, NA, 28, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 25, NA, NA, NA, NA, NA, NA, 22, 27, NA, NA, NA, 28, NA, NA, NA, 28, 28, NA, 28, NA, 26, 20, 25, NA, NA, 
NA, 30, NA, NA, NA, 19, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -132L), class = 
c("tbl_df", "tbl", "data.frame"))

Yours sincerely,
Bharat Rawlley

On Tuesday, 19 January, 2021, 03:53:27 pm IST, Michael Dewey  wrote:
  
  Unfortunately your data did not come through. Try using dput() and then

pasting that into the body of your e-mail message.

On 18/01/2021 17:26, bharat rawlley via R-help wrote:

Hello,
On running the Wilcoxon Rank Sum test in R and SPSS, I am getting the following 
discrepancies which I am un

Re: [R] Troubles installing Rcmdr on Mac

2021-01-12 Thread John Fox

Dear Eberhard,

On 2021-01-12 9:41 a.m., Dr Eberhard W Lisse wrote:

John,

maybe I misunderestimate the students :-)-O but there is not much
sophistication required to follow simple instructions while thinking
about what one is doing when doing so.  At least that was my generation
of students did.

If they can install XQuartz, they can install the command line tools.


In my experience, mostly with social-science undergraduates and 
graduate students, the more steps an installation requires, the more 
likely students will encounter difficulties. Your students may well have 
a different level of computer sophistication than mine did, but I know 
that my experience isn't unique.




And, it's not about RCmdr but about the source packages that you may
want to have to install for which the command line tools are required.


Right. But all of the packages on which the Rcmdr package depends have 
Mac binaries. As I confirmed yesterday, one can install the Rcmdr and 
its dependencies on macOS without building any packages from source.




Finally, RStudio is so much easier and more powerful, that I wonder why
one is bothering with this including XQuartz.


Because I think you underestimate the obstacle that working at the 
command line presents to students who often are already struggling to 
learn basic statistical concepts. In my, and others', experience, it's 
easier for students at this level to work with a statistical GUI than to 
write commands. While working in RStudio or another IDE is undoubtedly 
more powerful, for them it certainly isn't easier. Your teaching 
experiences may be different.


Best,
 John




greetings, el 


On 12/01/2021 16:19, John Fox wrote:

Dear Eberhard,

On 2021-01-12 12:32 a.m., Dr Eberhard W Lisse wrote:

John,

what is wrong with installing Xcode’s command line tools (not Xcode
itself)?


Nothing, and I did miss the distinction, but it shouldn't be
necessary, and the instructions for installing the Rcmdr are already
more complicated on macOS than on other platforms because of the
necessity to install XQuartz.  Users should be able to install the
Rcmdr package on macOS without having to install packages from source.

Remember that Rcmdr users are typically students in basic statistics
courses, many of whom have limited software sophistication.
Unnecessarily complicating the installation is undesirable.  Of
course, if it's necessary to complicate the installation, one has to
live with that.

I'll be interested to learn whether my suggestions solve the problem.
If not, I can add an instruction concerning the Xcode tools to the
Rcmdr installation notes for macOS.

Thanks for your help,
  John

[...]



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Troubles installing Rcmdr on Mac

2021-01-12 Thread John Fox

Dear Stephane,

On 2021-01-12 1:48 a.m., CHAMPELY STEPHANE wrote:

Dear John,
thank you for these responses, we will try this... today. We carefully read the 
installation notes, but it is sometimes difficult to really check what was done 
by the students because in France, our lessons are online lessons (covid-19...)


Yes, the pandemic has made teaching very difficult. As a general matter, 
it's been my experience that almost all Rcmdr installation problems are 
on macOS. It's usually easy to help students get going in person, much 
less so remotely.


Please let me know whether your students solve their problems, and if 
so, how, so that I can update the Rcmdr installation notes, if necessary.


Also please keep the conversation on r-help so that others are able to 
follow it.


Best,
 John


All the best,
Stéphane CHAMPELY
Maître de conférences
UFR STAPS , Laboratoire L-ViS, Université Lyon 1



De : John Fox 
Envoyé : mardi 12 janvier 2021 03:30
À : CHAMPELY STEPHANE
Cc : r-help@r-project.org; Dr Eberhard W Lisse
Objet : Re: [R] Troubles installing Rcmdr on Mac

Dear Stephane,

I've taken yet another look at this and have an additional suggestion
for your students to try:

 install.packages("Rcmdr", type="mac.binary")

That should avoid any attempt to install Rcmdr package dependencies from
source.

I hope this helps,
   John

On 2021-01-11 3:53 p.m., John Fox wrote:

Dear Stephane and Eberhard,

As an addendum to my previous response, I uninstalled the Rcmdr package
and all of its direct and indirect dependencies and then reinstalled the
package -- on a macOS 11.1 system running R 4.0.3 with all other
packages up-to-date.

I then reinstalled the Rcmdr and dependencies via the command
install.packages("Rcmdr"), and responded "no" when asked whether to
install some packages from source (perhaps this is the explanation for
the problem, if your students responded "yes" without having Xcode
installed).

Following these steps, everything (still) works fine. I therefore can't
duplicate your students' problem, which makes it hard to suggest how to
fix it, without having some additional details.

Best,
   John


On 2021-01-11 3:33 p.m., John Fox wrote:

Dear Stephane and Eberhard,

It should not be necessary to install Xcode (which includes otools) to
install and use the Rcmdr package on macOS because it shouldn't be
necessary to install the CRAN packages required from source. I'm
currently running the Rcmdr on two macOS 11.1 systems, with all CRAN
packages up-to-date, and don't have any problems.

Stephane, have you and your students checked the Rcmdr installation
notes (at
<https://socialsciences.mcmaster.ca/jfox/Misc/Rcmdr/installation-notes.html>)
and followed the instructions there? If you have, and still experience
this problem, it would help to have some more information about what
they did to install the Rcmdr and what happened.

In the meantime, I'll try a fresh install of the Rcmdr and
dependencies to see whether I encounter any difficulties.

Best,
   John







Re: [R] Troubles installing Rcmdr on Mac

2021-01-12 Thread John Fox

Dear Eberhard,

On 2021-01-12 12:32 a.m., Dr Eberhard W Lisse wrote:

John,

what is wrong with installing Xcode’s command line tools (not Xcode itself)?


Nothing, and I did miss the distinction, but it shouldn't be necessary, 
and the instructions for installing the Rcmdr are already more 
complicated on macOS than on other platforms because of the necessity to 
install XQuartz. Users should be able to install the Rcmdr package on 
macOS without having to install packages from source.


Remember that Rcmdr users are typically students in basic statistics 
courses, many of whom have limited software sophistication. 
Unnecessarily complicating the installation is undesirable. Of course, 
if it's necessary to complicate the installation, one has to live with that.


I'll be interested to learn whether my suggestions solve the problem. If 
not, I can add an instruction concerning the Xcode tools to the Rcmdr 
installation notes for macOS.


Thanks for your help,
 John




—
Sent from Dr Lisse’s iPhone
On 12 Jan 2021, 04:30 +0200, John Fox , wrote:

Dear Stephane,

I've taken yet another look at this and have an additional suggestion
for your students to try:

install.packages("Rcmdr", type="mac.binary")

That should avoid any attempt to install Rcmdr package dependencies from
source.

I hope this helps,
John

On 2021-01-11 3:53 p.m., John Fox wrote:

Dear Stephane and Eberhard,

As an addendum to my previous response, I uninstalled the Rcmdr package
and all of its direct and indirect dependencies and then reinstalled the
package -- on a macOS 11.1 system running R 4.0.3 with all other
packages up-to-date.

I then reinstalled the Rcmdr and dependencies via the command
install.packages("Rcmdr"), and responded "no" when asked whether to
install some packages from source (perhaps this is the explanation for
the problem, if your students responded "yes" without having Xcode
installed).

Following these steps, everything (still) works fine. I therefore can't
duplicate your students' problem, which makes it hard to suggest how to
fix it, without having some additional details.

Best,
  John


On 2021-01-11 3:33 p.m., John Fox wrote:

Dear Stephane and Eberhard,

It should not be necessary to install Xcode (which includes otools) to
install and use the Rcmdr package on macOS because it shouldn't be
necessary to install the CRAN packages required from source. I'm
currently running the Rcmdr on two macOS 11.1 systems, with all CRAN
packages up-to-date, and don't have any problems.

Stephane, have you and your students checked the Rcmdr installation
notes (at
<https://socialsciences.mcmaster.ca/jfox/Misc/Rcmdr/installation-notes.html>)
and followed the instructions there? If you have, and still experience
this problem, it would help to have some more information about what
they did to install the Rcmdr and what happened.

In the meantime, I'll try a fresh install of the Rcmdr and
dependencies to see whether I encounter any difficulties.

Best,
   John










Re: [R] Troubles installing Rcmdr on Mac

2021-01-11 Thread John Fox

Dear Stephane,

I've taken yet another look at this and have an additional suggestion 
for your students to try:


install.packages("Rcmdr", type="mac.binary")

That should avoid any attempt to install Rcmdr package dependencies from 
source.


I hope this helps,
 John

On 2021-01-11 3:53 p.m., John Fox wrote:

Dear Stephane and Eberhard,

As an addendum to my previous response, I uninstalled the Rcmdr package 
and all of its direct and indirect dependencies and then reinstalled the 
package -- on a macOS 11.1 system running R 4.0.3 with all other 
packages up-to-date.


I then reinstalled the Rcmdr and dependencies via the command 
install.packages("Rcmdr"), and responded "no" when asked whether to 
install some packages from source (perhaps this is the explanation for 
the problem, if your students responded "yes" without having Xcode 
installed).


Following these steps, everything (still) works fine. I therefore can't 
duplicate your students' problem, which makes it hard to suggest how to 
fix it, without having some additional details.


Best,
  John


On 2021-01-11 3:33 p.m., John Fox wrote:

Dear Stephane and Eberhard,

It should not be necessary to install Xcode (which includes otools) to 
install and use the Rcmdr package on macOS because it shouldn't be 
necessary to install the CRAN packages required from source. I'm 
currently running the Rcmdr on two macOS 11.1 systems, with all CRAN 
packages up-to-date, and don't have any problems.


Stephane, have you and your students checked the Rcmdr installation 
notes (at 
<https://socialsciences.mcmaster.ca/jfox/Misc/Rcmdr/installation-notes.html>) 
and followed the instructions there? If you have, and still experience 
this problem, it would help to have some more information about what 
they did to install the Rcmdr and what happened.


In the meantime, I'll try a fresh install of the Rcmdr and 
dependencies to see whether I encounter any difficulties.


Best,
  John







Re: [R] Troubles installing Rcmdr on Mac

2021-01-11 Thread John Fox

Dear Stephane and Eberhard,

As an addendum to my previous response, I uninstalled the Rcmdr package 
and all of its direct and indirect dependencies and then reinstalled the 
package -- on a macOS 11.1 system running R 4.0.3 with all other 
packages up-to-date.


I then reinstalled the Rcmdr and dependencies via the command 
install.packages("Rcmdr"), and responded "no" when asked whether to 
install some packages from source (perhaps this is the explanation for 
the problem, if your students responded "yes" without having Xcode 
installed).


Following these steps, everything (still) works fine. I therefore can't 
duplicate your students' problem, which makes it hard to suggest how to 
fix it, without having some additional details.


Best,
 John


On 2021-01-11 3:33 p.m., John Fox wrote:

Dear Stephane and Eberhard,

It should not be necessary to install Xcode (which includes otools) to 
install and use the Rcmdr package on macOS because it shouldn't be 
necessary to install the CRAN packages required from source. I'm 
currently running the Rcmdr on two macOS 11.1 systems, with all CRAN 
packages up-to-date, and don't have any problems.


Stephane, have you and your students checked the Rcmdr installation 
notes (at 
<https://socialsciences.mcmaster.ca/jfox/Misc/Rcmdr/installation-notes.html>) 
and followed the instructions there? If you have, and still experience 
this problem, it would help to have some more information about what 
they did to install the Rcmdr and what happened.


In the meantime, I'll try a fresh install of the Rcmdr and dependencies 
to see whether I encounter any difficulties.


Best,
  John





Re: [R] Troubles installing Rcmdr on Mac

2021-01-11 Thread John Fox

Dear Stephane and Eberhard,

It should not be necessary to install Xcode (which includes otools) to 
install and use the Rcmdr package on macOS because it shouldn't be 
necessary to install the CRAN packages required from source. I'm 
currently running the Rcmdr on two macOS 11.1 systems, with all CRAN 
packages up-to-date, and don't have any problems.


Stephane, have you and your students checked the Rcmdr installation 
notes (at 
<https://socialsciences.mcmaster.ca/jfox/Misc/Rcmdr/installation-notes.html>) 
and followed the instructions there? If you have, and still experience 
this problem, it would help to have some more information about what 
they did to install the Rcmdr and what happened.


In the meantime, I'll try a fresh install of the Rcmdr and dependencies 
to see whether I encounter any difficulties.


Best,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-01-11 1:55 p.m., Dr Eberhard W Lisse wrote:

Use RStudio.

But it can be that the command line tools are missing, which you (may)
need to compile packages (from source). Ask one of them to open a
terminal window and type the command ‘make --version’ (without the
quotes); if that results in an error, they need to enter
‘sudo xcode-select --install’ and then their password when asked. If
that fixes the issue, have all of them do that.
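
Transcribed with plain ASCII dashes (the mail client above turned "--"
into em dashes), the suggested terminal check is the following sketch;
both commands are interactive macOS commands:

```shell
# Check whether the Xcode command line tools (which provide make, clang,
# and otool) are installed:
make --version

# If that produces a "command not found" style error, install the
# command line tools (this prompts for the user's password):
sudo xcode-select --install
```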

el

—
Sent from Dr Lisse’s iPhone
On 11 Jan 2021, 20:39 +0200, CHAMPELY STEPHANE 
, wrote:

Dear colleagues,
For five days I have been trying to help my (French) students install
Rcmdr for Mac, and ALL of them have the same problem (I use Windows, so
I am not very skilled at that task). When they load Rcmdr, a message
says that some supplementary tools (needed in order to use the "otool"
command) are missing, and trying to download them leads to a message
indicating that they are not available at the moment (since Thursday
last week...).
So the menus of the Rcmdr are "white". Any idea where this technical
problem comes from?
Thank you for any help!








Re: [R] [effects] Wrong xlevels in effects plot for mixed effects model when multiline = TRUE

2020-11-11 Thread John Fox

Dear Gerrit,

The bug you reported should now be fixed in the development version 
4.2-1 of the effects package, which you can currently install from 
R-Forge via  install.packages("effects", 
repos="http://R-Forge.R-project.org";) . Eventually, the updated version 
of the effects package will be submitted to CRAN.


Thank you again for the bug report,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2020-11-09 4:51 p.m., Gerrit Eichner wrote:

Dear John,

thank you for prompt reply and your hints. The problem is that our
lmer model is much more complicated and has several interaction
terms:

Mass ~ Sex + I(YoE - 1996) + I(PAI/0.1 - 16) + I(gProt/10 - 6.2) +
     I(Age/10 - 7.2) + I((Age/10 - 7.2)^2) + Diuretics +
     Sex:I(PAI/0.1 - 16) + Sex:I(gProt/10 - 6.2) +
     Sex:I(Age/10 - 7.2) + Sex:I((Age/10 - 7.2)^2) +
     I(YoE - 1996):I(Age/10 - 7.2) + I(PAI/0.1 - 16):I(Age/10 - 7.2) +
     I(gProt/10 - 6.2):I(Age/10 - 7.2) +
     (I(Age/10 - 7.2) + I((Age/10 - 7.2)^2) | ID)

so that allEffects is quite efficient, and since I want to place
several interaction terms with Age in one figure with Age on the
horizontal axis the argument x.var = "Age" in plot would be very
helpful. :-)

Further hints using the above complex model: The following works well:
eff <- Effect(c("gProt", "Age"), m,
   xlevels = list(gProt = 1:6 * 30, Age = 60:100))
plot(eff, lines=list(multiline=TRUE), x.var = "Age")

But this fails (note that Age is missing in xlevels):
eff <- Effect(c("gProt", "Age"), m, xlevels = list(gProt = 1:6 * 30))
plot(eff, lines=list(multiline=TRUE), x.var = "Age")


And that just led me to a solution also for allEffects: Specifying
Age in xlevels for allEffects (although it seems unnecessary when
x.var = "Age" is used in plot) produces the correct graphical
output! :-)

Thank you very much for your support and the brilliant effects
package in general! :-)

  Best regards  --  Gerrit

-
Dr. Gerrit Eichner   Mathematical Institute, Room 212
gerrit.eich...@math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104  Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109    http://www.uni-giessen.de/eichner
-----

Am 09.11.2020 um 19:51 schrieb John Fox:

Dear Gerrit,

This looks like a bug in plot.eff(), which I haven't yet tracked down, 
but the following should give you what you want:


eff <- Effect(c("gProt", "Age"), m, xlevels = list(gProt = 1:6 * 30, 
Age=60:100))

plot(eff, lines=list(multiline=TRUE))

or

eff <- predictorEffect("Age", m, xlevels = list(gProt = 1:6 * 30))
plot(eff, lines=list(multiline=TRUE))

A couple of comments on your code, unrelated to the bug in plot.eff():

You don't need allEffects() because there's only one high-order fixed 
effect in the model, I(gProt/10 - 6.2):I(Age/10 - 7.2) (i.e., the 
interaction of gProt with Age).


x.var isn't intended as an argument for plot() with allEffects() 
because there generally isn't a common horizontal axis for all of the 
high-order effect plots.


Finally, thank you for the bug report. Barring unforeseen 
difficulties, we'll fix the bug in due course.


I hope this helps,
  John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2020-11-09 8:06 a.m., Gerrit Eichner wrote:

Dear list members,

I observe a strange/wrong graphical output when I set the xlevels
in (e. g.) allEffects for an lmer model and plot the effects with
multiline = TRUE. I have compiled a reprex for which you need the
lmer model and the environment in which the model was fitted. They
are contained in the zip file at
https://jlubox.uni-giessen.de/dl/fiSzTCc3bW8z2npZvPpqG1xr/m-and-G1.zip
After unpacking the following should work:

m <- readRDS("m.rds")   # The lmer-model.
G1 <- readRDS("G1.rds") # Environment in which the model
  # was fitted; needed by allEffects.
summary(m) # Just to see the model.

library(effects)
aE <- allEffects(m, xlevels = list(gProt = 1:6 * 30))
  # Non-default values for xlevels.

plot(aE)  # Fine.
plot(aE, x.var = "Age")   # Fine.
plot(aE, lines = list(multiline = TRUE))  # Fine.

plot(aE, lines = list(multiline = TRUE),
   x.var = "Age")    # Nonsense.


Anybody any idea about the reason, my mistake, or a
workaround? Thx for any hint!

   Regards  --  Gerrit


PS:
  > sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Re: [R] [effects] Wrong xlevels in effects plot for mixed effects model when multiline = TRUE

2020-11-09 Thread John Fox

Dear Gerrit,

This looks like a bug in plot.eff(), which I haven't yet tracked down, 
but the following should give you what you want:


eff <- Effect(c("gProt", "Age"), m, xlevels = list(gProt = 1:6 * 30, 
Age=60:100))

plot(eff, lines=list(multiline=TRUE))

or

eff <- predictorEffect("Age", m, xlevels = list(gProt = 1:6 * 30))
plot(eff, lines=list(multiline=TRUE))

A couple of comments on your code, unrelated to the bug in plot.eff():

You don't need allEffects() because there's only one high-order fixed 
effect in the model, I(gProt/10 - 6.2):I(Age/10 - 7.2) (i.e., the 
interaction of gProt with Age).


x.var isn't intended as an argument for plot() with allEffects() because 
there generally isn't a common horizontal axis for all of the high-order 
effect plots.


Finally, thank you for the bug report. Barring unforeseen difficulties, 
we'll fix the bug in due course.


I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2020-11-09 8:06 a.m., Gerrit Eichner wrote:

Dear list members,

I observe a strange/wrong graphical output when I set the xlevels
in (e. g.) allEffects for an lmer model and plot the effects with
multiline = TRUE. I have compiled a reprex for which you need the
lmer model and the environment in which the model was fitted. They
are contained in the zip file at
https://jlubox.uni-giessen.de/dl/fiSzTCc3bW8z2npZvPpqG1xr/m-and-G1.zip
After unpacking the following should work:

m <- readRDS("m.rds")   # The lmer-model.
G1 <- readRDS("G1.rds") # Environment in which the model
  # was fitted; needed by allEffects.
summary(m) # Just to see the model.

library(effects)
aE <- allEffects(m, xlevels = list(gProt = 1:6 * 30))
  # Non-default values for xlevels.

plot(aE)  # Fine.
plot(aE, x.var = "Age")   # Fine.
plot(aE, lines = list(multiline = TRUE))  # Fine.

plot(aE, lines = list(multiline = TRUE),
   x.var = "Age")    # Nonsense.


Anybody any idea about the reason, my mistake, or a
workaround? Thx for any hint!

   Regards  --  Gerrit


PS:
  > sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] effects_4.2-0 carData_3.0-4

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5       lattice_0.20-41  MASS_7.3-53      grid_4.0.2       DBI_1.1.0
 [6] nlme_3.1-149     survey_4.0       estimability_1.3 minqa_1.2.4      nloptr_1.2.2.2
[11] Matrix_1.2-18    boot_1.3-25      splines_4.0.2    statmod_1.4.34   lme4_1.1-23
[16] tools_4.0.2      survival_3.2-3   yaml_2.2.1       compiler_4.0.2   colorspace_1.4-1
[21] mitools_2.4      insight_0.9.5    nnet_7.3-14

-
Dr. Gerrit Eichner   Mathematical Institute, Room 212
gerrit.eich...@math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104  Arndtstr. 2, 35392 Giessen, Germany
http://www.uni-giessen.de/eichner





Re: [R] how to get a numeric vector?

2020-10-04 Thread John Fox

Dear vod vos,

On 2020-10-04 6:47 p.m., vod vos via R-help wrote:

Hi,

a <- c(1, 4)
b <- c(5, 8)

a:b

[1] 1 2 3 4 5
Warning messages:
1: In a:b : numerical expression has 2 elements: only the first used
2: In a:b : numerical expression has 2 elements: only the first used

how to get:

c(1:5, 4:8)


The simplest way is c(1:5, 4:8) but I don't suppose that's what you 
really want. Perhaps the following is what you have in mind:


> unlist(mapply(':', c(1, 4), c(5, 8), SIMPLIFY=FALSE))
 [1] 1 2 3 4 5 4 5 6 7 8

In your case, but not more generally,

> as.vector(mapply(':', c(1, 4), c(5, 8)))
 [1] 1 2 3 4 5 4 5 6 7 8

also works.
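
An equivalent base-R formulation (an editor's note, not part of the
original exchange) uses Map() with seq(): Map() returns a list of the
two sequences, which unlist() then flattens, just as in the mapply()
call above:

```r
## Editor's sketch: same result as unlist(mapply(':', a, b, SIMPLIFY=FALSE)).
## Map(seq, a, b) builds list(seq(1, 5), seq(4, 8)); unlist() flattens it.
a <- c(1, 4)
b <- c(5, 8)
unlist(Map(seq, a, b))
## [1] 1 2 3 4 5 4 5 6 7 8
```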

I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/




Thanks.






Re: [R] formula wrangling

2020-09-21 Thread John Fox

Dear Roger,

This is an interesting puzzle and I started to look at it when your 
second message arrived. I can simplify your code slightly in two places, 
here:


  if (exists("fqssnames")) {
    mff <- m
    ffqss <- paste(fqssnames, collapse = "+")
    mff$formula <- as.formula(paste(deparse(Terms), "+", ffqss))
  }

and here:

  if (length(qssterms) > 0) {
    X <- do.call(cbind,
                 c(list(X),
                   lapply(tmpc$vars, function(u) eval(parse(text = u), mff))))
  }

and the following line is extraneous:

   ef <- environment(formula)

That doesn't amount to much, and I haven't tested my substitute code 
beyond your example.


Best,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2020-09-21 9:40 a.m., Koenker, Roger W wrote:

Here is a revised snippet that seems to work the way that was intended.  
Apologies to anyone
who wasted time looking at the original post.  Of course my interest in simpler 
or more efficient
solutions remains unabated.

if (exists("fqssnames")) {
    mff <- m
    mff$formula <- Terms
    ffqss <- paste(fqssnames, collapse = "+")
    mff$formula <- as.formula(paste(deparse(mff$formula), "+", ffqss))
}
m$formula <- Terms
m <- eval(m, parent.frame())
mff <- eval(mff, parent.frame())
Y <- model.extract(m, "response")
X <- model.matrix(Terms, m)
ef <- environment(formula)
qss <- function(x, lambda) (x^lambda - 1)/lambda
if (length(qssterms) > 0) {
    xss <- lapply(tmpc$vars, function(u) eval(parse(text = u), mff))
    for (i in 1:length(xss)) {
        X <- cbind(X, xss[[i]]) # Here is the problem
    }
}



On Sep 21, 2020, at 9:52 AM, Koenker, Roger W  wrote:

I need some help with a formula processing problem that arose from a seemingly 
innocuous  request
that I add a “subset” argument to the additive modeling function “rqss” in my 
quantreg package.

I’ve tried to boil the relevant code down to something simpler as illustrated 
below.  The formulae in
question involve terms called “qss” that construct sparse matrix objects, but 
I’ve replaced all that with
a much simpler BoxCox construction that I hope illustrates the basic 
difficulty.  What is supposed to happen
is that xss objects are evaluated and cbind’d to the design matrix, subject to 
the same subset restriction
as the rest of the model frame.  However, this doesn’t happen, instead the xss 
vectors are evaluated
on the full sample and the cbind operation generates a warning which probably 
should be an error.
I’ve inserted a browser() to make it easy to verify that the length of 
xss[[1]] doesn’t match dim(X).

Any suggestions would be most welcome, including other simplifications of the 
code.  Note that
the function untangle.specials() is adapted, or perhaps I should say adopted 
form the survival
package so you would need the quantreg package to run the attached code.

Thanks,
Roger



fit <- function(formula, subset, data, ...){
call <- match.call()
m <- match.call(expand.dots = FALSE)
tmp <- c("", "formula", "subset", "data")
m <- m[match(tmp, names(m), nomatch = 0)]
m[[1]] <- as.name("model.frame")
Terms <- if(missing(data)) terms(formula,special = "qss")
else terms(formula, special = "qss", data = data)
qssterms <- attr(Terms, "specials")$qss
if (length(qssterms)) {
tmpc <- untangle.specials(Terms, "qss")
dropx <- tmpc$terms
if (length(dropx))
Terms <- Terms[-dropx]
attr(Terms, "specials") <- tmpc$vars
fnames <- function(x) {
fy <- all.names(x[[2]])
if (fy[1] == "cbind")
fy <- fy[-1]
fy
}
fqssnames <- unlist(lapply(parse(text = tmpc$vars), fnames))
qssnames <- unlist(lapply(parse(text = tmpc$vars), function(x) deparse(x[[2]])))
}
if (exists("fqssnames")) {
ffqss <- paste(fqssnames, collapse = "+")
ff <- as.formula(paste(deparse(formula), "+", ffqss))
}
m$formula <- Terms
m <- eval(m, parent.frame())
Y <- model.extract(m, "response")
X <- model.matrix(Terms, m)
ef <- environment(formula)
qss <- function(x, lambda) (x^lambda - 1)/lambda
if (length(qssterms) > 0) {
xss <- lapply(tmpc$vars, function(u) eval(parse(text = u), m, enclos = ef))
for(i in 1:length(xss)){
X <- cbind(X, xss[[i]]) # Here is the problem
}
}
browser()
z <- lm.fit(X,Y) # The dreaded least squares fit
z

Re: [R] linearHypothesis

2020-09-17 Thread John Fox

Dear Johan,

It's generally a good idea to keep the conversation on r-help to allow 
list members to follow it, and so I'm cc'ing this response to the list.


I hope that it's clear that car::linearHypothesis() computes the test as 
a Wald test of a linear hypothesis and not as a likelihood-ratio test by 
model comparison. As your example illustrates, however, the two tests 
are the same for a linear model, but this is not true more generally.
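
As a concrete illustration of that equivalence (an editor's sketch with
simulated data, not code from the original exchange): for a linear
model, the Wald F statistic computed directly from the coefficient
estimates and their covariance matrix matches the model-comparison
(extra-sum-of-squares) F test against the restricted model:

```r
## Sketch: Wald test of H0: b1 + b2 = 0 in a linear model, computed
## from coef() and vcov(), versus the equivalent model-comparison
## F test.  Data are simulated purely for illustration.
set.seed(123)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- 1 + 0.5 * d$x1 - 0.5 * d$x2 + rnorm(100)
m <- lm(y ~ x1 + x2, data = d)

L <- rbind(c(0, 1, 1))   # hypothesis matrix: 0*b0 + 1*b1 + 1*b2 = 0
b <- coef(m)
V <- vcov(m)
Fwald <- drop(t(L %*% b) %*% solve(L %*% V %*% t(L)) %*% (L %*% b)) / nrow(L)

## Restricted model imposing b2 = -b1, i.e., one coefficient on (x1 - x2):
m0 <- lm(y ~ I(x1 - x2), data = d)
anova(m0, m)  # for a linear model this F statistic equals Fwald
```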


As I mentioned, you can find the details in many sources, including in 
Section 5.3.5 of Fox and Weisberg, An R Companion to Applied Regression, 
3rd Edition, the book with which the car package is associated.


Best,
 John

On 2020-09-17 4:03 p.m., Johan Lassen wrote:
Thank you John - highly appreciated! Yes, you are right, the less 
complex model may be seen as a restricted model of the starting model. 
Although the set of variables in the less complex model is not directly 
a subset of the variables of the starting model. What confused me at 
first was that I think of a subset model as a model having a direct 
subset of the set of variables of the starting model. Even though this 
is not the case in the example, the test still is on a restricted model 
of the starting model.

Thanks,
Johan

On Thu, 17 Sep 2020 at 15:55, John Fox <j...@mcmaster.ca> wrote:


Dear Johan,

On 2020-09-17 9:07 a.m., Johan Lassen wrote:
 > Dear R-users,
 >
 > I am using the R-function "linearHypothesis" to test if the sum
of all
 > parameters, but the intercept, in a multiple linear regression is
different
 > from zero.
 > I wonder if it is statistically valid to use the 
linearHypothesis-function

 > for this?

Yes, assuming of course that the hypothesis makes sense.


 > Below is a reproducible example in R. A multiple regression: y =
 > beta0*t0+beta1*t1+beta2*t2+beta3*t3+beta4*t4
 >
 > It seems to me that the linearHypothesis function does the
calculation as
 > an F-test on the extra residuals when going from the starting
model to a
 > 'subset' model, although all variables in the 'subset' model
differ from
 > the variables in the starting model.
 > I normally think of a subset model as a model built on the same
input data
 > as the starting model but one variable.
 >
 > Hence, is this a valid calculation?

First, linearHypothesis() doesn't literally fit alternative models, but
rather tests the linear hypothesis directly from the coefficient
estimates and their covariance matrix. The test is standard -- look at
the references in ?linearHypothesis or most texts on linear models.

Second, formulating the hypothesis using alternative models is also
legitimate, since the second model is a restricted version of the first.
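For readers who want to see the arithmetic, here is a minimal sketch of that Wald computation, using the data posted in this thread; the restriction matrix L below is written out by hand for the single hypothesis t1 + t2 + t3 + t4 = 0:

```r
# Wald F test of L %*% beta = 0, computed directly from coef() and vcov(),
# as linearHypothesis() does (data taken from this thread).
y <- c(101133190, 96663050, 106866486, 97678429, 83212348, 75719714,
       77861937, 74018478, 82181104, 68667176, 64599495, 62414401,
       63534709, 58571865, 65222727, 60139788, 63355011, 57790610,
       55214971, 55535484, 55759192, 49450719, 48834699, 51383864,
       51250871, 50629835, 52154608, 54636478, 54942637)
data <- data.frame(y, t1 = 1990:2018, t2 = c(rep(0, 12), 1:17),
                   t3 = c(rep(0, 17), 1:12), t4 = c(rep(0, 23), 1:6))
model <- lm(y ~ t1 + t2 + t3 + t4, data = data)
L <- matrix(c(0, 1, 1, 1, 1), nrow = 1)  # selects t1 + t2 + t3 + t4
b <- coef(model)
V <- vcov(model)
Fstat <- drop(t(L %*% b) %*% solve(L %*% V %*% t(L)) %*% (L %*% b)) / nrow(L)
pval <- pf(Fstat, nrow(L), df.residual(model), lower.tail = FALSE)
c(F = Fstat, p = pval)
```

For a linear model this F statistic agrees algebraically with the model-comparison F from anova(), which is the equivalence discussed in this thread.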

 >
 > Thanks in advance,Johan
 >
 > # R-code:
 > y <-
 >

c(101133190,96663050,106866486,97678429,83212348,75719714,77861937,74018478,82181104,68667176,64599495,62414401,63534709,58571865,65222727,60139788,
 >

63355011,57790610,55214971,55535484,55759192,49450719,48834699,51383864,51250871,50629835,52154608,54636478,54942637)
 >
 > data <-
 >

data.frame(y,"t0"=1,"t1"=1990:2018,"t2"=c(rep(0,12),1:17),"t3"=c(rep(0,17),1:12),"t4"=c(rep(0,23),1:6))
 >
 > model <- lm(y~t0+t1+t2+t3+t4+0,data=data)

You need not supply the constant regressor t0 explicitly and suppress
the intercept -- you'd get the same test from linearHypothesis() for
lm(y~t1+t2+t3+t4,data=data).

 >
 > linearHypothesis(model,"t1+t2+t3+t4=0",test=c("F"))

test = "F" is the default.

 >
 > # Reproduce the result from linearHypothesis:
 > # beta1+beta2+beta3+beta4=0 -> beta4=-(beta1+beta2+beta3) ->
 > # y=beta0+beta1*t1+beta2*t2+beta3*t3-(beta1+beta2+beta3)*t4
 > # y = beta0'+beta1'*(t1-t4)+beta2'*(t2-t4)+beta3'*(t3-t4)
 >
 > data$t1 <- data$t1-data$t4
 > data$t2 <- data$t2-data$t4
 > data$t3 <- data$t3-data$t4
 >
 > model_reduced <- lm(y~t0+t1+t2+t3+0,data=data)
 >
 > anova(model_reduced,model)

Yes, this is equivalent to the test performed by linearHypothesis()
using the coefficients and their covariances from the original model.

I hope this helps,
   John

-- 
John Fox, Professor Emeritus

McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/
 >



--
Johan Lassen

"In the cities people live in time -
in the mountains people live in space" (Buddhist monk).


--
John Fox, Professor Emeritus
McMaster Unive


[R] [R-pkgs] new ivreg package for 2SLS regression with diagnostics

2020-09-03 Thread John Fox

Dear list members,

Christian Kleiber, Achim Zeileis, and I would like to announce a new 
CRAN package, ivreg, which provides a comprehensive implementation of 
instrumental variables estimation using two-stage least-squares (2SLS) 
regression.


The standard regression functionality (parameter estimation, inference, 
robust covariances, predictions, etc.) in the package is derived from 
and supersedes the ivreg() function in the AER package. Additionally, 
various regression diagnostics are supported, including hat values, 
deletion diagnostics such as studentized residuals and Cook's distances; 
graphical diagnostics such as component-plus-residual plots and 
added-variable plots; and effect plots with partial residuals. In order 
to provide these features, the ivreg package integrates seamlessly with 
other packages through suitable S3 methods, specifically for generic 
functions in the base-R stats package, and in the car, effects, lmtest, 
and sandwich packages, among others.


The ivreg package is accompanied by two online vignettes: a brief 
general introduction to the package, and an introduction to the 
regression diagnostics and graphics that are provided.


For more information, see the ivreg CRAN webpage at 
<https://cran.r-project.org/package=ivreg> and the ivreg pkgdown webpage 
at <https://john-d-fox.github.io/ivreg/>.


Comments, suggestions, and bug reports would be appreciated.

John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to obtain individual log-likelihood value from glm?

2020-08-29 Thread John Fox

Dear John,

If you look at the code for logitreg() in the MASS text, you'll see that 
the casewise components of the log-likelihood are multiplied by the 
corresponding weights. As far as I can see, this only makes sense if the 
weights are binomial trials. Otherwise, while the coefficients 
themselves will be the same as obtained for proportionally similar 
integer weights (e.g., using your weights rather than weights/10), 
quantities such as the maximized log-likelihood, deviance, and 
coefficient standard errors will be uninterpretable.
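A small sketch of that point, using the toy data from this thread: scaling all prior weights by 1/10 leaves the coefficients unchanged but divides the deviance by 10, which is why the likelihood-based quantities lose their usual interpretation:

```r
# Same binary data fit twice: once with weights w, once with w/10.
# The MLE is unchanged; deviance (and logLik) simply rescale.
set.seed(135)
y <- c(rep(0, 50), rep(1, 50))
x <- rnorm(100)
d <- data.frame(x, y)
w <- c(rep(1, 50), rep(2, 50))
f1 <- suppressWarnings(glm(y ~ x, family = binomial(), data = d, weights = w))
f2 <- suppressWarnings(glm(y ~ x, family = binomial(), data = d, weights = w/10))
all.equal(coef(f1), coef(f2))              # TRUE: same coefficients
all.equal(deviance(f1)/10, deviance(f2))   # TRUE: deviance just rescales
```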


logitreg() is simply another way to compute the MLE, using a 
general-purpose optimizer rather than than iteratively weighted 
least-squares, which is what glm() uses. That the two functions provide 
the same answer within rounding error is unsurprising -- they're solving 
the same problem. A difference between the two functions is that glm() 
issues a warning about non-integer weights, while logitreg() doesn't. As 
I understand it, the motivation for writing logitreg() is to provide a 
function that could easily be modified, e.g., to impose parameter 
constraints on the solution.


I think that this discussion has gotten unproductive. If you feel that 
proceeding with noninteger weights makes sense, for a reason that I 
don't understand, then you should go ahead.


Best,
 John

On 2020-08-29 1:23 p.m., John Smith wrote:
In the book Modern Applied Statistics with S, 4th edition, 2002, by 
Venables and Ripley, there is a function logitreg on page 445, which 
does provide the weighted logistic regression I asked, judging by the 
loss function. And interesting enough, logitreg provides the same 
coefficients as glm in the example I provided earlier, even with weights 
< 1. Also for residual deviance, logitreg yields the same number as glm. 
Unless I misunderstood something, I am convinced that glm is a 
valid tool for weighted logistic regression despite the description on 
weights and somehow questionable logLik value in the case of non-integer 
weights < 1. Perhaps this is a bold claim: the description of weights 
can be modified and logLik can be updated as well.


The stackexchange inquiry I provided is what I feel interesting, not the 
link in that post. Sorry for the confusion.


On Sat, Aug 29, 2020 at 10:18 AM John Smith <jsw...@gmail.com> wrote:


Thanks for very insightful thoughts. What I am trying to achieve
with the weights is actually not new, something like

https://stats.stackexchange.com/questions/44776/logistic-regression-with-weighted-instances.
I thought my inquiry was not too strange, and I could utilize some
existing codes. It is just an optimization problem at the end of
day, or not? Thanks

    On Sat, Aug 29, 2020 at 9:02 AM John Fox <j...@mcmaster.ca> wrote:

Dear John,

On 2020-08-29 1:30 a.m., John Smith wrote:
 > Thanks Prof. Fox.
 >
 > I am curious: what is the model estimated below?

Nonsense, as Peter explained in a subsequent response to your
prior posting.

 >
 > I guess my inquiry seems more complicated than I thought:
with y being 0/1, how to fit weighted logistic regression with
weights <1, in the sense of weighted least squares? Thanks

What sense would that make? WLS is meant to account for non-constant
error variance in a linear model, but in a binomial GLM, the variance is
purely a function of the mean.

If you had binomial (rather than binary 0/1) observations (i.e.,
binomial trials exceeding 1), then you could account for overdispersion,
e.g., by introducing a dispersion parameter via the quasibinomial
family, but that isn't equivalent to variance weights in a LM, rather to
the error-variance parameter in a LM.

I guess the question is what are you trying to achieve with the
weights?

Best,
       John

 >
 >> On Aug 28, 2020, at 10:51 PM, John Fox <j...@mcmaster.ca> wrote:
 >>
 >> Dear John
 >>
 >> I think that you misunderstand the use of the weights
argument to glm() for a binomial GLM. From ?glm: "For a binomial
GLM prior weights are used to give the number of trials when the
response is the proportion of successes." That is, in this case
y should be the observed proportion of successes (i.e., between
0 and 1) and the weights are integers giving the number of
trials for each binomial observation.
 >>
 >> I hope this helps,
 >> John
 >>
 >> John Fox, Professor Emeritus
 >> McMaster University
 >> Hamilton, Ontario, Canada
 >> web: https://socialscie

Re: [R] How to obtain individual log-likelihood value from glm?

2020-08-29 Thread John Fox

Dear John,

On 2020-08-29 11:18 a.m., John Smith wrote:

Thanks for very insightful thoughts. What I am trying to achieve with the
weights is actually not new, something like
https://stats.stackexchange.com/questions/44776/logistic-regression-with-weighted-instances.
I thought my inquiry was not too strange, and I could utilize some existing
codes. It is just an optimization problem at the end of day, or not? Thanks


So the object is to fit a regularized (i.e, penalized) logistic 
regression rather than to fit by ML. glm() won't do that.


I took a quick look at the stackexchange link that you provided and the 
document referenced in that link.  The penalty proposed in the document 
is just a multiple of the sum of squared regression coefficients, what 
usually called an L2 penalty in the machine-learning literature.  There 
are existing implementations of regularized logistic regression in R -- 
see the machine learning CRAN taskview 
<https://cran.r-project.org/web/views/MachineLearning.html>. I believe 
that the penalized package will fit a regularized logistic regression 
with an L2 penalty.
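As a hypothetical sketch of what such an L2-penalized fit amounts to (made-up data and lambda; negpenll is an invented name), one can hand the penalized negative log-likelihood to a general-purpose optimizer:

```r
# Ridge (L2) logistic regression via optim(); the intercept is not penalized.
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + x))
negpenll <- function(beta, lambda = 1) {
  eta <- beta[1] + beta[2] * x
  -sum(y * eta - log1p(exp(eta))) + lambda/2 * sum(beta[-1]^2)
}
fit <- optim(c(0, 0), negpenll, method = "BFGS")
fit$par  # slope typically shrunk toward 0 relative to coef(glm(y ~ x, binomial))
```

Dedicated implementations (e.g., in the packages mentioned above) are preferable in practice; this only illustrates that it is, as the poster says, an optimization problem.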


As well, unless my quick reading was inaccurate, I think that you, and 
perhaps the stackexchange poster, might have been confused by the 
terminology used in the document: What's referred to as "weights" in the 
document is what statisticians more typically call "regression 
coefficients," and the "bias weight" is the "intercept" or "regression 
constant." Perhaps I'm missing some connection -- I'm not the best 
person to ask about machine learning.


Best,
 John



On Sat, Aug 29, 2020 at 9:02 AM John Fox  wrote:


Dear John,

On 2020-08-29 1:30 a.m., John Smith wrote:

Thanks Prof. Fox.

I am curious: what is the model estimated below?


Nonsense, as Peter explained in a subsequent response to your prior
posting.



I guess my inquiry seems more complicated than I thought: with y being

0/1, how to fit weighted logistic regression with weights <1, in the sense
of weighted least squares? Thanks

What sense would that make? WLS is meant to account for non-constant
error variance in a linear model, but in a binomial GLM, the variance is
purely a function of the mean.

If you had binomial (rather than binary 0/1) observations (i.e.,
binomial trials exceeding 1), then you could account for overdispersion,
e.g., by introducing a dispersion parameter via the quasibinomial
family, but that isn't equivalent to variance weights in a LM, rather to
the error-variance parameter in a LM.

I guess the question is what are you trying to achieve with the weights?

Best,
   John




On Aug 28, 2020, at 10:51 PM, John Fox  wrote:

Dear John

I think that you misunderstand the use of the weights argument to glm()

for a binomial GLM. From ?glm: "For a binomial GLM prior weights are used
to give the number of trials when the response is the proportion of
successes." That is, in this case y should be the observed proportion of
successes (i.e., between 0 and 1) and the weights are integers giving the
number of trials for each binomial observation.


I hope this helps,
John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/


On 2020-08-28 9:28 p.m., John Smith wrote:
If the weights < 1, then we have different values! See an example below.

How should I interpret logLik value then?
set.seed(135)
y <- c(rep(0, 50), rep(1, 50))
x <- rnorm(100)
data <- data.frame(cbind(x, y))
weights <- c(rep(1, 50), rep(2, 50))
fit <- glm(y~x, data, family=binomial(), weights/10)
res.dev <- residuals(fit, type="deviance")
res2 <- -0.5*res.dev^2
cat("loglikelihood value", logLik(fit), sum(res2), "\n")

On Tue, Aug 25, 2020 at 11:40 AM peter dalgaard wrote:

If you don't worry too much about an additive constant, then half the
negative squared deviance residuals should do. (Not quite sure how
weights factor in. Looks like they are accounted for.)

-pd


On 25 Aug 2020, at 17:33 , John Smith  wrote:

Dear R-help,

The function logLik can be used to obtain the maximum log-likelihood
value from a glm object. This is an aggregated value, a summation of
individual log-likelihood values. How do I obtain individual values? In
the following example, I would expect 9 numbers since the response has
length 9. I could write a function to compute the values, but there are
lots of family members in glm, and I am trying not to reinvent wheels.

Thanks!


counts <- c(18,17,15,20,10,20,25,13,12)
  outcome <- gl(3,1,9)
  treatment <- gl(3,3)
  data.frame(treatment, outcome, counts) # showing data
  glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
  (ll <- logLik(glm.D93))
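The casewise values asked for above can be obtained directly from the density function; a sketch for the Poisson example (this also illustrates Peter's deviance-residual suggestion: the "additive constant" is the casewise saturated-model term):

```r
# Casewise log-likelihood contributions for the Poisson example,
# computed with dpois(); their sum matches logLik().
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
ll_i <- unname(dpois(counts, fitted(glm.D93), log = TRUE))
all.equal(sum(ll_i), as.numeric(logLik(glm.D93)))            # TRUE
# Half the negative squared deviance residuals differ from ll_i only by
# the casewise saturated-model log-likelihood:
d <- unname(residuals(glm.D93, type = "deviance"))
all.equal(ll_i, dpois(counts, counts, log = TRUE) - d^2/2)   # TRUE
```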





Re: [R] How to obtain individual log-likelihood value from glm?

2020-08-28 Thread John Fox

Dear John

I think that you misunderstand the use of the weights argument to glm() 
for a binomial GLM. From ?glm: "For a binomial GLM prior weights are 
used to give the number of trials when the response is the proportion of 
successes." That is, in this case y should be the observed proportion of 
successes (i.e., between 0 and 1) and the weights are integers giving 
the number of trials for each binomial observation.


I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2020-08-28 9:28 p.m., John Smith wrote:

If the weights < 1, then we have different values! See an example below.
How should I interpret logLik value then?

set.seed(135)
y <- c(rep(0, 50), rep(1, 50))
x <- rnorm(100)
data <- data.frame(cbind(x, y))
weights <- c(rep(1, 50), rep(2, 50))
fit <- glm(y~x, data, family=binomial(), weights/10)
res.dev <- residuals(fit, type="deviance")
res2 <- -0.5*res.dev^2
cat("loglikelihood value", logLik(fit), sum(res2), "\n")

On Tue, Aug 25, 2020 at 11:40 AM peter dalgaard  wrote:


If you don't worry too much about an additive constant, then half the
negative squared deviance residuals should do. (Not quite sure how weights
factor in. Looks like they are accounted for.)

-pd


On 25 Aug 2020, at 17:33 , John Smith  wrote:

Dear R-help,

The function logLik can be used to obtain the maximum log-likelihood
value from a glm object. This is an aggregated value, a summation of
individual log-likelihood values. How do I obtain individual values? In
the following example, I would expect 9 numbers since the response has
length 9. I could write a function to compute the values, but there are
lots of family members in glm, and I am trying not to reinvent wheels.
Thanks!

counts <- c(18,17,15,20,10,20,25,13,12)
 outcome <- gl(3,1,9)
 treatment <- gl(3,3)
 data.frame(treatment, outcome, counts) # showing data
 glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
 (ll <- logLik(glm.D93))




--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com















__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] install.packages() R vs RStudio

2020-08-17 Thread John Fox

Hi Duncan,

What you say is entirely sensible.

Yes, it's primarily the silent part that seems problematic to me. 
Messages about masking are uninteresting until one encounters a problem, 
and then they may provide an important clue to the source of the problem.


As to this specific case: It's not clear to me why it's necessary or 
even desirable for RStudio to mask utils::install.packages(). After all 
RStudio provides an alternative route to package installation via the 
Packages tab, and it wouldn't have been hard to name the function 
something different from install.packages() to provide additional 
functionality via direct commands.
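Masking of this kind is easy to observe from the console; a throwaway sketch (the stand-in function below merely simulates a front end's hook, it is not RStudio's actual mechanism):

```r
# Simulate a front end masking a utils function from the global environment.
install.packages <- function(...) stop("masked!")
find("install.packages")   # ".GlobalEnv" comes before "package:utils": the mask wins
rm(install.packages)       # remove the mask; utils' version is visible again
```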


Best,
 John

On 2020-08-17 3:15 p.m., Duncan Murdoch wrote:

Hi John.

I suspect most good front ends do similar things.  For example, on 
MacOS, R.app messes up "history()".  I've never used ESS, but I imagine 
one could find examples where it acts differently than base R:  isn't 
that the point?


One hopes all differences are improvements, but sometimes they're not. 
If the modifications cause trouble (e.g. the ones you and I have never 
experienced with install.packages() in RStudio, or the one I experience 
every now and then with history() in R.app), then that may be a bug in 
the front-end.  It should be reported to the authors.


R is designed to be flexible, and to let people change its behaviour. 
Using that flexibility is what all users should do.  Improving the user 
experience is what front-end writers should do.  I don't find it 
inadvisable at all.  If it's the "silent" part that you object to, I 
think that's a matter of taste.  Personally, I've stopped reading the 
messages like


"Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

     as.Date, as.Date.numeric"

so they may as well be silent.

Duncan Murdoch



Re: [R] install.packages() R vs RStudio

2020-08-17 Thread John Fox

Dear Duncan,

On 2020-08-17 9:03 a.m., Duncan Murdoch wrote:

On 17/08/2020 7:54 a.m., Ivan Calandra wrote:

Dear useRs,

Following the recent activity on the list, I have been made aware of
this discussion:
https://stat.ethz.ch/pipermail/r-help/2020-May/466788.html

I used to install all packages in R, but for simplicity (I use RStudio
for all purposes), I now do it in RStudio. Now I am left wondering
whether I should continue installing packages directly from RStudio or
whether I should revert to using R.

My goal is not to flare a debate over whether RStudio is better or worse
than R, but rather simply to understand whether there are differences
and potential issues (that could lead to problems in code) about
installing packages through RStudio.

In general, it would be nice to have a list of the differences in
behavior between R and RStudio, but I believe this should come from the
RStudio side of things.

Thank you all for the insights.
Ivan



To see the install.packages function that RStudio installs, just type 
its name:


 > install.packages
function (...)
.rs.callAs(name, hook, original, ...)


You can debug it to see the other variables:

 > debug(install.packages)
 > install.packages("abind")
debugging in: install.packages("abind")
debug: .rs.callAs(name, hook, original, ...)
Browse[2]> name
[1] "install.packages"
Browse[2]> hook
function (original, pkgs, lib, repos = getOption("repos"), ...)
{
     if (missing(pkgs))
     return(utils::install.packages())
     if (!.Call("rs_canInstallPackages", PACKAGE = "(embedding)")) {
     stop("Package installation is disabled in this version of 
RStudio",

     call. = FALSE)
     }
     packratMode <- !is.na(Sys.getenv("R_PACKRAT_MODE", unset = NA))
     if (!is.null(repos) && !packratMode && 
.rs.loadedPackageUpdates(pkgs)) {

     installCmd <- NULL
     for (i in seq_along(sys.calls())) {
     if (identical(deparse(sys.call(i)[[1]]), 
"install.packages")) {
     installCmd <- gsub("\\s+", " ", 
paste(deparse(sys.call(i)),

   collapse = " "))
     break
     }
     }
     .rs.enqueLoadedPackageUpdates(installCmd)
     stop("Updating loaded packages")
     }
     .rs.addRToolsToPath()
     on.exit({
     .rs.updatePackageEvents()
     .Call("rs_packageLibraryMutated", PACKAGE = "(embedding)")
     .rs.restorePreviousPath()
     })
     original(pkgs, lib, repos, ...)
}


The .rs.callAs function just substitutes the call to "hook" for the call 
to the original install.packages.  So you can see that they do the 
following:

  - they allow a way to disable installing packages,
  - they support "packrat" (a system for installing particular versions 
of packages, see https://github.com/rstudio/packrat),

  - they add RTools to the path (presumably only on Windows)
  - they call the original function, and at the end update internal 
variables so they can show the library in the Packages pane.


So there is no reason not to do it in R.
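A quick way to check whether your own session has the hook in place (not part of Duncan's message; just a diagnostic sketch):

```r
# In a plain R session the visible install.packages() is the utils one,
# so this is TRUE; in RStudio the hook masks it and this is FALSE.
identical(install.packages, utils::install.packages)

# Where does the visible binding live? The utils namespace in plain R;
# a tools environment on the search path in RStudio.
environment(install.packages)
```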

By the way, saying that this is a "modified version of R" is like saying 
every single user who defines a variable creates a modified version of 
R.  If you type "x" in the plain R console, you see "Error: object 'x' 
not found".  If you "modify" R by assigning a value to x, you'll see 
something different.  Very scary!


I can't recall ever disagreeing with something you said on the R-help, 
but this seems to me to be off-base. While what you say is technically 
correct, silently masking a standard R function, in this case, I 
believe, by messing with the namespace of the utils package, seems 
inadvisable to me.


As has been noted, cryptic problems have arisen with install.packages() 
in RStudio -- BTW, I use it regularly and haven't personally experienced 
any issues. One could concoct truly scary examples, such as redefining 
isTRUE().
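To make the isTRUE() example concrete, here is a deliberately bad sketch of why silently masking a base function is scarier than defining an ordinary variable (do not do this in a real session):

```r
isTRUE <- function(x) TRUE   # masks base::isTRUE from the global environment
isTRUE(FALSE)                # TRUE -- every downstream check silently inverts

rm(isTRUE)                   # remove the mask; base::isTRUE is visible again
isTRUE(FALSE)                # FALSE, as it should be
```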


Best,
 John



Duncan Murdoch

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Best settings for RStudio video recording?

2020-08-14 Thread John Fox

Hi,

I had occasion last month to teach a two-week, two-hour-per-day lecture 
series on R via Zoom for the ICPSR Summer Program -- the website for the 
lectures is at 
<https://socialsciences.mcmaster.ca/jfox/Courses/R/ICPSR/index.html>.


I used RStudio and mostly displayed my desktop via one monitor in a 
two-monitor setup. That allowed me to show the website (or Canvas site) 
for the lectures, PDF slides, or the RStudio window, and to have the 
other monitor free to control the Zoom session. Most of the time, 
perhaps 1.5 hours per session, I displayed the RStudio window.


To set the size of the fonts in RStudio, I tested in a dummy Zoom 
session that I viewed on a small laptop prior to the start of the 
lecture series.


I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2020-08-14 4:29 a.m., peter dalgaard wrote:

[Sorry about the misfire a second ago...]

As others have said, for deeper questions, try RStudio's own lists or 
R-sig-teaching.

However, FWIW, I seem to have gotten away with just using a separate virtual 
desktop with my usual work setup, and then switch to it when necessary. This 
was for Panopto video recordings, but Zoom et al. should be much the same. 
Compared to physical lecturing it is actually somewhat easier, because you 
don't need to worry so much about projector shortcomings, readability from the 
back row, etc.

-pd


On 13 Aug 2020, at 20:58 , Jonathan Greenberg  wrote:

Folks:

I was wondering if you all would suggest some helpful RStudio
configurations that make recording a session via e.g. zoom the most useful
for students doing remote learning.  Thoughts?

--j

--
Jonathan A. Greenberg, PhD
Randall Endowed Professor and Associate Professor of Remote Sensing
Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
Natural Resources & Environmental Science
University of Nevada, Reno
1664 N Virginia St MS/0186
Reno, NV 89557
Phone: 415-763-5476
https://www.gearslab.org/

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






Re: [R] Dependent Variable in Logistic Regression

2020-08-01 Thread John Fox

Dear Paul,

I think that this thread has gotten unnecessarily complicated. The 
answer, as is easily demonstrated, is that a binary response for a 
binomial GLM in glm() may be a factor, a numeric variable, or a logical 
variable, with identical results; for example:


--- snip -

> set.seed(123)

> head(x <- rnorm(100))
[1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774  1.71506499

> head(y <- rbinom(100, 1, 1/(1 + exp(-x))))
[1] 0 1 1 1 1 0

> head(yf <- as.factor(y))
[1] 0 1 1 1 1 0
Levels: 0 1

> head(yl <- y == 1)
[1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE

> glm(y ~ x, family=binomial)

Call:  glm(formula = y ~ x, family = binomial)

Coefficients:
(Intercept)            x
     0.3995       1.1670

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      134.6
Residual Deviance: 114.9    AIC: 118.9

> glm(yf ~ x, family=binomial)

Call:  glm(formula = yf ~ x, family = binomial)

Coefficients:
(Intercept)            x
     0.3995       1.1670

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      134.6
Residual Deviance: 114.9    AIC: 118.9

> glm(yl ~ x, family=binomial)

Call:  glm(formula = yl ~ x, family = binomial)

Coefficients:
(Intercept)            x
     0.3995       1.1670

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      134.6
Residual Deviance: 114.9    AIC: 118.9

--- snip -
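The equivalence can also be verified programmatically rather than by eye (a compact restatement of the snip, not part of the original message):

```r
set.seed(123)
x <- rnorm(100)
y <- rbinom(100, 1, 1/(1 + exp(-x)))   # 0/1 numeric response
yf <- as.factor(y)                     # factor response
yl <- y == 1                           # logical response

c1 <- coef(glm(y  ~ x, family = binomial))
c2 <- coef(glm(yf ~ x, family = binomial))
c3 <- coef(glm(yl ~ x, family = binomial))
all.equal(c1, c2)   # TRUE
all.equal(c1, c3)   # TRUE
```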

The original poster claimed to have encountered an error with a 0/1 
numeric response, but didn't show any data or even a command. I suspect 
that the response was a character variable, but of course can't really 
know that.


Best,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2020-08-01 2:25 p.m., Paul Bernal wrote:

Dear friend,

I am aware that I have a binomial dependent variable, which is covid status
(1 if covid positive, and 0 otherwise).

My question was if R requires to turn a binomial response variable into a
factor or not, that's all.

Cheers,

Paul

El sáb., 1 de agosto de 2020 1:22 p. m., Bert Gunter 
escribió:


... yes, but so does lm() for a categorical **INdependent** variable with
more than 2 numerically labeled levels. n levels  = (n-1) df for a
categorical covariate, but 1 for a continuous one (unless more complex
models are explicitly specified of course). As I said, the OP seems
confused about whether he is referring to the response or covariates. Or
maybe he just made the same typo I did.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sat, Aug 1, 2020 at 11:15 AM Patrick (Malone Quantitative) <
mal...@malonequantitative.com> wrote:


No, R does not. glm() does in order to do logistic regression.

On Sat, Aug 1, 2020 at 2:11 PM Paul Bernal 
wrote:


Hi Bert,

Thank you for the kind reply.

But what if I don't turn the variable into a factor. Let's say that in
excel I just coded the variable as 1s and 0s and just imported the
dataset
into R and fitted the logistic regression without turning any categorical
variable or dummy variable into a factor?

Does R requires every dummy variable to be treated as a factor?

Best regards,

Paul

El sáb., 1 de agosto de 2020 12:59 p. m., Bert Gunter <
bgunter.4...@gmail.com> escribió:


x <- factor(0:1)
x <- factor(c("yes", "no"))

will produce identical results up to labeling.


Bert Gunter

"The trouble with having an open mind is that people keep coming along

and

sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sat, Aug 1, 2020 at 10:40 AM Paul Bernal 
wrote:


Dear friends,

Hope you are doing great. I want to fit a logistic regression in R,

where

the dependent variable is the covid status (I used 1 for covid

positives,

and 0 for covid negatives), but when I ran the glm, R complains that I
should make the dependent variable a factor.

What would be more advisable, to keep the dependent variable with 1s

and

0s, or code it as yes/no and then make it a factor?

Any guidance will be greatly appreciated,

Best regards,

Paul

 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
ht

Re: [R] Axis with inverse logarithmic scale

2020-07-28 Thread John Fox

Dear Martin,

On 7/28/2020 10:17 AM, Martin Maechler wrote:

Martin Maechler
 on Tue, 28 Jul 2020 15:56:10 +0200 writes:



John Fox
 on Mon, 27 Jul 2020 12:57:57 -0400 writes:


 >> Dear Dileepkumar R,
 >> As is obvious from the tick marks, the vertical axis is not log-scaled:

 >>> log10(99.999) - log10(99.99)
 >> [1] 3.908865e-05
 >>> log10(99) - log10(90)
 >> [1] 0.04139269


 >> That is, these (approximately?) equally spaced ticks aren't equally
 >> spaced on the log scale.

 >> The axis is instead apparently (at least approximately) on the logit
 >> (log-odds) scale:

 >>> library(car)
 >> Loading required package: carData
 >>> logit(99.999) - logit(99.99)
 >> [1] 2.302675
 >>> logit(99) - logit(90)
 >> [1] 2.397895

 > Small remark : You don't need car (or any other extra pkg) to have logit:

 > logit <- qlogis # is sufficient

 > Note that the ?plogis (i.e. 'Logistic') help page has had a
 > \concept{logit}

 > entry (which would help if one used  help.search() .. {I don't;
 > I have 1 of packages}),
 > and that same help page has been talking about 'logit' for ca 16
 > years now (and I'm sure this is news for most readers, still)...

but now I see that car uses the "empirical logit" function,
where qlogis() provides the mathematical logit():


Not quite the empirical logit, because we don't know the counts, but a 
similar idea when the proportions include 0 or 1. Also, logit() 
recognizes percents as well as proportions, and so there's no need to 
convert the former to the latter.
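The distinction is easy to verify in base R (my example, not from the thread): qlogis() is the plain mathematical logit, and plogis() is its inverse, the logistic distribution function.

```r
qlogis(0.99) - qlogis(0.90)   # 2.397895, matching the logit(99) - logit(90) value
plogis(qlogis(0.7))           # 0.7: plogis() inverts the logit, it isn't the logit
```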




The former is typically needed for data transformations where
you don't want to map {0,1} to  -/+ Inf but rather to finite
values ..

So I should have stayed quiet, probably..


Well, I wouldn't go so far as that.

Best,
 John



Martin


 >> You can get a graph close to the one you shared via the following:

 >> library(car) # repeated so you don't omit it

 > .. and here you need 'car'  for the nice  probabilityAxis(.) ..

 >>> logits <- logit(y_values)
 >>> plot(x_value, logits, log="x", axes=FALSE,
 >> +  xlim=c(1, 200), ylim=logit(c(10, 99.999)),
 >> +  xlab="Precipitation Intensity (mm/d)",
 >> +  ylab="Cumulative Probability",
 >> +  main="Daily U.S. Precipitation",
 >> +  col="magenta")
 >>> axis(1, at=c(1, 2, 5, 10, 20, 50, 100, 200))
 >>> probabilityAxis(side=2, at=c(10, 30, 50, 90, 99, 99.9, 99.99,
 >> 99.999)/100)
 >>> box()

 >> Comments:

 >> This produces probabilities, not percents, on the vertical axis, which
 >> conforms to what the axis label says. Also, the ticks in the R version
 >> point out rather than into the plotting region -- the former is
 >> generally considered better practice. Finally, the graph is not a
 >> histogram as the original title states.

 >> I hope this helps,
 >> John


 >> 
 >> John Fox
 >> Professor Emeritus
 >> McMaster University
 >> Hamilton, Ontario, Canada
 >> web: https://socialsciences.mcmaster.ca/jfox/

 >> On 7/27/2020 11:56 AM, Dileepkumar R wrote:
 >>> I think the attached sample figure is not visible
 >>> Here is the sample figure:
 >>> 
https://drive.google.com/file/d/16Uy3JD0wsEucUv_KOhXCxLZ4U-3wiBTs/view?usp=sharing
 >>>
 >>> sincerely,
 >>>
 >>>
 >>> Dileepkumar R
 >>>
 >>>
 >>>
 >>>
 >>> On Mon, Jul 27, 2020 at 7:13 PM Dileepkumar R 
 >>> wrote:
 >>>
 >>>> Dear All,
 >>>>
 >>>> I want to plot a simple cumulative probability distribution graph with
 >>>> like the attached screenshot.
 >>>> But I couldn't fix the y-axis scale as in that screenshot.
 >>>>
 >>>> My data details are follows:
 >>>>
 >>>> y_values
 >>>> 
=c(66.78149,76.10846,81.65518,85.06448,87.61703,89.61314,91.20297,92.36884,
 >>>> 
93.64070,94.57693,95.23052,95.75163,96.15792,96.58188,96.97933,97.29730,
 >>>> 
97.59760,97.91556,98.14520,98.37485,98.57799,98.74580,98.87829,99.06377,
 >>>> 
99.16093,99.25808,99.37290,99.45239,99.54072,99.59371,99.62904,99.6643

Re: [R] Axis with inverse logarithmic scale

2020-07-27 Thread John Fox

Dear Dileepkumar R,

As is obvious from the tick marks, the vertical axis is not log-scaled:

> log10(99.999) - log10(99.99)
[1] 3.908865e-05
> log10(99) - log10(90)
[1] 0.04139269


That is, these (approximately?) equally spaced ticks aren't equally 
spaced on the log scale.


The axis is instead apparently (at least approximately) on the logit 
(log-odds) scale:


> library(car)
Loading required package: carData
> logit(99.999) - logit(99.99)
[1] 2.302675
> logit(99) - logit(90)
[1] 2.397895


You can get a graph close to the one you shared via the following:

library(car) # repeated so you don't omit it
> logits <- logit(y_values)
> plot(x_value, logits, log="x", axes=FALSE,
+  xlim=c(1, 200), ylim=logit(c(10, 99.999)),
+  xlab="Precipitation Intensity (mm/d)",
+  ylab="Cumulative Probability",
+  main="Daily U.S. Precipitation",
+  col="magenta")
> axis(1, at=c(1, 2, 5, 10, 20, 50, 100, 200))
> probabilityAxis(side=2, at=c(10, 30, 50, 90, 99, 99.9, 99.99, 99.999)/100)

> box()

Comments:

This produces probabilities, not percents, on the vertical axis, which 
conforms to what the axis label says. Also, the ticks in the R version 
point out rather than into the plotting region -- the former is 
generally considered better practice. Finally, the graph is not a 
histogram as the original title states.


I hope this helps,
 John



  John Fox
  Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  web: https://socialsciences.mcmaster.ca/jfox/

On 7/27/2020 11:56 AM, Dileepkumar R wrote:

I think the attached sample figure is not visible
Here is the sample figure:
https://drive.google.com/file/d/16Uy3JD0wsEucUv_KOhXCxLZ4U-3wiBTs/view?usp=sharing

sincerely,


Dileepkumar R




On Mon, Jul 27, 2020 at 7:13 PM Dileepkumar R 
wrote:


Dear All,

I want to plot a simple cumulative probability distribution graph with
like the attached screenshot.
But I couldn't fix the y-axis scale as in that screenshot.

My data details are follows:

y_values
=c(66.78149,76.10846,81.65518,85.06448,87.61703,89.61314,91.20297,92.36884,
93.64070,94.57693,95.23052,95.75163,96.15792,96.58188,96.97933,97.29730,
97.59760,97.91556,98.14520,98.37485,98.57799,98.74580,98.87829,99.06377,
99.16093,99.25808,99.37290,99.45239,99.54072,99.59371,99.62904,99.66437,
99.69970,99.70853,99.72620,99.73503,99.77036,99.79686,99.80569,99.82335,
99.83219,99.84985,99.86751,99.87635,99.87635,99.90284,99.90284,99.90284,
99.91168,99.92051,99.92051,99.93817,99.93817,99.93817,99.95584,99.95584,
99.97350,99.97350,99.97350,99.97350,99.97350,99.97350,99.97350)

x_value=seq(63)

Thank you all in advance

Dileepkumar R



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Error message from allEffects(model) / effect(model)" ‘range’ not meaningful for factors"

2015-08-19 Thread John Fox
Hi,

> m <- lm(y~x) # no problem

> allEffects(m)# also no problem
 model: y ~ x

 x effect
x
       a        b        c
3.322448 3.830997 4.969154

> effect("x", m) # ditto

 x effect
x
       a        b        c
3.322448 3.830997 4.969154 

> Effect("x", m) # ditto

 x effect
x
       a        b        c
3.322448 3.830997 4.969154
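The quoted code below fits m <- summary(lm(y ~ x)); the error goes away once the lm object itself is passed, as in the lines above. A minimal before/after check (assumes the effects package is installed; the data-generating code is taken from the quoted message):

```r
library(effects)  # assumed installed
set.seed(1)
x <- factor(rep(letters[1:3], each = 100))
y <- c(rnorm(100, 3, 3), rnorm(100, 4, 3), rnorm(100, 5, 3))

m_bad <- summary(lm(y ~ x))                # a summary.lm object, not a model
try(plot(allEffects(m_bad)))               # errors, as in the quoted message

m_ok <- lm(y ~ x)                          # pass the fitted model itself
allEffects(m_ok)                           # works
```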

Best,
 John

-------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/




> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Robert
> Zimbardo
> Sent: Tuesday, August 18, 2015 8:50 PM
> To: r-help@r-project.org
> Subject: [R] Error message from allEffects(model) / effect(model)"
> ‘range’ not meaningful for factors"
> 
> Hi
> 
> I cannot figure out why the effects package throws me error messages
> with the following simple code:
> 
> 
> rm(list=ls(all=TRUE)); set.seed(1); library(effects)
> # set up data
> x <- factor(rep(letters[1:3], each=100))
> y <- c(rnorm(100, 3, 3), rnorm(100, 4, 3), rnorm(100, 5, 3))
> 
> 
> # fit linear model
> m <- summary(lm(y~x)) # no problem
> 
> # now the problem
> plot(allEffects(m))
> # Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> :
> #   ‘range’ not meaningful for factors
> plot(effect("x", m))
> # Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> :
> #   ‘range’ not meaningful for factors
> 
> 
> Any ideas? It's go to be something superobvious, but I don't get it.
> Thanks,
> RZ
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R GUI tklistbox get value

2015-07-20 Thread John Fox
Dear j.para.fernandez,

Try

selecvar <- dat[, as.numeric(tkcurselection(tl))+1]

Omitting the comma returns a one-column data frame, not a numeric vector.
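The underlying indexing behavior can be seen without Tk at all (my illustration, using the poster's data):

```r
dat <- data.frame(one = c(5, 5, 6, 9, 5, 8),
                  two = c(12, 13, 14, 12, 14, 12))

class(dat[1])     # "data.frame": single bracket without a comma keeps the frame
class(dat[, 1])   # "numeric": with the comma the column is dropped to a vector
mean(dat[, 1])    # 6.333333 -- mean() needs the vector, not the data frame
```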

I hope this helps,
 John

----
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

On Mon, 20 Jul 2015 03:29:07 -0700 (PDT)
 jpara3  wrote:
> Hi, i have a dataframe, dat, with 2 variables, one and two.
> 
> I want to print in R the mean of the selected variable of the dataframe. You
> can select it with a tklistbox, but when you click OK button, the mean is
> not displayed, just NA
> 
> 
> 
> 
> 
> one<-c(5,5,6,9,5,8)
> two<-c(12,13,14,12,14,12)
> dat<-data.frame(one,two)
> 
> require(tcltk)
> tt<-tktoplevel()
> tl<-tklistbox(tt,height=4,selectmode="single")
> tkgrid(tklabel(tt,text="Selecciona la variable para calcular media"))
> tkgrid(tl)
> for (i in (1:4))
> {
> tkinsert(tl,"end",colnames(dat[i]))
> }
>  
> OnOK <- function()
> {
> 
>   selecvar <- dat[as.numeric(tkcurselection(tl))+1]
>
>   print(mean(selecvar))
> 
> }
> OK.but <-tkbutton(tt,text="   OK   ",command=OnOK)
> tkgrid(OK.but)
> tkfocus(tt)
> 
> #
> 
> Can someone please help me?? Thanks!!! 
> 
> 
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/R-GUI-tklistbox-get-value-tp4710064.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] powerTransform warning message?

2015-07-16 Thread John Fox
Dear Brittany,

On Thu, 16 Jul 2015 17:35:38 -0600
 Brittany Demmitt  wrote:
> Hello,
> 
> I have a series of 40 variables that I am trying to transform via the boxcox 
> method using the powerTransfrom function in R.  I have no zero values in any 
> of my variables.  When I run the powerTransform function on the full data set 
> I get the following warning. 
> 
> Warning message:
> In sqrt(diag(solve(res$hessian))) : NaNs produced
> 
> However, when I analyze the variables in groups, rather than all 40 at a time 
> I do not get this warning message.  Why would this be? And does this mean 
> this warning is safe to ignore?
> 

No, it is not safe to ignore the warning, and the problem has nothing to do 
with non-positive values in the data -- when you say that there are no 0s in 
the data, I assume that you mean that the data values are all positive. The 
square-roots of the diagonal entries of the Hessian at the (pseudo-) ML 
estimates are the SEs of the estimated transformation parameters. If the 
Hessian can't be inverted, that usually implies that the maximum of the 
(pseudo-) likelihood isn't well defined. 

This isn't surprising when you're trying to transform as many as 40 variables 
at a time to multivariate normality. It's my general experience that people 
often throw their data into the Box-Cox black box and hope for the best without 
first examining the data, and, e.g., insuring a reasonable ratio of 
maximum/minimum values for each variable, checking for extreme outliers, etc. 
Of course, I don't know that you did that, and it's perfectly possible that you 
were careful.

> I would like to add that all of my lambda values are in the -5 to 5 range.  I 
> also get different lambda values when I analyze the variables together versus 
> in groups.  Is this to be expected?
> 

Yes. It's very unlikely that both are right. If, e.g., the variables are 
multivariate normal within groups then their marginal distribution is a mixture 
of multivariate normals, which almost surely isn't itself normal.
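A sketch of the mixture point (assumes the car package is installed; the data are simulated, not the poster's):

```r
library(car)   # for powerTransform(); assumed installed
set.seed(1)
g <- rep(1:2, each = 300)
# Each group is lognormal, so within groups the Box-Cox lambda is near 0;
# the pooled variable is a two-component mixture with a different "best" lambda.
x <- exp(rnorm(600, mean = ifelse(g == 1, 0, 3), sd = 0.5))

summary(powerTransform(x))            # pooled estimate
summary(powerTransform(x[g == 1]))    # group 1 estimate (near 0)
summary(powerTransform(x[g == 2]))    # group 2 estimate (near 0)
```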

I hope this helps,
 John


John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/


> Thank you so much!
> 
> Brittany
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plot in Rcmdr

2015-07-14 Thread John Fox
Dear David and Joanne,

David, thank you for answering Joanne's question before I saw it.

The help page for car::scatterplot() is also accessible via the Help button
in the Rcmdr scatterplot dialog.

I'll think about whether to add a control for legend position to the
scatterplot dialog. There are already some enhancements to the dialog in the
forthcoming version 2.2-0 of the Rcmdr package, due late this summer, but I
try not to make the dialogs too complicated.

Best,
 John

-------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/



> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David L
> Carlson
> Sent: July-14-15 1:17 PM
> To: INGRAM Joanne; r-help@R-project.org
> Subject: Re: [R] Plot in Rcmdr
> 
> It can be changed by slightly modifying the scatterplot() command in the
> R Script window and re-submitting it.
> 
> >From the top menu select Data | Data in packages | Read data set from
> an attached package. Then type Pottery in the space next to "Enter name
> of data set" (notice that Pottery is capitalized).
> 
> >From the top menu select Graphs | Scatterplot and then select Al as the
> x-variable and Ca as the y-variable. Click on Plot by groups... and
> select Site (and unselect Plot lines by group). Click OK and OK again to
> produce the plot. The legend is outside the plot region and the top
> margin has been expanded to make room for it.
> 
> In the R Script window you will see the command:
> 
> scatterplot(Ca~Al | Site, reg.line=lm, smooth=TRUE, spread=TRUE,
>   id.method='mahal', id.n = 2, boxplots='xy', span=0.5, by.groups=FALSE,
>   data=Pottery)
> 
> add a single argument to the end of the command so that it looks like
> this:
> 
> scatterplot(Ca~Al | Site, reg.line=lm, smooth=TRUE, spread=TRUE,
>   id.method='mahal', id.n = 2, boxplots='xy', span=0.5, by.groups=FALSE,
>   data=Pottery, legend.coords="topright")
> 
> Then select all three lines and click Submit:
> 
> The new plot puts the legend in the upper right corner of the plot
> region. R Commander uses the scatterplot() function from package ca to
> create the plot. It has several options that are not included on the
> options dialog window in R Commander, but can be accessed simply by
> editing the command that R Commander creates.
> 
> To see these options type
> 
> ?scatterplot
> 
> On an empty line in the R Script window, put the cursor on the line and
> click Submit. This will open your web browser with the manual page for
> scatterplot.
> 
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
> 
> 
> 
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of INGRAM
> Joanne
> Sent: Tuesday, July 14, 2015 9:53 AM
> To: r-help@R-project.org
> Subject: [R] Plot in Rcmdr
> 
> Hello,
> 
> I wondered if anyone could help me with a small issue in Rcmdr.
> 
> I have used the 'Graphs' function in the drop-down menu to create a
> scatterplot for groups (gender).  But when I do this the legend (telling
> me the symbols which represent male etc.) keeps obscuring the title of
> the plot.  Does anyone know how to fix this problem - within Rcmdr?
> 
> Please note I am not looking for help with creating the graph in another
> way (for example in R).  I am specifically trying to figure out if this
> can be fixed in Rcmdr.  If the answer is "No - this cannot currently be
> changed within Rcmdr" I would still like to hear from you.
> 
> Many thanks for any help.
> 
> Joanne Ingram
> Research Associate (Medical Statistics)
> Centre for Population Health Science
> University of Edinburgh
> 
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.


---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] detecting any element in a vector of strings, appearing anywhere in any of several character variables in a dataframe

2015-07-09 Thread John Fox
Dear Christopher,

My usual orientation to this kind of one-off problem is that I'm looking for a 
simple correct solution. Computing time is usually much smaller than 
programming time. 

That said, Bert Gunter's solution was about 5 times faster in a simple check 
that I ran with microbenchmark, and Jeff Newmiller's solution was about 10 
times faster. Both Bert's and Jeff's (eventual) solution protect against 
partial (rather than full-word) matches, while mine doesn't (though it could 
easily be modified to do that).
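The partial-match point is easy to demonstrate on a toy case (my own example, not from the thread):

```r
alarm.words <- c("red", "white")
v2 <- c("barred owl", "red sparks", "white dress robes")

# Naive alternation: "red" also matches inside "barred"
grepl(paste(alarm.words, collapse = "|"), v2)
# TRUE TRUE TRUE

# Word-boundary version matches whole words only
pat <- paste0("\\b(", paste(alarm.words, collapse = "|"), ")\\b")
grepl(pat, v2)
# FALSE TRUE TRUE
```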

Best,
 John

> -Original Message-
> From: Christopher W Ryan [mailto:cr...@binghamton.edu]
> Sent: July-09-15 2:49 PM
> To: Bert Gunter
> Cc: Jeff Newmiller; R Help; John Fox
> Subject: Re: [R] detecting any element in a vector of strings, appearing
> anywhere in any of several character variables in a dataframe
> 
> Thanks everyone.  John's original solution worked great.  And with
> 27,000 records, 65 alarm.words, and 6 columns to search, it takes only
> about 15 seconds.  That is certainly adequate for my needs.  But I
> will try out the other strategies too.
> 
> And thanks also for lot's of new R things to learn--grep, grepl,
> do.call . . . that's always a bonus!
> 
> --Chris Ryan
> 
> On Thu, Jul 9, 2015 at 1:52 PM, Bert Gunter 
> wrote:
> > Yup, that does it. Let grep figure out what's a word rather than doing
> > it manually. Forgot about "\b"
> >
> > Cheers,
> > Bert
> >
> >
> > Bert Gunter
> >
> > "Data is not information. Information is not knowledge. And knowledge
> > is certainly not wisdom."
> >-- Clifford Stoll
> >
> >
> > On Thu, Jul 9, 2015 at 10:30 AM, Jeff Newmiller
> >  wrote:
> >> Just add a word break marker before and after:
> >>
> >> zz$v5 <- grepl( paste0( "\\b(", paste0( alarm.words, collapse="|" ), ")\\b" ), do.call( paste, zz[ , 2:3 ] ) )
> >> -
> --
> >> Jeff NewmillerThe .   .  Go
> Live...
> >> DCN:Basics: ##.#.   ##.#.  Live
> Go...
> >>   Live:   OO#.. Dead: OO#..
> Playing
> >> Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
> >> /Software/Embedded Controllers)   .OO#.   .OO#.
> rocks...1k
> >> -
> --
> >> Sent from my phone. Please excuse my brevity.
> >>
> >> On July 9, 2015 10:12:23 AM PDT, Bert Gunter 
> wrote:
> >>>Jeff:
> >>>
> >>>Well, it would be much better (no loops!) except, I think, for one
> >>>issue: "red" would match "barred" and I don't think that this is what
> >>>is wanted: the matches should be on whole "words" not just string
> >>>patterns.
> >>>
> >>>So you would need to fix up the matching pattern to make this work,
> >>>but it may be a little tricky, as arbitrary whitespace characters,
> >>>e.g. " " or "\n" etc. could be in the strings to be matched
> separating
> >>>the words or ending the "sentence."  I'm sure it can be done, but
> I'll
> >>>leave it to you or others to figure it out.
> >>>
> >>>Of course, if my diagnosis is wrong or silly, please point this out.
> >>>
> >>>Cheers,
> >>>Bert
> >>>
> >>>
> >>>Bert Gunter
> >>>
> >>>"Data is not information. Information is not knowledge. And knowledge
> >>>is certainly not wisdom."
> >>>   -- Clifford Stoll
> >>>
> >>>
> >>>On Thu, Jul 9, 2015 at 9:34 AM, Jeff Newmiller  wrote:
> >>>> I think grep is better suited to this:
> >>>>
> >>>> zz$v5 <- grepl( paste0( alarm.words, collapse="|" ), do.call( paste,
> >>>zz[ , 2:3 ] ) )
> >>>>
> >>>-
> --
> >>>> Jeff NewmillerThe .   .  Go
> >>>Live...
> >>>> DCN:Basics: ##.#.   ##.#.
> Live
> >>>Go...
> >>>>   Live:   OO#.. Dead: OO#..
> >>>Playing
> >>>> Research Engineer (So

Re: [R] detecting any element in a vector of strings, appearing anywhere in any of several character variables in a dataframe

2015-07-09 Thread John Fox
Dear Chris,

If I understand correctly what you want, how about the following?

> rows <- apply(zz[, 2:3], 1, function(x) any(sapply(alarm.words, grepl, x=x)))
> zz[rows, ]

           v1                              v2                v3 v4
3  -1.022329                    green turtle    ronald weasley  2
6   0.336599              waffle the hamster        red sparks  1
9  -1.631874 yellow giraffe with a long neck gandalf the white  1
10  1.130622                      black bear  gandalf the grey  2

I hope this helps,
 John

----
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/


On Wed, 08 Jul 2015 22:23:37 -0400
 "Christopher W. Ryan"  wrote:
> Running R 3.1.1 on windows 7
> 
> I want to identify as a case any record in a dataframe that contains any
> of several keywords in any of several variables.
> 
> Example:
> 
> # create a dataframe with 4 variables and 10 records
> v2 <- c("white bird", "blue bird", "green turtle", "quick brown fox",
>   "big black dog", "waffle the hamster", "benny likes food a lot",
>   "hello world", "yellow giraffe with a long neck", "black bear")
> v3 <- c("harry potter", "hermione grainger", "ronald weasley",
>   "ginny weasley", "dudley dursley", "red sparks", "blue sparks",
>   "white dress robes", "gandalf the white", "gandalf the grey")
> zz <- data.frame(v1=rnorm(10), v2=v2, v3=v3, v4=rpois(10, lambda=2),
>   stringsAsFactors=FALSE)
> str(zz)
> zz
> 
> # here are the keywords
> alarm.words <- c("red", "green", "turtle", "gandalf")
> 
> # For each row/record, I want to test whether the string in v2 or the
> string in v3 contains any of the strings in alarm.words. And then if so,
> set zz$v5=TRUE for that record.
> 
> # I'm thinking the str_detect function in the stringr package ought to
> be able to help, perhaps with some use of apply over the rows, but I
> obviously misunderstand something about how str_detect works
> 
> library(stringr)
> 
> str_detect(zz[,2:3], alarm.words)# error: the target of the search
>  # must be a vector, not multiple
>  # columns
> 
> str_detect(zz[1:4,2:3], alarm.words) # same error
> 
> str_detect(zz[,2], alarm.words)  # error, length of alarm.words
>  # is less than the number of
>  # rows I am using for the
>  # comparison
> 
> str_detect(zz[1:4,2], alarm.words)   # works as hoped when confining
>  # nrows to length(alarm.words)
> 
> str_detect(zz, alarm.words)  # obviously not right
> 
> # maybe I need apply() ?
> my.f <- function(x){str_detect(x, alarm.words)}
> 
> apply(zz[,2], 1, my.f) # again, a mismatch in lengths
># between alarm.words and that
># in which I am searching for
># matching strings
> 
> apply(zz, 2, my.f) # now I'm getting somewhere
> apply(zz[1:4,], 2, my.f)   # but still only works with 4
># rows of the dataframe
> 
> 
> # perhaps %in% could do the job?
> 
> Appreciate any advice.
> 
> --Chris Ryan
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] tcltk2 entry box

2015-07-08 Thread John Fox
Dear Matthew,

For file selection, see ?tcltk::tk_choose.files or ?tcltk::tkgetOpenFile . 

You could enter a number in a tk entry widget, but, depending upon the
nature of the number, a slider or other widget might be a better choice. 

For a variety of helpful tcltk examples see
<http://www.sciviews.org/_rgui/tcltk/>, originally by James Wettenhall but
now maintained by Philippe Grosjean (the author of the tcltk2 package). (You
probably don't need tcltk2 for the simple operations that you mention, but
see ?tk2spinbox for an alternative to a slider.)
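A minimal sketch of the entry-widget approach described above, for an interactive desktop R session with Tk available (the widget and variable names here are illustrative, not from the thread):

```r
# Sketch: capture a file path from a tk entry widget, then use the
# captured value downstream in the script. Must be run interactively;
# input.path, top, etc. are made-up names.
library(tcltk)

input.path <- tclVar("")                 # tcl variable backing the entry
top <- tktoplevel()
tkwm.title(top, "Script input")

entry  <- tkentry(top, textvariable = input.path, width = 50)
browse <- tkbutton(top, text = "Browse...", command = function() {
  f <- tk_choose.files()                 # native file-selection dialog
  if (length(f)) tclvalue(input.path) <- f[1]
})
ok <- tkbutton(top, text = "OK", command = function() tkdestroy(top))

tkgrid(entry, browse)
tkgrid(ok)
tkwait.window(top)                       # block until the user clicks OK

path <- tclvalue(input.path)             # downstream code uses this value
cat("Processing", path, "\n")
```

`tkwait.window()` is what lets a sourced script pause for the user's input and then continue; without it, the script would run past the dialog before anything was entered.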

Best,
 John

-------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/




> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Matthew
> Sent: July-08-15 8:01 PM
> To: r-help
> Subject: [R] tcltk2 entry box
> 
> Is anyone familiar enough with the tcltk2 package to know if it is
> possible to have an entry box where a user can enter information (such
> as a path to a file or a number) and then be able to use the entered
> information downstream in a R script ?
> 
> The idea is for someone unfamiliar with R to just start an R script that
> would take care of all the commands for them so all they have to do is
> get the script started. However, there is always a couple of pieces of
> information that will change each time the script is used (for example,
> a different file will be processed by the script). So, I would like a
> way for the user to input that information as the script ran.
> 
> Matthew McCormack
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.




