Re: [R] The Suggests field in a DESCRIPTION file.

2018-11-17 Thread Rolf Turner

On 11/18/18 11:22 AM, Fox, John wrote:

Dear Rolf,

"fortunes" needs to be quoted in requireNamespace("fortunes", quietly=TRUE).

I hope this helps,
  John


Thanks.  I actually figured this out myself, just before getting your 
message!


cheers,

Rolf

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Suggests field in a DESCRIPTION file --- never mind!!!

2018-11-17 Thread Rolf Turner



I figured it out.  The package name in the call to requireNamespace() 
has to be a *text string*.  I should've had:


fortOK <- requireNamespace("fortunes",quietly=TRUE)

So require() takes a package name, but requireNamespace() takes a 
*string* specifying the package name.  Trap for young players.


Psigh.

cheers,

Rolf Turner

On 11/18/18 11:16 AM, Rolf Turner wrote:


I am building a package which contains a function from which I wish to 
call the fortune() function from the fortunes package --- if that 
package is available.


I have place the line

    Suggests: fortunes

in the DESCRIPTION file.

In my code for the function that I am writing (let's call it "foo")
I put


fortOK <- requireNamespace(fortunes,quietly=TRUE)
    if(fortOK) {
    fortunes::fortune()
    }


thinking that I was following all of the prescriptions in "Writing R 
Extensions".  Yet when I do R CMD check on the package I get



* checking R code for possible problems ... NOTE
foo: no visible binding for global variable ‘fortunes’
Undefined global functions or variables:
  fortunes


What am I doing wrong?

Thanks for any insight.

cheers,

Rolf Turner

P. S.  I tried putting an "imports(fortunes)" in the NAMESPACE file, but 
this just made matters worse:



* checking package dependencies ... ERROR
Namespace dependency not required: ‘fortunes’


Huh?  What on earth is this actually saying?  I cannot parse this error 
message.


R. T.




--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Suggests field in a DESCRIPTION file.

2018-11-17 Thread Fox, John
Dear Rolf,

"fortunes" needs to be quoted in requireNamespace("fortunes", quietly=TRUE).

I hope this helps,
 John

-
John Fox
Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: https://socialsciences.mcmaster.ca/jfox/



> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rolf Turner
> Sent: Saturday, November 17, 2018 5:17 PM
> To: r-help@r-project.org
> Subject: [R] The Suggests field in a DESCRIPTION file.
> 
> 
> I am building a package which contains a function from which I wish to call 
> the
> fortune() function from the fortunes package --- if that package is available.
> 
> I have place the line
> 
> Suggests: fortunes
> 
> in the DESCRIPTION file.
> 
> In my code for the function that I am writing (let's call it "foo") I put
> 
> > fortOK <- requireNamespace(fortunes,quietly=TRUE)
> > if(fortOK) {
> > fortunes::fortune()
> > }
> 
> thinking that I was following all of the prescriptions in "Writing R 
> Extensions".
> Yet when I do R CMD check on the package I get
> 
> > * checking R code for possible problems ... NOTE
> > foo: no visible binding for global variable ‘fortunes’
> > Undefined global functions or variables:
> >   fortunes
> 
> What am I doing wrong?
> 
> Thanks for any insight.
> 
> cheers,
> 
> Rolf Turner
> 
> P. S.  I tried putting an "imports(fortunes)" in the NAMESPACE file, but this 
> just
> made matters worse:
> 
> > * checking package dependencies ... ERROR Namespace dependency not
> > required: ‘fortunes’
> 
> Huh?  What on earth is this actually saying?  I cannot parse this error
> message.
> 
> R. T.
> 
> --
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] The Suggests field in a DESCRIPTION file.

2018-11-17 Thread Rolf Turner



I am building a package which contains a function from which I wish to 
call the fortune() function from the fortunes package --- if that 
package is available.


I have place the line

   Suggests: fortunes

in the DESCRIPTION file.

In my code for the function that I am writing (let's call it "foo")
I put


fortOK <- requireNamespace(fortunes,quietly=TRUE)
if(fortOK) {
fortunes::fortune()
}


thinking that I was following all of the prescriptions in "Writing R 
Extensions".  Yet when I do R CMD check on the package I get



* checking R code for possible problems ... NOTE
foo: no visible binding for global variable ‘fortunes’
Undefined global functions or variables:
  fortunes


What am I doing wrong?

Thanks for any insight.

cheers,

Rolf Turner

P. S.  I tried putting an "imports(fortunes)" in the NAMESPACE file, but 
this just made matters worse:



* checking package dependencies ... ERROR
Namespace dependency not required: ‘fortunes’


Huh?  What on earth is this actually saying?  I cannot parse this error 
message.


R. T.

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiplication of regression coefficient by factor type variable

2018-11-17 Thread Bert Gunter
You shouldn't have to do any of what you are doing.

See ?predict.lm  and note the "newdata" argument.

Also, you should spend some time studying a linear model text, as your
question appears to indicate some basic confusion (e.g. about
"contrasts" ) about how they work.


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sat, Nov 17, 2018 at 1:24 PM Julian Righ Sampedro
 wrote:
>
> Dear all,
>
> In a context of regression, I have three regressors, two of which are
> categorical variables (sex and education) and have class 'factor'.
>
> y = data$income
> x1 = as.factor(data$sex)  # 2 levels
> x2 = data$age  # continuous
> x3 = as.factor(data$ed)  # 8 levels
>
> for example, the first entries of x3 are
>
> head(x3)[1] 5 3 5 5 4 2
> Levels: 1 2 3 4 5 6 7 8
>
> When we call the model, the output looks like this
>
> model1=lm(y ~ x1 + x2 + x3, data = data)
> summary(model1)
>
> Residuals:
> Min 1Q Median 3QMax -31220  -6300   -594   4429 190731
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)(Intercept)  1440.66
>  3809.99   0.378 0.705417
> x1  -4960.88 772.96  -6.418 2.13e-10 ***
> x2181.45  25.03   7.249 8.41e-13 ***
> x32  2174.953453.22   0.630 0.528948
> x33  7497.683428.94   2.187 0.029004 *
> x34  8278.973576.30   2.315 0.020817 *
> x35 13686.883454.93   3.962 7.97e-05 ***
> x36 15902.924408.49   3.607 0.000325 ***
> x37 28773.133696.77   7.783 1.76e-14 ***
> x38 31455.555448.11   5.774 1.03e-08 ***---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 12060 on 1001 degrees of freedom
> Multiple R-squared:  0.2486,Adjusted R-squared:  0.2418
> F-statistic: 36.79 on 9 and 1001 DF,  p-value: < 2.2e-16
>
> Now suppose I want to compute the residuals. To do so I first need to
> compute the prediction by the model. (I use it in a cross validation
> context so it is a partial display of the code)
>
> yhat1 = model1$coef[1] + model1$coef[2]*x1[i] + model1$coef[3]*x2[i] +
> model1$coef[4]*x3[i]
>
> But I get the following warnings
>
> Warning messages:1: In Ops.factor(model1$coef[2], x1[i]) : ‘*’ not
> meaningful for factors2: In Ops.factor(model1$coef[4], x3[i]) : ‘*’
> not meaningful for factors
>
> 1st question: Is there a way to multiply the coefficient by the 'factor'
> without having to transform my 'factor' into a 'numeric' type variable ?
>
> 2nd question: Since x3 is associated with 7 parameters (one for x32, one
> for x33, ... , one for x38), how do I multiply the 'correct' parameter
> coefficient with my 'factor' x3 ?
>
> I have been considering a 'if then' solution, but to no avail. I also have
> considered splitting my x3 variable into 8 binary variables without
> succeeding. What may be the best approach ? Thank you for your help.
>
> Since I understand this my not be specific enough, I add here the complete
> code
>
> # for n-fold cross validation# fit models on leave-one-out samples
> x1= as.factor(data$sex)
> x2= data$age
> x3= as.factor(data$ed)
> yn=data$income
> n = length(yn)
> e1 = e2 = numeric(n)
>
> for (i in 1:n) {
>   # the ith observation is excluded
>   y = yn[-i]
>   x_1 = x1[-i]
>   x_2 = x2[-i]
>   x_3 = x3[-i]
>   x_4 = as.factor(cf4)[-i]
>   # fit the first model without the ith observation
>   J1 = lm(y ~ x_1 + x_2 + x_3)
>   yhat1 = J1$coef[1] + J1$coef[2]*x1[i] + J1$coef[3]*x2[i] + J1$coef[4]*x3[i]
>   # construct the ith part of the loss function for model 1
>   e1[i] = yn[i] - yhat1
>   # fit the second model without the ith observation
>   J2 = lm(y ~ x_1 + x_2 + x_3  + x_4)
>   
> yhat2=J2$coef[1]+J2$coef[2]*x1[i]+J2$coef[3]*x2[i]+J2$coef[4]*x3[i]+J2$coef[5]*cf4[i]
>   e2[i] = yn[i] - yhat2
>  }
>  sqrt(c(mean(e1^2),mean(e2^2))) # RMSE
>
> cf4 is a variable corresponding to groups (using mixtures) . What we want
> to demonstrate is that the prediction error after cross-validation is lower
> when we include this latent grouping variable. It works wonders when the
> categorical variables are treated as 'numeric' variables. Though the ols
> estimates are obviously very different.
>
> Thank you in advance for your views on the problem.
>
> Best Regards,
>
> julian
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

[R] Multiplication of regression coefficient by factor type variable

2018-11-17 Thread Julian Righ Sampedro
Dear all,

In a context of regression, I have three regressors, two of which are
categorical variables (sex and education) and have class 'factor'.

y = data$income
x1 = as.factor(data$sex)  # 2 levels
x2 = data$age  # continuous
x3 = as.factor(data$ed)  # 8 levels

for example, the first entries of x3 are

head(x3)[1] 5 3 5 5 4 2
Levels: 1 2 3 4 5 6 7 8

When we call the model, the output looks like this

model1=lm(y ~ x1 + x2 + x3, data = data)
summary(model1)

Residuals:
Min 1Q Median 3QMax -31220  -6300   -594   4429 190731

Coefficients:
Estimate Std. Error t value Pr(>|t|)(Intercept)  1440.66
 3809.99   0.378 0.705417
x1  -4960.88 772.96  -6.418 2.13e-10 ***
x2181.45  25.03   7.249 8.41e-13 ***
x32  2174.953453.22   0.630 0.528948
x33  7497.683428.94   2.187 0.029004 *
x34  8278.973576.30   2.315 0.020817 *
x35 13686.883454.93   3.962 7.97e-05 ***
x36 15902.924408.49   3.607 0.000325 ***
x37 28773.133696.77   7.783 1.76e-14 ***
x38 31455.555448.11   5.774 1.03e-08 ***---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12060 on 1001 degrees of freedom
Multiple R-squared:  0.2486,Adjusted R-squared:  0.2418
F-statistic: 36.79 on 9 and 1001 DF,  p-value: < 2.2e-16

Now suppose I want to compute the residuals. To do so I first need to
compute the prediction by the model. (I use it in a cross validation
context so it is a partial display of the code)

yhat1 = model1$coef[1] + model1$coef[2]*x1[i] + model1$coef[3]*x2[i] +
model1$coef[4]*x3[i]

But I get the following warnings

Warning messages:1: In Ops.factor(model1$coef[2], x1[i]) : ‘*’ not
meaningful for factors2: In Ops.factor(model1$coef[4], x3[i]) : ‘*’
not meaningful for factors

1st question: Is there a way to multiply the coefficient by the 'factor'
without having to transform my 'factor' into a 'numeric' type variable ?

2nd question: Since x3 is associated with 7 parameters (one for x32, one
for x33, ... , one for x38), how do I multiply the 'correct' parameter
coefficient with my 'factor' x3 ?

I have been considering a 'if then' solution, but to no avail. I also have
considered splitting my x3 variable into 8 binary variables without
succeeding. What may be the best approach ? Thank you for your help.

Since I understand this my not be specific enough, I add here the complete
code

# for n-fold cross validation# fit models on leave-one-out samples
x1= as.factor(data$sex)
x2= data$age
x3= as.factor(data$ed)
yn=data$income
n = length(yn)
e1 = e2 = numeric(n)

for (i in 1:n) {
  # the ith observation is excluded
  y = yn[-i]
  x_1 = x1[-i]
  x_2 = x2[-i]
  x_3 = x3[-i]
  x_4 = as.factor(cf4)[-i]
  # fit the first model without the ith observation
  J1 = lm(y ~ x_1 + x_2 + x_3)
  yhat1 = J1$coef[1] + J1$coef[2]*x1[i] + J1$coef[3]*x2[i] + J1$coef[4]*x3[i]
  # construct the ith part of the loss function for model 1
  e1[i] = yn[i] - yhat1
  # fit the second model without the ith observation
  J2 = lm(y ~ x_1 + x_2 + x_3  + x_4)
  
yhat2=J2$coef[1]+J2$coef[2]*x1[i]+J2$coef[3]*x2[i]+J2$coef[4]*x3[i]+J2$coef[5]*cf4[i]
  e2[i] = yn[i] - yhat2
 }
 sqrt(c(mean(e1^2),mean(e2^2))) # RMSE

cf4 is a variable corresponding to groups (using mixtures) . What we want
to demonstrate is that the prediction error after cross-validation is lower
when we include this latent grouping variable. It works wonders when the
categorical variables are treated as 'numeric' variables. Though the ols
estimates are obviously very different.

Thank you in advance for your views on the problem.

Best Regards,

julian

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] which() function help page precision

2018-11-17 Thread Juan Gomez
Hi again (, I am the PO from my own email account)
I agree that the word  "basically" puts the NA issues aside. But my
point is that  R subsetting  behavior when there are  NAs in a logical
index is quite tricky to say the less, and deserves the trouble of
pointing it out in every place it is appropiate. As  "which()" is the
function I use to overcome this issue, I thought it would be good to
emphasize that  it solves this situation in a different way that the
subsetting operation does.
I am aware that it is clearly stated  above in the help page though.
But I still would change the expression rather than "hiding" this distinction.
And the "diplomatic refinement seems even better to me.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.