Re: [Rd] Bug in model.matrix.default for higher-order interaction encoding when specific model terms are missing

2017-11-06 Thread Arie ten Cate
Hello Tyler,

model.matrix(~(X1+X2+X3)^3-X1:X3)

Here, T_i = X1:X2:X3. Let F_j = X3. (The numerical variables X1 and X2 are
not encoded at all.) Then, again, T_{i(j)} = X1:X2, which in this
example is NOT dropped from the model. Hence the X3 in T_i must be
encoded by contrasts, as indeed it is.
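The rule being discussed (F_j in term T_i is coded by contrasts if T_{i(j)}, i.e. T_i with F_j removed, appears in the formula, and by dummy variables otherwise) can be sketched in R as below. `coding_for()` is a hypothetical helper, not part of R; it only reads the term labels from terms() and assumes the term is written with its variables in the same order terms() uses.

```r
# Hypothetical helper: apply the quoted rule to factor `f_j` inside term `t_i`.
coding_for <- function(formula, t_i, f_j) {
  tt <- terms(formula)
  labels <- attr(tt, "term.labels")
  vars <- strsplit(t_i, ":", fixed = TRUE)[[1]]
  # T_{i(j)}: the term T_i with the factor F_j removed
  margin <- paste(setdiff(vars, f_j), collapse = ":")
  # For a main effect, T_{i(j)} is the intercept
  present <- if (nzchar(margin)) margin %in% labels else attr(tt, "intercept") == 1
  if (present) "contrasts" else "dummy"
}

coding_for(~ (X1 + X2 + X3)^3 - X1:X3, "X1:X2:X3", "X3")  # "contrasts": X1:X2 kept
coding_for(~ (X1 + X2 + X3)^3 - X1:X2, "X1:X2:X3", "X3")  # "dummy": X1:X2 dropped
```

Applied to a numeric F_j (for example X1 in X1:X2:X3 with X2:X3 dropped), the helper reports "dummy", but numeric variables enter the model matrix uncoded; that is the point Arie and Tyler disagree on.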

  Arie

On Mon, Nov 6, 2017 at 5:09 PM, Tyler  wrote:
> Hi Arie,
>
> Given the heuristic, in all of my examples with a missing two-factor
> interaction the three-factor interaction should be coded with dummy
> variables. In reality, it is encoded by dummy variables only when the
> numeric:numeric interaction is missing, and by contrasts for the other two.
> The heuristic does not specify separate behavior for numeric vs categorical
> factors (When the author of Statistical Models in S refers to F_j as a
> "factor", it is a more general usage than the R type "factor" and includes
> numeric variables--the language used later on in the chapter on page 40
> confirms this): when there is a missing marginal term in the formula, the
> higher-order interaction should be coded by dummy variables, regardless of
> type. Thus, the terms() function is only following the cited behavior 1/3rd
> of the time.
>
> Best regards,
> Tyler
>
> On Mon, Nov 6, 2017 at 6:45 AM, Arie ten Cate  wrote:
>>
>> Hello Tyler,
>>
>> You write that you understand what I am saying. However, I am now at a
>> loss about what exactly is the problem with the behavior of R.  Here
>> is a script which reproduces your experiments with three variables
>> (excluding the full model):
>>
>> m=expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))
>> model.matrix(~(X1+X2+X3)^3-X1:X3,data=m)
>> model.matrix(~(X1+X2+X3)^3-X2:X3,data=m)
>> model.matrix(~(X1+X2+X3)^3-X1:X2,data=m)
>>
>> Below are the three results, similar to your first mail. (The first
>> two are basically the same, of course.) Please pick one result which
>> you think is not consistent with the heuristic and please give what
>> you think is the correct result:
>>
>> model.matrix(~(X1+X2+X3)^3-X1:X3)
>>   (Intercept)
>>   X1 X2 X3B X3C
>>   X1:X2 X2:X3B X2:X3C
>>   X1:X2:X3B X1:X2:X3C
>>
>> model.matrix(~(X1+X2+X3)^3-X2:X3)
>>   (Intercept)
>>   X1 X2 X3B X3C
>>   X1:X2 X1:X3B X1:X3C
>>   X1:X2:X3B X1:X2:X3C
>>
>> model.matrix(~(X1+X2+X3)^3-X1:X2)
>>   (Intercept)
>>   X1 X2 X3B X3C
>>   X1:X3B X1:X3C X2:X3B X2:X3C
>>   X1:X2:X3A X1:X2:X3B X1:X2:X3C
>>
>> (I take it that the combination of X3A and X3B and X3C implies dummy
>> encoding, and the combination of only X3B and X3C implies contrasts
>> encoding, with respect to X3A.)
>>
>> Thanks in advance,
>>
>> Arie
>>
>>
>> On Sat, Nov 4, 2017 at 5:33 PM, Tyler  wrote:
>> > Hi Arie,
>> >
>> > I understand what you're saying. The following excerpt out of the book
>> > shows that F_j does not refer exclusively to categorical factors: "...the
>> > rule does not do anything special for them, and it remains valid, in a
>> > trivial sense, whenever any of the F_j is numeric rather than
>> > categorical." Since F_j refers to both categorical and numeric variables,
>> > the behavior of model.matrix is not consistent with the heuristic.
>> >
>> > Best regards,
>> > Tyler
>> >
>> > On Sat, Nov 4, 2017 at 6:50 AM, Arie ten Cate 
>> > wrote:
>> >>
>> >> Hello Tyler,
>> >>
>> >> I rephrase my previous mail, as follows:
>> >>
>> >> In your example, T_i = X1:X2:X3. Let F_j = X3. (The numerical
>> >> variables X1 and X2 are not encoded at all.) Then T_{i(j)} = X1:X2,
>> >> which in the example is dropped from the model. Hence the X3 in T_i
>> >> must be encoded by dummy variables, as indeed it is.
>> >>
>> >>   Arie
>> >>
>> >>
>> >> On Thu, Nov 2, 2017 at 4:11 PM, Tyler  wrote:
>> >> > Hi Arie,
>> >> >
>> >> > The book on which this behavior is based does not use "factor" (in
>> >> > this section) to mean a categorical factor. I will again point to this
>> >> > sentence from page 40, in the same section and referring to the
>> >> > behavior in question, which shows that F_j is not limited to
>> >> > categorical factors: "Numeric variables appear in the computations as
>> >> > themselves, uncoded. Therefore, the rule does not do anything special
>> >> > for them, and it remains valid, in a trivial sense, whenever any of the
>> >> > F_j is numeric rather than categorical."
>> >> >
>> >> > Note the "... whenever any of the F_j is numeric rather than
>> >> > categorical." Factor here is used in the more general sense of the
>> >> > word, not referring to the R type "factor." The behavior of R does not
>> >> > match the heuristic that it cites.
>> >> >
>> >> > Best regards,
>> >> > Tyler
>> >> >
>> >> > On Thu, Nov 2, 2017 at 2:51 AM, Arie ten Cate 
>> >> > wrote:
>> >> >>
>> >> >> Hello Tyler,
>> >> >>
>> >> >> Thank you for searching for, and finding, the basic description 

[Bioc-devel] RStan/ StanCon - maybe of interest

2017-11-06 Thread Aedin Culhane
Hi
I thought I should forward this as it might be of interest to those of 
you using RStan.


StanCon is happening at the beautiful Asilomar conference facility at 
the beach in Monterey California for three days starting January 10, 
2018. We have space for 200 attendees and expect that this will sell 
out, so if you really want to go, register soon.

StanCon offers a dense schedule of invited talks, submitted papers, and 
tutorials unavailable in any other format. Balancing the intellectual 
intensity of cutting-edge statistical modeling are fun activities like 
indoor R/C airplane building/flying/designing 
(http://brooklynaerodrome.com) and 
non-snobby blind wine tasting for after-dinner activities. We will have 
the first ever "wear your poster" reception--see the call for posters 
below. And no parallel sessions--you get the entire StanCon2018, not a 
slice.


Go to http://mc-stan.org/events/stancon2018 to register.


*More details:*

We have 7 invited talks:

  * Andrew Gelman, Department of Statistics and Political Science,
Columbia University
  * Susan Holmes, Department of Statistics, Stanford University
  * Frank Harrell, School of Medicine and Department of Biostatistics,
Vanderbilt University
  * Sophia Rabe-Hesketh, Educational Statistics and Biostatistics,
University of California, Berkeley
  * Sean Taylor and Ben Letham, Facebook Core Data Science
  * Manuel Rivas, Department of Biomedical Data Science, Stanford University
  * Talia Weiss, Department of Physics, Massachusetts Institute of
Technology

Submitted talks:
We have 18 accepted talks ranging from public policy viewed through 
Bayesian analysis to painful theory papers. Talks are self-contained 
knitr or Jupyter notebooks that will be made publicly available after 
the conference.


Tutorials:
We have tutorials that start at the crack of 8am for those desiring 
further edification beyond the awesome program. Total time ranges from 6 
hours to 1 hour depending on topic; these will run in parallel but won't 
conflict with the main conference.
- Introduction to Stan: Know how to program? Know basic statistics? 
Curious about Bayesian analysis and Stan? This is the course for you. 
Hands-on, focused, and an excellent way to get started working in Stan. 2 
hours every morning 8am to 10am.
- Executive decision making the Bayesian way: This is for nontechnical 
managers to learn the core of decision making under uncertainty and how 
to interpret the talks that they will be attending the rest of the day. 
1 hour/day every day.
- Advanced Modeling in Stan: The hard stuff led by the best of the best. 
Very interactive, very intense. Varying topics, every day 1-2 hours.


Poster Call for Participation:
We will take poster submissions on a rolling basis until December 5th. 
One page exclusive of references is the desired format but anything that 
gives us enough information to make a decision is fine. We will 
accept/reject within 48 hours. Send to stancon2...@mc-stan.org.
The only somewhat odd requirement is that your poster must be "wearable" 
to the 5pm reception where you will be a walking presentation. It's a great 
way to network. Signboard supplies will be available, so you need only have 
sheets of paper, which can be attached to signboard material that, 
coincidentally, will be the source airframe material for the R/C airplane 
activities following dinner.


That's it! StanCon2018 is going to be a pressure cooker of learning and 
fun. Don't miss it.


Early bird registration ends November 10, 2017.

*Go to http://mc-stan.org/events/stancon2018 and register.*


Lizzie and the rest of the StanCon Organizing Committee



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [R-pkg-devel] Error in globalVariables(".") : could not find function "globalVariables"

2017-11-06 Thread Anthony Ebert
Thank you Uwe Ligges!

I did a grep (including all sub-directories) for globalVariables in my
code. It is not in the code at all.

Could it be the generate_input function with its "<<-"?
https://github.com/AnthonyEbert/queuecomputer/blob/master/R/utils.R

It's not exported and it's not used internally by anything. I just use
it as a shortcut when debugging my code.

Best,
Anthony Ebert

On Mon, Nov 6, 2017 at 11:34 PM, Uwe Ligges
 wrote:
>
>
> On 06.11.2017 07:51, Anthony Ebert wrote:
>>
>> Hello r-package-devel,
>>
>> I have updated my R package queuecomputer from 0.8.1 to 0.8.2. The
>> source code is here https://github.com/AnthonyEbert/queuecomputer .
>>
>> I have checked queuecomputer-0.8.2 with rhub and win-builder with
>> status OK for every platform.
>>
>> After submission to CRAN I got an incoming check error.
>>
>> https://github.com/AnthonyEbert/queuecomputer/blob/master/cran-comments.md
>>
>>...
>>Error in library.dynam(lib, package, package.lib) :
>>  DLL 'dplyr' not found: maybe not installed for this architecture?
>>...
>
>
> Ignore this: a race condition on the check system.
>
>
>>
>> I chose to ignore that and resubmit with exactly the same source code.
>> I now get a NOTE in a different place, further along in the check.
>>
>>...
>>* checking dependencies in R code ... NOTE
>>Error in globalVariables(".") : could not find function "globalVariables"
>
>
> Can you grep for globalVariables in your code?
> This is from package utils and hence you may have to declare it.
>
> Best,
> Uwe Ligges
>
>
>
>
>>...
>>* checking PDF version of manual ... OK
>>* DONE
>>Status: 1 NOTE
>>
>> See
>> https://win-builder.r-project.org/incoming_pretest/171106_065442_queuecomputer_082/00check.log
>> for full output.
>>
>> I don't use that function anywhere in queuecomputer. Is this something
>> I should worry about? Or should I try again?
>>
>> Thanks for your help! Much appreciated.
>>
>> Anthony Ebert
>>
>> __
>> R-package-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>>
>



Re: [Rd] Bug in model.matrix.default for higher-order interaction encoding when specific model terms are missing

2017-11-06 Thread Tyler
Hi Arie,

Given the heuristic, in all of my examples with a missing two-factor
interaction the three-factor interaction should be coded with dummy
variables. In reality, it is encoded by dummy variables only when the
numeric:numeric interaction is missing, and by contrasts for the other two.
The heuristic does not specify separate behavior for numeric vs categorical
factors (When the author of Statistical Models in S refers to F_j as a
"factor", it is a more general usage than the R type "factor" and includes
numeric variables--the language used later on in the chapter on page 40
confirms this): when there is a missing marginal term in the formula, the
higher-order interaction should be coded by dummy variables, regardless of
type. Thus, the terms() function is only following the cited behavior 1/3rd
of the time.

Best regards,
Tyler

On Mon, Nov 6, 2017 at 6:45 AM, Arie ten Cate  wrote:

> Hello Tyler,
>
> You write that you understand what I am saying. However, I am now at a
> loss about what exactly is the problem with the behavior of R.  Here
> is a script which reproduces your experiments with three variables
> (excluding the full model):
>
> m=expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))
> model.matrix(~(X1+X2+X3)^3-X1:X3,data=m)
> model.matrix(~(X1+X2+X3)^3-X2:X3,data=m)
> model.matrix(~(X1+X2+X3)^3-X1:X2,data=m)
>
> Below are the three results, similar to your first mail. (The first
> two are basically the same, of course.) Please pick one result which
> you think is not consistent with the heuristic and please give what
> you think is the correct result:
>
> model.matrix(~(X1+X2+X3)^3-X1:X3)
>   (Intercept)
>   X1 X2 X3B X3C
>   X1:X2 X2:X3B X2:X3C
>   X1:X2:X3B X1:X2:X3C
>
> model.matrix(~(X1+X2+X3)^3-X2:X3)
>   (Intercept)
>   X1 X2 X3B X3C
>   X1:X2 X1:X3B X1:X3C
>   X1:X2:X3B X1:X2:X3C
>
> model.matrix(~(X1+X2+X3)^3-X1:X2)
>   (Intercept)
>   X1 X2 X3B X3C
>   X1:X3B X1:X3C X2:X3B X2:X3C
>   X1:X2:X3A X1:X2:X3B X1:X2:X3C
>
> (I take it that the combination of X3A and X3B and X3C implies dummy
> encoding, and the combination of only X3B and X3C implies contrasts
> encoding, with respect to X3A.)
>
> Thanks in advance,
>
> Arie
>
>
> On Sat, Nov 4, 2017 at 5:33 PM, Tyler  wrote:
> > Hi Arie,
> >
> > I understand what you're saying. The following excerpt out of the book
> > shows that F_j does not refer exclusively to categorical factors: "...the
> > rule does not do anything special for them, and it remains valid, in a
> > trivial sense, whenever any of the F_j is numeric rather than
> > categorical." Since F_j refers to both categorical and numeric variables,
> > the behavior of model.matrix is not consistent with the heuristic.
> >
> > Best regards,
> > Tyler
> >
> > On Sat, Nov 4, 2017 at 6:50 AM, Arie ten Cate wrote:
> >>
> >> Hello Tyler,
> >>
> >> I rephrase my previous mail, as follows:
> >>
> >> In your example, T_i = X1:X2:X3. Let F_j = X3. (The numerical
> >> variables X1 and X2 are not encoded at all.) Then T_{i(j)} = X1:X2,
> >> which in the example is dropped from the model. Hence the X3 in T_i
> >> must be encoded by dummy variables, as indeed it is.
> >>
> >>   Arie
> >>
> >>
> >> On Thu, Nov 2, 2017 at 4:11 PM, Tyler  wrote:
> >> > Hi Arie,
> >> >
> >> > The book on which this behavior is based does not use "factor" (in
> >> > this section) to mean a categorical factor. I will again point to this
> >> > sentence from page 40, in the same section and referring to the
> >> > behavior in question, which shows that F_j is not limited to
> >> > categorical factors: "Numeric variables appear in the computations as
> >> > themselves, uncoded. Therefore, the rule does not do anything special
> >> > for them, and it remains valid, in a trivial sense, whenever any of the
> >> > F_j is numeric rather than categorical."
> >> >
> >> > Note the "... whenever any of the F_j is numeric rather than
> >> > categorical." Factor here is used in the more general sense of the
> >> > word, not referring to the R type "factor." The behavior of R does not
> >> > match the heuristic that it cites.
> >> >
> >> > Best regards,
> >> > Tyler
> >> >
> >> > On Thu, Nov 2, 2017 at 2:51 AM, Arie ten Cate 
> >> > wrote:
> >> >>
> >> >> Hello Tyler,
> >> >>
> >> >> Thank you for searching for, and finding, the basic description of the
> >> >> behavior of R in this matter.
> >> >>
> >> >> I think your example is in agreement with the book.
> >> >>
> >> >> But let me first note the following. You write: "F_j refers to a
> >> >> factor (variable) in a model and not a categorical factor". However:
> >> >> "a factor is a vector object used to specify a discrete
> >> >> classification" (start of chapter 4 of "An Introduction to R".) You
> >> >> might also see the description of the R function factor().
> >> >>
> >> >> You note that the book says about 

Re: [Rd] ans[nas] <- NA in 'ifelse' (was: ifelse() woes ... can we agree on a ifelse2() ?)

2017-11-06 Thread Martin Maechler
> Suharto Anggono Suharto Anggono 
> on Sat, 4 Nov 2017 12:11:48 + writes:

> Removal of
> ans[nas] <- NA
> from the code of function 'ifelse' in R is not committed (yet). Why?

because I have been using it in my version of R-devel for this whole
year, but have forgotten to commit it.

Thank you for the reminder.  I have committed it now :


r73681 | maechler | 2017-11-06 15:39:20 +0100 (Mon, 06 Nov 2017) | 3 lines

drop extraneous line, as suggested by Suharto Anggono, and as "promised" on 
R-devel list, Nov 26, 2016




> On Mon, 28/11/16, Martin Maechler  wrote:

> Subject: Re: [Rd] ifelse() woes ... can we agree on a ifelse2() ?
> To: "Suharto Anggono" 
> Cc: R-devel@r-project.org, maech...@stat.math.ethz.ch
> Date: Monday, 28 November, 2016, 10:00 PM
 
> Suharto Anggono Suharto Anggono via R-devel 
> on Sat, 26 Nov 2016 17:14:01 + writes:

> ...


>> On current 'ifelse' code in R:

>> * The part
>> ans[nas] <- NA
>> could be omitted because NA's are already in place.
>> If the part is removed, variable 'nas' is no longer used.

> I agree that this seems logical.  If I apply the change, R's own
> full checks do not seem affected, and I may try to commit that
> change and "wait and see".

> ...
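A simplified sketch of why the line is redundant, assuming the relevant part of ifelse() starts from `ans <- test` and fills only the positions returned by which(). `my_ifelse()` is hypothetical and omits the attribute and type handling of the real function: which() skips NA positions, so they are never overwritten and stay NA (coerced to the result type) without any `ans[nas] <- NA` step.

```r
# Hypothetical, simplified ifelse(): no attribute or S4 handling.
my_ifelse <- function(test, yes, no) {
  ans <- test                # NA entries of `test` start out as NA in `ans`
  len <- length(ans)
  ypos <- which(test)        # which() skips NA, so NA slots are never touched
  npos <- which(!test)
  if (length(ypos) > 0L) ans[ypos] <- rep(yes, length.out = len)[ypos]
  if (length(npos) > 0L) ans[npos] <- rep(no, length.out = len)[npos]
  ans                        # NA slots were coerced along with `ans`, never replaced
}

my_ifelse(c(TRUE, NA, FALSE), 1, 2)  # c(1, NA, 2)
```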

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] How to include examples that run > 5 secs and contain personal info

2017-11-06 Thread Maxime Turgeon
Hi Jorge,


All the Rd files are available to users if they download the source code. They 
are also available through the CRAN mirror on Github (github.com/cran). In 
other words, you shouldn't put personal information in the help files, even if 
you use the 'dontshow' macro.
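One possible way to keep real addresses out of the Rd sources (not something proposed in this thread) is to read the address from an environment variable at run time and skip the call when it is unset. The variable name `MYPKG_EMAIL`, the helper `example_email()` and the `download_data()` call are hypothetical placeholders.

```r
# Hypothetical helper: the address comes from an environment variable,
# so no real address is written into the Rd files or the package sources.
example_email <- function(var = "MYPKG_EMAIL") {
  email <- Sys.getenv(var, unset = "")
  if (nzchar(email)) email else NA_character_
}

# Example body: does nothing (and so finishes instantly) when the variable is unset.
email <- example_email()
if (!is.na(email)) {
  # download_data(email = email)  # hypothetical package function
  message("An address is configured; the download would run here.")
}
```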


Max


From: R-package-devel  on behalf of 
Jorge Cimentada 
Sent: November 6, 2017 8:25:22 AM
To: r-package-devel@r-project.org
Subject: [R-pkg-devel] How to include examples that run > 5 secs and contain 
personal info

Hi,

I'm in the process of submitting a package to CRAN and I've run into an
issue with the examples of some functions. The purpose of nearly all
functions of the package is to download data from a website. CRAN raised
one main issue:

1) If it's feasible, include examples which are executed in < 5 secs.

Downloading data usually takes between 5 and 10 seconds in the best case. I've
included most examples in \dontrun{} specifically because of this. There is
no way of creating smaller toy examples that run in less time.

Every time the function is run, the user must specify his/her email, which
is private information. Because the examples are not being executed, I've
included fake emails such as y...@email.com to show the user how it works.
The CRAN team suggested using something like:

-

\examples{
  examples for users:
  executable in < 5 sec
  for checks
  \dontshow{
    examples for checks:
    executable in < 5 sec together with the examples above
    not shown to users
  }
  \donttest{
    further examples for users (not used for checks)
  }
}

-

To the best of my knowledge, there is no way to get examples that always
run in < 5 secs, so it's not feasible to include them in \dontshow{}.
Also, I'm not sure if users can have access to \dontshow{} in any way by
going to the specific docs of the function (either on Github or somewhere
on the CRAN repo). I wouldn't want to reveal personal emails in any way,
even if it's for the tests.

Any idea how to handle this situation?

Note: I do include extensive tests, which run successfully on
Travis CI, and this wasn't an issue in the CRAN submission.

---


Jorge Cimentada
*https://cimentadaj.github.io/ *



Re: [R-pkg-devel] How to include examples that run > 5 secs and contain personal info

2017-11-06 Thread Uwe Ligges
Then simply explain this situation to the CRAN team and "they" will 
probably let the package pass.


Best,
Uwe Ligges


On 06.11.2017 14:25, Jorge Cimentada wrote:

Hi,

I'm in the process of submitting a package to CRAN and I've run into an
issue with the examples of some functions. The purpose of nearly all
functions of the package is to download data from a website. CRAN raised
one main issue:

1) If it's feasible, include examples which are executed in < 5 secs.

Downloading data usually takes between 5 and 10 seconds in the best case. I've
included most examples in \dontrun{} specifically because of this. There is
no way of creating smaller toy examples that run in less time.

Every time the function is run, the user must specify his/her email, which
is private information. Because the examples are not being executed, I've
included fake emails such as y...@email.com to show the user how it works.
The CRAN team suggested using something like:

-

\examples{
  examples for users:
  executable in < 5 sec
  for checks
  \dontshow{
    examples for checks:
    executable in < 5 sec together with the examples above
    not shown to users
  }
  \donttest{
    further examples for users (not used for checks)
  }
}

-

To the best of my knowledge, there is no way to get examples that always
run in < 5 secs, so it's not feasible to include them in \dontshow{}.
Also, I'm not sure if users can have access to \dontshow{} in any way by
going to the specific docs of the function (either on Github or somewhere
on the CRAN repo). I wouldn't want to reveal personal emails in any way,
even if it's for the tests.

Any idea how to handle this situation?

Note: I do include extensive tests, which run successfully on
Travis CI, and this wasn't an issue in the CRAN submission.

---


Jorge Cimentada
*https://cimentadaj.github.io/ *



Re: [R-pkg-devel] Error in globalVariables(".") : could not find function "globalVariables"

2017-11-06 Thread Uwe Ligges



On 06.11.2017 07:51, Anthony Ebert wrote:

Hello r-package-devel,

I have updated my R package queuecomputer from 0.8.1 to 0.8.2. The
source code is here https://github.com/AnthonyEbert/queuecomputer .

I have checked queuecomputer-0.8.2 with rhub and win-builder with
status OK for every platform.

After submission to CRAN I got an incoming check error.

https://github.com/AnthonyEbert/queuecomputer/blob/master/cran-comments.md

   ...
   Error in library.dynam(lib, package, package.lib) :
 DLL 'dplyr' not found: maybe not installed for this architecture?
   ...


Ignore this: a race condition on the check system.




I chose to ignore that and resubmit with exactly the same source code.
I now get a NOTE in a different place, further along in the check.

   ...
   * checking dependencies in R code ... NOTE
   Error in globalVariables(".") : could not find function "globalVariables"


Can you grep for globalVariables in your code?
This is from package utils and hence you may have to declare it.
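If a package does call globalVariables(), one common way to declare it is to call it through the utils namespace from a file in R/. The file name below is a hypothetical choice, and the getRversion() guard reflects that globalVariables() first appeared in R 2.15.1; this is a sketch, not a diagnosis of the NOTE above.

```r
# R/globals.R (hypothetical file name) in the package sources.
# The utils:: prefix resolves the function whether or not utils is attached.
if (getRversion() >= "2.15.1") {
  utils::globalVariables(c("."))
}
```

Alternatively, `importFrom(utils, globalVariables)` in NAMESPACE, with utils listed under Imports in DESCRIPTION, makes the unqualified name available to the package code.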

Best,
Uwe Ligges





   ...
   * checking PDF version of manual ... OK
   * DONE
   Status: 1 NOTE

See 
https://win-builder.r-project.org/incoming_pretest/171106_065442_queuecomputer_082/00check.log
for full output.

I don't use that function anywhere in queuecomputer. Is this something
I should worry about? Or should I try again?

Thanks for your help! Much appreciated.

Anthony Ebert



Re: [Bioc-devel] Dependency problems for package CEMiTool

2017-11-06 Thread Shepherd, Lori
Hi Pedro,


Thank you for the concern.  Some CRAN packages do not have binaries yet for R 
3.5 as mentioned in the following mailing list post. 
https://stat.ethz.ch/pipermail/bioc-devel/2017-November/012265.html


If your package is building on the other platforms, you can temporarily ignore 
the failure on Windows.


Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


From: Bioc-devel  on behalf of Pedro Russo 

Sent: Monday, November 6, 2017 8:17:53 AM
To: bioc-devel@r-project.org
Subject: [Bioc-devel] Dependency problems for package CEMiTool

Hi all,

I'm one of the maintainers for the newly accepted package CEMiTool. I'm
writing because our devel branch is getting the following error on Windows
(tokay2): "ERROR: dependencies 'gRbase', 'clusterProfiler', 'igraph',
'intergraph' are not available for package 'CEMiTool' ". These dependencies
are not new to the package, and the new RELEASE_3_6 branch doesn't show
this error in the build, so I'm wondering if there's possibly a problem
when tokay2 sets up the build environment? Any help would be much
appreciated.

Best,
Pedro Russo






[Bioc-devel] Dependency problems for package CEMiTool

2017-11-06 Thread Pedro Russo
Hi all,

I'm one of the maintainers for the newly accepted package CEMiTool. I'm
writing because our devel branch is getting the following error on Windows
(tokay2): "ERROR: dependencies 'gRbase', 'clusterProfiler', 'igraph',
'intergraph' are not available for package 'CEMiTool' ". These dependencies
are not new to the package, and the new RELEASE_3_6 branch doesn't show
this error in the build, so I'm wondering if there's possibly a problem
when tokay2 sets up the build environment? Any help would be much
appreciated.

Best,
Pedro Russo



Re: [Rd] Bug in model.matrix.default for higher-order interaction encoding when specific model terms are missing

2017-11-06 Thread Arie ten Cate
Hello Tyler,

You write that you understand what I am saying. However, I am now at a
loss about what exactly is the problem with the behavior of R.  Here
is a script which reproduces your experiments with three variables
(excluding the full model):

m=expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))
model.matrix(~(X1+X2+X3)^3-X1:X3,data=m)
model.matrix(~(X1+X2+X3)^3-X2:X3,data=m)
model.matrix(~(X1+X2+X3)^3-X1:X2,data=m)

Below are the three results, similar to your first mail. (The first
two are basically the same, of course.) Please pick one result which
you think is not consistent with the heuristic and please give what
you think is the correct result:

model.matrix(~(X1+X2+X3)^3-X1:X3)
  (Intercept)
  X1 X2 X3B X3C
  X1:X2 X2:X3B X2:X3C
  X1:X2:X3B X1:X2:X3C

model.matrix(~(X1+X2+X3)^3-X2:X3)
  (Intercept)
  X1 X2 X3B X3C
  X1:X2 X1:X3B X1:X3C
  X1:X2:X3B X1:X2:X3C

model.matrix(~(X1+X2+X3)^3-X1:X2)
  (Intercept)
  X1 X2 X3B X3C
  X1:X3B X1:X3C X2:X3B X2:X3C
  X1:X2:X3A X1:X2:X3B X1:X2:X3C

(I take it that the combination of X3A and X3B and X3C implies dummy
encoding, and the combination of only X3B and X3C implies contrasts
encoding, with respect to X3A.)
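To check which encoding R chose without reading the whole matrix, one can list the X3 levels that get a column in the three-way block; a column for the baseline level A indicates dummy coding. `three_way_levels()` is a hypothetical helper built on the data set from the script above, and the expected results are the column names listed in this message.

```r
# Data set from the script above: two numeric variables and a 3-level factor.
m <- expand.grid(X1 = c(1, -1), X2 = c(1, -1), X3 = c("A", "B", "C"))

# Hypothetical helper: X3 levels that get a column in the X1:X2:X3 block.
three_way_levels <- function(f, data = m) {
  cols <- colnames(model.matrix(f, data = data))
  sub("^X1:X2:X3", "", grep("^X1:X2:X3", cols, value = TRUE))
}

three_way_levels(~ (X1 + X2 + X3)^3 - X1:X3)  # "B" "C"      (contrasts)
three_way_levels(~ (X1 + X2 + X3)^3 - X1:X2)  # "A" "B" "C"  (dummy variables)
```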

Thanks in advance,

Arie


On Sat, Nov 4, 2017 at 5:33 PM, Tyler  wrote:
> Hi Arie,
>
> I understand what you're saying. The following excerpt out of the book shows
> that F_j does not refer exclusively to categorical factors: "...the rule
> does not do anything special for them, and it remains valid, in a trivial
> sense, whenever any of the F_j is numeric rather than categorical." Since
> F_j refers to both categorical and numeric variables, the behavior of
> model.matrix is not consistent with the heuristic.
>
> Best regards,
> Tyler
>
> On Sat, Nov 4, 2017 at 6:50 AM, Arie ten Cate  wrote:
>>
>> Hello Tyler,
>>
>> I rephrase my previous mail, as follows:
>>
>> In your example, T_i = X1:X2:X3. Let F_j = X3. (The numerical
>> variables X1 and X2 are not encoded at all.) Then T_{i(j)} = X1:X2,
>> which in the example is dropped from the model. Hence the X3 in T_i
>> must be encoded by dummy variables, as indeed it is.
>>
>>   Arie
>>
>>
>> On Thu, Nov 2, 2017 at 4:11 PM, Tyler  wrote:
>> > Hi Arie,
>> >
>> > The book on which this behavior is based does not use "factor" (in
>> > this section) to mean a categorical factor. I will again point to this
>> > sentence from page 40, in the same section and referring to the
>> > behavior in question, which shows that F_j is not limited to
>> > categorical factors: "Numeric variables appear in the computations as
>> > themselves, uncoded. Therefore, the rule does not do anything special
>> > for them, and it remains valid, in a trivial sense, whenever any of the
>> > F_j is numeric rather than categorical."
>> >
>> > Note the "... whenever any of the F_j is numeric rather than
>> > categorical." Factor here is used in the more general sense of the
>> > word, not referring to the R type "factor." The behavior of R does not
>> > match the heuristic that it cites.
>> >
>> > Best regards,
>> > Tyler
>> >
>> > On Thu, Nov 2, 2017 at 2:51 AM, Arie ten Cate 
>> > wrote:
>> >>
>> >> Hello Tyler,
>> >>
>> >> Thank you for searching for, and finding, the basic description of the
>> >> behavior of R in this matter.
>> >>
>> >> I think your example is in agreement with the book.
>> >>
>> >> But let me first note the following. You write: "F_j refers to a
>> >> factor (variable) in a model and not a categorical factor". However:
>> >> "a factor is a vector object used to specify a discrete
>> >> classification" (start of chapter 4 of "An Introduction to R".) You
>> >> might also see the description of the R function factor().
>> >>
>> >> You note that the book says about a factor F_j:
>> >>   "... F_j is coded by contrasts if T_{i(j)} has appeared in the
>> >> formula and by dummy variables if it has not"
>> >>
>> >> You find:
>> >>"However, the example I gave demonstrated that this dummy variable
>> >> encoding only occurs for the model where the missing term is the
>> >> numeric-numeric interaction, ~(X1+X2+X3)^3-X1:X2."
>> >>
>> >> We have here T_i = X1:X2:X3. Also: F_j = X3 (the only factor). Then
>> >> T_{i(j)} = X1:X2, which is dropped from the model. Hence the X3 in T_i
>> >> must be encoded by dummy variables, as indeed it is.
>> >>
>> >>   Arie
>> >>
>> >> On Tue, Oct 31, 2017 at 4:01 PM, Tyler  wrote:
>> >> > Hi Arie,
>> >> >
>> >> > Thank you for your further research into the issue.
>> >> >
>> >> > Regarding Stata: On the other hand, JMP gives model matrices that use
>> >> > the main effects contrasts in computing the higher order interactions,
>> >> > without the dummy variable encoding. I verified this both by analyzing
>> >> > the linear model given in my first example and noting that JMP has one
>> >> > more degree of
>> >> > 
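The coding rule quoted from Statistical Models in S can be checked empirically by counting the model-matrix columns R generates for the three-way term. A minimal sketch, reusing the design from Arie's script earlier in the thread; the helper three_way_cols() is hypothetical. Because X1 and X2 are numeric, that count equals the number of columns used for X3 inside X1:X2:X3: 2 under contrast coding, 3 under dummy coding. Consistent with what both Tyler and Arie report, it should give 2, 2 and 3, i.e. dummy coding only when X1:X2 is dropped. It shows what R does, not which reading of the book's rule is the intended one.

```r
# Design from the script quoted earlier in the thread: two numeric
# variables (X1, X2) and one three-level factor (X3).
m <- expand.grid(X1 = c(1, -1), X2 = c(1, -1), X3 = c("A", "B", "C"))

# Hypothetical helper: number of model-matrix columns generated for the
# three-way term X1:X2:X3. Since X1 and X2 are numeric, this is the number
# of columns coding X3 in that term: 2 (contrasts) or 3 (dummy variables).
three_way_cols <- function(formula) {
  mm <- model.matrix(formula, data = m)
  labels <- attr(terms(formula), "term.labels")
  # "assign" maps each column to its term's index in term.labels
  sum(attr(mm, "assign") == match("X1:X2:X3", labels))
}

res <- sapply(list(
  drop_X1X3 = ~ (X1 + X2 + X3)^3 - X1:X3,
  drop_X2X3 = ~ (X1 + X2 + X3)^3 - X2:X3,
  drop_X1X2 = ~ (X1 + X2 + X3)^3 - X1:X2
), three_way_cols)
print(res)
```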

Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-06 Thread Duncan Murdoch

On 05/11/2017 10:58 AM, peter dalgaard wrote:
>
>> On 5 Nov 2017, at 15:17, Duncan Murdoch wrote:
>>
>> On 04/11/2017 10:20 PM, Daniel Nordlund wrote:

>>> Tirthankar,
>>> "random number generators" do not produce random numbers.  Any given
>>> generator produces a fixed sequence of numbers that appear to meet
>>> various tests of randomness.  By picking a seed you enter that sequence
>>> in a particular place and subsequent numbers in the sequence appear to
>>> be unrelated.  There are no guarantees that if YOU pick a SET of seeds
>>> they won't produce a set of values that are of a similar magnitude.
>>> You can likely solve your problem by following Radford Neal's advice of
>>> not using the first number from each seed.  However, you don't need
>>> to use anything more than the second number.  So, you can modify your
>>> function as follows:
>>>
>>> function(x) {
>>>   set.seed(x, kind = "default")
>>>   y = runif(2, 17, 26)
>>>   return(y[2])
>>> }
>>>
>>> Hope this is helpful,


>> That's assuming that the chosen seeds are unrelated to the function output,
>> which seems unlikely on the face of it.  You can certainly choose a set of
>> seeds that give high values on the second draw just as easily as you can
>> choose seeds that give high draws on the first draw.
>>
>> The interesting thing about this problem is that Tirthankar doesn't believe
>> that the seed selection process is aware of the function output.  I would say
>> that it must be, and if he is worried about the output he should be
>> investigating how that happens; he shouldn't be worrying about R's RNG.



> Hmm, no. The basic issue is that RNGs are constructed so that, with
> x_{n+1} = f(x_n), the sequence x_1, x_2, x_3, ... will look random, not so
> that f(s_1), f(s_2), f(s_3), ... will look random for any s_1, s_2, ... .
> This is true even if the seeds s_1, s_2, ... are not chosen so as to mess
> with the RNG. In the present case, it seems that the seeds around 86e6 tend
> to give similar output. On the other hand, it is not _just_ the similarity
> in magnitude that does it; try e.g.
>
> s <- as.integer(runif(100, 86.54e6, 86.98e6))
> r <- sapply(s, function(s){set.seed(s); runif(1,17,26)})
> plot(s,r, pch=".")
>
> and no obvious pattern emerges. My best guess is that the seeds are not only
> of similar magnitude, but also have other bit-pattern similarities.
>
> (Isn't there a Knuth quote to the effect that "Every random number generator
> will fail in at least one application"?)
>
> One remaining issue is whether it is really true that the same seeds give
> different output on different platforms. That shouldn't happen, I believe.


I don't think there's a platform difference if the same generator is 
used. In my tests, I get the Ubuntu results on both MacOS and Windows. 
In one of the earlier messages, Tirthankar said he was using 
RNGkind(kind = NULL), which means earlier experiments with a different 
generator would taint the results.
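Duncan's point about leftover generator settings can be made concrete. A minimal sketch, assuming only the documented kind and normal.kind arguments of set.seed(); the helper draw_fixed() and the seed value are illustrative, not from the thread. Passing kind explicitly makes the draw independent of whatever RNGkind() an earlier experiment left in place:

```r
# Hypothetical helper: first U(17, 26) draw for a seed, with the uniform
# and normal generators pinned explicitly rather than inherited from
# whatever RNGkind() a previous experiment left behind.
draw_fixed <- function(seed) {
  set.seed(seed, kind = "Mersenne-Twister", normal.kind = "Inversion")
  runif(1, 17, 26)
}

seed <- 86546000L  # illustrative seed in the range discussed in the thread

# Simulate an earlier experiment that switched to another generator.
RNGkind("Wichmann-Hill")
a <- draw_fixed(seed)

# Restore R's default generator and draw again.
RNGkind("default")
b <- draw_fixed(seed)

identical(a, b)  # TRUE: the explicit kind overrides the leftover setting
```

Daniel's version, set.seed(x, kind = "default"), pins the uniform generator in the same way.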


Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-06 Thread Serguei Sokol

On 05/11/2017 at 15:17, Duncan Murdoch wrote:

> On 04/11/2017 10:20 PM, Daniel Nordlund wrote:
>
>> Tirthankar,
>>
>> "random number generators" do not produce random numbers.  Any given
>> generator produces a fixed sequence of numbers that appear to meet
>> various tests of randomness.  By picking a seed you enter that sequence
>> in a particular place and subsequent numbers in the sequence appear to
>> be unrelated.  There are no guarantees that if YOU pick a SET of seeds
>> they won't produce a set of values that are of a similar magnitude.
>>
>> You can likely solve your problem by following Radford Neal's advice of
>> not using the first number from each seed.  However, you don't need
>> to use anything more than the second number.  So, you can modify your
>> function as follows:
>>
>> function(x) {
>>     set.seed(x, kind = "default")
>>     y = runif(2, 17, 26)
>>     return(y[2])
>> }
>>
>> Hope this is helpful,
>
>
> That's assuming that the chosen seeds are unrelated to the function output,
> which seems unlikely on the face of it.  You can certainly choose a set of
> seeds that give high values on the second draw just as easily as you can
> choose seeds that give high draws on the first draw.

To confirm this statement, I did

s2_25 <- s[sapply(s, function(i) {set.seed(i); runif(2, 17, 26)[2] > 25})]
length(s2_25) # 48990

For the record, we had
length(s25) # 48631 out of 439166

which is a very similar length.
So if we take the second or even the 10th pseudo-random value, we can
just as easily (or with just as much difficulty) end up with a set of seeds
giving values in a narrow range.
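A rough sanity check on these counts, assuming both are out of the same 439166 seeds: for a U(17, 26) draw, P(draw > 25) = (26 - 25)/(26 - 17) = 1/9, so about 48796 seeds are expected above 25, with a binomial standard deviation of roughly 208. Both reported counts lie within one standard deviation of that. This does not explain the original bunching; it only suggests that, over this whole range of seeds, neither the first nor the second draw exceeds 25 unusually often. The helper count_above() is hypothetical, generalizing the one-liner above to the k-th draw; since the seed vector s is not reproduced here, a small illustrative block of seeds is used.

```r
# Counts reported above; assumed to be out of the same n = 439166 seeds.
n <- 439166
counts <- c(first_draw = 48631, second_draw = 48990)

# For U(17, 26): P(draw > 25) = (26 - 25) / (26 - 17) = 1/9
p <- (26 - 25) / (26 - 17)
expected <- n * p                    # about 48796.2
sd_binom <- sqrt(n * p * (1 - p))    # binomial SD, about 208.3
z <- (counts - expected) / sd_binom  # about -0.79 and 0.93
print(round(z, 2))

# Hypothetical helper: how many seeds give a k-th U(17, 26) draw above
# `threshold`? Uses whatever RNG kind is currently selected.
count_above <- function(seeds, k, threshold = 25) {
  sum(vapply(seeds, function(i) {
    set.seed(i)
    runif(k, 17, 26)[k] > threshold
  }, logical(1)))
}

# Illustrative only: 1000 consecutive seeds, 10th draw.
count_above(86540000L + 0:999, k = 10)
```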

Serguei.



> The interesting thing about this problem is that Tirthankar doesn't believe
> that the seed selection process is aware of the function output.  I would say
> that it must be, and if he is worried about the output he should be
> investigating how that happens; he shouldn't be worrying about R's RNG.
>
>
> Duncan Murdoch
