Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-03 Thread Tirthankar Chakravarty
Bill,

I appreciate the point that both you and Serguei are making, but the sequence
in question is not a selected or filtered set. These are values as observed
in a sequence from the mechanism described below. The probability of this
exact sequence arising in the wild seems staggeringly small to me.

T

On Fri, Nov 3, 2017 at 11:27 PM, William Dunlap  wrote:

> Another generator is subject to the same problem with the same
> probability.
>
> > Filter(function(s){set.seed(s, kind="Knuth-TAOCP-2002");runif(1,17,26)>25.99}, 1:10000)
>  [1]  280  415  826 1372 2224 2544 3270 3594 3809 4116 4236 5018 5692 7043
> 7212 7364 7747 9256 9491 9568 9886
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Nov 3, 2017 at 10:31 AM, Tirthankar Chakravarty <
> tirthankar.li...@gmail.com> wrote:
>
>>
>> Bill,
>>
>> I have clarified this on SO, and I will copy that clarification in here:
>>
>> "Sure, we tested them on other 8-digit numbers as well & we could not
>> replicate. However, these are honest-to-goodness numbers generated by a
>> non-adversarial system that has no conception of these numbers being used
>> for anything other than a unique key for an entity -- these are not a
>> specially constructed edge case. Would be good to know what seeds will and
>> will not work, and why."
>>
>> These numbers are generated by an application that serves a form, and
>> associates form IDs in a sequence. The application calls our API depending
>> on the form values entered by users, which in turn calls our R code that
>> executes some code that needs an RNG. Since the API has to be stateless, to
>> be able to replicate the results for possible debugging, we need to draw
>> random numbers in a way that we can replicate the results of the API
>> response -- we use the form ID as seeds.
>>
>> I repeat, there is no design or anything adversarial about the way that
>> these numbers were generated -- the system generating these numbers and
>> the users entering inputs have no conception of our use of an RNG -- this
>> is meant to just be a random sequence of form IDs. This issue was
>> discovered completely by chance when the output of the API was observed to
>> be highly non-random. It is possible that it is a 1/10^8 chance, but that
>> is hard to believe, given that the API hit depends on user input. Note also
>> that the issue goes away when we use a different RNG as mentioned below.
>>
>> T
>>
>> On Fri, Nov 3, 2017 at 9:58 PM, William Dunlap  wrote:
>>
>>> The random numbers in a stream initialized with one seed should have
>>> about the desired distribution.  You don't win by changing the seed all the
>>> time.  Your seeds caused the first numbers of a bunch of streams to be
>>> about the same, but the second and subsequent entries in each stream do
>>> look uniformly distributed.
>>>
>>> You didn't say what your 'upstream process' was, but it is easy to come
>>> up with seeds that give about the same first value:
>>>
>>> > Filter(function(s){set.seed(s);runif(1,17,26)>25.99}, 1:10000)
>>>  [1]  514  532 1951 2631 3974 4068 4229 6092 6432 7264 9090
>>>
>>>
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>> On Fri, Nov 3, 2017 at 12:49 AM, Tirthankar Chakravarty <
>>> tirthankar.li...@gmail.com> wrote:
>>>
>>>> This is cross-posted from SO (https://stackoverflow.com/q/47079702/1414455),
>>>> but I now feel that this needs someone from R-Devel to help understand
>>>> why this is happening.
>>>>
>>>> We are facing a weird situation in our code when using R's [`runif`][1]
>>>> and setting seed with `set.seed` with the `kind = NULL` option (which
>>>> resolves, unless I am mistaken, to `kind = "default"`; the default being
>>>> `"Mersenne-Twister"`).
>>>>
>>>> We set the seed using (8 digit) unique IDs generated by an upstream
>>>> system, before calling `runif`:
>>>>
>>>> seeds = c(
>>>>   "86548915", "86551615", "86566163", "86577411", "86584144",
>>>>   "86584272", "86620568", "86724613", "86756002", "86768593", "86772411",
>>>>   "86781516", "86794389", "86805854", "86814600", "86835092", "86874179",
>>>>   "86876466", "86901193", "86987847", "86988080")
>>>>
>>>> random_values = sapply(seeds, function(x) {
>>>>   set.seed(x)
>>>>   y = runif(1, 17, 26)
>>>>   return(y)
>>>> })
>>>>
>>>> This gives values that are **extremely** bunched together.
>>>>
>>>> > summary(random_values)
>>>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>>>   25.13   25.36   25.66   25.58   25.83   25.94
>>>>
>>>> This behaviour of `runif` goes away when we use `kind =
>>>> "Knuth-TAOCP-2002"`, and we get values that appear to be much more
>>>> evenly spread out.
>>>>
>>>> random_values = sapply(seeds, function(x) {
>>>>   set.seed(x, kind = "Knuth-TAOCP-2002")
>>>>   y = runif(1, 17, 26)
>>>>   return(y)
>>>> })
>>>>
>>>> *Output omitted.*
>>>>
>>>> ---

Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-03 Thread William Dunlap via R-devel
Another generator is subject to the same problem with the same
probability.

> Filter(function(s){set.seed(s, kind="Knuth-TAOCP-2002");runif(1,17,26)>25.99}, 1:10000)
 [1]  280  415  826 1372 2224 2544 3270 3594 3809 4116 4236 5018 5692 7043
7212 7364 7747 9256 9491 9568 9886
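
The comparison between generators can be written out more explicitly. The following is only a sketch: `count_high_first_draws()` is a hypothetical helper, and the seed range `1:10000` and the 25.99 threshold are assumptions chosen to mirror the one-liner above. It counts, per generator, how many seeds give a first draw above the threshold and sets that against the count expected under uniformity.

```r
# Hypothetical helper: count seeds whose FIRST runif(1, 17, 26) draw
# exceeds `threshold`, for a given RNG kind.
count_high_first_draws <- function(kind, seeds = 1:10000, threshold = 25.99) {
  old_kind <- RNGkind()[1]                      # remember the current generator
  on.exit(RNGkind(kind = old_kind), add = TRUE) # restore it when done
  hits <- Filter(function(s) {
    set.seed(s, kind = kind)
    runif(1, 17, 26) > threshold
  }, seeds)
  length(hits)
}

n_mt    <- count_high_first_draws("Mersenne-Twister")
n_knuth <- count_high_first_draws("Knuth-TAOCP-2002")

# Under uniformity, the chance of a first draw above 25.99 is 0.01 / 9 per seed.
expected <- 10000 * (26 - 25.99) / (26 - 17)

print(c(mersenne = n_mt, knuth = n_knuth, expected = expected))
```

Both counts should be small relative to the 10000 seeds tried, which is the point being made: seeds with a near-maximal first draw exist for either generator.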



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Nov 3, 2017 at 10:31 AM, Tirthankar Chakravarty <
tirthankar.li...@gmail.com> wrote:

>
> Bill,
>
> I have clarified this on SO, and I will copy that clarification in here:
>
> "Sure, we tested them on other 8-digit numbers as well & we could not
> replicate. However, these are honest-to-goodness numbers generated by a
> non-adversarial system that has no conception of these numbers being used
> for anything other than a unique key for an entity -- these are not a
> specially constructed edge case. Would be good to know what seeds will and
> will not work, and why."
>
> These numbers are generated by an application that serves a form, and
> associates form IDs in a sequence. The application calls our API depending
> on the form values entered by users, which in turn calls our R code that
> executes some code that needs an RNG. Since the API has to be stateless, to
> be able to replicate the results for possible debugging, we need to draw
> random numbers in a way that we can replicate the results of the API
> response -- we use the form ID as seeds.
>
> I repeat, there is no design or anything adversarial about the way that
> these numbers were generated -- the system generating these numbers and
> the users entering inputs have no conception of our use of an RNG -- this
> is meant to just be a random sequence of form IDs. This issue was
> discovered completely by chance when the output of the API was observed to
> be highly non-random. It is possible that it is a 1/10^8 chance, but that
> is hard to believe, given that the API hit depends on user input. Note also
> that the issue goes away when we use a different RNG as mentioned below.
>
> T
>
> On Fri, Nov 3, 2017 at 9:58 PM, William Dunlap  wrote:
>
>> The random numbers in a stream initialized with one seed should have
>> about the desired distribution.  You don't win by changing the seed all the
>> time.  Your seeds caused the first numbers of a bunch of streams to be
>> about the same, but the second and subsequent entries in each stream do
>> look uniformly distributed.
>>
>> You didn't say what your 'upstream process' was, but it is easy to come
>> up with seeds that give about the same first value:
>>
>> > Filter(function(s){set.seed(s);runif(1,17,26)>25.99}, 1:10000)
>>  [1]  514  532 1951 2631 3974 4068 4229 6092 6432 7264 9090
>>
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Fri, Nov 3, 2017 at 12:49 AM, Tirthankar Chakravarty <
>> tirthankar.li...@gmail.com> wrote:
>>
>>> This is cross-posted from SO (https://stackoverflow.com/q/47079702/1414455),
>>> but I now feel that this needs someone from R-Devel to help understand
>>> why
>>> this is happening.
>>>
>>> We are facing a weird situation in our code when using R's [`runif`][1]
>>> and
>>> setting seed with `set.seed` with the `kind = NULL` option (which
>>> resolves,
>>> unless I am mistaken, to `kind = "default"`; the default being
>>> `"Mersenne-Twister"`).
>>>
>>> We set the seed using (8 digit) unique IDs generated by an upstream
>>> system,
>>> before calling `runif`:
>>>
>>> seeds = c(
>>>   "86548915", "86551615", "86566163", "86577411", "86584144",
>>>   "86584272", "86620568", "86724613", "86756002", "86768593",
>>> "86772411",
>>>   "86781516", "86794389", "86805854", "86814600", "86835092",
>>> "86874179",
>>>   "86876466", "86901193", "86987847", "86988080")
>>>
>>> random_values = sapply(seeds, function(x) {
>>>   set.seed(x)
>>>   y = runif(1, 17, 26)
>>>   return(y)
>>> })
>>>
>>> This gives values that are **extremely** bunched together.
>>>
>>> > summary(random_values)
>>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>>   25.13   25.36   25.66   25.58   25.83   25.94
>>>
>>> This behaviour of `runif` goes away when we use `kind =
>>> "Knuth-TAOCP-2002"`, and we get values that appear to be much more evenly
>>> spread out.
>>>
>>> random_values = sapply(seeds, function(x) {
>>>   set.seed(x, kind = "Knuth-TAOCP-2002")
>>>   y = runif(1, 17, 26)
>>>   return(y)
>>> })
>>>
>>> *Output omitted.*
>>>
>>> ---
>>>
>>> **The most interesting thing here is that this does not happen on Windows
>>> -- only happens on Ubuntu** (`sessionInfo` output for Ubuntu & Windows
>>> below).
>>>
>>> # Windows output: #
>>>
>>> > seeds = c(
>>> +   "86548915", "86551615", "86566163", "86577411", "86584144",
>>> +   "86584272", "86620568", "86724613", "86756002", "86768593",
>>> "86772411",
>>> +   "86781516", "86794389", "86805854", "86814600", "86835092",
>>> "86874179",
>>> +   "86876466", "86901193", "86987847", 

Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-03 Thread Tirthankar Chakravarty
Bill,

I have clarified this on SO, and I will copy that clarification in here:

"Sure, we tested them on other 8-digit numbers as well & we could not
replicate. However, these are honest-to-goodness numbers generated by a
non-adversarial system that has no conception of these numbers being used
for anything other than a unique key for an entity -- these are not a
specially constructed edge case. Would be good to know what seeds will and
will not work, and why."

These numbers are generated by an application that serves a form, and
associates form IDs in a sequence. The application calls our API depending
on the form values entered by users, which in turn calls our R code that
executes some code that needs an RNG. Since the API has to be stateless, to
be able to replicate the results for possible debugging, we need to draw
random numbers in a way that we can replicate the results of the API
response -- we use the form ID as seeds.

I repeat, there is no design or anything adversarial about the way that
these numbers were generated -- the system generating these numbers and the
users entering inputs have no conception of our use of an RNG -- this is
meant to just be a random sequence of form IDs. This issue was discovered
completely by chance when the output of the API was observed to be highly
non-random. It is possible that it is a 1/10^8 chance, but that is hard to
believe, given that the API hit depends on user input. Note also that the
issue goes away when we use a different RNG as mentioned below.

T

On Fri, Nov 3, 2017 at 9:58 PM, William Dunlap  wrote:

> The random numbers in a stream initialized with one seed should have about
> the desired distribution.  You don't win by changing the seed all the
> time.  Your seeds caused the first numbers of a bunch of streams to be
> about the same, but the second and subsequent entries in each stream do
> look uniformly distributed.
>
> You didn't say what your 'upstream process' was, but it is easy to come up
> with seeds that give about the same first value:
>
> > Filter(function(s){set.seed(s);runif(1,17,26)>25.99}, 1:10000)
>  [1]  514  532 1951 2631 3974 4068 4229 6092 6432 7264 9090
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Nov 3, 2017 at 12:49 AM, Tirthankar Chakravarty <
> tirthankar.li...@gmail.com> wrote:
>
>> This is cross-posted from SO (https://stackoverflow.com/q/47079702/1414455),
>> but I now feel that this needs someone from R-Devel to help understand why
>> this is happening.
>>
>> We are facing a weird situation in our code when using R's [`runif`][1]
>> and
>> setting seed with `set.seed` with the `kind = NULL` option (which
>> resolves,
>> unless I am mistaken, to `kind = "default"`; the default being
>> `"Mersenne-Twister"`).
>>
>> We set the seed using (8 digit) unique IDs generated by an upstream
>> system,
>> before calling `runif`:
>>
>> seeds = c(
>>   "86548915", "86551615", "86566163", "86577411", "86584144",
>>   "86584272", "86620568", "86724613", "86756002", "86768593",
>> "86772411",
>>   "86781516", "86794389", "86805854", "86814600", "86835092",
>> "86874179",
>>   "86876466", "86901193", "86987847", "86988080")
>>
>> random_values = sapply(seeds, function(x) {
>>   set.seed(x)
>>   y = runif(1, 17, 26)
>>   return(y)
>> })
>>
>> This gives values that are **extremely** bunched together.
>>
>> > summary(random_values)
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>   25.13   25.36   25.66   25.58   25.83   25.94
>>
>> This behaviour of `runif` goes away when we use `kind =
>> "Knuth-TAOCP-2002"`, and we get values that appear to be much more evenly
>> spread out.
>>
>> random_values = sapply(seeds, function(x) {
>>   set.seed(x, kind = "Knuth-TAOCP-2002")
>>   y = runif(1, 17, 26)
>>   return(y)
>> })
>>
>> *Output omitted.*
>>
>> ---
>>
>> **The most interesting thing here is that this does not happen on Windows
>> -- only happens on Ubuntu** (`sessionInfo` output for Ubuntu & Windows
>> below).
>>
>> # Windows output: #
>>
>> > seeds = c(
>> +   "86548915", "86551615", "86566163", "86577411", "86584144",
>> +   "86584272", "86620568", "86724613", "86756002", "86768593",
>> "86772411",
>> +   "86781516", "86794389", "86805854", "86814600", "86835092",
>> "86874179",
>> +   "86876466", "86901193", "86987847", "86988080")
>> >
>> > random_values = sapply(seeds, function(x) {
>> +   set.seed(x)
>> +   y = runif(1, 17, 26)
>> +   return(y)
>> + })
>> >
>> > summary(random_values)
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>   17.32   20.14   23.00   22.17   24.07   25.90
>>
>> Can someone help understand what is going on?
>>
>> Ubuntu
>> --
>>
>> R version 3.4.0 (2017-04-21)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 16.04.2 LTS
>>
>> Matrix products: default
>> BLAS: 

Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-03 Thread William Dunlap via R-devel
The random numbers in a stream initialized with one seed should have about
the desired distribution.  You don't win by changing the seed all the
time.  Your seeds caused the first numbers of a bunch of streams to be
about the same, but the second and subsequent entries in each stream do
look uniformly distributed.

You didn't say what your 'upstream process' was, but it is easy to come up
with seeds that give about the same first value:

> Filter(function(s){set.seed(s);runif(1,17,26)>25.99}, 1:10000)
 [1]  514  532 1951 2631 3974 4068 4229 6092 6432 7264 9090
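
The claim that only the first number of each stream is affected can be checked directly. This is a sketch rather than a definitive test: it takes the 21 seeds from the original post as integers, assumes the Mersenne-Twister generator, and looks at the first five draws of each stream side by side. The column names are made up for illustration.

```r
# The 21 seeds from the original post, written as integers.
seeds <- c(86548915L, 86551615L, 86566163L, 86577411L, 86584144L, 86584272L,
           86620568L, 86724613L, 86756002L, 86768593L, 86772411L, 86781516L,
           86794389L, 86805854L, 86814600L, 86835092L, 86874179L, 86876466L,
           86901193L, 86987847L, 86988080L)

# One row per seed, one column per position in that seed's stream.
draws <- t(sapply(seeds, function(s) {
  set.seed(s, kind = "Mersenne-Twister")
  runif(5, 17, 26)
}))
colnames(draws) <- paste0("draw", 1:5)

# Compare the spread of the first draw with that of the later draws.
print(apply(draws, 2, range))
```

If the first column spans a much narrower range than the others, the bunching is a property of these particular seeds' first draws, not of the streams as a whole.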



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Nov 3, 2017 at 12:49 AM, Tirthankar Chakravarty <
tirthankar.li...@gmail.com> wrote:

> This is cross-posted from SO (https://stackoverflow.com/q/47079702/1414455),
> but I now feel that this needs someone from R-Devel to help understand why
> this is happening.
>
> We are facing a weird situation in our code when using R's [`runif`][1] and
> setting seed with `set.seed` with the `kind = NULL` option (which resolves,
> unless I am mistaken, to `kind = "default"`; the default being
> `"Mersenne-Twister"`).
>
> We set the seed using (8 digit) unique IDs generated by an upstream system,
> before calling `runif`:
>
> seeds = c(
>   "86548915", "86551615", "86566163", "86577411", "86584144",
>   "86584272", "86620568", "86724613", "86756002", "86768593",
> "86772411",
>   "86781516", "86794389", "86805854", "86814600", "86835092",
> "86874179",
>   "86876466", "86901193", "86987847", "86988080")
>
> random_values = sapply(seeds, function(x) {
>   set.seed(x)
>   y = runif(1, 17, 26)
>   return(y)
> })
>
> This gives values that are **extremely** bunched together.
>
> > summary(random_values)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   25.13   25.36   25.66   25.58   25.83   25.94
>
> This behaviour of `runif` goes away when we use `kind =
> "Knuth-TAOCP-2002"`, and we get values that appear to be much more evenly
> spread out.
>
> random_values = sapply(seeds, function(x) {
>   set.seed(x, kind = "Knuth-TAOCP-2002")
>   y = runif(1, 17, 26)
>   return(y)
> })
>
> *Output omitted.*
>
> ---
>
> **The most interesting thing here is that this does not happen on Windows
> -- only happens on Ubuntu** (`sessionInfo` output for Ubuntu & Windows
> below).
>
> # Windows output: #
>
> > seeds = c(
> +   "86548915", "86551615", "86566163", "86577411", "86584144",
> +   "86584272", "86620568", "86724613", "86756002", "86768593",
> "86772411",
> +   "86781516", "86794389", "86805854", "86814600", "86835092",
> "86874179",
> +   "86876466", "86901193", "86987847", "86988080")
> >
> > random_values = sapply(seeds, function(x) {
> +   set.seed(x)
> +   y = runif(1, 17, 26)
> +   return(y)
> + })
> >
> > summary(random_values)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   17.32   20.14   23.00   22.17   24.07   25.90
>
> Can someone help understand what is going on?
>
> Ubuntu
> --
>
> R version 3.4.0 (2017-04-21)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.2 LTS
>
> Matrix products: default
> BLAS: /usr/lib/libblas/libblas.so.3.6.0
> LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8  LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8   LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8   LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8  LC_NAME=en_US.UTF-8
>  [9] LC_ADDRESS=en_US.UTF-8LC_TELEPHONE=en_US.UTF-8
> [11] LC_MEASUREMENT=en_US.UTF-8LC_IDENTIFICATION=en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats graphics  grDevices utils datasets
> methods   base
>
> other attached packages:
> [1] RMySQL_0.10.8   DBI_0.6-1
>  [3] jsonlite_1.4tidyjson_0.2.2
>  [5] optiRum_0.37.3  lubridate_1.6.0
>  [7] httr_1.2.1  gdata_2.18.0
>  [9] XLConnect_0.2-12XLConnectJars_0.2-12
> [11] data.table_1.10.4   stringr_1.2.0
> [13] readxl_1.0.0xlsx_0.5.7
> [15] xlsxjars_0.6.1  rJava_0.9-8
> [17] sqldf_0.4-10RSQLite_1.1-2
> [19] gsubfn_0.6-6proto_1.0.0
> [21] dplyr_0.5.0 purrr_0.2.4
> [23] readr_1.1.1 tidyr_0.6.3
> [25] tibble_1.3.0tidyverse_1.1.1
> [27] rBayesianOptimization_1.1.0 xgboost_0.6-4
> [29] MLmetrics_1.1.1 caret_6.0-76
> [31] ROCR_1.0-7  gplots_3.0.1
> [33] effects_3.1-2   pROC_1.10.0
> [35] pscl_1.4.9  lattice_0.20-35
> [37] MASS_7.3-47 ggplot2_2.2.1
>
> loaded via a namespace (and not attached):
> [1] splines_3.4.0  foreach_1.4.3  AUC_0.3.0
> 

Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-03 Thread Serguei Sokol

On 03/11/2017 at 14:24, Tirthankar Chakravarty wrote:

> Martin,
>
> Thanks for the helpful reply. Alas I had forgotten that (implied)
> unfavorable comparisons of *nix systems with Windows systems would likely
> draw irate (but always substantive) responses on the R-devel list -- poor
> phrasing on my part. :)
>
> Regardless, let me try to address some of the concerns related to the
> construction of the MRE itself and try to see if we can clean away the
> shrubbery & zero in on the core issue, since I continue to believe that
> this is an issue with either R's implementation or a bad interaction of the
> seeds supplied with the Mersenne-Twister algorithm itself.

Whether there is an issue or not may depend on how the vector 'seeds' was
obtained. If we simply do:

seeds=as.integer(seeds)   # the IDs were posted as character strings
r=range(seeds)
s=seq(r[1], r[2])
# pick the seeds whose first runif() draw falls in the (25; 26) interval
s25=s[sapply(s, function(i) {set.seed(i); runif(1, 17, 26) > 25})]
all(seeds %in% s25) # TRUE
length(s25)/diff(r) # 0.1107351

Thus, the proportion of such seeds is about 1/9, which is consistent with
the fraction of the interval (17; 26) taken up by (25; 26).
Now, you can pick any 21 numbers from the s25 vector (which is 48631 long)
and say
"Look! It's weird, all values drawn by runif() are > 25!"
But there is nothing strange about s25 in itself. If we plot a kind of
cumulative distribution

plot(s25, type="l")

it shows a distribution very close to uniform, which means that such seeds
are not clustered more densely or more sparsely anywhere.
So, how was your set of seeds obtained?

Best,
Serguei.
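
The 1/9 argument can also be tried on a much smaller range than the one scanned above. This is a rough sketch, assuming the Mersenne-Twister generator and an arbitrarily chosen seed range of `1:10000`; the variable names are illustrative only.

```r
# First runif(1, 17, 26) draw for each seed in a small, arbitrary range.
seeds <- 1:10000
first_draw <- sapply(seeds, function(s) {
  set.seed(s, kind = "Mersenne-Twister")
  runif(1, 17, 26)
})

# Observed share of seeds whose first draw lands in (25, 26) ...
frac <- mean(first_draw > 25)
# ... against the share of (17, 26) that (25, 26) occupies.
expected <- (26 - 25) / (26 - 17)

print(c(observed = frac, expected = expected))
```

If the observed share is close to 1/9, seeds with a first draw above 25 are simply common, so a hand-picked or accidentally correlated set of them says little about the generator.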


> The latter would
> require a deeper understanding of the algorithm than I have at the moment.
> If we can rule out the former through this thread, then I will pursue the
> latter solution path.
>
> Responses inline below, but summarizing:
>
> 1. All examples now are run using "R CMD BATCH --vanilla" as you have
> suggested, to ensure that no other loaded packages or namespace changes
> have interfered with the behaviour of `set.seed`.
> 2. Converting the character vector to integer vector has no impact on the
> output.
> 3. Upgrading to the latest version of R has no impact on the output.
> 4. Multiplying the seed vector by 10L causes the behaviour to vanish,
> calling into question the large integer theory.


> On Fri, Nov 3, 2017 at 3:09 PM, Martin Maechler
> wrote:
>
>> Why R-devel -- R-help would have been appropriate:
>>
>> It seems you have not read the help page for
>> set.seed as I expect it from posters to R-devel.
>> Why would you use strings instead of integers if you *had* read it ?
>
> The manual (which we did read) says:
>
> seed a single value, interpreted as an integer,
>
> We were confident of R coercing characters to integers correctly. We
> tested, prior to making this posting, that the behaviour remains intact if
> we change the `seeds` variable from a character vector to the "equivalent"
> integer vector by hand.
>
> > seeds = c(86548915L, 86551615L, 86566163L, 86577411L, 86584144L, 86584272L,
> +   86620568L, 86724613L, 86756002L, 86768593L, 86772411L, 86781516L,
> +   86794389L, 86805854L, 86814600L, 86835092L, 86874179L, 86876466L,
> +   86901193L, 86987847L, 86988080L)
> >
> > random_values = sapply(seeds, function(x) {
> +   set.seed(x)
> +   y = runif(1, 17, 26)
> +   return(y)
> + })
> >
> > summary(random_values)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   25.13   25.36   25.66   25.58   25.83   25.94



>>> We are facing a weird situation in our code when using R's
>>> [`runif`][1] and setting seed with `set.seed` with the
>>> `kind = NULL` option (which resolves, unless I am
>>> mistaken, to `kind = "default"`; the default being
>>> `"Mersenne-Twister"`).
>>
>> again this is not what the help page says; rather
>>
>>  | The use of ‘kind = NULL’ or ‘normal.kind = NULL’ in ‘RNGkind’ or
>>  | ‘set.seed’ selects the currently-used generator (including that
>>  | used in the previous session if the workspace has been restored):
>>  | if no generator has been used it selects ‘"default"’.
>>
>> but as you have > 90 (!!) packages in your sessionInfo() below,
>> why should we (or you) know if some of the things you did
>> before or (implicitly) during loading all these packages did not
>> change the RNG kind ?
>
> Agreed. We are running this system in production, and we will need
> `set.seed` to behave reliably within this session. However, as you say,
> since we are claiming that there is an issue with the PRNG, we should
> isolate it in an environment that does not have any of the potential
> confounding factors that come with having 90 packages loaded (did you
> count?).
>
> As mentioned above, we have rerun all examples using "R CMD BATCH
> --vanilla" and we can report that the output is unchanged.
>
>>> We set the seed using (8 digit) unique IDs generated by an
>>> upstream system, before calling `runif`:
>>>
>>> seeds = c( "86548915", "86551615", "86566163",
>>> "86577411", "86584144", "86584272", "86620568",
>>> "86724613", "86756002", "86768593", "86772411",
>>> "86781516", "86794389", "86805854", "86814600",
>>> "86835092", "86874179", 

Re: [Bioc-devel] any interest in a BiocMatrix core package?

2017-11-03 Thread Martin Morgan

On 11/02/2017 06:20 PM, Peter Hickey wrote:

As Michael notes, I think the scope here is broader than considering S4
generics for functions in base R. To summarise, I think we would be looking
to have S4 generics for the following:

- All(?) the row*/col* functions in matrixStats (NB: matrixStats uses plain
old functions with no S3 or S4, which I believe was to avoid any overhead
of method dispatch since it is explicitly targeting ordinary matrix objects
as input)
- Potentially new row*/col* summaries (i.e. that don't currently exist in
matrixStats)
- Perhaps moving from BiocGenerics the S4 generics defined in
R/matrix-summary.R?
- Perhaps apply() (E.g., DelayedArray defines an S4 generic for this)

Having these as part of base R or in a recommended package would be great,
but of course comes with its own challenges. The alternative is a
lightweight package, likely better hosted on CRAN than BioC to assist with
wider adoption and integration with Matrix, matrixStats, and other non-BioC
packages.

As Michael notes, getting the generic signature 'right' will be important
and there are undoubtedly other challenges ahead (I've started a TODO).

Might Bioconductor open up a GitHub repo (MatrixGenerics?) where this can
be discussed with accompanying code. I've made the skeleton of a
MatrixGenerics package that I could upload to kick things off, along with
adding my TODOs as Issues on GitHub for further discussion.


I did start this repository as a place to develop more concrete ideas; I 
think that a Bioconductor MatrixGenerics solution would not be optimal, 
so I think of this repository as a place to develop ideas rather than a 
precursor to an actual package.


I invited Pete as a Collaborator with 'Admin' privileges, so I think he 
should be able to extend Collaborator invites to other interested parties.


Martin



Cheers,
Pete


On Thu, 2 Nov 2017 at 13:10 Michael Lawrence wrote:


I'm pretty sure we're also considering generics for functions that do not
exist in base R. Like rowVars() and colVars(). This sort of suggests that
matrixStats should be part of base R.

As an aside, we should think about the signature on those implicit
generics. Should they really include na.rm and dims? The simpler the
signature, the easier to understand the API.


On Thu, Nov 2, 2017 at 10:38 AM, Martin Maechler <maech...@stat.math.ethz.ch>
wrote:

Martin Morgan
 on Thu, 2 Nov 2017 06:17:19 -0400 writes:


 > On 11/02/2017 05:00 AM, Martin Maechler wrote:
 >>> "ML" == Michael Lawrence 
 >>> on Wed, 1 Nov 2017 14:13:54 -0700 writes:
 >>
 >> > Probably way easier to add the generics to the Matrix
 >> > package and everyone just depends on that.
 >>
 >> Yes!  It is 'Recommended' and comes with every R
 >> installation, and has had many such matrix S4 methods in
 >> place for > 10 years, notably for dealing with (large)
 >> sparse matrices.
 >>
 >> Honestly, I (as co-maintainer of Matrix, principal
 >> maintainer for several years now) had been a bit
 >> surprised and frustrated that the 'matrixStats'
 >> initiative had started w/o any contact with the Matrix
 >> package maintainers and initially has not ever tried to
 >> use Matrix package classes or functionality (and this is
 >> still the case now AFAICS).
 >>
 >> I'm happy to coordinate with maintainers of bioc packages
 >> about which generics (and classes !) to use and export,
 >> etc.

 > One issue is that Matrix is a relatively large package
 > (well, I wonder if that's a reasonable statement, given
 > the Bioc dependencies and data involved, but perhaps in
 > general...) and hence 'overkill' to obtain a collection of
 > generics. Is there any prospect for factoring out the
 > definition of the generics from implementation of the
 > methods?  Re-purposing stats4 ?

 > Martin Morgan

Hmm..  we have quite a few  setGenericImplicit()  statements in
the methods package already, notably for  'colSums' and friends,
and so other decent citizen packages do *NOT*  setGeneric() at
all on these ... and of course, Matrix _is_ a decent citizen in
the R package universe.

Instead of to stats4, I'm pretty sure we should only consider
what functions should be added to the implicit generics already
provided by the 'methods' package itself.

Could it be that (some of) you are not properly aware of
implicit generics?

If you start 'R --vanilla' you can say


implicitGeneric("colSums")

standardGeneric for "colSums" defined from package "base"

function (x, na.rm = FALSE, dims = 1, ...)
standardGeneric("colSums")


Methods may be defined for arguments: x, na.rm, dims
Use  showMethods("colSums")  for currently available ones.
-
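
To make the implicit-generic route concrete, here is a minimal sketch. The class `ToyMatrix` and its `data` slot are invented purely for illustration; the only thing taken from the output above is the `colSums` signature. The method is defined with `setMethod()` alone, without a prior `setGeneric()` call.

```r
library(methods)

# Hypothetical S4 class wrapping an ordinary matrix (illustration only).
setClass("ToyMatrix", representation(data = "matrix"))

# No setGeneric() call: setMethod() picks up the implicit generic for
# colSums, so the method's formals follow that generic's signature.
setMethod("colSums", "ToyMatrix",
          function(x, na.rm = FALSE, dims = 1, ...) {
            base::colSums(x@data, na.rm = na.rm, dims = dims)
          })

m <- new("ToyMatrix", data = matrix(1:6, nrow = 2))
result <- colSums(m)   # dispatches to the ToyMatrix method
print(result)
```

Because every package that follows this route ends up with the same generic derived from base, methods from different packages can coexist without one package's generic masking another's.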

so I think it is clear how *any* decent package has to define
methods for colSums(), and if they do, there should not be any 

Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-03 Thread Tirthankar Chakravarty
Martin,

Thanks for the helpful reply. Alas I had forgotten that (implied)
unfavorable comparisons of *nix systems with Windows systems would likely
draw irate (but always substantive) responses on the R-devel list -- poor
phrasing on my part. :)

Regardless, let me try to address some of the concerns related to the
construction of the MRE itself and try to see if we can clean away the
shrubbery & zero in on the core issue, since I continue to believe that
this is an issue with either R's implementation or a bad interaction of the
seeds supplied with the Mersenne-Twister algorithm itself. The latter would
require a deeper understanding of the algorithm than I have at the moment.
If we can rule out the former through this thread, then I will pursue the
latter solution path.

Responses inline below, but summarizing:

1. All examples now are run using "R CMD BATCH --vanilla" as you have
suggested, to ensure that no other loaded packages or namespace changes
have interfered with the behaviour of `set.seed`.
2. Converting the character vector to integer vector has no impact on the
output.
3. Upgrading to the latest version of R has no impact on the output.
4. Multiplying the seed vector by 10L causes the behaviour to vanish,
calling into question the large integer theory.
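
Points 2 and 4 can be reproduced with something along these lines. This is a sketch only: `first_draw()` is a hypothetical helper, the Mersenne-Twister generator is requested explicitly, and the character IDs are coerced with `as.integer()` up front rather than relying on `set.seed` to do it. No output is shown, since it depends on the session.

```r
seeds_chr <- c("86548915", "86551615", "86566163", "86577411", "86584144",
               "86584272", "86620568", "86724613", "86756002", "86768593",
               "86772411", "86781516", "86794389", "86805854", "86814600",
               "86835092", "86874179", "86876466", "86901193", "86987847",
               "86988080")

# Point 2: explicit coercion of the character IDs to integers.
seeds_int <- as.integer(seeds_chr)

# Hypothetical helper: first runif(1, 17, 26) draw after seeding.
first_draw <- function(seed) {
  set.seed(seed, kind = "Mersenne-Twister")
  runif(1, 17, 26)
}

# Point 4: the same seeds multiplied by 10L (still well within integer range).
seeds_x10 <- seeds_int * 10L

draws_orig <- sapply(seeds_int, first_draw)
draws_x10  <- sapply(seeds_x10, first_draw)

print(summary(draws_orig))
print(summary(draws_x10))
```

Comparing the two summaries shows whether the bunching is tied to these specific seed values or to their magnitude.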


On Fri, Nov 3, 2017 at 3:09 PM, Martin Maechler 
wrote:

> Why R-devel -- R-help would have been appropriate:
>

> It seems you have not read the help page for
> set.seed as I expect it from posters to R-devel.
> Why would you use strings instead of integers if you *had* read it ?
>

The manual (which we did read) says:

seed a single value, interpreted as an integer,

We were confident of R coercing characters to integers correctly. We
tested, prior to making this posting, that the behaviour remains intact if
we change the `seeds` variable from a character vector to the "equivalent"
integer vector by hand.

> seeds = c(86548915L, 86551615L, 86566163L, 86577411L, 86584144L, 86584272L,
+   86620568L, 86724613L, 86756002L, 86768593L, 86772411L, 86781516L,
+   86794389L, 86805854L, 86814600L, 86835092L, 86874179L, 86876466L,
+   86901193L, 86987847L, 86988080L)
>
> random_values = sapply(seeds, function(x) {
+   set.seed(x)
+   y = runif(1, 17, 26)
+   return(y)
+ })
>
> summary(random_values)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  25.13   25.36   25.66   25.58   25.83   25.94



> > We are facing a weird situation in our code when using R's
> > [`runif`][1] and setting seed with `set.seed` with the
> > `kind = NULL` option (which resolves, unless I am
> > mistaken, to `kind = "default"`; the default being
> > `"Mersenne-Twister"`).
>
> again this is not what the help page says; rather
>
>  | The use of ‘kind = NULL’ or ‘normal.kind = NULL’ in ‘RNGkind’ or
>  | ‘set.seed’ selects the currently-used generator (including that
>  | used in the previous session if the workspace has been restored):
>  | if no generator has been used it selects ‘"default"’.
>
> but as you have > 90 (!!) packages in your sessionInfo() below,
> why should we (or you) know if some of the things you did
> before or (implicitly) during loading all these packages did not
> change the RNG kind ?
>

Agreed. We are running this system in production, and we will need
`set.seed` to behave reliably in this session. However, as you say, since we
are claiming that there is an issue with the PRNG, we should isolate it in an
environment that does not have any of the potential confounding factors that
come with having 90 packages loaded (did you count?).

As mentioned above, we have rerun all examples using "R CMD BATCH
--vanilla" and we can report that the output is unchanged.
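
To rule out the "currently-used generator" ambiguity of `kind = NULL`
described in the help page excerpt, the generator can be inspected and pinned
explicitly. This is only a sketch of that check; the variable names are
illustrative.

```r
# Record which generators the session is using before anything is changed.
kinds_before <- RNGkind()
print(kinds_before)

# Pin the uniform generator by name instead of relying on kind = NULL,
# which keeps whatever generator the session happens to be using.
set.seed(86548915L, kind = "Mersenne-Twister")

# The first element of RNGkind() is the uniform generator now in effect.
kinds_after <- RNGkind()
print(kinds_after[1])
```

If `kinds_before[1]` is already "Mersenne-Twister" in the production session,
loaded packages changing the RNG kind can be excluded as an explanation.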


>
> > We set the seed using (8 digit) unique IDs generated by an
> > upstream system, before calling `runif`:
>
> > seeds = c( "86548915", "86551615", "86566163",
> > "86577411", "86584144", "86584272", "86620568",
> > "86724613", "86756002", "86768593", "86772411",
> > "86781516", "86794389", "86805854", "86814600",
> > "86835092", "86874179", "86876466", "86901193",
> > "86987847", "86988080")
>
> >  random_values = sapply(seeds, function(x) {
> >   set.seed(x)
> >   y = runif(1, 17, 26)
> >   return(y)
> > })
>
> Why do you do that?
>
> 1) You should set the seed *once*, not multiple times in one simulation.
>

The code is written this way because the seed is set every time the
function (API) is called, for call-level replicability. That does not make
much sense in an MRE, but it is a critical component of the larger
function. We do acknowledge that for any one of the seeds in the vector
`seeds`, the vector of draws appears to be uniformly distributed.
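
The call-level replicability described above can be sketched as follows.
`handle_request` is a hypothetical stand-in for the real API entry point, and
pinning `kind = "Mersenne-Twister"` is an assumption made for the example.

```r
# Hypothetical stand-in for the API handler: the form ID doubles as the seed,
# so the same ID always reproduces the same draws.
handle_request <- function(form_id, n = 1L) {
  set.seed(as.integer(form_id), kind = "Mersenne-Twister")
  runif(n, min = 17, max = 26)
}

# Two independent calls with the same form ID.
first_call  <- handle_request("86548915", n = 5L)
second_call <- handle_request("86548915", n = 5L)

# The draws are reproduced exactly.
identical(first_call, second_call)
```

Replicability per call holds by construction. What the thread questions is
only how the first draw varies across different form IDs.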


> 2) Assuming that your strings are correctly translated to integers
>and the same on all platforms, independent of locales (!) etc,
>you are again not following the simple instruction on the help page:
>
>  ‘set.seed’ 

[Bioc-devel] Dockers have been updated for Release and Devel

2017-11-03 Thread Shepherd, Lori
Hello everyone,


The Bioconductor dockers (base and core) have been updated for Release and
Devel:

http://bioconductor.org/help/docker/




Release

R3.4.2 _ Bioc3.6

https://hub.docker.com/r/bioconductor/release_base2/

https://hub.docker.com/r/bioconductor/release_core2/



Devel

R3.5.0 _ Bioc3.7

https://hub.docker.com/r/bioconductor/devel_base2/

https://hub.docker.com/r/bioconductor/devel_core2/



Cheers,


Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Confusion with how to maintain release/devel files on local computer.

2017-11-03 Thread Arman Shahrisa
Thank you very much for all your assistance. Unfortunately, git bash must be
run as administrator; otherwise the user won’t be able to change the working
directory from partition C to another partition (it remains in the previous
directory on partition C). This is why I had to run it as administrator.


Best regards,
Arman

From: Martin Morgan
Sent: Friday, November 3, 2017 13:23
To: Arman Shahrisa; 
bioc-devel
Subject: Re: [Bioc-devel] Confusion with how to maintain release/devel files on 
local computer.

Thank you for the detailed description, it is much easier to understand
what is going on. I have made some comments below

On 11/02/2017 11:14 AM, Arman Shahrisa wrote:
>  > There is no branch releas_3_6.
>
>>> I don't know what 'it' is -- I guess your local repository that you
>>> cloned, but where did you clone it from?
>
> I repeated all the process again.
>
> I opened git bash as an administrator and ran commands described here:

it is not necessary or recommended to run these commands as an
administrator; generally all activities should be done as a regular user.

Likely you need, as a regular user, to regenerate your ssh key pair, and
to re-submit the public key to Bioconductor. Alternatively, perhaps as a
regular user you already have an ssh key pair, and you can submit the
public key part of the key pair. Remember that this will take at least
24 hours to process.

>
> https://bioconductor.org/developers/how-to/git/maintain-bioc-only/

This workflow means that you want to maintain only the Bioconductor
version of your repository, you do not want to maintain the github
version of your repository that you used during package submission.

>
>  > Arman@Arman-VAIO MINGW64 ~
>
>  > $ cd /z/cbaf/Source
>
>  > Arman@Arman-VAIO MINGW64 /z/cbaf/Source
>
>  > $ git clone g...@git.bioconductor.org:packages/cbaf
>
>  > Cloning into 'cbaf'...
>
>  > Enter passphrase for key '/c/Users/Arman/.ssh/id_rsa':
>
>  > remote: Counting objects: 732, done.
>
>  > remote: Compressing objects: 100% (218/218), done.
>
>  > remote: Total 732 (delta 503), reused 725 (delta 499)
>
>  > Receiving objects: 100% (732/732), 749.19 KiB | 182.00 KiB/s, done.
>
>  > Resolving deltas: 100% (503/503), done.
>
>  > Arman@Arman-VAIO MINGW64 /z/cbaf/Source
>
>  > $ git remote -v
>
>  > fatal: Not a git repository (or any of the parent directories): .git
>
>  > Arman@Arman-VAIO MINGW64 /z/cbaf/Source
>
>  > $ cd /z/cbaf/Source/cbaf
>
>  > Arman@Arman-VAIO MINGW64 /z/cbaf/Source/cbaf (master)
>
>  > $ git remote -v
>
>  > origin  g...@git.bioconductor.org:packages/cbaf (fetch)
>
>  > origin  g...@git.bioconductor.org:packages/cbaf (push)

ok, all of the above looks good. You cloned the repository, and your ssh
key was recognized.

>
>  > Arman@Arman-VAIO MINGW64 /z/cbaf/Source/cbaf (master)
>
>  > $ git pull
>
>  > Enter passphrase for key '/c/Users/Arman/.ssh/id_rsa':
>
>  > Already up-to-date.
>
> Then I ran this one:
>
>  > git remote add upstream g...@git.bioconductor.org:packages/cbaf.git

This step is not required. It is not part of the workflow you cited.
Earlier, when you issued the command 'git remote -v' git indicated that
you already had a remote named 'origin' and referencing the git server
g...@git.bioconductor.org. What the command above does is to add a second
remote pointing to git.bioconductor.org, but this one named 'upstream'.

I can see that the documentation in this work flow needs to be adjusted,
because in other

It would be appropriate to rename the remote, e.g.,

   git remote rename origin upstream

nonetheless, issuing the command should not cause a problem.

It seems like at this stage you should continue with the workflow that
you started, i.e.,

   - 'Commit changes to your local repository' to develop a new feature
or fix a bug, including checking that your code changes are correct by
running R CMD build and R CMD check using the 'devel' version of
Bioconductor, and

   - 'Push your local changes to the Bioconductor repository' to make
your code changes available to the Bioconductor build server. Remember
that builds run once a day, and that you need to confirm that the build
has been successful by visiting http://bioconductor.org/checkResults/.

The steps below are not necessary for you at the moment.

Martin

>
> Then these commands from
> c
>
>  > Arman@Arman-VAIO MINGW64 /z/cbaf/Source/cbaf (master)
>
>  > $ git checkout master
>
>  > Already on 'master'
>
>  > Your branch is up-to-date with 'origin/master'.
>
>  > Arman@Arman-VAIO MINGW64 /z/cbaf/Source/cbaf (master)
>
>  > $ git fetch upstream
>
>  > Enter passphrase for key '/c/Users/Arman/.ssh/id_rsa':
>
>  > From git.bioconductor.org:packages/cbaf
>
>  >  * [new branch]  RELEASE_3_6 -> upstream/RELEASE_3_6
>
>  >  * [new branch]  master  -> upstream/master
>
>  > Arman@Arman-VAIO MINGW64 /z/cbaf/Source/cbaf (master)
>
>  > $ git merge 

Re: [Bioc-devel] Accepted packages can't find each other and fail build

2017-11-03 Thread Shepherd, Lori
The data package and software package are in Bioc 3.6 not Bioc 3.5.  You will 
need to update BiocInstaller in order to have access to the packages.


Lori Shepherd



From: Sokratis Kariotis 
Sent: Friday, November 3, 2017 7:04:08 AM
To: Hervé Pagès
Cc: Shepherd, Lori; bioc-devel
Subject: Re: [Bioc-devel] Accepted packages can't find each other and fail build

Hey all,

I can see the landing pages for both packages now (pcxn and pcxnData) but as I 
try to install them using:


source("https://bioconductor.org/biocLite.R")
biocLite("pcxnData")

I get the following:

BioC_mirror: https://bioconductor.org
Using Bioconductor 3.5 (BiocInstaller 1.26.1), R 3.4.0 (2017-04-21).
Installing package(s) ‘pcxnData’

Warning message:
package ‘pcxnData’ is not available (for R version 3.4.0)


The same happens with the pcxn package but other unrelated packages seem to 
install fine.


Regards,

Sokratis

On 1 November 2017 at 22:25, Hervé Pagès wrote:
FYI today's data-experiment builds completed and pcxnData is
green:


https://bioconductor.org/checkResults/3.6/data-experiment-LATEST/pcxnData/

and propagated:

  https://bioconductor.org/packages/3.6/data/experiment/html/pcxnData.html

Now the next step is that the software builds will be able
to install pcxnData on tokay1 (Windows) and veracruz1 (Mac)
from the public data-experiment repo so the results for
pcxn should get cleared tomorrow:

  https://bioconductor.org/checkResults/3.6/bioc-LATEST/pcxn/

It's a long (and admittedly confusing) ping-pong game between the
software and data-experiment builds ;-)

Thanks for your patience,
H.


On 11/01/2017 08:21 AM, Hervé Pagès wrote:
Hi Sokratis,

Not sure why, but it seems that the build machines had not managed to
install pcxn until now. I went on build machine malbec1 to check whether
it managed to install pcxn, and it seems that it did:

   > "pcxn" %in% rownames(installed.packages())
   [1] TRUE

According to the log, it looks like this is the 1st time that it
gets installed on malbec1 (the data experiment builds just started
today and did the installation).

So, if everything goes as expected, pcxnData should build successfully
today and pcxnData should propagate (granted of course that it also
passes CHECK). The build/check report for data-experiment packages
should update today around 5pm EST. It will take about 1 hour after
the report is updated for pcxnData to propagate to the public repo
and for its landing page to show up.

Sorry for the inconvenience,

H.

On 11/01/2017 01:36 AM, Sokratis Kariotis wrote:
Hey all,

After the release I can see the release page of the pcxn package but not
pcxnData package. In the build page it says it cannot find the pcxn
package and fails.

Regards,
Sokratis

On 31 October 2017 at 16:33, Shepherd, Lori wrote:

It should be the same and you should be able to see the RELEASE_3_6
branch


git fetch --all

git branch -a


Lori Shepherd




*From:* Sokratis Kariotis
*Sent:* Tuesday, October 31, 2017 11:20:39 AM
*To:* Shepherd, Lori
*Cc:* Hervé Pagès; Obenchain, Valerie; bioc-devel

*Subject:* Re: [Bioc-devel] Accepted packages can't find each other
and fail build
Does the same hold for the pcxnData package? I can't see the
RELEASE_3_6 as in pcxn.

-Sokratis

On 31 October 2017 at 11:38, Shepherd, Lori wrote:

The latest version bump change you made to 0.99.27 was yesterday
Oct 30 right before we said to stop committing so we could make
the release branch.  That change did make it into both the
RELEASE_3_6 and the master branch and should appear in the next
build report for both versions.


Note:  It can take 12-24 hours to see version bumps and changes
on the build report. The daily builder runs once per day to
build all the packages; while a version bump is absolutely
required, a package is not rebuilt instantaneously on a version bump.


Please be sure to pull from upstream before making further

Re: [Bioc-devel] Package update not showing on Bioc 3.6 webpage

2017-11-03 Thread Shepherd, Lori
It can take up to 24 hours for the results to show in the build report.  Please 
check again today and let us know if it is not reflected in the Friday build 
report.


Lori Shepherd



From: Bioc-devel on behalf of kushal kumar dey
Sent: Thursday, November 2, 2017 6:52:56 PM
To: Turaga, Nitesh
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Package update not showing on Bioc 3.6 webpage

Hi Nitesh,

It seems that the bug fixes I made to my software are still not updated in
the Bioconductor page. I bumped the version to 1.4.1 but I still find the
version to be 1.4.0

https://www.bioconductor.org/packages/3.6/bioc/html/CountClust.html

Didn't the build go through?

Thanks

Kushal

On 1 November 2017 at 16:45, kushal kumar dey  wrote:

> Thanks so much for the help.  I did manage to push with the SSH protocol.
> Will wait till the next build to see if it went through.
>
> Best
>
> Kushal
>
> On 1 November 2017 at 16:18, Turaga, Nitesh wrote:
>
>> Hi Kushal,
>>
>> You also need to use the "SSH" protocol, and not the "HTTPS" protocol.
>>
>> Your upstream should be g...@git.bioconductor.org:packages/CountClust
>>
>> Best,
>>
>> Nitesh
>>
>> > On Nov 1, 2017, at 5:06 PM, Turaga, Nitesh wrote:
>> >
>> > Hi Kushal,
>> >
>> > Can you please try,
>> >
>> >   ssh -T g...@git.bioconductor.org -v
>> >
>> >
>> > Are you using the correct private key to your corresponding public key
>> > on our machine?
>> >
>> > If not, please check the FAQ section
>> > bioconductor.org/developers/how-to/git/faq/ point #15. You can set up a
>> > config file to use the correct private key.
>> >
>> > A snippet of your public key on our machine is,
>> >
>> > ssh-rsa B3NzaC1yc2EDAQABAAACAQC8EgObkc/OI4nc8NE0sL4COb6uNaBI
>> URhJoSyisz8i8/KWum0xFhav7ouDpMoZz0lgnlRaIYIRquJx0R44ojDcsN45
>> TBR8Rna4PNBscVJEsbonD0k3wi2OBySgwxTzL5aZl99HzuAzthAPzFuskZDH
>> ahws9sPUtWMxioI6D5ZktVOb/QJbDJdFHFPXcd8l90wRJN0eGJXX9excJBDU
>> 57ufEgDBx9vGv85GTwJjP4UQYLyjDIap2CbrdC+7nJ95fa7YcNUj2znVFcrNnkW9
>> >
>> > Best,
>> >
>> > Nitesh
>> >
>> >
>> >> On Nov 1, 2017, at 2:23 PM, kushal kumar dey wrote:
>> >>
>> >> Hi Herve,
>> >>
>> >> Thanks for clarifying on the matter. I followed your instructions and
>> >> did *git cherry-pick* to make the following changes
>> >>
>> >> - fix a bug in one of the functions
>> >> - add a citation
>> >> - fix the readme
>> >>
>> >> I bumped the version number from 1.4.0 to 1.4.1 and tried to push to
>> >> the release branch, but when I run the command,
>> >>
>> >> git push upstream RELEASE_3_6
>> >>
>> >> I meet with the following error
>> >>
>> >> fatal: remote error: FATAL: W any packages/CountClust nobody DENIED by fallthru
>> >> (or you mis-spelled the reponame)
>> >>
>> >>
>> >> You can check the branch I want to push on my Github
>> >>
>> >> https://github.com/kkdey/CountClust/tree/RELEASE_3_6
>> >>
>> >> This is my output from *git remote -v*
>> >>
>> >> origin https://github.com/kkdey/CountClust.git (fetch)
>> >> origin https://github.com/kkdey/CountClust.git (push)
>> >> upstream https://git.bioconductor.org/packages/CountClust (fetch)
>> >> upstream https://git.bioconductor.org/packages/CountClust (push)
>> >>
>> >>
>> >> Can you please let me know what I am doing wrong here and how I can
>> >> update the release version?
>> >>
>> >>
>> >> Thank you so much!
>> >>
>> >>
>> >> Kushal
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 30 October 2017 at 17:50, Hervé Pagès wrote:
>> >>
>> >>> Hi Kushal,
>> >>>
>> >>> If you push changes without bumping the version of your package,
>> >>> the update won't propagate.
>> >>>
>> >>> You only bumped the version to 1.5.1 today so you need to wait
>> >>> about 24 hours before seeing this reflected on the landing page
>> >>> for CountClust.
>> >>>
>> >>> Also please note that you did the version bump plus a few other
>> >>> commits (6 commits in total) in master today **after** we created
>> >>> the RELEASE_3_6 branch so these changes will only propagate to
>> >>> BioC 3.7 (the new devel version of BioC, starting tomorrow).
>> >>> If you want these changes to also propagate to the BioC 3.6 release
>> >>> you'll need to push them to the RELEASE_3_6 branch. This is
>> >>> typically done with 'git cherry-pick' as documented here:
>> >>>
>> >>>
>> >>> 

Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-03 Thread Lukas Stadler
If I interpret the original message as “I think there’s something wrong with 
R's random number generator”:
Your assumption is that going from the seed to the first random number is a 
good hash function, which it isn’t.
E.g., with Mersenne Twister it’s a couple of multiplications, bit shifts, xors 
and ands, and the few bits that vary in your seed end up in the less 
significant bits of the result.
Something like the “digest” package might be what you want, it provides proper 
hash functions.

- Lukas
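
A minimal sketch of that suggestion, assuming the `digest` package is
installed and provides `digest2int()` (digest 0.6.16 or later). The helper
names `seed_from_id` and `draw_for_id` are hypothetical and only illustrate
the idea of hashing the ID before seeding.

```r
# Hypothetical helper: hash the ID string to a 32-bit integer seed, so that
# IDs differing in only a few bits map to widely different seeds.
# digest2int() is from the 'digest' package, not base R (assumption).
seed_from_id <- function(id) {
  digest::digest2int(as.character(id))
}

# Hypothetical helper: seed with the hashed ID, then take the first draw.
draw_for_id <- function(id) {
  set.seed(seed_from_id(id), kind = "Mersenne-Twister")
  runif(1, min = 17, max = 26)
}

# Only run the demonstration when the package is actually available.
if (requireNamespace("digest", quietly = TRUE)) {
  ids <- c("86548915", "86551615", "86566163")
  print(vapply(ids, draw_for_id, numeric(1)))
}
```

The draws stay reproducible per ID, because the hash is deterministic; only
the mapping from ID to seed changes.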

> On 3 Nov 2017, at 10:39, Martin Maechler  wrote:
> 
>> Tirthankar Chakravarty 
>>on Fri, 3 Nov 2017 13:19:12 +0530 writes:
> 
>> This is cross-posted from SO
>> (https://stackoverflow.com/q/47079702/1414455),
>>  but I now
>> feel that this needs someone from R-Devel to help
>> understand why this is happening.
> 
> Why R-devel -- R-help would have been appropriate:
> 
> It seems you have not read the help page for
> set.seed as I expect it from posters to R-devel. 
> Why would you use strings instead of integers if you *had* read it ?
> 
>> We are facing a weird situation in our code when using R's
>> [`runif`][1] and setting seed with `set.seed` with the
>> `kind = NULL` option (which resolves, unless I am
>> mistaken, to `kind = "default"`; the default being
>> `"Mersenne-Twister"`).
> 
> again this is not what the help page says; rather
> 
> | The use of ‘kind = NULL’ or ‘normal.kind = NULL’ in ‘RNGkind’ or
> | ‘set.seed’ selects the currently-used generator (including that
> | used in the previous session if the workspace has been restored):
> | if no generator has been used it selects ‘"default"’.
> 
> but as you have > 90 (!!) packages in your sessionInfo() below,
> why should we (or you) know if some of the things you did
> before or (implicitly) during loading all these packages did not
> change the RNG kind ?
> 
>> We set the seed using (8 digit) unique IDs generated by an
>> upstream system, before calling `runif`:
> 
>>seeds = c( "86548915", "86551615", "86566163",
>> "86577411", "86584144", "86584272", "86620568",
>> "86724613", "86756002", "86768593", "86772411",
>> "86781516", "86794389", "86805854", "86814600",
>> "86835092", "86874179", "86876466", "86901193",
>> "86987847", "86988080")
> 
>> random_values = sapply(seeds, function(x) {
>>  set.seed(x)
>>  y = runif(1, 17, 26)
>>  return(y)
>> })
> 
> Why do you do that?
> 
> 1) You should set the seed *once*, not multiple times in one simulation.
> 
> 2) Assuming that your strings are correctly translated to integers
>   and the same on all platforms, independent of locales (!) etc,
>   you are again not following the simple instruction on the help page:
> 
> ‘set.seed’ uses a single integer argument to set as many seeds as
> are required.  It is intended as a simple way to get quite
> different seeds by specifying small integer arguments, and also as
> .
> .
> 
> Note:   ** small ** integer 
> Why do you assume   86901193  to be a small integer ?
> 
>> This gives values that are **extremely** bunched together.
> 
>>> summary(random_values)
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>   25.13   25.36   25.66   25.58   25.83   25.94
> 
>> This behaviour of `runif` goes away when we use `kind =
>> "Knuth-TAOCP-2002"`, and we get values that appear to be
>> much more evenly spread out.
> 
>>random_values = sapply(seeds, function(x) {
>>  set.seed(x, kind = "Knuth-TAOCP-2002")
>>  y = runif(1, 17, 26)
>>  return(y)
>> })
> 
>> *Output omitted.*
> 
>> ---
> 
>> **The most interesting thing here is that this does not
>> happen on Windows -- only happens on Ubuntu**
>> (`sessionInfo` output for Ubuntu & Windows below).
> 
>> # Windows output: #
> 
>>> seeds = c(
>>+ "86548915", "86551615", "86566163", "86577411", "86584144",
>>+ "86584272", "86620568", "86724613", "86756002", "86768593", "86772411",
>>+ "86781516", "86794389", "86805854", "86814600", "86835092", "86874179",
>>+ "86876466", "86901193", "86987847", "86988080")
>>>
>>> random_values = sapply(seeds, function(x) {
>>+   set.seed(x)
>>+   y = runif(1, 17, 26)
>>+   return(y)
>>+ })
>>>
>>> summary(random_values)
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>   17.32   20.14   23.00   22.17   24.07   25.90
> 
>> Can someone help understand what is going on?
> 
>> Ubuntu
>> --
> 
>> R version 3.4.0 (2017-04-21)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 16.04.2 LTS
> 
> You have not learned to get a current version of R.
> ===> You should not write to R-devel (sorry if this may sound harsh ..)
> 
> Hint:
>   We know that Ubuntu LTS -- by virtue of being LTS (Long Term
>   Support) -- will not update R.
>   But the Ubuntu/Debian pages on CRAN tell 

Re: [Bioc-devel] Confusion with how to maintain release/devel files on local computer.

2017-11-03 Thread Martin Morgan
Thank you for the detailed description, it is much easier to understand 
what is going on. I have made some comments below


On 11/02/2017 11:14 AM, Arman Shahrisa wrote:

 > There is no branch releas_3_6.

I don't know what 'it' is -- I guess your local repository that you 
cloned, but where did you clone it from?


I repeated all the process again.

I opened git bash as an administrator and ran commands described here:


it is not necessary or recommended to run these commands as an 
administrator; generally all activities should be done as a regular user.


Likely you need, as a regular user, to regenerate your ssh key pair, and 
to re-submit the public key to Bioconductor. Alternatively, perhaps as a 
regular user you already have an ssh key pair, and you can submit the 
public key part of the key pair. Remember that this will take at least 
24 hours to process.




https://bioconductor.org/developers/how-to/git/maintain-bioc-only/


This workflow means that you want to maintain only the Bioconductor 
version of your repository, you do not want to maintain the github 
version of your repository that you used during package submission.




 > Arman@Arman-VAIO MINGW64 ~

 > $ cd /z/cbaf/Source

 > Arman@Arman-VAIO MINGW64 /z/cbaf/Source

 > $ git clone g...@git.bioconductor.org:packages/cbaf

 > Cloning into 'cbaf'...

 > Enter passphrase for key '/c/Users/Arman/.ssh/id_rsa':

 > remote: Counting objects: 732, done.

 > remote: Compressing objects: 100% (218/218), done.

 > remote: Total 732 (delta 503), reused 725 (delta 499)

 > Receiving objects: 100% (732/732), 749.19 KiB | 182.00 KiB/s, done.

 > Resolving deltas: 100% (503/503), done.

 > Arman@Arman-VAIO MINGW64 /z/cbaf/Source

 > $ git remote -v

 > fatal: Not a git repository (or any of the parent directories): .git

 > Arman@Arman-VAIO MINGW64 /z/cbaf/Source

 > $ cd /z/cbaf/Source/cbaf

 > Arman@Arman-VAIO MINGW64 /z/cbaf/Source/cbaf (master)

 > $ git remote -v

 > origin  g...@git.bioconductor.org:packages/cbaf (fetch)

 > origin  g...@git.bioconductor.org:packages/cbaf (push)


ok, all of the above looks good. You cloned the repository, and your ssh 
key was recognized.




 > Arman@Arman-VAIO MINGW64 /z/cbaf/Source/cbaf (master)

 > $ git pull

 > Enter passphrase for key '/c/Users/Arman/.ssh/id_rsa':

 > Already up-to-date.

Then I ran this one:

 > git remote add upstream g...@git.bioconductor.org:packages/cbaf.git


This step is not required. It is not part of the workflow you cited. 
Earlier, when you issued the command 'git remote -v' git indicated that 
you already had a remote named 'origin' and referencing the git server 
g...@git.bioconductor.org. What the command above does is to add a second 
remote pointing to git.bioconductor.org, but this one named 'upstream'.


I can see that the documentation in this work flow needs to be adjusted, 
because in other


It would be appropriate to rename the remote, e.g.,

  git remote rename origin upstream

nonetheless, issuing the command should not cause a problem.

It seems like at this stage you should continue with the workflow that 
you started, i.e.,


  - 'Commit changes to your local repository' to develop a new feature 
or fix a bug, including checking that your code changes are correct by 
running R CMD build and R CMD check using the 'devel' version of 
Bioconductor, and


  - 'Push your local changes to the Bioconductor repository' to make 
your code changes available to the Bioconductor build server. Remember 
that builds run once a day, and that you need to confirm that the build 
has been successful by visiting http://bioconductor.org/checkResults/.


The steps below are not necessary for you at the moment.

Martin



Then these commands from 
c


 > Arman@Arman-VAIO MINGW64 /z/cbaf/Source/cbaf (master)

 > $ git checkout master

 > Already on 'master'

 > Your branch is up-to-date with 'origin/master'.

 > Arman@Arman-VAIO MINGW64 /z/cbaf/Source/cbaf (master)

 > $ git fetch upstream

 > Enter passphrase for key '/c/Users/Arman/.ssh/id_rsa':

 > From git.bioconductor.org:packages/cbaf

 >  * [new branch]  RELEASE_3_6 -> upstream/RELEASE_3_6

 >  * [new branch]  master  -> upstream/master

 > Arman@Arman-VAIO MINGW64 /z/cbaf/Source/cbaf (master)

 > $ git merge upstream/master

 > Already up-to-date.



I have attached a screenshot of what I have now. The following is the

Command-line output:

 > Arman@Arman-VAIO MINGW64 /z/cbaf/Source/cbaf (master)

 > $ git branch -a

 > * master

 >   remotes/origin/HEAD -> origin/master

 >   remotes/origin/RELEASE_3_6

 >   remotes/origin/master

 >   remotes/upstream/RELEASE_3_6

 >   remotes/upstream/master

Thank you very much for your help.

Best regards,

Arman

*From: *Martin Morgan 
*Sent: *Thursday, November 2, 2017 17:58
*To: *Arman Shahrisa ; bioc-devel 

*Subject: *Re: [Bioc-devel] Confusion with how to 

Re: [R-pkg-devel] Package valgrind problem I can't solve: Direction?

2017-11-03 Thread Iñaki Úcar
2017-11-03 6:01 GMT+01:00 Peter Dunn :
> Iñaki and all
>
> Well, thanks for pointers to rhub. Wonderful. Moving things to github, but
> have to go home now…
>
> So, when I download CRAN code, initialise w and lambda (which worked for
> Iñaki), and run
>
> rhub::check_with_valgrind()
>
> on the code, I get no errors
> (https://builder.r-hub.io/status/tweedie_2.2.5.tar.gz-c8873979fcf84b4f8a0a4d5a47175f63).
>
>
> But running
>
> R -d "valgrind --tool=memcheck --leak-check=full --track-origins=yes"
> --vanilla < tweedie-Ex.R
>
> from the command line *still* gives me errors about “Conditional jump or
> move depends on uninitialised value(s)” in the subroutine “smallp”.

That's impossible. Did you rebuild and reinstall the package after
making those changes?

Iñaki

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-03 Thread Martin Maechler
> Tirthankar Chakravarty 
> on Fri, 3 Nov 2017 13:19:12 +0530 writes:

> This is cross-posted from SO
> (https://stackoverflow.com/q/47079702/1414455), but I now
> feel that this needs someone from R-Devel to help
> understand why this is happening.

Why R-devel -- R-help would have been appropriate:

It seems you have not read the help page for
set.seed as I expect it from posters to R-devel. 
Why would you use strings instead of integers if you *had* read it ?

> We are facing a weird situation in our code when using R's
> [`runif`][1] and setting seed with `set.seed` with the
> `kind = NULL` option (which resolves, unless I am
> mistaken, to `kind = "default"`; the default being
> `"Mersenne-Twister"`).

again this is not what the help page says; rather

 | The use of ‘kind = NULL’ or ‘normal.kind = NULL’ in ‘RNGkind’ or
 | ‘set.seed’ selects the currently-used generator (including that
 | used in the previous session if the workspace has been restored):
 | if no generator has been used it selects ‘"default"’.

but as you have > 90 (!!) packages in your sessionInfo() below,
why should we (or you) know if some of the things you did
before or (implicitly) during loading all these packages did not
change the RNG kind ?

> We set the seed using (8 digit) unique IDs generated by an
> upstream system, before calling `runif`:

> seeds = c( "86548915", "86551615", "86566163",
> "86577411", "86584144", "86584272", "86620568",
> "86724613", "86756002", "86768593", "86772411",
> "86781516", "86794389", "86805854", "86814600",
> "86835092", "86874179", "86876466", "86901193",
> "86987847", "86988080")

>  random_values = sapply(seeds, function(x) {
>   set.seed(x)
>   y = runif(1, 17, 26)
>   return(y)
> })

Why do you do that?

1) You should set the seed *once*, not multiple times in one simulation.

2) Assuming that your strings are correctly translated to integers
   and the same on all platforms, independent of locales (!) etc,
   you are again not following the simple instruction on the help page:

 ‘set.seed’ uses a single integer argument to set as many seeds as
 are required.  It is intended as a simple way to get quite
 different seeds by specifying small integer arguments, and also as
 .
 .

Note:   ** small ** integer 
Why do you assume   86901193  to be a small integer ?

> This gives values that are **extremely** bunched together.

>> summary(random_values)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   25.13   25.36   25.66   25.58   25.83   25.94

> This behaviour of `runif` goes away when we use `kind =
> "Knuth-TAOCP-2002"`, and we get values that appear to be
> much more evenly spread out.

> random_values = sapply(seeds, function(x) {
>   set.seed(x, kind = "Knuth-TAOCP-2002")
>   y = runif(1, 17, 26)
>   return(y)
> })

> *Output omitted.*

> ---

> **The most interesting thing here is that this does not
> happen on Windows -- only happens on Ubuntu**
> (`sessionInfo` output for Ubuntu & Windows below).

> # Windows output: #

>> seeds = c(
> + "86548915", "86551615", "86566163", "86577411", "86584144",
> + "86584272", "86620568", "86724613", "86756002", "86768593", "86772411",
> + "86781516", "86794389", "86805854", "86814600", "86835092", "86874179",
> + "86876466", "86901193", "86987847", "86988080")
>>
>> random_values = sapply(seeds, function(x) {
> +   set.seed(x)
> +   y = runif(1, 17, 26)
> +   return(y)
> + })
>>
>> summary(random_values)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   17.32   20.14   23.00   22.17   24.07   25.90

> Can someone help understand what is going on?

> Ubuntu
> --

> R version 3.4.0 (2017-04-21)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.2 LTS

You have not learned to get a current version of R.
===> You should not write to R-devel (sorry if this may sound harsh ..)

Hint:
   We know that  Ubuntu LTS -- by virtue of being LTS (Long Term
   Support) -- will not update R.
   But the Ubuntu/Debian pages on CRAN tell you how to ensure to
   automatically get current versions of R on your ubuntu-run computer
   (Namely by adding a CRAN mirror to your ubuntu sources)

And then in your sessionInfo :


   38 packages attached + 56 namespaces loaded !!


   and similar nonsense (tons of packages+namespaces)
   on Windows which uses an even more outdated version of
   R 3.3.2.

-

Can you please learn to work with a minimal reproducible example (MRE)?
(Well, you are close in your R code, but not if you load 50
 packages and do who-knows-what before running the example;
 your RNGkind() and many other things could have been changed ...)
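
One way to rule this out, sketched here under the assumption of a fresh session with no extra packages attached, is to print the generators currently in effect, reset them to the defaults, and confirm that the same seed reproduces the same draws. The names `x1` and `x2` are only for illustration.

```r
# Show which generators are currently in effect in this session.
print(RNGkind())

# Reset the uniform and normal generators to R's defaults, in case
# earlier code or a loaded package changed them.
RNGkind(kind = "default", normal.kind = "default")

# With the generator fixed, the same seed must give the same draws.
set.seed(1L)
x1 <- runif(3)
set.seed(1L)
x2 <- runif(3)
identical(x1, x2)
```

Running this on both machines and comparing the `RNGkind()` output is a cheap first check before looking for anything deeper.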

Since you run ubuntu, you know the shell and you could
(after installing a current version of 

[Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-03 Thread Tirthankar Chakravarty
This is cross-posted from SO (https://stackoverflow.com/q/47079702/1414455),
but I now feel that this needs someone from R-Devel to help understand why
this is happening.

We are facing a weird situation in our code when using R's [`runif`][1] and
setting seed with `set.seed` with the `kind = NULL` option (which resolves,
unless I am mistaken, to `kind = "default"`; the default being
`"Mersenne-Twister"`).

We set the seed using (8 digit) unique IDs generated by an upstream system,
before calling `runif`:

seeds = c(
  "86548915", "86551615", "86566163", "86577411", "86584144",
  "86584272", "86620568", "86724613", "86756002", "86768593", "86772411",
  "86781516", "86794389", "86805854", "86814600", "86835092", "86874179",
  "86876466", "86901193", "86987847", "86988080")

random_values = sapply(seeds, function(x) {
  set.seed(x)
  y = runif(1, 17, 26)
  return(y)
})

This gives values that are **extremely** bunched together.

> summary(random_values)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  25.13   25.36   25.66   25.58   25.83   25.94
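
For a sense of scale, here is a back-of-the-envelope sketch. It assumes the 21 draws are independent and uniform on [17, 26], which is precisely what is in doubt here, and computes the chance that all of them land at or above the observed minimum of 25.13. The variable names are illustrative.

```r
# Probability that a single Uniform(17, 26) draw is at least 25.13.
p_single <- (26 - 25.13) / (26 - 17)

# Probability that all 21 independent draws are at least 25.13.
n_draws <- 21
p_all <- p_single^n_draws

print(p_single)
print(p_all)
```

This is only the probability under the independence assumption; it says nothing about why draws from nearby seeds might be related.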

This behaviour of `runif` goes away when we use `kind =
"Knuth-TAOCP-2002"`, and we get values that appear to be much more evenly
spread out.

random_values = sapply(seeds, function(x) {
  set.seed(x, kind = "Knuth-TAOCP-2002")
  y = runif(1, 17, 26)
  return(y)
})

*Output omitted.*

---

**The most interesting thing here is that this does not happen on Windows
-- only happens on Ubuntu** (`sessionInfo` output for Ubuntu & Windows
below).

# Windows output: #

> seeds = c(
+   "86548915", "86551615", "86566163", "86577411", "86584144",
+   "86584272", "86620568", "86724613", "86756002", "86768593", "86772411",
+   "86781516", "86794389", "86805854", "86814600", "86835092", "86874179",
+   "86876466", "86901193", "86987847", "86988080")
>
> random_values = sapply(seeds, function(x) {
+   set.seed(x)
+   y = runif(1, 17, 26)
+   return(y)
+ })
>
> summary(random_values)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  17.32   20.14   23.00   22.17   24.07   25.90

Can someone help understand what is going on?

Ubuntu
--

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=en_US.UTF-8
 [9] LC_ADDRESS=en_US.UTF-8     LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] RMySQL_0.10.8               DBI_0.6-1
 [3] jsonlite_1.4                tidyjson_0.2.2
 [5] optiRum_0.37.3              lubridate_1.6.0
 [7] httr_1.2.1                  gdata_2.18.0
 [9] XLConnect_0.2-12            XLConnectJars_0.2-12
[11] data.table_1.10.4           stringr_1.2.0
[13] readxl_1.0.0                xlsx_0.5.7
[15] xlsxjars_0.6.1              rJava_0.9-8
[17] sqldf_0.4-10                RSQLite_1.1-2
[19] gsubfn_0.6-6                proto_1.0.0
[21] dplyr_0.5.0                 purrr_0.2.4
[23] readr_1.1.1                 tidyr_0.6.3
[25] tibble_1.3.0                tidyverse_1.1.1
[27] rBayesianOptimization_1.1.0 xgboost_0.6-4
[29] MLmetrics_1.1.1             caret_6.0-76
[31] ROCR_1.0-7                  gplots_3.0.1
[33] effects_3.1-2               pROC_1.10.0
[35] pscl_1.4.9                  lattice_0.20-35
[37] MASS_7.3-47                 ggplot2_2.2.1

loaded via a namespace (and not attached):
 [1] splines_3.4.0      foreach_1.4.3    AUC_0.3.0      modelr_0.1.0
 [5] gtools_3.5.0       assertthat_0.2.0 stats4_3.4.0   cellranger_1.1.0
 [9] quantreg_5.33      chron_2.3-50     digest_0.6.10  rvest_0.3.2
[13] minqa_1.2.4        colorspace_1.3-2 Matrix_1.2-10  plyr_1.8.4
[17] psych_1.7.3.21     XML_3.98-1.7     broom_0.4.2    SparseM_1.77
[21] haven_1.0.0        scales_0.4.1     lme4_1.1-13    MatrixModels_0.4-1
[25] mgcv_1.8-17        car_2.1-5        nnet_7.3-12    lazyeval_0.2.0
[29] pbkrtest_0.4-7     mnormt_1.5-5     magrittr_1.5   memoise_1.0.0
[33] nlme_3.1-131       forcats_0.2.0    xml2_1.1.1     foreign_0.8-69
[37] tools_3.4.0        hms_0.3          munsell_0.4.3  compiler_3.4.0
[41] caTools_1.17.1     rlang_0.1.1      grid_3.4.0     nloptr_1.0.4
[45] iterators_1.0.8    bitops_1.0-6     tcltk_3.4.0    gtable_0.2.0
[49] ModelMetrics_1.1.0 codetools_0.2-15 reshape2_1.4.2 R6_2.2.0
[53] knitr_1.15.1