Tirthankar,

"random number generators" do not produce random numbers. Any given generator produces a fixed sequence of numbers that appear to meet various tests of randomness. By picking a seed you enter that sequence in a particular place and subsequent numbers in the sequence appear to be unrelated. There are no guarantees that if YOU pick a SET of seeds they won't produce a set of values that are of a similar magnitude.

You can likely solve your problem by following Radford Neal's advice of not using the the first number from each seed. However, you don't need to use anything more than the second number. So, you can modify your function as follows:

function(x) {
      set.seed(x, kind = "default")
      y = runif(2, 17, 26)
      return(y[2])
    }

Hope this is helpful,

Dan

--
Daniel Nordlund
Port Townsend, WA  USA


On 11/3/2017 11:30 AM, Tirthankar Chakravarty wrote:
Bill,

Appreciate the point that both you and Serguei are making, but the sequence
in question is not a selected or filtered set. These are values as observed
in a sequence from a  mechanism described below. The probabilities required
to generate this exact sequence in the wild seem staggering to me.

T

On Fri, Nov 3, 2017 at 11:27 PM, William Dunlap <wdun...@tibco.com> wrote:

Another other generator is subject to the same problem with the same
probabilitiy.

Filter(function(s){set.seed(s, kind="Knuth-TAOCP-2002");runif(1,17,26)>25.99},
1:10000)
  [1]  280  415  826 1372 2224 2544 3270 3594 3809 4116 4236 5018 5692 7043
7212 7364 7747 9256 9491 9568 9886



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Nov 3, 2017 at 10:31 AM, Tirthankar Chakravarty <
tirthankar.li...@gmail.com> wrote:


Bill,

I have clarified this on SO, and I will copy that clarification in here:

"Sure, we tested them on other 8-digit numbers as well & we could not
replicate. However, these are honest-to-goodness numbers generated by a
non-adversarial system that has no conception of these numbers being used
for anything other than a unique key for an entity -- these are not a
specially constructed edge case. Would be good to know what seeds will and
will not work, and why."

These numbers are generated by an application that serves a form, and
associates form IDs in a sequence. The application calls our API depending
on the form values entered by users, which in turn calls our R code that
executes some code that needs an RNG. Since the API has to be stateless, to
be able to replicate the results for possible debugging, we need to draw
random numbers in a way that we can replicate the results of the API
response -- we use the form ID as seeds.

I repeat, there is no design or anything adversarial about the way that
these numbers were generated -- the system generating these numbers and
the users entering inputs have no conception of our use of an RNG -- this
is meant to just be a random sequence of form IDs. This issue was
discovered completely by chance when the output of the API was observed to
be highly non-random. It is possible that it is a 1/10^8 chance, but that
is hard to believe, given that the API hit depends on user input. Note also
that the issue goes away when we use a different RNG as mentioned below.

T

On Fri, Nov 3, 2017 at 9:58 PM, William Dunlap <wdun...@tibco.com> wrote:

The random numbers in a stream initialized with one seed should have
about the desired distribution.  You don't win by changing the seed all the
time.  Your seeds caused the first numbers of a bunch of streams to be
about the same, but the second and subsequent entries in each stream do
look uniformly distributed.

You didn't say what your 'upstream process' was, but it is easy to come
up with seeds that give about the same first value:

Filter(function(s){set.seed(s);runif(1,17,26)>25.99}, 1:10000)
  [1]  514  532 1951 2631 3974 4068 4229 6092 6432 7264 9090



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Nov 3, 2017 at 12:49 AM, Tirthankar Chakravarty <
tirthankar.li...@gmail.com> wrote:

This is cross-posted from SO (https://stackoverflow.com/q/4
7079702/1414455),
but I now feel that this needs someone from R-Devel to help understand
why
this is happening.

We are facing a weird situation in our code when using R's [`runif`][1]
and
setting seed with `set.seed` with the `kind = NULL` option (which
resolves,
unless I am mistaken, to `kind = "default"`; the default being
`"Mersenne-Twister"`).

We set the seed using (8 digit) unique IDs generated by an upstream
system,
before calling `runif`:

     seeds = c(
       "86548915", "86551615", "86566163", "86577411", "86584144",
       "86584272", "86620568", "86724613", "86756002", "86768593",
"86772411",
       "86781516", "86794389", "86805854", "86814600", "86835092",
"86874179",
       "86876466", "86901193", "86987847", "86988080")

     random_values = sapply(seeds, function(x) {
       set.seed(x)
       y = runif(1, 17, 26)
       return(y)
     })

This gives values that are **extremely** bunched together.

     > summary(random_values)
        Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       25.13   25.36   25.66   25.58   25.83   25.94

This behaviour of `runif` goes away when we use `kind =
"Knuth-TAOCP-2002"`, and we get values that appear to be much more
evenly
spread out.

     random_values = sapply(seeds, function(x) {
       set.seed(x, kind = "Knuth-TAOCP-2002")
       y = runif(1, 17, 26)
       return(y)
     })

*Output omitted.*

---

**The most interesting thing here is that this does not happen on
Windows
-- only happens on Ubuntu** (`sessionInfo` output for Ubuntu & Windows
below).

# Windows output: #

     > seeds = c(
     +   "86548915", "86551615", "86566163", "86577411", "86584144",
     +   "86584272", "86620568", "86724613", "86756002", "86768593",
"86772411",
     +   "86781516", "86794389", "86805854", "86814600", "86835092",
"86874179",
     +   "86876466", "86901193", "86987847", "86988080")
     >
     > random_values = sapply(seeds, function(x) {
     +   set.seed(x)
     +   y = runif(1, 17, 26)
     +   return(y)
     + })
     >
     > summary(random_values)
        Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       17.32   20.14   23.00   22.17   24.07   25.90

Can someone help understand what is going on?

Ubuntu
------

     R version 3.4.0 (2017-04-21)
     Platform: x86_64-pc-linux-gnu (64-bit)
     Running under: Ubuntu 16.04.2 LTS

     Matrix products: default
     BLAS: /usr/lib/libblas/libblas.so.3.6.0
     LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

     locale:
     [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
      [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
      [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8
      [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8
      [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8
     [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

     attached base packages:
     [1] parallel  stats     graphics  grDevices utils     datasets
methods   base

     other attached packages:
     [1] RMySQL_0.10.8               DBI_0.6-1
      [3] jsonlite_1.4                tidyjson_0.2.2
      [5] optiRum_0.37.3              lubridate_1.6.0
      [7] httr_1.2.1                  gdata_2.18.0
      [9] XLConnect_0.2-12            XLConnectJars_0.2-12
     [11] data.table_1.10.4           stringr_1.2.0
     [13] readxl_1.0.0                xlsx_0.5.7
     [15] xlsxjars_0.6.1              rJava_0.9-8
     [17] sqldf_0.4-10                RSQLite_1.1-2
     [19] gsubfn_0.6-6                proto_1.0.0
     [21] dplyr_0.5.0                 purrr_0.2.4
     [23] readr_1.1.1                 tidyr_0.6.3
     [25] tibble_1.3.0                tidyverse_1.1.1
     [27] rBayesianOptimization_1.1.0 xgboost_0.6-4
     [29] MLmetrics_1.1.1             caret_6.0-76
     [31] ROCR_1.0-7                  gplots_3.0.1
     [33] effects_3.1-2               pROC_1.10.0
     [35] pscl_1.4.9                  lattice_0.20-35
     [37] MASS_7.3-47                 ggplot2_2.2.1

     loaded via a namespace (and not attached):
     [1] splines_3.4.0      foreach_1.4.3      AUC_0.3.0
modelr_0.1.0
      [5] gtools_3.5.0       assertthat_0.2.0   stats4_3.4.0
  cellranger_1.1.0
      [9] quantreg_5.33      chron_2.3-50       digest_0.6.10
rvest_0.3.2
     [13] minqa_1.2.4        colorspace_1.3-2   Matrix_1.2-10
plyr_1.8.4
     [17] psych_1.7.3.21     XML_3.98-1.7       broom_0.4.2
SparseM_1.77
     [21] haven_1.0.0        scales_0.4.1       lme4_1.1-13
MatrixModels_0.4-1
     [25] mgcv_1.8-17        car_2.1-5          nnet_7.3-12
lazyeval_0.2.0
     [29] pbkrtest_0.4-7     mnormt_1.5-5       magrittr_1.5
  memoise_1.0.0
     [33] nlme_3.1-131       forcats_0.2.0      xml2_1.1.1
  foreign_0.8-69
     [37] tools_3.4.0        hms_0.3            munsell_0.4.3
compiler_3.4.0
     [41] caTools_1.17.1     rlang_0.1.1        grid_3.4.0
  nloptr_1.0.4
     [45] iterators_1.0.8    bitops_1.0-6       tcltk_3.4.0
gtable_0.2.0
     [49] ModelMetrics_1.1.0 codetools_0.2-15   reshape2_1.4.2
  R6_2.2.0

     [53] knitr_1.15.1       KernSmooth_2.23-15 stringi_1.1.5
Rcpp_0.12.11



Windows
-------

     > sessionInfo()
     R version 3.3.2 (2016-10-31)
     Platform: x86_64-w64-mingw32/x64 (64-bit)
     Running under: Windows >= 8 x64 (build 9200)

     locale:
     [1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252
LC_MONETARY=English_India.1252
     [4] LC_NUMERIC=C                   LC_TIME=English_India.1252

     attached base packages:
     [1] graphics  grDevices utils     datasets  grid      stats
  methods   base

     other attached packages:
      [1] bindrcpp_0.2         h2o_3.14.0.3         ggrepel_0.6.5
eulerr_1.1.0         VennDiagram_1.6.17
      [6] futile.logger_1.4.3  scales_0.4.1         FinCal_0.6.3
  xml2_1.0.0           httr_1.3.0
     [11] wesanderson_0.3.2    wordcloud_2.5        RColorBrewer_1.1-2
  htmltools_0.3.6      urltools_1.6.0
     [16] timevis_0.4          dtplyr_0.0.1         magrittr_1.5
  shiny_1.0.5          RODBC_1.3-14
     [21] zoo_1.8-0            sqldf_0.4-10         RSQLite_1.1-2
gsubfn_0.6-6         proto_1.0.0
     [26] gdata_2.17.0         stringr_1.2.0        XLConnect_0.2-12
  XLConnectJars_0.2-12 data.table_1.10.4
     [31] xlsx_0.5.7           xlsxjars_0.6.1       rJava_0.9-8
readxl_0.1.1         googlesheets_0.2.1
     [36] jsonlite_1.5         tidyjson_0.2.1       RMySQL_0.10.9
RPostgreSQL_0.4-1    DBI_0.5-1
     [41] dplyr_0.7.2          purrr_0.2.3          readr_1.1.1
tidyr_0.7.0          tibble_1.3.3
     [46] ggplot2_2.2.0        tidyverse_1.0.0      lubridate_1.6.0

     loaded via a namespace (and not attached):
      [1] gtools_3.5.0         assertthat_0.2.0     triebeard_0.3.0
cellranger_1.1.0     yaml_2.1.14
      [6] slam_0.1-40          lattice_0.20-34      glue_1.1.1
  chron_2.3-48         digest_0.6.12.1
     [11] colorspace_1.3-1     httpuv_1.3.5         plyr_1.8.4
  pkgconfig_2.0.1      xtable_1.8-2
     [16] lazyeval_0.2.0       mime_0.5             memoise_1.0.0
tools_3.3.2          hms_0.3
     [21] munsell_0.4.3        lambda.r_1.1.9       rlang_0.1.1
RCurl_1.95-4.8       labeling_0.3
     [26] bitops_1.0-6         tcltk_3.3.2          gtable_0.2.0
  reshape2_1.4.2       R6_2.2.0
     [31] bindr_0.1            futile.options_1.0.0 stringi_1.1.2
Rcpp_0.12.12.1

   [1]: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/Unif
orm.html

         [[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel






        [[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to