Re: [R-pkg-devel] Package valgrind problem I can't solve: Direction?

2017-11-05 Thread Merlise Clyde, Ph.D.
Peter,

I ran into similar problems using valgrind to check my package BAS.

From my understanding, using valgrind with rhub does not run the same checks as

R -d "valgrind --tool=memcheck --leak-check=full" --vanilla < mypkg-Ex.R


See https://github.com/r-hub/rhub/issues/23,
which might explain why it runs without errors there while the CRAN check
flags errors/warnings (perhaps someone else can confirm).
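A minimal sketch of launching the command-line run quoted above from within R, assuming valgrind is installed and R's own executable is found via `R.home("bin")`. `valgrind_args()` is a hypothetical helper name, and the `system2()` call is left commented out so nothing is executed. The examples script (written by R CMD check into the package's `.Rcheck` directory) exercises whichever copy of the package is currently installed.

```r
# Hypothetical helper: build the arguments for
#   R -d "valgrind --tool=memcheck --leak-check=full" --vanilla < mypkg-Ex.R
# The -d value is shell-quoted because system2() pastes args into a command line.
valgrind_args <- function(valgrind_opts = "--tool=memcheck --leak-check=full") {
  c("-d", shQuote(paste("valgrind", valgrind_opts)), "--vanilla")
}

# Usage (not run): feed the examples script on standard input, as `<` does.
# system2(file.path(R.home("bin"), "R"),
#         args  = valgrind_args(),
#         stdin = "mypkg-Ex.R")
```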


best,
Merlise


Merlise A Clyde
Professor & Chair Department of Statistical Science
Duke University
http://stat.duke.edu/~clyde

cl...@duke.edu
919 681 8440




On Nov 5, 2017, at 7:05 PM, Peter Dunn wrote:

“unless I am being really stupid”

Lucky for that caveat. Feeling rather stupid for this late effort.

Thanks so much.  Help greatly appreciated.

P.



On 6/Nov/17, 9:51 am, "Iñaki Úcar" wrote:

   2017-11-06 0:09 GMT+01:00 Peter Dunn:
Impossible or not… it just happened (unless I am being really stupid, which
is entirely possible, indeed probable). I confirmed again this morning:
After rebuilding (R CMD build) and checking (R CMD check) without any
errors, I used rhub and the command line again:


Running valgrind at the command line, I get this error:

==5097== Conditional jump or move depends on uninitialised value(s)
==5097== at 0x1118F61FB: smallp_ (in
/Users/pdunn2/Library/R/3.4/library/tweedie/libs/tweedie.so)

   What I meant is that, if you run the script in tweedie.Rcheck from the
   command line, Rscript uses your *installed* version of tweedie, not
   the tarball you built with R CMD build. So if you didn't run R CMD
   INSTALL, Rscript is running an old version without any fix, hence the
   disagreement with R CMD check.

   Iñaki



USC, Locked Bag 4, Maroochydore DC, Queensland, 4558 Australia.
CRICOS Provider No: 01595D
Please consider the environment before printing this email.
This email is confidential. If received in error, please delete it from your 
system.


__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel



Re: [R-pkg-devel] Package valgrind problem I can't solve: Direction?

2017-11-05 Thread Peter Dunn
“unless I am being really stupid”

Lucky for that caveat. Feeling rather stupid for this late effort.

Thanks so much.  Help greatly appreciated.

P.



On 6/Nov/17, 9:51 am, "Iñaki Úcar"  wrote:

2017-11-06 0:09 GMT+01:00 Peter Dunn :
> Impossible or not… it just happened (unless I am being really stupid, which
> is entirely possible, indeed probable). I confirmed again this morning:
> After rebuilding (R CMD build) and checking (R CMD check) without any
> errors, I used rhub and the command line again:
>
>
> Running valgrind at the command line, I get this error:
>
> ==5097== Conditional jump or move depends on uninitialised value(s)
> ==5097== at 0x1118F61FB: smallp_ (in
> /Users/pdunn2/Library/R/3.4/library/tweedie/libs/tweedie.so)

What I meant is that, if you run the script in tweedie.Rcheck from the
command line, Rscript uses your *installed* version of tweedie, not
the tarball you built with R CMD build. So if you didn't run R CMD
INSTALL, Rscript is running an old version without any fix, hence the
disagreement with R CMD check.

Iñaki




Re: [R-pkg-devel] Package valgrind problem I can't solve: Direction?

2017-11-05 Thread Iñaki Úcar
2017-11-06 0:09 GMT+01:00 Peter Dunn :
> Impossible or not… it just happened (unless I am being really stupid, which
> is entirely possible, indeed probable). I confirmed again this morning:
> After rebuilding (R CMD build) and checking (R CMD check) without any
> errors, I used rhub and the command line again:
>
>
> Running valgrind at the command line, I get this error:
>
> ==5097== Conditional jump or move depends on uninitialised value(s)
> ==5097== at 0x1118F61FB: smallp_ (in
> /Users/pdunn2/Library/R/3.4/library/tweedie/libs/tweedie.so)

What I meant is that, if you run the script in tweedie.Rcheck from the
command line, Rscript uses your *installed* version of tweedie, not
the tarball you built with R CMD build. So if you didn't run R CMD
INSTALL, Rscript is running an old version without any fix, hence the
disagreement with R CMD check.
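Iñaki's point can be checked from R before re-running the examples: the sketch below reports which library directory and version R would load for a package. `which_installed()` is a hypothetical helper built on base R's `find.package()` and `packageVersion()`; the `tweedie/DESCRIPTION` path in the usage comment is illustrative only.

```r
# Report the library directory and version of the installed copy of `pkg`
# that R would load, or NULL if the package is not installed.
which_installed <- function(pkg, lib.loc = NULL) {
  path <- find.package(pkg, lib.loc = lib.loc, quiet = TRUE)
  if (length(path) == 0L) return(NULL)
  lib <- dirname(path[1])
  list(library = lib,
       version = as.character(packageVersion(pkg, lib.loc = lib)))
}

# Usage (not run): compare the installed version with the source just built.
# which_installed("tweedie")$version
# read.dcf("tweedie/DESCRIPTION", fields = "Version")
# If they differ, run R CMD INSTALL on the built tarball before rerunning.
```

Comparing versions only catches a stale install if the version number was bumped; reinstalling after every rebuild is the simpler habit.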

Iñaki


Re: [R-pkg-devel] Package valgrind problem I can't solve: Direction?

2017-11-05 Thread Peter Dunn
Impossible or not… it just happened (unless I am being really stupid, which is 
entirely possible, indeed probable).  I confirmed again this morning: After 
rebuilding (R CMD build) and checking (R CMD check) without any errors, I used 
rhub and the command line again:


Running valgrind at the command line, I get this error:

==5097== Conditional jump or move depends on uninitialised value(s)
==5097==    at 0x1118F61FB: smallp_ (in /Users/pdunn2/Library/R/3.4/library/tweedie/libs/tweedie.so)

…when I am here:

$ pwd
/Users/pdunn2/Documents/Research/Rpackages/tweedie/tweedie-debug/CRAN-update/tweedie.Rcheck



In rhub:

check_with_valgrind("tweedie")

gives:

── 0 errors ✔ | 0 warnings ✔ | 0 notes ✔
─  Done with R CMD check
─  Saving artifacts

…when I am here:

> getwd()
[1] "/Users/pdunn2/Documents/Research/Rpackages/tweedie/tweedie-debug/CRAN-update"




I will take this to mean that there really is no problem… and put this down to 
things I don’t understand and “experience”.

Thanks for everyone’s help.

P.

From: Iñaki Úcar 
Date: Friday, 3 November 2017 at 7:45 pm
To: Peter Dunn 
Cc: "r-package-devel@r-project.org" 
Subject: Re: [R-pkg-devel] Package valgrind problem I can't solve: Direction?

2017-11-03 6:01 GMT+01:00 Peter Dunn :
> Iñaki and all
>
> Well, thanks for pointers to rhub. Wonderful. Moving things to github, but
> have to go home now…
>
> So, when I download CRAN code, initialise w and lambda (which worked for
> Iñaki), and run
>
> rhub::check_with_valgrind()
>
> on the code, I get no errors
> (https://builder.r-hub.io/status/tweedie_2.2.5.tar.gz-c8873979fcf84b4f8a0a4d5a47175f63).
>
>
> But running
>
> R -d "valgrind --tool=memcheck --leak-check=full --track-origins=yes" --vanilla < tweedie-Ex.R
>
> from the command line *still* gives me errors about “Conditional jump or
> move depends on uninitialised value(s)” in the subroutine smallp.

That's impossible. Did you rebuild and reinstall the package after
making those changes?

Iñaki




Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-05 Thread Paul Gilbert
I'll point out that there is a large literature on generating 
pseudo-random numbers for parallel processes, and it is not as easy as 
one (at least I) would intuitively think. By a contrapositive-like line of 
thinking, one might guess that it will not be easy to pick seeds in a way 
that will produce independent sequences.


(I'm a bit confused about the objective, but) if the objective is to 
produce independent sequences from several different seeds, then the RNGs for 
parallel processing might be a good place to start. (And, BTW, if you 
want to reproduce parallel-generated random numbers, you need to keep 
track of both the starting seed and the number of nodes.)
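One way to follow this suggestion in R is the L'Ecuyer-CMRG generator with `parallel::nextRNGStream()`, which jumps ahead to a new, well-separated stream. The sketch below derives every stream from one base seed; `stream_draws()` is a hypothetical helper, and the U(17, 26) range only mirrors the example in this thread. Reproducing stream k then needs just the base seed and k.

```r
library(parallel)

# Sketch: draw from n_streams independent L'Ecuyer-CMRG streams that are all
# derived from ONE base seed, rather than seeding each task separately.
stream_draws <- function(base_seed, n_streams, n_per_stream, min = 17, max = 26) {
  old_kind <- RNGkind()[1]
  on.exit(RNGkind(old_kind), add = TRUE)  # restore the session's RNG kind

  set.seed(base_seed, kind = "L'Ecuyer-CMRG")
  s <- get(".Random.seed", envir = globalenv())  # state of stream 1
  out <- vector("list", n_streams)
  for (k in seq_len(n_streams)) {
    assign(".Random.seed", s, envir = globalenv())  # switch to stream k
    out[[k]] <- runif(n_per_stream, min, max)
    s <- nextRNGStream(s)                           # jump to stream k + 1
  }
  out
}

x <- stream_draws(2017, n_streams = 3, n_per_stream = 5)
```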


Paul Gilbert

On 11/05/2017 10:58 AM, peter dalgaard wrote:



On 5 Nov 2017, at 15:17 , Duncan Murdoch  wrote:

On 04/11/2017 10:20 PM, Daniel Nordlund wrote:

Tirthankar,
"random number generators" do not produce random numbers.  Any given
generator produces a fixed sequence of numbers that appear to meet
various tests of randomness.  By picking a seed you enter that sequence
in a particular place and subsequent numbers in the sequence appear to
be unrelated.  There are no guarantees that if YOU pick a SET of seeds
they won't produce a set of values that are of a similar magnitude.
You can likely solve your problem by following Radford Neal's advice of
not using the first number from each seed.  However, you don't need
to use anything more than the second number.  So, you can modify your
function as follows:
function(x) {
  set.seed(x, kind = "default")
  y = runif(2, 17, 26)
  return(y[2])
}
Hope this is helpful,


That's assuming that the chosen seeds are unrelated to the function output, 
which seems unlikely on the face of it.  You can certainly choose a set of 
seeds that give high values on the second draw just as easily as you can choose 
seeds that give high draws on the first draw.

The interesting thing about this problem is that Tirthankar doesn't believe 
that the seed selection process is aware of the function output.  I would say 
that it must be, and he should be investigating how that happens if he is 
worried about the output, he shouldn't be worrying about R's RNG.



Hmm, no. The basic issue is that RNGs are constructed so that, with x_{n+1} = f(x_n),
x_1, x_2, x_3, ... will look random, not so that f(s_1), f(s_2), f(s_3), ...
will look random for any s_1, s_2, ... . This is true even if the seeds s_1, s_2,
... are not chosen so as to mess with the RNG. In the present case, it seems
that the seeds around 86e6 tend to give similar output. On the other hand, it
is not _just_ the similarity in magnitude that does it; try, e.g.,

s <- as.integer(runif(100, 86.54e6, 86.98e6))
r <- sapply(s, function(s){set.seed(s); runif(1,17,26)})
plot(s,r, pch=".")

and no obvious pattern emerges. My best guess is that the seeds are not only of 
similar magnitude, but also have other bit-pattern similarities.

(Isn't there a Knuth quote to the effect that "Every random number generator will 
fail in at least one application"?)

One remaining issue is whether it is really true that the same seeds give 
different output on different platforms. That shouldn't happen, I believe.



Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel






Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-05 Thread peter dalgaard

> On 5 Nov 2017, at 15:17 , Duncan Murdoch  wrote:
> 
> On 04/11/2017 10:20 PM, Daniel Nordlund wrote:
>> Tirthankar,
>> "random number generators" do not produce random numbers.  Any given
>> generator produces a fixed sequence of numbers that appear to meet
>> various tests of randomness.  By picking a seed you enter that sequence
>> in a particular place and subsequent numbers in the sequence appear to
>> be unrelated.  There are no guarantees that if YOU pick a SET of seeds
>> they won't produce a set of values that are of a similar magnitude.
>> You can likely solve your problem by following Radford Neal's advice of
>> not using the first number from each seed.  However, you don't need
>> to use anything more than the second number.  So, you can modify your
>> function as follows:
>> function(x) {
>>   set.seed(x, kind = "default")
>>   y = runif(2, 17, 26)
>>   return(y[2])
>> }
>> Hope this is helpful,
> 
> That's assuming that the chosen seeds are unrelated to the function output, 
> which seems unlikely on the face of it.  You can certainly choose a set of 
> seeds that give high values on the second draw just as easily as you can 
> choose seeds that give high draws on the first draw.
> 
> The interesting thing about this problem is that Tirthankar doesn't believe 
> that the seed selection process is aware of the function output.  I would say 
> that it must be, and he should be investigating how that happens if he is 
> worried about the output, he shouldn't be worrying about R's RNG.
> 

Hmm, no. The basic issue is that RNGs are constructed so that, with x_{n+1} = f(x_n),
x_1, x_2, x_3, ... will look random, not so that f(s_1), f(s_2), f(s_3), ...
will look random for any s_1, s_2, ... . This is true even if the seeds s_1, s_2,
... are not chosen so as to mess with the RNG. In the present case, it seems
that the seeds around 86e6 tend to give similar output. On the other hand, it
is not _just_ the similarity in magnitude that does it; try, e.g.,

s <- as.integer(runif(100, 86.54e6, 86.98e6))
r <- sapply(s, function(s){set.seed(s); runif(1,17,26)})
plot(s,r, pch=".")

and no obvious pattern emerges. My best guess is that the seeds are not only of 
similar magnitude, but also have other bit-pattern similarities.
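The distinction above can be made concrete with two small functions: (a) many draws from one seeded stream, which is the use the generator is designed for, and (b) the first draw from each of many seeds, which is what the original code did. This is a sketch; the function names are hypothetical, and the seeds are those listed in the original report, written as integers. No particular output is claimed here; the point is to compare `summary()` of the two vectors.

```r
# (a) One seed, many draws: x_1, x_2, ..., x_n from a single stream.
draws_one_stream <- function(seed, n, min = 17, max = 26) {
  set.seed(seed)
  runif(n, min, max)
}

# (b) Many seeds, one draw each: the first value of each seeded stream.
first_draw_per_seed <- function(seeds, min = 17, max = 26) {
  vapply(seeds, function(s) {
    set.seed(s)
    runif(1, min, max)
  }, numeric(1))
}

seeds <- c(86548915L, 86551615L, 86566163L, 86577411L, 86584144L,
           86584272L, 86620568L, 86724613L, 86756002L, 86768593L,
           86772411L, 86781516L, 86794389L, 86805854L, 86814600L,
           86835092L, 86874179L, 86876466L, 86901193L, 86987847L,
           86988080L)

a <- draws_one_stream(1L, length(seeds))  # the designed-for use of the RNG
b <- first_draw_per_seed(seeds)           # not what the RNG is designed for
summary(a)
summary(b)
```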

(Isn't there a Knuth quote to the effect that "Every random number generator 
will fail in at least one application"?)

One remaining issue is whether it is really true that the same seeds give 
different output on different platforms. That shouldn't happen, I believe.


> Duncan Murdoch

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-05 Thread Tirthankar Chakravarty
Duncan, Daniel,

Thanks and indeed we intend to take the advice that Radford and Lukas have
provided in this thread.

I do want to reiterate that the generating system itself cannot have any
conception of the use of form IDs as seeds for a PRNG *and* the system
itself only generates a sequence of form IDs, which are then filtered and
passed to our API depending on basic rules on user inputs in that form.
Either a truly improbable event has happened in our production system, or
the Mersenne-Twister is very susceptible to having the first draw in the
sequence correlated across closely related seeds. Both of these require
understanding the Mersenne-Twister better.

The solution here as has been suggested is to use a different RNG with
adequate burn-in (in which case even MT would work) or to look more
carefully at our problem and understand if we just need a hash function.
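A minimal sketch of the burn-in idea, assuming the default Mersenne-Twister generator: seed, discard a fixed number of draws, then use the next ones. With `burn_in = 1` it matches Daniel Nordlund's function in this thread. `draw_after_burn_in()` is a hypothetical name, and, as Duncan Murdoch points out, discarding draws only helps if the seeds are not chosen with knowledge of the output.

```r
# Seed the default generator, discard `burn_in` uniform draws, then return
# the next `n` draws on [min, max].
draw_after_burn_in <- function(seed, burn_in = 1L, n = 1L, min = 17, max = 26) {
  set.seed(seed, kind = "default")
  if (burn_in > 0L) runif(burn_in)  # advance the stream; values are discarded
  runif(n, min, max)
}

draw_after_burn_in(86548915L)                 # second value of the stream
draw_after_burn_in(86548915L, burn_in = 10L)  # eleventh value
```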

In either case, we will cease to question R's implementation of
Mersenne-Twister (for the time being). :)

T



On Sun, Nov 5, 2017 at 7:47 PM, Duncan Murdoch wrote:

> On 04/11/2017 10:20 PM, Daniel Nordlund wrote:
>
>> Tirthankar,
>>
>> "random number generators" do not produce random numbers.  Any given
>> generator produces a fixed sequence of numbers that appear to meet
>> various tests of randomness.  By picking a seed you enter that sequence
>> in a particular place and subsequent numbers in the sequence appear to
>> be unrelated.  There are no guarantees that if YOU pick a SET of seeds
>> they won't produce a set of values that are of a similar magnitude.
>>
>> You can likely solve your problem by following Radford Neal's advice of
>> not using the first number from each seed.  However, you don't need
>> to use anything more than the second number.  So, you can modify your
>> function as follows:
>>
>> function(x) {
>>   set.seed(x, kind = "default")
>>   y = runif(2, 17, 26)
>>   return(y[2])
>> }
>>
>> Hope this is helpful,
>>
>
> That's assuming that the chosen seeds are unrelated to the function
> output, which seems unlikely on the face of it.  You can certainly choose a
> set of seeds that give high values on the second draw just as easily as you
> can choose seeds that give high draws on the first draw.
>
> The interesting thing about this problem is that Tirthankar doesn't
> believe that the seed selection process is aware of the function output.  I
> would say that it must be, and he should be investigating how that happens
> if he is worried about the output, he shouldn't be worrying about R's RNG.
>
> Duncan Murdoch
>


Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-05 Thread Duncan Murdoch

On 04/11/2017 10:20 PM, Daniel Nordlund wrote:

Tirthankar,

"random number generators" do not produce random numbers.  Any given
generator produces a fixed sequence of numbers that appear to meet
various tests of randomness.  By picking a seed you enter that sequence
in a particular place and subsequent numbers in the sequence appear to
be unrelated.  There are no guarantees that if YOU pick a SET of seeds
they won't produce a set of values that are of a similar magnitude.

You can likely solve your problem by following Radford Neal's advice of
not using the first number from each seed.  However, you don't need
to use anything more than the second number.  So, you can modify your
function as follows:

function(x) {
  set.seed(x, kind = "default")
  y = runif(2, 17, 26)
  return(y[2])
}

Hope this is helpful,


That's assuming that the chosen seeds are unrelated to the function 
output, which seems unlikely on the face of it.  You can certainly 
choose a set of seeds that give high values on the second draw just as 
easily as you can choose seeds that give high draws on the first draw.
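Duncan's point can be checked with the same kind of brute-force seed search Bill Dunlap used in this thread, applied to the second draw instead of the first. A sketch, assuming the default generator and a scan of seeds 1 to 10000; `second_draw()` is a hypothetical helper, and the exact seeds returned are not claimed here.

```r
# Second value drawn after set.seed(s), as in the modified function above.
second_draw <- function(s) {
  set.seed(s, kind = "default")
  runif(2, 17, 26)[2]
}

# Search for seeds whose SECOND draw exceeds 25.99, mirroring the
# first-draw search elsewhere in the thread.
high_second <- Filter(function(s) second_draw(s) > 25.99, 1:10000)
high_second
```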


The interesting thing about this problem is that Tirthankar doesn't 
believe that the seed selection process is aware of the function output.
I would say that it must be, and he should be investigating how that 
happens if he is worried about the output, he shouldn't be worrying 
about R's RNG.


Duncan Murdoch



Re: [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

2017-11-05 Thread Daniel Nordlund

Tirthankar,

"random number generators" do not produce random numbers.  Any given 
generator produces a fixed sequence of numbers that appear to meet 
various tests of randomness.  By picking a seed you enter that sequence 
in a particular place and subsequent numbers in the sequence appear to 
be unrelated.  There are no guarantees that if YOU pick a SET of seeds 
they won't produce a set of values that are of a similar magnitude.


You can likely solve your problem by following Radford Neal's advice of 
not using the first number from each seed.  However, you don't need 
to use anything more than the second number.  So, you can modify your 
function as follows:


function(x) {
  set.seed(x, kind = "default")
  y = runif(2, 17, 26)
  return(y[2])
}

Hope this is helpful,

Dan

--
Daniel Nordlund
Port Townsend, WA  USA


On 11/3/2017 11:30 AM, Tirthankar Chakravarty wrote:

Bill,

Appreciate the point that both you and Serguei are making, but the sequence
in question is not a selected or filtered set. These are values as observed
in a sequence from a mechanism described below. The probabilities required
to generate this exact sequence in the wild seem staggering to me.

T

On Fri, Nov 3, 2017 at 11:27 PM, William Dunlap  wrote:


Any other generator is subject to the same problem with the same
probability.


Filter(function(s){set.seed(s, kind="Knuth-TAOCP-2002");runif(1,17,26)>25.99}, 1:10000)
  [1]  280  415  826 1372 2224 2544 3270 3594 3809 4116 4236 5018 5692 7043
 [15] 7212 7364 7747 9256 9491 9568 9886
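A quick check of the arithmetic behind "the same probability", assuming each scan covers seeds 1 to 10000 (the listed seeds run up to 9886) and that a single U(17, 26) draw is uniform: the chance of exceeding 25.99 is 0.01/9, so roughly 11 hits are expected per scan, with a binomial standard deviation of roughly 3.3.

```r
# Probability that one U(17, 26) draw exceeds 25.99
p_hit <- (26 - 25.99) / (26 - 17)

n_seeds <- 10000                                # assumed size of the seed scan
expected_hits <- n_seeds * p_hit                # about 11.1
sd_hits <- sqrt(n_seeds * p_hit * (1 - p_hit))  # about 3.33

c(expected = expected_hits, sd = sd_hits)
```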



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Nov 3, 2017 at 10:31 AM, Tirthankar Chakravarty <
tirthankar.li...@gmail.com> wrote:



Bill,

I have clarified this on SO, and I will copy that clarification in here:

"Sure, we tested them on other 8-digit numbers as well & we could not
replicate. However, these are honest-to-goodness numbers generated by a
non-adversarial system that has no conception of these numbers being used
for anything other than a unique key for an entity -- these are not a
specially constructed edge case. Would be good to know what seeds will and
will not work, and why."

These numbers are generated by an application that serves a form, and
associates form IDs in a sequence. The application calls our API depending
on the form values entered by users, which in turn calls our R code that
executes some code that needs an RNG. Since the API has to be stateless, to
be able to replicate the results for possible debugging, we need to draw
random numbers in a way that we can replicate the results of the API
response -- we use the form ID as seeds.
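For a stateless API like the one described, one pattern is to seed from the request key inside a helper that also restores the caller's RNG state afterwards, so one request cannot disturb another. A minimal base-R sketch; `with_request_seed()` is a hypothetical name, and it does not by itself address the correlated first draws discussed in this thread (it can be combined with discarding initial draws or a different generator, as suggested in the replies).

```r
# Run f() with the RNG seeded from a request key (e.g. a form ID), then
# restore whatever RNG state the caller had before.
with_request_seed <- function(seed, f) {
  genv <- globalenv()
  had_state <- exists(".Random.seed", envir = genv, inherits = FALSE)
  if (had_state) old_state <- get(".Random.seed", envir = genv)
  on.exit({
    if (had_state) {
      assign(".Random.seed", old_state, envir = genv)
    } else {
      rm(".Random.seed", envir = genv)
    }
  }, add = TRUE)

  set.seed(seed)
  f()
}

# Usage: the same form ID reproduces the same draws.
with_request_seed(86548915L, function() runif(3, 17, 26))
```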

I repeat, there is no design or anything adversarial about the way that
these numbers were generated -- the system generating these numbers and
the users entering inputs have no conception of our use of an RNG -- this
is meant to just be a random sequence of form IDs. This issue was
discovered completely by chance when the output of the API was observed to
be highly non-random. It is possible that it is a 1/10^8 chance, but that
is hard to believe, given that the API hit depends on user input. Note also
that the issue goes away when we use a different RNG as mentioned below.

T

On Fri, Nov 3, 2017 at 9:58 PM, William Dunlap  wrote:


The random numbers in a stream initialized with one seed should have
about the desired distribution.  You don't win by changing the seed all the
time.  Your seeds caused the first numbers of a bunch of streams to be
about the same, but the second and subsequent entries in each stream do
look uniformly distributed.

You didn't say what your 'upstream process' was, but it is easy to come
up with seeds that give about the same first value:


Filter(function(s){set.seed(s);runif(1,17,26)>25.99}, 1:10000)
  [1]  514  532 1951 2631 3974 4068 4229 6092 6432 7264 9090



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Nov 3, 2017 at 12:49 AM, Tirthankar Chakravarty <
tirthankar.li...@gmail.com> wrote:


This is cross-posted from SO (https://stackoverflow.com/q/47079702/1414455),
but I now feel that this needs someone from R-Devel to help understand why
this is happening.

We are facing a weird situation in our code when using R's [`runif`][1] and
setting seed with `set.seed` with the `kind = NULL` option (which resolves,
unless I am mistaken, to `kind = "default"`; the default being
`"Mersenne-Twister"`).

We set the seed using (8 digit) unique IDs generated by an upstream system,
before calling `runif`:

 seeds = c(
   "86548915", "86551615", "86566163", "86577411", "86584144",
   "86584272", "86620568", "86724613", "86756002", "86768593", "86772411",
   "86781516", "86794389", "86805854", "86814600", "86835092", "86874179",
   "86876466", "86901193", "86987847", "86988080")

 random_values = sapply(seeds, function(x) {
   set.seed(x)
   y = runif(1, 17, 26)
   return(y)
 })

This gives values that