Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-31 Thread Markus Neteler
On Sun, Jul 27, 2014 at 1:58 AM, Glynn Clements
gl...@gclements.plus.com wrote:
 Glynn Clements wrote:
  I wonder if there are any modules in core or addons which need to be
  updated.

 The following are candidates for conversion:

 Most uses of rand, random, lrand48, etc. have been replaced in r61415
 and r61416.

Thanks for your work. It is good that all the private rand()
implementations finally disappeared.

I have made a bundled backport to relbranch7 in r61471.

(The open issues mentioned in your previous email I am not able to judge.)

Markus
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-31 Thread Markus Neteler
On Thu, Jul 31, 2014 at 1:30 PM, Markus Neteler nete...@osgeo.org wrote:
 On Sun, Jul 27, 2014 at 1:58 AM, Glynn Clements
 gl...@gclements.plus.com wrote:
 Glynn Clements wrote:
  I wonder if there are any modules in core or addons which need to be
  updated.

 The following are candidates for conversion:

 Most uses of rand, random, lrand48, etc. have been replaced in r61415
 and r61416.

 Thanks for your work. It is good that all the private rand()
 implementations finally disappeared.

 I have made a bundled backport to relbranch7 in r61471.

In r61475 (relbranch70) I have backported, as one bundle, the changes to
r.mapcalc/r3.mapcalc, the Python interface and the wx r.mapcalculator.

Markus


Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-26 Thread Glynn Clements

Glynn Clements wrote:

  I wonder if there are any modules in core or addons which need to be
  updated.
 
 The following are candidates for conversion:

Most uses of rand, random, lrand48, etc. have been replaced in r61415
and r61416.

Many of these modules seeded the RNG from either the clock or the PID,
with no way to provide an explicit seed. In such cases, the use of
G_srand48_auto() has been marked with a FIXME comment.

It would be possible to modify G_srand48_auto() to allow the use of an
environment variable, but this has problems of its own (e.g. setting
the variable manually then forgetting about it).

r.li.daemon/daemon.c uses a hard-coded seed of zero.

r61416 changes readcell.c in r.proj, i.rectify, and i.ortho.rectify. 
These all used rand() to select a random cache block for ejection.

While this wouldn't have affected the result, only the first RAND_MAX
cache blocks would have been used. RAND_MAX is only guaranteed to be
at least 32767, which would limit the effective cache size to 1 GiB
(each block is 64 * 64 = 4096 doubles = 32 KiB).
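The arithmetic behind the 1 GiB figure can be checked with a few lines of Python. This is a sketch, assuming 8-byte doubles and RAND_MAX at its C-standard minimum of 32767; the victim rule `rand() % nblocks` used in `reachable()` is an assumption about how the index was derived, not a quote from readcell.c.

```python
# Effective cache size when the block to eject is chosen with rand(),
# which can only return indices 0..RAND_MAX.
BLOCK_SIDE = 64          # cells per block edge (from the text)
BYTES_PER_DOUBLE = 8     # assumption: IEEE 754 double
RAND_MAX_MIN = 32767     # minimum RAND_MAX allowed by the C standard

block_bytes = BLOCK_SIDE * BLOCK_SIDE * BYTES_PER_DOUBLE  # 32768 bytes
reachable_blocks = RAND_MAX_MIN + 1                        # indices 0..32767
effective_cache = reachable_blocks * block_bytes

def reachable(nblocks, rand_max=RAND_MAX_MIN):
    """Number of blocks that rand() % nblocks (assumed rule) can ever pick."""
    return min(nblocks, rand_max + 1)

print(block_bytes // 1024, "KiB per block")
print(effective_cache / 2**30, "GiB reachable")
```

Any cache larger than 32768 blocks therefore has blocks that are never chosen for ejection, although correctness is unaffected.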

Cases which haven't been changed are:

lib/raster3d/test/test_put_get_value_large_file.c. This appears to be
a test case; does it matter?

include/iostream/quicksort.h uses random() or rand() to select a pivot
for the quicksort algorithm. That file has no dependency on lib/gis
(or anything else, except for stdlib.h for random/rand), and I
didn't want to add one unnecessarily.

Again, this shouldn't affect the result, but there may be performance
issues if the size of the array being sorted is significantly larger
than RAND_MAX (in this situation, the algorithm will be O(n^2) even in
the best case).

Unless there's a specific reason not to, it may be better to simply
replace all uses of that file with std::sort() from <algorithm>.
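The pivot issue can be illustrated with a small Python sketch of randomized quicksort in which the pivot index is drawn uniformly from the whole partition with `random.Random.randrange`, so its range never runs out the way a 15-bit `rand()` would. It only illustrates the pivot choice; for the C++ code, replacing quicksort.h with std::sort() remains the practical fix.

```python
import random

def quicksort(a, rng=None):
    """Sort list `a` in place with randomized quicksort and return it.

    The pivot index comes from rng.randrange(lo, hi + 1), which is
    uniform over the whole partition no matter how large it is.
    """
    if rng is None:
        rng = random.Random()
    stack = [(0, len(a) - 1)]  # explicit stack instead of recursion
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        p = rng.randrange(lo, hi + 1)
        a[p], a[hi] = a[hi], a[p]   # move the pivot to the end
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):     # Lomuto partition
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]   # pivot into its final place
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    return a
```

For example, `quicksort([5, 1, 4, 2, 3])` returns `[1, 2, 3, 4, 5]`.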

-- 
Glynn Clements gl...@gclements.plus.com


Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-23 Thread Anna Petrášová
On Tue, Jul 22, 2014 at 10:07 PM, Anna Petrášová kratocha...@gmail.com
wrote:




 On Tue, Jul 22, 2014 at 8:14 PM, Glynn Clements gl...@gclements.plus.com
 wrote:


 Glynn Clements wrote:

  I'm inclined to add both an option (to specify a seed, replacing the
  environment variable) and a flag (to seed from the system clock or
  whatever), and having the PRNG generate a fatal error if neither of
  those are used.

 This is now done.

 r61350 adds the lrand48/mrand48/drand48 equivalents to lib/gis. Brief
 testing suggests that the results are identical to those generated by
 GNU libc (which should be identical to any other POSIX implementation).

 r61352 changes it to generate a fatal error if used prior to seeding.

 r61353 changes r.mapcalc so that seeding is performed via seed= or -s.
 The seed (whether specified by seed= or generated for -s) is added to
 the history (for r.mapcalc; r3.mapcalc's create_history() function is
 a stub; do 3D rasters have history?)

 Note that GRASS_RND_SEED is no longer supported. That was a hack from
 the time before r.mapcalc used G_parser().

 As I write this, it has occurred to me that the behaviour of rand()
 may be non-deterministic in the presence of certain forms of
 parallelism, e.g. multiple occurrences of rand() in the expression(s)
 in conjunction with pthreads. Ultimately we may need to expand the
 PRNG to support explicit state (as per erand48, nrand48 and jrand48).


 I added the support for -s and seed to the r.mapcalc GUI in r61354. Testing
 is very welcome.

  I wonder if there are any modules in core or addons which need to be
 updated.


For example all TGRASS tests are affected.



 Anna


 --
 Glynn Clements gl...@gclements.plus.com

Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-23 Thread Glynn Clements

Anna Petrášová wrote:

 I wonder if there are any modules in core or addons which need to be
 updated.

The following are candidates for conversion:

lib/gmath/rand1.c                                 drand48
raster/r.random/creat_rand.c                      lrand48
vector/v.kcv/main.c                               drand48
vector/v.random/main.c                            drand48

lib/raster/color_rand.c                           rand
imagery/i.ortho.photo/i.ortho.rectify/readcell.c  rand
imagery/i.rectify/readcell.c                      rand
raster/r.proj/readcell.c                          rand
raster/r.spread/pick_dist.c                       rand
raster/r.spread/pick_ignite.c                     rand
vector/v.kcv/main.c                               rand
vector/v.qcount/findquads.c                       rand
vector/v.random/main.c                            rand

r.proj probably doesn't matter, as rand() is only used to choose which
block to eject from the cache, so it shouldn't have any effect upon
the end result.

-- 
Glynn Clements gl...@gclements.plus.com

Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-22 Thread Glynn Clements

Paulo van Breugel wrote:

 And it seems to be the default behaviour by python/numpy:

It is, but ...

 >>> import numpy as np
 >>> np.random.random()
 0.8351426142559701
 >>> np.random.random()
 0.4813823441998394
 >>> np.random.random()
 0.7279314267025369

... this example doesn't demonstrate that. Any PRNG returns different
values for successive calls.

The question is whether the PRNG's initial value should automatically
be seeded from some external source of entropy (e.g. the system
clock), so that the sequence of values differs on different runs.
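The distinction between successive calls within one run and separate runs can be shown with Python's `random.Random`. In this sketch each call to the illustrative `run()` function stands in for one program invocation; passing `seed=None` lets Python seed itself from the operating system, which is the automatic-seeding behaviour under discussion.

```python
import random

def run(seed=None, n=3):
    """Simulate one program run: a fresh generator, then n draws."""
    rng = random.Random(seed)  # seed=None: seeded from OS entropy (or the clock)
    return [rng.random() for _ in range(n)]

# Successive calls inside one run give different values...
first_run = run(seed=42)
# ...but a second run with the same explicit seed repeats the whole sequence.
second_run = run(seed=42)
# Without an explicit seed, two runs will almost certainly differ.
unseeded_a, unseeded_b = run(), run()
```

So three different values from one session say nothing about seeding; only comparing fresh runs does.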

In turn, that brings up questions about the quality of the entropy
source. The ANSI C time() function typically only has one-second
granularity (indeed, POSIX requires this, as time_t is defined as
seconds since the epoch), which is sufficiently coarse that successive
runs may get the same seed. Other functions aren't portable, and even
where available, the granularity isn't guaranteed.
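How coarse a one-second clock is can be seen with a quick Python check. Here `int(time.time())` stands in for C's time(), and a tight loop stands in for back-to-back runs, so this only illustrates the granularity problem.

```python
import time

def seed_from_seconds():
    """Seed the way time() would: whole seconds since the epoch."""
    return int(time.time())

# 1000 "runs" started in quick succession.
seeds = [seed_from_seconds() for _ in range(1000)]
print(len(set(seeds)), "distinct seed(s) out of", len(seeds))
```

In practice this prints 1 (or 2 if the loop happens to straddle a second boundary).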

My main objection to automatic seeding is that people will inevitably
produce non-repeatable results without even realising it.

One possible solution would be to automatically add the seed to the
history of any map generated by r.mapcalc (or possibly only those
which use the rand() function). But that would still only help if the
creator either provides access to the generated maps, or the output
from r.info. Simply providing the commands used and the end result
wouldn't help.

-- 
Glynn Clements gl...@gclements.plus.com


Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-22 Thread Glynn Clements

Markus Neteler wrote:

 - if the user needs reproducibility, then have an env var to enable that.

And when the issue of usability doesn't even get considered until a few
years later when the user (or a colleague) gets an email suggesting
the results can't be reproduced ...?

I'm inclined to add both an option (to specify a seed, replacing the
environment variable) and a flag (to seed from the system clock or
whatever), and having the PRNG generate a fatal error if neither of
those are used.

That way, neither of the likely problems can arise by oversight.
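The proposed interface (an explicit seed option, a flag for an automatic seed, and a fatal error when neither is given) can be sketched with Python's `argparse`. The spellings `--seed` and `-s` are stand-ins: GRASS modules use `seed=` syntax via G_parser(), and the clock-plus-PID formula is an assumption, not the module's actual code.

```python
import argparse
import os
import time

def parse_seed(argv):
    """Return the seed to use, or exit with an error if none was requested."""
    parser = argparse.ArgumentParser(prog="mapcalc-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--seed", type=int, help="explicit seed (reproducible)")
    group.add_argument("-s", action="store_true", help="generate a seed automatically")
    args = parser.parse_args(argv)
    if args.seed is not None:
        return args.seed
    if args.s:
        # Assumed entropy source: nanosecond clock plus process ID.
        return (time.time_ns() + os.getpid()) % 2**31
    # Neither given: refuse rather than silently pick a seed.
    parser.error("a random function was used without --seed or -s")
```

`parse_seed(["--seed", "7"])` returns 7, while `parse_seed([])` prints the error and exits with status 2, so neither failure mode can happen silently.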

-- 
Glynn Clements gl...@gclements.plus.com


Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-22 Thread Vaclav Petras
On Tue, Jul 22, 2014 at 4:39 PM, Glynn Clements gl...@gclements.plus.com
wrote:


 Paulo van Breugel wrote:

  And it seems to be the default behaviour by python/numpy:

 It is, but ...

   >>> import numpy as np
   >>> np.random.random()
   0.8351426142559701
   >>> np.random.random()
   0.4813823441998394
   >>> np.random.random()
   0.7279314267025369

 ... this example doesn't demonstrate that.


Good point. On my computer I get:

>>> import numpy as np
>>> np.random.random()
0.49727844715398417

And in a different (also freshly started) Python:

>>> import numpy as np
>>> np.random.random()
0.2457281014919791

 Any PRNG returns different values for successive calls.

The problem is that users may not see the difference between two
module calls in the GRASS command line and two calls of the random()
function in Python. When calling a GRASS module from Python, the
difference is even less visible.

Anyway, reproducibility would be really nice considering GRASS's
scientific audience; however, are you sure that different systems will give
the same random numbers for the same seed? Or do you mean reproducible as
in "as reproducible as possible", e.g. using the same system if necessary?

 The question is whether the PRNG's initial value should automatically
 be seeded from some external source of entropy (e.g. the system
 clock), so that the sequence of values differs on different runs.

 In turn, that brings up questions about the quality of the entropy
 source. The ANSI C time() function typically only has one-second
 granularity (indeed, POSIX requires this, as time_t is defined as
 seconds since the epoch), which is sufficiently coarse that successive
 runs may get the same seed. Other functions aren't portable, and even
 where available, the granularity isn't guaranteed.

 What about time + process id?


 My main objection to automatic seeding is that people will inevitably
 produce non-repeatable results without even realising it.

 One possible solution would be to automatically add the seed to the
 history of any map generated by r.mapcalc (or possibly only those
 which use the rand() function). But that would still only help if the
 creator either provides access to the generated maps, or the output
 from r.info. Simply providing the commands used and the end result
 wouldn't help.

 --
 Glynn Clements gl...@gclements.plus.com

Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-22 Thread Vaclav Petras
On Tue, Jul 22, 2014 at 4:58 PM, Glynn Clements gl...@gclements.plus.com
wrote:

 Markus Neteler wrote:

  - if the user needs reproducibility, then have an env var to enable that.

 And when the issue of usability doesn't even get considered until a few
 years later when the user (or a colleague) gets an email suggesting
 the results can't be reproduced ...?

 I'm inclined to add both an option (to specify a seed, replacing the
 environment variable) and a flag (to seed from the system clock or
 whatever), and having the PRNG generate a fatal error if neither of
 those are used.

 That way, neither of the likely problems can arise by oversight.


This looks very good at first glance.

Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-22 Thread Paulo van Breugel


On 22-07-14 22:58, Glynn Clements wrote:

Markus Neteler wrote:


- if the user needs reproducibility, then have an env var to enable that.

And when the issue of usability doesn't even get considered until a few
years later when the user (or a colleague) gets an email suggesting
the results can't be reproduced ...?

I'm inclined to add both an option (to specify a seed, replacing the
environment variable) and a flag (to seed from the system clock or
whatever), and having the PRNG generate a fatal error if neither of
those are used.

That way, neither of the likely problems can arise by oversight.



I guess there is a lot to say for both approaches, which is why I think 
the suggestion of Markus is a very good one!  +1 from me




Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-22 Thread Markus Neteler
On Tue, Jul 22, 2014 at 11:19 PM, Paulo van Breugel wrote:
 On 22-07-14 22:58, Glynn Clements wrote:
 And when the issue of usability doesn't even get considered until a few
 years later when the user (or a colleague) gets an email suggesting
 the results can't be reproduced ...?

 I'm inclined to add both an option (to specify a seed, replacing the
 environment variable) and a flag (to seed from the system clock or
 whatever), and having the PRNG generate a fatal error if neither of
 those are used.

 That way, neither of the likely problems can arise by oversight.


 I guess there is a lot to say for both approaches, which is why I think the
 suggestion of Markus is a very good one!  +1 from me

It is indeed Glynn's suggestion (which I like, too).

Markus


Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-22 Thread Paulo van Breugel


On 22-07-14 23:31, Markus Neteler wrote:

On Tue, Jul 22, 2014 at 11:19 PM, Paulo van Breugel wrote:

On 22-07-14 22:58, Glynn Clements wrote:

And when the issue of usability doesn't even get considered until a few
years later when the user (or a colleague) gets an email suggesting
the results can't be reproduced ...?

I'm inclined to add both an option (to specify a seed, replacing the
environment variable) and a flag (to seed from the system clock or
whatever), and having the PRNG generate a fatal error if neither of
those are used.

That way, neither of the likely problems can arise by oversight.


I guess there is a lot to say for both approaches, which is why I think the
suggestion of Markus is a very good one!  +1 from me

It is indeed Glynn's suggestion (which I like, too).


Sorry, I never seem to get used to how my email program displays the
threads... I meant: good suggestion by Glynn. Glynn, it would be really
great if you could implement it that way.




Markus



Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-22 Thread Glynn Clements

Vaclav Petras wrote:

 Anyway, reproducibility would be really nice considering GRASS's
 scientific audience; however, are you sure that different systems will give
 the same random numbers for the same seed?

They will from now on, because I've replaced the use of the system's
PRNG (either rand or mrand48/drand48) with a portable implementation
of the latter.

  What about time + process id?

That's what's done now (if -s is used). Although we could probably do
with a better hash (currently, it's just addition) and/or more entropy
sources.
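The weakness of plain addition is that different (time, PID) pairs can sum to the same seed. A minimal Python sketch, assuming a nanosecond clock and SHA-256 as the mixing function (neither is what GRASS uses), contrasts the two; the function names are illustrative, and `os.urandom` would be another entropy source where available.

```python
import hashlib
import os
import time

def seed_by_addition(t_ns, pid):
    """Plain sum of clock and PID, reduced to 31 bits (collides easily)."""
    return (t_ns + pid) % 2**31

def seed_by_hash(t_ns, pid):
    """Mix clock and PID through SHA-256 and keep 31 bits of the digest."""
    data = t_ns.to_bytes(8, "little") + pid.to_bytes(4, "little")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:4], "little") & 0x7FFFFFFF

t, pid = time.time_ns(), os.getpid()
# One nanosecond later with the next-lower PID gives the same additive seed...
additive_collision = seed_by_addition(t, pid) == seed_by_addition(t + 1, pid - 1)
# ...while the hashed seeds differ (barring a 1-in-2**31 coincidence).
hashed_differ = seed_by_hash(t, pid) != seed_by_hash(t + 1, pid - 1)
```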

-- 
Glynn Clements gl...@gclements.plus.com


Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-22 Thread Glynn Clements

Glynn Clements wrote:

 I'm inclined to add both an option (to specify a seed, replacing the
 environment variable) and a flag (to seed from the system clock or
 whatever), and having the PRNG generate a fatal error if neither of
 those are used.

This is now done.

r61350 adds the lrand48/mrand48/drand48 equivalents to lib/gis. Brief
testing suggests that the results are identical to those generated by
GNU libc (which should be identical to any other POSIX implementation).
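For reference, the drand48 family is a 48-bit linear congruential generator whose constants and output rules are fixed by POSIX, which is what makes a portable re-implementation possible. The Python class below is a sketch written from that specification, not a copy of the lib/gis code; the class and method names are illustrative.

```python
class Rand48:
    """Sketch of the POSIX drand48 family: X' = (A*X + C) mod 2**48."""
    A = 0x5DEECE66D
    C = 0xB
    MASK = (1 << 48) - 1

    def __init__(self, seed):
        # srand48(): high 32 bits of X from the seed, low 16 bits set to 0x330E.
        self.x = ((seed & 0xFFFFFFFF) << 16) | 0x330E

    def _next(self):
        self.x = (self.A * self.x + self.C) & self.MASK
        return self.x

    def lrand48(self):
        # Non-negative value in [0, 2**31): the top 31 of the 48 bits.
        return self._next() >> 17

    def mrand48(self):
        # Signed value in [-2**31, 2**31): the top 32 bits as two's complement.
        v = self._next() >> 16
        return v - (1 << 32) if v >= (1 << 31) else v

    def drand48(self):
        # Double in [0, 1): all 48 bits scaled by 2**-48.
        return self._next() / float(1 << 48)
```

With seed 0, the first `lrand48()` value is 366850414 and the first `drand48()` value is about 0.170828, which is the kind of value-for-value comparison against GNU libc the message describes.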

r61352 changes it to generate a fatal error if used prior to seeding.
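The "fatal error if used before seeding" rule can be sketched as a thin wrapper. Here Python's `random.Random` stands in for the lib/gis generator and a `RuntimeError` stands in for a GRASS fatal error; the class name `SeedRequired` is hypothetical.

```python
import random

class SeedRequired:
    """Hypothetical wrapper: no numbers until seed() has been called."""

    def __init__(self):
        self._rng = None

    def seed(self, value):
        self._rng = random.Random(value)

    def random(self):
        if self._rng is None:
            # Stand-in for a GRASS fatal error: no silent default seed.
            raise RuntimeError("PRNG used before it was seeded")
        return self._rng.random()
```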

r61353 changes r.mapcalc so that seeding is performed via seed= or -s. 
The seed (whether specified by seed= or generated for -s) is added to
the history (for r.mapcalc; r3.mapcalc's create_history() function is
a stub; do 3D rasters have history?)
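Recording the seed matters most for `-s`, where the value is generated rather than typed. A minimal sketch, with a hypothetical `history_record()` helper and an assumed clock-plus-PID generator, of storing the resolved seed so that the history line alone is enough to regenerate the map with `seed=`:

```python
import os
import time

def history_record(expression, seed=None, auto=False):
    """Build a hypothetical history entry that pins down the seed actually used."""
    if seed is None:
        if not auto:
            raise ValueError("need an explicit seed or auto=True (the -s case)")
        # Assumed generator for the -s case: clock plus PID.
        seed = (time.time_ns() + os.getpid()) % 2**31
    # Store the resolved value, never just the "-s" flag.
    command = f'r.mapcalc expression="{expression}" seed={seed}'
    return {"seed": seed, "command": command}
```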

Note that GRASS_RND_SEED is no longer supported. That was a hack from
the time before r.mapcalc used G_parser().

As I write this, it has occurred to me that the behaviour of rand()
may be non-deterministic in the presence of certain forms of
parallelism, e.g. multiple occurrences of rand() in the expression(s)
in conjunction with pthreads. Ultimately we may need to expand the
PRNG to support explicit state (as per erand48, nrand48 and jrand48).
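Explicit state means the caller, not the library, owns the generator's 48-bit value, so each thread (or each rand() occurrence) can advance its own stream regardless of scheduling. A Python sketch in the style of POSIX nrand48(); the real function takes an array of three 16-bit words, simplified here to a one-element list, and the per-worker seeding in `worker()` is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

A, C, MASK = 0x5DEECE66D, 0xB, (1 << 48) - 1  # POSIX drand48-family constants

def nrand48(state):
    """Advance caller-owned state (a one-element list) and return 31 bits."""
    state[0] = (A * state[0] + C) & MASK
    return state[0] >> 17

def worker(seed, n=5):
    # Each worker builds its own state, initialised like srand48(seed).
    state = [((seed & 0xFFFFFFFF) << 16) | 0x330E]
    return [nrand48(state) for _ in range(n)]

with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(worker, [1, 2, 3, 4]))

# Same output as a sequential run, whatever order the threads ran in.
sequential = [worker(s) for s in [1, 2, 3, 4]]
```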

-- 
Glynn Clements gl...@gclements.plus.com


Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-22 Thread Anna Petrášová
On Tue, Jul 22, 2014 at 8:14 PM, Glynn Clements gl...@gclements.plus.com
wrote:


 Glynn Clements wrote:

  I'm inclined to add both an option (to specify a seed, replacing the
  environment variable) and a flag (to seed from the system clock or
  whatever), and having the PRNG generate a fatal error if neither of
  those are used.

 This is now done.

 r61350 adds the lrand48/mrand48/drand48 equivalents to lib/gis. Brief
 testing suggests that the results are identical to those generated by
 GNU libc (which should be identical to any other POSIX implementation).

 r61352 changes it to generate a fatal error if used prior to seeding.

 r61353 changes r.mapcalc so that seeding is performed via seed= or -s.
 The seed (whether specified by seed= or generated for -s) is added to
 the history (for r.mapcalc; r3.mapcalc's create_history() function is
 a stub; do 3D rasters have history?)

 Note that GRASS_RND_SEED is no longer supported. That was a hack from
 the time before r.mapcalc used G_parser().

 As I write this, it has occurred to me that the behaviour of rand()
 may be non-deterministic in the presence of certain forms of
 parallelism, e.g. multiple occurrences of rand() in the expression(s)
 in conjunction with pthreads. Ultimately we may need to expand the
 PRNG to support explicit state (as per erand48, nrand48 and jrand48).


I added the support for -s and seed to the r.mapcalc GUI in r61354. Testing
is very welcome.

I wonder if there are any modules in core or addons which need to be
updated.

Anna


 --
 Glynn Clements gl...@gclements.plus.com

Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-22 Thread Vaclav Petras
On Tue, Jul 22, 2014 at 8:14 PM, Glynn Clements gl...@gclements.plus.com
wrote:

 r61353 changes r.mapcalc so that seeding is performed via seed= or -s.
 The seed (whether specified by seed= or generated for -s) is added to
 the history (for r.mapcalc; r3.mapcalc's create_history() function is
 a stub; do 3D rasters have history?)


I added test for r61353 in r61355. Tests are only for r.mapcalc not for
r3.mapcalc.

http://trac.osgeo.org/grass/changeset/61355



 As I write this, it has occurred to me that the behaviour of rand()
 may be non-deterministic in the presence of certain forms of
 parallelism, e.g. multiple occurrences of rand() in the expression(s)
 in conjunction with pthreads. Ultimately we may need to expand the
 PRNG to support explicit state (as per erand48, nrand48 and jrand48).


The tests are not testing any of this (at least not explicitly),
contributions welcome.

Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-21 Thread Markus Neteler
On Sun, Jul 6, 2014 at 12:25 AM, Glynn Clements
gl...@gclements.plus.com wrote:
 Glynn Clements gl...@gclements.plus.com wrote:
...
 In ticket #2272, I attached a portable implementation of lrand48(). If
 desired, we could add this to libgis and use that in preference to any
 implementation-specific PRNG.

This would be excellent.

 If you want a different result each time, set GRASS_RND_SEED to a
 different value each time, e.g.

IMHO this is not intuitive at all. I would suggest to invert the
behaviour for GRASS 7:
- per default generate random numbers which differ,
- if the user needs reproducibility, then have an env var to enable that.

 The main thing is that I believe that
 reproducibility should be the default.

I humbly disagree. This is not what the user expects. It is also the
opposite of how for example R behaves:

R
> runif(1)
[1] 0.5624295
> runif(1)
[1] 0.1683853

http://en.wikibooks.org/wiki/R_Programming/Random_Number_Generation#Seed
 If you want to perform an exact replication of your program, you
have to specify the seed using the function set.seed().

 If people have to take explicit
 action to introduce randomness,

The problem is that most will not even realize the current behaviour of rand().

 they're more likely to consider the
 issues involved. If randomised seeds are the default, the lack of
 reproducibility may not be considered until it is too late.

The R community (and some users here) think the opposite... when you
ask for rand() then you expect a random number. Just to avoid this:
https://xkcd.com/221/

Markus


Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-21 Thread Paulo van Breugel


On 21-07-14 19:01, Markus Neteler wrote:

On Sun, Jul 6, 2014 at 12:25 AM, Glynn Clements
gl...@gclements.plus.com wrote:

Glynn Clements gl...@gclements.plus.com wrote:

...

In ticket #2272, I attached a portable implementation of lrand48(). If
desired, we could add this to libgis and use that in preference to any
implementation-specific PRNG.

This would be excellent.


If you want a different result each time, set GRASS_RND_SEED to a
different value each time, e.g.

IMHO this is not intuitive at all. I would suggest to invert the
behaviour for GRASS 7:
- per default generate random numbers which differ,
- if the user needs reproducibility, then have an env var to enable that.


The main thing is that I believe that
reproducibility should be the default.

I humbly disagree. This is not what the user expects. It is also the
opposite of how for example R behaves:

R
> runif(1)
[1] 0.5624295
> runif(1)
[1] 0.1683853

http://en.wikibooks.org/wiki/R_Programming/Random_Number_Generation#Seed
 If you want to perform an exact replication of your program, you
have to specify the seed using the function set.seed().


If people have to take explicit
action to introduce randomness,

The problem is that most will not even realize the current behaviour of rand().


they're more likely to consider the
issues involved. If randomised seeds are the default, the lack of
reproducibility may not be considered until it is too late.

The R community (and some users here) think the opposite... when you
ask for rand() then you expect a random number.
And not only the R community, I am sure. In all statistical packages I
have ever worked with, one can see the same behaviour: a random number
is random (i.e., each time a different seed), unless the seed is
explicitly defined by the user. And it seems to be the default behaviour
of Python/NumPy:


>>> import numpy as np
>>> np.random.random()
0.8351426142559701
>>> np.random.random()
0.4813823441998394
>>> np.random.random()
0.7279314267025369


Just to avoid this:
https://xkcd.com/221/

Markus



Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-05 Thread Glynn Clements

Vaclav Petras wrote:

   Shouldn't the seed be generated from e.g. the OS time,
   which would ensure that each run would give a different result?
 
  No. The reason is to provide reproducibility. Anyone running the same
  command with the same data should obtain the same result.
 
 Does the reproducibility go beyond one operating system, compiler or
 library?

If drand48() is used, yes. If rand() is used, no.

 I don't think that the first random number is specified by the C
 language standard.

The C standard doesn't specify any particular implementation for
rand() (it does give an example implementation, but it only produces
15-bit values). It does specify that if the PRNG isn't explicitly
seeded, the behaviour is as if srand(1) was called beforehand. 
[§7.20.2.2p2]

IOW, the sequence of results is implementation-dependent, but it may
not change from one run to the next unless the program explicitly
seeds the PRNG with a non-deterministic value such as the current
time.
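The example implementation in the C standard is short enough to transcribe. The Python sketch below follows it (multiplier 1103515245, increment 12345, 15-bit output) and starts as if srand(1) had been called; real C libraries are free to use a different algorithm, which is exactly why rand() sequences are not portable. The class name is illustrative.

```python
class CStandardExampleRand:
    """Python transcription of the sample rand()/srand() in the C standard."""
    RAND_MAX = 32767

    def __init__(self):
        self._next = 1  # as if srand(1) had been called

    def srand(self, seed):
        self._next = seed & 0xFFFFFFFF

    def rand(self):
        # Keep 32 bits, like a 32-bit unsigned long; the returned bits
        # (16..30) are the same for any wider type.
        self._next = (self._next * 1103515245 + 12345) & 0xFFFFFFFF
        return (self._next // 65536) % 32768
```

Two fresh instances produce identical sequences, and the first value is 16838: deterministic across runs, but only 15 bits wide.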

 If the results were really reproducible it would be
 good for the testing framework, but I'm afraid that they are not (with my
 limited knowledge about the topic).

In ticket #2272, I attached a portable implementation of lrand48(). If
desired, we could add this to libgis and use that in preference to any
implementation-specific PRNG.

  If you want a different result each time, set GRASS_RND_SEED to a
  different value each time, e.g.
 
  GRASS_RND_SEED=`date +%N` r.mapcalc "a = rand(0,100)"
 
  [%N is the nanoseconds portion of the current time; this is a GNU
  extension.]
 
 I've heard that this is not enough on powerful computers/clusters, that you
 also have to use the PID because nanoseconds might be the same (I think I
 remember that it was nanoseconds, not seconds).

The main issue is on systems where the reported time only changes in
increments of a scheduler tick (e.g. 10ms on old versions of Linux).

   On a related note, it would be nice to be able to set the seed (I think
   there has been such a request before, but I'm not sure about the answer
   at that time).
 
  GRASS_RND_SEED was the answer.
 
 
 I think there should be some possibility of randomization (auto-setting of
 the seed) built into the modules providing random(ized) results. Perhaps a flag
 which would turn it on. It could also be an option which would behave like
 GRASS_RND_SEED but would have one special value for auto-generating the
 seed. (GRASS_RND_SEED, if present, would override this option.) For the
 default value of the option, we should ask what the expected behavior of
 a module giving random results actually is.
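Vaclav's variant (one option with a special value meaning "generate a seed", overridden by the environment variable when present) could look like this Python sketch. The value `auto`, the function name `choose_seed`, and the clock-plus-PID formula are all assumptions for illustration.

```python
import os
import time

AUTO = "auto"  # hypothetical special option value

def choose_seed(option_value, environ=os.environ):
    """GRASS_RND_SEED overrides; 'auto' generates a seed; otherwise an integer."""
    env_value = environ.get("GRASS_RND_SEED")
    if env_value is not None:
        return int(env_value)
    if option_value == AUTO:
        # Assumed entropy source: nanosecond clock plus process ID.
        return (time.time_ns() + os.getpid()) % 2**31
    return int(option_value)
```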

That's certainly reasonable. The main thing is that I believe that
reproducibility should be the default. If people have to take explicit
action to introduce randomness, they're more likely to consider the
issues involved. If randomised seeds are the default, the lack of
reproducibility may not be considered until it is too late.

-- 
Glynn Clements gl...@gclements.plus.com


Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-05 Thread Glynn Clements

Paulo van Breugel wrote:

 Just a quick additional question, how to set this GRASS_RND_SEED from
 within a python script (I want to add the option to set the seed with a
 seed parameter in my script, as suggested in the previous email).

You can modify os.environ prior to calling it, e.g.

import os
import time
import grass.script as grass
...
t = int(time.time() * 1e9) % (2**31)
os.environ['GRASS_RND_SEED'] = '%d' % t
grass.mapcalc(...)
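An alternative to modifying os.environ for the whole script is to pass a modified copy of the environment to the one child process that needs it. A sketch, assuming the behaviour at the time of this message (r.mapcalc still reading GRASS_RND_SEED); the helper name `env_with_seed` is hypothetical and the subprocess call is shown only as a comment.

```python
import os
import time

def env_with_seed(seed=None):
    """Return a copy of os.environ with GRASS_RND_SEED set for one child process."""
    if seed is None:
        seed = time.time_ns() % 2**31  # assumed time-derived default
    env = dict(os.environ)
    env["GRASS_RND_SEED"] = str(seed)
    return env

# Usage (requires a GRASS session, so not run here):
# import subprocess
# subprocess.run(["r.mapcalc", "expression=a = rand(0,100)"],
#                env=env_with_seed(42), check=True)
```

The script's own environment stays untouched, so a seed set for one call cannot leak into later ones.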

-- 
Glynn Clements gl...@gclements.plus.com


Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-05 Thread Paulo van Breugel
On Sun, Jul 6, 2014 at 12:34 AM, Glynn Clements gl...@gclements.plus.com
wrote:


 Paulo van Breugel wrote:

  Just a quick additional question, how to set this GRASS_RND_SEED from
  within a python script (I want to add the option to set the seed with a
  seed parameter in my script, as suggested in the previous email).

 You can modify os.environ prior to calling it, e.g.

 import os
 import time
 import grass.script as grass
 ...
 t = int(time.time() * 1e9) % (2**31)
 os.environ['GRASS_RND_SEED'] = '%d' % t
 grass.mapcalc(...)


Hi, thanks. I found the solution after a bit of diving into the
documentation. By the way, I still think the default should be a random
seed, as I think that is what most people would expect (I did, but after
running a function for a day and night, I found out I was wrong). But
anyway, it ultimately comes down to preference, so the most important thing,
I think, is that the user has a clear choice available in both the GUI and on
the command line. If that could be implemented, either way, that would be great.


 --
 Glynn Clements gl...@gclements.plus.com


Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-04 Thread Paulo van Breugel
On Thu, Jul 3, 2014 at 9:39 AM, Paulo van Breugel p.vanbreu...@gmail.com
wrote:

 Just a quick additional question, how to set this GRASS_RND_SEED from
 within a python script (I want to add the option to set the seed with a
 seed parameter in my script, as suggested in the previous email).


Concerning the question above, I found out how to do so. I used it in my
r.random.weight script (in the GRASS 7 addons SVN). This script uses the rand()
function in r.mapcalc. But rather than always using the same seed (1), it offers
the option to set the seed, while by default a time-dependent seed is
set. I am sure there are better ways to do this, but it works.



 On Thu, Jul 3, 2014 at 8:55 AM, Paulo van Breugel p.vanbreu...@gmail.com
 wrote:


 On 03-07-14 03:43, Vaclav Petras wrote:


 On Wed, Jul 2, 2014 at 8:15 PM, Glynn Clements gl...@gclements.plus.com
 wrote:

  Shouldn't the seed be generated from e.g. the OS time,
  which would ensure that each run would give a different result?

  No. The reason is to provide reproducibility. Anyone running the same
 command with the same data should obtain the same result.

   It is certainly good to be able to reproduce commands. However, I
 think in most (statistical) software the default / expected behaviour is to
 have a new automatically generated seed at each run. In R, for example, you
 have to explicitly specify the seed using the function set.seed() if you want
 repeatable results. I would therefore think that most users will expect
 similar behaviour in GRASS. It would certainly be my personal preference to
 have the option to set the seed explicitly if you want reproducibility, but
 have it generated automatically otherwise. But that is just a personal preference.


  Does the reproducibility go beyond one operating system, compiler or
 library? I don't think that the first random number is specified by the C
 language standard. If the results were really reproducible it would be
 good for the testing framework, but I'm afraid that they are not (with my
 limited knowledge about the topic).


 If you want a different result each time, set GRASS_RND_SEED to a
 different value each time, e.g.

 GRASS_RND_SEED=`date +%N` r.mapcalc a = rand(0,100)

 [%N is the nanoseconds portion of the current time; this is a GNU
 extension.]

   Perhaps this can be explained like this in the manual page? A far
 better option would be to provide this as  a normal parameter so it can be
 set from the gui interface or command line like any other variable.


  I've heard that this is not enough on powerful computers/clusters, that
 you have to use also PID because nanoseconds might be the same (I think I
 rememberer that it was nanoseconds not seconds).



  On a related note, it would be nice to be able to set the seed (I think
  there has been such a request before, but not sure about the answer at
 that
  time).

  GRASS_RND_SEED was the answer.


  I think there should be some possibility of randomization (auto-setting
 of seed) build-in the modules providing random(ized) results. Perhaps a
 flag which would turn it on. It can be also an option which would behave
 like GRASS_RND_SEED but would have one special value for auto-generating
 the seed. (GRASS_RND_SEED if present would override this option.) With the
 default value of the option we should ask a question what is actually the
 expected behavior of the module giving random results.

 Yes, that would be great. As for the default value, see my earlier
 argument.


 This would provide a nicer interface in Python, standard interface in
 command line, and possibility to set it in the GUI (which means possibility
 to set it for users which don't use command line.) Moreover, it would
 provide all users with the way of setting the random seen in the manner
 which we consider the best according to our knowledge.

 Agree. The way to set the seed now may not be understood by everybody and
 with all the work going into streamlining the GUI, this kind of fairly
 important options should also be available through the GUI


  Vaclav





Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-03 Thread Paulo van Breugel


On 03-07-14 03:43, Vaclav Petras wrote:


On Wed, Jul 2, 2014 at 8:15 PM, Glynn Clements
gl...@gclements.plus.com wrote:


 Shouldn't the seed be generated from, e.g., the OS time,
 which would ensure that each run gives a different result?

No. The reason is to provide reproducibility. Anyone running the same
command with the same data should obtain the same result.

It is certainly good to be able to reproduce commands. However, I
think in most (statistical) software the default / expected behaviour is
to have a new automatically generated seed at each run. In R, for
example, you have to specify the seed explicitly using the function
set.seed(). I would therefore think that most users will expect a
similar behaviour in GRASS. It would certainly be my personal preference
to have the option to set the seed explicitly if you want
reproducibility, but have it generated automatically otherwise. But that
is just a personal preference.



Does the reproducibility go beyond one operating system, compiler or
library? I don't think that the first random number is specified by
the C language standard. If the results were really reproducible,
it would be good for the testing framework, but I'm afraid that they are
not (with my limited knowledge about the topic).


If you want a different result each time, set GRASS_RND_SEED to a
different value each time, e.g.

GRASS_RND_SEED=`date +%N` r.mapcalc "a = rand(0,100)"

[%N is the nanoseconds portion of the current time; this is a GNU
extension.]

Perhaps this could be explained like this in the manual page? A far better
option would be to provide this as a normal parameter so it can be set
from the GUI or the command line like any other parameter.


I've heard that this is not enough on powerful computers/clusters,
that you also have to use the PID because the nanoseconds might be the
same (I think I remember that it was nanoseconds, not seconds).



 On a related note, it would be nice to be able to set the seed (I think
 there has been such a request before, but not sure about the answer at that
 time).

GRASS_RND_SEED was the answer.


I think there should be some possibility of randomization
(auto-setting of the seed) built into the modules providing random(ized)
results. Perhaps a flag which would turn it on. It could also be an
option which would behave like GRASS_RND_SEED but would have one
special value for auto-generating the seed. (GRASS_RND_SEED, if present,
would override this option.) For the default value of the option, we
should ask what the expected behavior of a module giving random
results actually is.

Yes, that would be great. As for the default value, see my earlier argument.


This would provide a nicer interface in Python, a standard interface on
the command line, and the possibility to set it in the GUI (which means
the possibility to set it for users who don't use the command line).
Moreover, it would provide all users with a way of setting the
random seed in the manner which we consider the best according to our
knowledge.

Agree. The way to set the seed now may not be understood by everybody,
and with all the work going into streamlining the GUI, these kinds of
fairly important options should also be available through the GUI.


Vaclav



Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-03 Thread Paulo van Breugel
Just a quick additional question: how can I set GRASS_RND_SEED from
within a Python script? (I want to add the option to set the seed with a
seed parameter in my script, as suggested in the previous email.)



Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-02 Thread Glynn Clements

Paulo van Breugel wrote:

 When I run, e.g., r.mapcalc "a = rand(0,100)" several times, I
 always get exactly the same layer. In the help file it reads:
 
 The environment variable GRASS_RND_SEED is read to initialize the random
 number generator
 
 But what does it mean?

The value of that environment variable is parsed using atol() and the
result used to seed the PRNG (via srand() or srand48()) (setup_rand()
in r.mapcalc/evaluate.c).

If the variable isn't set, the PRNG isn't explicitly seeded. For
rand(), the result should be equivalent to GRASS_RND_SEED=1.
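To make the point concrete: a pseudo-random generator seeded with the same value always yields the same sequence, which is why repeated runs produce the same map when GRASS_RND_SEED is unset. The sketch below uses awk's srand()/rand() as a stand-in, not r.mapcalc's own generator; the C standard likewise specifies that calling rand() before any srand() gives the same sequence as srand(1). draw is a hypothetical helper name.

```shell
#!/bin/sh
# Illustration with awk's srand()/rand(), not r.mapcalc's own generator:
# seeding a PRNG with the same value reproduces the same sequence.

# Print three pseudo-random integers in [0, 100] drawn after seeding with $1.
draw() {
    awk -v s="$1" 'BEGIN {
        srand(s)
        for (i = 0; i < 3; i++) printf "%d ", int(rand() * 101)
        print ""
    }'
}

draw 1
draw 1   # same seed, same three numbers as the line above
draw 2   # different seed, normally a different sequence
```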

 Shouldn't the seed be generated from, e.g., the OS time,
 which would ensure that each run gives a different result?

No. The reason is to provide reproducibility. Anyone running the same
command with the same data should obtain the same result.

If you want a different result each time, set GRASS_RND_SEED to a
different value each time, e.g.

GRASS_RND_SEED=`date +%N` r.mapcalc "a = rand(0,100)"

[%N is the nanoseconds portion of the current time; this is a GNU
extension.]
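A slightly more portable variant of the command above, assuming a POSIX shell: use the nanoseconds where date supports %N and fall back to whole seconds otherwise (+%s is not required by POSIX either, but GNU and BSD date both accept it). Leading zeros in the nanosecond value do no harm, since atol() parses base 10. time_seed is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: a seed that changes from run to run.
# date +%N (nanoseconds) is a GNU extension; other date implementations
# may print a literal "N" or "%N" instead, so fall back to whole seconds.
time_seed() {
    ns=$(date +%N)
    case $ns in
        ''|*[!0-9]*) date +%s ;;              # no %N support: seconds since the epoch
        *)           printf '%s\n' "$ns" ;;   # GNU date: 000000000 to 999999999
    esac
}

GRASS_RND_SEED=$(time_seed)
export GRASS_RND_SEED
echo "using GRASS_RND_SEED=$GRASS_RND_SEED"
# r.mapcalc "a = rand(0,100)"   # inside a GRASS session
```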

 On a related note, it would be nice to be able to set the seed (I think
 there has been such a request before, but not sure about the answer at that
 time).

GRASS_RND_SEED was the answer.

-- 
Glynn Clements gl...@gclements.plus.com


Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

2014-07-02 Thread Vaclav Petras
On Wed, Jul 2, 2014 at 8:15 PM, Glynn Clements gl...@gclements.plus.com
wrote:

  Shouldn't the seed be generated from, e.g., the OS time,
  which would ensure that each run gives a different result?

 No. The reason is to provide reproducibility. Anyone running the same
 command with the same data should obtain the same result.

Does the reproducibility go beyond one operating system, compiler or
library? I don't think that the first random number is specified by the C
language standard. If the results were really reproducible, it would be
good for the testing framework, but I'm afraid that they are not (with my
limited knowledge about the topic).


 If you want a different result each time, set GRASS_RND_SEED to a
 different value each time, e.g.

 GRASS_RND_SEED=`date +%N` r.mapcalc "a = rand(0,100)"

 [%N is the nanoseconds portion of the current time; this is a GNU
 extension.]


I've heard that this is not enough on powerful computers/clusters, that you
also have to use the PID because the nanoseconds might be the same (I think I
remember that it was nanoseconds, not seconds).
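One way to act on the PID suggestion, sketched under stated assumptions: combine the seconds since the epoch with the shell's PID ($$) so that jobs started in the same second on one machine normally get different seeds. The formula is arbitrary; it only keeps the value below 2^31 so that it fits a 32-bit long. It does not separate processes on different cluster nodes that happen to share a PID, and PIDs are reduced modulo 100000, so collisions remain possible. pid_time_seed is a hypothetical name.

```shell
#!/bin/sh
# Sketch: mix the PID into a time-based seed so that jobs started within
# the same second on one machine normally get different seeds.
# The formula is an arbitrary illustration; the result stays below 2^31.
pid_time_seed() {
    secs=$(date +%s)
    # $$ expands to the PID of the main shell, even inside $(...)
    echo $(( (secs % 10000) * 100000 + $$ % 100000 ))
}

GRASS_RND_SEED=$(pid_time_seed)
export GRASS_RND_SEED
echo "using GRASS_RND_SEED=$GRASS_RND_SEED"
# r.mapcalc "a = rand(0,100)"   # inside a GRASS session
```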



  On a related note, it would be nice to be able to set the seed (I think
  there has been such a request before, but not sure about the answer at
 that
  time).

 GRASS_RND_SEED was the answer.


I think there should be some possibility of randomization (auto-setting of
the seed) built into the modules providing random(ized) results. Perhaps a
flag which would turn it on. It could also be an option which would behave
like GRASS_RND_SEED but would have one special value for auto-generating the
seed. (GRASS_RND_SEED, if present, would override this option.) For the
default value of the option, we should ask what the expected behavior of a
module giving random results actually is.
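The proposal can be sketched as a seed-resolution rule. Nothing here is an existing GRASS option: resolve_seed and its argument (standing in for a seed= option) are hypothetical, "auto" is the special value for auto-generating the seed, and GRASS_RND_SEED overrides everything, as suggested. Which value should be the default is left open, as in the message.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed behaviour; not an existing GRASS option.
# Precedence: GRASS_RND_SEED (if set) > explicit seed value > "auto".
resolve_seed() {
    opt=$1                                    # stands in for a hypothetical seed= option
    if [ -n "${GRASS_RND_SEED:-}" ]; then
        printf '%s\n' "$GRASS_RND_SEED"       # environment variable overrides the option
    elif [ "$opt" = auto ]; then
        date +%s                              # special value: auto-generate a seed
    else
        printf '%s\n' "$opt"                  # explicit seed: reproducible results
    fi
}

unset GRASS_RND_SEED
resolve_seed 42      # prints 42
resolve_seed auto    # prints a time-based seed
```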

This would provide a nicer interface in Python, a standard interface on the
command line, and the possibility to set it in the GUI (which means the
possibility to set it for users who don't use the command line). Moreover,
it would provide all users with a way of setting the random seed in the
manner which we consider the best according to our knowledge.

Vaclav