Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On Sun, Jul 27, 2014 at 1:58 AM, Glynn Clements gl...@gclements.plus.com wrote: Glynn Clements wrote: I wonder if there are any modules in core or addons which need to be updated. The following are candidates for conversion: Most uses of rand, random, lrand48, etc have been replaced in r61415 and r61416. Thanks for your work. It is good that all the private rand() implementations finally disappeared. I have made a bundled backport to relbranch7 in r61471. (The open issues mentioned in your previous email I am not able to judge.) Markus ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On Thu, Jul 31, 2014 at 1:30 PM, Markus Neteler nete...@osgeo.org wrote: On Sun, Jul 27, 2014 at 1:58 AM, Glynn Clements gl...@gclements.plus.com wrote: Glynn Clements wrote: I wonder if there are any modules in core or addons which need to be updated. The following are candidates for conversion: Most uses of rand, random, lrand48, etc have been replaced in r61415 and r61416. Thanks for your work. It is good that all the private rand() implementations finally disappeared. I have made a bundled backport to relbranch7 in r61471. In r61475 (relbranch70) I have bundle backported the changes to r.mapcalc/r3.mapcalc, python interface and wx r.mapcalculator. Markus ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
Glynn Clements wrote:

> > I wonder if there are any modules in core or addons which need to be updated.
>
> The following are candidates for conversion:

Most uses of rand, random, lrand48, etc have been replaced in r61415 and r61416.

Many of these modules seeded the RNG from either the clock or the PID, with no way to provide an explicit seed. In such cases, the use of G_srand48_auto() has been marked with a FIXME comment.

It would be possible to modify G_srand48_auto() to allow the use of an environment variable, but this has problems of its own (e.g. setting the variable manually then forgetting about it).

r.li.daemon/daemon.c uses a hard-coded seed of zero.

r61416 changes readcell.c in r.proj, i.rectify, and i.ortho.rectify. These all used rand() to select a random cache block for ejection. While this wouldn't have affected the result, only the first RAND_MAX cache blocks would have been used. RAND_MAX is only guaranteed to be at least 32767, which would limit the effective cache size to 1 GiB (each block is 64 * 64 = 4096 doubles = 32 KiB).

Cases which haven't been changed are:

lib/raster3d/test/test_put_get_value_large_file.c. This appears to be a test case; does it matter?

include/iostream/quicksort.h uses random() or rand() to select a pivot for the quicksort algorithm. That file has no dependency on lib/gis (or anything else, except for stdlib.h for random/rand), and I didn't want to add one unnecessarily. Again, this shouldn't affect the result, but there may be performance issues if the size of the array being sorted is significantly larger than RAND_MAX (in this situation, the algorithm will be O(n^2) even in the best case). Unless there's a specific reason not to, it may be better to simply replace all uses of that file with std::sort() from <algorithm>.

-- 
Glynn Clements gl...@gclements.plus.com
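The 1 GiB figure for the readcell.c cache follows from the numbers given in the message. A minimal sketch of the arithmetic, assuming 8-byte doubles (the constant names are mine, not taken from the GRASS source):

```python
# Arithmetic behind the "effective cache size of 1 GiB" remark.
# Assumption: a double occupies 8 bytes; names are illustrative only.
BLOCK_SIDE = 64          # each cache block is 64 x 64 cells
BYTES_PER_DOUBLE = 8
RAND_MAX_MIN = 32767     # smallest RAND_MAX the C standard allows

block_bytes = BLOCK_SIDE * BLOCK_SIDE * BYTES_PER_DOUBLE
reachable_bytes = RAND_MAX_MIN * block_bytes

print(block_bytes // 1024, "KiB per block")
print(round(reachable_bytes / 2**30, 5), "GiB reachable via rand()")
```

With the minimum RAND_MAX, rand() can only ever pick one of 32767 block indices, so any cache blocks beyond that are never selected for ejection.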
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On Tue, Jul 22, 2014 at 10:07 PM, Anna Petrášová kratocha...@gmail.com wrote: On Tue, Jul 22, 2014 at 8:14 PM, Glynn Clements gl...@gclements.plus.com wrote: Glynn Clements wrote: I'm inclined to add both an option (to specify a seed, replacing the environment variable) and a flag (to seed from the system clock or whatever), and having the PRNG generate a fatal error if neither of those are used. This is now done. r61350 adds the lrand48/mrand48/drand48 equivalents to lib/gis. Brief testing suggests that the results are identical to those generated by GNU libc (which should be identical to any other POSIX implementation). r61352 changes it to generate a fatal error if used prior to seeding. r61353 changes r.mapcalc so that seeding is performed via seed= or -s. The seed (whether specified by seed= or generated for -s) is added to the history (for r.mapcalc; r3.mapcalc's create_history() function is a stub; do 3D rasters have history?) Note that GRASS_RND_SEED is no longer supported. That was a hack from the time before r.mapcalc used G_parser(). As I write this, it has occurred to me that the behaviour of rand() may be non-deterministic in the presence of certain forms of parallelism, e.g. multiple occurences of rand() in the expression(s) in conjunction with pthreads. Ultimately we may need to expand the PRNG to support explicit state (as per erand48, nrand48 and jrand48). I added the support for -s and seed to the r.mapcalc gui in 61354. Testing is very welcome. I wonder if there are any modules in core or addons which need to be updated. For example all TGRASS tests are affected. Anna -- Glynn Clements gl...@gclements.plus.com ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
Anna Petrášová wrote:

> I wonder if there are any modules in core or addons which need to be updated.

The following are candidates for conversion:

lib/gmath/rand1.c                                   drand48
raster/r.random/creat_rand.c                        lrand48
vector/v.kcv/main.c                                 drand48
vector/v.random/main.c                              drand48
lib/raster/color_rand.c                             rand
imagery/i.ortho.photo/i.ortho.rectify/readcell.c    rand
imagery/i.rectify/readcell.c                        rand
raster/r.proj/readcell.c                            rand
raster/r.spread/pick_dist.c                         rand
raster/r.spread/pick_ignite.c                       rand
vector/v.kcv/main.c                                 rand
vector/v.qcount/findquads.c                         rand
vector/v.random/main.c                              rand

r.proj probably doesn't matter, as rand() is only used to choose which block to eject from the cache, so it shouldn't have any effect upon the end result.

-- 
Glynn Clements gl...@gclements.plus.com
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
Paulo van Breugel wrote:

> And it seems to be the default behaviour by python/numpy:

It is, but ...

> >>> import numpy as np
> >>> np.random.random()
> 0.8351426142559701
> >>> np.random.random()
> 0.4813823441998394
> >>> np.random.random()
> 0.7279314267025369

... this example doesn't demonstrate that. Any PRNG returns different values for successive calls.

The question is whether the PRNG's initial value should automatically be seeded from some external source of entropy (e.g. the system clock), so that the sequence of values differs on different runs.

In turn, that brings up questions about the quality of the entropy source. The ANSI C time() function typically only has one second granularity (indeed, POSIX requires this, as time_t is defined as seconds since the epoch), which is sufficiently coarse that successive runs may get the same seed. Other functions aren't portable, and even where available, the granularity isn't guaranteed.

My main objection to automatic seeding is that people will inevitably produce non-repeatable results without even realising it.

One possible solution would be to automatically add the seed to the history of any map generated by r.mapcalc (or possibly only those which use the rand() function). But that would still only help if the creator either provides access to the generated maps, or the output from r.info. Simply providing the commands used and the end result wouldn't help.

-- 
Glynn Clements gl...@gclements.plus.com
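The granularity point can be illustrated with a small sketch. It uses Python's standard random module as a stand-in generator (not the PRNG used by r.mapcalc), and clock_seed is a hypothetical helper that mimics seeding from whole seconds:

```python
import random
import time


def clock_seed(now=None):
    """Hypothetical helper: derive a seed from whole seconds, like C time()."""
    if now is None:
        now = time.time()
    return int(now)


# Two "runs" started within the same second get the same seed ...
seed_a = clock_seed(1406000000.10)
seed_b = clock_seed(1406000000.95)

# ... and therefore the same sequence of pseudo-random values.
run_a = [random.Random(seed_a).random() for _ in range(3)]
run_b = [random.Random(seed_b).random() for _ in range(3)]
print(seed_a == seed_b, run_a == run_b)
```

Two jobs launched by a script in quick succession would hit exactly this case, which is why a seed based only on time() is a weak way to get distinct results per run.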
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
Markus Neteler wrote:

> - if the user needs reproducibility, then have an env var to enable that.

And when the issue of usability doesn't even get considered until a few years later, when the user (or a colleague) gets an email suggesting the results can't be reproduced ...?

I'm inclined to add both an option (to specify a seed, replacing the environment variable) and a flag (to seed from the system clock or whatever), and having the PRNG generate a fatal error if neither of those are used.

That way, neither of the likely problems can arise by oversight.

-- 
Glynn Clements gl...@gclements.plus.com
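The "option plus flag, error if neither" interface proposed above can be sketched with Python's argparse. This is only an illustration of the idea: the --seed spelling, the resolve_seed name and the use of argparse are assumptions of mine, not how G_parser() or r.mapcalc implement it.

```python
import argparse
import time


def resolve_seed(argv):
    """Return the seed to use; exit with an error if the user chose nothing.

    Hypothetical stand-in for the proposed interface:
      --seed N   explicit, reproducible seed
      -s         seed from the system clock
    """
    parser = argparse.ArgumentParser(prog="demo")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--seed", type=int, help="explicit seed")
    group.add_argument("-s", dest="auto", action="store_true",
                       help="generate a seed from the clock")
    args = parser.parse_args(argv)
    if args.auto:
        return int(time.time())
    return args.seed


print(resolve_seed(["--seed", "42"]))
```

Because the group is marked as required, calling resolve_seed([]) stops with a usage error instead of silently picking either behaviour, which is the point of the proposal.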
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On Tue, Jul 22, 2014 at 4:39 PM, Glynn Clements gl...@gclements.plus.com wrote: Paulo van Breugel wrote: And it seems to be the default behaviour by python/numpy: It is, but ... import numpy as np np.random.random() 0.8351426142559701 np.random.random() 0.4813823441998394 np.random.random() 0.7279314267025369 ... this example doesn't demonstrate that. Good point, on my computer I get: import numpy as np np.random.random() 0.49727844715398417 And in different (also freshly started) Python: import numpy as np np.random.random() 0.2457281014919791 Any PRNG returns different values for successive calls. The problem is that user may not see the difference between between two module calls in GRASS command line and two calls of random() function in Python. When calling GRASS module in Python the difference is even less visible. Anyway, the reproducibility would be really nice considering GRASS scientific audience, however are you sure that different systems will give same random number for the same seed? Or do you think about reproducible as as reproducible as possible, e.g. using the same system if necessary. The question is whether the PRNG's initial value should autmatically be seeded from some external source of entropy (e.g. the system clock), so that the sequence of values differs on different runs. In turn, that brings up questions about the quality of the entropy source. The ANSI C time() function typically only has one second granularity (indeed, POSIX requires this, as time_t is defined as seconds since the epoch), which is sufficiently course that successive runs may get the same seed. Other functions aren't portable, and even where available, the granularity isn't guaranteed. What about time + process id? My main objection to automatic seeding is that people will inevitably produce non-repeatable results without even realising it. 
One possible solution would be to automatically add the seed to the history of any map generated by r.mapcalc (or possibly only those which use the rand() function). But that would still only help if the creator either provides access to the generated maps, or the output from r.info. Simply providing the commands used and the end result wouldn't help. -- Glynn Clements gl...@gclements.plus.com ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On Tue, Jul 22, 2014 at 4:58 PM, Glynn Clements gl...@gclements.plus.com wrote: Markus Neteler wrote: - if the user needs reproducability, then have a env var to enable that. And when issue of usability doesn't even get considered until a few years later when the user (or a colleague) gets an email suggesting the results can't be be reproduced ...? I'm inclined to add both an option (to specify a seed, replacing the environment variable) and a flag (to seed from the system clock or whatever), and having the PRNG generate a fatal error if neither of those are used. That way, neither of the likely problems can arise by oversight. This looks very good at first glance. ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On 22-07-14 22:58, Glynn Clements wrote: Markus Neteler wrote: - if the user needs reproducability, then have a env var to enable that. And when issue of usability doesn't even get considered until a few years later when the user (or a colleague) gets an email suggesting the results can't be be reproduced ...? I'm inclined to add both an option (to specify a seed, replacing the environment variable) and a flag (to seed from the system clock or whatever), and having the PRNG generate a fatal error if neither of those are used. That way, neither of the likely problems can arise by oversight. I guess there is a lot to say for both approaches, which is why I think the suggestion of Markus is a very good one! +1 from me ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On Tue, Jul 22, 2014 at 11:19 PM, Paulo van Breugel wrote: On 22-07-14 22:58, Glynn Clements wrote: And when issue of usability doesn't even get considered until a few years later when the user (or a colleague) gets an email suggesting the results can't be be reproduced ...? I'm inclined to add both an option (to specify a seed, replacing the environment variable) and a flag (to seed from the system clock or whatever), and having the PRNG generate a fatal error if neither of those are used. That way, neither of the likely problems can arise by oversight. I guess there is a lot to say for both approaches, which is why I think the suggestion of Markus is a very good one! +1 from me It is indeed Glynn's suggestion (which I like, too). Markus ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On 22-07-14 23:31, Markus Neteler wrote:

> On Tue, Jul 22, 2014 at 11:19 PM, Paulo van Breugel wrote:
> > On 22-07-14 22:58, Glynn Clements wrote:
> > > And when issue of usability doesn't even get considered until a few years later when the user (or a colleague) gets an email suggesting the results can't be reproduced ...?
> > >
> > > I'm inclined to add both an option (to specify a seed, replacing the environment variable) and a flag (to seed from the system clock or whatever), and having the PRNG generate a fatal error if neither of those are used.
> > >
> > > That way, neither of the likely problems can arise by oversight.
> >
> > I guess there is a lot to say for both approaches, which is why I think the suggestion of Markus is a very good one! +1 from me
>
> It is indeed Glynn's suggestion (which I like, too).

Sorry, I never seem to get used to how my email program displays the threads... a good suggestion by Glynn, I mean. Glynn, it would be really great if you could implement it that way.

> Markus
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
Vaclav Petras wrote:

> Anyway, the reproducibility would be really nice considering GRASS scientific audience, however are you sure that different systems will give same random number for the same seed?

They will from now on, because I've replaced the use of the system's PRNG (either rand or mrand48/drand48) with a portable implementation of the latter.

> What about time + process id?

That's what's done now (if -s is used). Although we could probably do with a better hash (currently, it's just addition) and/or more entropy sources.

-- 
Glynn Clements gl...@gclements.plus.com
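To see why plain addition is a weak way to combine time and process id, here is a minimal sketch. Both functions are hypothetical illustrations; the SHA-256 variant is just one possible "better hash" that I picked for the example, not something GRASS is stated to use.

```python
import hashlib


def seed_by_addition(seconds, pid):
    """Combine the two entropy sources by simple addition."""
    return (seconds + pid) % 2**32


def seed_by_hash(seconds, pid):
    """Combine them by hashing (illustrative choice: SHA-256, first 4 bytes)."""
    digest = hashlib.sha256(f"{seconds}:{pid}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


# A process started one second later with a PID one lower gets the
# same additive seed, because only the sum matters.
print(seed_by_addition(1000, 501) == seed_by_addition(1001, 500))

# With a hash, the two inputs are mixed rather than summed, so such
# pairs are very unlikely (though not impossible) to collide.
print(seed_by_hash(1000, 501), seed_by_hash(1001, 500))
```

On a cluster where many jobs start within a few seconds and PIDs are allocated sequentially, pairs with equal sums are plausible, which is the situation the "better hash" remark is aimed at.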
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
Glynn Clements wrote:

> I'm inclined to add both an option (to specify a seed, replacing the environment variable) and a flag (to seed from the system clock or whatever), and having the PRNG generate a fatal error if neither of those are used.

This is now done.

r61350 adds the lrand48/mrand48/drand48 equivalents to lib/gis. Brief testing suggests that the results are identical to those generated by GNU libc (which should be identical to any other POSIX implementation).

r61352 changes it to generate a fatal error if used prior to seeding.

r61353 changes r.mapcalc so that seeding is performed via seed= or -s. The seed (whether specified by seed= or generated for -s) is added to the history (for r.mapcalc; r3.mapcalc's create_history() function is a stub; do 3D rasters have history?)

Note that GRASS_RND_SEED is no longer supported. That was a hack from the time before r.mapcalc used G_parser().

As I write this, it has occurred to me that the behaviour of rand() may be non-deterministic in the presence of certain forms of parallelism, e.g. multiple occurrences of rand() in the expression(s) in conjunction with pthreads. Ultimately we may need to expand the PRNG to support explicit state (as per erand48, nrand48 and jrand48).

-- 
Glynn Clements gl...@gclements.plus.com
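For readers unfamiliar with the POSIX rand48 family, the following is a rough Python sketch of the generator it specifies: a 48-bit linear congruential generator with multiplier 0x5DEECE66D and increment 0xB, where srand48() places the seed in the high 32 bits and 0x330E in the low 16. It is not the lib/gis code from r61350; the RuntimeError only mimics the "fatal error if used prior to seeding" behaviour described above.

```python
# Sketch of the POSIX rand48 generator; not the GRASS implementation.
A = 0x5DEECE66D           # multiplier defined by POSIX
C = 0xB                   # increment defined by POSIX
MASK = (1 << 48) - 1      # arithmetic is modulo 2**48

_state = None             # None means "not seeded yet"


def srand48(seed):
    """Seed: high 32 bits from the seed, low 16 bits fixed at 0x330E."""
    global _state
    _state = ((seed & 0xFFFFFFFF) << 16) | 0x330E


def _next():
    global _state
    if _state is None:
        # Stand-in for the fatal error raised when the PRNG is unseeded.
        raise RuntimeError("PRNG used before seeding")
    _state = (A * _state + C) & MASK
    return _state


def drand48():
    """Float in [0, 1) built from all 48 bits of state."""
    return _next() / float(1 << 48)


def lrand48():
    """Non-negative integer from the high 31 bits."""
    return _next() >> 17


def mrand48():
    """Signed integer from the high 32 bits."""
    value = _next() >> 16
    return value - (1 << 32) if value & (1 << 31) else value


srand48(0)
print(drand48())
```

Since every step is exact integer arithmetic, the same seed yields the same sequence on any platform, which is what makes a bundled implementation portable where the system rand() is not.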
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On Tue, Jul 22, 2014 at 8:14 PM, Glynn Clements gl...@gclements.plus.com wrote: Glynn Clements wrote: I'm inclined to add both an option (to specify a seed, replacing the environment variable) and a flag (to seed from the system clock or whatever), and having the PRNG generate a fatal error if neither of those are used. This is now done. r61350 adds the lrand48/mrand48/drand48 equivalents to lib/gis. Brief testing suggests that the results are identical to those generated by GNU libc (which should be identical to any other POSIX implementation). r61352 changes it to generate a fatal error if used prior to seeding. r61353 changes r.mapcalc so that seeding is performed via seed= or -s. The seed (whether specified by seed= or generated for -s) is added to the history (for r.mapcalc; r3.mapcalc's create_history() function is a stub; do 3D rasters have history?) Note that GRASS_RND_SEED is no longer supported. That was a hack from the time before r.mapcalc used G_parser(). As I write this, it has occurred to me that the behaviour of rand() may be non-deterministic in the presence of certain forms of parallelism, e.g. multiple occurences of rand() in the expression(s) in conjunction with pthreads. Ultimately we may need to expand the PRNG to support explicit state (as per erand48, nrand48 and jrand48). I added the support for -s and seed to the r.mapcalc gui in 61354. Testing is very welcome. I wonder if there are any modules in core or addons which need to be updated. Anna -- Glynn Clements gl...@gclements.plus.com ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On Tue, Jul 22, 2014 at 8:14 PM, Glynn Clements gl...@gclements.plus.com wrote: r61353 changes r.mapcalc so that seeding is performed via seed= or -s. The seed (whether specified by seed= or generated for -s) is added to the history (for r.mapcalc; r3.mapcalc's create_history() function is a stub; do 3D rasters have history?) I added test for r61353 in r61355. Tests are only for r.mapcalc not for r3.mapcalc. http://trac.osgeo.org/grass/changeset/61355 As I write this, it has occurred to me that the behaviour of rand() may be non-deterministic in the presence of certain forms of parallelism, e.g. multiple occurences of rand() in the expression(s) in conjunction with pthreads. Ultimately we may need to expand the PRNG to support explicit state (as per erand48, nrand48 and jrand48). The tests are not testing any of this (at least not explicitly), contributions welcome. ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
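The parallelism concern quoted above can be sketched as follows. With one shared generator, the values each consumer sees depend on the order in which calls happen to interleave; with explicit per-consumer state (the idea behind erand48, nrand48 and jrand48) they do not. The class below is a hypothetical Python analogue, not the POSIX API, which takes an array of three unsigned shorts as state.

```python
A = 0x5DEECE66D
C = 0xB
MASK = (1 << 48) - 1


class Rand48State:
    """Hypothetical explicit-state generator, analogous to erand48()."""

    def __init__(self, seed):
        self.x = ((seed & 0xFFFFFFFF) << 16) | 0x330E

    def erand(self):
        self.x = (A * self.x + C) & MASK
        return self.x / float(1 << 48)


def interleaved(order):
    """Draw values for consumers 'a' and 'b' in the given call order."""
    streams = {"a": Rand48State(1), "b": Rand48State(2)}
    out = {"a": [], "b": []}
    for name in order:
        out[name].append(streams[name].erand())
    return out


# Each consumer gets the same values regardless of how calls interleave.
print(interleaved("aabb") == interleaved("abab"))
```

A test along these lines, comparing results across different call orders, might be a starting point for the missing coverage mentioned above.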
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On Sun, Jul 6, 2014 at 12:25 AM, Glynn Clements gl...@gclements.plus.com wrote: Glynn Clements gl...@gclements.plus.com wrote: ... In ticket #2272, I attached a portable implementation of lrand48(). If desired, we could add this to libgis and use that in preference to any implementation-specific PRNG. This would be excellent. If you want a different result each time, set GRASS_RND_SEED to a different value each time, e.g. IMHO this is not intuitive at all. I would suggest to invert the behaviour for GRASS 7: - per default generate random numbers which differ, - if the user needs reproducability, then have a env var to enable that. The main thing is that I believe that reproducibility should be the default. I humbly disagree. This is not what the user expects. It is also the opposite of how for example R behaves: R runif(1) [1] 0.5624295 runif(1) [1] 0.1683853 http://en.wikibooks.org/wiki/R_Programming/Random_Number_Generation#Seed If you want to perform an exact replication of your program, you have to specify the seed using the function set.seed(). If people have to take explicit action to introduce randomness, The problem is that most will not even realize the current behaviour of rand(). they're more likely to consider the issues involved. If randomised seeds are the default, the lack of reproducibility may not be considered until it is too late. The R community (and some users here) think the opposite... when you ask for rand() then you expect a random number. Just to avoid this: https://xkcd.com/221/ Markus ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On 21-07-14 19:01, Markus Neteler wrote: On Sun, Jul 6, 2014 at 12:25 AM, Glynn Clements gl...@gclements.plus.com wrote: Glynn Clements gl...@gclements.plus.com wrote: ... In ticket #2272, I attached a portable implementation of lrand48(). If desired, we could add this to libgis and use that in preference to any implementation-specific PRNG. This would be excellent. If you want a different result each time, set GRASS_RND_SEED to a different value each time, e.g. IMHO this is not intuitive at all. I would suggest to invert the behaviour for GRASS 7: - per default generate random numbers which differ, - if the user needs reproducability, then have a env var to enable that. The main thing is that I believe that reproducibility should be the default. I humbly disagree. This is not what the user expects. It is also the opposite of how for example R behaves: R runif(1) [1] 0.5624295 runif(1) [1] 0.1683853 http://en.wikibooks.org/wiki/R_Programming/Random_Number_Generation#Seed If you want to perform an exact replication of your program, you have to specify the seed using the function set.seed(). If people have to take explicit action to introduce randomness, The problem is that most will not even realize the current behaviour of rand(). they're more likely to consider the issues involved. If randomised seeds are the default, the lack of reproducibility may not be considered until it is too late. The R community (and some users here) think the opposite... when you ask for rand() then you expect a random number. And not only the R community I am sure. In all statistical packages I have ever worked with one can see the same behaviour, a random number is random (i.e., each time a different seed), unless the seed is explicitly defined by the user. 
And it seems to be the default behaviour by python/numpy: import numpy as np np.random.random() 0.8351426142559701 np.random.random() 0.4813823441998394 np.random.random() 0.7279314267025369 Just to avoid this: https://xkcd.com/221/ Markus ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
Vaclav Petras wrote:

> > > Shouldn't the seed not be generated on e.g, OS time, which would ensure that each run would give a different result?
> >
> > No. The reason is to provide reproducibility. Anyone running the same command with the same data should obtain the same result.
>
> Does the reproducibility go beyond one operating system, compiler or library?

If drand48() is used, yes. If rand() is used, no.

> I don't think that the first random number is specified by the C language standard.

The C standard doesn't specify any particular implementation for rand() (it does give an example implementation, but it only produces 15-bit values).

It does specify that if the PRNG isn't explicitly seeded, the behaviour is as if srand(1) was called beforehand. [§7.20.2.2p2]

IOW, the sequence of results is implementation-dependent, but it may not change from one run to the next unless the program explicitly seeds the PRNG with a non-deterministic value such as the current time.

> If the results would be really reproducible it would be good for testing framework but I'm afraid that they are not (with my limited knowledge about the topic).

In ticket #2272, I attached a portable implementation of lrand48(). If desired, we could add this to libgis and use that in preference to any implementation-specific PRNG.

> > If you want a different result each time, set GRASS_RND_SEED to a different value each time, e.g.
> >
> >     GRASS_RND_SEED=`date +%N` r.mapcalc "a = rand(0,100)"
> >
> > [%N is the nanoseconds portion of the current time; this is a GNU extension.]
>
> I've heard that this is not enough on powerful computers/clusters, that you have to use also PID because nanoseconds might be the same (I think I remember that it was nanoseconds not seconds).

The main issue is on systems where the reported time only changes in increments of a scheduler tick (e.g. 10ms on old versions of Linux).

> > > On a related note, it would be nice to be able to set the seed (I think there has been such a request before, but not sure about the answer at that time).
> >
> > GRASS_RND_SEED was the answer.
>
> I think there should be some possibility of randomization (auto-setting of seed) built into the modules providing random(ized) results. Perhaps a flag which would turn it on. It can be also an option which would behave like GRASS_RND_SEED but would have one special value for auto-generating the seed. (GRASS_RND_SEED if present would override this option.) With the default value of the option we should ask a question what is actually the expected behavior of the module giving random results.

That's certainly reasonable.

The main thing is that I believe that reproducibility should be the default. If people have to take explicit action to introduce randomness, they're more likely to consider the issues involved. If randomised seeds are the default, the lack of reproducibility may not be considered until it is too late.

-- 
Glynn Clements gl...@gclements.plus.com
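The example implementation of rand() given in the C standard, referred to earlier in this message, can be transcribed to Python roughly as follows. The class name is mine; the constants are those of the standard's sample code. Real C libraries are free to use something else entirely, which is why rand() output is not portable across systems.

```python
class CStandardExampleRand:
    """Transcription of the sample rand()/srand() from the C standard."""

    def __init__(self):
        # Without an explicit srand(), rand() behaves as if srand(1) was called.
        self.next = 1

    def srand(self, seed):
        self.next = seed & 0xFFFFFFFF

    def rand(self):
        self.next = (self.next * 1103515245 + 12345) & 0xFFFFFFFF
        # Only 15 bits are returned, so values lie in 0..32767.
        return (self.next // 65536) % 32768


first_run = CStandardExampleRand()
second_run = CStandardExampleRand()
print([first_run.rand() for _ in range(3)] ==
      [second_run.rand() for _ in range(3)])
```

Two unseeded "runs" produce the same sequence, matching the point that results only change between runs when the program seeds the PRNG with something non-deterministic.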
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
Paulo van Breugel wrote:

> Just a quick additional question, how to set this GRASS_RND_SEED from within a python script (I want to add the option to set the seed with a seed parameter in my script, as suggested in the previous email).

You can modify os.environ prior to calling it, e.g.

    import os
    import time
    import grass.script as grass
    ...
    # derive a seed from the current time, reduced to 31 bits
    t = int(time.time() * 1e9) % (2**31)
    # the variable must be set before r.mapcalc is started
    os.environ['GRASS_RND_SEED'] = '%d' % t
    grass.mapcalc(...)

-- 
Glynn Clements gl...@gclements.plus.com
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On Sun, Jul 6, 2014 at 12:34 AM, Glynn Clements gl...@gclements.plus.com wrote: Paulo van Breugel wrote: Just a quick additional question, how to set this GRASS_RND_SEED from within a python script (I want to add the option to set the seed with a seed parameter in my script, as suggested in the previous email). You can modify os.environ prior to calling it, e.g. import time import grass.script as grass ... t = int(time.time() * 1e9) % (2**31) os.environ['GRASS_RND_SEED'] = '%d' % t grass.mapcalc(...) Hi, thanks.. I found out the solution after a bit of diving into the documentation. I btw still think the default should be to have a random seed as I think that is what most people would expect (I did, but after running a function for a day and night, I found out I was wrong). But anyway, it ultimately comes down to preference, so most important I think is if the user has a clear choice available in both the gui and on the command line. If that could be implemented, either way, that would be great. -- Glynn Clements gl...@gclements.plus.com ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On Thu, Jul 3, 2014 at 9:39 AM, Paulo van Breugel p.vanbreu...@gmail.com wrote: Just a quick additional question, how to set this GRASS_RND_SEED from within a python script (I want to add the option to set the seed with a seed parameter in my script, as suggested in the previous email). Concerning the question above, I found out how to do so. I used it in my r.random.weight script (in grass 7 addons svn). This script uses the rand() function in r.mapcalc. But rather than using the same seed (1), there is the option to set the seed, while as default a time-dependent seed is set. I am sure there are better ways to do this, but it works. On Thu, Jul 3, 2014 at 8:55 AM, Paulo van Breugel p.vanbreu...@gmail.com wrote: [...]
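As a rough sketch of the approach described in this message (not the actual r.random.weight code): a script can hand the seed to r.mapcalc through the environment of the child process. The helper name seeded_env and the choice of whole seconds for the time-dependent default are assumptions made for illustration; only the GRASS_RND_SEED variable itself comes from this thread.

```python
import os
import time


def seeded_env(seed=None, base_env=None):
    """Return a copy of the environment with GRASS_RND_SEED set.

    seed=None picks a time-dependent value; any integer gives a
    reproducible run. Hypothetical helper, not part of GRASS.
    """
    env = dict(os.environ if base_env is None else base_env)
    if seed is None:
        # Time-dependent default: whole seconds since the epoch.
        seed = int(time.time())
    env["GRASS_RND_SEED"] = str(int(seed))
    return env
```

Assuming the script runs inside a GRASS session, the dictionary could then be passed to the child process, for example with subprocess.call(["r.mapcalc", "expression=a = rand(0,100)"], env=seeded_env(42)). Copying the environment rather than modifying os.environ keeps the seed from leaking into later, unrelated commands.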
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On 03-07-14 03:43, Vaclav Petras wrote:
> On Wed, Jul 2, 2014 at 8:15 PM, Glynn Clements gl...@gclements.plus.com wrote:
>>> Shouldn't the seed be generated from e.g. the OS time, which would ensure that each run gives a different result?
>> No. The reason is to provide reproducibility. Anyone running the same command with the same data should obtain the same result.

It is certainly good to be able to reproduce commands. However, I think that in most (statistical) software the default / expected behaviour is to have a new, automatically generated seed at each run. In R, for example, you have to explicitly specify the seed using the function set.seed(). I would therefore think that most users will expect similar behaviour in GRASS. It would certainly be my personal preference to have the option to set the seed explicitly if you want reproducibility, but to have it generated automatically otherwise. But that is just a personal preference.

> Does the reproducibility go beyond one operating system, compiler or library? I don't think that the first random number is specified by the C language standard. If the results were really reproducible, it would be good for the testing framework, but I'm afraid that they are not (with my limited knowledge of the topic).

>> If you want a different result each time, set GRASS_RND_SEED to a different value each time, e.g.
>>
>>     GRASS_RND_SEED=`date +%N` r.mapcalc "a = rand(0,100)"
>>
>> [%N is the nanoseconds portion of the current time; this is a GNU extension.]

Perhaps this can be explained like this in the manual page? A far better option would be to provide this as a normal parameter so it can be set from the GUI or the command line like any other variable.

> I've heard that this is not enough on powerful computers/clusters, and that you also have to use the PID because the nanoseconds might be the same (I think I remember that it was nanoseconds, not seconds).

>>> On a related note, it would be nice to be able to set the seed (I think there has been such a request before, but I am not sure about the answer at that time).
>> GRASS_RND_SEED was the answer.

> I think there should be some possibility of randomization (auto-setting of the seed) built into the modules providing random(ized) results. Perhaps a flag which would turn it on. It could also be an option which would behave like GRASS_RND_SEED but would have one special value for auto-generating the seed. (GRASS_RND_SEED, if present, would override this option.) For the default value of the option, we should ask what the expected behavior of a module giving random results actually is.

Yes, that would be great. As for the default value, see my earlier argument.

> This would provide a nicer interface in Python, a standard interface on the command line, and the possibility to set it in the GUI (which means the possibility to set it for users who don't use the command line). Moreover, it would provide all users with a way of setting the random seed in the manner which we consider the best according to our knowledge.

Agree. The current way of setting the seed may not be understood by everybody, and with all the work going into streamlining the GUI, fairly important options of this kind should also be available through the GUI.

> Vaclav
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
Just a quick additional question, how to set this GRASS_RND_SEED from within a python script (I want to add the option to set the seed with a seed parameter in my script, as suggested in the previous email). On Thu, Jul 3, 2014 at 8:55 AM, Paulo van Breugel p.vanbreu...@gmail.com wrote: [...]
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
Paulo van Breugel wrote:

> When I run several times e.g.,
>
>     r.mapcalc "a = rand(0,100)"
>
> I am always getting exactly the same layer. In the help file it reads: "The environment variable GRASS_RND_SEED is read to initialize the random number generator". But what does it mean?

The value of that environment variable is parsed using atol() and the result is used to seed the PRNG via srand() or srand48() (see setup_rand() in r.mapcalc/evaluate.c). If the variable isn't set, the PRNG isn't explicitly seeded. For the C rand() function, the result should be equivalent to GRASS_RND_SEED=1.

> Shouldn't the seed be generated from e.g. the OS time, which would ensure that each run gives a different result?

No. The reason is to provide reproducibility. Anyone running the same command with the same data should obtain the same result.

If you want a different result each time, set GRASS_RND_SEED to a different value each time, e.g.

    GRASS_RND_SEED=`date +%N` r.mapcalc "a = rand(0,100)"

[%N is the nanoseconds portion of the current time; this is a GNU extension.]

> On a related note, it would be nice to be able to set the seed (I think there has been such a request before, but I am not sure about the answer at that time).

GRASS_RND_SEED was the answer.

-- 
Glynn Clements gl...@gclements.plus.com
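The seeding behaviour described in this message can be mimicked in a few lines of Python. This is only an analogy: it uses Python's own random.Random generator rather than the C PRNG that r.mapcalc seeds, so the numbers will not match GRASS output. The function name draw_cells is made up for the sketch, and int() is stricter than atol() about malformed input.

```python
import random


def draw_cells(environ, n=5, low=0, high=100):
    """Draw n integers in [low, high) using a seed read from environ."""
    # An unset variable is treated like a seed of 1, as described for rand().
    seed = int(environ.get("GRASS_RND_SEED", "1"))
    rng = random.Random(seed)
    return [rng.randrange(low, high) for _ in range(n)]


# The same seed always reproduces the same values.
first = draw_cells({"GRASS_RND_SEED": "7"})
second = draw_cells({"GRASS_RND_SEED": "7"})
# Leaving the variable unset behaves like setting it to 1.
unset = draw_cells({})
explicit_one = draw_cells({"GRASS_RND_SEED": "1"})
print(first == second, unset == explicit_one)  # prints "True True"
```

This is why repeated runs without GRASS_RND_SEED keep producing the same layer: every run starts the generator from the same state.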
Re: [GRASS-dev] using rand(x,y) in r.mapcalc (grass7)
On Wed, Jul 2, 2014 at 8:15 PM, Glynn Clements gl...@gclements.plus.com wrote:
>> Shouldn't the seed be generated from e.g. the OS time, which would ensure that each run gives a different result?
> No. The reason is to provide reproducibility. Anyone running the same command with the same data should obtain the same result.

Does the reproducibility go beyond one operating system, compiler or library? I don't think that the first random number is specified by the C language standard. If the results were really reproducible, it would be good for the testing framework, but I'm afraid that they are not (with my limited knowledge of the topic).

> If you want a different result each time, set GRASS_RND_SEED to a different value each time, e.g.
>
>     GRASS_RND_SEED=`date +%N` r.mapcalc "a = rand(0,100)"
>
> [%N is the nanoseconds portion of the current time; this is a GNU extension.]

I've heard that this is not enough on powerful computers/clusters, and that you also have to use the PID because the nanoseconds might be the same (I think I remember that it was nanoseconds, not seconds).

>> On a related note, it would be nice to be able to set the seed (I think there has been such a request before, but I am not sure about the answer at that time).
> GRASS_RND_SEED was the answer.

I think there should be some possibility of randomization (auto-setting of the seed) built into the modules providing random(ized) results. Perhaps a flag which would turn it on. It could also be an option which would behave like GRASS_RND_SEED but would have one special value for auto-generating the seed. (GRASS_RND_SEED, if present, would override this option.) For the default value of the option, we should ask what the expected behavior of a module giving random results actually is.

This would provide a nicer interface in Python, a standard interface on the command line, and the possibility to set it in the GUI (which means the possibility to set it for users who don't use the command line). Moreover, it would provide all users with a way of setting the random seed in the manner which we consider the best according to our knowledge.

Vaclav
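One possible reading of this proposal, sketched in Python. The special option value "auto", the function names, and the way the clock and the PID are mixed are all assumptions made for illustration; no such option exists at this point in the thread.

```python
import os
import time


def auto_seed():
    """Mix the clock with the process ID.

    Processes started at the same instant on a cluster are then
    unlikely to end up with the same seed.
    """
    nanoseconds = time.time_ns()
    # Keep the result within 31 bits so it fits a C long everywhere.
    return (nanoseconds ^ (os.getpid() << 20)) % 2**31


def resolve_seed(option_value, environ):
    """Pick the seed: the environment variable wins, then the option."""
    if "GRASS_RND_SEED" in environ:
        return int(environ["GRASS_RND_SEED"])
    if option_value == "auto":  # hypothetical special value
        return auto_seed()
    return int(option_value)
```

For example, resolve_seed("auto", {"GRASS_RND_SEED": "9"}) returns 9 because the environment variable overrides the option, while resolve_seed("5", {}) returns 5. Which value the option should default to is exactly the open question raised above, so the sketch deliberately leaves it to the caller.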