Re: [Numpy-discussion] Scipy 2016 attending

2016-05-18 Thread Ryan May
Yup.

On Wed, May 18, 2016 at 5:04 PM, Steve Waterbury wrote:

> Me 3!  ;)
>
> Steve
>
>
> On 05/18/2016 06:03 PM, Nathaniel Smith wrote:
>
> Me too.
>
> On Wed, May 18, 2016 at 3:02 PM, Chris Barker wrote:
>
> I'll be there.
>
> -CHB
>
>
> On Wed, May 18, 2016 at 2:09 PM, Charles R Harris wrote:
>
> Hi All,
>
> Out of curiosity, who all here intends to be at Scipy 2016?
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
> chris.bar...@noaa.gov
>
>
>
>
>
>
>


-- 
Ryan May


Re: [Numpy-discussion] Scipy 2016 attending

2016-05-18 Thread Steve Waterbury

Me 3!  ;)

Steve

On 05/18/2016 06:03 PM, Nathaniel Smith wrote:

Me too.

On Wed, May 18, 2016 at 3:02 PM, Chris Barker  wrote:

I'll be there.

-CHB


On Wed, May 18, 2016 at 2:09 PM, Charles R Harris wrote:

Hi All,

Out of curiosity, who all here intends to be at Scipy 2016?

Chuck





--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov









Re: [Numpy-discussion] Scipy 2016 attending

2016-05-18 Thread Nathaniel Smith
Me too.

On Wed, May 18, 2016 at 3:02 PM, Chris Barker  wrote:
> I'll be there.
>
> -CHB
>
>
> On Wed, May 18, 2016 at 2:09 PM, Charles R Harris wrote:
>>
>> Hi All,
>>
>> Out of curiosity, who all here intends to be at Scipy 2016?
>>
>> Chuck
>>
>>
>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>
>



-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] Scipy 2016 attending

2016-05-18 Thread Chris Barker
I'll be there.

-CHB


On Wed, May 18, 2016 at 2:09 PM, Charles R Harris  wrote:

> Hi All,
>
> Out of curiosity, who all here intends to be at Scipy 2016?
>
> Chuck
>
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


[Numpy-discussion] Scipy 2016 attending

2016-05-18 Thread Charles R Harris
Hi All,

Out of curiosity, who all here intends to be at Scipy 2016?

Chuck


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-18 Thread Nathaniel Smith
On Wed, May 18, 2016 at 5:07 AM, Robert Kern  wrote:
> On Wed, May 18, 2016 at 1:14 AM, Nathaniel Smith  wrote:
>>
>> On Tue, May 17, 2016 at 10:41 AM, Robert Kern wrote:
>> > On Tue, May 17, 2016 at 6:24 PM, Nathaniel Smith  wrote:
>> >>
>> >> On May 17, 2016 1:50 AM, "Robert Kern"  wrote:
>> >> >
>> >> [...]
>> >> > What you want is a function that returns many RandomState objects that
>> >> > are hopefully spread around the MT19937 space enough that they are
>> >> > essentially independent (in the absence of true jumpahead). The better
>> >> > implementation of such a function would look something like this:
>> >> >
>> >> > import numpy as np
>> >> >
>> >> > def spread_out_prngs(n, root_prng=None):
>> >> >     # Fall back to the global generator, or wrap a seed in a RandomState.
>> >> >     if root_prng is None:
>> >> >         root_prng = np.random
>> >> >     elif not isinstance(root_prng, np.random.RandomState):
>> >> >         root_prng = np.random.RandomState(root_prng)
>> >> >     sprouted_prngs = []
>> >> >     for i in range(n):
>> >> >         # 624 32-bit words, about the size of the MT19937 state.
>> >> >         seed_array = root_prng.randint(1<<32, size=624)  # dtype=np.uint32 under 1.11
>> >> >         sprouted_prngs.append(np.random.RandomState(seed_array))
>> >> >     return sprouted_prngs
>> >>
>> >> Maybe a nice way to encapsulate this in the RandomState interface would
>> >> be a method RandomState.random_state() that generates and returns a new
>> >> child RandomState.
>> >
>> > I disagree. This is a workaround in the absence of proper jumpahead or
>> > guaranteed-independent streams. I would not encourage it.
>> >
>> >> > Internally, this generates seed arrays of about the size of the MT19937
>> >> > state to make sure that you can access more of the state space. That
>> >> > will at least make the chance of collision tiny. And it can be easily
>> >> > rewritten to take advantage of one of the newer PRNGs that have true
>> >> > independent streams:
>> >> >
>> >> >   https://github.com/bashtage/ng-numpy-randomstate
>> >>
>> >> ... But unfortunately I'm not sure how to make my interface suggestion
>> >> above work on top of one of these RNGs, because for
>> >> RandomState.random_state you really want a tree of independent RNGs and
>> >> the fancy new PRNGs only provide a single flat namespace :-/. And even
>> >> more annoyingly, the tree API is actually a nicer API, because with a
>> >> flat namespace you have to know up front about all possible RNGs your
>> >> code will use, which is an unfortunate global coupling that makes it
>> >> difficult to compose programs out of independent pieces, while the
>> >> RandomState.random_state approach composes beautifully. Maybe there's
>> >> some clever way to allocate a 64-bit namespace to make it look
>> >> tree-like? I'm not sure 64 bits is really enough...
>> >
>> > MT19937 doesn't have a "tree" any more than the others. It's the same
>> > flat state space. You are just getting the illusion of a tree by hoping
>> > that you never collide. You ought to think about precisely the same
>> > global coupling issues with MT19937 as you do with guaranteed-independent
>> > streams. Hope-and-prayer isn't really a substitute for properly
>> > engineering your problem. It's just a moral hazard to promote this method
>> > to the main API.
>>
>> Nonsense.
>>
>> If your definition of "hope and prayer" includes assuming that we
>> won't encounter a random collision in a 2**19937 state space, then
>> literally all engineering is hope-and-prayer. A collision could
>> happen, but if it does it's overwhelmingly more likely to happen
>> because of a flaw in the mathematical analysis, or a bug in the
>> implementation, or because random quantum fluctuations caused you and
>> your program to suddenly be transported to a parallel world where 1 +
>> 1 = 1, than that you just got unlucky with your random state. And all
>> of these hazards apply equally to both MT19937 and more modern PRNGs.
>
> Granted.
>
>> ...anyway, the real reason I'm a bit grumpy is because there are solid
>> engineering reasons why users *want* this API,
>
> I remain unconvinced on this mark. Grumpily.

Sorry for getting grumpy :-). The engineering reasons seem pretty
obvious to me though? If you have any use case for independent streams
at all, and you're writing code that's intended to live inside a
library's abstraction barrier, then you need some way to choose your
streams to avoid colliding with arbitrary other code that the end-user
might assemble alongside yours as part of their final program. So
AFAICT you have two options: either you need a "tree-style" API for
allocating these streams, or else you need to add some explicit API to
your library that lets the end-user control in detail which streams
you use. Both are possible, but the latter is obviously undesirable
if you can avoid it, since it breaks the abstraction barrier, making
your library more complicated to 

Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-18 Thread Robert Kern
On Wed, May 18, 2016 at 6:20 PM,  wrote:
>
> On Wed, May 18, 2016 at 12:01 PM, Robert Kern wrote:
>>
>> On Wed, May 18, 2016 at 4:50 PM, Chris Barker wrote:
>> >>
>> >> > ...anyway, the real reason I'm a bit grumpy is because there are solid
>> >> > engineering reasons why users *want* this API,
>> >
>> > Honestly, I am lost in the math -- but like any good engineer, I want to
>> > accomplish something anyway :-) I trust you guys to get this right -- or
>> > at least document what's "wrong" with it.
>> >
>> > But, if I'm reading the use case that started all this correctly, it
>> > closely matches my use-case. That is, I have a complex model with
>> > multiple independent "random" processes. And we want to be able to
>> > reproduce simulations EXACTLY -- our users get confused when the results
>> > are "different" even if in a statistically insignificant way.
>> >
>> > At the moment we are using one RNG, with one seed for everything. So we
>> > get reproducible results, but if one thing is changed, then the entire
>> > simulation is different -- which is OK, but it would be nicer to have
>> > each process using its own RNG stream with its own seed. However, it
>> > matters not one whit if those seeds are independent -- the processes are
>> > different, you'd never notice if they were using the same PRN stream --
>> > because they are used differently. So a "fairly low probability of a
>> > clash" would be totally fine.
>>
>> Well, the main question is: do you need to be able to spawn dependent
>> streams at arbitrary points to an arbitrary depth without coordination
>> between processes? The necessity for multiple independent streams per se
>> is not contentious.
>
> I'm similar to Chris, and didn't try to figure out the details of what you
> are talking about.
>
> However, if there are functions getting into numpy that help in using a
> best practice even if it's not bulletproof, then it's still better than
> homemade approaches.
> If it gets in soon, then we can use it in a few years (given dependency
> lag). At that point there should be more distributed, nested
> simulation-based algorithms where we don't know in advance how far we have
> to go to get reliable numbers or convergence.
>
> (But I don't see anything like that right now.)

Current best practice is to use PRNGs with settable streams (or fixed
jumpahead for those PRNGs cursed to not have settable streams but blessed
to have super-long periods). The way to get those into numpy is to help
Kevin Sheppard finish:

  https://github.com/bashtage/ng-numpy-randomstate

He's done nearly all of the hard work already.
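
For readers on a later NumPy, the two options named above (settable
streams, or a fixed jumpahead) can be sketched roughly as follows. This
assumes NumPy 1.17 or newer, whose np.random.SeedSequence, PCG64 and
default_rng postdate this thread; it is not the ng-numpy-randomstate API,
and the seed value is an arbitrary placeholder.

```python
import numpy as np

# Option 1: derive several child streams from a single root seed.
# SeedSequence.spawn() returns child seed sequences meant to seed
# separate generators.
root = np.random.SeedSequence(20160518)  # arbitrary example seed
children = root.spawn(3)
streams = [np.random.default_rng(child) for child in children]
draws = [g.random(2) for g in streams]

# Option 2: fixed jumpahead. jumped() returns a new bit generator whose
# state has been advanced by a large fixed stride; the original is unchanged.
base = np.random.PCG64(20160518)
jumped = base.jumped()
g0 = np.random.Generator(base)
g1 = np.random.Generator(jumped)
```

Re-running with the same root seed reproduces the same child streams, which
is the reproducibility property discussed in this thread.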

--
Robert Kern


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-18 Thread josef.pktd
On Wed, May 18, 2016 at 12:01 PM, Robert Kern  wrote:

> On Wed, May 18, 2016 at 4:50 PM, Chris Barker wrote:
> >>
> >> > ...anyway, the real reason I'm a bit grumpy is because there are solid
> >> > engineering reasons why users *want* this API,
> >
> > Honestly, I am lost in the math -- but like any good engineer, I want to
> > accomplish something anyway :-) I trust you guys to get this right -- or at
> > least document what's "wrong" with it.
> >
> > But, if I'm reading the use case that started all this correctly, it
> > closely matches my use-case. That is, I have a complex model with multiple
> > independent "random" processes. And we want to be able to reproduce
> > simulations EXACTLY -- our users get confused when the results are
> > "different" even if in a statistically insignificant way.
> >
> > At the moment we are using one RNG, with one seed for everything. So we
> > get reproducible results, but if one thing is changed, then the entire
> > simulation is different -- which is OK, but it would be nicer to have each
> > process using its own RNG stream with its own seed. However, it matters
> > not one whit if those seeds are independent -- the processes are different,
> > you'd never notice if they were using the same PRN stream -- because they
> > are used differently. So a "fairly low probability of a clash" would be
> > totally fine.
>
> Well, the main question is: do you need to be able to spawn dependent
> streams at arbitrary points to an arbitrary depth without coordination
> between processes? The necessity for multiple independent streams per se is
> not contentious.
>


I'm similar to Chris, and didn't try to figure out the details of what you
are talking about.

However, if there are functions getting into numpy that help in using a
best practice even if it's not bulletproof, then it's still better than
homemade approaches.
If it gets in soon, then we can use it in a few years (given dependency
lag). At that point there should be more distributed, nested
simulation-based algorithms where we don't know in advance how far we have
to go to get reliable numbers or convergence.

(But I don't see anything like that right now.)

Josef



>
> --
> Robert Kern
>
>
>


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-18 Thread Robert Kern
On Wed, May 18, 2016 at 4:50 PM, Chris Barker  wrote:
>>
>> > ...anyway, the real reason I'm a bit grumpy is because there are solid
>> > engineering reasons why users *want* this API,
>
> Honestly, I am lost in the math -- but like any good engineer, I want to
> accomplish something anyway :-) I trust you guys to get this right -- or at
> least document what's "wrong" with it.
>
> But, if I'm reading the use case that started all this correctly, it
> closely matches my use-case. That is, I have a complex model with multiple
> independent "random" processes. And we want to be able to reproduce
> simulations EXACTLY -- our users get confused when the results are
> "different" even if in a statistically insignificant way.
>
> At the moment we are using one RNG, with one seed for everything. So we
> get reproducible results, but if one thing is changed, then the entire
> simulation is different -- which is OK, but it would be nicer to have each
> process using its own RNG stream with its own seed. However, it matters
> not one whit if those seeds are independent -- the processes are different,
> you'd never notice if they were using the same PRN stream -- because they
> are used differently. So a "fairly low probability of a clash" would be
> totally fine.

Well, the main question is: do you need to be able to spawn dependent
streams at arbitrary points to an arbitrary depth without coordination
between processes? The necessity for multiple independent streams per se is
not contentious.

--
Robert Kern


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-18 Thread Chris Barker
>
> > ...anyway, the real reason I'm a bit grumpy is because there are solid
> > engineering reasons why users *want* this API,
>

Honestly, I am lost in the math -- but like any good engineer, I want to
accomplish something anyway :-) I trust you guys to get this right -- or at
least document what's "wrong" with it.

But, if I'm reading the use case that started all this correctly, it
closely matches my use-case. That is, I have a complex model with multiple
independent "random" processes. And we want to be able to reproduce
simulations EXACTLY -- our users get confused when the results are
"different" even if in a statistically insignificant way.

At the moment we are using one RNG, with one seed for everything. So we get
reproducible results, but if one thing is changed, then the entire
simulation is different -- which is OK, but it would be nicer to have each
process using its own RNG stream with its own seed. However, it matters
not one whit if those seeds are independent -- the processes are different,
you'd never notice if they were using the same PRN stream -- because they
are used differently. So a "fairly low probability of a clash" would be
totally fine.

Granted, in a Monte Carlo simulation, it could be disastrous... :-)

I guess the point is -- do something reasonable, and document its
limitations, and we're all fine :-)
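
The use case described above (one stream per process, each with its own
seed, so that changing one process leaves the others' numbers alone) could
look roughly like the sketch below. The helper make_process_rngs and the
process names are hypothetical, invented for illustration; zlib.crc32 is
used only to turn a name into a stable 32-bit integer. Nothing here
guarantees independent streams: it gives only the "fairly low probability
of a clash" mentioned above.

```python
import zlib

import numpy as np


def make_process_rngs(base_seed, process_names):
    """Return one RandomState per named process, each with its own seed.

    The seed of a process depends only on base_seed and the process name,
    so adding, removing, or reordering processes leaves the others alone.
    """
    rngs = {}
    for name in process_names:
        # crc32 is stable across runs, unlike the built-in hash() of a str.
        name_key = zlib.crc32(name.encode("utf-8")) & 0xFFFFFFFF
        # RandomState accepts a sequence of 32-bit unsigned ints as a seed.
        rngs[name] = np.random.RandomState([base_seed, name_key])
    return rngs


# Hypothetical process names for a simulation with three random processes.
rngs = make_process_rngs(12345, ["wind", "currents", "diffusion"])
wind_draws = rngs["wind"].uniform(size=3)

# Dropping a process does not change the numbers the others see.
rngs_again = make_process_rngs(12345, ["wind", "diffusion"])
same_wind_draws = rngs_again["wind"].uniform(size=3)
```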

And thanks for giving your attention to this.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-18 Thread Robert Kern
On Wed, May 18, 2016 at 1:14 AM, Nathaniel Smith  wrote:
>
> On Tue, May 17, 2016 at 10:41 AM, Robert Kern wrote:
> > On Tue, May 17, 2016 at 6:24 PM, Nathaniel Smith  wrote:
> >>
> >> On May 17, 2016 1:50 AM, "Robert Kern"  wrote:
> >> >
> >> [...]
> >> > What you want is a function that returns many RandomState objects that
> >> > are hopefully spread around the MT19937 space enough that they are
> >> > essentially independent (in the absence of true jumpahead). The better
> >> > implementation of such a function would look something like this:
> >> >
> >> > import numpy as np
> >> >
> >> > def spread_out_prngs(n, root_prng=None):
> >> >     # Fall back to the global generator, or wrap a seed in a RandomState.
> >> >     if root_prng is None:
> >> >         root_prng = np.random
> >> >     elif not isinstance(root_prng, np.random.RandomState):
> >> >         root_prng = np.random.RandomState(root_prng)
> >> >     sprouted_prngs = []
> >> >     for i in range(n):
> >> >         # 624 32-bit words, about the size of the MT19937 state.
> >> >         seed_array = root_prng.randint(1<<32, size=624)  # dtype=np.uint32 under 1.11
> >> >         sprouted_prngs.append(np.random.RandomState(seed_array))
> >> >     return sprouted_prngs
> >>
> >> Maybe a nice way to encapsulate this in the RandomState interface would be
> >> a method RandomState.random_state() that generates and returns a new child
> >> RandomState.
> >
> > I disagree. This is a workaround in the absence of proper jumpahead or
> > guaranteed-independent streams. I would not encourage it.
> >
> >> > Internally, this generates seed arrays of about the size of the MT19937
> >> > state to make sure that you can access more of the state space. That will
> >> > at least make the chance of collision tiny. And it can be easily rewritten
> >> > to take advantage of one of the newer PRNGs that have true independent
> >> > streams:
> >> >
> >> >   https://github.com/bashtage/ng-numpy-randomstate
> >>
> >> ... But unfortunately I'm not sure how to make my interface suggestion
> >> above work on top of one of these RNGs, because for
> >> RandomState.random_state you really want a tree of independent RNGs and
> >> the fancy new PRNGs only provide a single flat namespace :-/. And even
> >> more annoyingly, the tree API is actually a nicer API, because with a
> >> flat namespace you have to know up front about all possible RNGs your
> >> code will use, which is an unfortunate global coupling that makes it
> >> difficult to compose programs out of independent pieces, while the
> >> RandomState.random_state approach composes beautifully. Maybe there's
> >> some clever way to allocate a 64-bit namespace to make it look
> >> tree-like? I'm not sure 64 bits is really enough...
> >
> > MT19937 doesn't have a "tree" any more than the others. It's the same
> > flat state space. You are just getting the illusion of a tree by hoping
> > that you never collide. You ought to think about precisely the same
> > global coupling issues with MT19937 as you do with guaranteed-independent
> > streams. Hope-and-prayer isn't really a substitute for properly
> > engineering your problem. It's just a moral hazard to promote this method
> > to the main API.
>
> Nonsense.
>
> If your definition of "hope and prayer" includes assuming that we
> won't encounter a random collision in a 2**19937 state space, then
> literally all engineering is hope-and-prayer. A collision could
> happen, but if it does it's overwhelmingly more likely to happen
> because of a flaw in the mathematical analysis, or a bug in the
> implementation, or because random quantum fluctuations caused you and
> your program to suddenly be transported to a parallel world where 1 +
> 1 = 1, than that you just got unlucky with your random state. And all
> of these hazards apply equally to both MT19937 and more modern PRNGs.

Granted.

> ...anyway, the real reason I'm a bit grumpy is because there are solid
> engineering reasons why users *want* this API,

I remain unconvinced on this mark. Grumpily.

> so whether or not it
> turns out to be possible I think we should at least be allowed to have
> a discussion about whether there's some way to give it to them.

I'm not shutting down discussion of the option. I *implemented* the option.
I think that discussing whether it should be part of the main API is
premature. There probably ought to be a paper or three out there supporting
its safety and utility first. Let the utility function version flourish
first.
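
As a rough illustration of the utility-function version in use, here is a
self-contained variant of spread_out_prngs together with a reproducibility
check. It assumes NumPy 1.11 or newer, where randint accepts a dtype
argument; the seed 42 is an arbitrary placeholder. As discussed above, the
child generators are only very unlikely to overlap, not guaranteed
independent.

```python
import numpy as np


def spread_out_prngs(n, root_prng=None):
    """Return n RandomState objects seeded from a root generator."""
    if root_prng is None:
        root_prng = np.random  # the global generator
    elif not isinstance(root_prng, np.random.RandomState):
        root_prng = np.random.RandomState(root_prng)  # treat as a seed
    sprouted_prngs = []
    for _ in range(n):
        # 624 32-bit words, about the size of the MT19937 state.
        seed_array = root_prng.randint(0, 1 << 32, size=624, dtype=np.uint32)
        sprouted_prngs.append(np.random.RandomState(seed_array))
    return sprouted_prngs


# The same root seed yields the same family of child generators.
first = spread_out_prngs(2, root_prng=42)
second = spread_out_prngs(2, root_prng=42)
reproducible = np.array_equal(first[0].uniform(size=5),
                              second[0].uniform(size=5))
```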

> It's
> not even 100% out of the question that we conclude that existing PRNGs
> are buggy because they don't take this use case into account -- it
> would be far from the first time that numpy found itself going beyond
> the limits of older numerical tools that weren't designed to build the
> kind of large composable systems that numpy gets used for.
>
> MT19937's state space is large enough that you could explicitly encode
> a "tree seed" into it, even if you don't trust the laws of probability
> -- e.g., you start with a RandomState with id [], then its children
> have id [0], [1],