Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Pavlyk, Oleksandr
Please see responses inline.

From: NumPy-Discussion [mailto:numpy-discussion-boun...@scipy.org] On Behalf Of 
Todd
Sent: Wednesday, October 26, 2016 4:04 PM
To: Discussion of Numerical Python 
Subject: Re: [Numpy-discussion] Intel random number package

On Wed, Oct 26, 2016 at 4:30 PM, Pavlyk, Oleksandr 
<oleksandr.pav...@intel.com> wrote:

The module under review, similarly to randomstate package, provides alternative 
basic pseudo-random number generators (BRNGs), like MT2203, MCG31, MRG32K3A, 
Wichmann-Hill. The scope of support differs, with randomstate implementing some 
generators absent in MKL and vice versa.

Is there a reason that randomstate shouldn't implement those generators?


No, randomstate certainly can implement all the BRNGs implemented in MKL. It is 
at the developer's discretion.



Thinking about the possibility of providing the functionality of this module 
within the framework of randomstate, I find that randomstate implements 
samplers from statistical distributions as functions that take the state of the 
underlying BRNG, and produce a single variate, e.g.:

https://github.com/bashtage/ng-numpy-randomstate/blob/master/randomstate/distributions.c#L23-L26

This design stands in a way of efficient use of MKL, which generates a whole 
vector of variates at a time. This can be done faster than sampling a variate 
at a time by using vectorized instructions.  So I wrote mkl_distributions.cpp 
to provide functions that return a given size vector of sampled variates from 
each supported distribution.
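
To make the difference in calling convention concrete, here is a minimal
Python-level illustration; the code in question is C/C++, and this only mirrors
the two shapes of interface using the existing RandomState API:

import numpy as np

rs = np.random.RandomState(1234)

# one variate per call -- the shape of the sampler functions in
# randomstate's distributions.c
slow = np.array([rs.standard_normal() for _ in range(10000)])

# one call producing a whole vector -- the shape of the functions in
# mkl_distributions.cpp
fast = rs.standard_normal(10000)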

I don't know a huge amount about pseudo-random number generators, but this 
seems superficially to be something that would benefit random number generation 
as a whole independently of whether MKL is used.  Might it be possible to 
modify the numpy implementation to support this sort of vectorized approach?

I also think that adopting a vectorized mindset would benefit np.random. For 
example, Gaussians are currently generated using the Box-Muller algorithm, which 
produces two variates at a time, so one of them currently needs to be saved in 
the random state struct itself, along with an indicator that it should be used 
on the next iteration.  With a vectorized approach one could populate the vector 
two elements at a time with better memory locality, resulting in better performance.
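
As a rough Python sketch of that idea (not the actual C implementation in
either package), a preallocated buffer can be filled two elements at a time:

import numpy as np

def box_muller_fill(out, rs):
    # Sketch only: fill a preallocated 1-d array with standard normals,
    # generating both Box-Muller variates of each pair in one vectorized pass
    # instead of caching the second one in the RNG state.
    n = out.size
    m = (n + 1) // 2
    u1 = 1.0 - rs.random_sample(m)           # in (0, 1], avoids log(0)
    u2 = rs.random_sample(m)
    r = np.sqrt(-2.0 * np.log(u1))
    out[0::2] = (r * np.cos(2.0 * np.pi * u2))[:out[0::2].size]
    out[1::2] = (r * np.sin(2.0 * np.pi * u2))[:out[1::2].size]
    return out

z = box_muller_fill(np.empty(10001), np.random.RandomState(0))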

A vectorized approach has merits with or without the use of MKL.

Another point, already raised by Nathaniel, is that numpy's randomness should 
ideally provide a way to override the default algorithm for sampling from a 
particular distribution.  For example, a RandomState object that implements PCG 
may rely on the default acceptance-rejection algorithm for sampling from Gamma, 
while a RandomState object that provides an interface to MKL might want to call 
into MKL directly.
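
As a sketch only, the override point could look like a subclass that replaces
one distribution method; the class below and its dispatch are hypothetical, and
the placeholder simply falls back to the inherited sampler:

import numpy as np

class MKLBackedRandomState(np.random.RandomState):   # hypothetical name
    def standard_gamma(self, shape, size=None):
        # An MKL-backed object would dispatch to the library's vector gamma
        # sampler here; this placeholder just falls back to the default
        # acceptance-rejection sampler inherited from RandomState.
        return super(MKLBackedRandomState, self).standard_gamma(shape, size=size)

rs = MKLBackedRandomState(42)
x = rs.standard_gamma(2.0, size=5)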

The approach that pyfftw uses at least for scipy, which may also work here, is 
that you can monkey-patch the scipy.fftpack module at runtime, replacing it 
with pyfftw's drop-in replacement.  scipy then proceeds to use pyfftw instead 
of its built-in fftpack implementation.  Might such an approach work here?  
Users can either use this alternative randomstate replacement directly, or they 
can replace numpy's with it at runtime and numpy will then proceed to use the 
alternative.

I think the monkey-patching approach will work.
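
A minimal sketch of the idea, assuming a hypothetical drop-in package (called
mklrandom here) that exposes the same API as numpy.random:

import sys
import numpy as np
import mklrandom   # hypothetical drop-in package exposing the numpy.random API

np.random = mklrandom                     # covers np.random.* style usage
sys.modules['numpy.random'] = mklrandom   # covers "from numpy.random import ..."

x = np.random.standard_normal(10)         # now served by the replacement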

RandomState was written with a view to replacing numpy.random at some point in 
the future. It is standalone at the moment, from what I understand, only 
because it is still being worked on and extended.

One particularly important development is the ability to sample continuous 
distributions in floats, or to populate a given preallocated buffer with random 
samples. These features are missing from numpy.random_intel, and we thought of 
providing them.
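
For illustration, the kind of calling convention meant here; the out= form is
hypothetical and not something numpy.random offers today, so this sketch falls
back to generating doubles and downcasting:

import numpy as np

rs = np.random.RandomState(0)
buf = np.empty(10000, dtype=np.float32)   # preallocated single-precision buffer

# hypothetical, not available in numpy.random today: rs.standard_normal(out=buf)
buf[...] = rs.standard_normal(buf.size)   # current workaround: doubles, then downcast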

As I have said earlier, another missing feature is a C-API for randomness in 
numpy.


Oleksandr


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Todd
On Wed, Oct 26, 2016 at 4:30 PM, Pavlyk, Oleksandr <
oleksandr.pav...@intel.com> wrote:

>
> The module under review, similarly to randomstate package, provides
> alternative basic pseudo-random number generators (BRNGs), like MT2203,
> MCG31, MRG32K3A, Wichmann-Hill. The scope of support differ, with
> randomstate implementing some generators absent in MKL and vice-versa.
>
>
Is there a reason that randomstate shouldn't implement those generators?



> Thinking about the possibility of providing the functionality of this
> module within the framework of randomstate, I find that randomstate
> implements samplers from statistical distributions as functions that take
> the state of the underlying BRNG, and produce a single variate, e.g.:
>
> https://github.com/bashtage/ng-numpy-randomstate/blob/master/randomstate/distributions.c#L23-L26
>
> This design stands in a way of efficient use of MKL, which generates a
> whole vector of variates at a time. This can be done faster than sampling a
> variate at a time by using vectorized instructions.  So I wrote
> mkl_distributions.cpp to provide functions that return a given size vector
> of sampled variates from each supported distribution.
>

I don't know a huge amount about pseudo-random number generators, but this
seems superficially to be something that would benefit random number
generation as a whole independently of whether MKL is used.  Might it be
possible to modify the numpy implementation to support this sort of
vectorized approach?


Another point already raised by Nathaniel is that for numpy's randomness
> ideally should provide a way to override default algorithm for sampling
> from a particular distribution.  For example RandomState object that
> implements PCG may rely on default acceptance-rejection algorithm for
> sampling from Gamma, while the RandomState object that provides interface
> to MKL might want to call into MKL directly.
>

The approach that pyfftw uses at least for scipy, which may also work here,
is that you can monkey-patch the scipy.fftpack module at runtime, replacing
it with pyfftw's drop-in replacement.  scipy then proceeds to use pyfftw
instead of its built-in fftpack implementation.  Might such an approach
work here?  Users can either use this alternative randomstate replacement
directly, or they can replace numpy's with it at runtime and numpy will
then proceed to use the alternative.


Re: [Numpy-discussion] padding options for diff

2016-10-26 Thread Peter Creasey
> Date: Wed, 26 Oct 2016 16:18:05 -0400
> From: Matthew Harrigan 
>
> Would it be preferable to have to_begin='first' as an option under the
> existing kwarg to avoid overlapping?
>
>> if keep_left:
>>     if to_begin is None:
>>         to_begin = np.take(a, [0], axis=axis)
>>     else:
>>         raise ValueError('np.diff(a, keep_left=False, to_begin=None) '
>>                          'can be used with either keep_left or to_begin, but not both.')
>>
>> Generally I try to avoid optional keyword argument overlap, but in
>> this case it is probably justified.
>>

It works for me. I can't *think* of a case where you could have np.diff on a
string array and 'first' could be confused with an element, since you're not
allowed to diff strings in the present numpy anyway (unless wiser heads than me
know something!). Feel free to move the conversation to github btw.
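
For reference, the 1-d round trip already works with np.ediff1d, which has a
to_begin argument; the proposal would make the same thing convenient for
np.diff along an arbitrary axis:

import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
d = np.ediff1d(x, to_begin=x[0])              # keep the first element, then differences
np.testing.assert_allclose(np.cumsum(d), x)   # diff becomes the inverse of cumsum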

Peter


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Pavlyk, Oleksandr
Hi, 

Thanks a lot everybody for the feedback. 

The package can certainly be made a stand-alone drop-in replacement for 
np.random. There are many points raised and unraised in favor of this, 
and it is easy to accomplish.  I will create a stand-alone package on github, 
but would still appreciate some help in reviewing it 
and making it available at PyPI.

Interestingly, Nathaniel's link to a representative change, specifically 


https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L1724-L1833

points at unused code borrowed directly from mtrand/distributions.c:

https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/distributions.c#L262-L297

A more representative change would be the implementation of Student's 
t-distribution:

https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L232-L262
 

The module under review, similarly to randomstate package, provides alternative 
basic pseudo-random number generators (BRNGs), like MT2203, MCG31, MRG32K3A, 
Wichmann-Hill. The scope of support differs, with randomstate implementing some 
generators absent in MKL and vice versa.

Thinking about the possibility of providing the functionality of this module 
within the framework of randomstate, I find that randomstate implements 
samplers from statistical distributions as functions that take the state of the 
underlying BRNG, and produce a single variate, e.g.:

https://github.com/bashtage/ng-numpy-randomstate/blob/master/randomstate/distributions.c#L23-L26
 

This design stands in a way of efficient use of MKL, which generates a whole 
vector of variates at a time. This can be done faster than sampling a variate 
at a time by using vectorized instructions.  So I wrote mkl_distributions.cpp 
to provide functions that return a given size vector of sampled variates from 
each supported distribution.

mklrand.pyx was then written by modifying mtrand.pyx to work with such vector 
generators.   In particular, this allowed for efficient sampling from product 
distributions of Poisson distributions with different rate parameters, which is 
implemented in MKL:

https://software.intel.com/en-us/node/521894 

https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L1071
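
For context, np.random.poisson already accepts an array of rate parameters and
broadcasts over it; the gain from the MKL path is that the whole product
distribution is sampled in one vectorized library call rather than element by
element:

import numpy as np

rs = np.random.RandomState(0)
lam = np.array([0.5, 2.0, 10.0, 100.0])   # a different rate per element
draws = rs.poisson(lam=lam)               # one draw from each Poisson(lam[i])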
 


Another point, already raised by Nathaniel, is that numpy's randomness should 
ideally provide a way to override the default algorithm for sampling from a 
particular distribution.  For example, a RandomState object that implements PCG 
may rely on the default acceptance-rejection algorithm for sampling from Gamma, 
while a RandomState object that provides an interface to MKL might want to call 
into MKL directly.

While on this topic, I would also like to point out the need for a C-API 
interface to randomness, which is particularly felt when writing parallel 
algorithms, where Python's GIL and the use of Lock() in RandomState hurt 
scalability.

Oleksandr

-Original Message-
From: NumPy-Discussion [mailto:numpy-discussion-boun...@scipy.org] On Behalf Of 
Nathaniel Smith
Sent: Wednesday, October 26, 2016 2:25 PM
To: Discussion of Numerical Python 
Subject: Re: [Numpy-discussion] Intel random number package

On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor  
wrote:
> On 10/26/2016 06:00 PM, Julian Taylor wrote:
>>
>> On 10/26/2016 10:59 AM, Ralf Gommers wrote:
>>>
>>>
>>>
>>> On Wed, Oct 26, 2016 at 8:33 PM, Julian Taylor
>>> <jtaylor.deb...@googlemail.com> wrote:
>>>
>>> On 26.10.2016 06:34, Charles R Harris wrote:
>>> > Hi All,
>>> >
>>> > There is a proposed random number package PR now up on github:
>>> > https://github.com/numpy/numpy/pull/8209
>>> . It is from
>>> > oleksandr-pavlyk >> > and implements
>>> > the number random number package using MKL for increased speed.
>>> I think
>>> > we are definitely interested in the improved speed, but I'm 
>>> not sure
>>> > numpy is the best place to put the package. I'd welcome any 
>>> comments on
>>> > the PR itself, as well as any thoughts on the best way 
>>> organize or use
>>> > of this work. Maybe scikit-random
>>>
>>>
>>> Note that this thread is a continuation of 
>>> https://mail.scipy.org/pipermail/numpy-discussion/2016-July/075822.html
>>>
>>>
>>>
>>> I'm not a fan of putting code depending on a proprietary library
>>> into numpy.
>>> This should be a standalone package which may provide the same 
>>> interface
>>> as numpy.
>>>
>>>
>>> I don't really see a problem with that in principle. Numpy can use 
>>> Intel MKL (and Accelerate) as well if it's available. It needs some 
>>> thought put into the API though - a ``numpy.random_intel`` module is 
>>> certai

Re: [Numpy-discussion] Numpy integers to integer powers again again

2016-10-26 Thread josef . pktd
On Wed, Oct 26, 2016 at 3:57 PM, Charles R Harris  wrote:

>
>
> On Wed, Oct 26, 2016 at 1:39 PM,  wrote:
>
>>
>>
>> On Wed, Oct 26, 2016 at 3:23 PM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Oct 25, 2016 at 10:14 AM, Stephan Hoyer 
>>> wrote:
>>>
 I am also concerned about adding more special cases for NumPy scalars
 vs arrays. These cases are already confusing (e.g., making no distinction
 between 0d arrays and scalars) and poorly documented.

 On Mon, Oct 24, 2016 at 4:30 PM, Nathaniel Smith  wrote:

> On Mon, Oct 24, 2016 at 3:41 PM, Charles R Harris
>  wrote:
> > Hi All,
> >
> > I've been thinking about this some (a lot) more and have an alternate
> > proposal for the behavior of the `**` operator
> >
> > if both base and power are numpy/python scalar integers, convert to
> python
> > integers and call the `**` operator. That would solve both the
> precision and
> > compatibility problems and I think is the option of least surprise.
> For
> > those who need type preservation and modular arithmetic, the np.power
> > function remains, although the type conversions can be surprising
> as it
> > seems that the base and power should  play different roles in
> determining
> > the type, at least to me.
> > Array, 0-d or not, are treated differently from scalars and integers
> raised
> > to negative integer powers always raise an error.
> >
> > I think this solves most problems and would not be difficult to
> implement.
> >
> > Thoughts?
>
> My main concern about this is that it adds more special cases to numpy
> scalars, and a new behavioral deviation between 0d arrays and scalars,
> when ideally we should be trying to reduce the
> duplication/discrepancies between these. It's also inconsistent with
> how other operations on integer scalars work, e.g. regular addition
> overflows rather than promoting to Python int:
>
> In [8]: np.int64(2 ** 63 - 1) + 1
> /home/njs/.user-python3.5-64bit/bin/ipython:1: RuntimeWarning:
> overflow encountered in long_scalars
>   #!/home/njs/.user-python3.5-64bit/bin/python3.5
> Out[8]: -9223372036854775808
>
> So I'm inclined to try and keep it simple, like in your previous
> proposal... theoretically of course it would be nice to have the
> perfect solution here, but at this point it feels like we might be
> overthinking this trying to get that last 1% of improvement. The thing
> where 2 ** -1 returns 0 is just broken and bites people so we should
> definitely fix it, but beyond that I'm not sure it really matters
> *that* much what we do, and "special cases aren't special enough to
> break the rules" and all that.
>
>
>>> What I have been concerned about are the follow combinations that
>>> currently return floats
>>>
>>> num: , exp: , res: >> 'numpy.float32'>
>>> num: , exp: , res: >> 'numpy.float32'>
>>> num: , exp: , res: >> 'numpy.float32'>
>>> num: , exp: , res: >> 'numpy.float64'>
>>> num: , exp: , res: >> 'numpy.float64'>
>>> num: , exp: , res: >> 'numpy.float64'>
>>> num: , exp: , res: >> 'numpy.float64'>
>>> num: , exp: , res: >> 'numpy.float64'>
>>> num: , exp: , res: >> 'numpy.float64'>
>>> num: , exp: , res: >> 'numpy.float64'>
>>> num: , exp: , res: >> 'numpy.float64'>
>>> num: , exp: , res: >> 'numpy.float64'>
>>> num: , exp: , res: >> 'numpy.float64'>
>>> num: , exp: , res: >> 'numpy.float64'>
>>> num: , exp: , res: >> 'numpy.float64'>
>>> num: , exp: , res: >> 'numpy.float64'>
>>>
>>> The other combinations of signed and unsigned integers to signed powers
>>> currently raise ValueError due to the change to the power ufunc. The
>>> exceptions that aren't covered by uint64 + signed (which won't change) seem
>>> to occur when the exponent can be safely cast to the base type. I suspect
>>> that people have already come to depend on that, especially as python
>>> integers on 64 bit linux convert to int64. So in those cases we should
>>> perhaps raise a FutureWarning instead of an error.
>>>
>>
>>
>> >>> np.int64(2)**np.array(-1, np.int64)
>> 0.5
>> >>> np.__version__
>> '1.10.4'
>> >>> np.int64(2)**np.array([-1, 2], np.int64)
>> array([0, 4], dtype=int64)
>> >>> np.array(2, np.uint64)**np.array([-1, 2], np.int64)
>> array([0, 4], dtype=int64)
>> >>> np.array([2], np.uint64)**np.array([-1, 2], np.int64)
>> array([ 0.5,  4. ])
>> >>> np.array([2], np.uint64).squeeze()**np.array([-1, 2], np.int64)
>> array([0, 4], dtype=int64)
>>
>>
>> (IMO: If you have to break backwards compatibility, break forwards not
>> backwards.)
>>
>
> Current master is different. I'm not too worried in the array cases as the
> results for negative exponents were zero except when raising -1 to a power.
> Since that result is incorrect, raising an error falls on the fine line
> between bug fix and compatibility break. If the pre-releases cause too much trouble.

Re: [Numpy-discussion] padding options for diff

2016-10-26 Thread Matthew Harrigan
Would it be preferable to have to_begin='first' as an option under the
existing kwarg to avoid overlapping?

On Wed, Oct 26, 2016 at 3:35 PM, Peter Creasey <
p.e.creasey...@googlemail.com> wrote:

> > Date: Wed, 26 Oct 2016 09:05:41 -0400
> > From: Matthew Harrigan 
> >
> > np.cumsum(np.diff(x, to_begin=x.take([0], axis=axis), axis=axis),
> axis=axis)
> >
> > That's certainly not going to win any beauty contests.  The 1d case is
> > clean though:
> >
> > np.cumsum(np.diff(x, to_begin=x[0]))
> >
> > I'm not sure if this means the API should change, and if so how.  Higher
> > dimensional arrays seem to just have extra complexity.
> >
> >>
> >> I like the proposal, though I suspect that making it general has
> >> obscured that the most common use-case for padding is to make the
> >> inverse of np.cumsum (at least that's what I frequently need), and now
> >> in the multidimensional case you have the somewhat unwieldy:
> >>
> >> >>> np.diff(a, axis=axis, to_begin=np.take(a, 0, axis=axis))
> >>
> >> rather than
> >>
> >> >>> np.diff(a, axis=axis, keep_left=True)
> >>
> >> which of course could just be an option upon what you already have.
> >>
>
> So my suggestion was intended that you might want an additional
> keyword argument (keep_left=False) to make the inverse np.cumsum
> use-case easier, i.e. you would have something in your np.diff like:
>
> if keep_left:
>     if to_begin is None:
>         to_begin = np.take(a, [0], axis=axis)
>     else:
>         raise ValueError('np.diff(a, keep_left=False, to_begin=None) '
>                          'can be used with either keep_left or to_begin, but not both.')
>
> Generally I try to avoid optional keyword argument overlap, but in
> this case it is probably justified.
>
> Peter


Re: [Numpy-discussion] Numpy integers to integer powers again again

2016-10-26 Thread josef . pktd
On Wed, Oct 26, 2016 at 3:39 PM, Nathaniel Smith  wrote:

> On Wed, Oct 26, 2016 at 12:23 PM, Charles R Harris
>  wrote:
> [...]
> > What I have been concerned about are the follow combinations that
> currently
> > return floats
> >
> > num: , exp: , res:  > 'numpy.float32'>
> > num: , exp: , res:  > 'numpy.float32'>
> > num: , exp: , res:  > 'numpy.float32'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
>
> What's this referring to? For both arrays and scalars I get:
>
> In [8]: (np.array(2, dtype=np.int8) ** np.array(2, dtype=np.int8)).dtype
> Out[8]: dtype('int8')
>
> In [9]: (np.int8(2) ** np.int8(2)).dtype
> Out[9]: dtype('int8')
>


>>> (np.array([2], dtype=np.int8) ** np.array(-1, dtype=np.int8).squeeze()).dtype
dtype('int8')
>>> (np.array([2], dtype=np.int8)[0] ** np.array(-1, dtype=np.int8).squeeze()).dtype
dtype('float32')

>>> (np.int8(2)**np.int8(-1)).dtype
dtype('float32')
>>> (np.int8(2)**np.int8(2)).dtype
dtype('int8')

The last one looks like a value-dependent scalar dtype.

Josef


>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

2016-10-26 Thread Mathew S. Madhavacheril
On Wed, Oct 26, 2016 at 3:20 PM,  wrote:

>
>
> On Wed, Oct 26, 2016 at 3:11 PM, Mathew S. Madhavacheril <
> mathewsyr...@gmail.com> wrote:
>
>>
>>
>> On Wed, Oct 26, 2016 at 2:56 PM, Nathaniel Smith  wrote:
>>
>>> On Wed, Oct 26, 2016 at 11:13 AM, Stephan Hoyer 
>>> wrote:
>>> > On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril
>>> >  wrote:
>>> >>
>>> >> On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer 
>>> wrote:
>>> >>>
>>> >>> I wonder if the goals of this addition could be achieved by simply
>>> adding
>>> >>> an optional `cov` argument
>>> >>>
>>> >>> to np.corr, which would provide a pre-computed covariance.
>>> >>
>>> >>
>>> >> That's a fair suggestion which I'm happy to switch to. This
>>> eliminates the
>>> >> need for two new functions.
>>> >> I'll add an optional `cov = False` argument to numpy.corrcoef that
>>> returns
>>> >> a tuple (corr, cov) instead.
>>> >>
>>> >>>
>>> >>>
>>> >>> Either way, `covcorr` feels like a helper function that could exist
>>> in
>>> >>> user code rather than numpy proper.
>>> >>
>>> >>
>>> >> The user would have to re-implement the part that converts the
>>> covariance
>>> >> matrix to a correlation
>>> >> coefficient. I made this PR to avoid that code duplication.
>>> >
>>> >
>>> > With the API I was envisioning (or even your proposed API, for that
>>> matter),
>>> > this function would only be a few lines, e.g.,
>>> >
>>> > def covcorr(x):
>>> > cov = np.cov(x)
>>> > corr = np.corrcoef(x, cov=cov)
>>>
>>> IIUC, if you have a covariance matrix then you can compute the
>>> correlation matrix directly, without looking at 'x', so corrcoef(x,
>>> cov=cov) is a bit odd-looking. I think probably the API that makes the
>>> most sense is just to expose something like the covtocorr function
>>> (maybe it could have a less telegraphic name?)? And then, yeah, users
>>> can use that to build their own covcorr or whatever if they want it.
>>>
>>
>> Right, agreed, this is why I said `x` becomes redundant when `cov` is
>> specified
>> when calling `numpy.corrcoef`.  So we have two alternatives:
>>
>> 1) Have `np.corrcoef` accept a boolean optional argument `covmat = False`
>> that lets
>> one obtain a tuple containing the covariance and the correlation matrices
>> in the same call
>> 2) Modify my original PR so that `np.covtocorr` remains (with possibly a
>> better
>> name) but remove `np.covcorr` since this is easy for the user to add.
>>
>> My preference is option 2.
>>
>
> cov2corr is a useful function
> http://www.statsmodels.org/dev/generated/statsmodels.stats.moment_helpers.cov2corr.html
> I also wrote the inverse function corr2cov, but AFAIR use it only in some
> test cases.
>
>
> I don't think adding any of the options to corrcoef or covcor is useful
> since there is no computational advantage to it.
>

I'm not sure I agree with that statement. If a user wants to calculate both a
covariance and a correlation matrix, they currently have two options:
A) Call np.cov and np.corrcoef separately, which takes at least twice as long
as one call to np.cov. For the data sets I am used to, an np.cov call takes
5-10 seconds.
B) Call np.cov and then separately implement their own correlation matrix code,
which means the user isn't able to fully take advantage of code that is already
in numpy.

In any case, I've updated the PR:
https://github.com/numpy/numpy/pull/8211

Relative to my original PR, it:
a) removes the numpy.covcorr function, which the user can easily implement
b) makes numpy.cov2corr the function exposed in the API (previously called
numpy.covtocorr in the PR), which accepts a pre-calculated covariance matrix
c) has numpy.corrcoef call numpy.cov2corr
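
For reference, the covariance-to-correlation conversion itself is only a few
lines; a sketch of what a cov2corr-style helper computes (not necessarily the
exact implementation in the PR):

import numpy as np

def cov2corr(cov):
    # normalize a covariance matrix by the outer product of standard deviations
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    return np.clip(corr, -1.0, 1.0)   # guard against tiny round-off excursions

x = np.random.RandomState(0).standard_normal((3, 100))
np.testing.assert_allclose(cov2corr(np.cov(x)), np.corrcoef(x))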






Re: [Numpy-discussion] Numpy integers to integer powers again again

2016-10-26 Thread Charles R Harris
On Wed, Oct 26, 2016 at 1:39 PM,  wrote:

>
>
> On Wed, Oct 26, 2016 at 3:23 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Tue, Oct 25, 2016 at 10:14 AM, Stephan Hoyer  wrote:
>>
>>> I am also concerned about adding more special cases for NumPy scalars vs
>>> arrays. These cases are already confusing (e.g., making no distinction
>>> between 0d arrays and scalars) and poorly documented.
>>>
>>> On Mon, Oct 24, 2016 at 4:30 PM, Nathaniel Smith  wrote:
>>>
 On Mon, Oct 24, 2016 at 3:41 PM, Charles R Harris
  wrote:
 > Hi All,
 >
 > I've been thinking about this some (a lot) more and have an alternate
 > proposal for the behavior of the `**` operator
 >
 > if both base and power are numpy/python scalar integers, convert to
 python
 > integers and call the `**` operator. That would solve both the
 precision and
 > compatibility problems and I think is the option of least surprise.
 For
 > those who need type preservation and modular arithmetic, the np.power
 > function remains, although the type conversions can be surprising as
 it
 > seems that the base and power should  play different roles in
 determining
 > the type, at least to me.
 > Array, 0-d or not, are treated differently from scalars and integers
 raised
 > to negative integer powers always raise an error.
 >
 > I think this solves most problems and would not be difficult to
 implement.
 >
 > Thoughts?

 My main concern about this is that it adds more special cases to numpy
 scalars, and a new behavioral deviation between 0d arrays and scalars,
 when ideally we should be trying to reduce the
 duplication/discrepancies between these. It's also inconsistent with
 how other operations on integer scalars work, e.g. regular addition
 overflows rather than promoting to Python int:

 In [8]: np.int64(2 ** 63 - 1) + 1
 /home/njs/.user-python3.5-64bit/bin/ipython:1: RuntimeWarning:
 overflow encountered in long_scalars
   #!/home/njs/.user-python3.5-64bit/bin/python3.5
 Out[8]: -9223372036854775808

 So I'm inclined to try and keep it simple, like in your previous
 proposal... theoretically of course it would be nice to have the
 perfect solution here, but at this point it feels like we might be
 overthinking this trying to get that last 1% of improvement. The thing
 where 2 ** -1 returns 0 is just broken and bites people so we should
 definitely fix it, but beyond that I'm not sure it really matters
 *that* much what we do, and "special cases aren't special enough to
 break the rules" and all that.


>> What I have been concerned about are the follow combinations that
>> currently return floats
>>
>> num: , exp: , res: > 'numpy.float32'>
>> num: , exp: , res: > 'numpy.float32'>
>> num: , exp: , res: > 'numpy.float32'>
>> num: , exp: , res: > 'numpy.float64'>
>> num: , exp: , res: > 'numpy.float64'>
>> num: , exp: , res: > 'numpy.float64'>
>> num: , exp: , res: > 'numpy.float64'>
>> num: , exp: , res: > 'numpy.float64'>
>> num: , exp: , res: > 'numpy.float64'>
>> num: , exp: , res: > 'numpy.float64'>
>> num: , exp: , res: > 'numpy.float64'>
>> num: , exp: , res: > 'numpy.float64'>
>> num: , exp: , res: > 'numpy.float64'>
>> num: , exp: , res: > 'numpy.float64'>
>> num: , exp: , res: > 'numpy.float64'>
>> num: , exp: , res: > 'numpy.float64'>
>>
>> The other combinations of signed and unsigned integers to signed powers
>> currently raise ValueError due to the change to the power ufunc. The
>> exceptions that aren't covered by uint64 + signed (which won't change) seem
>> to occur when the exponent can be safely cast to the base type. I suspect
>> that people have already come to depend on that, especially as python
>> integers on 64 bit linux convert to int64. So in those cases we should
>> perhaps raise a FutureWarning instead of an error.
>>
>
>
> >>> np.int64(2)**np.array(-1, np.int64)
> 0.5
> >>> np.__version__
> '1.10.4'
> >>> np.int64(2)**np.array([-1, 2], np.int64)
> array([0, 4], dtype=int64)
> >>> np.array(2, np.uint64)**np.array([-1, 2], np.int64)
> array([0, 4], dtype=int64)
> >>> np.array([2], np.uint64)**np.array([-1, 2], np.int64)
> array([ 0.5,  4. ])
> >>> np.array([2], np.uint64).squeeze()**np.array([-1, 2], np.int64)
> array([0, 4], dtype=int64)
>
>
> (IMO: If you have to break backwards compatibility, break forwards not
> backwards.)
>

Current master is different. I'm not too worried in the array cases as the
results for negative exponents were zero except when raising -1 to a power.
Since that result is incorrect, raising an error falls on the fine line
between bug fix and compatibility break. If the pre-releases cause too much
trouble.

Chuck


Re: [Numpy-discussion] Numpy integers to integer powers again again

2016-10-26 Thread Charles R Harris
On Wed, Oct 26, 2016 at 1:39 PM, Nathaniel Smith  wrote:

> On Wed, Oct 26, 2016 at 12:23 PM, Charles R Harris
>  wrote:
> [...]
> > What I have been concerned about are the follow combinations that
> currently
> > return floats
> >
> > num: , exp: , res:  > 'numpy.float32'>
> > num: , exp: , res:  > 'numpy.float32'>
> > num: , exp: , res:  > 'numpy.float32'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
> > num: , exp: , res:  > 'numpy.float64'>
>
> What's this referring to? For both arrays and scalars I get:
>
> In [8]: (np.array(2, dtype=np.int8) ** np.array(2, dtype=np.int8)).dtype
> Out[8]: dtype('int8')
>
> In [9]: (np.int8(2) ** np.int8(2)).dtype
> Out[9]: dtype('int8')
>
>
You need a negative exponent to see the effect.
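
For example (the numpy 1.10-era behaviour josef shows elsewhere in the thread):

>>> import numpy as np
>>> (np.int8(2) ** np.int8(2)).dtype     # non-negative exponent: stays integer
dtype('int8')
>>> (np.int8(2) ** np.int8(-1)).dtype    # negative exponent: promoted to float
dtype('float32')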

Chuck


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Nathaniel Smith
On Wed, Oct 26, 2016 at 12:41 PM, Warren Weckesser
 wrote:
>
>
> On Wed, Oct 26, 2016 at 3:24 PM, Nathaniel Smith  wrote:
>>
>> On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor
>>  wrote:
>> > On 10/26/2016 06:00 PM, Julian Taylor wrote:
>> >>
>> >> On 10/26/2016 10:59 AM, Ralf Gommers wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Wed, Oct 26, 2016 at 8:33 PM, Julian Taylor
>> >>> <jtaylor.deb...@googlemail.com> wrote:
>> >>>
>> >>> On 26.10.2016 06:34, Charles R Harris wrote:
>> >>> > Hi All,
>> >>> >
>> >>> > There is a proposed random number package PR now up on github:
>> >>> > https://github.com/numpy/numpy/pull/8209
>> >>> . It is from
>> >>> > oleksandr-pavlyk > >>> > and implements
>> >>> > the number random number package using MKL for increased speed.
>> >>> I think
>> >>> > we are definitely interested in the improved speed, but I'm not
>> >>> sure
>> >>> > numpy is the best place to put the package. I'd welcome any
>> >>> comments on
>> >>> > the PR itself, as well as any thoughts on the best way organize
>> >>> or use
>> >>> > of this work. Maybe scikit-random
>> >>>
>> >>>
>> >>> Note that this thread is a continuation of
>> >>>
>> >>> https://mail.scipy.org/pipermail/numpy-discussion/2016-July/075822.html
>> >>>
>> >>>
>> >>>
>> >>> I'm not a fan of putting code depending on a proprietary library
>> >>> into numpy.
>> >>> This should be a standalone package which may provide the same
>> >>> interface
>> >>> as numpy.
>> >>>
>> >>>
>> >>> I don't really see a problem with that in principle. Numpy can use
>> >>> Intel
>> >>> MKL (and Accelerate) as well if it's available. It needs some thought
>> >>> put into the API though - a ``numpy.random_intel`` module is certainly
>> >>> not what we want.
>> >>>
>> >>
>> >> For me there is a difference between being able to optionally use a
>> >> proprietary library as an alternative to free software libraries if the
>> >> user wishes to do so and offering functionality that only works with
>> >> non-free software.
>> >> We are providing a form of advertisement for them by allowing it (hey
>> >> if
>> >> you buy this black box that you cannot modify or use freely you get
>> >> this
>> >> neat numpy feature!).
>> >>
>> >> I prefer for the full functionality of numpy to stay available with a
>> >> stack of community owned software, even if it may be less powerful that
>> >> way.
>> >
>> > But then if this is really just the same random numbers numpy already
>> > provides just faster, it is probably acceptable in principle. I haven't
>> > actually looked at the PR yet.
>>
>> The RNG stream is totally different, so yeah, it can't just be a
>> silent drop-in replacement like BLAS/LAPACK.
>>
>> The patch also adds ~10,000 lines of code; here's an example of what
>> some of it looks like:
>>
>>
>> https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L1724-L1833
>>
>> I don't see how we can realistically commit to maintaining this.
>>
>
>
> FYI:  numpy already maintains code exactly like that:
> https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/distributions.c#L262-L397
>
> Perhaps the point should be that the numpy devs won't want to maintain two
> nearly identical versions of that code.

Heh, good catch! Okay, if random_intel is a massive copy-paste of
random with modifications applied on top, then that's its own issue...
on the one hand, yeah, we definitely don't want to carry around
massive copy/paste code. OTOH, it suggests that it might be possible
to refactor the code so that common parts are shared, and this would
be a benefit to integrating random and random_intel more closely. (And
this benefit would then have to be weighed against all the other
considerations, like how much sharing there actually was,
maintainability of the remaining random_intel-specific bits, the
desire to keep numpy free-and-open, etc.) Hard to make that call just
from skimming a 10,000 line patch, though...

Oleksandr, or others at Intel: how much possibility do you think there
is for sharing code between random and random_intel?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Robert Kern
On Wed, Oct 26, 2016 at 12:41 PM, Warren Weckesser <
warren.weckes...@gmail.com> wrote:
>
> On Wed, Oct 26, 2016 at 3:24 PM, Nathaniel Smith  wrote:

>> The patch also adds ~10,000 lines of code; here's an example of what
>> some of it looks like:
>>
>>
https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L1724-L1833
>>
>> I don't see how we can realistically commit to maintaining this.
>
> FYI:  numpy already maintains code exactly like that:
https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/distributions.c#L262-L397
>
> Perhaps the point should be that the numpy devs won't want to maintain
two nearly identical versions of that code.

Indeed. That's how the algorithm was published. The /* sigh ... */ is my
own. ;-)

--
Robert Kern


Re: [Numpy-discussion] Numpy integers to integer powers again again

2016-10-26 Thread Charles R Harris
On Tue, Oct 25, 2016 at 10:14 AM, Stephan Hoyer  wrote:

> I am also concerned about adding more special cases for NumPy scalars vs
> arrays. These cases are already confusing (e.g., making no distinction
> between 0d arrays and scalars) and poorly documented.
>
> On Mon, Oct 24, 2016 at 4:30 PM, Nathaniel Smith  wrote:
>
>> On Mon, Oct 24, 2016 at 3:41 PM, Charles R Harris
>>  wrote:
>> > Hi All,
>> >
>> > I've been thinking about this some (a lot) more and have an alternate
>> > proposal for the behavior of the `**` operator
>> >
>> > if both base and power are numpy/python scalar integers, convert to
>> python
>> > integers and call the `**` operator. That would solve both the
>> precision and
>> > compatibility problems and I think is the option of least surprise. For
>> > those who need type preservation and modular arithmetic, the np.power
>> > function remains, although the type conversions can be surprising as it
>> > seems that the base and power should  play different roles in
>> determining
>> > the type, at least to me.
>> > Array, 0-d or not, are treated differently from scalars and integers
>> raised
>> > to negative integer powers always raise an error.
>> >
>> > I think this solves most problems and would not be difficult to
>> implement.
>> >
>> > Thoughts?
>>
>> My main concern about this is that it adds more special cases to numpy
>> scalars, and a new behavioral deviation between 0d arrays and scalars,
>> when ideally we should be trying to reduce the
>> duplication/discrepancies between these. It's also inconsistent with
>> how other operations on integer scalars work, e.g. regular addition
>> overflows rather than promoting to Python int:
>>
>> In [8]: np.int64(2 ** 63 - 1) + 1
>> /home/njs/.user-python3.5-64bit/bin/ipython:1: RuntimeWarning:
>> overflow encountered in long_scalars
>>   #!/home/njs/.user-python3.5-64bit/bin/python3.5
>> Out[8]: -9223372036854775808
>>
>> So I'm inclined to try and keep it simple, like in your previous
>> proposal... theoretically of course it would be nice to have the
>> perfect solution here, but at this point it feels like we might be
>> overthinking this trying to get that last 1% of improvement. The thing
>> where 2 ** -1 returns 0 is just broken and bites people so we should
>> definitely fix it, but beyond that I'm not sure it really matters
>> *that* much what we do, and "special cases aren't special enough to
>> break the rules" and all that.
>>
>>
What I have been concerned about are the follow combinations that currently
return floats

num: , exp: , res: 
num: , exp: , res: 
num: , exp: , res: 
num: , exp: , res: 
num: , exp: , res: 
num: , exp: , res: 
num: , exp: , res: 
num: , exp: , res: 
num: , exp: , res: 
num: , exp: , res: 
num: , exp: , res: 
num: , exp: , res: 
num: , exp: , res: 
num: , exp: , res: 
num: , exp: , res: 
num: , exp: , res: 

The other combinations of signed and unsigned integers to signed powers
currently raise ValueError due to the change to the power ufunc. The
exceptions that aren't covered by uint64 + signed (which won't change) seem
to occur when the exponent can be safely cast to the base type. I suspect
that people have already come to depend on that, especially as python
integers on 64 bit linux convert to int64. So in those cases we should
perhaps raise a FutureWarning instead of an error.

Chuck


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Warren Weckesser
On Wed, Oct 26, 2016 at 3:24 PM, Nathaniel Smith  wrote:

> On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor
>  wrote:
> > On 10/26/2016 06:00 PM, Julian Taylor wrote:
> >>
> >> On 10/26/2016 10:59 AM, Ralf Gommers wrote:
> >>>
> >>>
> >>>
> >>> On Wed, Oct 26, 2016 at 8:33 PM, Julian Taylor
> >>> <jtaylor.deb...@googlemail.com> wrote:
> >>>
> >>> On 26.10.2016 06:34, Charles R Harris wrote:
> >>> > Hi All,
> >>> >
> >>> > There is a proposed random number package PR now up on github:
> >>> > https://github.com/numpy/numpy/pull/8209
> >>> . It is from
> >>> > oleksandr-pavlyk  >>> > and implements
> >>> > the number random number package using MKL for increased speed.
> >>> I think
> >>> > we are definitely interested in the improved speed, but I'm not
> >>> sure
> >>> > numpy is the best place to put the package. I'd welcome any
> >>> comments on
> >>> > the PR itself, as well as any thoughts on the best way organize
> >>> or use
> >>> > of this work. Maybe scikit-random
> >>>
> >>>
> >>> Note that this thread is a continuation of
> >>> https://mail.scipy.org/pipermail/numpy-discussion/2016-July/075822.html
> >>>
> >>>
> >>>
> >>> I'm not a fan of putting code depending on a proprietary library
> >>> into numpy.
> >>> This should be a standalone package which may provide the same
> >>> interface
> >>> as numpy.
> >>>
> >>>
> >>> I don't really see a problem with that in principle. Numpy can use
> Intel
> >>> MKL (and Accelerate) as well if it's available. It needs some thought
> >>> put into the API though - a ``numpy.random_intel`` module is certainly
> >>> not what we want.
> >>>
> >>
> >> For me there is a difference between being able to optionally use a
> >> proprietary library as an alternative to free software libraries if the
> >> user wishes to do so and offering functionality that only works with
> >> non-free software.
> >> We are providing a form of advertisement for them by allowing it (hey if
> >> you buy this black box that you cannot modify or use freely you get this
> >> neat numpy feature!).
> >>
> >> I prefer for the full functionality of numpy to stay available with a
> >> stack of community owned software, even if it may be less powerful that
> >> way.
> >
> > But then if this is really just the same random numbers numpy already
> > provides just faster, it is probably acceptable in principle. I haven't
> > actually looked at the PR yet.
>
> The RNG stream is totally different, so yeah, it can't just be a
> silent drop-in replacement like BLAS/LAPACK.
>
> The patch also adds ~10,000 lines of code; here's an example of what
> some of it looks like:
>
> https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L1724-L1833
>
> I don't see how we can realistically commit to maintaining this.
>
>

FYI:  numpy already maintains code exactly like that:
https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/distributions.c#L262-L397

Perhaps the point should be that the numpy devs won't want to maintain two
nearly identical versions of that code.

Warren




> I'm also not really seeing how shipping it as part of numpy provides
> extra benefits to maintainers or users? AFAICT right now it's
> basically structured as a standalone library that's been dropped into
> the numpy source tree, and it would be just as easy to ship separately
> (or am I wrong?). And since the public API is that all the
> functionality comes from importing this specific new module
> ('numpy.random_intel'), it'd be a one-line change for users to import
> from a non-numpy namespace, like 'mkl.random' or whatever. If it were
> more integrated with the rest of numpy then the trade-offs would be
> more complicated, but in its present form this seems like an easy
> call.
>
> The other question is whether it could/should change to *become* more
> integrated... that's more tricky. There's been some work towards
> supporting swappable backends inside np.random; but the focus has
> mostly been on allowing new core generators, though, and this code
> seems to want to take over the whole thing (core generator +
> distributions), so even once the swappable backends stuff is working
> I'm not sure it would be relevant here. The one case I can think of
> that does seem promising is that if we get an API for users to say "I
> don't care about stream compatibility, just give me un-reproducible
> variates as fast as you can", then it might make sense for that to
> silently use MKL if available -- this would be pretty analogous to the
> use of MKL in np.linalg. But we don't have that API yet, I'm not sure
> how the MKL fallback could be maintainably implemented given that it
> would require somehow swapping the entire RandomState implementation,
> 

Re: [Numpy-discussion] Numpy integers to integer powers again again

2016-10-26 Thread Nathaniel Smith
On Wed, Oct 26, 2016 at 12:23 PM, Charles R Harris
 wrote:
[...]
> What I have been concerned about are the follow combinations that currently
> return floats
>
> num: , exp: , res:  'numpy.float32'>
> num: , exp: , res:  'numpy.float32'>
> num: , exp: , res:  'numpy.float32'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>

What's this referring to? For both arrays and scalars I get:

In [8]: (np.array(2, dtype=np.int8) ** np.array(2, dtype=np.int8)).dtype
Out[8]: dtype('int8')

In [9]: (np.int8(2) ** np.int8(2)).dtype
Out[9]: dtype('int8')

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] Numpy integers to integer powers again again

2016-10-26 Thread josef . pktd
On Wed, Oct 26, 2016 at 3:23 PM, Charles R Harris  wrote:

>
>
> On Tue, Oct 25, 2016 at 10:14 AM, Stephan Hoyer  wrote:
>
>> I am also concerned about adding more special cases for NumPy scalars vs
>> arrays. These cases are already confusing (e.g., making no distinction
>> between 0d arrays and scalars) and poorly documented.
>>
>> On Mon, Oct 24, 2016 at 4:30 PM, Nathaniel Smith  wrote:
>>
>>> On Mon, Oct 24, 2016 at 3:41 PM, Charles R Harris
>>>  wrote:
>>> > Hi All,
>>> >
>>> > I've been thinking about this some (a lot) more and have an alternate
>>> > proposal for the behavior of the `**` operator
>>> >
>>> > if both base and power are numpy/python scalar integers, convert to
>>> python
>>> > integers and call the `**` operator. That would solve both the
>>> precision and
>>> > compatibility problems and I think is the option of least surprise. For
>>> > those who need type preservation and modular arithmetic, the np.power
>>> > function remains, although the type conversions can be surprising as
>>> it
>>> > seems that the base and power should  play different roles in
>>> determining
>>> > the type, at least to me.
>>> > Array, 0-d or not, are treated differently from scalars and integers
>>> raised
>>> > to negative integer powers always raise an error.
>>> >
>>> > I think this solves most problems and would not be difficult to
>>> implement.
>>> >
>>> > Thoughts?
>>>
>>> My main concern about this is that it adds more special cases to numpy
>>> scalars, and a new behavioral deviation between 0d arrays and scalars,
>>> when ideally we should be trying to reduce the
>>> duplication/discrepancies between these. It's also inconsistent with
>>> how other operations on integer scalars work, e.g. regular addition
>>> overflows rather than promoting to Python int:
>>>
>>> In [8]: np.int64(2 ** 63 - 1) + 1
>>> /home/njs/.user-python3.5-64bit/bin/ipython:1: RuntimeWarning:
>>> overflow encountered in long_scalars
>>>   #!/home/njs/.user-python3.5-64bit/bin/python3.5
>>> Out[8]: -9223372036854775808
>>>
>>> So I'm inclined to try and keep it simple, like in your previous
>>> proposal... theoretically of course it would be nice to have the
>>> perfect solution here, but at this point it feels like we might be
>>> overthinking this trying to get that last 1% of improvement. The thing
>>> where 2 ** -1 returns 0 is just broken and bites people so we should
>>> definitely fix it, but beyond that I'm not sure it really matters
>>> *that* much what we do, and "special cases aren't special enough to
>>> break the rules" and all that.
>>>
>>>
> What I have been concerned about are the follow combinations that
> currently return floats
>
> num: , exp: , res:  'numpy.float32'>
> num: , exp: , res:  'numpy.float32'>
> num: , exp: , res:  'numpy.float32'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
> num: , exp: , res:  'numpy.float64'>
>
> The other combinations of signed and unsigned integers to signed powers
> currently raise ValueError due to the change to the power ufunc. The
> exceptions that aren't covered by uint64 + signed (which won't change) seem
> to occur when the exponent can be safely cast to the base type. I suspect
> that people have already come to depend on that, especially as python
> integers on 64 bit linux convert to int64. So in those cases we should
> perhaps raise a FutureWarning instead of an error.
>


>>> np.int64(2)**np.array(-1, np.int64)
0.5
>>> np.__version__
'1.10.4'
>>> np.int64(2)**np.array([-1, 2], np.int64)
array([0, 4], dtype=int64)
>>> np.array(2, np.uint64)**np.array([-1, 2], np.int64)
array([0, 4], dtype=int64)
>>> np.array([2], np.uint64)**np.array([-1, 2], np.int64)
array([ 0.5,  4. ])
>>> np.array([2], np.uint64).squeeze()**np.array([-1, 2], np.int64)
array([0, 4], dtype=int64)


(IMO: If you have to break backwards compatibility, break forwards not
backwards.)


Josef
http://www.stanlaurelandoliverhardy.com/nicemess.htm



>
> Chuck
>


Re: [Numpy-discussion] padding options for diff

2016-10-26 Thread Peter Creasey
> Date: Wed, 26 Oct 2016 09:05:41 -0400
> From: Matthew Harrigan 
>
> np.cumsum(np.diff(x, to_begin=x.take([0], axis=axis), axis=axis), axis=axis)
>
> That's certainly not going to win any beauty contests.  The 1d case is
> clean though:
>
> np.cumsum(np.diff(x, to_begin=x[0]))
>
> I'm not sure if this means the API should change, and if so how.  Higher
> dimensional arrays seem to just have extra complexity.
>
>>
>> I like the proposal, though I suspect that making it general has
>> obscured that the most common use-case for padding is to make the
>> inverse of np.cumsum (at least that's what I frequently need), and now
>> in the multidimensional case you have the somewhat unwieldy:
>>
>> >>> np.diff(a, axis=axis, to_begin=np.take(a, 0, axis=axis))
>>
>> rather than
>>
>> >>> np.diff(a, axis=axis, keep_left=True)
>>
>> which of course could just be an option upon what you already have.
>>

So my suggestion was intended that you might want an additional
keyword argument (keep_left=False) to make the inverse np.cumsum
use-case easier, i.e. you would have something in your np.diff like:

if keep_left:
    if to_begin is None:
        to_begin = np.take(a, [0], axis=axis)
    else:
        raise ValueError('np.diff(a, keep_left=False, to_begin=None) '
                         'can be used with either keep_left or to_begin, but not both.')

Generally I try to avoid optional keyword argument overlap, but in
this case it is probably justified.

Peter


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Nathaniel Smith
On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor
 wrote:
> On 10/26/2016 06:00 PM, Julian Taylor wrote:
>>
>> On 10/26/2016 10:59 AM, Ralf Gommers wrote:
>>>
>>>
>>>
>>> On Wed, Oct 26, 2016 at 8:33 PM, Julian Taylor
>>> <jtaylor.deb...@googlemail.com> wrote:
>>>
>>> On 26.10.2016 06:34, Charles R Harris wrote:
>>> > Hi All,
>>> >
>>> > There is a proposed random number package PR now up on github:
>>> > https://github.com/numpy/numpy/pull/8209
>>> . It is from
>>> > oleksandr-pavlyk >> > and implements
>>> > the number random number package using MKL for increased speed.
>>> I think
>>> > we are definitely interested in the improved speed, but I'm not
>>> sure
>>> > numpy is the best place to put the package. I'd welcome any
>>> comments on
>>> > the PR itself, as well as any thoughts on the best way organize
>>> or use
>>> > of this work. Maybe scikit-random
>>>
>>>
>>> Note that this thread is a continuation of
>>> https://mail.scipy.org/pipermail/numpy-discussion/2016-July/075822.html
>>>
>>>
>>>
>>> I'm not a fan of putting code depending on a proprietary library
>>> into numpy.
>>> This should be a standalone package which may provide the same
>>> interface
>>> as numpy.
>>>
>>>
>>> I don't really see a problem with that in principle. Numpy can use Intel
>>> MKL (and Accelerate) as well if it's available. It needs some thought
>>> put into the API though - a ``numpy.random_intel`` module is certainly
>>> not what we want.
>>>
>>
>> For me there is a difference between being able to optionally use a
>> proprietary library as an alternative to free software libraries if the
>> user wishes to do so and offering functionality that only works with
>> non-free software.
>> We are providing a form of advertisement for them by allowing it (hey if
>> you buy this black box that you cannot modify or use freely you get this
>> neat numpy feature!).
>>
>> I prefer for the full functionality of numpy to stay available with a
>> stack of community owned software, even if it may be less powerful that
>> way.
>
> But then if this is really just the same random numbers numpy already
> provides just faster, it is probably acceptable in principle. I haven't
> actually looked at the PR yet.

The RNG stream is totally different, so yeah, it can't just be a
silent drop-in replacement like BLAS/LAPACK.

The patch also adds ~10,000 lines of code; here's an example of what
some of it looks like:


https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L1724-L1833

I don't see how we can realistically commit to maintaining this.

I'm also not really seeing how shipping it as part of numpy provides
extra benefits to maintainers or users? AFAICT right now it's
basically structured as a standalone library that's been dropped into
the numpy source tree, and it would be just as easy to ship separately
(or am I wrong?). And since the public API is that all the
functionality comes from importing this specific new module
('numpy.random_intel'), it'd be a one-line change for users to import
from a non-numpy namespace, like 'mkl.random' or whatever. If it were
more integrated with the rest of numpy then the trade-offs would be
more complicated, but in its present form this seems like an easy
call.

The other question is whether it could/should change to *become* more
integrated... that's more tricky. There's been some work towards
supporting swappable backends inside np.random, but the focus has
mostly been on allowing new core generators, and this code
seems to want to take over the whole thing (core generator +
distributions), so even once the swappable backends stuff is working
I'm not sure it would be relevant here. The one case I can think of
that does seem promising is that if we get an API for users to say "I
don't care about stream compatibility, just give me un-reproducible
variates as fast as you can", then it might make sense for that to
silently use MKL if available -- this would be pretty analogous to the
use of MKL in np.linalg. But we don't have that API yet, I'm not sure
how the MKL fallback could be maintainably implemented given that it
would require somehow swapping the entire RandomState implementation,
and it's entirely possible that once we figure out solutions to those
then it'd still make sense for the actual MKL wrappers to live in a
third-party library that numpy imports.
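
As a purely hypothetical sketch of what such an opt-in might look like
(none of these names exist in numpy; "mkl_random" here just stands in for
whatever third-party wrapper ends up providing the MKL-backed generator):

import numpy as np

def fastest_random_state(seed=None):
    # Hypothetical factory: the caller gives up cross-installation stream
    # reproducibility in exchange for the fastest backend available.
    try:
        from mkl_random import RandomState  # assumed third-party wrapper
        return RandomState(seed)
    except ImportError:
        return np.random.RandomState(seed)

rs = fastest_random_state(1234)
x = rs.standard_normal(10)  # values may differ depending on the backend used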

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

2016-10-26 Thread josef . pktd
On Wed, Oct 26, 2016 at 3:11 PM, Mathew S. Madhavacheril <
mathewsyr...@gmail.com> wrote:

>
>
> On Wed, Oct 26, 2016 at 2:56 PM, Nathaniel Smith  wrote:
>
>> On Wed, Oct 26, 2016 at 11:13 AM, Stephan Hoyer  wrote:
>> > On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril
>> >  wrote:
>> >>
>> >> On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer 
>> wrote:
>> >>>
>> >>> I wonder if the goals of this addition could be achieved by simply
>> adding
>> >>> an optional `cov` argument
>> >>>
>> >>> to np.corr, which would provide a pre-computed covariance.
>> >>
>> >>
>> >> That's a fair suggestion which I'm happy to switch to. This eliminates
>> the
>> >> need for two new functions.
>> >> I'll add an optional `cov = False` argument to numpy.corrcoef that
>> returns
>> >> a tuple (corr, cov) instead.
>> >>
>> >>>
>> >>>
>> >>> Either way, `covcorr` feels like a helper function that could exist in
>> >>> user code rather than numpy proper.
>> >>
>> >>
>> >> The user would have to re-implement the part that converts the
>> covariance
>> >> matrix to a correlation
>> >> coefficient. I made this PR to avoid that code duplication.
>> >
>> >
>> > With the API I was envisioning (or even your proposed API, for that
>> matter),
>> > this function would only be a few lines, e.g.,
>> >
>> > def covcorr(x):
>> >     cov = np.cov(x)
>> >     corr = np.corrcoef(x, cov=cov)
>>
>> IIUC, if you have a covariance matrix then you can compute the
>> correlation matrix directly, without looking at 'x', so corrcoef(x,
>> cov=cov) is a bit odd-looking. I think probably the API that makes the
>> most sense is just to expose something like the covtocorr function
>> (maybe it could have a less telegraphic name?)? And then, yeah, users
>> can use that to build their own covcorr or whatever if they want it.
>>
>
> Right, agreed, this is why I said `x` becomes redundant when `cov` is
> specified
> when calling `numpy.corrcoef`.  So we have two alternatives:
>
> 1) Have `np.corrcoef` accept a boolean optional argument `covmat = False`
> that lets
> one obtain a tuple containing the covariance and the correlation matrices
> in the same call
> 2) Modify my original PR so that `np.covtocorr` remains (with possibly a
> better
> name) but remove `np.covcorr` since this is easy for the user to add.
>
> My preference is option 2.
>

cov2corr is a useful function
http://www.statsmodels.org/dev/generated/statsmodels.stats.moment_helpers.cov2corr.html
I also wrote the inverse function corr2cov, but AFAIR I use it only in some
test cases.
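
For reference, a short example of that statsmodels helper (assuming
statsmodels is installed); it rescales a covariance matrix by the outer
product of its standard deviations:

import numpy as np
from statsmodels.stats.moment_helpers import cov2corr

x = np.random.randn(3, 100)   # three variables, 100 observations
cov = np.cov(x)               # 3x3 covariance matrix
corr = cov2corr(cov)          # correlation matrix, ones on the diagonal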


I don't think adding any of the options to corrcoef or covcorr is useful
since there is no computational advantage to it.
What I'm missing are functions that return the intermediate results, e.g.
var and mean or cov and mean.

(For statsmodels I decided to return mean and cov or mean and var in the
related functions. Some R packages return the mean as an option.)

Josef





Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

2016-10-26 Thread Mathew S. Madhavacheril
On Wed, Oct 26, 2016 at 2:56 PM, Nathaniel Smith  wrote:

> On Wed, Oct 26, 2016 at 11:13 AM, Stephan Hoyer  wrote:
> > On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril
> >  wrote:
> >>
> >> On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer 
> wrote:
> >>>
> >>> I wonder if the goals of this addition could be achieved by simply
> adding
> >>> an optional `cov` argument
> >>>
> >>> to np.corr, which would provide a pre-computed covariance.
> >>
> >>
> >> That's a fair suggestion which I'm happy to switch to. This eliminates
> the
> >> need for two new functions.
> >> I'll add an optional `cov = False` argument to numpy.corrcoef that
> returns
> >> a tuple (corr, cov) instead.
> >>
> >>>
> >>>
> >>> Either way, `covcorr` feels like a helper function that could exist in
> >>> user code rather than numpy proper.
> >>
> >>
> >> The user would have to re-implement the part that converts the
> covariance
> >> matrix to a correlation
> >> coefficient. I made this PR to avoid that code duplication.
> >
> >
> > With the API I was envisioning (or even your proposed API, for that
> matter),
> > this function would only be a few lines, e.g.,
> >
> > def covcorr(x):
> >     cov = np.cov(x)
> >     corr = np.corrcoef(x, cov=cov)
>
> IIUC, if you have a covariance matrix then you can compute the
> correlation matrix directly, without looking at 'x', so corrcoef(x,
> cov=cov) is a bit odd-looking. I think probably the API that makes the
> most sense is just to expose something like the covtocorr function
> (maybe it could have a less telegraphic name?)? And then, yeah, users
> can use that to build their own covcorr or whatever if they want it.
>

Right, agreed, this is why I said `x` becomes redundant when `cov` is
specified
when calling `numpy.corrcoef`.  So we have two alternatives:

1) Have `np.corrcoef` accept a boolean optional argument `covmat = False`
that lets
one obtain a tuple containing the covariance and the correlation matrices
in the same call
2) Modify my original PR so that `np.covtocorr` remains (with possibly a
better
name) but remove `np.covcorr` since this is easy for the user to add.

My preference is option 2.


Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

2016-10-26 Thread Nathaniel Smith
On Wed, Oct 26, 2016 at 11:13 AM, Stephan Hoyer  wrote:
> On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril
>  wrote:
>>
>> On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer  wrote:
>>>
>>> I wonder if the goals of this addition could be achieved by simply adding
>>> an optional `cov` argument
>>>
>>> to np.corr, which would provide a pre-computed covariance.
>>
>>
>> That's a fair suggestion which I'm happy to switch to. This eliminates the
>> need for two new functions.
>> I'll add an optional `cov = False` argument to numpy.corrcoef that returns
>> a tuple (corr, cov) instead.
>>
>>>
>>>
>>> Either way, `covcorr` feels like a helper function that could exist in
>>> user code rather than numpy proper.
>>
>>
>> The user would have to re-implement the part that converts the covariance
>> matrix to a correlation
>> coefficient. I made this PR to avoid that code duplication.
>
>
> With the API I was envisioning (or even your proposed API, for that matter),
> this function would only be a few lines, e.g.,
>
> def covcorr(x):
>     cov = np.cov(x)
>     corr = np.corrcoef(x, cov=cov)

IIUC, if you have a covariance matrix then you can compute the
correlation matrix directly, without looking at 'x', so corrcoef(x,
cov=cov) is a bit odd-looking. I think probably the API that makes the
most sense is just to expose something like the covtocorr function
(maybe it could have a less telegraphic name?)? And then, yeah, users
can use that to build their own covcorr or whatever if they want it.
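
A minimal sketch of that conversion (illustrative only, not the PR's
implementation): the correlation matrix is just the covariance divided
elementwise by the outer product of the standard deviations taken from its
diagonal, so the original data never enters into it:

import numpy as np

def cov_to_corr(cov):
    # cov: (n, n) covariance matrix
    std = np.sqrt(np.diag(cov))       # per-variable standard deviations
    corr = cov / np.outer(std, std)   # rescale rows and columns
    return np.clip(corr, -1.0, 1.0)   # guard against tiny rounding excursions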

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

2016-10-26 Thread Mathew S. Madhavacheril
On Wed, Oct 26, 2016 at 2:13 PM, Stephan Hoyer  wrote:

> On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril <
> mathewsyr...@gmail.com> wrote:
>
>> On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer  wrote:
>>
>>> I wonder if the goals of this addition could be achieved by simply
>>> adding an optional `cov` argument
>>>
>>> to np.corr, which would provide a pre-computed covariance.
>>>
>>
>> That's a fair suggestion which I'm happy to switch to. This eliminates
>> the need for two new functions.
>> I'll add an optional `cov = False` argument to numpy.corrcoef that
>> returns a tuple (corr, cov) instead.
>>
>>
>>>
>>> Either way, `covcorr` feels like a helper function that could exist in
>>> user code rather than numpy proper.
>>>
>>
>> The user would have to re-implement the part that converts the covariance
>> matrix to a correlation
>> coefficient. I made this PR to avoid that code duplication.
>>
>
> With the API I was envisioning (or even your proposed API, for that
> matter), this function would only be a few lines, e.g.,
>
> def covcorr(x):
>     cov = np.cov(x)
>     corr = np.corrcoef(x, cov=cov)
>     return (cov, corr)
>
> Generally, functions this short should be provided as recipes (if at all)
> rather than be added to numpy proper, unless the need for them is extremely
> common.
>

Ah, I see what you were suggesting now. I agree that a function like
covcorr need not be provided
by numpy itself, but it would be tremendously useful if a pre-computed
covariance could
be provided to np.corrcoef. I can update this PR to just add `cov = None`
to numpy.corrcoef and
do an `if cov is not None` before calculating the covariance. Note however
that in the case
that `cov` is specified for np.corrcoef, the non-optional `x` argument is
redundant.
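
Roughly, and purely as an illustration of that small change (not the PR
itself), the branching would look like the helper below, which also shows
why `x` becomes dead weight once `cov` is supplied:

import numpy as np

def corrcoef_with_optional_cov(x=None, cov=None):
    # Illustrative stand-in; the PR would fold this branch into np.corrcoef.
    if cov is None:
        cov = np.cov(x)               # x is only consulted on this branch
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)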





Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

2016-10-26 Thread Stephan Hoyer
On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril <
mathewsyr...@gmail.com> wrote:

> On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer  wrote:
>
>> I wonder if the goals of this addition could be achieved by simply adding
>> an optional `cov` argument
>>
>> to np.corr, which would provide a pre-computed covariance.
>>
>
> That's a fair suggestion which I'm happy to switch to. This eliminates the
> need for two new functions.
> I'll add an optional `cov = False` argument to numpy.corrcoef that returns
> a tuple (corr, cov) instead.
>
>
>>
>> Either way, `covcorr` feels like a helper function that could exist in
>> user code rather than numpy proper.
>>
>
> The user would have to re-implement the part that converts the covariance
> matrix to a correlation
> coefficient. I made this PR to avoid that code duplication.
>

With the API I was envisioning (or even your proposed API, for that
matter), this function would only be a few lines, e.g.,

def covcorr(x):
    cov = np.cov(x)
    corr = np.corrcoef(x, cov=cov)
    return (cov, corr)

Generally, functions this short should be provided as recipes (if at all)
rather than be added to numpy proper, unless the need for them is extremely
common.


Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

2016-10-26 Thread Mathew S. Madhavacheril
On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer  wrote:

> I wonder if the goals of this addition could be achieved by simply adding
> an optional `cov` argument
>
> to np.corr, which would provide a pre-computed covariance.
>

That's a fair suggestion which I'm happy to switch to. This eliminates the
need for two new functions.
I'll add an optional `cov = False` argument to numpy.corrcoef that returns
a tuple (corr, cov) instead.


>
> Either way, `covcorr` feels like a helper function that could exist in
> user code rather than numpy proper.
>

The user would have to re-implement the part that converts the covariance
matrix to a correlation
coefficient. I made this PR to avoid that code duplication.

Mathew


>
> On Wed, Oct 26, 2016 at 10:27 AM, Mathew S. Madhavacheril <
> mathewsyr...@gmail.com> wrote:
>
>> Hi all,
>>
>> I posted a pull request:
>> https://github.com/numpy/numpy/pull/8211
>>
>> which adds a function `numpy.covcorr` that calculates both
>> the covariance matrix and correlation coefficient with a single
>> call to `numpy.cov` (which is often an expensive call for large
>> data-sets). A function `numpy.covtocorr` has also been added
>> that converts a covariance matrix to a correlation coefficient,
>> and `numpy.corrcoef` has been modified to call this. The
>> motivation here is that one often needs the covariance for
>> subsequent analysis and the correlation coefficient for
>> visualization, so instead of forcing the user to write their own
>> code to convert one to the other, we want to allow both to
>> be obtained from `numpy` as efficiently as possible.
>>
>> Best,
>> Mathew
>>


Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

2016-10-26 Thread Stephan Hoyer
I wonder if the goals of this addition could be achieved by simply adding
an optional `cov` argument to np.corr, which would provide a pre-computed
covariance.

Either way, `covcorr` feels like a helper function that could exist in user
code rather than numpy proper.

On Wed, Oct 26, 2016 at 10:27 AM, Mathew S. Madhavacheril <
mathewsyr...@gmail.com> wrote:

> Hi all,
>
> I posted a pull request:
> https://github.com/numpy/numpy/pull/8211
>
> which adds a function `numpy.covcorr` that calculates both
> the covariance matrix and correlation coefficient with a single
> call to `numpy.cov` (which is often an expensive call for large
> data-sets). A function `numpy.covtocorr` has also been added
> that converts a covariance matrix to a correlation coefficient,
> and `numpy.corrcoef` has been modified to call this. The
> motivation here is that one often needs the covariance for
> subsequent analysis and the correlation coefficient for
> visualization, so instead of forcing the user to write their own
> code to convert one to the other, we want to allow both to
> be obtained from `numpy` as efficiently as possible.
>
> Best,
> Mathew
>
>


[Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

2016-10-26 Thread Mathew S. Madhavacheril
Hi all,

I posted a pull request:
https://github.com/numpy/numpy/pull/8211

which adds a function `numpy.covcorr` that calculates both
the covariance matrix and correlation coefficient with a single
call to `numpy.cov` (which is often an expensive call for large
data-sets). A function `numpy.covtocorr` has also been added
that converts a covariance matrix to a correlation coefficient,
and `numpy.corrcoef` has been modified to call this. The
motivation here is that one often needs the covariance for
subsequent analysis and the correlation coefficient for
visualization, so instead of forcing the user to write their own
code to convert one to the other, we want to allow both to
be obtained from `numpy` as efficiently as possible.

Best,
Mathew


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Robert Kern
On Wed, Oct 26, 2016 at 9:36 AM, Sebastian Berg wrote:
>
> On Mi, 2016-10-26 at 09:29 -0700, Robert Kern wrote:
> > On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor <jtaylor.deb...@googlemail.com> wrote:
> > >
> > > On 10/26/2016 06:00 PM, Julian Taylor wrote:
> >
> > >> I prefer for the full functionality of numpy to stay available
> > with a
> > >> stack of community owned software, even if it may be less powerful
> > that
> > >> way.
> > >
> > > But then if this is really just the same random numbers numpy
> > already provides just faster, it is probably acceptable in principle.
> > I haven't actually looked at the PR yet.
> >
> > I think the stream is different in some places, at least. And it's
> > not a silent backend drop-in like np.linalg being built against an
> > optimized BLAS, just a separate module that is inoperative without
> > MKL.
>
> I might be swayed, but my gut feeling is that a backend change (an
> explicit one if the default stream changes, though maybe one could offer
> a "fastest" option) would be the only reasonable way to provide such a
> thing in numpy itself.

That mostly argues for distributing it as a separate package, not part of
numpy at all.

--
Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Sebastian Berg
On Mi, 2016-10-26 at 09:29 -0700, Robert Kern wrote:
> On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor <jtaylor.deb...@googlemail.com> wrote:
> >
> > On 10/26/2016 06:00 PM, Julian Taylor wrote:
> 
> >> I prefer for the full functionality of numpy to stay available
> with a
> >> stack of community owned software, even if it may be less powerful
> that
> >> way.
> >
> > But then if this is really just the same random numbers numpy
> already provides just faster, it is probably acceptable in principle.
> I haven't actually looked at the PR yet.
> 
> I think the stream is different in some places, at least. And it's
> not a silent backend drop-in like np.linalg being built against an
> optimized BLAS, just a separate module that is inoperative without
> MKL.
> 

I might be swayed, but my gut feeling is that a backend change (an
explicit one if the default stream changes, though maybe one could offer
a "fastest" option) would be the only reasonable way to provide such a
thing in numpy itself.

- Sebastian



> --
> Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Robert Kern
On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:
>
> On 10/26/2016 06:00 PM, Julian Taylor wrote:

>> I prefer for the full functionality of numpy to stay available with a
>> stack of community owned software, even if it may be less powerful that
>> way.
>
> But then if this is really just the same random numbers numpy already
provides just faster, it is probably acceptable in principle. I haven't
actually looked at the PR yet.

I think the stream is different in some places, at least. And it's not a
silent backend drop-in like np.linalg being built against an optimized
BLAS, just a separate module that is inoperative without MKL.

--
Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Julian Taylor

On 10/26/2016 06:00 PM, Julian Taylor wrote:

On 10/26/2016 10:59 AM, Ralf Gommers wrote:



On Wed, Oct 26, 2016 at 8:33 PM, Julian Taylor
<jtaylor.deb...@googlemail.com>
wrote:

On 26.10.2016 06:34, Charles R Harris wrote:
> Hi All,
>
> There is a proposed random number package PR now up on github:
> https://github.com/numpy/numpy/pull/8209. It is from
> oleksandr-pavlyk and implements
> the random number package using MKL for increased speed. I think
> we are definitely interested in the improved speed, but I'm not
sure
> numpy is the best place to put the package. I'd welcome any
comments on
> the PR itself, as well as any thoughts on the best way organize
or use
> of this work. Maybe scikit-random


Note that this thread is a continuation of
https://mail.scipy.org/pipermail/numpy-discussion/2016-July/075822.html



I'm not a fan of putting code depending on a proprietary library
into numpy.
This should be a standalone package which may provide the same
interface
as numpy.


I don't really see a problem with that in principle. Numpy can use Intel
MKL (and Accelerate) as well if it's available. It needs some thought
put into the API though - a ``numpy.random_intel`` module is certainly
not what we want.



For me there is a difference between being able to optionally use a
proprietary library as an alternative to free software libraries if the
user wishes to do so and offering functionality that only works with
non-free software.
We are providing a form of advertisement for them by allowing it (hey if
you buy this black box that you cannot modify or use freely you get this
neat numpy feature!).

I prefer for the full functionality of numpy to stay available with a
stack of community owned software, even if it may be less powerful that
way.


But then if this is really just the same random numbers numpy already
provides, just faster, it is probably acceptable in principle. I haven't
actually looked at the PR yet.



Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Julian Taylor

On 10/26/2016 10:59 AM, Ralf Gommers wrote:



On Wed, Oct 26, 2016 at 8:33 PM, Julian Taylor
<jtaylor.deb...@googlemail.com>
wrote:

On 26.10.2016 06:34, Charles R Harris wrote:
> Hi All,
>
> There is a proposed random number package PR now up on github:
> https://github.com/numpy/numpy/pull/8209. It is from
> oleksandr-pavlyk and implements
> the random number package using MKL for increased speed. I think
> we are definitely interested in the improved speed, but I'm not sure
> numpy is the best place to put the package. I'd welcome any comments on
> the PR itself, as well as any thoughts on the best way organize or use
> of this work. Maybe scikit-random


Note that this thread is a continuation of
https://mail.scipy.org/pipermail/numpy-discussion/2016-July/075822.html



I'm not a fan of putting code depending on a proprietary library
into numpy.
This should be a standalone package which may provide the same interface
as numpy.


I don't really see a problem with that in principle. Numpy can use Intel
MKL (and Accelerate) as well if it's available. It needs some thought
put into the API though - a ``numpy.random_intel`` module is certainly
not what we want.



For me there is a difference between being able to optionally use a 
proprietary library as an alternative to free software libraries if the 
user wishes to do so and offering functionality that only works with 
non-free software.
We are providing a form of advertisement for them by allowing it (hey if 
you buy this black box that you cannot modify or use freely you get this 
neat numpy feature!).


I prefer for the full functionality of numpy to stay available with a 
stack of community owned software, even if it may be less powerful that way.



Re: [Numpy-discussion] padding options for diff

2016-10-26 Thread Matthew Harrigan
The inverse of cumsum is actually a little more unwieldy since you can't
drop a dimension with take.  This returns the original array (numerical
caveats aside):

np.cumsum(np.diff(x, to_begin=x.take([0], axis=axis), axis=axis), axis=axis)

That's certainly not going to win any beauty contests.  The 1d case is
clean though:

np.cumsum(np.diff(x, to_begin=x[0]))

I'm not sure if this means the API should change, and if so how.  Higher
dimensional arrays seem to just have extra complexity.
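
For comparison, the 1-d round trip can already be written today with
np.ediff1d, which is where the to_begin/to_end keywords come from:

import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
d = np.ediff1d(x, to_begin=x[0])             # first element, then successive differences
np.testing.assert_allclose(np.cumsum(d), x)  # cumsum inverts diff (up to rounding)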

On Tue, Oct 25, 2016 at 1:26 PM, Peter Creasey <
p.e.creasey...@googlemail.com> wrote:

> > Date: Mon, 24 Oct 2016 08:44:46 -0400
> > From: Matthew Harrigan 
> >
> > I posted a pull request  which
> > adds optional padding kwargs "to_begin" and "to_end" to diff.  Those
> > options are based on what's available in ediff1d.  It closes this issue
> > 
>
> I like the proposal, though I suspect that making it general has
> obscured that the most common use-case for padding is to make the
> inverse of np.cumsum (at least that’s what I frequently need), and now
> in the multidimensional case you have the somewhat unwieldy:
>
> >>> np.diff(a, axis=axis, to_begin=np.take(a, 0, axis=axis))
>
> rather than
>
> >>> np.diff(a, axis=axis, keep_left=True)
>
> which of course could just be an option upon what you already have.
>
> Best,
> Peter


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Ralf Gommers
On Wed, Oct 26, 2016 at 8:33 PM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:

> On 26.10.2016 06:34, Charles R Harris wrote:
> > Hi All,
> >
> > There is a proposed random number package PR now up on github:
> > https://github.com/numpy/numpy/pull/8209. It is from
> > oleksandr-pavlyk  and implements
> > the random number package using MKL for increased speed. I think
> > we are definitely interested in the improved speed, but I'm not sure
> > numpy is the best place to put the package. I'd welcome any comments on
> > the PR itself, as well as any thoughts on the best way organize or use
> > of this work. Maybe scikit-random
>

Note that this thread is a continuation of
https://mail.scipy.org/pipermail/numpy-discussion/2016-July/075822.html


>
> I'm not a fan of putting code depending on a proprietary library into
> numpy.
> This should be a standalone package which may provide the same interface
> as numpy.
>

I don't really see a problem with that in principle. Numpy can use Intel
MKL (and Accelerate) as well if it's available. It needs some thought put
into the API though - a ``numpy.random_intel`` module is certainly not what
we want.

Ralf


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Julian Taylor
On 26.10.2016 06:34, Charles R Harris wrote:
> Hi All,
> 
> There is a proposed random number package PR now up on github:
> https://github.com/numpy/numpy/pull/8209. It is from
> oleksandr-pavlyk  and implements
> the random number package using MKL for increased speed. I think
> we are definitely interested in the improved speed, but I'm not sure
> numpy is the best place to put the package. I'd welcome any comments on
> the PR itself, as well as any thoughts on the best way organize or use
> of this work. Maybe scikit-random
> 

I'm not a fan of putting code depending on a proprietary library into numpy.
This should be a standalone package which may provide the same interface
as numpy.