Re: [Numpy-discussion] Integers to integer powers

2016-05-20 Thread josef.pktd
On Fri, May 20, 2016 at 6:54 PM, Nathaniel Smith  wrote:

> On May 20, 2016 12:44 PM,  wrote:
> [...]
> >
> > can numpy cast to float by default for power or **?
>
> Maybe? The question is whether there are any valid use cases for getting
> ints back:
>
> >>> np.array([1, 2, 3]) ** 2
> array([1, 4, 9])
>
> It's not 100% obvious to me but intuitively this seems like an operation
> that we probably want to support? Especially since there's a reasonable
> range of int64 values that can't be represented exactly as floats.
>
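(A quick illustration of that precision point, with values I made up: float64
has a 53-bit mantissa, so large int64 results lose digits when cast to float.)

>>> import numpy as np
>>> a = np.array([2**53 + 1], dtype=np.int64)
>>> a[0]                        # exact as int64
9007199254740993
>>> a.astype(np.float64)[0]     # rounded once cast to float64
9007199254740992.0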

It would still be supported by the np.power function with a dtype keyword for
users who want to strictly control the dtype.

The question is mainly about the operator **, which doesn't have options.
There I think it's more appropriate for users who want correct numbers but
don't necessarily want to (or know to) watch out for the dtype.


Related: Python 3.4 returns complex for (-1)**0.5 while numpy returns nan.
That's a similar case of upcasting if the result doesn't fit.
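For example (my own quick check, behavior as of Python 3 and a recent numpy):

>>> (-1) ** 0.5                 # Python upcasts to complex
(6.123233995736766e-17+1j)
>>> import numpy as np
>>> np.float64(-1) ** 0.5       # numpy stays in float and returns nan (with a RuntimeWarning)
nan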

Long term I think it would still be nice if numpy could do value-dependent
type promotion, e.g. when a value is encountered that doesn't fit, upcast the
result, at the cost of possibly having to copy the already computed results.
In the current setting the user has to decide in advance what the required
dtype might be. (Of course I have no idea about the technical problems or the
computational cost of this.)

(Julia's dispatch seems to make it easier to construct new types, e.g. we
could have a flexible dtype that is free to upcast to whatever is required
for the calculation. Just guessing.)


Practicality:
Going from int to float is a common use case, and we would expect to get the
correct numbers:  2**(-2) -> promote

Complex is in most fields an unusual outcome for integer or float
calculations (e.g. a Box-Cox transformation is defined for x > 0). Suddenly
having complex numbers is weird; getting nans is the standard float
response  -> don't promote


I'm still largely in the Python 2.x habit of adding a decimal point to
numbers, or a redundant `* 1.` in my code, to avoid integer division or other
weirdness. So I never realized that ** in numpy doesn't always promote to
float, which I kind of thought it did.
Maybe it's not yet time to drop all the decimal points or `* 1.` from the
code?


Josef



> -n
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers

2016-05-20 Thread josef.pktd
On Fri, May 20, 2016 at 4:27 PM, Warren Weckesser <
warren.weckes...@gmail.com> wrote:

>
>
> On Fri, May 20, 2016 at 4:22 PM, Alan Isaac  wrote:
>
>> On 5/19/2016 11:30 PM, Nathaniel Smith wrote:
>>
>>> the last bad
>>> option IMHO would be that we make int ** (negative int) an error in
>>> all cases, and the error message can suggest that instead of writing
>>>
>>> np.array(2) ** -2
>>>
>>> they should instead write
>>>
>>> np.array(2) ** -2.0
>>>
>>> (And similarly for np.int64(2) ** -2 versus np.int64(2) ** -2.0.)
>>>
>>
>>
>>
>> Fwiw, Haskell has three exponentiation operators
>> because of such ambiguities.  I don't use C, but
>> I think the contrasting decision there was to
>> always return a double, which has a clear attraction
>> since for any fixed-width integral type, most of the
>> possible input pairs overflow the type.
>>
>> My core inclination would be to use (what I understand to be)
>> the C convention that integer exponentiation always produces
>> a double, but to support dtype-specific exponentiation with
>> a function.
>
>
>
> C doesn't have an exponentiation operator.  The C math library has pow,
> powf and powl, which (like any C functions) are explicitly typed.
>
> Warren
>
>
>   But this is just a user's perspective.
>>
>> Cheers,
>> Alan Isaac
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>

another question

uints are non-negative, so the negative-power problem doesn't show up there.
So that could still handle a use case for ints.
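e.g. (example mine) an unsigned exponent can't be negative, so int ** int
stays exact:

>>> import numpy as np
>>> np.array([2, 3], dtype=np.uint64) ** np.uint64(3)
array([ 8, 27], dtype=uint64)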

I'm leaning more strongly in favor of float, because raising an exception
(or worse, returning nonsense) on half of the parameter space sounds ...
(maybe kind of silly).

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers

2016-05-20 Thread josef.pktd
On Fri, May 20, 2016 at 3:23 PM, Charles R Harris  wrote:

>
>
> On Fri, May 20, 2016 at 1:15 PM, Nathaniel Smith  wrote:
>
>> On Fri, May 20, 2016 at 11:35 AM, Charles R Harris
>>  wrote:
>> >
>> >
>> > On Thu, May 19, 2016 at 9:30 PM, Nathaniel Smith  wrote:
>> >>
>> >> So I guess what makes this tricky is that:
>> >>
>> >> - We want the behavior to the same for multiple-element arrays,
>> >> single-element arrays, zero-dimensional arrays, and scalars -- the
>> >> shape of the data shouldn't affect the semantics of **
>> >>
>> >> - We also want the numpy scalar behavior to match the Python scalar
>> >> behavior
>> >>
>> >> - For Python scalars, int ** (positive int) returns an int, but int **
>> >> (negative int) returns a float.
>> >>
>> >> - For arrays, int ** (positive int) and int ** (negative int) _have_
>> >> to return the same type, because in general output types are always a
>> >> function of the input types and *can't* look at the specific values
>> >> involved, and in specific because if you do array([2, 3]) ** array([2,
>> >> -2]) you can't return an array where the first element is int and the
>> >> second is float.
>> >>
>> >> Given these immutable and contradictory constraints, the last bad
>> >> option IMHO would be that we make int ** (negative int) an error in
>> >> all cases, and the error message can suggest that instead of writing
>> >>
>> >> np.array(2) ** -2
>> >>
>> >> they should instead write
>> >>
>> >> np.array(2) ** -2.0
>> >>
>> >> (And similarly for np.int64(2) ** -2 versus np.int64(2) ** -2.0.)
>> >>
>> >> Definitely annoying, but all the other options seem even more
>> >> inconsistent and confusing, and likely to encourage the writing of
>> >> subtly buggy code...
>> >>
>> >> (I especially have in mind numpy's habit of silently switching between
>> >> scalars and zero-dimensional arrays -- so it's easy to write code that
>> >> you think handles arbitrary array dimensions, and it even passes all
>> >> your tests, but then it fails when someone passes in a different shape
>> >> data and triggers some scalar/array inconsistency. E.g. if we make **
>> >> -2 work for scalars but not arrays, then this code:
>> >>
>> >> def f(arr):
>> >> return np.sum(arr, axis=0) ** -2
>> >>
>> >> works as expected for 1-d input, tests pass, everyone's happy... but
>> >> as soon as you try to pass in higher dimensional integer input it will
>> >> fail.)
>> >>
>> >
>> > Hmm, the Alexandrian solution. The main difficulty with this solution is
>> > that this will likely break working code. We could try it, or take the
>> > safe route of raising a (Visible)DeprecationWarning.
>>
>> Right, sorry, I was talking about the end goal -- there's a separate
>> question of how we get there. Pretty much any solution is going to
>> require some sort of deprecation cycle though I guess, and at least
>> the deprecate -> error transition is a lot easier than the working ->
>> working different transition.
>>
>> > The other option is to simply treat the negative power case uniformly as
>> > floor division and raise an error on zero division, but the difference
>> > from Python power would be highly confusing. I think I would vote for the
>> > second option with a DeprecationWarning.
>>
>> So "floor division" here would mean that k ** -n == 0 for all k and n
>> except for k == 1, right? In addition to the consistency issue, that
>> doesn't seem like a behavior that's very useful to anyone...
>>
>
> And -1 as well. The virtue is consistency while deprecating. Or we could
> just back out the current changes in master and throw in deprecation
> warnings. That has the virtue of simplicity and not introducing possible
> code breaks.
>


can numpy cast to float by default for power or **?

At least then we always get correct numbers.

Are there dominant use cases that require a default return dtype of int?
AFAICS, it's always possible to choose the return dtype in np.power.
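For example, something along these lines should work (ufuncs take a dtype
argument; example mine):

>>> np.power(2, -2, dtype=np.float64)
0.25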

Josef


>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers

2016-05-19 Thread josef.pktd
On Thu, May 19, 2016 at 10:16 PM,  wrote:

>
>
> On Thu, May 19, 2016 at 5:37 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>> Hi All,
>>
>> There are currently several pull requests apropos integer arrays/scalars
>> to integer powers and, because the area is messy and involves tradeoffs,
>> I'd like to see some discussion here on the list before proceeding.
>>
>> *Scalars in 1.10*
>>
>> In [1]: 1 ** -1
>> Out[1]: 1.0
>>
>> In [2]: int16(1) ** -1
>> Out[2]: 1
>>
>> In [3]: int32(1) ** -1
>> Out[3]: 1
>>
>> In [4]: int64(1) ** -1
>> Out[4]: 1.0
>>
>> In [5]: 2 ** -1
>> Out[5]: 0.5
>>
>> In [6]: int16(2) ** -1
>> Out[6]: 0
>>
>> In [7]: int32(2) ** -1
>> Out[7]: 0
>>
>> In [8]: int64(2) ** -1
>> Out[8]: 0.5
>>
>> In [9]: 0 ** -1
>>
>> ---
>> ZeroDivisionError Traceback (most recent call
>> last)
>>  in ()
>> > 1 0 ** -1
>>
>> ZeroDivisionError: 0.0 cannot be raised to a negative power
>>
>> In [10]: int16(0) ** -1
>> /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero
>> encountered in power
>>   #!/usr/bin/python
>> /home/charris/.local/bin/ipython:1: RuntimeWarning: invalid value
>> encountered in power
>>   #!/usr/bin/python
>> Out[10]: -9223372036854775808
>>
>> In [11]: int32(0) ** -1
>> Out[11]: -9223372036854775808
>>
>> In [12]: int64(0) ** -1
>> /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero
>> encountered in long_scalars
>>   #!/usr/bin/python
>> Out[12]: inf
>>
>> Proposed
>>
>>- for non-zero numbers the return type should be float.
>>- for zero numbers a zero division error should be raised.
>>
>>
>>
>>
>> *Scalar Arrays in 1.10*
>> In [1]: array(1, dtype=int16) ** -1
>> Out[1]: 1
>>
>> In [2]: array(1, dtype=int32) ** -1
>> Out[2]: 1
>>
>> In [3]: array(1, dtype=int64) ** -1
>> Out[3]: 1
>>
>> In [4]: array(2, dtype=int16) ** -1
>> Out[4]: 0
>>
>> In [5]: array(2, dtype=int32) ** -1
>> Out[5]: 0
>>
>> In [6]: array(2, dtype=int64) ** -1
>> Out[6]: 0
>>
>> In [7]: array(0, dtype=int16) ** -1
>> /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero
>> encountered in power
>>   #!/usr/bin/python
>> /home/charris/.local/bin/ipython:1: RuntimeWarning: invalid value
>> encountered in power
>>   #!/usr/bin/python
>> Out[7]: -9223372036854775808
>>
>> In [8]: array(0, dtype=int32) ** -1
>> Out[8]: -9223372036854775808
>>
>> In [9]: array(0, dtype=int64) ** -1
>> Out[9]: -9223372036854775808
>>
>> In [10]: type(array(1, dtype=int64) ** -1)
>> Out[10]: numpy.int64
>>
>> In [11]: type(array(1, dtype=int32) ** -1)
>> Out[11]: numpy.int64
>>
>> In [12]: type(array(1, dtype=int16) ** -1)
>> Out[12]: numpy.int64
>>
>> Note that the return type is always int64 in all these cases. However,
>> type is preserved in non-scalar arrays, although the value of int16 is not
>> compatible with int32 and int64 for zero division.
>>
>> In [22]: array([0]*2, dtype=int16) ** -1
>> Out[22]: array([0, 0], dtype=int16)
>>
>> In [23]: array([0]*2, dtype=int32) ** -1
>> Out[23]: array([-2147483648, -2147483648], dtype=int32)
>>
>> In [24]: array([0]*2, dtype=int64) ** -1
>> Out[24]: array([-9223372036854775808, -9223372036854775808])
>>
>> Proposed:
>>
>>- Raise an ZeroDivisionError for zero division, that is, in the ufunc.
>>- Scalar arrays to return scalar arrays
>>
>>
>> Thoughts?
>>
> Why does negative exponent not upcast to float like division?
> sounds like python 2 to me
>

from __future__ import negative_power

Josef


>
> Josef
>
>
>
>> Chuck
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Integers to integer powers

2016-05-19 Thread josef.pktd
On Thu, May 19, 2016 at 5:37 PM, Charles R Harris  wrote:

> Hi All,
>
> There are currently several pull requests apropos integer arrays/scalars
> to integer powers and, because the area is messy and involves tradeoffs,
> I'd like to see some discussion here on the list before proceeding.
>
> *Scalars in 1.10*
>
> In [1]: 1 ** -1
> Out[1]: 1.0
>
> In [2]: int16(1) ** -1
> Out[2]: 1
>
> In [3]: int32(1) ** -1
> Out[3]: 1
>
> In [4]: int64(1) ** -1
> Out[4]: 1.0
>
> In [5]: 2 ** -1
> Out[5]: 0.5
>
> In [6]: int16(2) ** -1
> Out[6]: 0
>
> In [7]: int32(2) ** -1
> Out[7]: 0
>
> In [8]: int64(2) ** -1
> Out[8]: 0.5
>
> In [9]: 0 ** -1
> ---
> ZeroDivisionError Traceback (most recent call last)
>  in ()
> > 1 0 ** -1
>
> ZeroDivisionError: 0.0 cannot be raised to a negative power
>
> In [10]: int16(0) ** -1
> /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero
> encountered in power
>   #!/usr/bin/python
> /home/charris/.local/bin/ipython:1: RuntimeWarning: invalid value
> encountered in power
>   #!/usr/bin/python
> Out[10]: -9223372036854775808
>
> In [11]: int32(0) ** -1
> Out[11]: -9223372036854775808
>
> In [12]: int64(0) ** -1
> /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero
> encountered in long_scalars
>   #!/usr/bin/python
> Out[12]: inf
>
> Proposed
>
>- for non-zero numbers the return type should be float.
>- for zero numbers a zero division error should be raised.
>
>
>
>
> *Scalar Arrays in 1.10*
> In [1]: array(1, dtype=int16) ** -1
> Out[1]: 1
>
> In [2]: array(1, dtype=int32) ** -1
> Out[2]: 1
>
> In [3]: array(1, dtype=int64) ** -1
> Out[3]: 1
>
> In [4]: array(2, dtype=int16) ** -1
> Out[4]: 0
>
> In [5]: array(2, dtype=int32) ** -1
> Out[5]: 0
>
> In [6]: array(2, dtype=int64) ** -1
> Out[6]: 0
>
> In [7]: array(0, dtype=int16) ** -1
> /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero
> encountered in power
>   #!/usr/bin/python
> /home/charris/.local/bin/ipython:1: RuntimeWarning: invalid value
> encountered in power
>   #!/usr/bin/python
> Out[7]: -9223372036854775808
>
> In [8]: array(0, dtype=int32) ** -1
> Out[8]: -9223372036854775808
>
> In [9]: array(0, dtype=int64) ** -1
> Out[9]: -9223372036854775808
>
> In [10]: type(array(1, dtype=int64) ** -1)
> Out[10]: numpy.int64
>
> In [11]: type(array(1, dtype=int32) ** -1)
> Out[11]: numpy.int64
>
> In [12]: type(array(1, dtype=int16) ** -1)
> Out[12]: numpy.int64
>
> Note that the return type is always int64 in all these cases. However,
> type is preserved in non-scalar arrays, although the value of int16 is not
> compatible with int32 and int64 for zero division.
>
> In [22]: array([0]*2, dtype=int16) ** -1
> Out[22]: array([0, 0], dtype=int16)
>
> In [23]: array([0]*2, dtype=int32) ** -1
> Out[23]: array([-2147483648, -2147483648], dtype=int32)
>
> In [24]: array([0]*2, dtype=int64) ** -1
> Out[24]: array([-9223372036854775808, -9223372036854775808])
>
> Proposed:
>
>- Raise an ZeroDivisionError for zero division, that is, in the ufunc.
>- Scalar arrays to return scalar arrays
>
>
> Thoughts?
>
Why does negative exponent not upcast to float like division?
sounds like python 2 to me

Josef



> Chuck
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-18 Thread josef.pktd
On Wed, May 18, 2016 at 12:01 PM, Robert Kern  wrote:

> On Wed, May 18, 2016 at 4:50 PM, Chris Barker 
> wrote:
> >>
> >> > ...anyway, the real reason I'm a bit grumpy is because there are solid
> >> > engineering reasons why users *want* this API,
> >
> > Honestly, I am lost in the math -- but like any good engineer, I want to
> accomplish something anyway :-) I trust you guys to get this right -- or at
> least document what's "wrong" with it.
> >
> > But, if I'm reading the use case that started all this correctly, it
> closely matches my use-case. That is, I have a complex model with multiple
> independent "random" processes. And we want to be able to re-produce
> EXACTLY simulations -- our users get confused when the results are
> "different" even if in a statistically insignificant way.
> >
> > At the moment we are using one RNG, with one seed for everything. So we
> get reproducible results, but if one thing is changed, then the entire
> simulation is different -- which is OK, but it would be nicer to have each
> process using its own RNG stream with it's own seed. However, it matters
> not one whit if those seeds are independent -- the processes are different,
> you'd never notice if they were using the same PRN stream -- because they
> are used differently. So a "fairly low probability of a clash" would be
> totally fine.
>
> Well, the main question is: do you need to be able to spawn dependent
> streams at arbitrary points to an arbitrary depth without coordination
> between processes? The necessity for multiple independent streams per se is
> not contentious.
>


I'm similar to Chris, and didn't try to figure out the details of what you
are talking about.

However, if functions get into numpy that help users follow best practice,
even if it's not bulletproof, then that's still better than home-made
approaches.
If it gets in soon, then we can use it in a few years (given dependency
lag). By then there should be more distributed, nested simulation-based
algorithms where we don't know in advance how far we have to go to get
reliable numbers or convergence.

(But I don't see anything like that right now.)

Josef



>
> --
> Robert Kern
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-17 Thread josef.pktd
On Tue, May 17, 2016 at 4:49 AM, Robert Kern  wrote:

> On Tue, May 17, 2016 at 9:09 AM, Stephan Hoyer  wrote:
> >
> > On Tue, May 17, 2016 at 12:18 AM, Robert Kern 
> wrote:
> >>
> >> On Tue, May 17, 2016 at 4:54 AM, Stephan Hoyer 
> wrote:
> >> > 1. When writing a library of stochastic functions that take a seed as
> an input argument, and some of these functions call multiple other such
> stochastic functions. Dask is one such example [1].
> >>
> >> Can you clarify the use case here? I don't really know what you are
> doing here, but I'm pretty sure this is not the right approach.
> >
> > Here's a contrived example. Suppose I've written a simulator for cars
> that consists of a number of loosely connected components (e.g., an engine,
> brakes, etc.). The behavior of each component of our simulator is
> stochastic, but we want everything to be fully reproducible, so we need to
> use seeds or RandomState objects.
> >
> > We might write our simulate_car function like the following:
> >
> > def simulate_car(engine_config, brakes_config, seed=None):
> > rs = np.random.RandomState(seed)
> > engine = simulate_engine(engine_config, seed=rs.random_seed())
> > brakes = simulate_brakes(brakes_config, seed=rs.random_seed())
> > ...
> >
> > The problem with passing the same RandomState object (either explicitly
> or dropping the seed argument entirely and using the  global state) to both
> simulate_engine and simulate_breaks is that it breaks encapsulation -- if I
> change what I do inside simulate_engine, it also effects the brakes.
>
> That's a little too contrived, IMO. In most such simulations, the
> different components interact with each other in the normal course of the
> simulation; that's why they are both joined together in the same simulation
> instead of being two separate runs. Unless if the components are being run
> across a process or thread boundary (a la dask below) where true
> nondeterminism comes into play, then I don't think you want these
> semi-independent streams. This seems to be the advice du jour from the
> agent-based modeling community.
>


similar use case where I had to switch to using several RandomStates

In a Monte Carlo experiment with increasing sample size, I want two random
variables, x and y, to have the same draws in the common initial
observations.

If I draw x and y sequentially and then increase the number of observations
for the simulation, it completely changes the draws for the second variable
if they share a common RandomState.

With separate random states, increasing from 1000 to 1200 observations
leaves the first 1000 draws unchanged.
(This reduces the Monte Carlo noise, for example when calculating the power
of a hypothesis test as a function of the sample size.)
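A minimal sketch of the pattern (seeds, sizes, and variable names are just
for illustration):

import numpy as np

rs_x = np.random.RandomState(12345)
rs_y = np.random.RandomState(54321)
x1, y1 = rs_x.normal(size=1000), rs_y.normal(size=1000)

rs_x = np.random.RandomState(12345)
rs_y = np.random.RandomState(54321)
x2, y2 = rs_x.normal(size=1200), rs_y.normal(size=1200)

# the first 1000 draws are unchanged for both variables; with a single shared
# RandomState the draws for y would change completely when the size grows
assert (x1 == x2[:1000]).all()
assert (y1 == y2[:1000]).all()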

Josef


>
>
> > The dask use case is actually pretty different -- the intent is to
> create many random numbers in parallel using multiple threads or processes
> (possibly in a distributed fashion). I know that skipping ahead is the
> standard way to get independent number streams for parallel sampling, but
> that isn't exposed in numpy.random, and setting distinct seeds seems like a
> reasonable alternative for scientific computing use cases.
>
> Forget about integer seeds. Those are for human convenience. If you're not
> jotting them down in your lab notebook in pen, you don't want an integer
> seed.
>
> What you want is a function that returns many RandomState objects that are
> hopefully spread around the MT19937 space enough that they are essentially
> independent (in the absence of true jumpahead). The better implementation
> of such a function would look something like this:
>
> def spread_out_prngs(n, root_prng=None):
> if root_prng is None:
> root_prng = np.random
> elif not isinstance(root_prng, np.random.RandomState):
> root_prng = np.random.RandomState(root_prng)
> sprouted_prngs = []
> for i in range(n):
> seed_array = root_prng.randint(1<<32, size=624)  # dtype=np.uint32
> under 1.11
> sprouted_prngs.append(np.random.RandomState(seed_array))
> return sprouted_prngs
>
> Internally, this generates seed arrays of about the size of the MT19937
> state so make sure that you can access more of the state space. That will
> at least make the chance of collision tiny. And it can be easily rewritten
> to take advantage of one of the newer PRNGs that have true independent
> streams:
>
>   https://github.com/bashtage/ng-numpy-randomstate
>
> --
> Robert Kern
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Remove a random sample from array

2016-05-16 Thread josef.pktd
On Mon, May 16, 2016 at 12:24 PM, Elliot Hallmark 
wrote:

> Use `random.shuffle(range(len(arr))` to make a list of indices.  Use a
> slices to get your 20/80.  Convert to integer arrays and index your
> original array with them.  Use sorted on the 80% list if you need to
> preserve the order.
>

similar but simpler

You can just randomly permute/shuffle a boolean array with 20% True and 80%
False values and use it as a mask to select from the original array.
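Roughly (a sketch; `arr` stands for the (n, 2) array from the original
question):

import numpy as np

n = len(arr)
mask = np.zeros(n, dtype=bool)
mask[:n // 5] = True          # ~20% True
np.random.shuffle(mask)
sample = arr[mask]            # the random 20% (row order preserved)
rest = arr[~mask]             # the remaining 80%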

Josef



>
>
> -Elliot
>
> On Mon, May 16, 2016 at 11:04 AM, Martin Noblia <
> martin.nob...@openmailbox.org> wrote:
>
>> I think with `np.random.choice`
>>
>>
>> http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.random.choice.html
>>
>>
>> On 05/16/2016 11:08 AM, Florian Lindner wrote:
>>
>> Hello,
>>
>> I have an array of shape (n, 2) from which I want to extract a random sample
>> of 20% of rows. The choosen samples should be removed the original array and
>> moved to a new array of the same shape (n, 2).
>>
>> What is the most clever way to do with numpy?
>>
>> Thanks,
>> Florian
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>> --
>> *Martin Noblia*
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] three-way comparisons

2016-05-14 Thread josef.pktd
On Sat, May 14, 2016 at 3:23 AM, Phillip Feldman <
phillip.m.feld...@gmail.com> wrote:

> I often find a need to do the type of comparison done by function shown
> below.  I suspect that this would be more efficient for large arrays if
> implemented direction in C.  Is there any possibility of adding something
> like this to NumPy?
>
> def three_way(x, y):
>"""
>This function performs a 3-way comparison on `x` and `y`, which must be
>either lists or arrays of compatible shape.  Each pair of items or
> elements--let's call them x[i] and y[i]--are compared.  The corresponding
>element in the output array is 1 if `x[i]` is greater than `y[i]`, -1 if
>`x[i]` is less, and zero if the two are equal.
>"""
>return numpy.greater(y, x).astype(int) - numpy.less(y, x).astype(int)
>


isn't that the same as sign(x - y)?
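e.g. (values mine):

>>> import numpy as np
>>> x = np.array([1, 2, 3])
>>> y = np.array([3, 2, 1])
>>> np.sign(x - y)
array([-1,  0,  1])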

Josef


>
> Phillip
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changing the behavior of (builtins.)round (via the __round__ dunder) to return an integer

2016-04-13 Thread josef.pktd
On Wed, Apr 13, 2016 at 4:31 AM, Stephan Hoyer  wrote:

> On Wed, Apr 13, 2016 at 12:42 AM, Antony Lee 
> wrote:
>
>> (Note that I am suggesting to switch to the new behavior regardless of
>> the version of Python.)
>>
>
> I would lean towards making this change only for Python 3. This is
> arguably more consistent with Python than changing the behavior on Python
> 2.7, too.
>
> The most obvious way in which a float being surprisingly switched to an
> integer could cause silent bugs (rather than noisy TypeErrors) is if the
> number is used in division. True division in Python 3 eliminates this risk.
>
> Generally, I agree with your reasoning. It would be unfortunate to be
> stuck with this legacy behavior forever.
>
>
The difference is that Python 3 has long ints (and doesn't have to
overflow, AFAICS).

What happens with nan?
I guess inf would overflow?

(nan and inf are preserved with np.round)
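For reference, what I see (Python 3 with a recent numpy; my own quick check):

>>> import numpy as np
>>> round(float('nan'))     # raises ValueError: cannot convert float NaN to integer
>>> round(float('inf'))     # raises OverflowError: cannot convert float infinity to integer
>>> np.round(np.nan), np.round(np.inf)
(nan, inf)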

Josef



> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndarray.T2 for 2D transpose

2016-04-08 Thread josef.pktd
On Fri, Apr 8, 2016 at 5:11 PM, Charles R Harris 
wrote:

>
>
> On Fri, Apr 8, 2016 at 2:52 PM,  wrote:
>
>>
>>
>> On Fri, Apr 8, 2016 at 3:55 PM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>>
>>>
>>> On Fri, Apr 8, 2016 at 12:17 PM, Chris Barker 
>>> wrote:
>>>
 On Fri, Apr 8, 2016 at 9:59 AM, Charles R Harris <
 charlesr.har...@gmail.com> wrote:

> Apropos column/row vectors, I've toyed a bit with the idea of adding a
> flag to numpy arrays to indicate that the last index is one or the other,
> and maybe neither.
>

 I don't follow this. wouldn't it ony be an issue for 1D arrays, rather
 than the "last index". Or maybe I'm totally missing the point.

 But anyway, are (N,1) and (1, N) arrays insufficient for representing
 column and row vectors for some reason? If not -- then we have a way to
 express a column or row vector, we just need an easier and more obvious way
 to create them.

 *maybe* we could have actual column and row vector classes -- they
 would BE regular arrays, with (1,N) or (N,1) dimensions, and act the same
 in every way except their __repr__. and we're provide handy factor
 functions for them.

 These were needed to complete the old Matrix class -- which is no
 longer needed now that we have @ (i.e. a 2D array IS a matrix)

>>>
>>> One problem with that approach is that `vrow @ vcol` has dimension 1 x
>>> 1, which is not a scalar.
>>>
>>
>> I think it's not supposed to be a scalar, if @ breaks on scalars
>>
>> `vrow @ vcol @ a
>>
>
> It's supposed to be a scalar and the expression should be written `vrow @
> vcol * a`, although parens are probably desireable for clarity `(vrow @
> vcol) * a`.
>


if a is 1d or a 2d vcol, and vrow and vcol could also be 2d arrays (not just
a single row or column), then this is just one part of a longer linear
algebra expression.

1d dot 1d is different from vrow dot vcol.

A dot 1d is different from A dot vcol.

There are intentional differences in the linear algebra behavior of 1d versus
a column or row vector. One of those is dropping the extra dimension.
We are using this a lot to switch between 1-d and 2-d cases.
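e.g. (example mine):

>>> import numpy as np
>>> A = np.arange(6).reshape(2, 3)
>>> a = np.arange(3)
>>> np.dot(A, a).shape            # 2d dot 1d drops the extra dimension
(2,)
>>> np.dot(A, a[:, None]).shape   # 2d dot a column vector keeps it
(2, 1)
>>> np.dot(a, a)                  # 1d dot 1d is a scalar
5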

And another great thing about numpy is that code often immediately
generalizes from 1-d to 2-d with just some tiny adjustments.

(I haven't played with @ yet)


I worry that making 1-d arrays suddenly behave ambiguously as a weird
1-d/2-d mixture will make code more inconsistent and more difficult to
follow.

Shortcuts and variations of atleast_2d sound fine, but not implicit ones.

Josef




>
> Chuck
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndarray.T2 for 2D transpose

2016-04-08 Thread josef.pktd
On Fri, Apr 8, 2016 at 3:55 PM, Charles R Harris 
wrote:

>
>
> On Fri, Apr 8, 2016 at 12:17 PM, Chris Barker 
> wrote:
>
>> On Fri, Apr 8, 2016 at 9:59 AM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>> Apropos column/row vectors, I've toyed a bit with the idea of adding a
>>> flag to numpy arrays to indicate that the last index is one or the other,
>>> and maybe neither.
>>>
>>
>> I don't follow this. wouldn't it ony be an issue for 1D arrays, rather
>> than the "last index". Or maybe I'm totally missing the point.
>>
>> But anyway, are (N,1) and (1, N) arrays insufficient for representing
>> column and row vectors for some reason? If not -- then we have a way to
>> express a column or row vector, we just need an easier and more obvious way
>> to create them.
>>
>> *maybe* we could have actual column and row vector classes -- they would
>> BE regular arrays, with (1,N) or (N,1) dimensions, and act the same in
>> every way except their __repr__, and we'd provide handy factory functions
>> for them.
>>
>> These were needed to complete the old Matrix class -- which is no longer
>> needed now that we have @ (i.e. a 2D array IS a matrix)
>>
>
> One problem with that approach is that `vrow @ vcol` has dimension 1 x 1,
> which is not a scalar.
>

I think it's not supposed to be a scalar, if @ breaks on scalars

`vrow @ vcol @ a`

Josef



>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndarray.T2 for 2D transpose

2016-04-07 Thread josef.pktd
On Thu, Apr 7, 2016 at 4:07 PM, Ian Henriksen <
insertinterestingnameh...@gmail.com> wrote:

> On Thu, Apr 7, 2016 at 1:53 PM  wrote:
>
>> On Thu, Apr 7, 2016 at 3:26 PM, Ian Henriksen <
>> insertinterestingnameh...@gmail.com> wrote:
>>
>>> On Thu, Apr 7, 2016 at 12:31 PM  wrote:
>>>
 write unit tests with non square 2d arrays and the exception / test
 error shows up fast.

 Josef


>>> Absolutely, but good programming practices don't totally obviate helpful
>>> error
>>> messages.
>>>
>>
>> The current behavior is perfectly well defined, and I don't want a lot of
>> warnings showing up because .T works suddenly only for ndim != 1.
>> I make lots of mistakes during programming. But shape mismatch are
>> usually very fast to catch.
>>
>> If you want safe programming, then force everyone to use only 2-D like in
>> matlab. It would have prevented me from making many mistakes.
>>
>> >>> np.array(1).T
>> array(1)
>>
>> another noop. Why doesn't it convert it to 2d?
>>
>> Josef
>>
>>
> I think we've misunderstood each other. Sorry if I was unclear. As I've
> understood the discussion thus far, "raising an error" refers to raising
> an error when
> a 1D array is passed used with the syntax a.T2 (for swapping the last two
> dimensions?). As far as whether or not a.T should raise an error for 1D
> arrays, that
> ship has definitely already sailed. I'm making the case that there's value
> in having
> an abbreviated syntax that helps prevent errors from accidentally using a
> 1D array,
> not that we should change the existing semantics.
>

Sorry, I misunderstood.

I'm not sure which case CHB initially meant.

Josef



>
> Cheers,
>
> -Ian
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndarray.T2 for 2D transpose

2016-04-07 Thread josef.pktd
On Thu, Apr 7, 2016 at 3:26 PM, Ian Henriksen <
insertinterestingnameh...@gmail.com> wrote:

> On Thu, Apr 7, 2016 at 12:31 PM  wrote:
>
>> write unit tests with non square 2d arrays and the exception / test error
>> shows up fast.
>>
>> Josef
>>
>>
> Absolutely, but good programming practices don't totally obviate helpful
> error
> messages.
>

The current behavior is perfectly well defined, and I don't want a lot of
warnings showing up because .T suddenly works only for ndim != 1.
I make lots of mistakes during programming, but shape mismatches are usually
very fast to catch.

If you want safe programming, then force everyone to use only 2-D like in
Matlab. It would have prevented me from making many mistakes.

>>> np.array(1).T
array(1)

another noop. Why doesn't it convert it to 2d?

Josef




>
> Best,
> -Ian
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndarray.T2 for 2D transpose

2016-04-07 Thread josef.pktd
On Thu, Apr 7, 2016 at 2:17 PM, Chris Barker  wrote:

> On Thu, Apr 7, 2016 at 10:00 AM, Ian Henriksen <
> insertinterestingnameh...@gmail.com> wrote:
>
>> Here's another example that I've seen catch people now and again.
>>
>> A = np.random.rand(100, 100)
>> b =  np.random.rand(10)
>> A * b.T
>>
>
> typo? that was supposed to be
>
> b =  np.random.rand(100). yes?
>
> This is exactly what someone else referred to as the expectations of
> someone that comes from MATLAB, and doesn't yet "get" that 1D arrays are 1D
> arrays.
>
> All of this is EXACTLY the motivation for the matrix class -- which never
> took off, and was never complete (it needed a row and column vector
> implementation, if you ask me). But I think the reason it didn't take off is
> that it really isn't that useful, but is different enough from regular
> arrays to be a greater source of confusion. And it was decided that all
> people REALLY wanted was an obvious way to get matrix multiply, which we
> now have with @.
>
> So this discussion brings up that we also need an easy an obvious way to
> make a column vector --
>
> maybe:
>
> np.col_vector(arr)
>
> which would be a synonym for np.reshape(arr, (-1,1))
>
> would that make anyone happy?
>
> NOTE: having transposing a 1D array raise an exception would help remove a
> lot  of the confusion, but it may be too late for that
>
>
> In this case the user pretty clearly meant to be broadcasting along the
>> rows of A
>> rather than along the columns, but the code fails silently.
>>
>
> hence the exception idea
>
> maybe a warning?
>

AFAIR, there is a lot of code that works correctly with .T being a no-op for
1d, e.g. the covariance matrix / inner product x.T dot y mentioned before.

Write unit tests with non-square 2d arrays and the exception / test error
shows up fast.

Josef



>
> -CHB
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndarray.T2 for 2D transpose

2016-04-07 Thread josef.pktd
On Thu, Apr 7, 2016 at 1:35 PM, Sebastian Berg 
wrote:

> On Do, 2016-04-07 at 13:29 -0400, josef.p...@gmail.com wrote:
> >
> >
> > On Thu, Apr 7, 2016 at 1:20 PM, Sebastian Berg <
> > sebast...@sipsolutions.net> wrote:
> > > On Do, 2016-04-07 at 11:56 -0400, josef.p...@gmail.com wrote:
> > > >
> > > >
> > >
> > > 
> > >
> > > >
> > > > I don't think numpy treats 1d arrays as row vectors. numpy has C
> > > > -order for axis preference which coincides in many cases with row
> > > > vector behavior.
> > > >
> > >
> > > Well, broadcasting rules, are that (n,) should typically behave
> > > similar
> > > to (1, n). However, for dot/matmul and @ the rules are stretched to
> > > mean "the one dimensional thing that gives an inner product" (using
> > > matmul since my python has no @ yet):
> > >
> > > In [12]: a = np.arange(20)
> > > In [13]: b = np.arange(20)
> > >
> > > In [14]: np.matmul(a, b)
> > > Out[14]: 2470
> > >
> > > In [15]: np.matmul(a, b[:, None])
> > > Out[15]: array([2470])
> > >
> > > In [16]: np.matmul(a[None, :], b)
> > > Out[16]: array([2470])
> > >
> > > In [17]: np.matmul(a[None, :], b[:, None])
> > > Out[17]: array([[2470]])
> > >
> > > which indeed gives us a fun thing, because if you look at the last
> > > line, the outer product equivalent would be:
> > >
> > > outer = np.matmul(a[None, :].T, b[:, None].T)
> > >
> > > Now if I go back to the earlier example:
> > >
> > > a.T @ b
> > >
> > > Does not achieve the outer product at all with using T2, since
> > >
> > > a.T2 @ b.T2  # only correct for a, but not for b
> > > a.T2 @ b  # b attempts to be "inner", so does not work
> > >
> > > It almost seems to me that the example is a counter example,
> > > because on
> > > first sight the `T2` attribute would still leave you with no
> > > shorthand
> > > for `b`.
> > a.T2 @ b.T2.T
> >
>
> Actually, better would be:
>
>   a.T2 @ b.T2.T2  # Aha?
>
> And true enough, that works, but is it still reasonably easy to find
> and understand?
> Or is it just frickeling around, the same as you would try `a[:, None]`
> before finding `a[None, :]`, maybe worse?
>

I had thought about it earlier, but it's "too cute" for my taste (and I
think I would complain if I saw this during code review).

Josef



>
> - Sebastian
>
> >
> > (T2 as shortcut for creating a[:, None] that's neat, except if a is
> > already 2D)
> >
> > Josef
> >
> > >
> > > I understand the pain of having to write (and parse get into the
> > > depth
> > > of) things like `arr[:, np.newaxis]` or reshape. I also understand
> > > the
> > > idea of a shorthand for vectorized matrix operations. That is, an
> > > argument for a T2 attribute which errors on 1D arrays (not sure I
> > > like
> > > it, but that is a different issue).
> > >
> > > However, it seems that implicit adding of an axis which only works
> > > half
> > > the time does not help too much? I have to admit I don't write
> > > these
> > > things too much, but I wonder if it would not help more if we just
> > > provided some better information/link to longer examples in the
> > > "dimension mismatch" error message?
> > >
> > > In the end it is quite simple, as Nathaniel, I think I would like
> > > to
> > > see some example code, where the code obviously looks easier then
> > > before? With the `@` operator that was the case, with the
> > > "dimension
> > > adding logic" I am not so sure, plus it seems it may add other
> > > pitfalls.
> > >
> > > - Sebastian
> > >
> > >
> > >
> > >
> > > > >>> np.concatenate(([[1,2,3]], [4,5,6]))
> > > > Traceback (most recent call last):
> > > >   File "", line 1, in 
> > > > np.concatenate(([[1,2,3]], [4,5,6]))
> > > > ValueError: arrays must have same number of dimensions
> > > >
> > > > It's not an uncommon exception for me.
> > > >
> > > > Josef
> > > >
> > > > >
> > > > > ___
> > > > > NumPy-Discussion mailing list
> > > > > NumPy-Discussion@scipy.org
> > > > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > > > >
> > > > ___
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion@scipy.org
> > > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > >
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@scipy.org
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndarray.T2 for 2D transpose

2016-04-07 Thread josef.pktd
On Thu, Apr 7, 2016 at 1:20 PM, Sebastian Berg 
wrote:

> On Do, 2016-04-07 at 11:56 -0400, josef.p...@gmail.com wrote:
> >
> >
>
> 
>
> >
> > I don't think numpy treats 1d arrays as row vectors. numpy has C
> > -order for axis preference which coincides in many cases with row
> > vector behavior.
> >
>
> Well, broadcasting rules, are that (n,) should typically behave similar
> to (1, n). However, for dot/matmul and @ the rules are stretched to
> mean "the one dimensional thing that gives an inner product" (using
> matmul since my python has no @ yet):
>
> In [12]: a = np.arange(20)
> In [13]: b = np.arange(20)
>
> In [14]: np.matmul(a, b)
> Out[14]: 2470
>
> In [15]: np.matmul(a, b[:, None])
> Out[15]: array([2470])
>
> In [16]: np.matmul(a[None, :], b)
> Out[16]: array([2470])
>
> In [17]: np.matmul(a[None, :], b[:, None])
> Out[17]: array([[2470]])
>
> which indeed gives us a fun thing, because if you look at the last
> line, the outer product equivalent would be:
>
> outer = np.matmul(a[None, :].T, b[:, None].T)
>
> Now if I go back to the earlier example:
>
> a.T @ b
>
> Does not achieve the outer product at all with using T2, since
>
> a.T2 @ b.T2  # only correct for a, but not for b
> a.T2 @ b  # b attempts to be "inner", so does not work
>


> It almost seems to me that the example is a counter example, because on
> first sight the `T2` attribute would still leave you with no shorthand
> for `b`.
>

a.T2 @ b.T2.T


(T2 as a shortcut for creating a[:, None] is neat, except if a is already
2D)

Josef


>
> I understand the pain of having to write (and parse get into the depth
> of) things like `arr[:, np.newaxis]` or reshape. I also understand the
> idea of a shorthand for vectorized matrix operations. That is, an
> argument for a T2 attribute which errors on 1D arrays (not sure I like
> it, but that is a different issue).
>
> However, it seems that implicit adding of an axis which only works half
> the time does not help too much? I have to admit I don't write these
> things too much, but I wonder if it would not help more if we just
> provided some better information/link to longer examples in the
> "dimension mismatch" error message?
>
> In the end it is quite simple, as Nathaniel, I think I would like to
> see some example code, where the code obviously looks easier then
> before? With the `@` operator that was the case, with the "dimension
> adding logic" I am not so sure, plus it seems it may add other
> pitfalls.
>
> - Sebastian
>
>
>
>
> > >>> np.concatenate(([[1,2,3]], [4,5,6]))
> > Traceback (most recent call last):
> >   File "", line 1, in 
> > np.concatenate(([[1,2,3]], [4,5,6]))
> > ValueError: arrays must have same number of dimensions
> >
> > It's not an uncommon exception for me.
> >
> > Josef
> >
> > >
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@scipy.org
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndarray.T2 for 2D transpose

2016-04-07 Thread josef.pktd
On Thu, Apr 7, 2016 at 11:42 AM, Todd  wrote:
> On Thu, Apr 7, 2016 at 11:35 AM,  wrote:
>>
>> On Thu, Apr 7, 2016 at 11:13 AM, Todd  wrote:
>> > On Wed, Apr 6, 2016 at 5:20 PM, Nathaniel Smith  wrote:
>> >>
>> >> On Wed, Apr 6, 2016 at 10:43 AM, Todd  wrote:
>> >> >
>> >> > My intention was to make linear algebra operations easier in numpy.
>> >> > With
>> >> > the @ operator available, it is now very easy to do basic linear
>> >> > algebra
>> >> > on
>> >> > arrays without needing the matrix class.  But getting an array into
a
>> >> > state
>> >> > where you can use the @ operator effectively is currently pretty
>> >> > verbose
>> >> > and
>> >> > confusing.  I was trying to find a way to make the @ operator more
>> >> > useful.
>> >>
>> >> Can you elaborate on what you're doing that you find verbose and
>> >> confusing, maybe paste an example? I've never had any trouble like
>> >> this doing linear algebra with @ or dot (which have similar semantics
>> >> for 1d arrays), which is probably just because I've had different use
>> >> cases, but it's much easier to talk about these things with a concrete
>> >> example in front of us to put everyone on the same page.
>> >>
>> >
>> > Let's say you want to do a simple matrix multiplication example.  You
>> > create
>> > two example arrays like so:
>> >
>> >a = np.arange(20)
>> >b = np.arange(10, 50, 10)
>> >
>> > Now you want to do
>> >
>> > a.T @ b
>> >
>> > First you need to turn a into a 2D array.  I can think of 10 ways to do
>> > this
>> > off the top of my head, and there may be more:
>> >
>> > 1a) a[:, None]
>> > 1b) a[None]
>> > 1c) a[None, :]
>> > 2a) a.shape = (1, -1)
>> > 2b) a.shape = (-1, 1)
>> > 3a) a.reshape(1, -1)
>> > 3b) a.reshape(-1, 1)
>> > 4a) np.reshape(a, (1, -1))
>> > 4b) np.reshape(a, (-1, 1))
>> > 5) np.atleast_2d(a)
>> >
>> > 5 is pretty clear, and will work fine with any number of dimensions,
but
>> > is
>> > also long to type out when trying to do a simple example.  The
different
>> > variants of 1, 2, 3, and 4, however, will only work with 1D arrays
>> > (making
>> > them less useful for functions), are not immediately obvious to me what
>> > the
>> > result will be (I always need to try it to make sure the result is what
>> > I
>> > expect), and are easy to get mixed up in my opinion.  They also require
>> > people keep a mental list of lots of ways to do what should be a very
>> > simple
>> > task.
>> >
>> > Basically, my argument here is the same as the argument from pep465 for
>> > the
>> > inclusion of the @ operator:
>> >
>> >
https://www.python.org/dev/peps/pep-0465/#transparent-syntax-is-especially-crucial-for-non-expert-programmers
>> >
>> > "A large proportion of scientific code is written by people who are
>> > experts
>> > in their domain, but are not experts in programming. And there are many
>> > university courses run each year with titles like "Data analysis for
>> > social
>> > scientists" which assume no programming background, and teach some
>> > combination of mathematical techniques, introduction to programming,
and
>> > the
>> > use of programming to implement these mathematical techniques, all
>> > within a
>> > 10-15 week period. These courses are more and more often being taught
in
>> > Python rather than special-purpose languages like R or Matlab.
>> >
>> > For these kinds of users, whose programming knowledge is fragile, the
>> > existence of a transparent mapping between formulas and code often
means
>> > the
>> > difference between succeeding and failing to write that code at all."
>>
>> This doesn't work because of the ambiguity between column and row vector.
>>
>> In most cases 1d vectors in statistics/econometrics are column
>> vectors. Sometime it takes me a long time to figure out whether an
>> author uses row or column vector for transpose.
>>
>> i.e. I often need x.T dot y   which works for 1d and 2d to produce
>> inner product.
>> but the outer product would require most of the time a column vector
>> so it's defined as x dot x.T.
>>
>> I think keeping around explicitly 2d arrays if necessary is less error
>> prone and confusing.
>>
>> But I wouldn't mind a shortcut for atleast_2d   (although more often I
>> need atleast_2dcol to translate formulas)
>>
>
> At least from what I have seen, in all cases in numpy where a 1D array is
> treated as a 2D array, it is always treated as a row vector, the examples
I
> can think of being atleast_2d, hstack, vstack, and dstack. So using this
> convention would be in line with how it is used elsewhere in numpy.

AFAIK, linear algebra works differently, 1-D is special

>>> xx = np.arange(20).reshape(4,5)
>>> yy = np.arange(4)
>>> xx.dot(yy)
Traceback (most recent call last):
  File "", line 1, in 
xx.dot(yy)
ValueError: objects are not aligned

>>> yy = np.arange(5)
>>> xx.dot(yy)
array([ 30,  80, 130, 180])

Re: [Numpy-discussion] ndarray.T2 for 2D transpose

2016-04-07 Thread josef.pktd
On Thu, Apr 7, 2016 at 11:13 AM, Todd  wrote:
> On Wed, Apr 6, 2016 at 5:20 PM, Nathaniel Smith  wrote:
>>
>> On Wed, Apr 6, 2016 at 10:43 AM, Todd  wrote:
>> >
>> > My intention was to make linear algebra operations easier in numpy.
>> > With
>> > the @ operator available, it is now very easy to do basic linear algebra
>> > on
>> > arrays without needing the matrix class.  But getting an array into a
>> > state
>> > where you can use the @ operator effectively is currently pretty verbose
>> > and
>> > confusing.  I was trying to find a way to make the @ operator more
>> > useful.
>>
>> Can you elaborate on what you're doing that you find verbose and
>> confusing, maybe paste an example? I've never had any trouble like
>> this doing linear algebra with @ or dot (which have similar semantics
>> for 1d arrays), which is probably just because I've had different use
>> cases, but it's much easier to talk about these things with a concrete
>> example in front of us to put everyone on the same page.
>>
>
> Let's say you want to do a simple matrix multiplication example.  You create
> two example arrays like so:
>
>a = np.arange(20)
>b = np.arange(10, 50, 10)
>
> Now you want to do
>
> a.T @ b
>
> First you need to turn a into a 2D array.  I can think of 10 ways to do this
> off the top of my head, and there may be more:
>
> 1a) a[:, None]
> 1b) a[None]
> 1c) a[None, :]
> 2a) a.shape = (1, -1)
> 2b) a.shape = (-1, 1)
> 3a) a.reshape(1, -1)
> 3b) a.reshape(-1, 1)
> 4a) np.reshape(a, (1, -1))
> 4b) np.reshape(a, (-1, 1))
> 5) np.atleast_2d(a)
>
> 5 is pretty clear, and will work fine with any number of dimensions, but is
> also long to type out when trying to do a simple example.  The different
> variants of 1, 2, 3, and 4, however, will only work with 1D arrays (making
> them less useful for functions), are not immediately obvious to me what the
> result will be (I always need to try it to make sure the result is what I
> expect), and are easy to get mixed up in my opinion.  They also require
> people keep a mental list of lots of ways to do what should be a very simple
> task.
>
> Basically, my argument here is the same as the argument from pep465 for the
> inclusion of the @ operator:
> https://www.python.org/dev/peps/pep-0465/#transparent-syntax-is-especially-crucial-for-non-expert-programmers
>
> "A large proportion of scientific code is written by people who are experts
> in their domain, but are not experts in programming. And there are many
> university courses run each year with titles like "Data analysis for social
> scientists" which assume no programming background, and teach some
> combination of mathematical techniques, introduction to programming, and the
> use of programming to implement these mathematical techniques, all within a
> 10-15 week period. These courses are more and more often being taught in
> Python rather than special-purpose languages like R or Matlab.
>
> For these kinds of users, whose programming knowledge is fragile, the
> existence of a transparent mapping between formulas and code often means the
> difference between succeeding and failing to write that code at all."

This doesn't work because of the ambiguity between column and row vector.

In most cases 1d vectors in statistics/econometrics are column vectors.
Sometimes it takes me a long time to figure out whether an author uses row
or column vectors for the transpose.

i.e. I often need x.T dot y, which works for 1d and 2d to produce the inner
product.
But the outer product requires, most of the time, a column vector, so it's
defined as x dot x.T.
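e.g. (example mine):

>>> import numpy as np
>>> x = np.arange(3.)
>>> y = np.arange(3.)
>>> x.T.dot(y)            # .T is a no-op on 1d, this is the inner product
5.0
>>> xc = x[:, None]       # explicit column vector
>>> xc.dot(xc.T)          # outer product
array([[ 0.,  0.,  0.],
       [ 0.,  1.,  2.],
       [ 0.,  2.,  4.]])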

I think keeping explicitly 2d arrays around where necessary is less error
prone and less confusing.

But I wouldn't mind a shortcut for atleast_2d (although more often I need an
atleast_2dcol to translate formulas).

Josef

>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Make np.bincount output same dtype as weights

2016-03-26 Thread josef.pktd
On Sat, Mar 26, 2016 at 9:54 PM, Joseph Fox-Rabinovitz
 wrote:
> Would it make sense to just make the output type large enough to hold the
> cumulative sum of the weights?
>
>
> - Joseph Fox-Rabinovitz
>
> -- Original message--
>
> From: Jaime Fernández del Río
>
> Date: Sat, Mar 26, 2016 16:16
>
> To: Discussion of Numerical Python;
>
> Subject:[Numpy-discussion] Make np.bincount output same dtype as weights
>
> Hi all,
>
> I have just submitted a PR (#7464) that fixes an enhancement request
> (#6854), making np.bincount return an array of the same type as the weights
> parameter.  This is an important deviation from current behavior, which
> always casts weights to double, and always returns a double array, so I
> would like to hear what others think about the worthiness of this.  Main
> discussion points:
>
> np.bincount now works with complex weights (yay!), I guess this should be a
> pretty uncontroversial enhancement.
> The return is of the same type as weights, which means that small integers
> are very likely to overflow.  This is exactly what #6854 requested, but
> perhaps we should promote the output for integers to a long, as we do in
> np.sum?

I always thought of bincount with weights just as a group-by sum. So
it would be easier to remember and have fewer surprises if it matches
the behavior of np.sum.
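i.e. something like (example mine):

>>> import numpy as np
>>> np.bincount([0, 0, 1], weights=[0.5, 0.25, 2.0])   # sum of weights per bin
array([ 0.75,  2.  ])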

> Boolean arrays stay boolean, and OR, rather than sum, the weights. Is this
> what one would want? If we decide that integer promotion is the way to go,
> perhaps booleans should go in the same pack?

Isn't this already calculating the sum, i.e. the count of True values per
group?
Based on a quick example with numpy 1.9.2 -- I don't think I have ever used
bool weights before.


> This new implementation currently supports all of the reasonable native
> types, but has no fallback for user defined types.  I guess we should
> attempt to cast the array to double as before if no native loop can be
> found? It would be good to have a way of testing this though, any thoughts
> on how to go about this?
> Does a behavior change like this require some deprecation period? What would
> that look like?
> I have also added broadcasting of weights to the full size of list, so that
> one can do e.g. np.bincount([1, 2, 3], weights=2j) without having to tile
> the single weight to the size of the bins list.
>
> Any other thoughts are very welcome as well!

(2-D weights ?)


Josef


>
> Jaime
>
> --
> (__/)
> ( O.o)
> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de
> dominación mundial.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changes to generalized ufunc core dimension checking

2016-03-19 Thread josef.pktd
On Thu, Mar 17, 2016 at 1:08 AM, Steve Waterbury 
wrote:

> On 03/16/2016 10:32 PM, Fernando Perez wrote:
>
>> On Wed, Mar 16, 2016 at 3:45 PM, Steve Waterbury
>> > wrote:
>>
>> On 03/16/2016 06:28 PM, Nathaniel Smith wrote:
>>
>> ... Sounds like a real deprecation cycle would have been better.
>>
>>
>> IMHO for a library as venerable and widely-used as Numpy, a
>> deprecation cycle is almost always better ... consider this a
>> lesson learned.
>>
>>
>> Mandatory XKCD - https://xkcd.com/1172
>>
>> We recently had a discussion about a similar "nobody we know uses nor
>> should use this api" situation in IPython, and ultimately decided that
>> xkcd 1172 would hit us, so opted in this case just for creating new
>> cleaner APIs + possibly doing slow deprecation of the old stuff.
>>
>> For a widely used library, if the code exists then someone, somewhere
>> depends on it, regardless of how broken or obscure you think the feature
>> is. We just have to live with that.
>>
>
> Ha, I love that xkcd!  But not sure I agree that it applies here ...
> however, I do appreciate your sharing it.  :D
>
> I mean, just change stuff and see who screams, right?  ;)


No, it's change stuff and listen to whether anybody screams.


I'm sometimes late in (politely) screaming because deprecation warnings are
either filtered out, or I'm using an ancient numpy in my development python, or
for whatever other reason I don't see the warnings.

Josef



>
>
> Steve
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Windows wheels, built, but should we deploy?

2016-03-04 Thread josef.pktd
On Fri, Mar 4, 2016 at 1:38 PM, Matthew Brett 
wrote:

> On Fri, Mar 4, 2016 at 12:29 AM, David Cournapeau 
> wrote:
> >
> >
> > On Fri, Mar 4, 2016 at 4:42 AM, Matthew Brett 
> > wrote:
> >>
> >> Hi,
> >>
> >> Summary:
> >>
> >> I propose that we upload Windows wheels to pypi.  The wheels are
> >> likely to be stable and relatively easy to maintain, but will have
> >> slower performance than other versions of numpy linked against faster
> >> BLAS / LAPACK libraries.
> >>
> >> Background:
> >>
> >> There's a long discussion going on at issue github #5479 [1], where
> >> the old problem of Windows wheels for numpy came up.
> >>
> >> For those of you not following this issue, the current situation for
> >> community-built numpy Windows binaries is dire:
> >>
> >> * We have not so far provided windows wheels on pypi, so `pip install
> >> numpy` on Windows will bring you a world of pain;
> >> * Until recently we did provide .exe "superpack" installers on
> >> sourceforge, but these became increasingly difficult to build and we
> >> gave up building them as of the latest (1.10.4) release.
> >>
> >> Despite this, popularity of Windows wheels on pypi is high.   A few
> >> weeks ago, Donald Stufft ran a query for the binary wheels most often
> >> downloaded from pypi, for any platform [2] . The top five most
> >> downloaded were (n_downloads, name):
> >>
> >> 6646,
> >>
> numpy-1.10.4-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
> >> 5445, cryptography-1.2.1-cp27-none-win_amd64.whl
> >> 5243, matplotlib-1.4.0-cp34-none-win32.whl
> >> 5241, scikit_learn-0.15.1-cp34-none-win32.whl
> >> 4573, pandas-0.17.1-cp27-none-win_amd64.whl
> >>
> >> So a) the OSX numpy wheel is very popular and b) despite the fact that
> >> we don't provide a numpy wheel for Windows, matplotlib, sckit_learn
> >> and pandas, that depend on numpy, are the 3rd, 4th and 5th most
> >> downloaded wheels as of a few weeks ago.
> >>
> >> So, there seems to be a large appetite for numpy wheels.
> >>
> >> Current proposal:
> >>
> >> I have now built numpy wheels, using the ATLAS blas / lapack library -
> >> the build is automatic and reproducible [3].
> >>
> >> I chose ATLAS to build against, rather than, say OpenBLAS, because
> >> we've had some significant worries in the past about the reliability
> >> of OpenBLAS, and I thought it better to err on the side of
> >> correctness.
> >>
> >> However, these builds are relatively slow for matrix multiply and
> >> other linear algebra routines compared numpy built against OpenBLAS or
> >> MKL (which we cannot use because of its license) [4].   In my very
> >> crude array test of a dot product and matrix inversion, the ATLAS
> >> wheels were 2-3 times slower than MKL.  Other benchmarks on Julia
> >> found about the same result for ATLAS vs OpenBLAS on 32-bit bit, but a
> >> much bigger difference on 64-bit (for an earlier version of ATLAS than
> >> we are currently using) [5].
> >>
> >> So, our numpy wheels likely to be stable and give correct results, but
> >> will be somewhat slow for linear algebra.
> >
> >
> > I would not worry too much about this: at worst, this gives us back the
> > situation where we were w/ so-called superpack, which have been
> successful
> > in the past to spread numpy use on windows.
> >
> > My main worry is whether this locks us into ATLAS  for a long time
> because
> > of package depending on numpy blas/lapack (scipy, scikit learn). I am not
> > sure how much this is the case.
>
> You mean the situation where other packages try to find the BLAS /
> LAPACK library and link against that?   My impression was that neither
> scipy or scikit-learn do that at the moment, but I'm happy to be
> corrected.
>
> You'd know better than me about this, but my understanding is that
> BLAS / LAPACK has a standard interface that should allow code to run
> the same way, regardless of which BLAS / LAPACK library it is linking
> to.  So, even if another package is trying to link against the numpy
> BLAS, swapping the numpy BLAS library shouldn't cause a problem
> (unless the package is trying to link to ATLAS-specific stuff, which
> seems a bit unlikely).
>
> Is that right?
>


AFAIK, numpy doesn't provide access to BLAS/LAPACK. scipy does. statsmodels
is linking to the installed BLAS/LAPACK in cython code through scipy. So
far we haven't seen problems with different versions. I think scipy
development works very well to isolate linalg library version specific
parts from the user interface.

AFAIU, the main problem will be linking to inconsistent Fortran libraries
in downstream packages that use Fortran.
E.g., AFAIU it won't work to pip install an ATLAS-based numpy and then install
an MKL-based scipy from Gohlke.

I don't know if there is a useful error message, or if this just results in
puzzled users.

Josef



>
> Cheers,
>
> Matthew
> 

Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-19 Thread josef.pktd
On Fri, Feb 19, 2016 at 12:08 PM, Allan Haldane 
wrote:

> I also want to add a historical note here, that 'groupby' has been
> discussed a couple times before.
>
> Travis Oliphant even made an NEP for it, and Wes McKinney lightly hinted
> at adding it to numpy.
>
> http://thread.gmane.org/gmane.comp.python.numeric.general/37480/focus=37480
> http://thread.gmane.org/gmane.comp.python.numeric.general/38272/focus=38299
> http://docs.scipy.org/doc/numpy-1.10.1/neps/groupby_additions.html
>
> Travis's idea for a ufunc method 'reduceby' is more along the lines of
> what I was originally thinking. Just musing about it, it might cover few
> small cases pandas groupby might not: It could work on arbitrary ufuncs,
> and over particular axes of multidimensional data. Eg, to sum over
> pixels from NxNx3 image data. But maybe pandas can cover the
> multidimensional case through additional index columns or with Panel.
>

xarray is now covering that area.

There are also the recfunctions in numpy.lib that never got a lot of attention
and expansion.
There were plans to cover more of the matplotlib (mlab) versions in numpy, but I
have no idea and didn't check what happened to that.

Josef



>
> Cheers,
> Allan
>
> On 02/15/2016 05:31 PM, Paul Hobson wrote:
> > Just for posterity -- any future readers to this thread who need to do
> > pandas-like on record arrays should look at matplotlib's mlab submodule.
> >
> > I've been in situations (::cough:: Esri production ::cough::) where I've
> > had one hand tied behind my back and unable to install pandas. mlab was
> > a big help there.
> >
> > https://goo.gl/M7Mi8B
> >
> > -paul
> >
> >
> >
> > On Mon, Feb 15, 2016 at 1:28 PM, Lluís Vilanova  > > wrote:
> >
> > Benjamin Root writes:
> >
> > > Seems like you are talking about xarray:
> https://github.com/pydata/xarray
> >
> > Oh, I wasn't aware of xarray, but there's also this:
> >
> >
> >
> https://people.gso.ac.upc.edu/vilanova/doc/sciexp2/user_guide/data.html#basic-indexing
> >
> >
> https://people.gso.ac.upc.edu/vilanova/doc/sciexp2/user_guide/data.html#dimension-oblivious-indexing
> >
> >
> > Cheers,
> >   Lluis
> >
> >
> >
> > > Cheers!
> > > Ben Root
> >
> > > On Fri, Feb 12, 2016 at 9:40 AM, Sérgio  > > wrote:
> >
> > > Hello,
> >
> >
> > > This is my first e-mail, I will try to make the idea simple.
> >
> >
> > > Similar to masked array it would be interesting to use a label
> > array to
> > > guide operations.
> >
> >
> > > Ex.:
> >  x
> > > labelled_array(data =
> >
> > > [[0 1 2]
> > > [3 4 5]
> > > [6 7 8]],
> > > label =
> > > [[0 1 2]
> > > [0 1 2]
> > > [0 1 2]])
> >
> >
> >  sum(x)
> > > array([9, 12, 15])
> >
> >
> > > The operations would create a new axis for label indexing.
> >
> >
> > > You could think of it as a collection of masks, one for each
> > label.
> >
> >
> > > I don't know a way to make something like this efficiently
> > without a loop.
> > > Just wondering...
> >
> >
> > > Sérgio.
> >
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@scipy.org 
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
> >
> >
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@scipy.org 
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org 
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
> >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-18 Thread josef.pktd
On Thu, Feb 18, 2016 at 1:15 PM, Antony Lee  wrote:

> Mostly so that there is no performance lost when someone passes range(...)
> instead of np.arange(...).  At least I had never realized that one is much
> faster than the other and always just passed range() as a convenience.
>
> Antony
>
> 2016-02-17 10:50 GMT-08:00 Chris Barker :
>
>> On Sun, Feb 14, 2016 at 11:41 PM, Antony Lee 
>> wrote:
>>
>>> So how can np.array(range(...)) even work?
>>>
>>
>> range()  (in py3) is not a generator, nor is is a iterator. it is a range
>> object, which is lazily evaluated, and satisfies both the iterator protocol
>> and the sequence protocol (at least most of it:
>>
>> In [*1*]: r = range(10)
>>
>
Thanks, I didn't know that.

The range object r here doesn't get consumed by iterating through it,
while
r = (i for i in range(5))
is only good for a single pass.

(tried on python 3.4)
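
i.e., roughly:

r = range(5)
list(r), list(r)          # ([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])  reusable

g = (i for i in range(5))
list(g), list(g)          # ([0, 1, 2, 3, 4], [])  exhausted after one pass

import numpy as np
np.array(r)               # array([0, 1, 2, 3, 4]), works every time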

Josef



>
>> In [*2*]: r[3]
>>
>> Out[*2*]: 3
>>
>>
>> In [*3*]: len(r)
>>
>> Out[*3*]: 10
>>
>>
>> In [*4*]: type(r)
>>
>> Out[*4*]: range
>>
>> In [*9*]: isinstance(r, collections.abc.Sequence)
>>
>> Out[*9*]: True
>>
>> In [*10*]: l = list()
>>
>> In [*11*]: isinstance(l, collections.abc.Sequence)
>>
>> Out[*11*]: True
>>
>> In [*12*]: isinstance(r, collections.abc.Iterable)
>>
>> Out[*12*]: True
>> I'm still totally confused as to why we'd need to special-case range when
>> we have arange().
>>
>> -CHB
>>
>>
>>
>> --
>>
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR(206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115   (206) 526-6317   main reception
>>
>> chris.bar...@noaa.gov
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread josef.pktd
On Wed, Feb 17, 2016 at 7:17 PM, Juan Nunez-Iglesias 
wrote:

> Ah! Touché! =) My last and admittedly weak defense is that I've been
> writing numpy since before 1.7. =)
>
> On Thu, Feb 18, 2016 at 11:08 AM, Alan Isaac  wrote:
>
>> On 2/17/2016 7:01 PM, Juan Nunez-Iglesias wrote:
>>
>>> Notice the limitation "1D array-like".
>>>
>>
>>
>>
>> http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.random.choice.html
>> "If an int, the random sample is generated as if a was np.arange(n)"
>>
>
(un)related aside:
my R doc quote about "may lead to undesired behaviour" refers to this;
IIRC, R's `sample` was the inspiration for this function.

But numpy distinguishes scalars from one-element (1D) arrays:

>>> for i in range(3, 10): np.random.choice(np.arange(10)[i:])
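
i.e. a one-element array is still treated as the population to draw from,
while a plain int n is shorthand for arange(n):

>>> np.random.choice(np.array([5]))   # always 5, the array is the population
>>> np.random.choice(5)               # one of 0, 1, 2, 3, 4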

Josef



>
>> hth,
>>
>> Alan Isaac
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread josef.pktd
On Wed, Feb 17, 2016 at 3:58 PM, G Young  wrote:

> I sense that this issue is now becoming more of "randint has become too
> complicated"  I suppose we could always "add" more functions that present
> simpler interfaces, though if you really do want simple, there's always
> Python's random library you can use.
>
> On Wed, Feb 17, 2016 at 8:48 PM, Robert Kern 
> wrote:
>
>> On Wed, Feb 17, 2016 at 8:43 PM, G Young  wrote:
>>
>> > Josef: I don't think we are making people think more.  They're all
>> keyword arguments, so if you don't want to think about them, then you leave
>> them as the defaults, and everyone is happy.
>>
>> I believe that Josef has the code's reader in mind, not the code's
>> writer. As a reader of other people's code (and I count 6-months-ago-me as
>> one such "other people"), I am sure to eventually encounter all of the
>> different variants, so I will need to know all of them.
>>
>

I have mostly the users in mind (i.e. me).

I like simple patterns where I don't have to stare at a docstring for five
minutes to understand it, or pull it up again each time I use it.

dtype for storage is different from dtype as a distribution parameter.


---
aside, since I just read this
https://news.ycombinator.com/item?id=2763

an example of what to avoid:
you save a few keystrokes and then spend months trying to figure out what's
going on.
(exaggerated)

"*Note* that this convenience feature may lead to undesired behaviour when
..." from R docs

Josef




>
>> --
>> Robert Kern
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread josef.pktd
On Wed, Feb 17, 2016 at 2:20 PM,  wrote:

>
>
> On Wed, Feb 17, 2016 at 2:09 PM, G Young  wrote:
>
>> Yes, you are correct in explaining my intentions.  However, as I also
>> mentioned in the PR discussion, I did not quite understand how your wrapper
>> idea would make things any more comprehensive at the cost of additional
>> overhead and complexity.  What do you mean by making the functions
>> "consistent" (i.e. outline the behavior *exactly* depending on the
>> inputs)?  As I've explained before, and I will state it again, the
>> different behavior for the high=None and low != None case is due to
>> backwards compatibility.
>>
>
>
> One problem is that if there is only one positional argument, then I can
> still figure out that it might have different meanings.
> If there are two keywords, then I would assume standard python argument
> interpretation applies.
>
> If I want to save on typing, then I think it should be for a more
> "standard" case. (I also never sample all real numbers, at least not
> uniformly.)
>


One more thing I don't like:

So far all distributions are "theoretical" distributions where the
distribution depends on the provided shape, location and scale parameters.
There is a limitation in how they are represented as numbers/dtype and what
range is possible. However, that is not relevant for most use cases.

In this case you are promoting `dtype` from a memory or storage parameter
to an actual shape (or loc and scale) parameter.
That's "weird", and even more so if this would be the default behavior.

There is no proper uniform distribution on all integers. So, this forces
users to think about implementation details like dtype, when I just want
a random sample from a probability distribution.
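
To make the two roles concrete (just an illustration of the distinction,
using the dtype keyword from the PR discussion, not a statement about the
final API):

# dtype as a storage parameter: the distribution is fixed by low/high,
# dtype only controls how the draws are stored
np.random.randint(0, 100, size=5, dtype=np.int8)

# dtype as a distribution parameter (the part I find "weird"): the bounds
# themselves come from the dtype, so changing dtype changes the support
lo, hi = np.iinfo(np.int8).min, np.iinfo(np.int8).max
np.random.randint(lo, int(hi) + 1, size=5)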

Josef


>
> Josef
>
>
>
>>
>> On Wed, Feb 17, 2016 at 6:52 PM, Joseph Fox-Rabinovitz <
>> jfoxrabinov...@gmail.com> wrote:
>>
>>> On Wed, Feb 17, 2016 at 1:37 PM,   wrote:
>>> >
>>> >
>>> > On Wed, Feb 17, 2016 at 10:01 AM, G Young  wrote:
>>> >>
>>> >> Hello all,
>>> >>
>>> >> I have a PR open here that makes "low" an optional parameter in
>>> >> numpy.randint and introduces new behavior into the API as follows:
>>> >>
>>> >> 1) `low == None` and `high == None`
>>> >>
>>> >> Numbers are generated over the range `[lowbnd, highbnd)`, where
>>> `lowbnd =
>>> >> np.iinfo(dtype).min`, and `highbnd = np.iinfo(dtype).max`, where
>>> `dtype` is
>>> >> the provided integral type.
>>> >>
>>> >> 2) `low != None` and `high == None`
>>> >>
>>> >> If `low >= 0`, numbers are still generated over the range `[0,
>>> >> low)`, but if `low` < 0, numbers are generated over the range `[low,
>>> >> highbnd)`, where `highbnd` is defined as above.
>>> >>
>>> >> 3) `low == None` and `high != None`
>>> >>
>>> >> Numbers are generated over the range `[lowbnd, high)`, where `lowbnd`
>>> is
>>> >> defined as above.
>>> >
>>> >
>>> > My impression (*) is that this will be confusing, and uses a default
>>> that I
>>> > never ever needed.
>>> >
>>> > Maybe a better way would be to use low=-np.inf and high=np.inf  where
>>> inf
>>> > would be interpreted as the smallest and largest representable number.
>>> And
>>> > leave the defaults unchanged.
>>> >
>>> > (*) I didn't try to understand how it works for various cases.
>>> >
>>> > Josef
>>> >
>>>
>>> As I mentioned on the PR discussion, the thing that bothers me is the
>>> inconsistency between the new and the old functionality, specifically
>>> in #2. If high is, the behavior is completely different depending on
>>> the value of `low`. Using `np.inf` instead of `None` may fix that,
>>> although I think that the author's idea was to avoid having to type
>>> the bounds in the `None`/`+/-np.inf` cases. I think that a better
>>> option is to have a separate wrapper to `randint` that implements this
>>> behavior in a consistent manner and leaves the current function
>>> consistent as well.
>>>
>>> -Joe
>>>
>>>
>>> >
>>> >
>>> >>
>>> >>
>>> >> The primary motivation was the second case, as it is more convenient
>>> to
>>> >> specify a 'dtype' by itself when generating such numbers in a similar
>>> vein
>>> >> to numpy.empty, except with initialized values.
>>> >>
>>> >> Looking forward to your feedback!
>>> >>
>>> >> Greg
>>> >>
>>> >> ___
>>> >> NumPy-Discussion mailing list
>>> >> NumPy-Discussion@scipy.org
>>> >> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>> >>
>>> >
>>> >
>>> > ___
>>> > NumPy-Discussion mailing list
>>> > NumPy-Discussion@scipy.org
>>> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>> >
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> 

Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread josef.pktd
On Wed, Feb 17, 2016 at 2:09 PM, G Young  wrote:

> Yes, you are correct in explaining my intentions.  However, as I also
> mentioned in the PR discussion, I did not quite understand how your wrapper
> idea would make things any more comprehensive at the cost of additional
> overhead and complexity.  What do you mean by making the functions
> "consistent" (i.e. outline the behavior *exactly* depending on the
> inputs)?  As I've explained before, and I will state it again, the
> different behavior for the high=None and low != None case is due to
> backwards compatibility.
>


One problem is that if there is only one positional argument, then I can
still figure out that it might have different meanings.
If there are two keywords, then I would assume standard python argument
interpretation applies.

If I want to save on typing, then I think it should be for a more
"standard" case. (I also never sample all real numbers, at least not
uniformly.)

Josef



>
> On Wed, Feb 17, 2016 at 6:52 PM, Joseph Fox-Rabinovitz <
> jfoxrabinov...@gmail.com> wrote:
>
>> On Wed, Feb 17, 2016 at 1:37 PM,   wrote:
>> >
>> >
>> > On Wed, Feb 17, 2016 at 10:01 AM, G Young  wrote:
>> >>
>> >> Hello all,
>> >>
>> >> I have a PR open here that makes "low" an optional parameter in
>> >> numpy.randint and introduces new behavior into the API as follows:
>> >>
>> >> 1) `low == None` and `high == None`
>> >>
>> >> Numbers are generated over the range `[lowbnd, highbnd)`, where
>> `lowbnd =
>> >> np.iinfo(dtype).min`, and `highbnd = np.iinfo(dtype).max`, where
>> `dtype` is
>> >> the provided integral type.
>> >>
>> >> 2) `low != None` and `high == None`
>> >>
>> >> If `low >= 0`, numbers are still generated over the range `[0,
>> >> low)`, but if `low` < 0, numbers are generated over the range `[low,
>> >> highbnd)`, where `highbnd` is defined as above.
>> >>
>> >> 3) `low == None` and `high != None`
>> >>
>> >> Numbers are generated over the range `[lowbnd, high)`, where `lowbnd`
>> is
>> >> defined as above.
>> >
>> >
>> > My impression (*) is that this will be confusing, and uses a default
>> that I
>> > never ever needed.
>> >
>> > Maybe a better way would be to use low=-np.inf and high=np.inf  where
>> inf
>> > would be interpreted as the smallest and largest representable number.
>> And
>> > leave the defaults unchanged.
>> >
>> > (*) I didn't try to understand how it works for various cases.
>> >
>> > Josef
>> >
>>
>> As I mentioned on the PR discussion, the thing that bothers me is the
>> inconsistency between the new and the old functionality, specifically
>> in #2. If high is, the behavior is completely different depending on
>> the value of `low`. Using `np.inf` instead of `None` may fix that,
>> although I think that the author's idea was to avoid having to type
>> the bounds in the `None`/`+/-np.inf` cases. I think that a better
>> option is to have a separate wrapper to `randint` that implements this
>> behavior in a consistent manner and leaves the current function
>> consistent as well.
>>
>> -Joe
>>
>>
>> >
>> >
>> >>
>> >>
>> >> The primary motivation was the second case, as it is more convenient to
>> >> specify a 'dtype' by itself when generating such numbers in a similar
>> vein
>> >> to numpy.empty, except with initialized values.
>> >>
>> >> Looking forward to your feedback!
>> >>
>> >> Greg
>> >>
>> >> ___
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion@scipy.org
>> >> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >>
>> >
>> >
>> > ___
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread josef.pktd
On Wed, Feb 17, 2016 at 10:01 AM, G Young  wrote:

> Hello all,
>
> I have a PR open here  that
> makes "low" an optional parameter in numpy.randint and introduces new
> behavior into the API as follows:
>
> 1) `low == None` and `high == None`
>
> Numbers are generated over the range `[lowbnd, highbnd)`, where `lowbnd =
> np.iinfo(dtype).min`, and `highbnd = np.iinfo(dtype).max`, where `dtype` is
> the provided integral type.
>
> 2) `low != None` and `high == None`
>
> If `low >= 0`, numbers are still generated over the range `[0,
> low)`, but if `low` < 0, numbers are generated over the range `[low,
> highbnd)`, where `highbnd` is defined as above.
>
> 3) `low == None` and `high != None`
>
> Numbers are generated over the range `[lowbnd, high)`, where `lowbnd` is
> defined as above.
>

My impression (*) is that this will be confusing, and uses a default that I
never ever needed.

Maybe a better way would be to use low=-np.inf and high=np.inf  where inf
would be interpreted as the smallest and largest representable number. And
leave the defaults unchanged.

(*) I didn't try to understand how it works for various cases.
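
A rough sketch of the idea (a hypothetical wrapper, names and defaults made
up; not the actual randint implementation):

import numpy as np

def randint_inf(low=-np.inf, high=np.inf, size=None, dtype=np.int16):
    # -inf/inf are read as the smallest/largest representable value of dtype;
    # finite bounds behave like np.random.randint.  (This sketch ignores the
    # detail that randint's `high` is exclusive, so info.max itself is left out.)
    info = np.iinfo(dtype)
    lo = info.min if low == -np.inf else int(low)
    hi = info.max if high == np.inf else int(high)
    return np.random.randint(lo, hi, size=size)

randint_inf(dtype=np.int16, size=3)      # full int16 range (minus the max)
np.random.randint(0, 10, size=3)         # the standard call stays unchanged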

Josef




>
> The primary motivation was the second case, as it is more convenient to
> specify a 'dtype' by itself when generating such numbers in a similar vein
> to numpy.empty, except with initialized values.
>
> Looking forward to your feedback!
>
> Greg
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal to add `weights` to `np.percentile` and `np.median`

2016-02-16 Thread josef.pktd
On Tue, Feb 16, 2016 at 2:48 PM, Joseph Fox-Rabinovitz <
jfoxrabinov...@gmail.com> wrote:

> Please correct me if I misunderstood, but the code in that commit is
> doing a full sort, somewhat similar to what
> `scipy.stats.scoreatpercentile`. If that is correct, I will run some
> benchmarks first, but I think there is value to going forward with a
> numpy version that extends the current partitioning scheme.
>

I think so, but it's hiding inside pandas groupby, which also uses a hash,
IIUC.
AFAICS, the main reason it's implemented this way is to get correct tie
handling.

There could be large performance differences depending on whether there are
many ties (discretized data) or only unique floats.

(just guessing)

Josef



>
> - Joe
>
> On Tue, Feb 16, 2016 at 2:39 PM,   wrote:
> >
> >
> > On Tue, Feb 16, 2016 at 1:41 PM, Joseph Fox-Rabinovitz
> >  wrote:
> >>
> >> Thanks for pointing me to that. I had something a bit different in
> >> mind but that definitely looks like a good start.
> >>
> >> On Tue, Feb 16, 2016 at 1:32 PM, Antony Lee 
> >> wrote:
> >> > See earlier discussion here:
> https://github.com/numpy/numpy/issues/6326
> >> > Basically, naïvely sorting may be faster than a not-so-optimized
> version
> >> > of
> >> > quickselect.
> >> >
> >> > Antony
> >> >
> >> > 2016-02-15 21:49 GMT-08:00 Joseph Fox-Rabinovitz
> >> > :
> >> >>
> >> >> I would like to add a `weights` keyword to `np.partition`,
> >> >> `np.percentile` and `np.median`. My reason for doing so is to to
> allow
> >> >> `np.histogram` to process automatic bin selection with weights.
> >> >> Currently, weights are not supported for the automatic bin selection
> >> >> and would be difficult to support in `auto` mode without having
> >> >> `np.percentile` support a `weights` keyword. I suspect that there are
> >> >> many other uses for such a feature.
> >> >>
> >> >> I have taken a preliminary look at the C implementation of the
> >> >> partition functions that are the basis for `partition`, `median` and
> >> >> `percentile`. I think that it would be possible to add versions (or
> >> >> just extend the functionality of existing ones) that check the ratio
> >> >> of the weights below the partition point to the total sum of the
> >> >> weights instead of just counting elements.
> >> >>
> >> >> One of the main advantages of such an implementation is that it would
> >> >> allow any real weights to be handled correctly, not just integers.
> >> >> Complex weights would not be supported.
> >> >>
> >> >> The purpose of this email is to see if anybody objects, has ideas or
> >> >> cares at all about this proposal before I spend a significant amount
> >> >> of time working on it. For example, did I miss any functions in my
> >> >> list?
> >> >>
> >> >> Regards,
> >> >>
> >> >> -Joe
> >> >> ___
> >> >> NumPy-Discussion mailing list
> >> >> NumPy-Discussion@scipy.org
> >> >> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >> >
> >> >
> >> >
> >> > ___
> >> > NumPy-Discussion mailing list
> >> > NumPy-Discussion@scipy.org
> >> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >> >
> >> ___
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion@scipy.org
> >> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
> >
> > statsmodels just got weighted quantiles
> > https://github.com/statsmodels/statsmodels/pull/2707
> >
> > I didn't try to figure out it's computational efficiency, and we would
> > gladly delegate to whatever fast algorithm would be in numpy.
> >
> > Josef
> >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal to add `weights` to `np.percentile` and `np.median`

2016-02-16 Thread josef.pktd
On Tue, Feb 16, 2016 at 1:41 PM, Joseph Fox-Rabinovitz <
jfoxrabinov...@gmail.com> wrote:

> Thanks for pointing me to that. I had something a bit different in
> mind but that definitely looks like a good start.
>
> On Tue, Feb 16, 2016 at 1:32 PM, Antony Lee 
> wrote:
> > See earlier discussion here: https://github.com/numpy/numpy/issues/6326
> > Basically, naïvely sorting may be faster than a not-so-optimized version
> of
> > quickselect.
> >
> > Antony
> >
> > 2016-02-15 21:49 GMT-08:00 Joseph Fox-Rabinovitz <
> jfoxrabinov...@gmail.com>:
> >>
> >> I would like to add a `weights` keyword to `np.partition`,
> >> `np.percentile` and `np.median`. My reason for doing so is to to allow
> >> `np.histogram` to process automatic bin selection with weights.
> >> Currently, weights are not supported for the automatic bin selection
> >> and would be difficult to support in `auto` mode without having
> >> `np.percentile` support a `weights` keyword. I suspect that there are
> >> many other uses for such a feature.
> >>
> >> I have taken a preliminary look at the C implementation of the
> >> partition functions that are the basis for `partition`, `median` and
> >> `percentile`. I think that it would be possible to add versions (or
> >> just extend the functionality of existing ones) that check the ratio
> >> of the weights below the partition point to the total sum of the
> >> weights instead of just counting elements.
> >>
> >> One of the main advantages of such an implementation is that it would
> >> allow any real weights to be handled correctly, not just integers.
> >> Complex weights would not be supported.
> >>
> >> The purpose of this email is to see if anybody objects, has ideas or
> >> cares at all about this proposal before I spend a significant amount
> >> of time working on it. For example, did I miss any functions in my
> >> list?
> >>
> >> Regards,
> >>
> >> -Joe
> >> ___
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion@scipy.org
> >> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>


statsmodels just got weighted quantiles
https://github.com/statsmodels/statsmodels/pull/2707

I didn't try to figure out its computational efficiency, and we would
gladly delegate to whatever fast algorithm would be in numpy.
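
For reference, the basic idea is just a weighted version of the usual
sort-and-interpolate step; a minimal sketch (not the statsmodels
implementation, and with no attention to tie handling or speed):

import numpy as np

def weighted_quantile(x, weights, q):
    # sort values, accumulate normalized weights, interpolate at q in [0, 1]
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order]
    cw = (np.cumsum(w) - 0.5 * w) / w.sum()   # "midpoint" cumulative weights
    return np.interp(q, cw, x)

x = [1., 2., 3., 4.]
weighted_quantile(x, [1, 1, 1, 1], 0.5)   # 2.5, same as np.percentile(x, 50)
weighted_quantile(x, [1, 1, 1, 5], 0.5)   # 3.5, pulled up by the large weight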

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy 1.11.0b3 released.

2016-02-15 Thread josef.pktd
On Tue, Feb 16, 2016 at 12:09 AM,  wrote:

>
>
> On Mon, Feb 15, 2016 at 11:31 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Mon, Feb 15, 2016 at 9:15 PM,  wrote:
>>
>>>
>>>
>>> On Mon, Feb 15, 2016 at 11:05 PM, Charles R Harris <
>>> charlesr.har...@gmail.com> wrote:
>>>


 On Mon, Feb 15, 2016 at 8:50 PM,  wrote:

>
>
> On Mon, Feb 15, 2016 at 10:46 PM,  wrote:
>
>
>>
>> On Fri, Feb 12, 2016 at 4:19 PM, Nathan Goldbaum <
>> nathan12...@gmail.com> wrote:
>>
>>>
>>> https://github.com/numpy/numpy/blob/master/doc/release/1.11.0-notes.rst
>>>
>>> On Fri, Feb 12, 2016 at 3:17 PM, Andreas Mueller 
>>> wrote:
>>>
 Hi.
 Where can I find the changelog?
 It would be good for us to know which changes are done one purpos
 without hunting through the issue tracker.

 Thanks,
 Andy


 On 02/09/2016 09:09 PM, Charles R Harris wrote:

 Hi All,

 I'm pleased to announce the release of NumPy 1.11.0b3. This beta
 contains additional bug fixes as well as limiting the number of
 FutureWarnings raised by assignment to masked array slices. One issue 
 that
 remains to be decided is whether or not to postpone raising an error 
 for
 floats used as indexes. Sources may be found on Sourceforge
  and
 both sources and OS X wheels are availble on pypi. Please test, 
 hopefully
 this will be that last beta needed.

 As a note on problems encountered, twine uploads continue to fail
 for me, but there are still variations to try. The wheeluploader 
 downloaded
 wheels as it should, but could not upload them, giving the error 
 message
 "HTTPError: 413 Client Error: Request Entity Too Large for url:
 https://www.python.org/pypi;. Firefox
 also complains that http://wheels.scipy.org is incorrectly
 configured with an invalid certificate.

 Enjoy,

 Chuck


 ___
 NumPy-Discussion mailing 
 listNumPy-Discussion@scipy.orghttps://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 https://mail.scipy.org/mailman/listinfo/numpy-discussion


>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
> (try to send again)
>
>
>>
>> another indexing question:  (not covered by unit test but showed up
>> in examples in statsmodels)
>>
>>
>> This works in numpy at least 1.9.2 and 1.6.1   (python 2.7, and
>> python 3.4)
>>
>> >>> list(range(5))[np.array([0])]
>> 0
>>
>>
>>
>> on numpy 0.11.0b2   (I'm not yet at b3)   (python 3.4)
>>
>> I get the same exception as here but even if there is just one element
>>
>>
>> >>> list(range(5))[np.array([0, 1])]
>> Traceback (most recent call last):
>>   File "", line 1, in 
>> list(range(5))[np.array([0, 1])]
>> TypeError: only integer arrays with one element can be converted to
>> an index
>>
>
 Looks like a misleading error message. Apparently it requires scalar
 arrays (ndim == 0)

 In [3]: list(range(5))[np.array(0)]
 Out[3]: 0

>>>
>>>
>>> We have a newer version of essentially same function a second time that
>>> uses squeeze and that seems to work fine.
>>>
>>> Just to understand
>>>
>>> Why does this depend on the numpy version?  I would have understood that
>>> this always failed, but this code worked for several years.
>>> https://github.com/statsmodels/statsmodels/issues/2817
>>>
>>
>> It's part of the indexing cleanup.
>>
>> In [2]: list(range(5))[np.array([0])]
>> /home/charris/.local/bin/ipython:1: VisibleDeprecationWarning: converting
>> an array with ndim > 0 to an index will result in an error in the future
>>   #!/usr/bin/python
>> Out[2]: 0
>>
>> The use of multidimensional arrays as indexes is likely a coding error.
>> Or so we hope...
>>
>
> Thanks for the explanation
>
>
> Or, it forces everyone to watch out for the color of the ducks :)
>
> It's just a number, whether it's python scalar, numpy scalar, 1D or 2D.
> And once we squeeze, we cannot iterate over it anymore.
>
>
> This looks like the last problem with have in 

Re: [Numpy-discussion] NumPy 1.11.0b3 released.

2016-02-15 Thread josef.pktd
On Mon, Feb 15, 2016 at 11:31 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:

>
>
> On Mon, Feb 15, 2016 at 9:15 PM,  wrote:
>
>>
>>
>> On Mon, Feb 15, 2016 at 11:05 PM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>>
>>>
>>> On Mon, Feb 15, 2016 at 8:50 PM,  wrote:
>>>


 On Mon, Feb 15, 2016 at 10:46 PM,  wrote:


>
> On Fri, Feb 12, 2016 at 4:19 PM, Nathan Goldbaum <
> nathan12...@gmail.com> wrote:
>
>>
>> https://github.com/numpy/numpy/blob/master/doc/release/1.11.0-notes.rst
>>
>> On Fri, Feb 12, 2016 at 3:17 PM, Andreas Mueller 
>> wrote:
>>
>>> Hi.
>>> Where can I find the changelog?
>>> It would be good for us to know which changes are done one purpos
>>> without hunting through the issue tracker.
>>>
>>> Thanks,
>>> Andy
>>>
>>>
>>> On 02/09/2016 09:09 PM, Charles R Harris wrote:
>>>
>>> Hi All,
>>>
>>> I'm pleased to announce the release of NumPy 1.11.0b3. This beta
>>> contains additional bug fixes as well as limiting the number of
>>> FutureWarnings raised by assignment to masked array slices. One issue 
>>> that
>>> remains to be decided is whether or not to postpone raising an error for
>>> floats used as indexes. Sources may be found on Sourceforge
>>>  and
>>> both sources and OS X wheels are availble on pypi. Please test, 
>>> hopefully
>>> this will be that last beta needed.
>>>
>>> As a note on problems encountered, twine uploads continue to fail
>>> for me, but there are still variations to try. The wheeluploader 
>>> downloaded
>>> wheels as it should, but could not upload them, giving the error message
>>> "HTTPError: 413 Client Error: Request Entity Too Large for url:
>>> https://www.python.org/pypi;. Firefox
>>> also complains that http://wheels.scipy.org is incorrectly
>>> configured with an invalid certificate.
>>>
>>> Enjoy,
>>>
>>> Chuck
>>>
>>>
>>> ___
>>> NumPy-Discussion mailing 
>>> listNumPy-Discussion@scipy.orghttps://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
 (try to send again)


>
> another indexing question:  (not covered by unit test but showed up in
> examples in statsmodels)
>
>
> This works in numpy at least 1.9.2 and 1.6.1   (python 2.7, and python
> 3.4)
>
> >>> list(range(5))[np.array([0])]
> 0
>
>
>
> on numpy 0.11.0b2   (I'm not yet at b3)   (python 3.4)
>
> I get the same exception as here but even if there is just one element
>
>
> >>> list(range(5))[np.array([0, 1])]
> Traceback (most recent call last):
>   File "", line 1, in 
> list(range(5))[np.array([0, 1])]
> TypeError: only integer arrays with one element can be converted to an
> index
>

>>> Looks like a misleading error message. Apparently it requires scalar
>>> arrays (ndim == 0)
>>>
>>> In [3]: list(range(5))[np.array(0)]
>>> Out[3]: 0
>>>
>>
>>
>> We have a newer version of essentially same function a second time that
>> uses squeeze and that seems to work fine.
>>
>> Just to understand
>>
>> Why does this depend on the numpy version?  I would have understood that
>> this always failed, but this code worked for several years.
>> https://github.com/statsmodels/statsmodels/issues/2817
>>
>
> It's part of the indexing cleanup.
>
> In [2]: list(range(5))[np.array([0])]
> /home/charris/.local/bin/ipython:1: VisibleDeprecationWarning: converting
> an array with ndim > 0 to an index will result in an error in the future
>   #!/usr/bin/python
> Out[2]: 0
>
> The use of multidimensional arrays as indexes is likely a coding error. Or
> so we hope...
>

Thanks for the explanation


Or, it forces everyone to watch out for the color of the ducks :)

It's just a number, whether it's a python scalar, a numpy scalar, 1D or 2D.
And once we squeeze, we cannot iterate over it anymore.
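
i.e. roughly (with the 1.11 behavior; treat the exact deprecation details
as approximate):

import numpy as np

lst = list(range(5))
idx = np.where(np.arange(5) == 3)[0]   # array([3]), shape (1,)

lst[idx[0]]              # fine: numpy integer scalar
lst[idx.item()]          # fine: plain python int
lst[np.squeeze(idx)]     # fine: a 0-d array still works as an index
# lst[idx]               # ndim > 0 array as an index: deprecated / error

# but after squeezing we cannot iterate anymore:
# for i in np.squeeze(idx): ...   -> TypeError: iteration over a 0-d array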


This looks like the last problem we have in statsmodels master.
Part of the reason that 0.10 hurt quite a bit is that in statsmodels we are
using some of the grey zones, so we don't have to commit to a specific
usage. Even if a user or developer tries a "weird" case, it works for most
of the 

Re: [Numpy-discussion] NumPy 1.11.0b3 released.

2016-02-15 Thread josef.pktd
On Mon, Feb 15, 2016 at 10:46 PM,  wrote:

>
>
> On Fri, Feb 12, 2016 at 4:19 PM, Nathan Goldbaum 
> wrote:
>
>> https://github.com/numpy/numpy/blob/master/doc/release/1.11.0-notes.rst
>>
>> On Fri, Feb 12, 2016 at 3:17 PM, Andreas Mueller 
>> wrote:
>>
>>> Hi.
>>> Where can I find the changelog?
>>> It would be good for us to know which changes are done one purpos
>>> without hunting through the issue tracker.
>>>
>>> Thanks,
>>> Andy
>>>
>>>
>>> On 02/09/2016 09:09 PM, Charles R Harris wrote:
>>>
>>> Hi All,
>>>
>>> I'm pleased to announce the release of NumPy 1.11.0b3. This beta
>>> contains additional bug fixes as well as limiting the number of
>>> FutureWarnings raised by assignment to masked array slices. One issue that
>>> remains to be decided is whether or not to postpone raising an error for
>>> floats used as indexes. Sources may be found on Sourceforge
>>>  and both
>>> sources and OS X wheels are availble on pypi. Please test, hopefully this
>>> will be that last beta needed.
>>>
>>> As a note on problems encountered, twine uploads continue to fail for
>>> me, but there are still variations to try. The wheeluploader downloaded
>>> wheels as it should, but could not upload them, giving the error message
>>> "HTTPError: 413 Client Error: Request Entity Too Large for url:
>>> https://www.python.org/pypi;. Firefox also
>>> complains that http://wheels.scipy.org is incorrectly configured with
>>> an invalid certificate.
>>>
>>> Enjoy,
>>>
>>> Chuck
>>>
>>>
>>> ___
>>> NumPy-Discussion mailing 
>>> listNumPy-Discussion@scipy.orghttps://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
(try to send again)


>
> another indexing question:  (not covered by unit test but showed up in
> examples in statsmodels)
>
>
> This works in numpy at least 1.9.2 and 1.6.1   (python 2.7, and python 3.4)
>
> >>> list(range(5))[np.array([0])]
> 0
>
>
>
> on numpy 0.11.0b2   (I'm not yet at b3)   (python 3.4)
>
> I get the same exception as here but even if there is just one element
>
>
> >>> list(range(5))[np.array([0, 1])]
> Traceback (most recent call last):
>   File "", line 1, in 
> list(range(5))[np.array([0, 1])]
> TypeError: only integer arrays with one element can be converted to an
> index
>
>
> the actual code uses pop on a python list with a return from
> np.where(...)[0]   that returns a one element int64 array
>
> Josef
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-14 Thread josef.pktd
On Sun, Feb 14, 2016 at 3:21 AM, Antony Lee  wrote:

> re: no reason why...
> This has nothing to do with Python2/Python3 (I personally stopped using
> Python2 at least 3 years ago.)  Let me put it this way instead: if
> Python3's "range" (or Python2's "xrange") was not a builtin type but a type
> provided by numpy, I don't think it would be controversial at all to
> provide an `__array__` special method to efficiently convert it to a
> ndarray.  It would be the same if `np.array` used a
> `functools.singledispatch` dispatcher rather than an `__array__` special
> method (which is obviously not possible for chronological reasons).
>
>
But numpy does provide arange.
What's the reason for not using np.arange, and using an iterator instead?



> re: iterable vs iterator: check for the presence of the __next__ special
> method (or isinstance(x, Iterable) vs. isinstance(x, Iterator) and not
> isinstance(x, Iterable))
>

AFAIR and from spot checking the mailing list, in the past the argument was
that it's too complicated to mix array/asarray creation with fromiter
building of arrays.

(I have no idea if array could cheaply delegate to fromiter.)
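
i.e. the explicit spellings for this use case would be, roughly:

import numpy as np

n = 1000
a = np.arange(n, dtype=np.int64)                    # no python-level iteration
b = np.fromiter(range(n), dtype=np.int64, count=n)  # explicit iterator path
c = np.array(list(range(n)), dtype=np.int64)        # build a list first

assert (a == b).all() and (a == c).all()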


Josef



>
> Antony
>
>
> 2016-02-13 18:48 GMT-08:00 :
>
>>
>>
>> On Sat, Feb 13, 2016 at 9:43 PM,  wrote:
>>
>>>
>>>
>>> On Sat, Feb 13, 2016 at 8:57 PM, Antony Lee 
>>> wrote:
>>>
 Compare (on Python3 -- for Python2, read "xrange" instead of "range"):

 In [2]: %timeit np.array(range(100), np.int64)
 10 loops, best of 3: 156 ms per loop

 In [3]: %timeit np.arange(100, dtype=np.int64)
 1000 loops, best of 3: 853 µs per loop


 Note that while iterating over a range is not very fast, it is still
 much better than the array creation:

 In [4]: from collections import deque

 In [5]: %timeit deque(range(100), 1)
 10 loops, best of 3: 25.5 ms per loop


 On one hand, special cases are awful. On the other hand, the range
 builtin is probably important enough to deserve a special case to make this
 construction faster. Or not? I initially opened this as
 https://github.com/numpy/numpy/issues/7233 but it was suggested there
 that this should be discussed on the ML first.

 (The real issue which prompted this suggestion: I was building sparse
 matrices using scipy.sparse.csc_matrix with some indices specified using
 range, and that construction step turned out to take a significant portion
 of the time because of the calls to np.array).

>>>
>>>
>>> IMO: I don't see a reason why this should be supported. There is
>>> np.arange after all for this usecase, and from_iter.
>>> range and the other guys are iterators, and in several cases we can use
>>> larange = list(range(...)) as a short cut to get python list.for python 2/3
>>> compatibility.
>>>
>>> I think this might be partially a learning effect in the python 2 to 3
>>> transition. After using almost only python 3 for maybe a year, I don't
>>> think it's difficult to remember the differences when writing code that is
>>> py 2.7 and py 3.x compatible.
>>>
>>>
>>> It's just **another** thing to watch out for if milliseconds matter in
>>> your application.
>>>
>>
>>
>> side question: Is there a simple way to distinguish a iterator or
>> generator from an iterable data structure?
>>
>> Josef
>>
>>
>>
>>>
>>> Josef
>>>
>>>

 Antony

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 https://mail.scipy.org/mailman/listinfo/numpy-discussion


>>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-13 Thread josef.pktd
On Sat, Feb 13, 2016 at 1:01 PM, Allan Haldane 
wrote:

> Sorry, to reply to myself here, but looking at it with fresh eyes maybe
> the performance of the naive version isn't too bad. Here's a comparison of
> the naive vs a better implementation:
>
> def split_classes_naive(c, v):
> return [v[c == u] for u in unique(c)]
>
> def split_classes(c, v):
> perm = c.argsort()
> csrt = c[perm]
> div = where(csrt[1:] != csrt[:-1])[0] + 1
> return [v[x] for x in split(perm, div)]
>
> >>> c = randint(0,32,size=10)
> >>> v = arange(10)
> >>> %timeit split_classes_naive(c,v)
> 100 loops, best of 3: 8.4 ms per loop
> >>> %timeit split_classes(c,v)
> 100 loops, best of 3: 4.79 ms per loop
>

The usecases I recently started to target for similar things are 1 Million
or more rows and 1 uniques in the labels.
The second version should be faster for a large number of uniques, I guess.

Overall numpy is falling far behind pandas in terms of simple groupby
operations. bincount and histogram (IIRC) worked for some cases but are
rather limited.

reduceat looks nice for cases where it applies.
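
e.g. a group-by sum via reduceat, once the labels are sorted (a sketch of
the kind of building block I mean, not tuned or benchmarked):

import numpy as np

c = np.random.randint(0, 32, size=10000)     # labels
v = np.arange(10000, dtype=float)            # values

perm = np.argsort(c, kind='mergesort')
csrt, vsrt = c[perm], v[perm]
starts = np.r_[0, np.where(csrt[1:] != csrt[:-1])[0] + 1]

group_sums = np.add.reduceat(vsrt, starts)   # one sum per unique label
labels = csrt[starts]                        # the label each sum belongs to

assert np.allclose(group_sums, [v[c == u].sum() for u in labels])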

In contrast to the full sized labels in the original post, I only know of
applications where the labels are 1-D corresponding to rows or columns.

Josef



>
> In any case, maybe it is useful to Sergio or others.
>
> Allan
>
>
> On 02/13/2016 12:11 PM, Allan Haldane wrote:
>
>> I've had a pretty similar idea for a new indexing function
>> 'split_classes' which would help in your case, which essentially does
>>
>>  def split_classes(c, v):
>>  return [v[c == u] for u in unique(c)]
>>
>> Your example could be coded as
>>
>>  >>> [sum(c) for c in split_classes(label, data)]
>>  [9, 12, 15]
>>
>> I feel I've come across the need for such a function often enough that
>> it might be generally useful to people as part of numpy. The
>> implementation of split_classes above has pretty poor performance
>> because it creates many temporary boolean arrays, so my plan for a PR
>> was to have a speedy version of it that uses a single pass through v.
>> (I often wanted to use this function on large datasets).
>>
>> If anyone has any comments on the idea (good idea. bad idea?) I'd love
>> to hear.
>>
>> I have some further notes and examples here:
>> https://gist.github.com/ahaldane/1e673d2fe6ffe0be4f21
>>
>> Allan
>>
>> On 02/12/2016 09:40 AM, Sérgio wrote:
>>
>>> Hello,
>>>
>>> This is my first e-mail, I will try to make the idea simple.
>>>
>>> Similar to masked array it would be interesting to use a label array to
>>> guide operations.
>>>
>>> Ex.:
>>>  >>> x
>>> labelled_array(data =
>>>   [[0 1 2]
>>>   [3 4 5]
>>>   [6 7 8]],
>>>  label =
>>>   [[0 1 2]
>>>   [0 1 2]
>>>   [0 1 2]])
>>>
>>>  >>> sum(x)
>>> array([9, 12, 15])
>>>
>>> The operations would create a new axis for label indexing.
>>>
>>> You could think of it as a collection of masks, one for each label.
>>>
>>> I don't know a way to make something like this efficiently without a
>>> loop. Just wondering...
>>>
>>> Sérgio.
>>>
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Suggestion] Labelled Array

2016-02-13 Thread josef.pktd
On Sat, Feb 13, 2016 at 1:42 PM, Jeff Reback  wrote:

> These operations get slower as the number of groups increase, but with a
> faster function (e.g. the standard ones which are cythonized), the
> constant on
> the increase is pretty low.
>
> In [23]: c = np.random.randint(0,1,size=10)
>
> In [24]: df = DataFrame({'v' : v, 'c' : c})
>
> In [25]: %timeit df.groupby('c').count()
> 100 loops, best of 3: 3.18 ms per loop
>
> In [26]: len(df.groupby('c').count())
> Out[26]: 1
>
> In [27]: df.groupby('c').count()
> Out[27]:
>v
> c
> 0  9
> 1 11
> 2  7
> 3  8
> 4 16
> ...   ..
> 9995  11
> 9996  13
> 9997  13
> 9998   7
>   10
>
> [1 rows x 1 columns]
>
>
One other difference across usecases is whether this is a single operation,
or we want to optimize the data format for a large number of different
calculations.  (We have both cases in statsmodels.)

In the latter case it's worth spending some extra computational effort on
rearranging the data to be either sorted or in lists of arrays
(I'm guessing, without having done any timings).

Josef




>
> On Sat, Feb 13, 2016 at 1:39 PM, Jeff Reback  wrote:
>
>> In [10]: pd.options.display.max_rows=10
>>
>> In [13]: np.random.seed(1234)
>>
>> In [14]: c = np.random.randint(0,32,size=10)
>>
>> In [15]: v = np.arange(10)
>>
>> In [16]: df = DataFrame({'v' : v, 'c' : c})
>>
>> In [17]: df
>> Out[17]:
>> c  v
>> 0  15  0
>> 1  19  1
>> 2   6  2
>> 3  21  3
>> 4  12  4
>> ........
>> 5   7  5
>> 6   2  6
>> 7  27  7
>> 8  28  8
>> 9   7  9
>>
>> [10 rows x 2 columns]
>>
>> In [19]: df.groupby('c').count()
>> Out[19]:
>>v
>> c
>> 0   3136
>> 1   3229
>> 2   3093
>> 3   3121
>> 4   3041
>> ..   ...
>> 27  3128
>> 28  3063
>> 29  3147
>> 30  3073
>> 31  3090
>>
>> [32 rows x 1 columns]
>>
>> In [20]: %timeit df.groupby('c').count()
>> 100 loops, best of 3: 2 ms per loop
>>
>> In [21]: %timeit df.groupby('c').mean()
>> 100 loops, best of 3: 2.39 ms per loop
>>
>> In [22]: df.groupby('c').mean()
>> Out[22]:
>>v
>> c
>> 0   49883.384885
>> 1   50233.692165
>> 2   48634.116069
>> 3   50811.743992
>> 4   50505.368629
>> ..   ...
>> 27  49715.349425
>> 28  50363.501469
>> 29  50485.395933
>> 30  50190.155223
>> 31  50691.041748
>>
>> [32 rows x 1 columns]
>>
>>
>> On Sat, Feb 13, 2016 at 1:29 PM,  wrote:
>>
>>>
>>>
>>> On Sat, Feb 13, 2016 at 1:01 PM, Allan Haldane 
>>> wrote:
>>>
 Sorry, to reply to myself here, but looking at it with fresh eyes maybe
 the performance of the naive version isn't too bad. Here's a comparison of
 the naive vs a better implementation:

 def split_classes_naive(c, v):
 return [v[c == u] for u in unique(c)]

 def split_classes(c, v):
 perm = c.argsort()
 csrt = c[perm]
 div = where(csrt[1:] != csrt[:-1])[0] + 1
 return [v[x] for x in split(perm, div)]

 >>> c = randint(0,32,size=10)
 >>> v = arange(10)
 >>> %timeit split_classes_naive(c,v)
 100 loops, best of 3: 8.4 ms per loop
 >>> %timeit split_classes(c,v)
 100 loops, best of 3: 4.79 ms per loop

>>>
>>> The usecases I recently started to target for similar things is 1
>>> Million or more rows and 1 uniques in the labels.
>>> The second version should be faster for large number of uniques, I guess.
>>>
>>> Overall numpy is falling far behind pandas in terms of simple groupby
>>> operations. bincount and histogram (IIRC) worked for some cases but are
>>> rather limited.
>>>
>>> reduce_at looks nice for cases where it applies.
>>>
>>> In contrast to the full sized labels in the original post, I only know
>>> of applications where the labels are 1-D corresponding to rows or columns.
>>>
>>> Josef
>>>
>>>
>>>

 In any case, maybe it is useful to Sergio or others.

 Allan


 On 02/13/2016 12:11 PM, Allan Haldane wrote:

> I've had a pretty similar idea for a new indexing function
> 'split_classes' which would help in your case, which essentially does
>
>  def split_classes(c, v):
>  return [v[c == u] for u in unique(c)]
>
> Your example could be coded as
>
>  >>> [sum(c) for c in split_classes(label, data)]
>  [9, 12, 15]
>
> I feel I've come across the need for such a function often enough that
> it might be generally useful to people as part of numpy. The
> implementation of split_classes above has pretty poor performance
> because it creates many temporary boolean arrays, so my plan for a PR
> was to have a speedy version of it that uses a single pass through v.
> (I often wanted to use this function on large datasets).
>
> If anyone has any comments on the idea (good idea. bad idea?) I'd love

Re: [Numpy-discussion] ANN: numpydoc 0.6.0 released

2016-02-13 Thread josef.pktd
On Sat, Feb 13, 2016 at 10:03 AM, Ralf Gommers 
wrote:

> Hi all,
>
> I'm pleased to announce the release of numpydoc 0.6.0. The main new
> feature is support for the Yields section in numpy-style docstrings. This
> is described in
> https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
>
> Numpydoc can be installed from PyPi: https://pypi.python.org/pypi/numpydoc
>


Thanks,

BTW: the status section in the howto still refers to the documentation
editor, which has been retired AFAIK.

Josef



>
>
> Cheers,
> Ralf
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-13 Thread josef.pktd
On Sat, Feb 13, 2016 at 9:43 PM,  wrote:

>
>
> On Sat, Feb 13, 2016 at 8:57 PM, Antony Lee 
> wrote:
>
>> Compare (on Python3 -- for Python2, read "xrange" instead of "range"):
>>
>> In [2]: %timeit np.array(range(100), np.int64)
>> 10 loops, best of 3: 156 ms per loop
>>
>> In [3]: %timeit np.arange(100, dtype=np.int64)
>> 1000 loops, best of 3: 853 µs per loop
>>
>>
>> Note that while iterating over a range is not very fast, it is still much
>> better than the array creation:
>>
>> In [4]: from collections import deque
>>
>> In [5]: %timeit deque(range(100), 1)
>> 10 loops, best of 3: 25.5 ms per loop
>>
>>
>> On one hand, special cases are awful. On the other hand, the range
>> builtin is probably important enough to deserve a special case to make this
>> construction faster. Or not? I initially opened this as
>> https://github.com/numpy/numpy/issues/7233 but it was suggested there
>> that this should be discussed on the ML first.
>>
>> (The real issue which prompted this suggestion: I was building sparse
>> matrices using scipy.sparse.csc_matrix with some indices specified using
>> range, and that construction step turned out to take a significant portion
>> of the time because of the calls to np.array).
>>
>
>
> IMO: I don't see a reason why this should be supported. There is np.arange
> after all for this use case, and np.fromiter.
> range and the other guys are iterators, and in several cases we can use
> larange = list(range(...)) as a shortcut to get a Python list for Python 2/3
> compatibility.
>
> I think this might be partially a learning effect in the python 2 to 3
> transition. After using almost only python 3 for maybe a year, I don't
> think it's difficult to remember the differences when writing code that is
> py 2.7 and py 3.x compatible.
>
>
> It's just **another** thing to watch out for if milliseconds matter in
> your application.
>


side question: Is there a simple way to distinguish an iterator or generator
from an iterable data structure?

Josef
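
For reference, one common check is that an iterator returns itself from iter(),
while an iterable container such as range does not (a sketch, Python 3 only):

import collections.abc

def is_iterator(obj):
    # an iterator is its own iterator; a re-iterable container
    # such as a list or a range object is not
    return iter(obj) is obj

print(is_iterator(iter([1, 2, 3])))                    # True
print(is_iterator(x * x for x in range(3)))            # True (generator)
print(is_iterator(range(3)))                           # False (re-iterable)
print(isinstance(range(3), collections.abc.Iterator))  # False, same answer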



>
> Josef
>
>
>>
>> Antony
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-13 Thread josef.pktd
On Sat, Feb 13, 2016 at 8:57 PM, Antony Lee  wrote:

> Compare (on Python3 -- for Python2, read "xrange" instead of "range"):
>
> In [2]: %timeit np.array(range(100), np.int64)
> 10 loops, best of 3: 156 ms per loop
>
> In [3]: %timeit np.arange(100, dtype=np.int64)
> 1000 loops, best of 3: 853 µs per loop
>
>
> Note that while iterating over a range is not very fast, it is still much
> better than the array creation:
>
> In [4]: from collections import deque
>
> In [5]: %timeit deque(range(100), 1)
> 10 loops, best of 3: 25.5 ms per loop
>
>
> On one hand, special cases are awful. On the other hand, the range builtin
> is probably important enough to deserve a special case to make this
> construction faster. Or not? I initially opened this as
> https://github.com/numpy/numpy/issues/7233 but it was suggested there
> that this should be discussed on the ML first.
>
> (The real issue which prompted this suggestion: I was building sparse
> matrices using scipy.sparse.csc_matrix with some indices specified using
> range, and that construction step turned out to take a significant portion
> of the time because of the calls to np.array).
>


IMO: I don't see a reason why this should be supported. There is np.arange
after all for this use case, and np.fromiter.
range and the other guys are iterators, and in several cases we can use
larange = list(range(...)) as a shortcut to get a Python list for Python 2/3
compatibility.

I think this might be partially a learning effect in the python 2 to 3
transition. After using almost only python 3 for maybe a year, I don't
think it's difficult to remember the differences when writing code that is
py 2.7 and py 3.x compatible.


It's just **another** thing to watch out for if milliseconds matter in your
application.

Josef
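
For reference, a sketch of the alternatives mentioned above (np.arange for the
arithmetic-sequence case, np.fromiter for a general iterable); relative timings
will vary:

import numpy as np

n = 10**6
a1 = np.arange(n, dtype=np.int64)                     # fast path for this use case
a2 = np.fromiter(range(n), dtype=np.int64, count=n)   # general iterable -> array
a3 = np.array(range(n), np.int64)                     # the slow path discussed above
assert (a1 == a2).all() and (a2 == a3).all()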


>
> Antony
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy 1.11.0b3 released.

2016-02-10 Thread josef.pktd
On Wed, Feb 10, 2016 at 5:36 PM, Charles R Harris  wrote:

>
>
> On Wed, Feb 10, 2016 at 2:58 PM, Pauli Virtanen  wrote:
>
>> 10.02.2016, 04:09, Charles R Harris kirjoitti:
>> > I'm pleased to announce the release of NumPy 1.11.0b3. This beta
>> contains
>> [clip]
>> > Please test, hopefully this will be that last beta needed.
>>
>> FWIW, https://travis-ci.org/pv/testrig/builds/108384173
>
>
> Thanks Pauli, very interesting.
>


Thanks Pauli, me too

is this intended?:

return np.r_[[np.nan] * head, x, [np.nan] * tail]
TypeError: 'numpy.float64' object cannot be interpreted as an index


In the old times of Python 2.x, statsmodels avoided integers so we wouldn't
get accidental integer division.
Python wanted float() everywhere. Looks like numpy wants int() everywhere.
(fixed in statsmodels master)
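
A sketch of the kind of explicit cast that resolves it (the helper name is made
up, not the actual statsmodels code):

import numpy as np

def pad_nan(x, head, tail):
    # list repetition and np.r_ want real Python ints for the pad lengths,
    # not numpy floats, so cast explicitly
    head, tail = int(head), int(tail)
    return np.r_[[np.nan] * head, x, [np.nan] * tail]

print(pad_nan(np.arange(3.0), np.float64(2), np.float64(1)))
# [ nan  nan   0.   1.   2.  nan]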


Josef






>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy 1.11.0b2 released

2016-02-05 Thread josef.pktd
On Fri, Feb 5, 2016 at 3:24 PM,  wrote:

>
>
> On Fri, Feb 5, 2016 at 1:14 PM, Pauli Virtanen  wrote:
>
>> 05.02.2016, 19:55, Nathaniel Smith kirjoitti:
>> > On Feb 5, 2016 8:28 AM, "Chris Barker - NOAA Federal" <
>> chris.bar...@noaa.gov>
>> > wrote:
>> >>
>> >>> An extra ~2 hours of tests / 6-way parallelism is not that big a deal
>> >>> in the grand scheme of things (and I guess it's probably less than
>> >>> that if we can take advantage of existing binary builds)
>> >>
>> >> If we set up a numpy-testing conda channel, it could be used to cache
>> >> binary builds for all he versions of everything we want to test
>> >> against.
>> >>
>> >> Conda-build-all could make it manageable to maintain that channel.
>> >
>> > What would be the advantage of maintaining that channel ourselves
>> instead
>> > of using someone else's binary builds that already exist (e.g.
>> Anaconda's,
>> > or official project wheels)?
>>
>> ABI compatibility. However, as I understand it, not many backward ABI
>> incompatible changes in Numpy are not expected in future.
>>
>> If they were, I note that if you work in the same environment, you can
>> push repeated compilation times to zero compared to the time it takes to
>> run tests in a way that requires less configuration, by enabling
>> ccache/f90cache.
>>
>
>
> control of fortran compiler and libraries
>
> I was just looking at some new test errors on TravisCI in unchanged code
> of statsmodels, and it looks like conda switched from openblas to mkl
> yesterday.
>
> (statsmodels doesn't care when compiling which BLAS/LAPACK is used as long
> as they work because we don't have Fortran code.)
>
> Josef
>
>
(sending again, delivery refused)


>
>
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Behavior of np.random.uniform

2016-01-19 Thread josef.pktd
On Tue, Jan 19, 2016 at 12:43 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:

>
>
> On Tue, Jan 19, 2016 at 10:42 AM, Robert Kern 
> wrote:
>
>> On Tue, Jan 19, 2016 at 5:40 PM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>> >
>> > On Tue, Jan 19, 2016 at 10:36 AM, Robert Kern 
>> wrote:
>> >>
>> >> On Tue, Jan 19, 2016 at 5:27 PM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>> >> >
>> >>
>> >> > On Tue, Jan 19, 2016 at 9:23 AM, Chris Barker - NOAA Federal <
>> chris.bar...@noaa.gov> wrote:
>> >> >>
>> >> >> What does the standard lib do for randrange? I see that randint is
>> closed on both ends, so order doesn't matter, though if it raises for b < a then that's a precedent we could follow.
>> >> >
>> >> > randint is not closed on the high end. The now deprecated
>> random_integers is the function that does that.
>> >> >
>> >> > For floats, it's good to have various interval options. For
>> instance, in generating numbers that will be inverted or have their log
>> taken it is good to avoid zero. However, the names 'low' and 'high' are
>> misleading...
>> >>
>> >> They are correctly leading the users to the manner in which the author
>> intended the function to be used. The *implementation* is misleading by
>> allowing users to do things contrary to the documented intent. ;-)
>> >>
>> >> With floating point and general intervals, there is not really a good
>> way to guarantee that the generated results avoid the "open" end of the
>> specified interval or even stay *within* that interval. This function is
>> definitely not intended to be used as `uniform(closed_end, open_end)`.
>> >
>> > Well, it is possible to make that happen if one is careful or directly
>> sets the bits in ieee types...
>>
>> For the unit interval, certainly. For general bounds, I am not so sure.
>>
>
> Point taken.
>

What's the practical importance of this? The boundary points have
probability zero, theoretically.

What happens if low and high are only a few nulps apart?

If you clip the distribution to obey boundary rules you create mass points
:)

Josef
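
A quick way to see what happens when low and high are only a nulp apart (a
sketch; the exact outcome depends on IEEE rounding):

import numpy as np

low = 1.0
high = np.nextafter(low, 2.0)              # exactly one ulp above low
draws = np.random.uniform(low, high, size=10)
print(np.unique(draws))
# typically prints [ 1.  1.0000000000000002]: rounding collapses the draws
# onto the two representable endpoints, so even the nominally open upper
# end can actually be returned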


>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy 1.10.3 release.

2016-01-03 Thread josef.pktd
On Sun, Jan 3, 2016 at 12:05 AM,   wrote:
> On Sat, Jan 2, 2016 at 4:47 PM, Charles R Harris
>  wrote:
>> Hi All,
>>
>> A significant segfault problem has been reported against Numpy 1.10.2 and I
>> want to make a quick 1.10.3 release to get it fixed. Two questions
>>
>> What exactly is the release process that has been decided on? AFAIK, I
>> should just do a source release on Sourceforge, ping Matthew to produce
>> wheels for Mac and wait for him to put them on pypi, and then upload the
>> sources to pypi. No windows binaries are to be produced.
>> Is there anything else that needs fixing for 1.10.3?
>
>
> I'm running the 1.10.2 tests on Windows 10 in a virtualbox on Windows 8.1
> using Gohlke binary for MKL on a fresh Python 3.5
>
> This test
> "Test workarounds for 32-bit limited fwrite, fseek, and ftell ..."
> is taking a very long time. Is this expected?
>
>
> I get the following errors on Windows 10, and also on Windows 8.1
> Winpython 3.4 (except for the last "Unable to find vcvarsall.bat"
> because it's set up for compiling with mingw)
>
> Earlier I also got a ref count error message but I don't see it
> anymore, so maybe I messed up when trying Ctrl+C to kill the tests.
>
>
> ==
> ERROR: Failure: ImportError (cannot import name 'fib2')
> --
> Traceback (most recent call last):
>   File 
> "c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\failure.py",
> line 39, in runTest
> raise self.exc_val.with_traceback(self.tb)
>   File 
> "c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\loader.py",
> line 418, in loadTestsFromName
> addr.filename, addr.module)
>   File 
> "c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\importer.py",
> line 47, in importFromPath
> return self.importFromDir(dir_path, fqname)
>   File 
> "c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\importer.py",
> line 94, in importFromDir
> mod = load_module(part_fqname, fh, filename, desc)
>   File "c:\users\josef\appdata\local\programs\python\python35\lib\imp.py",
> line 234, in load_module
> return load_source(name, filename, file)
>   File "c:\users\josef\appdata\local\programs\python\python35\lib\imp.py",
> line 172, in load_source
> module = _load(spec)
>   File "", line 693, in _load
>   File "", line 673, in _load_unlocked
>   File "", line 662, in exec_module
>   File "", line 222, in _call_with_frames_removed
>   File 
> "c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\numpy\distutils\tests\f2py_ext\tests\test_fib2.py",
> line 4, in 
> from f2py_ext import fib2
> ImportError: cannot import name 'fib2'
>
> ==
> ERROR: Failure: ImportError (cannot import name 'foo')
> --
> Traceback (most recent call last):
>   File 
> "c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\failure.py",
> line 39, in runTest
> raise self.exc_val.with_traceback(self.tb)
>   File 
> "c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\loader.py",
> line 418, in loadTestsFromName
> addr.filename, addr.module)
>   File 
> "c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\importer.py",
> line 47, in importFromPath
> return self.importFromDir(dir_path, fqname)
>   File 
> "c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\importer.py",
> line 94, in importFromDir
> mod = load_module(part_fqname, fh, filename, desc)
>   File "c:\users\josef\appdata\local\programs\python\python35\lib\imp.py",
> line 234, in load_module
> return load_source(name, filename, file)
>   File "c:\users\josef\appdata\local\programs\python\python35\lib\imp.py",
> line 172, in load_source
> module = _load(spec)
>   File "", line 693, in _load
>   File "", line 673, in _load_unlocked
>   File "", line 662, in exec_module
>   File "", line 222, in _call_with_frames_removed
>   File 
> "c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\numpy\distutils\tests\f2py_f90_ext\tests\test_foo.py",
> line 4, in 
> from f2py_f90_ext import foo
> ImportError: cannot import name 'foo'
>
> ==
> ERROR: Failure: ImportError (cannot import name 'fib3')
> --
> Traceback (most recent call last):
>   File 
> "c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\failure.py",
> line 39, in runTest
> raise self.exc_val.with_traceback(self.tb)
>   File 
> 

Re: [Numpy-discussion] Numpy 1.10.3 release.

2016-01-02 Thread josef.pktd
On Sat, Jan 2, 2016 at 4:47 PM, Charles R Harris
 wrote:
> Hi All,
>
> A significant segfault problem has been reported against Numpy 1.10.2 and I
> want to make a quick 1.10.3 release to get it fixed. Two questions
>
> What exactly is the release process that has been decided on? AFAIK, I
> should just do a source release on Sourceforge, ping Matthew to produce
> wheels for Mac and wait for him to put them on pypi, and then upload the
> sources to pypi. No windows binaries are to be produced.
> Is there anything else that needs fixing for 1.10.3?


I'm running the 1.10.2 tests on Windows 10 in a virtualbox on Windows 8.1
using Gohlke binary for MKL on a fresh Python 3.5

This test
"Test workarounds for 32-bit limited fwrite, fseek, and ftell ..."
is taking a very long time. Is this expected?


I get the following errors on Windows 10, and also on Windows 8.1
Winpython 3.4 (except for the last "Unable to find vcvarsall.bat"
because it's set up for compiling with mingw)

Earlier I also got a ref count error message but I don't see it
anymore, so maybe I messed up when trying Ctrl+C to kill the tests.


==
ERROR: Failure: ImportError (cannot import name 'fib2')
--
Traceback (most recent call last):
  File 
"c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\failure.py",
line 39, in runTest
raise self.exc_val.with_traceback(self.tb)
  File 
"c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\loader.py",
line 418, in loadTestsFromName
addr.filename, addr.module)
  File 
"c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\importer.py",
line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
  File 
"c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\importer.py",
line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
  File "c:\users\josef\appdata\local\programs\python\python35\lib\imp.py",
line 234, in load_module
return load_source(name, filename, file)
  File "c:\users\josef\appdata\local\programs\python\python35\lib\imp.py",
line 172, in load_source
module = _load(spec)
  File "", line 693, in _load
  File "", line 673, in _load_unlocked
  File "", line 662, in exec_module
  File "", line 222, in _call_with_frames_removed
  File 
"c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\numpy\distutils\tests\f2py_ext\tests\test_fib2.py",
line 4, in 
from f2py_ext import fib2
ImportError: cannot import name 'fib2'

==
ERROR: Failure: ImportError (cannot import name 'foo')
--
Traceback (most recent call last):
  File 
"c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\failure.py",
line 39, in runTest
raise self.exc_val.with_traceback(self.tb)
  File 
"c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\loader.py",
line 418, in loadTestsFromName
addr.filename, addr.module)
  File 
"c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\importer.py",
line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
  File 
"c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\importer.py",
line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
  File "c:\users\josef\appdata\local\programs\python\python35\lib\imp.py",
line 234, in load_module
return load_source(name, filename, file)
  File "c:\users\josef\appdata\local\programs\python\python35\lib\imp.py",
line 172, in load_source
module = _load(spec)
  File "", line 693, in _load
  File "", line 673, in _load_unlocked
  File "", line 662, in exec_module
  File "", line 222, in _call_with_frames_removed
  File 
"c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\numpy\distutils\tests\f2py_f90_ext\tests\test_foo.py",
line 4, in 
from f2py_f90_ext import foo
ImportError: cannot import name 'foo'

==
ERROR: Failure: ImportError (cannot import name 'fib3')
--
Traceback (most recent call last):
  File 
"c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\failure.py",
line 39, in runTest
raise self.exc_val.with_traceback(self.tb)
  File 
"c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\loader.py",
line 418, in loadTestsFromName
addr.filename, addr.module)
  File 
"c:\users\josef\appdata\local\programs\python\python35\lib\site-packages\nose\importer.py",
line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
  

Re: [Numpy-discussion] Fast vectorized arithmetic with ~32 significant digits under Numpy

2015-12-11 Thread josef.pktd
On Fri, Dec 11, 2015 at 11:22 AM, Anne Archibald  wrote:
> Actually, GCC implements 128-bit floats in software and provides them as
> __float128; there are also quad-precision versions of the usual functions.
> The Intel compiler provides this as well, I think, but I don't think
> Microsoft compilers do. A portable quad-precision library might be less
> painful.
>
> The cleanest way to add extended precision to numpy is by adding a
> C-implemented dtype. This can be done in an extension module; see the
> quaternion and half-precision modules online.
>
> Anne
>
>
> On Fri, Dec 11, 2015, 16:46 Charles R Harris 
> wrote:
>>
>> On Fri, Dec 11, 2015 at 6:25 AM, Thomas Baruchel  wrote:
>>>
>>> From time to time it is asked on forums how to extend precision of
>>> computation on Numpy array. The most common answer
>>> given to this question is: use the dtype=object with some arbitrary
>>> precision module like mpmath or gmpy.
>>> See
>>> http://stackoverflow.com/questions/6876377/numpy-arbitrary-precision-linear-algebra
>>> or http://stackoverflow.com/questions/21165745/precision-loss-numpy-mpmath
>>> or
>>> http://stackoverflow.com/questions/15307589/numpy-array-with-mpz-mpfr-values
>>>
>>> While this is obviously the most relevant answer for many users because
>>> it will allow them to use Numpy arrays exactly
>>> as they would have used them with native types, the wrong thing is that
>>> from some point of view "true" vectorization
>>> will be lost.
>>>
>>> With years I got very familiar with the extended double-double type which
>>> has (for usual architectures) about 32 accurate
>>> digits with faster arithmetic than "arbitrary precision types". I even
>>> used it for research purpose in number theory and
>>> I got convinced that it is a very wonderful type as long as such
>>> precision is suitable.
>>>
>>> I often implemented it partially under Numpy, most of the time by trying
>>> to vectorize at a low-level the libqd library.
>>>
>>> But I recently thought that a very nice and portable way of implementing
>>> it under Numpy would be to use the existing layer
>>> of vectorization on floats for computing the arithmetic operations by
>>> "columns containing half of the numbers" rather than
>>> by "full numbers". As a proof of concept I wrote the following file:
>>> https://gist.github.com/baruchel/c86ed748939534d8910d
>>>
>>> I converted and vectorized the Algol 60 codes from
>>> http://szmoore.net/ipdf/documents/references/dekker1971afloating.pdf
>>> (Dekker, 1971).
>>>
>>> A test is provided at the end; for inverting 100,000 numbers, my type is
>>> about 3 or 4 times faster than GMPY and almost
>>> 50 times faster than MPmath. It should be even faster for some other
>>> operations since I had to create another np.ones
>>> array for testing this type because inversion isn't implemented here
>>> (which could of course be done). You can run this file by yourself
>>> (maybe you will have to discard mpmath or gmpy if you don't have it).
>>>
>>> I would like to discuss about the way to make available something related
>>> to that.
>>>
>>> a) Would it be relevant to include that in Numpy ? (I would think to some
>>> "contribution"-tool rather than including it in
>>> the core of Numpy because it would be painful to code all ufuncs; on the
>>> other hand I am pretty sure that many would be happy
>>> to perform several arithmetic operations by knowing that they can't use
>>> cos/sin/etc. on this type; in other words, I am not
>>> sure it would be a good idea to embed it as an every-day type but I think
>>> it would be nice to have it quickly available
>>> in some way). If you agree with that, in which way should I code it (the
>>> current link only is a "proof of concept"; I would
>>> be very happy to code it in some cleaner way)?
>>>
>>> b) Do you think such attempt should remain something external to Numpy
>>> itself and be released on my Github account without being
>>> integrated to Numpy?
>>
>>
>> I think astropy does something similar for time and dates. There has also
>> been some talk of adding a user type for ieee 128 bit doubles. I've looked
>> once for relevant code for the latter and, IIRC, the available packages were
>> GPL :(.

This might be the same as or similar to a recent announcement for Julia

https://groups.google.com/d/msg/julia-users/iHTaxRVj1yM/M-WtZCedCQAJ


It would be useful to get this in a consistent way across platforms and
compilers. I can think of several applications where higher-precision reduce
operations would be useful in statistics.
As a Windows user, I never even saw a higher-precision float.
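
A small illustration of where a higher-precision reduce would help (not
double-double, just Python's compensated fsum as the reference point):

import math
import numpy as np

x = np.array([1e16, 1.0, -1e16])
print(np.sum(x))     # 0.0 -- the 1.0 is lost at double precision
print(math.fsum(x))  # 1.0 -- exact, via an error-compensated accumulation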

Josef


>>
>> Chuck
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> 

Re: [Numpy-discussion] ENH: Add the function 'expand_view'

2015-11-24 Thread josef.pktd
On Tue, Nov 24, 2015 at 7:13 PM, Nathaniel Smith  wrote:

> On Nov 24, 2015 11:57 AM, "John Kirkham"  wrote:
> >
> > Takes an array and tacks on arbitrary dimensions on either side, which
> is returned as a view always. Here are the relevant features:
> >
> > * Creates a view of the array that has the dimensions before and after
> tacked on to it.
> > * Takes the before and after arguments independent of each other and the
> current shape.
> > * Allows for read and write access to the underlying array.
>
> Can you expand this with some discussion of why you want this function,
> and why you chose these specific features? (E.g. as mentioned in the PR
> comments already, the reason broadcast_to returns a read-only array is that
> it was decided that this was less confusing for users, not because of any
> technical issue.)
>

Why is this a stride_trick?

I thought this looks similar to expand_dims and could maybe be implemented
with some extra options there.
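
For comparison, a sketch of how far the existing functions already get
(read-only output, which is the point of contention):

import numpy as np

a = np.arange(6).reshape(2, 3)
# tack a length-4 dimension on the front and a length-5 one on the back
v = np.broadcast_to(a[np.newaxis, :, :, np.newaxis], (4, 2, 3, 5))
print(v.shape)               # (4, 2, 3, 5) -- a view, no copy
print(v.flags['WRITEABLE'])  # False -- unlike the proposed expand_view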



Josef



> -n
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] asarray(sparse) -> object

2015-11-20 Thread josef.pktd
Is this intentional?


>>> exog
<50x5 sparse matrix of type ''
with 50 stored elements in Compressed Sparse Column format>

>>> np.asarray(exog)
array(<50x5 sparse matrix of type ''
with 50 stored elements in Compressed Sparse Column format>, dtype=object)


I'm just a newbie who thought to use the usual pattern.




>>> np.asarray(exog).dot(beta)
array([ <50x5 sparse matrix of type ''
with 50 stored elements in Compressed Sparse Column format>,
   <50x5 sparse matrix of type ''
with 50 stored elements in Compressed Sparse Column format>,
   <50x5 sparse matrix of type ''
with 50 stored elements in Compressed Sparse Column format>,
   <50x5 sparse matrix of type ''
with 50 stored elements in Compressed Sparse Column format>,
   <50x5 sparse matrix of type ''
with 50 stored elements in Compressed Sparse Column format>], dtype=object)
C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\scipy\sparse\compressed.py:306:
SparseEfficiencyWarning: Comparing sparse matrices using >= and <= is
inefficient, using <, >, or !=, instead.
  "using <, >, or !=, instead.", SparseEfficiencyWarning)

seems to warn only once

>>> y = np.asarray(exog).dot(beta)
>>> y.shape
(5,)


>>> np.__version__
'1.9.2rc1'

>>> scipy.__version__
'0.15.1'



Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] asarray(sparse) -> object

2015-11-20 Thread josef.pktd
On Fri, Nov 20, 2015 at 6:29 PM, CJ Carey  wrote:

> The short answer is: "kind of".
>
> These two Github issues explain what's going on more in-depth:
> https://github.com/scipy/scipy/issues/3995
> https://github.com/scipy/scipy/issues/4239
>


Thanks, I didn't pay attention to those issues, or only very superficially.

+1 for doing anything other than converting to object arrays.
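
For the record, a sketch of the explicit conversions that do what the "usual
pattern" above intends:

import numpy as np
from scipy import sparse

exog = sparse.csc_matrix(np.random.rand(50, 5))
beta = np.ones(5)

dense = exog.toarray()    # explicit (50, 5) float ndarray, no object scalar
y1 = dense.dot(beta)      # shape (50,)
y2 = exog.dot(beta)       # the sparse matrix handles the product directly
assert np.allclose(y1, y2)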



>
>
> As for the warning only showing once, that's Python's default behavior for
> warnings: http://stackoverflow.com/q/22661745/10601
>

The default should be overridden for warnings that are always relevant.
I usually don't use sparse arrays, and don't know if this should always
warn.

Josef


>
> -CJ
>
> On Fri, Nov 20, 2015 at 2:40 PM,  wrote:
>
>> Is this intentional?
>>
>>
>> >>> exog
>> <50x5 sparse matrix of type ''
>> with 50 stored elements in Compressed Sparse Column format>
>>
>> >>> np.asarray(exog)
>> array(<50x5 sparse matrix of type ''
>> with 50 stored elements in Compressed Sparse Column format>, dtype=object)
>>
>>
>> I'm just a newbie who thought to use the usual pattern.
>>
>>
>> 
>>
>> >>> np.asarray(exog).dot(beta)
>> array([ <50x5 sparse matrix of type ''
>> with 50 stored elements in Compressed Sparse Column format>,
>><50x5 sparse matrix of type ''
>> with 50 stored elements in Compressed Sparse Column format>,
>><50x5 sparse matrix of type ''
>> with 50 stored elements in Compressed Sparse Column format>,
>><50x5 sparse matrix of type ''
>> with 50 stored elements in Compressed Sparse Column format>,
>><50x5 sparse matrix of type ''
>> with 50 stored elements in Compressed Sparse Column format>],
>> dtype=object)
>> C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\scipy\sparse\compressed.py:306:
>> SparseEfficiencyWarning: Comparing sparse matrices using >= and <= is
>> inefficient, using <, >, or !=, instead.
>>   "using <, >, or !=, instead.", SparseEfficiencyWarning)
>>
>> seems to warn only once
>>
>> >>> y = np.asarray(exog).dot(beta)
>> >>> y.shape
>> (5,)
>>
>>
>> >>> np.__version__
>> '1.9.2rc1'
>>
>> >>> scipy.__version__
>> '0.15.1'
>>
>>
>>
>> Josef
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead

2015-10-27 Thread josef.pktd
On Tue, Oct 27, 2015 at 3:32 AM, Ralf Gommers 
wrote:

>
>
> On Tue, Oct 27, 2015 at 8:28 AM, Nathaniel Smith  wrote:
>
>> On Tue, Oct 27, 2015 at 12:19 AM, Ralf Gommers 
>> wrote:
>> >
>> >
>> > On Tue, Oct 27, 2015 at 6:44 AM, Nathaniel Smith  wrote:
>> >>
>> >> On Mon, Oct 26, 2015 at 9:31 PM, Nathaniel Smith 
>> wrote:
>> >> [...]
>> >> > I believe that this would also break both 'easy_install numpy', and
>> >> > attempts to install numpy via the setup_requires= argument to
>> >> > setuptools.setup (because setup_requires= implicitly calls
>> >> > easy_install). install_requires= would *not* be affected, and
>> >> > setup_requires= would still be fine in cases where numpy was already
>> >> > installed.
>> >>
>> >> On further investigation, it looks like the simplest approach to doing
>> >> this would actually treat easy_install and setup_requires= the same
>> >> way as they treat pip, i.e., they would all be allowed. (I was
>> >> misreading some particularly confusing code in setuptools.)
>> >>
>> >> It also looks like easy_installed packages can at least be safely
>> >> upgraded, so I guess allowing this is okay :-).
>> >
>> >
>> > I just discovered https://bitbucket.org/dholth/setup-requires, which
>> ensures
>> > that setup_requires uses pip instead of easy_install. So we can not only
>> > keep setup-requires working, but make it work significantly better.
>>
>> IIUC this is not something that we (= numpy) could use ourselves, but
>> instead something that everyone who does setup_requires=["numpy"]
>> would have to set up in their individual projects?
>>
>
> Right. I was thinking about using it in scipy. Ah well, I'm sure we can
> manage to not break ``setup_requires=numpy`` in some way.
>


What's the equivalent of
python setup.py build_ext --inplace


A brief Google search turned up these (I didn't follow up on them):

https://github.com/pypa/pip/issues/1887
https://github.com/pypa/pip/issues/18


Given that I rely completely on binary distributions for numpy and scipy, I
won't be affected.

(I'm still allergic to pip and will switch only several years after
everybody else.)

Josef



>
> Ralf
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead

2015-10-27 Thread josef.pktd
On Tue, Oct 27, 2015 at 10:59 AM, Nathaniel Smith  wrote:

> On Oct 27, 2015 6:08 AM,  wrote:
> >
> [...]
> >
> >
> > What's the equivalent of
> > python setup.py build_ext --inplace
>
> It's
> python setup.py build_ext --inplace
>
> ;-)
>
OK, sorry, I have now read the small print and the issue.

Sounds reasonable, given we can `force` our way out.

(If the reason to run to pip is a misspelled dev version number, then it
looks like a hammer to me.)

Josef



> There's also no replacement for setup.py sdist, or setup.py upload (which
> is broken and should never be used), or setup.py clean (which is also
> broken and should never be used in numpy's case). pip is a better package
> installer than raw distutils or setuptools, for non-installation-related
> tasks it has nothing to offer. (With the partial exception of 'pip wheel'.)
>
> -n
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-20 Thread josef.pktd
On Mon, Oct 19, 2015 at 9:51 PM,  wrote:

>
>
> On Mon, Oct 19, 2015 at 9:15 PM, Nathaniel Smith  wrote:
>
>> On Mon, Oct 19, 2015 at 5:55 AM,   wrote:
>> >
>> >
>> > On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith  wrote:
>> >>
>> >> On Sun, Oct 18, 2015 at 9:35 PM,   wrote:
>> >>  np.column_stack((np.ones(10), np.ones(10))).flags
>> >> >   C_CONTIGUOUS : True
>> >> >   F_CONTIGUOUS : False
>> >> >
>> >>  np.__version__
>> >> > '1.9.2rc1'
>> >> >
>> >> >
>> >> > on my notebook which has numpy 1.6.1 it is f_contiguous
>> >> >
>> >> >
>> >> > I was just trying to optimize a loop over variable adjustment in
>> >> > regression,
>> >> > and found out that we lost fortran contiguity.
>> >> >
>> >> > I always thought column_stack is for fortran usage (linalg)
>> >> >
>> >> > What's the alternative?
>> >> > column_stack was one of my favorite commands, and I always assumed we
>> >> > have
>> >> > in statsmodels the right memory layout to call the linalg libraries.
>> >> >
>> >> > ("assumed" means we don't have timing nor unit tests for it.)
>> >>
>> >> In general practice no numpy functions make any guarantee about memory
>> >> layout, unless that's explicitly a documented part of their contract
>> >> (e.g. 'ascontiguous', or some functions that take an order= argument
>> >> -- I say "some" b/c there are functions like 'reshape' that take an
>> >> argument called order= that doesn't actually refer to memory layout).
>> >> This isn't so much an official policy as just a fact of life -- if
>> >> no-one has any idea that the someone is depending on some memory
>> >> layout detail then there's no way to realize that we've broken
>> >> something. (But it is a good policy IMO.)
>> >
>> >
>> > I understand that in general.
>> >
>> > However, I always thought column_stack is a array creation function
>> which
>> > have guaranteed memory layout. And since it's stacking by columns I
>> thought
>> > that order is always Fortran.
>> > And the fact that it doesn't have an order keyword yet, I thought is
>> just a
>> > missing extension.
>>
>> I guess I don't know what to say except that I'm sorry to hear that
>> and sorry that no-one noticed until several releases later.
>>
>
>
> Were there more contiguity changes in 1.10?
> I just saw a large number of test errors and failures in statespace models
> which are heavily based on cython code where it's not just a question of
> performance.
>
> I don't know yet what's going on, but I just saw that we have some
> explicit tests for fortran contiguity which just started to fail.
>
>
>
>
>>
>> >> If this kind of problem gets caught during a pre-release cycle then we
>> >> generally do try to fix it, because we try not to break code, but if
>> >> it's been broken for 2 full releases then there's no much we can do --
>> >> we can't go back in time to fix it so it sounds like you're stuck
>> >> working around the problem no matter what (unless you want to refuse
>> >> to support 1.9.0 through 1.10.1, which I assume you don't... worst
>> >> case, you just have to do a global search replace of np.column_stack
>> >> with statsmodels.utils.column_stack_f, right?).
>> >>
>> >> And the regression issue seems like the only real argument for
>> >> changing it back -- we'd never guarantee f-contiguity here if starting
>> >> from a blank slate, I think?
>> >
>> >
>> > When the cat is out of the bag, the down stream developer writes
>> > compatibility code or helper functions.
>> >
>> > I will do that at at least the parts I know are intentionally designed
>> for F
>> > memory order.
>> >
>> > ---
>> >
>> > statsmodels doesn't really check or consistently optimize the memory
>> order,
>> > except in some cython functions.
>> > But, I thought we should be doing quite well with getting Fortran
>> ordered
>> > arrays. I only paid attention where we have more extensive loops
>> internally.
>> >
>> > Nathniel, Does patsy guarantee memory layout (F-contiguous) when
>> creating
>> > design matrices?
>>
>> I never thought about it :-). So: no, it looks like right now patsy
>> usually returns C-order matrices (or really, whatever np.empty or
>> np.repeat returns), and there aren't any particular guarantees that
>> this will continue to be the case in the future.
>>
>> Is returning matrices in F-contiguous layout really important? Should
>> there be a return_type="fortran_matrix" option or something like that?
>>
>
> I don't know, yet. My intuition was that it would be better because we
> feed the arrays directly to pinv/SVD or QR which, I think, require by
> default Fortran contiguous.
>
> However, my intuition might not be correct, and it might not make much
> difference in a single OLS estimation.
>

I did some quick timing checks of pinv and qr, and the Fortran-ordered version
is only about 5% to 15% faster and uses about the same amount of memory
(watching the Task Manager). So, nothing to get excited about.

Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith  wrote:

> On Sun, Oct 18, 2015 at 9:35 PM,   wrote:
>  np.column_stack((np.ones(10), np.ones(10))).flags
> >   C_CONTIGUOUS : True
> >   F_CONTIGUOUS : False
> >
>  np.__version__
> > '1.9.2rc1'
> >
> >
> > on my notebook which has numpy 1.6.1 it is f_contiguous
> >
> >
> > I was just trying to optimize a loop over variable adjustment in
> regression,
> > and found out that we lost fortran contiguity.
> >
> > I always thought column_stack is for fortran usage (linalg)
> >
> > What's the alternative?
> > column_stack was one of my favorite commands, and I always assumed we
> have
> > in statsmodels the right memory layout to call the linalg libraries.
> >
> > ("assumed" means we don't have timing nor unit tests for it.)
>
> In general practice no numpy functions make any guarantee about memory
> layout, unless that's explicitly a documented part of their contract
> (e.g. 'ascontiguous', or some functions that take an order= argument
> -- I say "some" b/c there are functions like 'reshape' that take an
> argument called order= that doesn't actually refer to memory layout).
> This isn't so much an official policy as just a fact of life -- if
> no-one has any idea that the someone is depending on some memory
> layout detail then there's no way to realize that we've broken
> something. (But it is a good policy IMO.)
>

I understand that in general.

However, I always thought column_stack is an array creation function which
has a guaranteed memory layout. And since it's stacking by columns I thought
the order is always Fortran.
And the fact that it doesn't have an order keyword yet, I thought, was just a
missing extension.



>
> If this kind of problem gets caught during a pre-release cycle then we
> generally do try to fix it, because we try not to break code, but if
> it's been broken for 2 full releases then there's no much we can do --
> we can't go back in time to fix it so it sounds like you're stuck
> working around the problem no matter what (unless you want to refuse
> to support 1.9.0 through 1.10.1, which I assume you don't... worst
> case, you just have to do a global search replace of np.column_stack
> with statsmodels.utils.column_stack_f, right?).
>
> And the regression issue seems like the only real argument for
> changing it back -- we'd never guarantee f-contiguity here if starting
> from a blank slate, I think?
>

When the cat is out of the bag, the downstream developer writes
compatibility code or helper functions.

I will do that for at least the parts I know are intentionally designed for
F memory order.

---

statsmodels doesn't really check or consistently optimize the memory order,
except in some cython functions.
But, I thought we should be doing quite well with getting Fortran ordered
arrays. I only paid attention where we have more extensive loops internally.

Nathaniel, does patsy guarantee memory layout (F-contiguous) when creating
design matrices?

Josef



>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 5:16 AM, Sebastian Berg 
wrote:

> On Mo, 2015-10-19 at 01:34 -0400, josef.p...@gmail.com wrote:
> >
>
> 
>
>
> >
> > It looks like in 1.9 it depends on the order of the 2-d arrays, which
> > it didn't do in 1.6
> >
>
> Yes, it uses concatenate, and concatenate probably changed in 1.7 to use
> "K" (since "K" did not really exists before 1.7 IIRC).
> Not sure what we can do about it, the order is not something that is
> easily fixed unless explicitly given. It might be optimized (as in this
> case I would guess).
> Whether or not doing the fastest route for these kind of functions is
> faster for the user is of course impossible to know, we can only hope
> that in most cases it is better.
> If someone has an idea how to decide I am all ears, but I think all we
> can do is put in asserts/tests in the downstream code if it relies
> heavily on the order (or just copy, if the order is wrong) :(, another
> example is change of the output order in advanced indexing in some
> cases, it makes it faster sometimes, and probably slower in others, what
> is right seems very much non-trivial.
>

To understand the reason:

Is this to have more efficient memory access during copying?

AFAIU, column_stack needs to create a new array which has to be either F or
C contiguous, so we always have to pick one of the two. With a large number
of 1d arrays it seemed more "intuitive" to me to copy them by columns.
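
Since downstream code can't rely on the order anyway, a sketch of the kind of
compatibility helper mentioned earlier in the thread (the name column_stack_f
is hypothetical):

import numpy as np

def column_stack_f(arrays):
    # restore the old guarantee explicitly, whatever order
    # column_stack itself happens to return
    return np.asfortranarray(np.column_stack(arrays))

out = column_stack_f([np.ones(10), np.arange(10.0)])
print(out.flags['F_CONTIGUOUS'])   # True on any numpy version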

Josef



>
> - Sebastian
>
>
> >
> > >>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags
> >   C_CONTIGUOUS : False
> >   F_CONTIGUOUS : True
> >   OWNDATA : True
> >   WRITEABLE : True
> >   ALIGNED : True
> >   UPDATEIFCOPY : False
> >
> >
> >
> >
> > which means the default order looks more like "K" now, not "C", IIUC
> >
> >
> > Josef
> >
> >
> >
> >
> >
> > Josef
> >
> >
> >
> >
> >
> > Stephan
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
> >
> >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 9:00 AM,  wrote:

>
>
> On Mon, Oct 19, 2015 at 5:16 AM, Sebastian Berg <
> sebast...@sipsolutions.net> wrote:
>
>> On Mo, 2015-10-19 at 01:34 -0400, josef.p...@gmail.com wrote:
>> >
>>
>> 
>>
>>
>> >
>> > It looks like in 1.9 it depends on the order of the 2-d arrays, which
>> > it didn't do in 1.6
>> >
>>
>> Yes, it uses concatenate, and concatenate probably changed in 1.7 to use
>> "K" (since "K" did not really exists before 1.7 IIRC).
>> Not sure what we can do about it, the order is not something that is
>> easily fixed unless explicitly given. It might be optimized (as in this
>> case I would guess).
>> Whether or not doing the fastest route for these kind of functions is
>> faster for the user is of course impossible to know, we can only hope
>> that in most cases it is better.
>> If someone has an idea how to decide I am all ears, but I think all we
>> can do is put in asserts/tests in the downstream code if it relies
>> heavily on the order (or just copy, if the order is wrong) :(, another
>> example is change of the output order in advanced indexing in some
>> cases, it makes it faster sometimes, and probably slower in others, what
>> is right seems very much non-trivial.
>>
>
> To understand the reason:
>
> Is this to have more efficient memory access during copying?
>
> AFAIU, column_stack needs to create a new array which has to be either F
> or C contiguous, so we always have to pick one of the two. With a large
> number of 1d arrays it seemed more "intuitive" to me to copy them by
> columns.
>


Just as background:

I was mainly surprised last night about having my long-held beliefs
shattered. I skipped numpy 1.7 and 1.8 in my development environment and
still need to catch up now that I use 1.9 as my main numpy version.

I might have to update my "folk wisdom" a bit; it is not codified
anywhere and doesn't have unit tests.

For example, the improved iteration for Fortran-contiguous (or neither C nor
F contiguous) arrays sounded very useful, but I never checked whether it
would affect us.

Josef



>
> Josef
>
>
>
>>
>> - Sebastian
>>
>>
>> >
>> > >>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags
>> >   C_CONTIGUOUS : False
>> >   F_CONTIGUOUS : True
>> >   OWNDATA : True
>> >   WRITEABLE : True
>> >   ALIGNED : True
>> >   UPDATEIFCOPY : False
>> >
>> >
>> >
>> >
>> > which means the default order looks more like "K" now, not "C", IIUC
>> >
>> >
>> > Josef
>> >
>> >
>> >
>> >
>> >
>> > Josef
>> >
>> >
>> >
>> >
>> >
>> > Stephan
>> >
>> > ___
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> >
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> >
>> >
>> >
>> >
>> > ___
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 9:15 PM, Nathaniel Smith  wrote:

> On Mon, Oct 19, 2015 at 5:55 AM,   wrote:
> >
> >
> > On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith  wrote:
> >>
> >> On Sun, Oct 18, 2015 at 9:35 PM,   wrote:
> >>  np.column_stack((np.ones(10), np.ones(10))).flags
> >> >   C_CONTIGUOUS : True
> >> >   F_CONTIGUOUS : False
> >> >
> >>  np.__version__
> >> > '1.9.2rc1'
> >> >
> >> >
> >> > on my notebook which has numpy 1.6.1 it is f_contiguous
> >> >
> >> >
> >> > I was just trying to optimize a loop over variable adjustment in
> >> > regression,
> >> > and found out that we lost fortran contiguity.
> >> >
> >> > I always thought column_stack is for fortran usage (linalg)
> >> >
> >> > What's the alternative?
> >> > column_stack was one of my favorite commands, and I always assumed we
> >> > have
> >> > in statsmodels the right memory layout to call the linalg libraries.
> >> >
> >> > ("assumed" means we don't have timing nor unit tests for it.)
> >>
> >> In general practice no numpy functions make any guarantee about memory
> >> layout, unless that's explicitly a documented part of their contract
> >> (e.g. 'ascontiguous', or some functions that take an order= argument
> >> -- I say "some" b/c there are functions like 'reshape' that take an
> >> argument called order= that doesn't actually refer to memory layout).
> >> This isn't so much an official policy as just a fact of life -- if
> >> no-one has any idea that the someone is depending on some memory
> >> layout detail then there's no way to realize that we've broken
> >> something. (But it is a good policy IMO.)
> >
> >
> > I understand that in general.
> >
> > However, I always thought column_stack is a array creation function which
> > have guaranteed memory layout. And since it's stacking by columns I
> thought
> > that order is always Fortran.
> > And the fact that it doesn't have an order keyword yet, I thought is
> just a
> > missing extension.
>
> I guess I don't know what to say except that I'm sorry to hear that
> and sorry that no-one noticed until several releases later.
>


Were there more contiguity changes in 1.10?
I just saw a large number of test errors and failures in statespace models
which are heavily based on cython code where it's not just a question of
performance.

I don't know yet what's going on, but I just saw that we have some explicit
tests for fortran contiguity which just started to fail.




>
> >> If this kind of problem gets caught during a pre-release cycle then we
> >> generally do try to fix it, because we try not to break code, but if
> >> it's been broken for 2 full releases then there's no much we can do --
> >> we can't go back in time to fix it so it sounds like you're stuck
> >> working around the problem no matter what (unless you want to refuse
> >> to support 1.9.0 through 1.10.1, which I assume you don't... worst
> >> case, you just have to do a global search replace of np.column_stack
> >> with statsmodels.utils.column_stack_f, right?).
> >>
> >> And the regression issue seems like the only real argument for
> >> changing it back -- we'd never guarantee f-contiguity here if starting
> >> from a blank slate, I think?
> >
> >
> > When the cat is out of the bag, the down stream developer writes
> > compatibility code or helper functions.
> >
> > I will do that at at least the parts I know are intentionally designed
> for F
> > memory order.
> >
> > ---
> >
> > statsmodels doesn't really check or consistently optimize the memory
> order,
> > except in some cython functions.
> > But, I thought we should be doing quite well with getting Fortran ordered
> > arrays. I only paid attention where we have more extensive loops
> internally.
> >
> > Nathniel, Does patsy guarantee memory layout (F-contiguous) when creating
> > design matrices?
>
> I never thought about it :-). So: no, it looks like right now patsy
> usually returns C-order matrices (or really, whatever np.empty or
> np.repeat returns), and there aren't any particular guarantees that
> this will continue to be the case in the future.
>
> Is returning matrices in F-contiguous layout really important? Should
> there be a return_type="fortran_matrix" option or something like that?
>

I don't know yet. My intuition was that it would be better because we feed
the arrays directly to pinv/SVD or QR which, I think, require Fortran-contiguous
input by default.

However, my intuition might not be correct, and it might not make much
difference in a single OLS estimation.

There are a few critical loops in variable selection that I'm planning to
investigate to see how much it matters.
Memory optimization was never high on our priority list compared to expanding
the functionality overall, but reading the Julia mailing list is starting
to worry me a bit. :)

(I'm even starting to see the reason for multiple dispatch.)

Josef


>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org

Re: [Numpy-discussion] numpy 1.10.1 reduce operation on recarrays

2015-10-19 Thread josef.pktd
On Fri, Oct 16, 2015 at 9:31 PM, Allan Haldane 
wrote:

> On 10/16/2015 09:17 PM, josef.p...@gmail.com wrote:
>
>>
>>
>> On Fri, Oct 16, 2015 at 8:56 PM, Allan Haldane > > wrote:
>>
>> On 10/16/2015 05:31 PM, josef.p...@gmail.com
>>  wrote:
>> >
>> >
>> > On Fri, Oct 16, 2015 at 2:21 PM, Charles R Harris
>> > 
>> > >> wrote:
>> >
>> >
>> >
>> > On Fri, Oct 16, 2015 at 12:20 PM, Charles R Harris
>> > 
>> > >> wrote:
>> >
>> >
>> >
>> > On Fri, Oct 16, 2015 at 11:58 AM, > 
>>  > >
>> >> wrote:
>>  >
>>  > was there a change with reduce operations with
>> recarrays in
>>  > 1.10 or 1.10.1?
>>  >
>>  > Travis shows a new test failure in the statsmodels
>> testsuite
>>  > with 1.10.1:
>>  >
>>  > ERROR: test suite for >  > 'statsmodels.base.tests.test_data.TestRecarrays'>
>>  >
>>  >   File
>>  >
>>
>> "/home/travis/miniconda/envs/statsmodels-test/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/base/data.py",
>>  > line 131, in _handle_constant
>>  > const_idx = np.where(self.exog.ptp(axis=0) ==
>>  > 0)[0].squeeze()
>>  > TypeError: cannot perform reduce with flexible type
>>  >
>>  >
>>  > Sorry for asking so late.
>>  > (statsmodels is short on maintainers, and I'm
>> distracted)
>>  >
>>  >
>>  > statsmodels still has code to support recarrays and
>>  > structured dtypes from the time before pandas became
>>  > popular, but I don't think anyone is using them
>> together
>>  > with statsmodels anymore.
>>  >
>>  >
>>  > There were several commits dealing both recarrays and
>> ufuncs, so
>>  > this might well be a regression.
>>  >
>>  >
>>  > A bisection would be helpful. Also, open an issue.
>>  >
>>  >
>>  >
>>  > The reason for the test failure might be somewhere else hiding
>> behind
>>  > several layers of statsmodels, but only started to show up with
>> numpy 1.10.1
>>  >
>>  > I already have the reduce exception with my currently installed
>> numpy
>>  > '1.9.2rc1'
>>  >
>>   x = np.random.random(9*3).view([('const', 'f8'),('x_1', 'f8'),
>>  > ('x_2', 'f8')]).view(np.recarray)
>>  >
>>   np.ptp(x, axis=0)
>>  > Traceback (most recent call last):
>>  >   File "", line 1, in 
>>  >   File
>>  >
>>
>> "C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py",
>>  > line 2047, in ptp
>>  > return ptp(axis, out)
>>  > TypeError: cannot perform reduce with flexible type
>>  >
>>  >
>>  > Sounds like fun, and I don't even know how to automatically bisect.
>>  >
>>  > Josef
>>
>> That example isn't the problem (ptp should definitely fail on
>> structured
>> arrays), but I've tracked down what is - it has to do with views of
>> record arrays.
>>
>> The fix looks simple, I'll get it in for the next release.
>>
>>
>> Thanks,
>>
>> I realized that at that point in the statsmodels code we should have
>> only regular ndarrays, so the array conversion fails somewhere.
>>
>> AFAICS, the main helper function to convert is
>>
>> def struct_to_ndarray(arr):
>>  return arr.view((float, len(arr.dtype.names)))
>>
>> which doesn't look like it will handle other dtypes than float64. Nobody
>> ever complained, so maybe our test suite is the only user of this.
>>
>> What is now the recommended way of converting structured
>> dtypes/recarrays to ndarrays?
>>
>> Josef
>>
>
> Yes, that's the code I narrowed it down to as well. I think the code in
> statsmodels is fine, the problem is actually a  bug I must admit I
> introduced in changes to the way views of recarrays work.
>
> If you are curious, the bug is in this line:
>
> https://github.com/numpy/numpy/blob/master/numpy/core/records.py#L467
>
> This line was intended to fix the problem that accessing a nested record
> array field would lose the 'np.record' dtype. I only considered void
> structured arrays, and had forgotten about sub-arrays which statsmodels
> uses.
>
> I think the fix is to replace `issubclass(val.type, 

Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-18 Thread josef.pktd
On Mon, Oct 19, 2015 at 1:10 AM, Stephan Hoyer  wrote:

> Looking at the git logs, column_stack appears to have been that way
> (creating a new array with concatenate) since at least NumPy 0.9.2, way
> back in January 2006:
> https://github.com/numpy/numpy/blob/v0.9.2/numpy/lib/shape_base.py#L271
>

Then it must have been changed somewhere else, between 1.6.1 and 1.9.2rc1.

I have my notebook and my desktop with different numpy and python versions
next to each other and I don't see a typo in my command.

I assume python 2.7 versus python 3.4 doesn't make a difference.

--

>>> np.column_stack((np.ones(10), np.ones(10))).flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

>>> np.__version__
'1.6.1'
>>> import sys
>>> sys.version
'2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]'



>>> np.column_stack((np.ones(10), np.ones(10))).flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

>>> np.__version__
'1.9.2rc1'
>>> import sys
>>> sys.version
'3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit
(AMD64)]'

---

comparing all flags, owndata also has changed, but I don't think that has
any effect

Josef


>
>
> Stephan
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] when did column_stack become C-contiguous?

2015-10-18 Thread josef.pktd
>>> np.column_stack((np.ones(10), np.ones(10))).flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False

>>> np.__version__
'1.9.2rc1'


On my notebook, which has numpy 1.6.1, it is F_CONTIGUOUS.


I was just trying to optimize a loop over variable adjustment in
regression, and found out that we lost Fortran contiguity.

I always thought column_stack was meant for Fortran usage (linalg).

What's the alternative?
column_stack was one of my favorite commands, and I always assumed we have
in statsmodels the right memory layout to call the linalg libraries.

("assumed" means we don't have timing nor unit tests for it.)

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-18 Thread josef.pktd
On Mon, Oct 19, 2015 at 1:27 AM,  wrote:

>
>
> On Mon, Oct 19, 2015 at 1:10 AM, Stephan Hoyer  wrote:
>
>> Looking at the git logs, column_stack appears to have been that way
>> (creating a new array with concatenate) since at least NumPy 0.9.2, way
>> back in January 2006:
>> https://github.com/numpy/numpy/blob/v0.9.2/numpy/lib/shape_base.py#L271
>>
>
> Then it must have been changed somewhere else, between 1.6.1 and 1.9.2rc1.
>
> I have my notebook and my desktop with different numpy and python versions
> next to each other and I don't see a typo in my command.
>
> I assume python 2.7 versus python 3.4 doesn't make a difference.
>
> --
>
> >>> np.column_stack((np.ones(10), np.ones(10))).flags
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : True
>   OWNDATA : False
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
>
> >>> np.__version__
> '1.6.1'
> >>> import sys
> >>> sys.version
> '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]'
>
> 
>
> >>> np.column_stack((np.ones(10), np.ones(10))).flags
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : False
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
>
> >>> np.__version__
> '1.9.2rc1'
> >>> import sys
> >>> sys.version
> '3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit
> (AMD64)]'
>
> ---
>
> comparing all flags, owndata also has changed, but I don't think that has
> any effect
>

Qualification:

It looks like in 1.9 it depends on the order of the 2-d input arrays, which
it didn't in 1.6:

>>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False


which means the default order looks more like "K" now, not "C", IIUC

Josef



>
> Josef
>
>
>>
>>
>> Stephan
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-18 Thread josef.pktd
On Mon, Oct 19, 2015 at 12:35 AM,  wrote:

> >>> np.column_stack((np.ones(10), np.ones(10))).flags
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : False
>
> >>> np.__version__
> '1.9.2rc1'
>
>
> on my notebook which has numpy 1.6.1 it is f_contiguous
>
>
> I was just trying to optimize a loop over variable adjustment in
> regression, and found out that we lost fortran contiguity.
>
> I always thought column_stack is for fortran usage (linalg)
>
> What's the alternative?
> column_stack was one of my favorite commands, and I always assumed we have
> in statsmodels the right memory layout to call the linalg libraries.
>
> ("assumed" means we don't have timing nor unit tests for it.)
>

What's the difference between using array and column_stack except for a
transpose and memory order?

My current use case is copying columns on top of each other.

#exog0 = np.column_stack((np.ones(nobs), x0, x0s2))

exog0 = np.array((np.ones(nobs), x0, x0s2)).T

exog_opt = exog0.copy(order='F')


The following part is in a loop, followed by some linear algebra for OLS;
res_optim is a scalar parameter.
exog_opt[:, -1] = np.clip(exog0[:, k] + res_optim, 0, np.inf)

Are my assumptions about memory access correct, or is there a better way?

(I have quite a bit of code in statsmodels that is optimized for
Fortran-ordered memory layout, especially for sequential regression, under
the assumption that column_stack provides that Fortran order.)

Also, do I need to start timing and memory benchmarking or is it obvious
that a loop

for k in range(maxi):
x = arr[:, :k]


depends on memory order?

Josef
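
As a rough sketch of the pattern above with made-up stand-ins for x0, x0s2
and res_optim, forcing Fortran order so that the column assignment and the
arr[:, :k] slices stay contiguous:

import numpy as np

nobs = 100
x0 = np.random.randn(nobs)
x0s2 = x0**2

exog0 = np.column_stack((np.ones(nobs), x0, x0s2))
exog_opt = np.asfortranarray(exog0)   # explicit Fortran-ordered copy

res_optim = 0.1
exog_opt[:, -1] = np.clip(exog0[:, -1] + res_optim, 0, np.inf)

# slicing leading columns of an F-ordered array keeps Fortran contiguity
print(exog_opt[:, :2].flags['F_CONTIGUOUS'])   # True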


>
> Josef
>
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Interesting discussion on copyrighting files.

2015-10-17 Thread josef.pktd
On Thu, Oct 15, 2015 at 11:28 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:

> Worth a read at A .
>

Thanks, it is worth a read.

Most of the time when I see code copied from scipy or statsmodels, it is
properly attributed.
But every once in a while (like just now) I see code in an
interesting-sounding package on github where I start to recognize parts,
because they still have my code comments left in but no attribution to the
origin.

It's almost ok if it's MIT or BSD licensed because then I can "borrow back"
the changes, but not if the new license is GPL.

This relates to the point in the discussion about seeing modules or
functions that got isolated from the parent package.

Josef
(slightly grumpy)



>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Bug

2015-10-16 Thread josef.pktd

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy 1.10.1 reduce operation on recarrays

2015-10-16 Thread josef.pktd
Was there a change with reduce operations with recarrays in 1.10 or 1.10.1?

Travis shows a new test failure in the statsmodels testsuite with 1.10.1:

ERROR: test suite for <class 'statsmodels.base.tests.test_data.TestRecarrays'>

  File
"/home/travis/miniconda/envs/statsmodels-test/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/base/data.py",
line 131, in _handle_constant
const_idx = np.where(self.exog.ptp(axis=0) == 0)[0].squeeze()
TypeError: cannot perform reduce with flexible type


Sorry for asking so late.
(statsmodels is short on maintainers, and I'm distracted)


statsmodels still has code to support recarrays and structured dtypes from
the time before pandas became popular, but I don't think anyone is using
them together with statsmodels anymore.

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Bug

2015-10-16 Thread josef.pktd
Sorry, wrong shortcut key, question will arrive later.

Josef

On Fri, Oct 16, 2015 at 1:40 PM,  wrote:

>
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy 1.10.1 reduce operation on recarrays

2015-10-16 Thread josef.pktd
On Fri, Oct 16, 2015 at 2:21 PM, Charles R Harris  wrote:

>
>
> On Fri, Oct 16, 2015 at 12:20 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Fri, Oct 16, 2015 at 11:58 AM,  wrote:
>>
>>> was there a change with reduce operations with recarrays in 1.10 or
>>> 1.10.1?
>>>
>>> Travis shows a new test failure in the statsmodels testsuite with 1.10.1:
>>>
>>> ERROR: test suite for <class 'statsmodels.base.tests.test_data.TestRecarrays'>
>>>
>>>   File
>>> "/home/travis/miniconda/envs/statsmodels-test/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/base/data.py",
>>> line 131, in _handle_constant
>>> const_idx = np.where(self.exog.ptp(axis=0) == 0)[0].squeeze()
>>> TypeError: cannot perform reduce with flexible type
>>>
>>>
>>> Sorry for asking so late.
>>> (statsmodels is short on maintainers, and I'm distracted)
>>>
>>>
>>> statsmodels still has code to support recarrays and structured dtypes
>>> from the time before pandas became popular, but I don't think anyone is
>>> using them together with statsmodels anymore.
>>>
>>>
>> There were several commits dealing with both recarrays and ufuncs, so this
>> might well be a regression.
>>
>>
> A bisection would be helpful. Also, open an issue.
>


The reason for the test failure might be hiding somewhere else behind
several layers of statsmodels, but it only started to show up with numpy
1.10.1.

I already have the reduce exception with my currently installed numpy
'1.9.2rc1'

>>> x = np.random.random(9*3).view([('const', 'f8'), ('x_1', 'f8'), ('x_2', 'f8')]).view(np.recarray)

>>> np.ptp(x, axis=0)
Traceback (most recent call last):
  File "", line 1, in 
  File
"C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py",
line 2047, in ptp
return ptp(axis, out)
TypeError: cannot perform reduce with flexible type


Sounds like fun, and I don't even know how to automatically bisect.

Josef
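
For reference, a small sketch of the conversion that makes the reduction
well defined on a homogeneous structured array (plain ndarray view, no
recarray; the 1.10.1 regression discussed here was specific to views of
recarrays):

import numpy as np

x = np.random.random(9 * 3).view([('const', 'f8'), ('x_1', 'f8'),
                                  ('x_2', 'f8')])
# view the three float64 fields as a plain (9, 3) float array
xa = x.view((float, 3))
print(np.ptp(xa, axis=0))   # works: one value per column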


>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy 1.10.1 reduce operation on recarrays

2015-10-16 Thread josef.pktd
On Fri, Oct 16, 2015 at 8:56 PM, Allan Haldane 
wrote:

> On 10/16/2015 05:31 PM, josef.p...@gmail.com wrote:
> >
> >
> > On Fri, Oct 16, 2015 at 2:21 PM, Charles R Harris wrote:
> >
> >
> > On Fri, Oct 16, 2015 at 12:20 PM, Charles R Harris wrote:
> >
> >
> > On Fri, Oct 16, 2015 at 11:58 AM,  wrote:
> >
> > was there a change with reduce operations with recarrays in
> > 1.10 or 1.10.1?
> >
> > Travis shows a new test failure in the statsmodels testsuite
> > with 1.10.1:
> >
> > ERROR: test suite for <class 'statsmodels.base.tests.test_data.TestRecarrays'>
> >
> >   File
> >
>  
> "/home/travis/miniconda/envs/statsmodels-test/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/base/data.py",
> > line 131, in _handle_constant
> > const_idx = np.where(self.exog.ptp(axis=0) ==
> > 0)[0].squeeze()
> > TypeError: cannot perform reduce with flexible type
> >
> >
> > Sorry for asking so late.
> > (statsmodels is short on maintainers, and I'm distracted)
> >
> >
> > statsmodels still has code to support recarrays and
> > structured dtypes from the time before pandas became
> > popular, but I don't think anyone is using them together
> > with statsmodels anymore.
> >
> >
> > There were several commits dealing with both recarrays and ufuncs, so
> > this might well be a regression.
> >
> >
> > A bisection would be helpful. Also, open an issue.
> >
> >
> >
> > The reason for the test failure might be somewhere else hiding behind
> > several layers of statsmodels, but only started to show up with numpy
> 1.10.1
> >
> > I already have the reduce exception with my currently installed numpy
> > '1.9.2rc1'
> >
>  x = np.random.random(9*3).view([('const', 'f8'),('x_1', 'f8'),
> > ('x_2', 'f8')]).view(np.recarray)
> >
>  np.ptp(x, axis=0)
> > Traceback (most recent call last):
> >   File "", line 1, in 
> >   File
> >
> "C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py",
> > line 2047, in ptp
> > return ptp(axis, out)
> > TypeError: cannot perform reduce with flexible type
> >
> >
> > Sounds like fun, and I don't even know how to automatically bisect.
> >
> > Josef
>
> That example isn't the problem (ptp should definitely fail on structured
> arrays), but I've tracked down what is - it has to do with views of
> record arrays.
>
> The fix looks simple, I'll get it in for the next release.
>

Thanks,

I realized that at that point in the statsmodels code we should have only
regular ndarrays, so the array conversion fails somewhere.

AFAICS, the main helper function to convert is

def struct_to_ndarray(arr):
    return arr.view((float, len(arr.dtype.names)))

which doesn't look like it will handle dtypes other than float64. Nobody
ever complained, so maybe our test suite is the only user of this.

What is now the recommended way of converting structured dtypes/recarrays
to ndarrays?

Josef
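
One possible generalization of that helper, as a sketch only (the name is
made up; it assumes all fields share one dtype and the input is a plain,
contiguous structured array):

import numpy as np

def struct_to_ndarray_generic(arr):
    # use the dtype of the first field instead of hard-coding float64
    dt = arr.dtype[0]
    if any(arr.dtype[name] != dt for name in arr.dtype.names):
        raise ValueError("fields do not share a common dtype")
    return arr.view((dt, len(arr.dtype.names)))

x = np.zeros(5, dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4')])
print(struct_to_ndarray_generic(x).shape)   # (5, 3)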



>
> Allan
>
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Oops - maybe post3 numpy file?

2015-10-08 Thread josef.pktd
On Thu, Oct 8, 2015 at 8:39 PM, Charles R Harris 
wrote:

>
>
> On Thu, Oct 8, 2015 at 6:30 PM, Matthew Brett 
> wrote:
>
>> Hi,
>>
>> I'm afraid I made a mistake uploading OSX wheels for numpy 1.10.0.
>> Using twine to do the upload generated a new release - 1.10.0.post2 -
>> containing only the wheels.  I deleted that new release to avoid
>> confusion, but now, when I try and upload the wheels to the 1.10.0
>> pypi release via the web form, I get this error:
>>
>> Error processing form
>>
>> This filename has previously been used, you should use a different
>> version.
>>
>> Any chance of a post3 upload so I can upload some matching wheels?
>>
>> Sorry about that,
>>
>
> Yeah, pypi is why we are on post2 already. Given the problem with msvc9, I
> think we are due for 1.10.1 in a day or two. Or, I could revert the
> troublesome commit and do a post3 tomorrow. Hmm... decisions, decisions.
> I'll see if Julian has anything to say in the morning and go from there.
>


If you manage a release without a `post` suffix, then I would not have to
worry right away about what to do about a statsmodels setup.py that cannot
handle it.


Josef



>
> Chuck
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Sign of NaN

2015-09-29 Thread josef.pktd
On Tue, Sep 29, 2015 at 11:25 AM, Anne Archibald 
wrote:

> IEEE 754 has signum(NaN)->NaN. So does np.sign on floating-point arrays.
> Why should it be different for object arrays?
>
> Anne
>
> P.S. If you want exceptions when NaNs appear, that's what np.seterr is
> for. -A
>


I also think NaN should be treated the same way as for floating point
numbers (whatever that is). Otherwise it is difficult to remember when nan
behaves like a float dtype and when it behaves like another dtype.
(Given that float is the smallest dtype that can hold a nan.)

Josef
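
For reference, a small sketch of the float behavior referred to here (the
object-array behavior is the part under discussion and differs by numpy
version):

import numpy as np

a = np.array([np.nan, -2.0, 3.0])
print(np.sign(a))   # [nan -1.  1.]  -- IEEE-style: NaN propagates

o = np.array([np.nan, -2.0, 3.0], dtype=object)
# np.sign(o) is the contested case: on numpy master at the time it raised
# for the NaN element instead of returning nan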



>
> On Tue, Sep 29, 2015 at 5:18 PM Freddy Rietdijk 
> wrote:
>
>> I wouldn't know of any valid output when applying the sign function to
>> NaN. Therefore, I think it is correct to return a ValueError. Furthermore,
>> I would prefer such an error over just returning NaN since it helps you
>> locating where NaN is generated.
>>
>> On Tue, Sep 29, 2015 at 5:13 PM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Due to a recent commit, Numpy master now raises an error when applying
>>> the sign function to an object array containing NaN. Other options may be
>>> preferable, returning NaN for instance, so I would like to open the topic
>>> for discussion on the list.
>>>
>>> Thoughts?
>>>
>>> Chuck
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Sign of NaN

2015-09-29 Thread josef.pktd
On Tue, Sep 29, 2015 at 2:16 PM, Nathaniel Smith  wrote:

> On Sep 29, 2015 8:25 AM, "Anne Archibald"  wrote:
> >
> > IEEE 754 has signum(NaN)->NaN. So does np.sign on floating-point arrays.
> Why should it be different for object arrays?
>
> The argument for doing it this way would be that arbitrary python objects
> don't have a sign, and the natural way to implement something like
> np.sign's semantics using only the "object" interface is
>
> if obj < 0:
> return -1
> elif obj > 0:
> return 1
> elif obj == 0:
> return 0
> else:
> raise
>
> In general I'm not a big fan of trying to do all kinds of guessing about
> how to handle random objects in object arrays, the kind that ends up with a
> big chain of type checks and fallback behaviors. Pretty soon we find
> ourselves trying to extend the language with our own generic dispatch
> system for arbitrary python types, just for object arrays. (The current
> hack where for object arrays np.log will try calling obj.log() is
> particularly horrible. There is no rule in python that "log" is a reserved
> method name for "logarithm" on arbitrary objects. Ditto for the other
> ufuncs that implement this hack.)
>
> Plus we hope that many use cases for object arrays will soon be supplanted
> by better dtype support, so now may not be the best time to invest heavily
> in making object arrays complicated and powerful.
>
> OTOH sometimes practicality beats purity, and at least object arrays are
> already kinda cordoned off from the rest of the system, so I don't feel as
> strongly as if we were talking about core functionality.
>
> ...is there a compelling reason to even support np.sign on object arrays?
> This seems pretty far into the weeds, and that tends to lead to poor
> intuition and decision making.
>

One of the use cases that has sneaked in during the last few numpy versions
is object arrays that contain numerical arrays whose shapes don't add up to
a rectangular array.
In those cases, being able to apply numerical operations might be useful.

But I'm +0 since I don't work with object arrays.

Josef



> -n
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Governance model request

2015-09-22 Thread josef.pktd
On Tue, Sep 22, 2015 at 10:55 PM, Bryan Van de Ven 
wrote:

>
> > On Sep 22, 2015, at 1:48 PM, Matthew Brett 
> wrote:
> >
> > The point is, that a sensible organization and a sensible leader has
> > to take the possibility of conflict of interest into account.  They
> > also have to consider the perception of a conflict of interest.
>
> Of course, and the policies to deal with conflicts have deal with the
> possibility that *anyone* at all might have conflict. But that was not your
> suggestion. Your suggestion was to make one individual be subject to
> additional scrutiny that no one else would be subject to. Please explain
> why should one person be singled out for a special "six-month waiting
> period" when the exact same possibility for conflict exists with anyone who
> is ever on the committee?
>


I don't quite understand where the discussion went. The question was not
whether Travis is singled out but whether he is "singled in".

From my perspective (7 to 8 years) the situation has changed a lot. Most
of the discussion and consensus building happens on github issues, pull
requests and mailing lists. Merge policy has changed a lot since the svn
days.

Based on all the comments, Travis doesn't have time for this. And I think
the final (last resort) decisions about code should be left to the active
developers that know and participate in the actual work.

If Travis is treated as a developer but doesn't have a special status, then
there is also no reason to "single out" him and Continuum for possibly
having too much influence.

This is already the current status quo as it developed over the last
several years, AFAICS.

In my view, in a narrow sense, Travis is a "hit and run" contributor: good
ideas and good contributions, but somebody has to integrate them, maintain
them and fit them into the development plan.
(Given my experience, I would compare it more with GSOC contributions,
which need the "core developers" to provide the continuity in the
development to absorb the good work.)
Travis has too many other obligations and interests to provide this
day-to-day continuity.

Travis is still very important for providing ideas, pushing projects
forward and as one of the community leaders, but I would leave the final
decisions for the development of numpy to the developers in the trenches.

I pretty much agree completely with Nathanial, and Sebastian, (except that
I don't know much about any other FOSS communities)

And to repeat my point from another thread on this: I'm very skeptical
about any committee or board that is based on "outside members" and that is
involved in decisions about code development.

Josef




>
> > It is the opposite of sensible, to respond to this with 'how dare you"
> > or by asserting that this could never happen or by saying that we
> > shouldn't talk about that in case people get frightened.  I point you
>
> Red herring. The objection (as many people have now voiced) is the double
> standard you proposed.
>
> > again to Linus' interview [1].  He is not upset that he has been
> > insulted by the implication of conflict of interest, he soberly
> > accepts that this will always be an issue, with companies in
> > particular, and goes out of his way to address that in an explicit and
> > reasonable way.
>
> Your selective quotation is impressive. Also in that interview, Linus
> points out that his employment contract is probably "unique in the entire
> world". Which means in practical terms that the details of what he has does
> are fairly well irrelevant to any other situation. So what is the point in
> bringing it up, at all, except to try and diminish someone else by
> comparison?
>
> (I'm done.)
>
> Bryan
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] meshgrid dtype casting

2015-09-04 Thread josef.pktd
I'm trying to build a meshgrid with small nonnegative integers

default is int32

>>> np.meshgrid([0,1,2], [0,1])[0].dtype
dtype('int32')

If I use uint, then the arrays are upcast to int64 - Why?

>>> np.meshgrid(np.array([0,1,2], np.uint), np.array([0,1], np.uint))[0].dtype
dtype('int64')


broadcast_arrays preserves dtype

>>> np.broadcast_arrays(np.array([0,1,2], np.uint)[:,None], np.array([0,1],
np.uint)[None, :])
[array([[0, 0],
   [1, 1],
   [2, 2]], dtype=uint32), array([[0, 1],
   [0, 1],
   [0, 1]], dtype=uint32)]



with uint8
>>> np.broadcast_arrays(np.array([0,1,2], np.uint8)[:,None],
np.array([0,1], np.uint8)[None, :])
[array([[0, 0],
   [1, 1],
   [2, 2]], dtype=uint8), array([[0, 1],
   [0, 1],
   [0, 1]], dtype=uint8)]

>>> np.meshgrid(np.array([0,1,2], np.uint8), np.array([0,1], np.uint8))[0].dtype
dtype('int32')
>>>


Winpython 64 bit
>>> np.__version__
'1.9.2rc1'

Josef
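
A sketch of two ways to keep the small unsigned dtype, based on the
broadcast_arrays behavior shown above (whether meshgrid itself preserves
the dtype may depend on the numpy version):

import numpy as np

x = np.array([0, 1, 2], np.uint8)
y = np.array([0, 1], np.uint8)

# broadcast_arrays preserves the input dtype, so it can stand in for meshgrid
xx, yy = np.broadcast_arrays(x[None, :], y[:, None])
print(xx.dtype, yy.dtype)   # uint8 uint8

# or cast the meshgrid output back explicitly
xg, yg = [g.astype(np.uint8) for g in np.meshgrid(x, y)]
print(xg.dtype, yg.dtype)   # uint8 uint8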
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015)

2015-09-04 Thread josef.pktd
On Fri, Sep 4, 2015 at 7:19 PM, Matthew Brett 
wrote:

> On Sat, Sep 5, 2015 at 12:04 AM,   wrote:
> >
> >
> > On Fri, Sep 4, 2015 at 5:55 PM, Matthew Brett 
> > wrote:
> >>
> >> Hi,
> >>
> >> On Fri, Sep 4, 2015 at 10:22 PM, Eric Firing 
> wrote:
> >> > On 2015/09/04 10:53 AM, Matthew Brett wrote:
> >> >> On Fri, Sep 4, 2015 at 2:33 AM, Matthew Brett <
> matthew.br...@gmail.com>
> >> >> wrote:
> >> >>> Hi,
> >> >>>
> >> >>> On Wed, Sep 2, 2015 at 5:41 PM, Chris Barker  >
> >> >>> wrote:
> >>  1) I very much agree that governance can make or break a project.
> >>  However,
> >>  the actual governance approach often ends up making less difference
> >>  than the
> >>  people involved.
> >> 
> >>  2) While the FreeBSD and XFree examples do point to some real
> >>  problems with
> >>  the "core" model it seems that there are many other projects that
> are
> >>  using
> >>  it quite successfully.
> >> >>
> >> >> I was just rereading the complaints about the 'core' structure from
> >> >> high-level NetBSD project leaders:
> >> >>
> >> >> "[the "core" and "board of directors"] teams are dysfunctional
> because
> >> >> they do not provide leadership: all they do is act reactively to
> >> >> requests from users and/or to resolve internal disputes. In other
> >> >> words: there is no initiative nor vision emerging from these teams
> >> >> (and, for that matter, from anybody)." [1]
> >> >>
> >> >> "There is no high-level direction; if you ask "what about the
> problems
> >> >> with threads" or "will there be a flash-friendly file system", the
> >> >> best you'll get is "we'd love to have both" -- but no work is done to
> >> >> recruit people to code these things, or encourage existing developers
> >> >> to work on them." [2]
> >> >
> >> >
> >> > This is consistent with Chris's first point.
> >>
> >> Do you mean Chris' point that "I very much agree that governance can
> >> make or break a project"?   Charles Hannum's complaints about NetBSD
> >> are very specific in blaming the model rather than the people.   I
> >> think the XFree86 story supports the same conclusion - that the
> >> governance model caused a sense of diffused responsibility that lead
> >> to bad decisions and lack of direction.
> >>
> >> >> I imagine we will have to reconcile ourselves to similar problems, if
> >> >> we adopt the same structures.
> >> >
> >> > Do you have suggestions as to who would make a good numpy president or
> >> > BDFL and potentially has the time and inclination to do it, or how to
> >> > identify and recruit such a person?
> >>
> >> That's a good question, and the answer is that in the current
> >> situation (zero interest in this discussion from the three current
> >> members of the numpy leadership team) - no reasonable person would be
> >> interested in that job.   That's the situation we're in, and so we
> >> have to accept that nothing is going to change, with the consequences
> >> that implies.   If the situation were different, and we had the
> >> interest or commitment to explore this problem, then I guess we could
> >> discuss other options including the one I suggested further up the
> >> thread.
> >
> >
> > "
> >
> > Today, the project is run by a different cabal.  This is the result of a
> > coup that took place in 2000-2001, in which The NetBSD Foundation was
> > taken over by a fraudulent change of the board of directors.  (Note:
> > It's probably too late for me to pursue any legal remedy for this,
> > unfortunately.)  Although "The NetBSD Project" and "The NetBSD
> > Foundation" were intended from the start to be separate entities -- the
> > latter supplying support infrastructure for the former -- this
> > distinction has been actively blurred since, so that the current "board"
> > of TNF has rather tight control over many aspects of TNP.
> >
> > "
> >
> > "
> >
> > The existing NetBSD Foundation must be disbanded, and replaced with
> >an organization that fulfills its original purpose: to merely handle
> >administrative issues, and not to manage day-to-day affairs.
> >
> > "
> >
> >
> > It doesn't sound to me like a developer- and community-driven governance
> > structure.
>
> I think that's a separate issue - the distinction between the 'board'
> and the 'core'.   It would be great if the 'core' concept was fine as
> long as there is no 'board' but I think that's a hard argument to
> make.
>

there is an "esprit de corps" pronounced "esprit de core" but not an
"esprit de board"

I trust the core developers, but not ...

But maybe I don't understand some definitions

"

The "core" group must be replaced with people who are actually
   competent and dedicated enough to review proposals, accept feedback,
   and make good decisions.

"

I thought that's what the "core" group is.

Josef



>
> Cheers,
>
> Matthew
> 

Re: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015)

2015-09-04 Thread josef.pktd
On Fri, Sep 4, 2015 at 5:55 PM, Matthew Brett 
wrote:

> Hi,
>
> On Fri, Sep 4, 2015 at 10:22 PM, Eric Firing  wrote:
> > On 2015/09/04 10:53 AM, Matthew Brett wrote:
> >> On Fri, Sep 4, 2015 at 2:33 AM, Matthew Brett 
> wrote:
> >>> Hi,
> >>>
> >>> On Wed, Sep 2, 2015 at 5:41 PM, Chris Barker 
> wrote:
>  1) I very much agree that governance can make or break a project.
> However,
>  the actual governance approach often ends up making less difference
> than the
>  people involved.
> 
>  2) While the FreeBSD and XFree examples do point to some real
> problems with
>  the "core" model it seems that there are many other projects that are
> using
>  it quite successfully.
> >>
> >> I was just rereading the complaints about the 'core' structure from
> >> high-level NetBSD project leaders:
> >>
> >> "[the "core" and "board of directors"] teams are dysfunctional because
> >> they do not provide leadership: all they do is act reactively to
> >> requests from users and/or to resolve internal disputes. In other
> >> words: there is no initiative nor vision emerging from these teams
> >> (and, for that matter, from anybody)." [1]
> >>
> >> "There is no high-level direction; if you ask "what about the problems
> >> with threads" or "will there be a flash-friendly file system", the
> >> best you'll get is "we'd love to have both" -- but no work is done to
> >> recruit people to code these things, or encourage existing developers
> >> to work on them." [2]
> >
> >
> > This is consistent with Chris's first point.
>
> Do you mean Chris' point that "I very much agree that governance can
> make or break a project"?   Charles Hannum's complaints about NetBSD
> are very specific in blaming the model rather than the people.   I
> think the XFree86 story supports the same conclusion - that the
> governance model caused a sense of diffused responsibility that lead
> to bad decisions and lack of direction.
>
> >> I imagine we will have to reconcile ourselves to similar problems, if
> >> we adopt the same structures.
> >
> > Do you have suggestions as to who would make a good numpy president or
> > BDFL and potentially has the time and inclination to do it, or how to
> > identify and recruit such a person?
>
> That's a good question, and the answer is that in the current
> situation (zero interest in this discussion from the three current
> members of the numpy leadership team) - no reasonable person would be
> interested in that job.   That's the situation we're in, and so we
> have to accept that nothing is going to change, with the consequences
> that implies.   If the situation were different, and we had the
> interest or commitment to explore this problem, then I guess we could
> discuss other options including the one I suggested further up the
> thread.
>

"

Today, the project is run by a different cabal.  This is the result of a
coup that took place in 2000-2001, in which The NetBSD Foundation was
taken over by a fraudulent change of the board of directors.  (Note:
It's probably too late for me to pursue any legal remedy for this,
unfortunately.)  Although "The NetBSD Project" and "The NetBSD
Foundation" were intended from the start to be separate entities -- the
latter supplying support infrastructure for the former -- this
distinction has been actively blurred since, so that the current "board"
of TNF has rather tight control over many aspects of TNP.

"

"

The existing NetBSD Foundation must be disbanded, and replaced with
   an organization that fulfills its original purpose: to merely handle
   administrative issues, and not to manage day-to-day affairs.

"


It doesn't sound to me like a developer- and community-driven governance
structure.

Cheers

Josef
https://jeb2016.com/?lang=es



>
> Cheers,
>
> Matthew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015)

2015-08-27 Thread josef.pktd
On Thu, Aug 27, 2015 at 8:57 AM, Matthew Brett matthew.br...@gmail.com
wrote:

 Hi,

 On Thu, Aug 27, 2015 at 12:11 PM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
  On Do, 2015-08-27 at 10:45 +0100, Matthew Brett wrote:
  Hi,
 
  On Thu, Aug 27, 2015 at 10:35 AM, Bryan Van de Ven bry...@continuum.io
 wrote:
  
   On Aug 27, 2015, at 10:22 AM, Matthew Brett matthew.br...@gmail.com
 wrote:
  
   In the case of the 'core' model, we have some compelling testimony
   from someone with a great deal of experience:
  
   
   Much of this early structure (CVS, web site, cabal [core group],
   etc.) was copied verbatim by other open source (this term not being
 in
   wide use yet) projects -- even the form of the project name and the
   term core. This later became a kind of standard template for
   starting up an open source project. [...] I'm sorry to say that I
   helped create this problem, and that most of the projects which
   modeled themselves after NetBSD (probably due to its high popularity
   in 1993 and 1994) have suffered similar problems. FreeBSD and
 XFree86,
   for example, have both forked successor projects (Dragonfly and
 X.org)
   for very similar reasons.
   
  
   Who goes on to propose:
  
   7) The core group must be replaced with people who are actually
  competent and dedicated enough to review proposals, accept
 feedback,
  and make good decisions.  More to the point, though, the core
 group
  must only act when *needed* -- most technical decisions should be
  left to the community to hash out; it must not preempt the
 community
  from developing better solutions.  (This is how the core group
  worked during most of the project's growth period.)
 
  Sure.  I think it's reasonable to give high weight to Hannum's
  assessment of the failure of the core group, but less weight to his
  proposal for a replacement, because at the time, I don't believe he
  was in a good position to assess whether his (apparent) alternative
  would run into the same trouble.
 
  It's always tempting to blame the people rather than the system, but
  in this case, I strongly suspect that it was the system that was
  fundamentally flawed, therefore either promoting the wrong people or
  putting otherwise competent people into situations where they are no
  longer getting useful feedback.
 
  Maybe so. I do not know much at all about these models, but I am not
  sure how much applies here to numpy. Isn't at least FreeBSD a magnitude
  larger then numpy?

 It seems to me that numpy suffers from the same risks of poor
 accountability, stagnation and conservatism that larger projects do.
 Is there a reason that would not be the case?

  We do need to have some formality about how to give out commit rights,
  and do final decision when all else fails.

 Yes, sure, something formal is probably but not certainly better than
 nothing, depending on what the 'something formal' is.

  One thing I do not know is how a community vote could work at all,
  considering I do not even know how to count its members. Votes and
  presidents make sense to me for large projects with hundrets of
  developers on different corners (think of the gnome foundation, debian
  probably) [1].

 The 'president' idea is to get at the problem of lack of
 accountability, along with selection for leadership skill rather than
 coding ability.   It's trying to get at the advantages of the BDFL
 model in our situation where there is no obvious BDFL.For the me
 the problem is that, at the moment, if the formal or informal
 governing body makes a bad decision, then no member will feel
 responsible for that decision or its consequences.  That tends to lead
 to an atmosphere of - oh well, what could we do, X wouldn't agree to
 A and Y wouldn't agree to B so we're stuck.   It seems to me we need
 a system such that whoever is in charge feels so strongly that it is
 their job to make numpy as good as possible, that they will take
 whatever difficult or sensitive decisions are necessary to make that
 happen.  On the other hand the 'core' system seems to function on a
 model of mutual deference and personal loyalty that I believe is
 destructive of good management.



I don't really see a problem with codifying the status quo.

It might become necessary to have something like an administrative director
if numpy becomes a more formal organization with funding, but for the
development of the project I don't see any need for a president.
If there is no obvious BDFL, then I guess there is also no obvious
president. (I would vote for Ralf as president of everything, but I don't
think he's available.)

As the current debate shows it's possible to have a public discussion about
the direction of the project without having to delegate providing a vision
to a president.

Given the current pattern, all critical issues end up in a public debate on
the mailing list. What numpy (and scipy) need is to have someone as a tie
breaker to make any final decisions if there is no clear consensus; if
there is no BDFL for it, then having the core group make those decisions
looks appropriate to me.

Re: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015)

2015-08-27 Thread josef.pktd
On Thu, Aug 27, 2015 at 12:22 PM, Matthew Brett matthew.br...@gmail.com
wrote:

 Hi

 On Thu, Aug 27, 2015 at 5:11 PM,  josef.p...@gmail.com wrote:
 
 
  On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett matthew.br...@gmail.com
 
  wrote:
 
  Hi,
 
  On Thu, Aug 27, 2015 at 3:34 PM,  josef.p...@gmail.com wrote:
  [snip]
   I don't really see a problem with codifying the status quo.
 
  That's an excellent point.If we believe that the current situation
  is the best possible, both now and in the future, then codifying the
  status quo is an excellent idea.
 
  So, we should probably first start by asking ourselves:
 
  * what numpy is doing well;
  * what numpy could do better;
 
  and then ask, is there some way we could make it more likely we will
  improve over time.
 
  [snip]
 
   As the current debate shows it's possible to have a public discussion
   about
   the direction of the project without having to delegate providing a
   vision
   to a president.
 
  The idea of a president that I had in mind, was not someone who makes
  all decisions, but the person who holds themselves responsible for the
  performance of the project.  If the project has a coherent vision
  already, the president has no need to provide one, but it's the
  president's job to worry about whether we have vision or not, and do
  what they need to, to make sure we don't lose track of that.   If you
  don't know it already, I highly recommend Jim Collins' work on 'level
  5 leadership' [1]
 
 
  Still doesn't sound like the need for a president to me
 
   the person who holds themselves responsible for the
  performance of the project
 
  sounds more like the role of the core group (adding plural to persons)
 to
  me, and cannot be pushed of to an official president.

 Except that, in the past, having multiple people taking decisions has
 led to the situation where no-one feels themselves accountable for the
 result, hence this situation tends to lead to stagnation.


Is there any evidence for this?

First, it's several individuals taking joint decisions: they jointly agree,
or at least don't object (LGTM), to merging PRs; it's still a joint
decision, and accountability is not exclusive. The PR review process makes
decisions much more of a joint process than the isolated SVN commits did.
(*)

Second, separated decisions could also lead to excess change: all these
enthusiastic new developers bringing in whatever they (and the local chief)
like, and nobody to stop them.

In either case, the developer, or local chief, has to deal with the
consequences: "You merged this PR, now fix it." or "Why are you holding up
this PR? I can merge it."

(*) Even though I'm not a scipy developer anymore, I still feel partially
responsible for it. I'm still reviewing some PRs and commenting on them,
sometimes as a cheerleader in favor of something, sometimes pointing out
problems, or just checking that it makes sense, and always with an eye on
what downstream impact it might have.

Another thought: having an accountable president might actually reduce the
feeling of accountability and responsibility of individual developers, so
the net effect is negative.
"The president is responsible for this (even though he doesn't have enough
time), so I can skip part of this review."

Josef




  Nathaniel to push and organize the discussion, Chuck for continuity, and
  several core developers for detailed ideas and implementation, and a
 large
  number of contributors. (stylized roles)
  and noisy mailing list for feedback and discussion.
 
  Given the size of the numpy development group, numpy needs individuals
 for
  the vision and to push things not a president, vice-presidents and
 assistant
  vice-presidents, IMO.

 Yes, if the roles were honorary and administrative, they would not be
 useful.


I'm not sure what you mean here. Given that it's all volunteer work, any
president wouldn't have any hard tools.

Josef




 Cheers,

 Matthew
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

2015-08-27 Thread josef.pktd
On Wed, Aug 26, 2015 at 10:06 AM, Travis Oliphant tra...@continuum.io
wrote:



 On Wed, Aug 26, 2015 at 1:41 AM, Nathaniel Smith n...@pobox.com wrote:

 Hi Travis,

 Thanks for taking the time to write up your thoughts!

 I have many thoughts in return, but I will try to restrict myself to two
 main ones :-).

 1) On the question of whether work should be directed towards improving
 NumPy-as-it-is or instead towards a compatibility-breaking replacement:
 There's plenty of room for debate about whether it's better engineering
 practice to try and evolve an existing system in place versus starting
 over, and I guess we have some fundamental disagreements there, but I
 actually think this debate is a distraction -- we can agree to disagree,
 because in fact we have to try both.


 Yes, on this we agree.   I think NumPy can improve *and* we can have new
 innovative array objects.   I don't disagree about that.



 At a practical level: NumPy *is* going to continue to evolve, because it
 has users and people interested in evolving it; similarly, dynd and other
 alternatives libraries will also continue to evolve, because they also have
 people interested in doing it. And at a normative level, this is a good
 thing! If NumPy and dynd both get better, than that's awesome: the worst
 case is that NumPy adds the new features that we talked about at the
 meeting, and dynd simultaneously becomes so awesome that everyone wants to
 switch to it, and the result of this would be... that those NumPy features
 are exactly the ones that will make the transition to dynd easier. Or if
 some part of that plan goes wrong, then well, NumPy will still be there as
 a fallback, and in the mean time we've actually fixed the major pain points
 our users are begging us to fix.

 You seem to be urging us all to make a double-or-nothing wager that your
 extremely ambitious plans will all work out, with the entire numerical
 Python ecosystem as the stakes. I think this ambition is awesome, but maybe
 it'd be wise to hedge our bets a bit?


 You are mis-characterizing my view.  I think NumPy can evolve (though I
 would personally rather see a bigger change to the underlying system like I
 outlined before).But, I don't believe it can even evolve easily in the
 direction needed without breaking ABI and that insisting on not breaking it
 or even putting too much effort into not breaking it will continue to
 create less-optimal solutions that are harder to maintain and do not take
 advantage of knowledge this community now has.

 I'm also very concerned that 'evolving' NumPy will create a situation
 where there are regular semantic and subtle API changes that will cause
 NumPy to be less stable for it's user-base.I've watched this happen.
 This at a time that people are already looking around for new and different
 approaches anyway.



 2) You really emphasize this idea of an ABI-breaking (but not
 API-breaking) release, and I think this must indicate some basic gap in how
 we're looking at things. Where I'm getting stuck here is that... I actually
 can't think of anything important that we can't do now, but could if we
 were allowed to break ABI compatibility. The kinds of things that break ABI
 but keep API are like... rearranging what order the fields in a struct fall
 in, or changing the numeric value of opaque constants like
 NPY_ARRAY_WRITEABLE. The biggest win I can think of is that we could save a
 few bytes per array by arranging the fields inside the ndarray struct more
 optimally, but that's hardly a feature to hang a 2.0 on. You seem to have a
 vision of this ABI-breaking release as being something very different from
 that, and I'm not clear on what this vision is.


 We already broke the ABI with date-time changes --- it's still broken for
 a certain percentage of users last I checked.So, part of my
 disagreement is that we've tried this and it didn't work --- even though
 smart people thought it would.I've had to deal with this personally and
 I'm not enthusiastic about having to deal with this for the next 5 years
 because of even more attempts to make changes while not breaking the ABI.
  I think the group is more careful now --- but I still think the API is
 broad enough and uses of NumPy deep enough that the effort involved in
 trying not to break the ABI is just not worth the effort (because it's a
 non-feature today).Adding new dtypes without breaking the ABI is tricky
 (and to do it without breaking the ABI is ugly).   I also continue to
 believe that putting out a new ABI-breaking NumPy will allow re-compiling
 *once* (with some porting changes needed) and not subtle breakages
 requiring code-changes every time a release is made.If subtle changes
 aren't made, then the new features won't come.   Right now, I'd rather have
 stability from NumPy than new features.   New features can come from other
 libraries.

 One specific change that could easily be made in NumPy 2.0 (the current
 code 

Re: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015)

2015-08-27 Thread josef.pktd
On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett matthew.br...@gmail.com
wrote:

 Hi,

 On Thu, Aug 27, 2015 at 3:34 PM,  josef.p...@gmail.com wrote:
 [snip]
  I don't really see a problem with codifying the status quo.

 That's an excellent point.If we believe that the current situation
 is the best possible, both now and in the future, then codifying the
 status quo is an excellent idea.

 So, we should probably first start by asking ourselves:

 * what numpy is doing well;
 * what numpy could do better;

 and then ask, is there some way we could make it more likely we will
 improve over time.

 [snip]

  As the current debate shows it's possible to have a public discussion
 about
  the direction of the project without having to delegate providing a
 vision
  to a president.

 The idea of a president that I had in mind, was not someone who makes
 all decisions, but the person who holds themselves responsible for the
 performance of the project.  If the project has a coherent vision
 already, the president has no need to provide one, but it's the
 president's job to worry about whether we have vision or not, and do
 what they need to, to make sure we don't lose track of that.   If you
 don't know it already, I highly recommend Jim Collins' work on 'level
 5 leadership' [1]


Still doesn't sound like the need for a president to me

 the person who holds themselves responsible for the
performance of the project

sounds more like the role of the core group (adding plural to persons) to
me, and cannot be pushed off to an official president.

Nathaniel to push and organize the discussion, Chuck for continuity, and
several core developers for detailed ideas and implementation, and a large
number of contributors. (stylized roles)
and noisy mailing list for feedback and discussion.

Given the size of the numpy development group, numpy needs individuals for
the vision and to push things, not a president, vice-presidents and
assistant vice-presidents, IMO.

(Given the importance of numpy itself, there should be enough remedies if
the core group ever gets `out of touch` with the very large user base.)

Josef



  Given the current pattern all critical issues end up in a public debate
 on
  the mailing list. What numpy (and scipy) need is to have someone as a tie
  breaker to make any final decisions if there is no clear consensus, if
 there
  is no BDFL for it, then having the core group making those decisions
 looks
  appropriate to me.
 
  On the other hand the 'core' system seems to function on a
  model of mutual deference and personal loyalty that I believe is
  destructive of good management.
 
  That sounds actually like a good basis for team work to me. Also that has
  mutual in it instead of just deferring and being loyal to a president.
 
  Since I know scipy development much better:
 
  scipy has made a huge progress in the last 5 or 6 years since I've been
  following it, both in terms of code, in terms of development workflow,
 and
  in the number of developers. (When I started, I was essentially alone in
  scipy.stats, now there are 3 to 5 core developers that at least
 partially
  work on it, everything goes through PRs with public discussion and with
  critical issues additionally raised on the mailing list.)
 
  Ralf and Pauli are the defacto BDFLs for scipy overall, but all
 decisions in
  recent years have been without a fight, but not without lots of
 arguments,
  and, given the size and breadth of scipy, there are field experts
 (although
  not enough of those) to help in the discussion.

 I agree entirely, I think scipy is a good example where stability and
 clarity of leadership has made a huge difference to the health of the
 project.

 Cheers,

 Matthew

 [1]
 https://hbr.org/2005/07/level-5-leadership-the-triumph-of-humility-and-fierce-resolve
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015)

2015-08-27 Thread josef.pktd
On Thu, Aug 27, 2015 at 2:06 PM, Matthew Brett matthew.br...@gmail.com
wrote:

 Hi,

 On Thu, Aug 27, 2015 at 6:23 PM,  josef.p...@gmail.com wrote:
 
 
  On Thu, Aug 27, 2015 at 12:22 PM, Matthew Brett matthew.br...@gmail.com
 
  wrote:
 
  Hi
 
  On Thu, Aug 27, 2015 at 5:11 PM,  josef.p...@gmail.com wrote:
  
  
   On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett
   matthew.br...@gmail.com
   wrote:
  
   Hi,
  
   On Thu, Aug 27, 2015 at 3:34 PM,  josef.p...@gmail.com wrote:
   [snip]
I don't really see a problem with codifying the status quo.
  
   That's an excellent point.If we believe that the current
 situation
   is the best possible, both now and in the future, then codifying the
   status quo is an excellent idea.
  
   So, we should probably first start by asking ourselves:
  
   * what numpy is doing well;
   * what numpy could do better;
  
   and then ask, is there some way we could make it more likely we will
   improve over time.
  
   [snip]
  
As the current debate shows it's possible to have a public
 discussion
about
the direction of the project without having to delegate providing a
vision
to a president.
  
   The idea of a president that I had in mind, was not someone who makes
   all decisions, but the person who holds themselves responsible for
 the
   performance of the project.  If the project has a coherent vision
   already, the president has no need to provide one, but it's the
   president's job to worry about whether we have vision or not, and do
   what they need to, to make sure we don't lose track of that.   If you
   don't know it already, I highly recommend Jim Collins' work on 'level
   5 leadership' [1]
  
  
   Still doesn't sound like the need for a president to me
  
the person who holds themselves responsible for the
   performance of the project
  
   sounds more like the role of the core group (adding plural to
 persons)
   to
   me, and cannot be pushed of to an official president.
 
  Except that, in the past, having multiple people taking decisions has
  led to the situation where no-one feels themselves accountable for the
  result, hence this situation tends to lead to stagnation.
 
 
  Is there any evidence for this?

 Oh - dear - that's the key point, but I'm obviously not making it
 clearly enough.  Yes there is, and that was the evidence I was
 pointing to before.


If you mean the XFree and NetBSD cases, then I don't see any similarity to
the numpy or scipy development pattern. If I were to draw any conclusion,
then maybe that NetBSD had too much formal governance structure and not
enough informal governance. It would be difficult to take over the
government if there is no government.

just one aside
*No desire to recruit new users  *

We are on a mission to take over the world (*). And forks of numpy like
pandas turn out to be mostly complementary and increase the userbase of
numpy.

(R and Python are in friendly, or sometimes unfriendly, competition, but,
AFAICS, we are both gaining users because of the other's presence. It's not
a zero sum game in this case.)

(*) But that's not in the mission statement.




 But anyway - Sebastian is right - this discussion isn't going anywhere
 useful.

 So - let's step back.

 In thinking about governance, we first need to ask what we want to
 achieve.  This includes considering the risks ahead for the project.

 So, in the spirit of fruitful discussion, can I ask what y'all
 consider to be the current problems with working on numpy (other than
 the technical ones).   What is numpy doing well, and what is it doing
 badly? What risks do we have to plan for in the future?


I thought that was implicit or explicit in the other thread.

Josef




 Cheers,

 Matthew
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.in1d() sets, bug?

2015-08-10 Thread josef.pktd
On Mon, Aug 10, 2015 at 1:40 PM, Benjamin Root ben.r...@ou.edu wrote:

  Not really, it is simply because ``np.asarray(set([1, 2, 3]))``
  returns an object array

 Holy crap! To be pedantic, it looks like it turns it into a numpy scalar,
 but still! I wouldn't have expected np.asarray() on a set (or dictionary,
 for that matter) to work because order is not guaranteed. Is this expected
 behavior?

 Digging into the implementation of in1d(), I can see now how passing a
 set() wouldn't be useful at all (as an aside, pretty clever algorithm). I
 know sets aren't array-like, but the code that used this seemed to work at
 first, and this problem wasn't revealed until I created some unit tests to
 exercise some possible corner cases. Silently producing possibly erroneous
 results is dangerous. Don't know if better documentation or some better
 sanity checking would be called for here, though.

 Ben Root


 On Mon, Aug 10, 2015 at 1:10 PM, Sebastian Berg 
 sebast...@sipsolutions.net wrote:

 On Mo, 2015-08-10 at 12:09 -0400, Benjamin Root wrote:
  Just came across this one today:
 
   np.in1d([1], set([0, 1, 2]), assume_unique=True)
  array([ False], dtype=bool)
 
   np.in1d([1], [0, 1, 2], assume_unique=True)
 
  array([ True], dtype=bool)
 
 
  I am assuming this has something to do with the fact that order is not
  guaranteed with set() objects? I was kind of hoping that setting
  assume_unique=True would be sufficient to overcome that problem.
  Should sets be rejected as an error?
 

 Not really, it is simply because ``np.asarray(set([1, 2, 3]))``
 returns an object array and 1 is not the same as ``set([1, 2, 3])``.

 I think earlier numpy versions may have had short cuts for short lists
 or something so this may have worked in some cases
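
For illustration, a rough interactive check (output may vary a bit by
version) -- the set ends up as a single object element, not as three values:

>>> a = np.asarray(set([1, 2, 3]))
>>> a.dtype, a.shape
(dtype('O'), ())

so the elementwise comparison in in1d is against the set object itself.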



is it possible to get at least a UserWarning when creating an object array
and dtype object hasn't been explicitly requested or underlying data is
already in an object dtype?
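
Something like this user-side sketch is roughly what I have in mind
(checked_asarray is a made-up helper, not an existing numpy function, and
deciding exactly when to warn is the part that would need discussion):

import warnings
import numpy as np

def checked_asarray(obj, dtype=None):
    # warn if we silently end up with an object array that was neither
    # requested via dtype=object nor already stored with object dtype
    arr = np.asarray(obj, dtype=dtype)
    requested = dtype is not None and np.dtype(dtype) == np.dtype(object)
    already = getattr(obj, 'dtype', None) == np.dtype(object)
    if arr.dtype == np.dtype(object) and not (requested or already):
        warnings.warn("implicitly created an object array", UserWarning)
    return arr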


Josef



 - Sebastian


 
  This was using v1.9.0
 
 
  Cheers!
 
  Ben Root
 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.?

2015-08-04 Thread josef.pktd
On Tue, Aug 4, 2015 at 4:39 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Mo, 2015-08-03 at 21:32 +0200, Sturla Molden wrote:
  On 03/08/15 20:51, Chris Barker wrote:
 
   well, IIUC, np.int is the python integer type, which is
   a C long in all the implementations of cPython that I know about -- but is
   that a guarantee? in the future as well?
 
  It is a Python int on Python 2.
 
  On Python 3 dtype=np.int means the dtype will be C long, because a
  Python int has no size limit. But np.int aliases Python int. And
  creating an array with dtype=int therefore does not create an array of
  Python int, it creates an array of C long. To actually get dtype=int we
  have to write dtype=object, which is just crazy.
 

 Since it seemes there may be a few half truths flying around in this
 thread. See http://docs.scipy.org/doc/numpy/user/basics.types.html



Quote:

Note that, above, we use the *Python* float object as a dtype. NumPy knows
that int refers to np.int_, bool means np.bool_, that float is np.float_ and
complex is np.complex_. The other data-types do not have Python
equivalents.
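
A quick check of that note, for illustration (the int line is the
platform-dependent part; this is a 64-bit Windows session, on most 64-bit
Linux it would be int64):

>>> np.dtype(float) == np.dtype(np.float64)
True
>>> np.dtype(bool) == np.dtype(np.bool_)
True
>>> np.dtype(int)
dtype('int32')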

Is there a conflict with the current thread?

Josef
(I'm not a C person, so most of this is outside my scope, except for
watching bugfixes to make older code work for larger datasets. Use `intp`,
Luke.)




 and also note the sentence below the table (maybe the table should also
 note these):

 Additionally to intc the platform dependent C integer types short, long,
 longlong and their unsigned versions are defined.

 - Sebastian

 
  Sturla
 
 
 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] difference between dtypes

2015-07-24 Thread josef.pktd
On Fri, Jul 24, 2015 at 3:46 AM, Robert Kern robert.k...@gmail.com wrote:

 On Wed, Jul 22, 2015 at 7:45 PM, josef.p...@gmail.com wrote:
 
  Is there an explanation somewhere of what different basic dtypes mean,
 across platforms and python versions?
 
  >>> np.bool8
  <type 'numpy.bool_'>
  >>> np.bool_
  <type 'numpy.bool_'>
  >>> bool
  <type 'bool'>
 
 
  Are there any rules and recommendations or is it all folklore?

 This may help a little:


 http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html#arrays-dtypes-constructing

 Basically, we accept the builtin Python type objects as a dtype argument
 and do something sensible with them. float - np.float64 because Python
 floats are C doubles. int - np.int32 or np.int64 depending on whatever a C
 long is (i.e. depending on the 64bitness of your CPU and how your OS
 chooses to deal with that). We encode those precision choices as aliases to
 the corresponding specific numpy scalar types (underscored as necessary to
 avoid shadowing builtins of the same name): np.float_ is np.float64, for
 example.

 See here for why the aliases to Python builtin types, np.int, np.float,
 etc. still exist:

 https://github.com/numpy/numpy/pull/6103#issuecomment-123652497

 If you just need to pass a dtype= argument and want the precision that
 matches the native integer and float for your platform, then I prefer to
 use the Python builtin types instead of the underscored aliases; they just
 look cleaner. If you need a true numpy scalar type (e.g. to construct a
 numpy scalar object), of course, you must use one of the numpy scalar
 types, and the underscored aliases are convenient for that. Never use the
 aliases to the Python builtin types.
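
To make sure I read that right, a small illustrative check (reprs from
Python 3, may differ slightly by version):

>>> np.zeros(2, dtype=float).dtype      # builtin type accepted as a dtype argument
dtype('float64')
>>> np.float_ is np.float64             # underscored alias is the numpy scalar type
True
>>> type(np.float_(1.5))
<class 'numpy.float64'>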



(I don't have time to follow up on this for at least two weeks)

my thinking was that, if there is no actual difference between bool,
np.bool and np.bool_, then np.bool could become an alias and a replacement
for np.bool_, so we can get rid of the ugly trailing underscore.
If np.float is always float64 it could be mapped to that directly.

As the previous discussion on python int versus numpy int on python 3.x,
int is at least confusing.

Also I'm thinking that maybe adjusting the code to the (mis)interpretation,
instead of killing np.float completely, might be nicer (but
changing np.int would be riskier?)

Josef





 --
 Robert Kern

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] difference between dtypes

2015-07-23 Thread josef.pktd
Is there an explanation somewhere of what different basic dtypes mean,
across platforms and python versions?

>>> np.bool8
<type 'numpy.bool_'>
>>> np.bool_
<type 'numpy.bool_'>
>>> bool
<type 'bool'>


Are there any rules and recommendations or is it all folklore?


I'm asking because my intuition picked up by osmosis might be off, and I
thought
https://github.com/scipy/scipy/pull/5076
is weird (i.e. counter intuitive).


Deprecation warnings are always a lot of fun, especially if all the CI log
tells you is: "This log is too long to be displayed. Please reduce the
verbosity of your build or download the raw log."

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] floats for indexing, reshape - too strict ?

2015-07-02 Thread josef.pktd
On Thu, Jul 2, 2015 at 8:51 PM, Chris Barker - NOAA Federal 
chris.bar...@noaa.gov wrote:

 Sent from my iPhone

 
  The disadvantage I see is, that some weirder calculations would possible
  work most of the times, but not always,


   not sure if you can define a tolerance
  reasonable here unless it is exact.

 You could use a relative tolerance, but you'd still have to set that.
 Better to put that decision squarely in the user's hands.

  Though I guess you are right that
  `//` will also just round silently already.

 Yes, but if it's in the user's code, it should be obvious -- and then
 the user can choose to round, or floor, or ceiling


round, floor, ceil don't produce integers.

I'm writing library code, and I don't have control over what everyone does.

round, floor, ceil, and // might hide bugs or user mistakes, if we are
supposed to get something that is like an int but it's 42.6 instead.
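
For example (illustrative, Python 3 reprs):

>>> np.floor(42.6), np.trunc(42.6), 42.6 // 1
(42.0, 42.0, 42.0)
>>> type(np.floor(42.6))
<class 'numpy.float64'>

everything "succeeds" silently and still hands back a float, even though the
42.6 should have been a red flag.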

Josef
https://en.wikipedia.org/wiki/Phrases_from_The_Hitchhiker%27s_Guide_to_the_Galaxy#Answer_to_the_Ultimate_Question_of_Life.2C_the_Universe.2C_and_Everything_.2842.29




 -CHB

 
  - Sebastian
 
 
  for example
 
 
  5.0 == 5
  True
 
 
  np.ones(10 / 2)
  array([ 1.,  1.,  1.,  1.,  1.])
  10 / 2 == 5
  True
 
 
  or the python 2 version
 
 
  np.ones(10. / 2)
  array([ 1.,  1.,  1.,  1.,  1.])
  10. / 2 == 5
  True
 
 
  I'm using now 10 // 2, or int(10./2 + 1)   but this is unconditional
  and doesn't raise if the numbers are not close or equal to an integer
  (which would be a bug)
 
 
 
 
  Josef
 
 
 
 

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] floats for indexing, reshape - too strict ?

2015-07-01 Thread josef.pktd
About the deprecation warning for using another type than integers, in
ones, reshape, indexing and so on:

Wouldn't it be nicer to accept floats that are equal to an integer?

for example

 5.0 == 5
True

 np.ones(10 / 2)
array([ 1.,  1.,  1.,  1.,  1.])
 10 / 2 == 5
True

or the python 2 version

 np.ones(10. / 2)
array([ 1.,  1.,  1.,  1.,  1.])
 10. / 2 == 5
True

I'm using now 10 // 2, or int(10./2 + 1)   but this is unconditional and
doesn't raise if the numbers are not close or equal to an integer (which
would be a bug)


Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] floats for indexing, reshape - too strict ?

2015-07-01 Thread josef.pktd
On Wed, Jul 1, 2015 at 10:32 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Mi, 2015-07-01 at 10:05 -0400, josef.p...@gmail.com wrote:
  About the deprecation warning for using another type than integers, in
  ones, reshape, indexing and so on:
 
 
  Wouldn't it be nicer to accept floats that are equal to an integer?
 

 Hmmm, the biggest point was that the old solution was to basically
 (besides strings) use `int(...)`, which means it does not raise any
 errors as you also mention.
 I am open to think about allowing exact floats for most of this
 (frankly, not advanced indexing at least for the moment, but we never
 did there), I think scipy may be doing that for some functions?

 The disadvantage I see is, that some weirder calculations would possible
 work most of the times, but not always, what I mean is such a case.
 A -- possibly silly -- example:

 In [8]: for i in range(10):
...: print i, i == i * 0.1 * 10
...:
 0 True
 1 True
 2 True
 3 False
 4 True
 5 True
 6 False
 7 False
 8 True
 9 True

 I am somewhat opposed to rounding a lot (i.e. not noticing if you got
 3. somewhere), so not sure if you can define a tolerance
 reasonable here unless it is exact. Though I guess you are right that
 `//` will also just round silently already.


Yes, I thought about this, something like `int_if_close` in analogy to
real_if_close would be useful.

However, given that we need to decide on a threshold in this case, I
thought it's overkill to put that into reshape, ones and indexing and
similar.
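
Roughly what I mean by int_if_close -- a hypothetical sketch, not an existing
numpy function, and tol is exactly the threshold that would have to be decided:

import numpy as np

def int_if_close(x, tol=1e-8):
    # analogous to np.real_if_close: cast to int where the values are within
    # tol of an integer, otherwise raise so silent truncation can't hide a bug
    x = np.asarray(x)
    rounded = np.round(x)
    if not np.all(np.abs(x - rounded) <= tol):
        raise ValueError("array is not close to integer-valued")
    return rounded.astype(int)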

Simpler cases would work,
e.g. the number of triangular elements:

 for i in range(10): print(i, i * (i - 1) / 2. == int(i * (i - 1) / 2.))

0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 True
9 True

also np.ceil and np.trunc return floats, not integers.

One disadvantage of raising or warning after the equality check is that
developers have a tendency to write nice unit tests. Then the casting
doesn't break in the unit tests but might raise an exception at some random
data.


For reference: here are my changes in cleaning up
https://github.com/statsmodels/statsmodels/pull/2490/files


Josef






 - Sebastian

 
  for example
 
 
   5.0 == 5
  True
 
 
   np.ones(10 / 2)
  array([ 1.,  1.,  1.,  1.,  1.])
   10 / 2 == 5
  True
 
 
  or the python 2 version
 
 
   np.ones(10. / 2)
  array([ 1.,  1.,  1.,  1.,  1.])
   10. / 2 == 5
  True
 
 
  I'm using now 10 // 2, or int(10./2 + 1)   but this is unconditional
  and doesn't raise if the numbers are not close or equal to an integer
  (which would be a bug)
 
 
 
 
  Josef
 
 
 
 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] annoying Deprecation warnings about non-integers

2015-06-30 Thread josef.pktd
I'm trying to fix some code in statsmodels that creates Deprecation
Warnings from numpy

Most of them are quite easy to fix, mainly cases where we use floats to avoid
integer division

I have two problems

first, I get Deprecation warnings in the test run that don't specify where
they happen.
I try to find them with file searches, but I don't see a `np.ones` that
might cause a problem
(needle in a haystack: Close to 4000 unittests and more than 100,000 lines
of numpython)
Also, I'm not sure the warnings are only from statsmodels, they could be in
numpy, scipy or pandas, couldn't they?


second, what's wrong with non-integers in `np.r_[[np.nan] * head, x,
[np.nan] * tail]` (see below)

I tried to set the warnings filter to `error` but then Python itself
errored right away.
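
A narrower filter might work around that (untested guess; it matches just the
numpy message instead of turning every DeprecationWarning into an error, so
the offending call raises and the traceback shows where it comes from):

import warnings
warnings.filterwarnings('error',
                        message='using a non-integer number')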

https://travis-ci.org/statsmodels/statsmodels/jobs/68748936
https://github.com/statsmodels/statsmodels/issues/2480


Thanks for any clues

Josef


nosetests  -s --pdb-failures --pdb
M:\j\statsmodels\statsmodels_py34\statsmodels\tsa\tests

..C:\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\core\numeric.py:183: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  a = empty(shape, dtype, order)
..


...M:\j\statsmodels\statsmodels_py34\statsmodels\tsa\filters\filtertools.py:28: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  return np.r_[[np.nan] * head, x, [np.nan] * tail]
..


...C:\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\lib\twodim_base.py:231: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  m = zeros((N, M), dtype=dtype)
C:\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\lib\twodim_base.py:238: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  m[:M-k].flat[i::M+1] = 1
...
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Python 3 and isinstance(np.int64(42), int)

2015-06-23 Thread josef.pktd
On Fri, Jun 19, 2015 at 4:15 PM, Chris Barker chris.bar...@noaa.gov wrote:

 On Wed, Jun 17, 2015 at 11:13 PM, Nathaniel Smith n...@pobox.com wrote:

  there's some
 argument that in Python, doing explicit type checks like this is
 usually a sign that one is doing something awkward,


 I tend to agree with that.

 On the other hand, numpy itself is kind-of sort-of statically typed. But
 in that case, if you need to know the type of an array -- check the array's
 dtype.

 Also:

   a = np.zeros(7, int)
   n = a[3]
   type(n)
 type 'numpy.int64'

 I never liked declaring numpy arrays with the python types like int or
 float -- in numpy you usually care more about the type, so you should simply
 use int64 if you want a 64 bit int, and float64 if you want a 64 bit
 float. Granted, python floats have always been float64 (on all platforms??),
 and python ints used to be a reasonable int type, but now that python
 ints are bigInts in py3, it really makes sense to be clear.

 And now that I think about it, in py2, int is 32 bit on win64 and 64 bit
 on *nix64 -- so you're really better off being explicit with your numpy
 arrays.



being late checking some examples

>>> a = np.zeros(7, int)
>>> a.dtype
dtype('int32')
>>> np.__version__
'1.9.2rc1'
>>> type(a[3])
<class 'numpy.int32'>


>>> a = np.zeros(7, int)
>>> a = np.array([8888888888])
>>> a
array([8888888888], dtype=int64)

>>> a = np.array([88888888888888888888])
>>> a
array([88888888888888888888], dtype=object)

>>> a = np.array([88888888888888888888], dtype=int)
Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    a = np.array([88888888888888888888], dtype=int)
OverflowError: Python int too large to convert to C long


Looks like we need to be a bit more careful now.

Josef
Python 3.4.3




 -CHB


 --

 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR(206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115   (206) 526-6317   main reception

 chris.bar...@noaa.gov

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Clarification sought on Scipy Numpy version requirements.

2015-06-19 Thread josef.pktd
On Fri, Jun 19, 2015 at 4:08 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:

 Hi All,

 I'm looking to change some numpy deprecations into errors as well as
 remove some deprecated functions. The problem I see is that
 SciPy claims to support Numpy >= 1.5 and Numpy 1.5 is really, really, old.
 So the question is, does support mean compiles with earlier versions
 of Numpy ? If that is the case there is very little that can be done about
 deprecation. OTOH, if it means Scipy can be compiled with more recent numpy
 versions but used with earlier Numpy versions (which is a good feat), I'd
 like to know. I'd also like to know what the interface requirements are, as
 I'd like to remove old_defines.h


numpy 1.6 I think is still accurate
https://github.com/scipy/scipy/pull/4265

As far as I know, you can never compile against a newer and run with an
older version.

We had the discussion recently about backwards versus forwards binary
compatibility

Josef





 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread josef.pktd
On Fri, Jun 5, 2015 at 3:16 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Do, 2015-06-04 at 18:04 -0700, Nathaniel Smith wrote:
  On Thu, Jun 4, 2015 at 5:57 PM, Nathaniel Smith n...@pobox.com wrote:
   So specifically the question is -- if you have an array with five
  items, and
   a Boolean array with three items, then currently you can use the
  later to
   index the former:
  
   arr = np.arange(5)
   mask = np.asarray([True, False, True])
   arr[mask] # returns array([0, 2])
  
   This is justified by the rule that indexing with a Boolean array
  should be
   the same as indexing with the same array that's been passed to
  np.nonzero().
   Empirically, though, this causes constant confusion and does not seem very
   useful, so the question is whether we should deprecate it.
 
  One place where the current behavior is particularly baffling and
  annoying is when you have multiple boolean masks in the same indexing
  operation. I think everyone would expect this to index separately on
  each axis (outer product indexing style, like slices do), and that's
  really the only useful interpretation, but that's not what it does...:


 This is not being deprecated in there for the moment, it is a different
 discussion. Though maybe we can improve the error message to mention
 that the array was originally boolean, has always been bugging me a bit
 (it used to mention for some cases it is not anymore).

 - Sebastian


  In [3]: a = np.arange(9).reshape((3, 3))
 
  In [4]: a
  Out[4]:
  array([[0, 1, 2],
 [3, 4, 5],
 [6, 7, 8]])
 
  In [6]: a[np.asarray([True, False, True]), np.asarray([False, True,
  True])]
  Out[6]: array([1, 8])
 
  In [7]: a[np.asarray([True, False, True]), np.asarray([False, False,
  True])]
  Out[7]: array([2, 8])
 
  In [8]: a[np.asarray([True, False, True]), np.asarray([True, True,
  True])]
 
  ---------------------------------------------------------------------------
  IndexError                                Traceback (most recent call last)
  <ipython-input-8-30b3427bec2a> in <module>()
  ----> 1 a[np.asarray([True, False, True]), np.asarray([True, True, True])]
 
  IndexError: shape mismatch: indexing arrays could not be broadcast
  together with shapes (2,) (3,)
 
 
  -n
 
  --
  Nathaniel J. Smith -- http://vorpus.org



What is actually being deprecated?
It looks like there are different examples.

wrong length: Nathaniel's first example above, where the mask is not
broadcastable to the original array because the mask is longer or shorter than
shape[axis].
I also wouldn't have expected this to work; although I use np.nonzero and
boolean mask indexing interchangeably, I would assume we need the correct
length for the mask.

The second case where the boolean mask has an extra dimension of length
one, or several boolean arrays might need more checking.
I'm pretty sure I used various versions, assuming they are a feature, and
when I see arrays, I usually don't assume outer product indexing (that
might lead to a similar discussion as the recent fancy versus orthogonal
indexing)


Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread josef.pktd
On Fri, Jun 5, 2015 at 11:50 AM, Anne Archibald archib...@astron.nl wrote:



 On Fri, Jun 5, 2015 at 5:45 PM Sebastian Berg sebast...@sipsolutions.net
 wrote:

 On Fr, 2015-06-05 at 08:36 -0400, josef.p...@gmail.com wrote:
 
 snip
 
  What is actually being deprecated?
  It looks like there are different examples.
 
 
  wrong length: Nathaniels first example above, where the mask is not
  broadcastable to original array because mask is longer or shorter than
  shape[axis].
  I also wouldn't have expected this to work, although I use np.nonzero
  and boolean mask indexing interchangeably, I would assume we need the
  correct length for the mask.
 

 For the moment we are only talking about wrong length (along a given
 dimension). Not about wrong number of dimensions or multiple boolean
 indices.


 I am pro-deprecation then, definitely. I don't see a use case for padding
 a wrong-shaped boolean array with Falses, and the padding has burned me in
 the past.

 It's not orthogonal to the wrong-number-of-dimensions issue, though,
 because if your Boolean array has a dimension of length 1, broadcasting
 says duplicate it along that axis to match the indexee, and wrong-length
 says pad it with Falses. This ambiguity/pitfall disappears if the padding
 never happens, and that kind of broadcasting is very useful.


Good argument, now I understand why we only get a single column



 x = np.arange(4*5).reshape(4,5)
 mask = np.array([1,0,1,0,1], bool)

padding with False, this would also be deprecated AFAIU, and Anne pointed
out

 x[mask[:4][:,None]]
array([ 0, 10])
 x[mask[None,:]]
array([0, 2, 4])

masks can only be combined with slices, so no fancy masking allowed nor
defined (yet)

 x[mask[:4][:,None], mask[None,:]]
Traceback (most recent call last):
  File pyshell#31, line 1, in module
x[mask[:4][:,None], mask[None,:]]
IndexError: too many indices for array


I'm using 1d masks quite often to select rows or columns, which seems to
work in more than two dimensions
(Benjamin's surprise)

 x[:, mask]
array([[ 0,  2,  4],
   [ 5,  7,  9],
   [10, 12, 14],
   [15, 17, 19]])

 x[mask[:4][:,None] * mask[None,:]]
array([ 0,  2,  4, 10, 12, 14])
 x[:,:,None][mask[:4][:,None] * mask[None,:]]
array([[ 0],
   [ 2],
   [ 4],
   [10],
   [12],
   [14]])

Josef




 Anne

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] checking S versus U dtype

2015-06-01 Thread josef.pktd
What's the best way to check whether a numpy array is string or bytes on
python3?

using char?


>>> A = np.asarray([[1, 0, 0],['E', 1, 0],['E', 'E', 1]], dtype='U1')
>>> A
array([['1', '0', '0'],
       ['E', '1', '0'],
       ['E', 'E', '1']],
      dtype='<U1')
>>> A.dtype
dtype('<U1')
>>> A.dtype.char
'U'
>>> A.dtype.char == 'U'
True
>>> A.dtype.char == 'S'
False
>>> A.astype('S1').dtype.char == 'S'
True
>>> A.astype('S1').dtype.char == 'U'
False
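
another option that seems to work is dtype.kind (same letter as .char here) or
np.issubdtype, which avoids spelling out the letter -- not sure it is any nicer:

>>> A.dtype.kind
'U'
>>> np.issubdtype(A.dtype, np.unicode_)
True
>>> np.issubdtype(A.astype('S1').dtype, np.bytes_)
True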


background:
I don't know why sometimes I got S and sometimes U on Python 3.4, and I
want the code to work with both

>>> A == 'E'
array([[False, False, False],
       [ True, False, False],
       [ True,  True, False]], dtype=bool)

>>> A.astype('S1') == 'E'
False
>>> A.astype('S1') == b'E'
array([[False, False, False],
       [ True, False, False],
       [ True,  True, False]], dtype=bool)


Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread josef.pktd
On Sun, May 24, 2015 at 9:08 AM, Alan G Isaac alan.is...@gmail.com wrote:

 On 5/24/2015 8:47 AM, Ralf Gommers wrote:
  Values only change if you leave out the call to seed()


 OK, but this claim seems to conflict with the following language:
 the global RandomState object should use the latest implementation of the
 methods.
 I take it that this is what Nathan meant by
 I think this is just a bug in the description of the proposal here, not
 in the proposal itself.

 So, is the correct phrasing
 the global RandomState object should use the latest implementation of the
 methods, unless explicitly seeded?


that's how I understand it.

I don't see any problems with the clarified proposal for the use cases that
I know of.

Can we choose the version also for the global random state, for example to
fix both version and seed in unit tests, with version  0?


BTW: I would expect that bug fixes are still exempt from backwards
compatibility.

fixing #5851 should be independent of the version, (without having looked
at the issue)

(If you need to replicate bugs, then use an old version of a package.)

Josef



 Thanks,
 Alan
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread josef.pktd
On Sun, May 24, 2015 at 1:49 PM, Nathaniel Smith n...@pobox.com wrote:

 On May 24, 2015 8:43 AM, josef.p...@gmail.com wrote:
 
  Reminder: we are bottom or inline posting

 Can we stop hassling people about this? Inline replies are a great tool to
 have in your toolkit for complicated technical discussions, but I feel like
 our weird insistence on them has turned into a pointless and exclusionary
 thing. It's not like bottom replying is even any better -- the traditional
 mailing list rule is you trim quotes to just the part you're replying to
 (like this message); quoting the whole thing and replying underneath just
 to give people a bit of exercise for their scrolling finger would totally
 have gotten you flamed too.

 But email etiquette has moved on since the 90s, even regular posters to
 this list violate this rule all the time, it's time to let it go.


It's not a 90's thing and I learned about it around 2009 when I started in
here.
I find it very annoying trying to catch up with a longer thread and the
replies are all over the place.


Anne is a few years older than I in terms of numpy and scipy participation
and this was just intended to be a friendly reminder.

And as BTW: I'm glad Anne is back with scipy.


Josef



 -n

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread josef.pktd
On Sun, May 24, 2015 at 11:13 AM, Anne Archibald archib...@astron.nl
wrote:

 Do we want a deprecation-like approach, so that eventually people who want
 replicability will specify versions, and everyone else gets bug fixes and
 improvements? This would presumably take several major versions, but it
 might avoid people getting unintentionally trapped on this version.

 Incidentally, bug fixes are complicated: if a bug fix uses more or fewer
 raw random numbers, it breaks repeatability not just for the call that got
 fixed but for all successive random number generations.


Reminder: we are bottom or inline posting





 Anne

 On Sun, May 24, 2015 at 5:04 PM josef.p...@gmail.com wrote:

 On Sun, May 24, 2015 at 9:08 AM, Alan G Isaac alan.is...@gmail.com
 wrote:

 On 5/24/2015 8:47 AM, Ralf Gommers wrote:
  Values only change if you leave out the call to seed()


 OK, but this claim seems to conflict with the following language:
 the global RandomState object should use the latest implementation of
 the methods.
 I take it that this is what Nathan meant by
 I think this is just a bug in the description of the proposal here, not
 in the proposal itself.

 So, is the correct phrasing
 the global RandomState object should use the latest implementation of
 the methods, unless explicitly seeded?


 that's how I understand it.

 I don't see any problems with the clarified proposal for the use cases
 that I know of.

 Can we choose the version also for the global random state, for example
 to fix both version and seed in unit tests, with version  0?


 BTW: I would expect that bug fixes are still exempt from backwards
 compatibility.

 fixing #5851 should be independent of the version, (without having
 looked at the issue)


I skimmed the issue.
In a strict sense it's not really a bug, the user doesn't get wrong
numbers, he or she gets Not A Number.

So there are no current usages that use the function in that range.

Josef




 (If you need to replicate bugs, then use an old version of a package.)

 Josef



 Thanks,
 Alan


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread josef.pktd
On Sun, May 24, 2015 at 5:09 PM, Antony Lee antony@berkeley.edu wrote:

 2015-05-24 13:30 GMT-07:00 Sturla Molden sturla.mol...@gmail.com:

 On 24/05/15 10:22, Antony Lee wrote:

  Comments, and help for writing tests (in particular to make sure
  backwards compatibility is maintained) are welcome.

 I have one comment, and that is what makes random numbers so special?
 This applies to the rest of NumPy too, fixing a bug can sometimes change
 the output of a function.

 Personally I think we should only make guarantees about the data types,
 array shapes, and things like that, but not about the values. Those who
 need a particular version of NumPy for exact reproducibility should
 install the version of Python and NumPy they need. That is why virtual
 environments exist.


 I personally agree with this point of view (see original discussion in
 #5299, for example); if it was only up to me at least I'd make
 RandomState(seed) default to the latest version rather than the original
 one (whether to keep the old versions around is another question).  On the
 other hand, I see that this long-standing debate has prevented obvious
 improvements from being added sometimes for years (e.g. a patch for
 Ziggurat normal variates has been lying around since 2010), or led to
 potential API duplication in order to fix some clearly undesirable behavior
 (dirichlet returning nan being described as in a strict sense not really
 a bug(!)), so I'm willing to compromise to get this moving forward.



It's clearly a different kind of bug than some of the ones we fixed in
the past without backwards compatibility discussion where the distribution
was wrong, i.e. some values shifted so parts have more weight and parts
have less weight.

As I mentioned, I don't see any real problem with the proposal.

Josef




 Antony

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] binary wheels for numpy?

2015-05-15 Thread josef.pktd
On Fri, May 15, 2015 at 4:07 PM, Chris Barker chris.bar...@noaa.gov wrote:

 Hi folks.,

 I did a little intro to scipy session as part of a larger Python class
 the other day, and was dismayed to find that pip install numpy still
 doesn't work on Windows.

 Thanks mostly to Matthew Brett's work, the whole scipy stack is
 pip-installable on OS-X, it would be really nice if we had that for Windows.

 And no, saying you should go get Python(x,y) or Anaconda, or Canopy,
 or...) is really not a good solution. That is indeed the way to go if
 someone is primarily focusing on computational programming, but if you have
 a web developer, or someone new to Python for general use, they really
 should be able to just grab numpy and play around with it a bit without
 having to start all over again.


Unrelated to the pip/wheel discussion.

In my experience by far the easiest way to get something running to play with
is using Winpython. Download and unzip (and maybe add to system path) and
most of the data analysis stack is available.

I haven't even bothered yet to properly install a full system python on
my Windows machine. I'm just working with 3 winpython. (One even has Julia
and IJulia included after following the installation instructions for a
short time.)

Josef





 My solution was to point folks to Chris Gohlke's site -- which is a
 Fabulous resource --

 THANK YOU CHRISTOPH!

 But I still think that we should have the basic scipy stack on PyPi as
 Windows Wheels...

 IIRC, the last run through on this discussion got stuck on the what
 hardware should it support -- wheels do not allow a selection at install
 time, so we'd have to decide what instruction set to support, and just
 stick with that. Which would mean that:

 some folks would get a numpy/scipy that would run a bit slower than it
 might
 and
 some folks would get one that wouldn't run at all on their machine.

 But I don't see any reason that we can't find a compromise here -- do a
 build that supports most machines, and be done with it. Even now, people
 have to go get (one way or another) a MKL-based build to get optimum
 performance anyway -- so if we pick an instruction set support by, say (an
 arbitrary, and impossible to determine) 95% of machines out there -- we're
 good to go.

 I take it there are licensing issues that prevent us from putting Chris'
 Binaries up on PyPi?

 But are there technical issues I'm forgetting here, or do we just need to
 come to a consensus as to hardware version to support and do it?

 -Chris







 --

 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR(206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115   (206) 526-6317   main reception

 chris.bar...@noaa.gov

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] code snippet: assert all close or large

2015-04-30 Thread josef.pktd
Sorry, hit the wrong key

just an example that I think is not covered by numpy.testing assert

absolute tolerance for `inf`: assert x and y are allclose or x is large if
y is inf

On Thu, Apr 30, 2015 at 2:24 PM, josef.p...@gmail.com wrote:




import numpy as np
from numpy.testing import assert_allclose, assert_array_less

def assert_allclose_large(x, y, rtol=1e-6, atol=0, ltol=1e30):
    """assert x and y are allclose or x is large if y is inf"""
    mask_inf = np.isinf(y) & ~np.isinf(x)
    assert_allclose(x[~mask_inf], y[~mask_inf], rtol=rtol, atol=atol)
    assert_array_less(ltol, x[mask_inf])
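
for example (made-up numbers, just to show the intent):

x = np.array([1.0, 1e31])
y = np.array([1.0, np.inf])
assert_allclose_large(x, y)   # passes: finite parts agree, x is "large" where y is inf
# plain assert_allclose(x, y) would raise here, since 1e31 is not close to inf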


Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] code snippet: assert all close or large

2015-04-30 Thread josef.pktd

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-17 Thread josef.pktd
On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg
sebast...@sipsolutions.net wrote:
 On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote:
 Hi,

 snip

 So, how about a slight modification of your proposal?

 1) Raise deprecation warning for np.outer for non 1D arrays for a few
 versions, with depraction in favor of np.multiply.outer, then
 2) Raise error for np.outer on non 1D arrays


 I think that was Neil's proposal a bit earlier, too. +1 for it in any
 case, since at least for the moment I doubt outer is used a lot for non
 1-d arrays. Possible step 3) make it work on higher dims after a long
 period.

sounds ok to me

Some random comments of what I remember or guess in terms of usage

I think there are at most very few np.outer usages with 2d or higher dimension.
(statsmodels has two models that switch between 2d and 1d
parameterization where we don't use outer but it has similar
characteristics. However, we need to control the ravel order, which
IIRC is Fortran)

The current behavior of 0-D scalars in the initial post might be
useful if a numpy function returns a scalar instead of a 1-D array in
size=1. np.diag which is a common case, doesn't return a scalar (in my
version of numpy).

I don't know any use case where I would ever want to have the 2d
behavior of np.multiply.outer.
I guess we will or would have applications for outer along an axis,
for example if x.shape = (100, 10), then we have
x[:,None, :] * x[:, :, None] (I guess)
Something like this shows up reasonably often in econometrics as
Outer Product. However in most cases we can avoid constructing this
matrix and get the final results in a more memory efficient or faster
way.
(example an array of covariance matrices)
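
shape check only, for illustration:

>>> x = np.arange(6.).reshape(3, 2)
>>> (x[:, :, None] * x[:, None, :]).shape   # one small outer product per row
(3, 2, 2)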

Josef





 - Sebastian


 Best,

 Matthew

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


  1   2   >