[Numpy-discussion] Re: Seeking feedback: design doc for `namedarray`, a lightweight array data structure with named dimensions

2023-12-01 Thread Benjamin Root
LArray might also be useful to look at. I think there was a time when it
didn't use pandas, but it does have it as a dependency now.

https://github.com/larray-project/larray

I think this would be a really useful endeavor. The CDF data model is
extremely useful, and adopting even a piece of it would bring great
benefits. I find it particularly useful to always have access to my
coordinates, especially when debugging a problem with my data. One thing
that can make things messy is designing code to accept these objects as
inputs and outputs. Do you explicitly pass each piece of data along with
its coordinate variables, or do you just let the coordinates "come along
for the ride"? If they come along implicitly, should functions require
additional parameters for the names of the coordinates, or should the
function require that the dimensions have particular names?
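For example, the two styles look roughly like this (every name here is made
up purely for illustration; none of it is xarray or namedarray API):

```python
import numpy as np

# Style 1: every coordinate variable is passed explicitly alongside the data.
def regrid_explicit(data, lat, lon, new_lat, new_lon):
    """The caller hands over each coordinate array; nothing is implicit."""
    ...

# Style 2: the coordinates "come along for the ride" on a labeled object,
# and the function instead requires particular dimension names.
def regrid_implicit(named, new_lat, new_lon):
    """Requires `named` to carry dimensions called "lat" and "lon"."""
    if not {"lat", "lon"} <= set(named.dims):
        raise ValueError("expected dimensions named 'lat' and 'lon'")
    ...
```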

I've been using XArray since back when it was called "xray", and I still
don't have concrete answers to these questions. Hopefully, wider adoption
will help bring inspiration for better design principles.

Cheers!
Ben Root


On Fri, Dec 1, 2023 at 12:51 PM Dom Grigonis  wrote:

> I think this is the right place to mention the `scipp` library.
>
> On 1 Dec 2023, at 17:36, Adrin  wrote:
>
> Some historical discussions on a namedarray on the scikit-learn side:
> https://github.com/scikit-learn/enhancement_proposals/pull/25
>
> Might be useful to y'all.
>
> On Fri, Oct 20, 2023 at 8:49 AM Dom Grigonis 
> wrote:
>
>> I would make use of it if it also supported pure-numpy indices.
>> A pure-numpy n-dim array with indices is what I have been after for a while
>> now. The reason is exactly that - to shed heavy dependencies such as pandas
>> and keep the performance of pure numpy.
>>
>> Regards,
>> DG
>>
>> > On 20 Oct 2023, at 00:51, Anderson Banihirwe 
>> wrote:
>> >
>> > Hi folks, [there has been growing interest in a lightweight
>> array structure](https://github.com/pydata/xarray/issues/3981) that's in
>> the same vein as [xarray's Variable](
>> https://docs.xarray.dev/en/stable/generated/xarray.Variable.html). we've
>> put together a design doc for `namedarray`, and we could use your
>> feedback/input.
>> >
>> > ## what is `namedarray`?
>> >
>> > in essence, `namedarray` aims to be a lighter version of xarray's
>> Variable—shedding some of the heavier dependencies (e.g. Pandas) but still
>> retaining the goodness of named dimensions.
>> >
>> > ## what makes it special?
>> >
>> > * **Array Protocol Compatibility**: we are planning to make it
>> compatible with existing array protocols and the new [Python array API
>> standard](https://data-apis.org/array-api/latest/).
>> > * **Duck-Array Objects**: designed to wrap around multiple duck-array
>> objects, like NumPy, Dask, Sparse, Pint, CuPy, and PyTorch.
>> >
>> > ## why are we doing this?
>> >
>> > the goal is to bridge the gap between power and simplicity, providing a
>> lightweight alternative for scientific computing tasks that don't require
>> the full firepower of Xarray (`DataArray` and `Dataset`).
>> >
>> > ## share your thoughts
>> >
>> > We've put together a design doc that goes into the nitty-gritty of
>> `namedarray`. your insights could be invaluable in making this initiative a
>> success. please give it a read and share your thoughts [here](
>> https://github.com/pydata/xarray/discussions/8080)
>> >
>> > * **Design Doc**: [namedarray Design Document](
>> https://github.com/pydata/xarray/blob/main/design_notes/named_array_design_doc.md
>> )
>> >
>> > cross posting from [Scientific Python Discourse](
>> https://discuss.scientific-python.org/t/seeking-feedback-design-doc-for-namedarray-a-lightweight-array-data-structure-with-named-dimensions/841
>> )

[Numpy-discussion] Re: Cirrus testing

2023-08-16 Thread Benjamin Root
With regards to scheduling wheel builds (and this may already be completely
obvious), numpy should consider a scheduled build time that comes reasonably
before scipy's so that there is a quicker turn-around on possible breaking
changes. If the numpy wheel builds are scheduled after scipy's, it could be a
whole week before you find out about any breakages.

Cheers!
Ben Root


On Wed, Aug 16, 2023 at 7:12 AM Ralf Gommers  wrote:

>
>
> On Wed, Aug 16, 2023 at 5:01 AM Andrew Nelson  wrote:
>
>>
>> On Wed, 16 Aug 2023 at 10:51, Andrew Nelson  wrote:
>>
>>> There's a scipy issue on this that discusses how to reduce usage,
>>> https://github.com/scipy/scipy/issues/19006.
>>>
>>> Main points:
>>>
>>> - at the moment CI is run on PR and on Merge. Convert to only running on
>>> PR commits. I've just submitted a PR to do this for numpy.
>>>
>>
> Thanks! We shouldn't be running any CI on merge commits, other than for
> building and deploying new versions of the docs to
> https://numpy.org/devdocs/.
>
>
>> - add a manual trigger. Simple to achieve, but requires input from a
>>> maintainer.
>>>
>>
> I'd like to avoid that, things that require more actions from maintainers
> tend to be counterproductive.
>
>
>> - reduce wheel build frequency. At the moment I believe they're made
>>> every week. However, that decision has to factor in the increased frequency
>>> that may be desired as numpy 2.0 is worked on.
>>>
>>
> We're doing the x86-64 wheel builds in GHA daily now, which is necessary.
> The aarch64/arm64 ones can be done less frequently, but once a week seems
> good at least in the run-up to 2.0.
>
>
>> Also, it's significantly more expensive to test on macOS M1 compared to
>> linux_aarch64. The latter isn't tested on cirrus. However, you could use
>> linux_aarch64 as a proxy for general ARM testing, and only run macOS when
>> necessary.
>>
>
> That sounds like a good idea to me. Or at least checking linux_aarch64
> first. I like how you configured that for SciPy.
>
> Also, now that we're on the topic of CI: I opened
> https://github.com/numpy/numpy/issues/24410 for a larger overhaul of our
> GitHub Actions jobs. I put that on the agenda for today's community
> meeting, but folks interested in CI may also want to comment on that issue.
>
> Cheers,
> Ralf
>


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-11 Thread Benjamin Root
After blinking and rubbing my eyes, I finally see what is meant by all of
this. I see now that the difference is that `cumsum0()` would return a
result that essentially has 0 prepended to what would normally be the
result from `cumsum()`. From the description, I thought the "problem" was
that the summation starts from 1. Personally, I never really thought of
cumsum() as starting from index 1, so I didn't understand the problem as
stated.

So, I think some workshopping of the description is in order.
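In terms of existing NumPy, the distinction is just this (`cumsum0` itself
does not exist; the second expression shows what the proposal would return):

```python
import numpy as np

a = np.array([1, 2, 3])

np.cumsum(a)                          # array([1, 3, 6])
np.concatenate(([0], np.cumsum(a)))   # array([0, 1, 3, 6])  <- proposed cumsum0(a)

# The prepended 0 is what makes it an exact inverse of diff:
np.diff(np.concatenate(([0], np.cumsum(a))))   # array([1, 2, 3]) == a
```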

On Fri, Aug 11, 2023 at 1:53 PM Robert Kern  wrote:

> On Fri, Aug 11, 2023 at 1:47 PM Benjamin Root 
> wrote:
>
>> I'm really confused. Summing from zero should be what cumsum() does now.
>>
>> ```
>> >>> np.__version__
>> '1.22.4'
>> >>> np.cumsum([[1, 2, 3], [4, 5, 6]])
>> array([ 1,  3,  6, 10, 15, 21])
>> ```
>> which matches your example in the cumsum0() documentation. Did something
>> change in a recent release?
>>
>
> That's not what's in his example.
>
> --
> Robert Kern


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-11 Thread Benjamin Root
I'm really confused. Summing from zero should be what cumsum() does now.

```
>>> np.__version__
'1.22.4'
>>> np.cumsum([[1, 2, 3], [4, 5, 6]])
array([ 1,  3,  6, 10, 15, 21])
```
which matches your example in the cumsum0() documentation. Did something
change in a recent release?

Ben Root

On Fri, Aug 11, 2023 at 8:55 AM Juan Nunez-Iglesias 
wrote:

> I'm very sensitive to the issues of adding to the already bloated numpy
> API, but I would definitely find use in this function. I literally made
> this error (thinking that the first element of cumsum should be 0) just a
> couple of days ago! What are the plans for the "extended" NumPy API after
> 2.0? Is there a good place for these variants?
>
> On Fri, 11 Aug 2023, at 2:07 AM, john.daw...@camlingroup.com wrote:
> > `cumsum` computes the sum of the first k summands for every k from 1.
> > Judging by my experience, it is more often useful to compute the sum of
> > the first k summands for every k from 0, as `cumsum`'s behaviour leads
> > to fencepost-like problems.
> > https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error
> > For example, `cumsum` is not the inverse of `diff`. I propose adding a
> > function to NumPy to compute cumulative sums beginning with 0, that is,
> > an inverse of `diff`. It might be called `cumsum0`. The following code
> > is probably not the best way to implement it, but it illustrates the
> > desired behaviour.
> >
> > ```
> > def cumsum0(a, axis=None, dtype=None, out=None):
> > """
> > Return the cumulative sum of the elements along a given axis,
> > beginning with 0.
> >
> > cumsum0 does the same as cumsum except that cumsum computes the sum
> > of the first k summands for every k from 1 and cumsum0, from 0.
> >
> > Parameters
> > ----------
> > a : array_like
> > Input array.
> > axis : int, optional
> > Axis along which the cumulative sum is computed. The default
> > (None) is to compute the cumulative sum over the flattened
> > array.
> > dtype : dtype, optional
> > Type of the returned array and of the accumulator in which the
> > elements are summed. If `dtype` is not specified, it defaults to
> > the dtype of `a`, unless `a` has an integer dtype with a
> > precision less than that of the default platform integer. In
> > that case, the default platform integer is used.
> > out : ndarray, optional
> > Alternative output array in which to place the result. It must
> > have the same shape and buffer length as the expected output but
> > the type will be cast if necessary. See
> > :ref:`ufuncs-output-type` for more details.
> >
> > Returns
> > -------
> > cumsum0_along_axis : ndarray
> > A new array holding the result is returned unless `out` is
> > specified, in which case a reference to `out` is returned. If
> > `axis` is not None the result has the same shape as `a` except
> > along `axis`, where the dimension is larger by 1.
> >
> > See Also
> > --------
> > cumsum : Cumulatively sum array elements, beginning with the first.
> > sum : Sum array elements.
> > trapz : Integration of array values using the composite trapezoidal
> rule.
> > diff : Calculate the n-th discrete difference along given axis.
> >
> > Notes
> > -----
> > Arithmetic is modular when using integer types, and no error is
> > raised on overflow.
> >
> > ``cumsum0(a)[-1]`` may not be equal to ``sum(a)`` for floating-point
> > values since ``sum`` may use a pairwise summation routine, reducing
> > the roundoff-error. See `sum` for more information.
> >
> > Examples
> > --------
> > >>> a = np.array([[1, 2, 3], [4, 5, 6]])
> > >>> a
> > array([[1, 2, 3],
> >[4, 5, 6]])
> > >>> np.cumsum0(a)
> > array([ 0,  1,  3,  6, 10, 15, 21])
> > >>> np.cumsum0(a, dtype=float)  # specifies type of output value(s)
> > array([ 0.,  1.,  3.,  6., 10., 15., 21.])
> >
> > >>> np.cumsum0(a, axis=0)  # sum over rows for each of the 3 columns
> > array([[0, 0, 0],
> >[1, 2, 3],
> >[5, 7, 9]])
> > >>> np.cumsum0(a, axis=1)  # sum over columns for each of the 2 rows
> > array([[ 0,  1,  3,  6],
> >[ 0,  4,  9, 15]])
> >
> > ``cumsum(b)[-1]`` may not be equal to ``sum(b)``
> >
> > >>> b = np.array([1, 2e-9, 3e-9] * 100)
> > >>> np.cumsum0(b)[-1]
> > 100.0050045159
> > >>> b.sum()
> > 100.005029
> >
> > """
> > empty = a.take([], axis=axis)
> > zero = empty.sum(axis, dtype=dtype, keepdims=True)
> > later_cumsum = a.cumsum(axis, dtype=dtype)
> > return np.concatenate([zero, later_cumsum], axis=axis, dtype=dtype, out=out)
> > ```

[Numpy-discussion] Re: Precision changes to sin/cos in the next release?

2023-05-31 Thread Benjamin Root
I think it is the special values aspect that is most concerning. Math is
just littered with all sorts of identities, especially with trig functions.
While I know that floating point calculations are imprecise, there are
certain properties of these functions that have always held, such as the
results staying within the range -1 to 1.

As a reference point on an M1 Mac using conda-forge:
```
>>> import numpy as np
>>> np.__version__
'1.24.3'
>>> np.sin(0.0)
0.0
>>> np.cos(0.0)
1.0
>>> np.sin(np.pi)
1.2246467991473532e-16
>>> np.cos(np.pi)
-1.0
>>> np.sin(2*np.pi)
-2.4492935982947064e-16
>>> np.cos(2*np.pi)
1.0
```

Not perfect, but still right in most places.

I'm ambivalent about reverting. I know I would love the speed improvements,
because transformation calculations in GIS are slow using numpy, but also
some coordinate transformations might break because of these changes.

Ben Root


On Wed, May 31, 2023 at 11:40 AM Charles R Harris 
wrote:

>
>
> On Wed, May 31, 2023 at 9:12 AM Robert Kern  wrote:
>
>> On Wed, May 31, 2023 at 10:40 AM Ralf Gommers 
>> wrote:
>>
>>>
>>>
>>> On Wed, May 31, 2023 at 4:19 PM Charles R Harris <
>>> charlesr.har...@gmail.com> wrote:
>>>


 On Wed, May 31, 2023 at 8:05 AM Robert Kern 
 wrote:

> I would much, much rather have the special functions in the `np.*`
> namespace be more accurate than fast on all platforms. These would not
> have been on my list for general purpose speed optimization. How much time
> is actually spent inside sin/cos even in a trig-heavy numpy program? And
> most numpy programs aren't trig-heavy, but the precision cost would be 
> paid
> and noticeable even for those programs. I would want fast-and-inaccurate
> functions to be strictly opt-in for those times that they really paid off.
> Probably by providing them in their own module or package rather than a
> runtime switch, because it's probably only a *part* of my program
> that needs that kind of speed and can afford that precision loss while
> there will be other parts that need the precision.
>
>
 I think that would be a good policy going forward.

>>>
>>> There's a little more to it than "precise and slow good", "fast == less
>>> accurate == bad". We've touched on this when SVML got merged (e.g., [1])
>>> and with other SIMD code, e.g. in the "Floating point precision
>>> expectations in NumPy" thread [2]. Even libm doesn't guarantee the best
>>> possible result of <0.5 ULP max error, and there are also considerations
>>> like whether any numerical errors are normally distributed around the exact
>>> mathematical answer or not (see, e.g., [3]).
>>>
>>
>> If we had a portable, low-maintenance, high-accuracy library that we
>> could vendorize (and its performance cost wasn't *horrid*), I might even
>> advocate that we do that. Reliance on platform libms is mostly about our
>> maintenance burden than a principled accuracy/performance tradeoff. My
>> preference *is* definitely firmly on the "precise and slow good" for
>> these ufuncs because of the role these ufuncs play in real numpy programs;
>> performance has a limited, situational effect while accuracy can have
>> substantial ones across the board.
>>
>> It seems fairly clear that with this recent change, the feeling is that
>>> the tradeoff is bad and that too much accuracy was lost, for not enough
>>> real-world gain. However, we now had several years worth of performance
>>> work with few complaints about accuracy issues.
>>>
>>
>> Except that we get a flurry of complaints now that they actually affect
>> popular platforms. I'm not sure I'd read much into a lack of complaints
>> before that.
>>
>>
>>> So I wouldn't throw out the baby with the bath water now and say that we
>>> always want the best accuracy only. It seems to me like we need a better
>>> methodology for evaluating changes. Contributors have been pretty careful,
>>> but looking back at SIMD PRs, there were usually detailed benchmarks but
>>> not always detailed accuracy impact evaluations.
>>>
>>
>> I've only seen micro-benchmarks testing the runtime of individual
>> functions, but maybe I haven't paid close enough attention. Have there been
>> any benchmarks on real(ish) *programs* that demonstrate what utility
>> these provide in even optimistic scenarios? I care precisely <1ULP about
>> the absolute performance of `np.sin()` on its own. There are definitely
>> programs that would care about that; I'm not sure any of them are (or
>> should be) written in Python, though.
>>
>>
> One of my takeaways is that there are special values where more care
> should be taken. Given the inherent inaccuracy of floating point
> computation, it can be argued that there should be no such expectation, but
> here we are. Some inaccuracies are more visible than others.
>
> I think it is less intrusive to have the option to lessen precision when
> more speed is needed than the other way around. Our experience is that most
> users are unsophisticated 

[Numpy-discussion] Re: np.bool_ vs Python bool behavior

2022-03-21 Thread Benjamin Root
Just as a quick note, I find it *very* common and handy to do something
like:

someCount = (x > 5).sum()

which requires implicit upcasting of np.bool_ to integer. Just making sure
that use case isn't forgotten, as it had to be mentioned the last time this
subject came up.


On Mon, Mar 21, 2022 at 2:19 PM Sebastian Berg 
wrote:

> On Wed, 2022-03-16 at 18:14 +, Jacob Reinhold wrote:
> > Hi Sebastian and Chuck,
> >
> > Thanks for the response! (Sorry about the formatting in my original
> > post, I wasn't familiar with how to display code in this setting).
> >
> > I think keeping + as "logical or" and * as "logical and" on np.bool_
> > types is fine, although redundant given that | and & provide this
> > functionality and potentially misleading given the different behavior
> > from the native Python bool; however, I could see it being too
> > painful of a migration within v1.* numpy.
> >
> > I think my main point of contention is that division and
> > exponentiation aren't well defined operations on np.bool_, at least
> > as currently defined, and they should raise errors like subtraction.
> > Raising those errors would have caught the problem I ran into when
> > trying to taking the mean of multiple ndarrays of dtype=np.bool_. I'm
> > not sure what the realistic use case is to have division/exp. return
> > a float/int, especially when +/* return np.bool_ and subtraction
> > throws an error.
>
> Sorry for the slow followup.  Maybe aiming for that (or at least
> attempting it) can be formalized a bit easier.
> In principle, I do agree that we should error out in all of these
> cases.  Forcing the user to write `dtype=...` if they so wish.
>
> If we keep some of these (i.e. + and *), that change might not be very
> controversial (I am not sure).
>
> >
> > Sebastian, you stated:
> > "N.B.:  I have changed that logic. "Future" ufuncs are now reversed.
> > They will default to an error rather than using the `int8`
> > implementation."
> >
> > So is the division/exp. issue that I described with np.bool_ solved
> > in future releases?
> >
>
> No, unfortunately not.  It would be solved for future (new) ufuncs, but
> that doesn't necessary help us much.
>
> There is a bit of a parallel thing going on, due to us trying to get
> rid of value-based casting:
>
> np.uint8([1]) + np.int64(1000)  # should not return a uint16
>
> Once we pull that off, that new design may help.  Until then, it may
> make things a bit more confusing.
>
> However, I don't think that should stop us from going ahead.  It should
> not be a big hassle in practice.
>
>
> > Happy to help out on implementation/formalizing a proposal!
>
>
> The most formal thing would be to draft a (brief!) NEP:
>
> https://numpy.org/neps/nep-.html
>
> but I am not sure I want to ask for that quite yet :).  Maybe the
> decision isn't actually big enough to warrant a NEP at all.
>
> I have to think about the implementation (and if we start on a NEP, I
> can fill it in).  I suspect it is actually straight forward, so long we
> apply it to all ufuncs (even those in SciPy, etc.!).
>
> But there may well be some trickier parts I am missing right now.
>
> >
> > FWIW, I suppose you could change + to XOR. Then np.bool_ would be a
> > field (isomorphic to Z/2Z) and then you could reasonably define - and
> > /. (Although + would be equivalent to - and * would be equivalent to
> > /, which would probably be confusing to most users.)
> >
>
> Yeah, I think we removed `-` because it didn't even follow the Z/2Z
> behavior.
>
> Cheers,
>
> Sebastian
>
>
> > Best,
> > Jacob


[Numpy-discussion] Re: Conversion from C-layout to Fortran-layout in Cython

2021-11-10 Thread Benjamin Root
I have found that a bunch of lapack functions seem to have arguments for
stating whether or not the given arrays are C or F ordered. Then you
wouldn't need to worry about handling the layout yourself. For example, I
have some C++ code like so:

extern "C" {

/**
 * Forward declaration for the Fortran dgemm routine (BLAS level 3) to allow
 * use in C/C++ code.
 *
 * This function is used for matrix multiplication between two arrays of
 * doubles.
 *
 * For complete reference:
 * http://www.netlib.org/lapack/explore-html/d1/d54/group__double__blas__level3_gaeda3cbd99c8fb834a60a6412878226e1.html
 */
void dgemm_(const char* TRANSA, const char* TRANSB,
            const int* M, const int* N, const int* K,
            const double* ALPHA, const double* A, const int* LDA,
            const double* B, const int* LDB,
            const double* BETA, double* C, const int* LDC);
}

...

dgemm_("C", "C", , , , , matrices.IW->data(),
,
inputs.data(), , , intermediate.data(), );

(in this case, I was using boost multiarrays, but the basic idea is the
same). IIRC, a bunch of other lapack functions had similar features.
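
For readers coming at this from the NumPy side, the layout/flag relationship
those transpose arguments exploit can be sketched in pure Python (this only
illustrates the idea, it is not the BLAS call itself):

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)      # C-contiguous
assert a.flags['C_CONTIGUOUS']

# A transposed view is already Fortran-contiguous: no data is copied,
# which is what passing a transpose flag to dgemm/dgetrs lets you exploit.
assert a.T.flags['F_CONTIGUOUS']

# When a genuinely Fortran-ordered buffer is required, asfortranarray copies
# (and is a no-op for arrays that are already F-contiguous).
f = np.asfortranarray(a)
assert f.flags['F_CONTIGUOUS']
```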

I hope this is helpful.

Ben Root



On Wed, Nov 10, 2021 at 6:02 PM Ilhan Polat  wrote:

> I've asked this in Cython mailing list but probably I should also get some
> feedback here too.
>
> I have the following function defined in Cython and using flat memory
> pointers to hold n by n array data.
>
>
> cdef some_C_layout_func(double[:, :, ::1] Am) nogil:
>     # ...
>     cdef double *work1 = malloc(n*n*sizeof(double))
>     cdef double *work2 = malloc(n*n*sizeof(double))
>     # ...
>     # Lots of C-layout operations here
>     # ...
>     dgetrs('T', , , [0], , [0], [0], ,  )
>     dcopy(, [0], , [0, 0, 0], )
>     free(...)
>
>
>
>
>
>
>
>
>
> Here, I have done everything in C layout with work1 and work2 but I have
> to convert work2 into Fortran layout to be able to solve AX = B. A can be
> transposed in Lapack internally via the flag 'T' so the only obstacle I
> have now is to shuffle work2 which holds B transpose in the eyes of Fortran
> since it is still in C layout.
>
> If I go naively and make loops to get one layout to the other that
> actually spoils all the speed benefits from this Cythonization due to cache
> misses. In fact 60% of the time is spent in that naive loop across the
> whole function. Same goes for the copy_fortran() of memoryviews.
>
> I have measured the regular NumPy np.asfortranarray()  and the performance
> is quite good enough compared to the actual linear solve. Hence whatever it
> is doing underneath I would like to reach out and do the same possibly via
> the C-API. But my C knowledge basically failed me around this line
> https://github.com/numpy/numpy/blob/8dbd507fb6c854b362c26a0dd056cd04c9c10f25/numpy/core/src/multiarray/multiarraymodule.c#L1817
>
> I have found the SO post from
> https://stackoverflow.com/questions/45143381/making-a-memoryview-c-contiguous-fortran-contiguous
> but I am not sure if that is the canonical way to do it in newer Python
> versions.
>
> Can anyone show me how to go about it without interacting with Python
> objects?
>
> Best,
> ilhan


[Numpy-discussion] Re: dtype=(bool) vs dtype=bool

2021-10-19 Thread Benjamin Root
Parentheses around a single expression in Python are only needed if you want
to change the order of precedence or if you need to split something across 2
or more lines. Otherwise, they have no meaning and are extraneous.
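
A quick way to convince yourself (the parentheses create nothing new):

```
>>> (bool) is bool
True
>>> import numpy as np
>>> np.dtype((bool)) == np.dtype(bool)
True
```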

On Tue, Oct 19, 2021 at 9:42 AM  wrote:

> See the following testing in IPython shell:
>
> In [6]: import numpy as np
>
> In [7]: a = np.array([1], dtype=(bool))
>
> In [8]: b = np.array([1], dtype=bool)
>
> In [9]: a
> Out[9]: array([ True])
>
> In [10]: b
> Out[10]: array([ True])
>
> It seems that dtype=(bool) and dtype=bool are both correct usages. If so,
> which is preferable?
>
> Regards,
> HZ


Re: [Numpy-discussion] copy="never" discussion and no deprecation cycle?

2021-06-23 Thread Benjamin Root
Why not both? The definition of the enum might live in a proper namespace
location, but I see no reason why `np.copy.IF_NEEDED =
np.flags.CopyFlgs.IF_NEEDED` can't be done (I mean, adding the enum members
as attributes to the `np.copy()` function). Seems perfectly reasonable to
me, and reads pretty nicely, too. It isn't like we are dropping support for
the booleans, so those are still around for easy typing.
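
Roughly something like the following sketch; `CopyFlags` and its placement
are hypothetical, nothing like it exists in numpy today:

```python
import enum

class CopyFlags(enum.Enum):          # hypothetical enum, defined once in one place
    ALWAYS = 0
    IF_NEEDED = 1
    NEVER = 2

def copy(a, flag=CopyFlags.ALWAYS):  # stand-in for np.copy, not the real signature
    ...

# "Why not both?": also expose the members as attributes of the function,
# so callers can write copy(a, copy.IF_NEEDED) without hunting for the enum.
for member in CopyFlags:
    setattr(copy, member.name, member)

assert copy.IF_NEEDED is CopyFlags.IF_NEEDED
```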

Ben Root

On Wed, Jun 23, 2021 at 10:26 PM Stefan van der Walt 
wrote:

> On Wed, Jun 23, 2021, at 18:01, Juan Nunez-Iglesias wrote:
> > Personally I was a fan of the Enum approach. People dislike it because
> > it is not “Pythonic”, but imho that is an accident of history because
> > Enums only appeared (iirc) in Python 3.4. In fact, they are the right
> > data structure for this particular problem, so for my money we should
> > *make it* Pythonic by starting to use it everywhere where we have a
> > finite list of choices.
>
> The enum definitely feels like the right abstraction. But the resulting
> API is clunky because of naming and top-level scarcity.
>
> Hence the suggestion to tag it onto np.copy, but there is an argument to
> be made for consistency by placing all enums under np.flags or similar.
>
> Still, np.flags.copy.IF_NEEDED gets long.
>
> Stéfan


Re: [Numpy-discussion] copy="never" discussion and no deprecation cycle?

2021-06-23 Thread Benjamin Root
> One reason was that Sebastian didn't like people doing `x.shape = ...`.
> Users do that, presumably, to trigger an error if a copy needs to be made.


Users do that because it is 1) easier than every other option, and 2) I am
pretty sure we were encouraged to do it this way for the past 10 years. The
whole "it won't copy" business (to me at least) was an added bonus. Most of
the time, I didn't want to copy anyway, so, sure!

`x.shape = ...` has been around for a long time, and you are going to have
a hard time convincing people to drop using such an easy-to-use property
setter in favor of an approach that adds more typing and takes a bit more
to read. There's also lots and lots of online tutorials, books, and
stackoverflow snippets that have this usage pattern. I think the horse has
long since left the barn, the chickens came to roost, and the cows came
home...
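
For anyone who hasn't run into the error in question, a small user-level
illustration (not numpy internals):

```python
import numpy as np

a = np.zeros((3, 4))
view = a.T                        # a transposed, non-contiguous view

try:
    view.shape = (12,)            # in-place reshape: errors, a copy would be required
except AttributeError as exc:
    print(exc)

flat = view.reshape(12)           # reshape() quietly returns a copy here instead
print(np.shares_memory(flat, a))  # False: a copy was made
```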


> We can fix Sebastian's issue by introducing a `copy` keyword to `reshape`,
> which currently has none:
>
>
This isn't a terrible idea to pursue, regardless of what I said above!
Explicit is better than implicit, and giving programmers the opportunity to
be explicit about what sort of copy semantics they intend in more places
would improve the library going forward.  I also like to highlight what
Chuck said a few posts ago about the fact that `copy=False` does not really
mean what people might think it means, and taking steps to address that
might also be good for the library.

Ben Root


Re: [Numpy-discussion] EHN: Discusions about 'add numpy.topk'

2021-05-30 Thread Benjamin Root
to be honest, I read "topk" as "topeka", but I am weird. While numpy
doesn't use underscores all that much, I think this is one case where it
makes sense.

I'd also watch out for the use of the term "sorted", as it may mean
different things to different people, particularly with regards to what its
default value should be. I also find myself initially confused by the names
"largest" and "sorted", especially what they should mean with the "min-k"
behavior. I think Dask's use of negative k is very pythonic and would help
keep the namespace clean by avoiding an extra "min_k".

As for the indices, I am of two minds. On the one hand, I don't like
polluting the namespace with extra functions. On the other hand, having a
function that behaves differently based on a parameter is just fugly,
although we do have a function that does this - np.unique().
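
For reference, the current idiom with np.argpartition looks something like
this; the helper name `topk` is only for illustration and is not the proposed
API:

```python
import numpy as np

def topk(a, k):
    """Return the k largest values of a 1-D array, largest first (illustrative only)."""
    idx = np.argpartition(a, -k)[-k:]   # indices of the k largest values, unordered
    return np.sort(a[idx])[::-1]        # order them largest-first

print(topk(np.array([5, 1, 9, 3, 7]), 2))   # [9 7]
```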

Ben Root

On Sun, May 30, 2021 at 8:22 AM Neal Becker  wrote:

> Topk is a bad choice imo.  I initially parsed it as to_pk, and had no idea
> what that was, although sounded a lot like a scipy signal function.
> Nlargest would be very obvious.
>
> On Sun, May 30, 2021, 7:50 AM Alan G. Isaac  wrote:
>
>> Mathematica and Julia both seem relevant here.
>> Mma has TakeLargest (and Wolfram tends to think hard about names).
>> https://reference.wolfram.com/language/ref/TakeLargest.html
>> Julia's closest comparable is perhaps partialsortperm:
>> https://docs.julialang.org/en/v1/base/sort/#Base.Sort.partialsortperm
>> Alan Isaac
>>
>>
>>
>> On 5/30/2021 4:40 AM, kang...@mail.ustc.edu.cn wrote:
>> > Hi, Thanks for reply, I present some details below:


Re: [Numpy-discussion] bad CRC errors when using np.savez, only sometimes though!

2021-05-14 Thread Benjamin Root
Isaac,

What I mean is that your bug might be similar to the savemat() bug that was
fixed in scipy in 2019. They are completely different functions, but both
need to interact correctly with zlib in order to work properly.

On Fri, May 14, 2021 at 10:22 AM Isaac Gerg  wrote:

> Hi Ben,  I am not sure.  However, in looking at the dates, it looks like
> that was fixed in scipy as of 2019.
>
> Would you recommend using the scipy save interface as opposed to the numpy
> one?
>
> On Fri, May 14, 2021 at 10:16 AM Benjamin Root 
> wrote:
>
>> Perhaps it is a similar bug as this one?
>> https://github.com/scipy/scipy/issues/6999
>>
>> Basically, it turned out that the CRC was getting computed on an
>> unflushed buffer, or something like that.
>>
>> On Fri, May 14, 2021 at 10:05 AM Isaac Gerg 
>> wrote:
>>
>>> I am using 1.19.5 on Windows 10 using Python 3.8.6 (tags/v3.8.6:db45529,
>>> Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)].
>>>
>>> I have two python processes running (i.e. no threads) which do
>>> independent processing jobs and NOT writing to the same directories.  Each
>>> process runs for 5-10 hours and then writes out a ~900MB npz file
>>> containing 4 arrays.
>>>
>>> When I go back to read in the npz files, I will sporadically get bad CRC
>>> errors which are related to npz using ziplib.  I cannot figure out why this
>>> is happening.  Looking through online forums, other folks have had CRC
>>> problems but they seem to be isolated to specifically using ziblib, not
>>> numpy.  I have found a few mentions though of ziplib causing headaches if
>>> the same file pointer is used across calls when one uses the file handle
>>> interface to ziblib as opposed to passing in a filename.'
>>>
>>> I have verified with 7zip that the files do in fact have a CRC error so
>>> its not an artifact of the ziblib.  I have also used the file handle
>>> interface to np.load and still get the error.
>>>
>>> Aside from writing my own numpy storage file container, I am stumped as
>>> to how to fix this, or reproduce this in a consistent manner.
>>> Any suggestions would be greatly appreciated!
>>>
>>> Thank you,
>>> Isaac


Re: [Numpy-discussion] bad CRC errors when using np.savez, only sometimes though!

2021-05-14 Thread Benjamin Root
Perhaps it is a similar bug as this one?
https://github.com/scipy/scipy/issues/6999

Basically, it turned out that the CRC was getting computed on an unflushed
buffer, or something like that.

On Fri, May 14, 2021 at 10:05 AM Isaac Gerg  wrote:

> I am using 1.19.5 on Windows 10 using Python 3.8.6 (tags/v3.8.6:db45529,
> Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)].
>
> I have two python processes running (i.e. no threads) which do
> independent processing jobs and NOT writing to the same directories.  Each
> process runs for 5-10 hours and then writes out a ~900MB npz file
> containing 4 arrays.
>
> When I go back to read in the npz files, I will sporadically get bad CRC
> errors which are related to npz using ziplib.  I cannot figure out why this
> is happening.  Looking through online forums, other folks have had CRC
> problems but they seem to be isolated to specifically using ziblib, not
> numpy.  I have found a few mentions though of ziplib causing headaches if
> the same file pointer is used across calls when one uses the file handle
> interface to ziblib as opposed to passing in a filename.'
>
> I have verified with 7zip that the files do in fact have a CRC error so
> its not an artifact of the ziblib.  I have also used the file handle
> interface to np.load and still get the error.
>
> Aside from writing my own numpy storage file container, I am stumped as to
> how to fix this, or reproduce this in a consistent manner.  Any suggestions
> would be greatly appreciated!
>
> Thank you,
> Isaac


Re: [Numpy-discussion] Is there a defined way to "unpad" an array, and if not, should there be?

2021-04-12 Thread Benjamin Root
Isn't that just slicing? Or perhaps you are looking for a way to simplify
the calculation of the slice arguments from the original pad arguments?

On Mon, Apr 12, 2021 at 4:15 PM Jeff Gostick  wrote:

> I often find myself padding an array to do some processing on it (i.e. to
> avoid edge artifacts), then I need to remove the padding.  I wish there
> was either a built in "unpad" function that accepted the same arguments as
> "pad", or that "pad" accepted negative numbers (e.g [-20, -4] would undo a
> padding of [20, 4]).  This seems like a pretty obvious feature to me so
> maybe I've just missed something, but I have looked through all the open
> and closed issues on github and don't see anything related to this.
>
>
> Jeff G
>


Re: [Numpy-discussion] MAINT: Use of except-pass blocks

2021-04-06 Thread Benjamin Root
In both of those situations, the `pass` aspect makes sense, although they
probably should specify a better exception class to catch. The first one,
with the copyto(), has a comment that explains what is going on. The second
one, dealing with adding to the docstring, is needed because one can run
python in "optimized" mode, which strips out docstrings.
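
For the second case, the situation is essentially this (a toy illustration,
not the actual numpy code):

```python
def f():
    """Original docstring."""

try:
    # Under `python -OO`, docstrings are stripped, so f.__doc__ is None
    # and the concatenation below raises TypeError.
    f.__doc__ += "\nExtra, programmatically generated documentation."
except TypeError:
    pass  # silently skip when docstrings are unavailable
```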

On Tue, Apr 6, 2021 at 2:27 PM Michael Dubravski 
wrote:

> Hello everyone,
>
>
>
> There are multiple instances of except-pass blocks within the codebase
> that to my knowledge are bad practices (referencing this StackOverflow
> article).
> For example in numpy/ma/core.py there is an except-pass block that
> catches all exceptions thrown. Another example of this can be found in
> numpy/core/function_base.py. I was wondering if it would be a good idea
> to add some print statements for logging the exceptions caught. Also for
> cases where except-pass blocks are needed, is there an explanation for not
> logging exceptions?
>
>
>
>
> https://github.com/numpy/numpy/blob/914407d51b878bf7bf34dbd8dd72cc2dbc428673/numpy/ma/core.py#L1034-L1041
>
>
>
>
> https://github.com/numpy/numpy/blob/914407d51b878bf7bf34dbd8dd72cc2dbc428673/numpy/core/function_base.py#L461-L472
>
>
>
> Thanks,
>
> Michael Dubravski


Re: [Numpy-discussion] ENH: Proposal to add atleast_nd function

2021-02-11 Thread Benjamin Root
My original use case for these was dealing with output data from Matlab,
where those users would use `squeeze()` quite liberally. In addition, there
was the problem of the implicit squeeze() in numpy's loadtxt(), for
which I added the ndmin kwarg in case an input CSV file had just one
row or no rows.

np.atleast_1d() is used in matplotlib in a bunch of places where inputs are
allowed to be scalar or lists.

On Thu, Feb 11, 2021 at 1:15 PM Stephan Hoyer  wrote:

> On Thu, Feb 11, 2021 at 9:42 AM Benjamin Root 
> wrote:
>
>> For me, the atleast_{1,2,3}d functions are useful for
>> sanitizing inputs. Having an atleast_nd() function can be viewed as a step
>> towards cleaning up the API, not cluttering it (although any deprecation of
>> the existing functions should probably be a long one, given how long they
>> have existed).
>>
>
> I would love to see examples of this -- perhaps in matplotlib?
>
> My thinking is that in most cases it's probably a better idea to keep the
> interface simpler, and raise an error for lower-dimensional arrays.
> Automatic conversion is convenient (and endemic within the SciPy
> ecosystem), but is also a common source of bugs.
>
> On Thu, Feb 11, 2021 at 1:56 AM Stephan Hoyer  wrote:
>>
>>> On Wed, Feb 10, 2021 at 9:48 PM Juan Nunez-Iglesias 
>>> wrote:
>>>
>>>> I totally agree with the namespace clutter concern, but honestly, I
>>>> would use `atleast_nd` with its `pos` argument (I might rename it to
>>>> `position`, `axis`, or `axis_position`) any day over `at_least{1,2,3}d`,
>>>> for which I had no idea where the new axes would end up.
>>>>
>>>> So, I’m in favour of including it, and optionally deprecating
>>>> `atleast_{1,2,3}d`.
>>>>
>>>>
>>> I appreciate that `atleast_nd` feels more sensible than
>>> `at_least{1,2,3}d`, but I don't think "better" than a pattern we would not
>>> recommend is a good enough reason for inclusion in NumPy. It needs to stand
>>> on its own.
>>>
>>> What would be the recommended use-cases for this new function?
>>> Have any libraries building on top of NumPy implemented a version of
>>> this?
>>>
>>>
>>>> Juan.
>>>>
>>>> On 11 Feb 2021, at 9:48 am, Sebastian Berg 
>>>> wrote:
>>>>
>>>> On Wed, 2021-02-10 at 17:31 -0500, Joseph Fox-Rabinovitz wrote:
>>>>
>>>> I've created PR#18386 to add a function called atleast_nd to numpy and
>>>> numpy.ma. This would generalize the existing atleast_1d, atleast_2d,
>>>> and
>>>> atleast_3d functions.
>>>>
>>>> I proposed a similar idea about four and a half years ago:
>>>> https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html
>>>> ,
>>>> PR#7804. The reception was ambivalent, but a couple of folks have asked
>>>> me
>>>> about this, so I'm bringing it back.
>>>>
>>>> Some pros:
>>>>
>>>> - This closes issue #12336
>>>> - There are a couple of Stack Overflow questions that would benefit
>>>> - Been asked about this a couple of times
>>>> - Implementation of three existing atleast_*d functions gets easier
>>>> - Looks nicer than the equivalent broadcasting and reshaping
>>>>
>>>> Some cons:
>>>>
>>>> - Cluttering up the API
>>>> - Maintenance burden (but not a big one)
>>>> - This is just a utility function, which can be achieved through
>>>> broadcasting and reshaping
>>>>
>>>>
>>>> My main concern would be the namespace cluttering. I can't say I use
>>>> even the `atleast_2d` etc. functions personally, so I would tend to be
>>>> slightly against the addition. But if others land on the "useful" side here
>>>> (and it seemed a bit at least on github), I am also not opposed.  It is a
>>>> clean name that lines up with existing ones, so it doesn't seem like a big
>>>> "mental load" with respect to namespace cluttering.
>>>>
>>>> Bike shedding the API is probably a good idea in any case.
>>>>
>>>> I have pasted the current PR documentation (as html) below for quick
>>>> reference. I wonder a bit about the reasoning for having `pos` specify a
>>>> value rather than just a side?
>>>>
>>>>
>>>>
>>>> numpy.atleast_nd(*ary*, *ndim*, *pos=0*)
>>>> View 

Re: [Numpy-discussion] ENH: Proposal to add atleast_nd function

2021-02-11 Thread Benjamin Root
For me, the atleast_{1,2,3}d functions are useful for
sanitizing inputs. Having an atleast_nd() function can be viewed as a step
towards cleaning up the API, not cluttering it (although any deprecation of
the existing functions should probably be a long one, given how long they
have existed).

On Thu, Feb 11, 2021 at 1:56 AM Stephan Hoyer  wrote:

> On Wed, Feb 10, 2021 at 9:48 PM Juan Nunez-Iglesias 
> wrote:
>
>> I totally agree with the namespace clutter concern, but honestly, I would
>> use `atleast_nd` with its `pos` argument (I might rename it to `position`,
>> `axis`, or `axis_position`) any day over `at_least{1,2,3}d`, for which I
>> had no idea where the new axes would end up.
>>
>> So, I’m in favour of including it, and optionally deprecating
>> `atleast_{1,2,3}d`.
>>
>>
> I appreciate that `atleast_nd` feels more sensible than
> `at_least{1,2,3}d`, but I don't think "better" than a pattern we would not
> recommend is a good enough reason for inclusion in NumPy. It needs to stand
> on its own.
>
> What would be the recommended use-cases for this new function?
> Have any libraries building on top of NumPy implemented a version of this?
>
>
>> Juan.
>>
>> On 11 Feb 2021, at 9:48 am, Sebastian Berg 
>> wrote:
>>
>> On Wed, 2021-02-10 at 17:31 -0500, Joseph Fox-Rabinovitz wrote:
>>
>> I've created PR#18386 to add a function called atleast_nd to numpy and
>> numpy.ma. This would generalize the existing atleast_1d, atleast_2d, and
>> atleast_3d functions.
>>
>> I proposed a similar idea about four and a half years ago:
>> https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html,
>> PR#7804. The reception was ambivalent, but a couple of folks have asked me
>> about this, so I'm bringing it back.
>>
>> Some pros:
>>
>> - This closes issue #12336
>> - There are a couple of Stack Overflow questions that would benefit
>> - Been asked about this a couple of times
>> - Implementation of three existing atleast_*d functions gets easier
>> - Looks nicer than the equivalent broadcasting and reshaping
>>
>> Some cons:
>>
>> - Cluttering up the API
>> - Maintenance burden (but not a big one)
>> - This is just a utility function, which can be achieved through
>> broadcasting and reshaping
>>
>>
>> My main concern would be the namespace cluttering. I can't say I use even
>> the `atleast_2d` etc. functions personally, so I would tend to be slightly
>> against the addition. But if others land on the "useful" side here (and it
>> seemed a bit at least on github), I am also not opposed.  It is a clean
>> name that lines up with existing ones, so it doesn't seem like a big
>> "mental load" with respect to namespace cluttering.
>>
>> Bike shedding the API is probably a good idea in any case.
>>
>> I have pasted the current PR documentation (as html) below for quick
>> reference. I wonder a bit about the reasoning for having `pos` specify a
>> value rather than just a side?
>>
>>
>>
>> numpy.atleast_nd(*ary*, *ndim*, *pos=0*)
>> View input as array with at least ndim dimensions.
>> New unit dimensions are inserted at the index given by *pos* if
>> necessary.
>> Parameters*ary  *array_like
>> The input array. Non-array inputs are converted to arrays. Arrays that
>> already have ndim or more dimensions are preserved.
>> *ndim  *int
>> The minimum number of dimensions required.
>> *pos  *int, optional
>> The index to insert the new dimensions. May range from -ary.ndim - 1 to
>> +ary.ndim (inclusive). Non-negative indices indicate locations before
>> the corresponding axis: pos=0 means to insert at the very beginning.
>> Negative indices indicate locations after the corresponding axis: pos=-1 
>> means
>> to insert at the very end. 0 and -1 are always guaranteed to work. Any
>> other number will depend on the dimensions of the existing array. Default
>> is 0.
>> Returns*res  *ndarray
>> An array with res.ndim >= ndim. A view is returned for array inputs.
>> Dimensions are prepended if *pos* is 0, so for example, a 1-D array of
>> shape (N,) with ndim=4 becomes a view of shape (1, 1, 1, N). Dimensions
>> are appended if *pos* is -1, so for example a 2-D array of shape (M, N)
>> becomes a view of shape (M, N, 1, 1) when ndim=4.
>> *See also*
>> atleast_1d, atleast_2d, atleast_3d
>> *Notes*
>> This function does not follow the convention of the other atleast_*d 
>> functions
>> in numpy in that it only accepts a single array argument. To process
>> multiple arrays, use a comprehension or loop around the function call. See
>> examples below.
>> Setting pos=0 is equivalent to how the array would be interpreted by
>> numpy’s 

Re: [Numpy-discussion] Comment published in Nature Astronomy about The ecological impact of computing with Python

2020-11-24 Thread Benjamin Root
Digressing here, but the ozone hole over the antarctic was always going to
take time to recover because of the approximately 50 year residence time of
the CFCs in the upper atmosphere. Cold temperatures can actually speed up
depletion because of certain ice crystal formations that give a boost in
the CFC+sunlight+O3 reaction rate. Note that it doesn't mean that 50 years
are needed to get rid of all CFCs in the atmosphere; it is just a measure
of the amount of time it is expected to take for half of the gas that is
already there to be removed. That doesn't account for the amount of time it
has taken for CFC usage to drop in the first place, and the fact that there
are still CFC pollution occurring (albeit far less than in the 80's).

Ben Root

https://ozone.unep.org/nasa-provides-first-direct-evidence-ozone-hole-recovery
https://csl.noaa.gov/assessments/ozone/1998/faq11.html


On Tue, Nov 24, 2020 at 2:07 PM Charles R Harris 
wrote:

>
>
> On Tue, Nov 24, 2020 at 11:54 AM Benjamin Root 
> wrote:
>
>>
>> Given that AWS and Azure have both made commitments to have their data
>> centers be carbon neutral, and given that electricity and heat production
>> make up ~25% of GHG pollution, I find these sorts of
>> power-usage-analysis-for-the-sake-of-the-environment to be a bit
>> disingenuous. Especially since GHG pollution from power generation is
>> forecasted to shrink as more power is generated by alternative means. I am
>> fine with improving python performance, but let's not fool ourselves into
>> thinking that it is going to have any meaningful impact on the environment.
>>
>> Ben Root
>>
>
> Bingo. I lived through the Freon ozone panic that lasted for 20 years even
> after the key reaction rate was remeasured and found to be 75-100 times
> slower than that used in the research that started the panic. The models
> never recovered, but the panic persisted until it magically disappeared in
> 1994. There are still ozone holes over the Antarctic, last time I looked
> they were explained as due to an influx of cold air.
>
> If you want to deal with GHG, push nuclear power.
>
> 
>
> Chuck


Re: [Numpy-discussion] Comment published in Nature Astronomy about The ecological impact of computing with Python

2020-11-24 Thread Benjamin Root
Given that AWS and Azure have both made commitments to have their data
centers be carbon neutral, and given that electricity and heat production
make up ~25% of GHG pollution, I find these sorts of
power-usage-analysis-for-the-sake-of-the-environment to be a bit
disingenuous. Especially since GHG pollution from power generation is
forecasted to shrink as more power is generated by alternative means. I am
fine with improving python performance, but let's not fool ourselves into
thinking that it is going to have any meaningful impact on the environment.

Ben Root

https://sustainability.aboutamazon.com/environment/the-cloud?energyType=true
https://azure.microsoft.com/en-au/global-infrastructure/sustainability/#energy-innovations
https://www.epa.gov/ghgemissions/global-greenhouse-gas-emissions-data

On Tue, Nov 24, 2020 at 1:25 PM Sebastian Berg 
wrote:

> On Tue, 2020-11-24 at 18:41 +0100, Jerome Kieffer wrote:
> > Hi Pierre,
> >
> > I agree with your point of view: the author wants to demonstrate C++
> > and Fortran are better than Python... and environmentally speaking he
> > has some evidences.
> >
> > We develop with Python, Cython, Numpy, and OpenCL and what annoys me
> > most is the compilation time needed for the development of those
> > statically typed ahead of time extensions (C++, C, Fortran).
> >
> > Clearly the author wants to get his article viral and in a sense he
> > managed :). But he did not mention Julia / Numba and other JIT
> > compiled
> > languages (including matlab ?) that are probably outperforming the
> > C++ / Fortran when considering the development time and test-time.
> > Beside this the OpenMP parallelism (implicitly advertized) is far
> > from
> > scaling well on multi-socket systems and other programming paradigms
> > are needed to extract the best performances from spercomputers.
> >
>
> As an interesting aside: Algorithms may have actually improved *more*
> than computational speed when it comes to performance [1].  That shows
> the impressive scale and complexity of efficient code.
>
> So, I could possibly argue that the most important thing may well be
> accessibility of algorithms. And I think that is what a large chunk of
> Scientific Python packages are all about.
>
> Whether or not that has an impact on the environment...
>
> Cheers,
>
> Sebastian
>
>
> [1] This was the first resource I found, I am sure there are plenty:
> https://www.lanl.gov/conferences/salishan/salishan2004/womble.pdf
>
>
> > Cheers,
> >
> > Jerome
> >


Re: [Numpy-discussion] Numpy doesn't use RAM

2020-03-24 Thread Benjamin Root
Another thing to point out about an array that takes up that large a fraction
of the available memory is that it severely restricts what you can do with it.
Since you are above 50% of the available memory, you won't be able to
create another array holding the result of a computation with
that array. So you are restricted to querying (which you could do without
having everything in memory) or to in-place operations.

Dask arrays might be what you are really looking for.
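
For example, with dask.array (the chunk size here is an arbitrary
illustrative choice):

```python
import dask.array as da

# The same 600,000 x 600,000 float32 array as lazy 10,000 x 10,000 chunks;
# nothing close to 1.3 TiB is ever held in memory at once.
x = da.zeros((600_000, 600_000), dtype="float32", chunks=(10_000, 10_000))

col_means = x.mean(axis=0)         # builds a task graph, still lazy
result = col_means[:5].compute()   # only now is (a small part of) the work done
```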

Ben Root

On Tue, Mar 24, 2020 at 2:18 PM Sebastian Berg 
wrote:

> On Tue, 2020-03-24 at 13:59 -0400, Keyvis Damptey wrote:
> > Hi Numpy dev community,
> >
> > I'm keyvis, a statistical data scientist.
> >
> > I'm currently using numpy in python 3.8.2 64-bit for a clustering
> > problem,
> > on a machine with 1.9 TB RAM. When I try using np.zeros to create a
> > 600,000
> > by 600,000 matrix of dtype=np.float32 it says
> > "Unable to allocate 1.31 TiB for an array with shape (60, 60)
> > and
> >
> > data type float32"
> >
>
> If this error happens, allocating the memory failed. This should be
> pretty much a simple `malloc` call in C, so this is the kernel
> complaining, not Python/NumPy.
>
> I am not quite sure, but maybe memory fragmentation plays its part, or
> you are simply actually out of memory for that process; 1.44 TB is a
> significant portion of the total memory after all.
>
> Not sure what to say, but I think you should probably look into other
> solutions, maybe using HDF5, zarr, or memory-mapping (although I am not
> sure the last actually helps). It will be tricky to work with arrays of
> a size that is close to the available total memory.
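(A rough sketch of the memory-mapping suggestion above, for what it's
worth; "big.dat" is a made-up filename, and whether this helps depends
entirely on the access pattern:)

```python
import numpy as np

# The array lives in a (sparse) file on disk; pages are pulled into RAM
# on demand, so the whole thing never has to fit in memory at once.
big = np.memmap("big.dat", dtype=np.float32, mode="w+",
                shape=(600_000, 600_000))
big[0, :1000] = 1.0   # touch only a small window at a time
big.flush()
```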
>
> Maybe someone who works more with such data here can give you tips on
> what projects can help you or what solutions to look into.
>
> - Sebastian
>
>
>
> > I used psutils to determine how much RAM python thinks it has access
> > to and
> > it return with 1.8 TB approx.
> >
> > Is there some way I can fix numpy to create these large arrays?
> > Thanks for your time and consideration


Re: [Numpy-discussion] new MaskedArray class

2019-06-22 Thread Benjamin Root
"""Third, in the old np.ma.MaskedArray masked positions are very often
"effectively" clobbered, in the sense that they are not computed. For
example, if you do "c = a+b", and then change the mask of c"""

My use-cases don't involve changing the mask of "c". It would involve
changing the mask of "a" or "b" after I have calculated "c", so that I
could calculate "d". As a fairly simple example, I frequently work with
satellite data. We have multiple masks, such as water, vegetation, sandy
loam, bare rock, etc. The underlying satellite data in any of these places
isn't bad, they just need to be dealt with differently.  I wouldn't want
the act of applying a mask for a set of calculations on things that aren't
bare rock to mess up my subsequent calculation on things that aren't water.
Right now, I have to handle this explicitly with flattened sparse arrays,
which makes visualization and conception difficult.
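(A minimal illustration of that workflow with the current np.ma, using
made-up data and masks: the underlying data never change, only the mask
applied to them does:)

```python
import numpy as np
import numpy.ma as ma

data  = np.array([0.2, 0.5, 0.9, 0.4])          # satellite pixels (made up)
water = np.array([True,  False, False, False])   # mask out water pixels
rock  = np.array([False, False, True,  False])   # mask out bare-rock pixels

not_rock  = ma.masked_array(data, mask=rock).mean()   # stats off the rock
not_water = ma.masked_array(data, mask=water).mean()  # same data, new mask
```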

Ben Root

On Sat, Jun 22, 2019 at 11:51 AM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi Allan,
>
> I'm not sure I would go too much by what the old MaskedArray class did. It
> indeed made an effort not to overwrite masked values with a new result,
> even to the extend of copying back masked input data elements to the output
> data array after an operation. But the fact that this is non-sensical if
> the dtype changes (or the units in an operation on quantities) suggests
> that this mental model simply does not work.
>
> I think a sensible alternative mental model for the MaskedArray class is
> that all it does is forward any operations to the data it holds and
> separately propagate a mask, ORing elements together for binary operations,
> etc., and explicitly skipping masked elements in reductions (ideally using
> `where` to be as agnostic as possible about the underlying data, for which,
> e.g., setting masked values to `0` for `np.add.reduce` may or may not be
> the right thing to do - what if they are strings?).
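(A toy sketch of this mental model -- not Allan's actual implementation --
just to make the mask-ORing and `where` idea concrete:)

```python
import numpy as np

def masked_add(a, a_mask, b, b_mask):
    # operate on the data as if the mask did not exist; OR the masks
    return a + b, a_mask | b_mask

def masked_sum(a, a_mask):
    # skip masked elements in the reduction via `where`
    return np.add.reduce(a, where=~a_mask, initial=0)
```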
>
> With this mental picture, the underlying data always have a well-defined
> meaning: they have been operated on as if the mask did not exist. There
> is then also less reason to try to avoid getting them back to the user.
>
> As a concrete example (maybe Ben has others): in astropy we have a
> sigma-clipping average routine, which uses a `MaskedArray` to iteratively
> mask items that are too far off from the mean; here, the mask varies each
> iteration (an initially masked element can come back into play), but the
> data do not.
>
> All the best,
>
> Marten
>
> On Sat, Jun 22, 2019 at 10:54 AM Allan Haldane 
> wrote:
>
>> On 6/21/19 2:37 PM, Benjamin Root wrote:
>> > Just to note, data that is masked isn't always garbage. There are plenty
>> > of use-cases where one may want to temporarily apply a mask for a set of
>> > computation, or possibly want to apply a series of different masks to
>> > the data. I haven't read through this discussion deeply enough, but is
>> > this new class going to destroy underlying masked data? and will it be
>> > possible to swap out masks?
>> >
>> > Cheers!
>> > Ben Root
>>
>> Indeed my implementation currently feels free to clobber the data at
>> masked positions and makes no guarantees not to.
>>
>> I'd like to try to support reasonable use-cases like yours though. A few
>> thoughts:
>>
>> First, the old np.ma.MaskedArray explicitly does not promise to preserve
>> masked values, with a big warning in the docs. I can't recall the
>> examples, but I remember coming across cases where clobbering happens.
>> So arguably your behavior was never supported, and perhaps this means
>> that no-clobber behavior is difficult to reasonably support.
>>
>> Second, the old np.ma.MaskedArray avoids frequent clobbering by making
>> lots of copies. Therefore, in most cases you will not lose any
>> performance in my new MaskedArray relative to the old one by making an
>> explicit copy yourself. I.e, is it problematic to have to do
>>
>>  >>> result = MaskedArray(data.copy(), trial_mask).sum()
>>
>> instead of
>>
>>  >>> marr.mask = trial_mask
>>  >>> result = marr.sum()
>>
>> since they have similar performance?
>>
>> Third, in the old np.ma.MaskedArray masked positions are very often
>> "effectively" clobbered, in the sense that they are not computed. For
>> example, if you do "c = a+b", and then change the mask of c, the values
>> at masked positions of the result of (a+b) do not correspond to the sum
>> of the masked values in a and b.

Re: [Numpy-discussion] new MaskedArray class

2019-06-21 Thread Benjamin Root
Just to note, data that is masked isn't always garbage. There are plenty of
use-cases where one may want to temporarily apply a mask for a set of
computation, or possibly want to apply a series of different masks to the
data. I haven't read through this discussion deeply enough, but is this new
class going to destroy underlying masked data? and will it be possible to
swap out masks?

Cheers!
Ben Root


On Thu, Jun 20, 2019 at 12:44 PM Allan Haldane 
wrote:

> On 6/19/19 10:19 PM, Marten van Kerkwijk wrote:
> > Hi Allan,
> >
> > This is very impressive! I could get the tests that I wrote for my class
> > pass with yours using Quantity with what I would consider very minimal
> > changes. I only could not find a good way to unmask data (I like the
> > idea of setting the mask on some elements via `ma[item] = X`); is this
> > on purpose?
>
> Yes, I want to make it difficult for the user to access the garbage
> values under the mask, which are often clobbered values. The only way to
> "remove" a masked value is by replacing it with a new non-masked value.
>
>
> > Anyway, it would seem easily at the point where I should comment on your
> > repository rather than in the mailing list!
>
> To make further progress on this encapsulation idea I need a more
> complete ducktype to pass into MaskedArray to test, so that's what I'll
> work on next, when I have time. I'll either try to finish my
> ArrayCollection type, or try making a simple NDunit ducktype
> piggybacking on astropy's Unit.
>
> Best,
> Allan
>
>
> >
> > All the best,
> >
> > Marten
> >
> >
> > On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane wrote:
> >
> > On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
> > >
> > >
> > > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane wrote:
> > > 
> > >
> > > > This may be too much to ask from the initializer, but, if
> > so, it still
> > > > seems most useful if it is made as easy as possible to do,
> > say, `class
> > > > MaskedQuantity(Masked, Quantity): `.
> > >
> > > Currently MaskedArray does not accept ducktypes as underlying
> > arrays,
> > > but I think it shouldn't be too hard to modify it to do so.
> > Good idea!
> > >
> > >
> > > Looking back at my trial, I see that I also never got to duck
> arrays -
> > > only ndarray subclasses - though I tried to make the code as
> > agnostic as
> > > possible.
> > >
> > > (Trial at
> > >
> >
> https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1
> )
> > >
> > > I already partly navigated this mixin-issue in the
> > > "MaskedArrayCollection" class, which essentially does
> > > ArrayCollection(MaskedArray(array)), and only takes about 30
> > lines of
> > > boilerplate. That's the backwards encapsulation order from
> > what you want
> > > though.
> > >
> > >
> > > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
> > > mask=[True, False, False])` does indeed not have a `.unit`
> attribute
> > > (and cannot represent itself...); I'm not at all sure that my
> > method of
> > > just creating a mixed class is anything but a recipe for disaster,
> > though!
> >
> > Based on your suggestion I worked on this a little today, and now my
> > MaskedArray more easily encapsulates both ducktypes and ndarray
> > subclasses (pushed to repo). Here's an example I got working with
> masked
> > units using unyt:
> >
> > [1]: from MaskedArray import X, MaskedArray, MaskedScalar
> >
> > [2]: from unyt import m, km
> >
> > [3]: import numpy as np
> >
> > [4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
> >
> > [5]: uarr
> >
> > MaskedArray([1., X , 3.])
> > [6]: uarr + 1*m
> >
> > MaskedArray([1.001, X, 3.001])
> > [7]: uarr.filled()
> >
> > unyt_array([1., 0., 3.], 'km')
> > [8]: np.concatenate([uarr, 2*uarr]).filled()
> > unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
> >
> > The catch is the ducktype/subclass has to rigorously follow numpy's
> > indexing rules, including distinguishing 0d arrays from scalars. For now
> > I only used unyt in the example above since it happens to be less strict
> > about dimensionless operations than astropy.units, which trips up my
> > repr code. (see below for example with astropy.units). Note in the
> last
> > line I lost the dimensions, but that is because unyt does not handle
> > np.concatenate. To get that to work we need a true ducktype for
> units.
> >
> > The example above doesn't expose the ".units" attribute outside the
> > MaskedArray, and it doesn't print the units in the repr. But you can
> > access them using "filled".
> >

Re: [Numpy-discussion] Was the range() function ever created?

2019-05-24 Thread Benjamin Root
pandas is not built on numpy (at least, not anymore), but borrows a lot of
inspirations from numpy, and interacts with numpy fairly well. As part of
the scipy ecosystem, we all work together to improve interoperability and
features.

python's built-in range() function has been there long before numpy came on
the scene, so it just made sense to adopt that name since it was the way to
generate numbers in python.

Ben

On Fri, May 24, 2019 at 10:44 PM C W  wrote:

> When I looked up the pandas mailing list, Numpy showed up. Maybe it's because
> Pandas is built on Numpy? My apologies.
>
> Yes, please do. For people with a statistical background, but not CS, it
> seems strange that the *real* range() function is used to generate natural
> numbers.
>
> Thanks, Ben!
>
>
>
> On Fri, May 24, 2019 at 10:34 PM Benjamin Root 
> wrote:
>
>> This is the numpy discussion list, not the pandas discussion list. Now,
>> for numpy's part, I have had hankerings for a `np.minmax()` ufunc, but
>> never enough to get over just calling min and max on my data separately.
>>
>> On Fri, May 24, 2019 at 10:27 PM C W  wrote:
>>
>>> Hello all,
>>>
>>> I want to calculate the range of a vector. I saw that someone asked
>>> for range() in 2011, but was it ever created?
>>> https://github.com/pandas-dev/pandas/issues/288
>>>
>>> The response at the time was to use df.describe(). But df.describe() gives
>>> all the 5-number summary statistics, and I DON'T WANT all the extra stuff
>>> I didn't ask for. I expected a single numerical value that I can feed
>>> into another function.
>>>
>>> It exists in Matlab and R, why not in Python? I'm quite frustrated every
>>> time I need to calculate the range.
>>>
>>> Thanks in advance.
>>>
>>>
>>>
>>>


Re: [Numpy-discussion] Was the range() function ever created?

2019-05-24 Thread Benjamin Root
This is the numpy discussion list, not the pandas discussion list. Now, for
numpy's part, I have had hankerings for a `np.minmax()` ufunc, but never
enough to get over just calling min and max on my data separately.
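For what it's worth, the statistical "range" asked about below can already
be spelled directly; np.ptp ("peak to peak") is exactly max minus min:

```python
import numpy as np

x = np.array([3.2, -1.5, 7.0, 2.4])

x.max() - x.min()   # the statistical "range"
np.ptp(x)           # same thing: "peak to peak"
```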

On Fri, May 24, 2019 at 10:27 PM C W  wrote:

> Hello all,
>
> I want to calculate the range of a vector. I saw that someone asked for
> range() in 2011, but was it ever created?
> https://github.com/pandas-dev/pandas/issues/288
>
> The response at the time was to use df.describe(). But df.describe() gives all
> the 5-number summary statistics, and I DON'T WANT all the extra stuff I
> didn't ask for. I expected a single numerical value that I can feed
> into another function.
>
> It exists in Matlab and R, why not in Python? I'm quite frustrated every
> time I need to calculate the range.
>
> Thanks in advance.
>
>
>
>


Re: [Numpy-discussion] three-dim array

2018-12-26 Thread Benjamin Root
Ewww, kinda wish that would be an error... It would be too easy for a typo
to get accepted this way.
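(For what it's worth, later NumPy releases did make this an error: the
ragged-array path was deprecated around 1.20 and, as far as I recall,
raises in 1.24+ unless dtype=object is requested explicitly:)

```python
import numpy as np

np.array([[1, 2], [[1, 2], [3, 4]]])
# ValueError: setting an array element with a sequence. The requested
# array has an inhomogeneous shape ...

np.array([[1, 2], [[1, 2], [3, 4]]], dtype=object)   # still allowed, explicitly
```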

On Wed, Dec 26, 2018 at 1:59 AM Eric Wieser 
wrote:

> In the latest version of numpy, this runs without an error, although may
> or may not be what you want:
>
> In [1]: np.array([[1,2],[[1,2],[3,4]]])
> Out[1]:
> array([[1, 2],
>[list([1, 2]), list([3, 4])]], dtype=object)
>
> Here the result is a 2x2 array, where some elements are numbers and others
> are lists.
> ​
>
> On Wed, 26 Dec 2018 at 06:23 Mark Alexander Mikofski <
> mikof...@berkeley.edu> wrote:
>
>> I believe numpy arrays must be rectangular, yours is jagged, instead try
>>
>> >>> x3d = np.array([[[1, 2], [1, 2], [3, 4]]])
>> >>> x3d.shape
>> (1, 3, 2)
>>
>> Note: 3 opening brackets, yours has 2
>> And single brackets around the 3 innermost arrays, yours has single
>> brackets for the 1st, and double brackets around the 2nd and 3rd
>>
>>
>> On Tue, Dec 25, 2018, 6:20 PM Jeff wrote:
>>> hello,
>>>
>>> sorry newbe to numpy.
>>>
>>> I want to define a three-dim array.
>>> I know this works:
>>>
>>>  >>> np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
>>> array([[[1, 2],
>>>  [3, 4]],
>>>
>>> [[5, 6],
>>>  [7, 8]]])
>>>
>>> But can you tell why this doesnt work?
>>>
>>>  >>> np.array([[1,2],[[1,2],[3,4]]])
>>> Traceback (most recent call last):
>>>    File "<stdin>", line 1, in <module>
>>> ValueError: setting an array element with a sequence.
>>>
>>>
>>> Thank you.


Re: [Numpy-discussion] [Matplotlib-devel] mplcairo 0.1 release

2018-07-23 Thread Benjamin Root
Congratulations to Antony for his hard work on this important backend!

As far as I am concerned, the cairo backend is the future of matplotlib.
Test this backend out for yourselves and help us take matplotlib to the
next level in high-quality charting!
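(A minimal way to try it, based on the backend-selection line quoted in the
announcement below; this assumes mplcairo and a Qt binding are installed:)

```python
import matplotlib
matplotlib.use("module://mplcairo.qt")   # select the mplcairo Qt backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
plt.show()
```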

Cheers!
Ben Root

On Sun, Jul 22, 2018 at 4:52 PM, Antony Lee  wrote:

> Dear all,
>
> I am pleased to announce the release of mplcairo 0.1
>
> # Description
>
> mplcairo is a Matplotlib backend based on the well-known cairo library,
> supporting output to both raster (including interactively) and vector
> formats.  In other words, it provides the functionality of Matplotlib's
> {,qt5,gtk3,wx,tk,macos}{agg,cairo}, pdf, ps, and svg backends.
>
> Per Matplotlib's standard API, the backend can be selected by calling
>
> matplotlib.use("module://mplcairo.qt")
>
> or setting your MPLBACKEND environment variable to `module://mplcairo.qt`
> for
> Qt5, and similarly for other toolkits.
>
> The source tarball, and Py3.6 manylinux and Windows wheels, are available
> on
> PyPI (I am looking for help to generate the OSX wheels).
>
> See the README for more details.
>
> # Why a new backend?
>
> Compared to Matplotlib's builtin Agg and cairo backends, mplcairo presents
> the
> following features:
>
> - Improved accuracy (e.g., with marker positioning, quad meshes, and text
>   kerning).
> - Support for a wider variety of font formats, such as otf and pfb, for
> vector
>   (PDF, PS, SVG) backends (Matplotlib's Agg backend also supports such
> fonts).
> - Optional support for complex text layout (right-to-left languages, etc.)
>   using Raqm.  **Note** that Raqm depends on Fribidi, which is licensed
> under
>   the LGPLv2.1+.
> - Support for embedding URLs in PDF (but not SVG) output (requires
>   cairo≥1.15.4).
> - Support for multi-page output both for PDF and PS (Matplotlib only
> supports
>   multi-page PDF).
> - Support for custom blend modes (see `examples/operators.py`).
>
> See the README for more details.
>
> # Changelog from mplcairo 0.1a1 to mplcairo 0.1
>
> - Integration with libraqm now occurs via dlopen() rather than being
> selected
>   at compile-time.
> - Various rendering and performance improvements.
> - On Travis, we now run Matplotlib's test suite with mplcairo patching the
>   default Agg renderer.
>
> Enjoy,
>
> Antony Lee
>


Re: [Numpy-discussion] Splitting MaskedArray into a separate package

2018-05-23 Thread Benjamin Root
As further evidence of a widely used package that is often considered
"critical" to an ecosystem that gets negligible support, look no further
than Basemap. It went almost two years without any commits before I took it
up (and then only because my employer needed a couple of fixes).

I worry that a masked array package would turn into Basemap.

Ben Root


On Wed, May 23, 2018 at 10:52 PM, Benjamin Root <ben.v.r...@gmail.com>
wrote:

> users of a package does not equate to maintainers of a package. Scikits
> are successful because scientists that have specialty in a field can
> contribute code and support the packages using their domain knowledge. How
> many people here are specialists in masked/missing value computation?
>
> Would I like to see better missing value support in numpy? Sure, but until
> then, MaskedArrays are what we have and it is still better than just using
> NaNs all over the place.
>
> Cheers!
> Ben Root
>
> On Wed, May 23, 2018 at 7:38 PM, Stefan van der Walt <stef...@berkeley.edu
> > wrote:
>
>> Hi Eric,
>>
>> On Wed, 23 May 2018 10:02:22 -1000, Eric Firing wrote:
>> > Masked arrays are critical to my numpy usage, and I suspect they are
>> > critical for many other use cases as well.
>>
>> That's good to know; and the goal of this NEP should be to improve your
>> situation, not make it worse.
>>
>> > In fact, I would prefer that a high priority for major numpy
>> > development be the more complete integration of masked array
>> capabilities
>> > into numpy, not their removal to a separate package.
>> >
>> > I was unhappy to see
>> > the effort in that direction a few years ago being killed.  I didn't
>> agree
>> > with every design decision, but overall I thought it was going in the
>> right
>> > direction.
>>
>> I see this and the NEP as orthogonal issues.  MaskedArrays, one
>> particular version of the masked value solution, has never truly been a
>> first class citizen.
>>
>> If we could instead implement masked arrays such that it simply sits on
>> top of existing NumPy functionality (using, e.g., special dtypes or
>> bitmasks), re-using all the standard machinery, that would be a natural
>> fit in the core of NumPy, and would negate the need for MaskedArrays.
>> But we haven't reached that point yet, and I am not aware of any current
>> proposal to do so.
>>
>> > Bad or missing values (and situations where one wants to use a mask to
>> > operate on a subset of an array) are found in many domains of real
>> life; do
>> > you really want python users in those domains to have to fall back on
>> > Matlab-style reliance on nans and/or manual mask manipulations, as the
>> new
>> > maskedarray package is sidelined?
>>
>> This is not too far from the current status quo, I would argue.  The
>> functionality exists, but it is "bolted on" rather than "built in".  And
>> my guess is that the component will benefit from some extra attention
>> that it is not getting as part of the current package.
>>
>> > Or is there any realistic prospect for maintenance and improvement of
>> the
>> > package after it is separated out?
>>
>> In order to prevent the package from being "sidelined", we would have to
>> strengthen this part of the story.
>>
>> > Side question: does your proposed purification of numpy include
>> elimination
>> > of linalg and random?  Based on the criteria in the NEP, I would expect
>> it
>> > does; so maybe you should have a more ambitious NEP, and do the
>> purification
>> > all in one step as a numpy version 2.0.  (Surely if masked arrays are
>> > purged, the matrix class should be booted out at the same time.)
>>
>> That's an interesting question, and one I have wondered about.  Would it
>> make sense to ship just the core ndarray object?  I don't know.  It
>> probably depends a lot on whether we can define clear API boundaries,
>> whether this kind of split is desired from the average user's
>> perspective, and whether it could benefit the development of the
>> subcomponents.
>>
>> W.r.t. matrices, I think you're setting a trap for me here, but I'm
>> going to step into it anyway ;)
>>
>> https://mail.python.org/pipermail/numpy-discussion/2013-July/067254.html
>>
>> It is, then, not the first time I argued in favor of moving certain
>> components out of NumPy onto their own packages.  I would probably have
>> written that NEP this time around, had it not been for the m

Re: [Numpy-discussion] Right way to do fancy indexing from argsort() result?

2018-03-26 Thread Benjamin Root
Ah, yes, I should have thought about that. Kind of seems like something
that we could make `np.take()` do, somehow, for something that is easier to
read.
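(For later readers: a sketch comparing Robert's broadcasting answer, quoted
below, with np.take_along_axis, which a later NumPy release (1.15) added and
which reads the way `np.take()` is wished for here:)

```python
import numpy as np

N, k = 5, 3
distances = np.random.rand(N, k)
indexs = np.argsort(distances, axis=1)

by_broadcast = distances[np.arange(N)[:, np.newaxis], indexs]
by_take = np.take_along_axis(distances, indexs, axis=1)

assert np.array_equal(by_broadcast, by_take)
```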

Thank you!
Ben Root


On Mon, Mar 26, 2018 at 2:28 PM, Robert Kern <robert.k...@gmail.com> wrote:

> On Mon, Mar 26, 2018 at 11:24 AM, Benjamin Root <ben.v.r...@gmail.com>
> wrote:
> >
> > I seem to be losing my mind... I can't seem to get this to work right.
> >
> > I have a (N, k) array `distances` (along with a bunch of other arrays of
> the same shape). I need to resort the rows, so I do:
> >
> > indexs = np.argsort(distances, axis=1)
> >
> > How do I use this index array correctly to get back distances sorted
> along rows? Note, telling me to use `np.sort()` isn't going to work because
> I need to apply the same indexing to a couple of other arrays.
> >
> > new_dists = distances[indexs]
> >
> > gives me a (N, k, k) array, while
> >
> > new_dists = np.take(distances, indexs, axis=1)
> >
> > gives me a (N, N, k) array.
> >
> > What am I missing?
>
> Broadcasting!
>
>   new_dists = distances[np.arange(N)[:, np.newaxis], indexs]
>
> --
> Robert Kern
>


Re: [Numpy-discussion] improving arange()? introducing fma()?

2018-02-22 Thread Benjamin Root
Sorry, I have been distracted with xarray improvements the past couple of
weeks.

Some thoughts on what has been discussed:

First, you are right...Decimal is not the right module for this. I think
instead I should use the 'fractions' module for loading grid spec
information from strings (command-line, configs, etc). The tricky part is
getting the yaml reader to use it instead of converting to a float under
the hood.

Second, what has been pointed out about the implementation of arange
actually helps to explain some oddities I have encountered. In some
situations, I have found that it was better for me to produce the reversed
sequence, and then reverse that array back and use it.

Third, it would be nice to do what we can to improve arange()'s results.
Would we be ok with a PR that uses fma() if it is available, but then falls
back on a regular multiply and add if it isn't available, or are we going
to need to implement it ourselves for consistency?

Lastly, there definitely needs to be a better tool for grid making. The
problem appears easy at first, but it is fraught with many pitfalls and
subtle issues. It is easy to say, "always use linspace()", but if the user
doesn't have the number of pixels, they will need to calculate that using
--- gasp! -- floating point numbers, which could result in the wrong
answer. Or maybe their first/last positions were determined by some other
calculation, and so the resulting grid does not have the expected spacing.
Another problem that I run into is starting from two different sized grids
and padding them both to be the same spec -- and getting that to match what
would come about if I had generated the grid from scratch.

Getting these things right is hard. I am not even certain that my existing
code for doing this is even right. But, what I do know is that until we build
such a tool, users will continue to incorrectly use arange() and
linspace(), and waste time trying to re-invent the wheel badly, assuming
they even notice their mistakes in the first place! So, should such a tool
go into numpy, given how fundamental it is to generate a sequence of
floating point numbers, or should we try to put it into a package like
rasterio or xarray?
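(For concreteness, a rough sketch of what such a bounds-plus-resolution
helper might look like -- emphatically not a NumPy API, and it glosses over
the hard corner cases mentioned above:)

```python
import numpy as np

def grid_from_bounds(start, stop, res):
    # Recover the interval count from the bounds and resolution, rounding
    # to absorb float error in the division, then let linspace pin the
    # end points exactly.
    count = int(round((stop - start) / res)) + 1
    return np.linspace(start, stop, count, dtype=np.float32)

x = grid_from_bounds(-115.0, -44.99, 0.01)
# 7002 points, first exactly -115.0, last exactly -44.99, spacing ~0.01
```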

Cheers!
Ben Root



On Thu, Feb 22, 2018 at 2:02 PM, Chris Barker  wrote:

> @Ben: Have you found a solution to your problem? Are there thinks we could
> do in numpy to make it better?
>
> -CHB
>
>
> On Mon, Feb 12, 2018 at 9:33 AM, Chris Barker 
> wrote:
>
>> I think it's all been said, but a few comments:
>>
>> On Sun, Feb 11, 2018 at 2:19 PM, Nils Becker 
>> wrote:
>>
>>> Generating equidistantly spaced grids is simply not always possible.
>>>
>>
>> exactly -- and linspace gives pretty much teh best possible result,
>> guaranteeing tha tthe start an end points are exact, and the spacing is
>> within an ULP or two (maybe we could make that within 1 ULP always, but not
>> sure that's worth it).
>>
>>
>>> The reason is that the absolute spacing of the possible floating point
>>> numbers depends on their magnitude [1].
>>>
>>
>> Also that the exact spacing may not be exactly representable in FP -- so
>> you have to have at least one space that's a bit off to get the end points
>> right (or have the endpoints not exact).
>>
>>
>>> If you - for some reason - want the same grid spacing everywhere you may
>>> choose an appropriate new spacing.
>>>
>>
>> well, yeah, but usually you are trying to fit to some other constraint.
>> I'm still confused as to where these couple of ULPs actually cause
>> problems, unless you are doing inappropriate FP comparisons elsewhere.
>>
>> Curiously, either by design or accident, arange() seems to do something
>>> similar as was mentioned by Eric. It creates a new grid spacing by adding
>>> and subtracting the starting point of the grid. This often has similar
>>> effect as adding and subtracting N*dx (e.g. if the grid is symmetric around
>>> 0.0). Consequently, arange() seems to trade keeping the grid spacing
>>> constant for a larger error in the grid size and consequently in the end
>>> point.
>>>
>>
>> interesting -- but it actually makes sense -- that is the definition of
>> arange(), borrowed from range(), which was designed for integers, and, in
>> fact, pretty much mirrored the classic C index for loop:
>>
>>
>> for (int i=0; i<n; i++) { ... }
>>
>>
>> or in python:
>>
>> i = start
>> while i < stop:
>> i += step
>>
>> The problem here is that the termination criterion -- i < stop -- is the
>> definition of the function, and works just fine for integers (where it came
>> from), but with FP, even with no error accumulation, stop may not be
>> exactly representable, so you could end up with a value for your last item
>> that is about (stop-step), or you could end up with a value that is a
>> couple ULPs less than stop -- essentially including the end point when you
>> weren't supposed to.
>>
>> The truth is, making a floating 

Re: [Numpy-discussion] improving arange()? introducing fma()?

2018-02-09 Thread Benjamin Root
Interesting...

```
static void
@NAME@_fill(@type@ *buffer, npy_intp length, void *NPY_UNUSED(ignored))
{
    npy_intp i;
    @type@ start = buffer[0];
    @type@ delta = buffer[1];
    delta -= start;
    for (i = 2; i < length; ++i) {
        buffer[i] = start + i*delta;
    }
}
```

So, the second element is computed using the delta that arange was given, but
the code then tries to recover that delta by subtraction, which incurs error:
```
>>> a = np.float32(-115)
>>> delta = np.float32(0.01)
>>> b = a + delta
>>> new_delta = b - a
>>> "%.16f" % delta
'0.009997764826'
>>> "%.16f" % new_delta
'0.0100021362304688'
```

Also, right there is a good example of where the use of fma() could be of
value.

Cheers!
Ben Root


On Fri, Feb 9, 2018 at 4:56 PM, Eric Wieser <wieser.eric+nu...@gmail.com>
wrote:

> Can’t arange and linspace operations with floats be done internally
>
> Yes, and they probably should be - they’re done this way as a hack because
> the api exposed for custom dtypes is here
> <https://github.com/numpy/numpy/blob/81e15e812574d956fcc304c3982e2b59aa18aafb/numpy/core/include/numpy/ndarraytypes.h#L507-L511>,
> (example implementation here
> <https://github.com/numpy/numpy/blob/81e15e812574d956fcc304c3982e2b59aa18aafb/numpy/core/src/multiarray/arraytypes.c.src#L3711-L3721>)
> - essentially, you give it the first two elements of the array, and ask it
> to fill in the rest.
> ​
>
> On Fri, 9 Feb 2018 at 13:17 Matthew Harrigan <harrigan.matt...@gmail.com>
> wrote:
>
>> I apologize if I'm missing something basic, but why are floats being
>> accumulated in the first place?  Can't arange and linspace operations with
>> floats be done internally similar to `start + np.arange(num_steps) *
>> step_size`?  I.e. always accumulate (really increment) integers to limit
>> errors.
>>
>> On Fri, Feb 9, 2018 at 3:43 PM, Benjamin Root <ben.v.r...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Feb 9, 2018 at 12:19 PM, Chris Barker <chris.bar...@noaa.gov>
>>> wrote:
>>>
>>>> On Wed, Feb 7, 2018 at 12:09 AM, Ralf Gommers <ralf.gomm...@gmail.com>
>>>> wrote:
>>>>>
>>>>>  It is partly a plea for some development of numerically accurate
>>>>>> functions for computing lat/lon grids from a combination of inputs: 
>>>>>> bounds,
>>>>>> counts, and resolutions.
>>>>>>
>>>>>
>>>> Can you be more specific about what problems you've run into -- I work
>>>> with lat-lon grids all the time, and have never had a problem.
>>>>
>>>> float32 degrees gives you about 1 meter accuracy or better, so I can
>>>> see how losing a few digits might be an issue, though I would argue that
>>>> you maybe shouldn't use float32 if you are worried about anything close to
>>>> 1m accuracy... -- or shift to a relative coordinate system of some sort.
>>>>
>>>
>>> The issue isn't so much the accuracy of the coordinates themselves. I am
>>> only worried about 1km resolution (which is approximately 0.01 degrees at
>>> mid-latitudes). My concern is with consistent *construction* of a
>>> coordinate grid with even spacing. As it stands right now, if I provide a
>>> corner coordinate, a resolution, and the number of pixels, the result is
>>> not terrible (indeed, this is the approach used by gdal/rasterio). If I
>>> have start/end coordinates and the number of pixels, the result is not bad,
>>> either (use linspace). But, if I have start/end coordinates and a
>>> resolution, then determining the number of pixels from that is actually
>>> tricky to get right in the general case, especially with float32 and large
>>> grids, and especially if the bounding box specified isn't exactly divisible
>>> by the resolution.
>>>
>>>
>>>>
>>>> I have been playing around with the decimal package a bit lately,
>>>>>>
>>>>>
>>>> sigh. decimal is so often looked at as a solution to a problem it isn't
>>>> designed for. lat-lon is natively Sexagesimal -- maybe we need that dtype
>>>> :-)
>>>>
>>>> what you get from decimal is variable precision -- maybe a binary
>>>> variable precision lib is a better answer -- that would be a good thing to
>>>> have easy access to in numpy, but in this case, if you want better accuracy
>>>> in a computation that will end up in float32, just use float64.
>>>>
>>>
>>> I am not concerned about comput

Re: [Numpy-discussion] improving arange()? introducing fma()?

2018-02-09 Thread Benjamin Root
On Fri, Feb 9, 2018 at 12:19 PM, Chris Barker  wrote:

> On Wed, Feb 7, 2018 at 12:09 AM, Ralf Gommers 
> wrote:
>>
>>  It is partly a plea for some development of numerically accurate
>>> functions for computing lat/lon grids from a combination of inputs: bounds,
>>> counts, and resolutions.
>>>
>>
> Can you be more specific about what problems you've run into -- I work
> with lat-lon grids all the time, and have never had a problem.
>
> float32 degrees gives you about 1 meter accuracy or better, so I can see
> how losing a few digits might be an issue, though I would argue that you
> maybe shouldn't use float32 if you are worried about anything close to 1m
> accuracy... -- or shift to a relative coordinate system of some sort.
>

The issue isn't so much the accuracy of the coordinates themselves. I am
only worried about 1km resolution (which is approximately 0.01 degrees at
mid-latitudes). My concern is with consistent *construction* of a
coordinate grid with even spacing. As it stands right now, if I provide a
corner coordinate, a resolution, and the number of pixels, the result is
not terrible (indeed, this is the approach used by gdal/rasterio). If I
have start/end coordinates and the number of pixels, the result is not bad,
either (use linspace). But, if I have start/end coordinates and a
resolution, then determining the number of pixels from that is actually
tricky to get right in the general case, especially with float32 and large
grids, and especially if the bounding box specified isn't exactly divisible
by the resolution.


>
> I have been playing around with the decimal package a bit lately,
>>>
>>
> sigh. decimal is so often looked at as a solution to a problem it isn't
> designed for. lat-lon is natively Sexagesimal -- maybe we need that dtype
> :-)
>
> what you get from decimal is variable precision -- maybe a binary variable
> precision lib is a better answer -- that would be a good thing to have easy
> access to in numpy, but in this case, if you want better accuracy in a
> computation that will end up in float32, just use float64.
>

I am not concerned about computing distances or anything like that, I am
trying to properly construct my grid. I need consistent results regardless
of which way the grid is specified (start/end/count, start/res/count,
start/end/res). I have found that loading up the grid specs (using in a
config file or command-line) using the Decimal class allows me to exactly
and consistently represent the grid specification, and gets me most of the
way there. But the problems with arange() is frustrating, and I have to
have extra logic to go around that and over to linspace() instead.


>
> and I discovered the concept of "fused multiply-add" operations for
>>> improved accuracy. I have come to realize that fma operations could be used
>>> to greatly improve the accuracy of linspace() and arange().
>>>
>>
> arange() is problematic for non-integer use anyway, by its very definition
> (getting the "end point" correct requires the right step, even without FP
> error).
>
> and would it really help with linspace? it's computing a delta with one
> division in fp, then multiplying it by an integer (represented in fp --
> why? why not keep that an integer till the multiply?).
>

Sorry, that was a left-over from a previous draft of my email after I
discovered that linspace's accuracy was on par with fma(). And while
arange() has inherent problems, it can still be made better than it is now.
In fact, I haven't investigated this, but I did recently discover some unit
tests of mine started to fail after a numpy upgrade, and traced it back to
a reduction in the accuracy of a usage of arange() with float32s. So,
something got worse at some point, which means we could still get accuracy
back if we can figure out what changed.


>
> In particular, I have been needing improved results for computing
>>> latitude/longitude grids, which tend to be done in float32's to save memory
>>> (at least, this is true in data I come across).
>>>
>>
>> If you care about saving memory *and* accuracy, wouldn't it make more
>> sense to do your computations in float64, and convert to float32 at the
>> end?
>>
>
> that does seem to be the easy option :-)
>

Kinda missing the point, isn't it? Isn't that like saying "convert all your
data to float64s prior to calling np.mean()"? That's ridiculous. Instead,
we made np.mean() upcast the inner-loop operation, and even allow an option
to specify the dtype that should be used for the aggregation.


>
>
>> Now, to the crux of my problem. It is next to impossible to generate a
>>> non-trivial numpy array of coordinates, even in double precision, without
>>> hitting significant numerical errors.
>>>
>>
> I'm confused, the example you posted doesn't have significant errors...
>

Hmm, "errors" was the wrong word. "Differences between methods" might be
more along the lines of what I was thinking. Remember, I am looking for

[Numpy-discussion] improving arange()? introducing fma()?

2018-02-06 Thread Benjamin Root
Note, the following is partly to document my explorations in computing
lat/on grids in numpy lately, in case others come across these issues in
the future. It is partly a plea for some development of numerically
accurate functions for computing lat/lon grids from a combination of
inputs: bounds, counts, and resolutions.

I have been playing around with the decimal package a bit lately, and I
discovered the concept of "fused multiply-add" operations for improved
accuracy. I have come to realize that fma operations could be used to
greatly improve the accuracy of linspace() and arange(). In particular, I
have been needing improved results for computing latitude/longitude grids,
which tend to be done in float32's to save memory (at least, this is true
in data I come across).

Since fma is not available yet in python, please consider the following C
snippet:

```
$ cat test_fma.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(){
float res = 0.01;
float x0 = -115.0;
int cnt = 7001;
float x1 = res * cnt + x0;
float x2 = fma(res, cnt, x0);
printf("x1 %.16f  x2 %.16f\n", x1, x2);
float err1 = fabs(x1 - -44.990);
float err2 = fabs(x2 - -44.990);
printf("err1 %.5e  err2 %.5e\n", err1, err2);
return 0;
}
$ gcc test_fma.c -lm -o test_fma
$ ./test_fma
x1 -44.9899978637695312  x2 -44.9900016784667969
err1 2.13623e-06  err2 1.67847e-06
```

And if you do the same in double precision, fma still yields significantly
better accuracy than an explicit double-precision multiply-add.

Now, to the crux of my problem. It is next to impossible to generate a
non-trivial numpy array of coordinates, even in double precision, without
hitting significant numerical errors. Which has lead me down the path of
using the decimal package (which doesn't play very nicely with numpy
because of the lack of casting rules for it). Consider the following:
```
$ cat test_fma.py
from __future__ import print_function
import numpy as np
res = np.float32(0.01)
cnt = 7001
x0 = np.float32(-115.0)
x1 = res * cnt + x0
print("res * cnt + x0 = %.16f" % x1)
x = np.arange(-115.0, -44.99 + (res / 2), 0.01, dtype='float32')
print("len(arange()): %d  arange()[-1]: %16f" % (len(x), x[-1]))
x = np.linspace(-115.0, -44.99, cnt, dtype='float32')
print("linspace()[-1]: %.16f" % x[-1])

$ python test_fma.py
res * cnt + x0 = -44.9900015648454428
len(arange()): 7002  arange()[-1]:   -44.975044
linspace()[-1]: -44.9900016784667969
```
arange just produces silly results (puts out an extra element... adding
half of the resolution is typically mentioned as a solution on mailing
lists to get around arange()'s limitations -- I personally don't do this).

linspace() is a tad bit better than arange(), but still a little bit worse
than just computing the value straight out. It also produces a result that
is on par with fma(), so that is reassuring that it is about as accurate as
can be at the moment. This also matches up almost exactly to what I would
get if I used Decimal()s and then casted to float32's for final storage, so
that's not too bad as well.

So, does it make any sense to improve arange by utilizing fma() under the
hood? Also, any plans for making fma() available as a ufunc?

Notice that most of my examples required knowing the number of grid points
ahead of time. But what if I didn't know that? What if I just have the
bounds and the resolution? Then arange() is the natural fit, but as I
showed, its accuracy is lacking, and you have to do some sort of hack to do
a closed interval.

Thoughts? Comments?
Ben Root
(donning flame suit)


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-29 Thread Benjamin Root
I <3 structured arrays. I love the fact that I can access data by row and
then by fieldname, or vice versa. There are times when I need to pass just
a column into a function, and there are times when I need to process things
row by row. Yes, pandas is nice if you want the specialized indexing
features, but it becomes a bear to deal with if all you want is normal
indexing, or even the ability to easily loop over the dataset.
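(A small example of that row/column duality, with a made-up dtype:)

```python
import numpy as np

a = np.array([(1, 2.5), (3, 4.0)], dtype=[('x', 'i4'), ('y', 'f8')])

a['y']         # a whole column
a[0]           # a whole row (record)
a[0]['y']      # one field of one row
a['y'][0]      # the same element, column-first
for row in a:  # plain row-by-row iteration
    pass
```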

Cheers!
Ben Root

On Mon, Jan 29, 2018 at 3:24 PM,  wrote:

>
>
> On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt wrote:
>
>> On Mon, 29 Jan 2018 14:10:56 -0500, josef.p...@gmail.com wrote:
>>
>>> Given that there is pandas, xarray, dask and more, numpy could as well
>>> drop
>>> any pretense of supporting dataframe_likes. Or, adjust the recfunctions
>>> so
>>> we can still work dataframe_like with structured
>>> dtypes/recarrays/recfunctions.
>>>
>>
>> I haven't been following the duckarray discussion carefully, but could
>> this be an opportunity for a dataframe protocol, so that we can have
>> libraries ingest structured arrays, record arrays, pandas dataframes,
>> etc. without too much specialized code?
>>
>
> AFAIU while not being in the data handling area, pandas defines the
> interface and other libraries provide pandas compatible interfaces or
> implementations.
>
> statsmodels currently still has recarray support and usage. In some
> interfaces we support pandas, recarrays and plain arrays, or anything where
> asarray works correctly.
>
> But recarrays became messy to support, one rewrite of some functions last
> year converts recarrays to pandas, does the manipulation and then converts
> back to recarrays.
> Also we need to adjust our recarray usage with new numpy versions. But
> there is no real benefit because I doubt that statsmodels still has any
> recarray/structured dtype users. So, we only have to remove our own uses in
> the datasets and unit tests.
>
> Josef
>
>
>
>>
>> Stéfan
>>


Re: [Numpy-discussion] NumPy 1.14.0 release

2018-01-13 Thread Benjamin Root
I assume you mean 1.14.0, rather than 1.4.0?

Did recarrays change? I didn't see anything in the release notes.

On Sat, Jan 13, 2018 at 4:25 PM,  wrote:

> statsmodels does not work with numpy 1.4.0
>
> Besides the missing WarningsManager there seems to be 22 errors or
> failures from changes in numpy behavior, mainly from recarrays again.
>
> Josef


Re: [Numpy-discussion] NumPy default citation

2017-09-05 Thread Benjamin Root
There was discussion a while back of adopting a `__citation__` attribute.
Anyone remember what happened with that idea?

On Tue, Sep 5, 2017 at 5:21 PM, Feng Yu  wrote:

> str(numpy.version.citation) and numpy.version.citation.to_bibtex()?
>
> On Tue, Sep 5, 2017 at 2:15 PM, Paul Hobson  wrote:
> > Just a thought that popped into my head:
> > It'd be cool if the sci/py/data stack had a convention of
> > .citation so I could look it up w/o leaving my jupyter notebook
> :)
> >
> > -paul
> >
> > On Tue, Sep 5, 2017 at 1:29 PM, Stefan van der Walt <
> stef...@berkeley.edu>
> > wrote:
> >>
> >> On Tue, Sep 5, 2017, at 13:25, Charles R Harris wrote:
> >>
> >>
> >> On Tue, Sep 5, 2017 at 12:36 PM, Stefan van der Walt
> >>  wrote:
> >>
> >> Shall we add a citation to Travis's "Guide to NumPy (2nd ed.)" on both
> >>
> >>
> >> What is the citation for?
> >>
> >>
> >> It's the suggested reference to add to your paper, if you use the NumPy
> >> package in your work.
> >>
> >> Stéfan
> >>
> >>


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-01 Thread Benjamin Root
Just a heads-up. There is now a sphinx-gallery plugin. Matplotlib and a few
other projects have migrated their docs over to use it.

https://sphinx-gallery.readthedocs.io/en/latest/

Cheers!
Ben Root


On Sat, Jul 1, 2017 at 7:12 AM, Ralf Gommers  wrote:

>
>
> On Fri, Jun 30, 2017 at 6:50 AM, Pauli Virtanen  wrote:
>
>> Charles R Harris kirjoitti 29.06.2017 klo 20:45:
>> > Here's a random idea: how about building a NumPy gallery?
>> > scikit-{image,learn} has it, and while those projects may have more
>> > visual datasets, I can imagine something along the lines of Nicolas
>> > Rougier's beautiful book:
>> >
>> > http://www.labri.fr/perso/nrougier/from-python-to-numpy/
>> > 
>> >
>> >
>> > So that would be added in the numpy/numpy.org repo?
>>
>> Or https://scipy-cookbook.readthedocs.io/  ?
>> (maybe minus bitrot and images added :)
>
>
> I'd like the numpy.org one. numpy.org is now incredibly sparse and ugly,
> a gallery would make it look a lot better.
>
> Another idea, from the "deprecate np.matrix" discussion: add numpy
> documentation describing the preferred way to handle matrices, extolling
> the virtues of @, and move np.matrix documentation to a deprecated section.
>
> Ralf
>
>
>


Re: [Numpy-discussion] Boolean binary '-' operator

2017-06-27 Thread Benjamin Root
Forgive my ignorance, but what is "Z/2"?
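(For context: Z/2 is the integers modulo 2, i.e. arithmetic on {0, 1}, where
addition behaves like xor and multiplication like and; a quick check:)

```python
for a in (0, 1):
    for b in (0, 1):
        assert (a + b) % 2 == a ^ b   # '+' in Z/2 is xor
        assert (a * b) % 2 == a & b   # '*' in Z/2 is and
```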

On Tue, Jun 27, 2017 at 5:35 PM, Nathaniel Smith  wrote:

> On Jun 26, 2017 6:56 PM, "Charles R Harris" 
> wrote:
>
>
>> On 27 Jun 2017, 9:25 AM +1000, Nathaniel Smith , wrote:
>>
> I guess my preference would be:
>> 1) deprecate +
>> 2) move binary - back to deprecated-but-not-an-error
>> 3) fix np.diff to use logical_xor when the inputs are boolean, since
>> that seems to be what people expect
>> 4) keep unary - as an error
>>
>> And if we want to be less aggressive, then a reasonable alternative would
>> be:
>> 1) deprecate +
>> 2) un-deprecate binary -
>> 3) keep unary - as an error
>>
>>
> Using '+' for 'or' and '*' for 'and' is pretty common and the variation of
> '+' for 'xor' was common back in the day because 'and' and 'xor' make
> boolean algebra a ring, which appealed to mathematicians as opposed to
> everyone else ;)
>
>
> '+' for 'xor' and '*' for 'and' is perfectly natural; that's just + and *
> in Z/2. It's not only a ring, it's a field! '+' for 'or' is much weirder;
> why would you use '+' for an operation that's not even invertible? I guess
> it's a semi-ring. But we have the '|' character right there; there's no
> expectation that every weird mathematical notation will be matched in
> numpy... The most notable is that '*' doesn't mean matrix multiplication.
>
>
> You can see the same progression in measure theory where eventually
> intersection and xor (symmetric difference) was replaced with union and
> complement. Using '-' for xor is something I hadn't seen outside of numpy,
> but I suspect it must be standard somewhere.  I would leave '*' and '+'
> alone, as the breakage and inconvenience from removing them would be
> significant.
>
>
> '*' doesn't bother me, because it really does have only one sensible
> behavior; even built-in bool() effectively uses 'and' for '*'.
>
> But, now I remember... The major issue here is that some people want
> dot(a, b) on Boolean matrices to use these semantics, right? Because in
> this particular case it leads to some useful connections to the matrix
> representation for logical relations [1]. So it's sort of similar to the
> diff() case. For the basic operation, using '|' or '^' is fine, but there
> are these derived operations like 'dot' and 'diff' where people have
> different expectations.
>
> I guess Juan's example of 'sum' is relevant here too. It's pretty weird
> that if 'a' and 'b' are one-dimensional boolean arrays, 'a @ b' and 'sum(a
> * b)' give totally different results.
>
> So that's the fundamental problem: there are a ton of possible conventions
> that are each appealing in one narrow context, and they all contradict each
> other, so trying to shove them all into numpy simultaneously is messy.
>
> I'm glad we at least seem to have succeeded in getting rid of unary '-',
> that one was particularly indefensible in the context of everything else
> :-). For the rest, I'm really not sure whether it's better to deprecate
> everything and tell people to use specialized tools for specialized
> purposes (e.g. add a 'logical_dot'), or to special case the high-level
> operations people want (make 'dot' and 'diff' continue to work, but
> deprecate + and -), or just leave the whole incoherent mish-mash alone.
>
> -n
>
> [1] https://en.wikipedia.org/wiki/Logical_matrix
>


Re: [Numpy-discussion] UC Berkeley hiring developers to work on NumPy

2017-05-15 Thread Benjamin Root
Great news, Nathaniel! It was a huge boost to matplotlib a couple of years
ago when we got an FTE, even if it was just for a few months. While that
effort didn't directly produce any new features, we were able to overhaul
some very old parts of the codebase. Probably why the effort was so
successful was that, 1) Michael had a clear idea of what needed work and
how to achieve it and 2) the components impacted were mostly not
user-facing.

With respect to off-list conversations, one thing that the matplotlib devs
have done is set up a weekly Google Hangouts session. A summary of that
meeting is then posted to the mailing list. A practice like that (posting
summaries of regular meetings) might be sufficient to feed off-line
discussions back to the greater community.


Cheers!
Ben Root


On Mon, May 15, 2017 at 4:43 AM, Matthew Brett 
wrote:

> Hi,
>
> On Sun, May 14, 2017 at 10:56 PM, Charles R Harris
>  wrote:
> >
> >
> > On Sat, May 13, 2017 at 11:45 PM, Nathaniel Smith  wrote:
> >>
> >> Hi all,
> >>
> >> As some of you know, I've been working for... quite some time now to
> >> try to secure funding for NumPy. So I'm excited that I can now
> >> officially announce that BIDS [1] is planning to hire several folks
> >> specifically to work on NumPy. These will full time positions at UC
> >> Berkeley, postdoc or staff, with probably 2 year (initial) contracts,
> >> and the general goal will be to work on some of the major priorities
> >> we identified at the last dev meeting: more flexible dtypes, better
> >> interoperation with other array libraries, paying down technical debt,
> >> and so forth. Though I'm sure the details will change as we start to
> >> dig into things and engage with the community.
> >>
> >> More details soon; universities move slowly, so nothing's going to
> >> happen immediately. But this is definitely happening and I wanted to
> >> get something out publicly before the conference season starts – so if
> >> you're someone who might be interested in coming to work with me and
> >> the other awesome folks at BIDS, then this is a heads-up: drop me a
> >> line and we can chat! I'll be at PyCon next week if anyone happens to
> >> be there. And feel free to spread the word.
> >
> >
> > Excellent news. Do you have any sort of timeline in mind?
> >
> > It will be interesting to see what changes this leads to, both in the
> code
> > and in the project sociology.
>
> I was thinking the same thing - if this does come about, it would
> likely have a big impact on practical governance.  It could also mean
> that more important development conversations happen off-list.   It
> seems to me it would be good to plan for this consciously.
>
> Cheers,
>
> Matthew


Re: [Numpy-discussion] How to concatenate 2 recarrays into a single recarray

2017-05-11 Thread Benjamin Root
Check numpy.lib.recfunctions. I know there is merge_arrays() and
stack_arrays(); I forget which one does what.
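(A quick sketch with made-up field names -- merge_arrays() is the
column-wise one for this question; stack_arrays() is the row-wise
counterpart that appends records:)

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.zeros(10, dtype=[('a1', 'f8'), ('a2', 'i4')])
b = np.zeros(10, dtype=[('b1', 'f8'), ('b2', 'i4')])

merged = rfn.merge_arrays((a, b), flatten=True, asrecarray=True)
# merged.dtype.names == ('a1', 'a2', 'b1', 'b2'); len(merged) == 10
```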

Cheers!
Ben Root


On Thu, May 11, 2017 at 1:51 PM, Isaac Gerg  wrote:

> I'd prefer to stay in numpy land if possible.
>
> On Thu, May 11, 2017 at 1:17 PM, Isaac Xin Pei  wrote:
>
>> Check Pandas pd.concate ?
>> On Thu, May 11, 2017 at 12:45 PM Isaac Gerg 
>> wrote:
>>
>>> I have 2 arrays, a and b which are rec arrays of length 10.  Each array
>>> has 5 columns.
>>>
>>> I would like to combine all the columns into a single recarray with 10
>>> columns and length 10.
>>>
>>> Thanks,
>>> Isaac