Re: [Numpy-discussion] reducing effort spent on wheel builds?

2021-07-16 Thread Chris Barker
Just a note on:

> For the record, I am +1 on removing sdists from PyPI until pip changes
> its default to --only-binary :all: [1]

I agree that the defaults for pip are unfortunate -- and indeed a legacy of
pip doing, well, a lot (i.e. building and installing and package managing
and dependencies, and ...) with one interface.
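
(For anyone unfamiliar, that flag looks like this on the command line -- a
minimal sketch, with numpy as the example package:

    pip install --only-binary :all: numpy

i.e. pip would error out if no wheel is available, rather than silently
falling back to building from the sdist.)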

However, there's a long tradition of sdists on PyPI -- and PyPI is used,
for the most part, as the source of sdists for other systems (conda-forge,
for example). I did just check, and numpy is an exception -- its recipe
points to GitHub:

source:
  url: https://github.com/numpy/numpy/releases/download/v{{ version }}/numpy-{{ version }}.tar.gz

But others may be counting on sdists on PyPI.

Also, an sdist is not always the same as a GitHub release -- there is some
"magic" in building it -- it's not just a copy of the repo. Again, numpy
may be building its releases as an sdist (or it just doesn't matter), but
it's something to keep in mind.

Another thought is to only support platforms that have a
committed maintainer -- I think that's how Python itself does it: the more
obscure platforms are only supported if someone steps up to support them (I
suppose that's technically true for all platforms, but it's not hard to find
someone on the existing core dev team to support the major ones). This can
be a bit tricky, as the users of a platform may not have the skills to
maintain the builds, but it seems fair enough to only support platforms
that someone cares enough about to do the work.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Best fit linear piecewise function?

2021-06-07 Thread Chris Barker
On Thu, Jun 3, 2021 at 4:43 AM Klaus Zimmermann 
wrote:

> if you are interested in the 1d problem, you might also consider a
> spline fit of order 1, for example with scipy.interpolate, see [1].
>

Hmm, yes, that should work -- I guess it didn't dawn on me because all the
examples are higher order, but I'll check it out.
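
Something like this, I presume (an untested sketch with made-up sample
data; k=1 requests a piecewise-linear spline, and s controls the smoothing,
and hence the number of knots):

    import numpy as np
    from scipy import interpolate

    x = np.linspace(0, 10, 50)
    y = np.abs(x - 4) + np.random.normal(scale=0.1, size=x.size)

    # k=1 => piecewise linear; s > 0 lets splrep choose fewer knots
    tck = interpolate.splrep(x, y, k=1, s=1.0)
    y_fit = interpolate.splev(x, tck)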

-CHB



> Cheers
> Klaus
>
>
> [1]
>
> https://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html#spline-interpolation-in-1-d-procedural-interpolate-splxxx
>
> On 03/06/2021 13:12, Mark Bakker wrote:
> > My students are using this and seem to like it:
> >
> > https://jekel.me/piecewise_linear_fit_py/about.html
> >
> >
> > Date: Tue, 1 Jun 2021 17:22:52 -0700
> > From: Chris Barker <chris.bar...@noaa.gov>
> > To: Discussion of Numerical Python <numpy-discussion@python.org>
> > Subject: [Numpy-discussion] Best fit linear piecewise function?
> > Message-ID: <i58wz0dl_292gvntmfrte88ik98h2nqt1...@mail.gmail.com>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Do any of you know of code for finding an optimum linear piecewise
> > fit to a
> > set of points?
> >
> > Something like what is described in this article:
> >
> > https://www.hindawi.com/journals/mpe/2015/876862/
> >
> > At a glance, that looked just hard enough to code up that I'm hoping
> > someone has already done it :-)
> >
> > -CHB
> >
> >
> > --
> >
> > Christopher Barker, Ph.D.
> > Oceanographer
> >
> > Emergency Response Division
> > NOAA/NOS/OR(206) 526-6959   voice
> > 7600 Sand Point Way NE   (206) 526-6329   fax
> > Seattle, WA  98115   (206) 526-6317   main reception
> >
> > chris.bar...@noaa.gov <mailto:chris.bar...@noaa.gov>
> >
> >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Best fit linear piecewise function?

2021-06-07 Thread Chris Barker
On Thu, Jun 3, 2021 at 4:13 AM Mark Bakker  wrote:

> My students are using this and seem to like it:
>
> https://jekel.me/piecewise_linear_fit_py/about.html
>

Thanks -- that looks perfect!
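
For the archives, the basic usage looks to be something like this (a sketch
from a quick read of those docs -- I haven't run it yet):

    import numpy as np
    import pwlf

    x = np.linspace(0, 10, 50)
    y = np.abs(x - 4)                  # made-up data with one breakpoint

    model = pwlf.PiecewiseLinFit(x, y)
    breaks = model.fit(2)              # fit 2 segments; returns breakpoints
    y_hat = model.predict(np.linspace(0, 10, 200))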

-CHB



>
>
>> Date: Tue, 1 Jun 2021 17:22:52 -0700
>> From: Chris Barker 
>> To: Discussion of Numerical Python 
>> Subject: [Numpy-discussion] Best fit linear piecewise function?
>> Message-ID: <i58wz0dl_292gvntmfrte88ik98h2nqt1...@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Do any of you know of code for finding an optimum linear piecewise fit to
>> a
>> set of points?
>>
>> Something like what is described in this article:
>>
>> https://www.hindawi.com/journals/mpe/2015/876862/
>>
>> At a glance, that looked just hard enough to code up that I'm hoping
>> someone has already done it :-)
>>
>> -CHB
>>
>>
>> --
>>
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR(206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115   (206) 526-6317   main reception
>>
>> chris.bar...@noaa.gov
>>
>>
>> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Best fit linear piecewise function?

2021-06-02 Thread Chris Barker
On Wed, Jun 2, 2021 at 1:50 AM Vincent Schut  wrote:

> Multivariate Adaptive Regression Splines might fit the bill? Implemented
> for python as py-earth: https://github.com/scikit-learn-contrib/py-earth.
>

That looks massively more complex than I was thinking for my use case. And
it seems to be a fit, rather than exactly matching a subset of points. But
as I think about it, for my use case, it might actually work well.

I'll give it a try -- thanks!
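
From a quick read of the README, basic usage would be something like this
(an untested sketch; max_degree=1 keeps it to piecewise-linear hinge terms):

    import numpy as np
    from pyearth import Earth

    X = np.linspace(0, 10, 100).reshape(-1, 1)   # MARS expects 2-D X
    y = np.abs(X.ravel() - 4)                    # made-up data

    model = Earth(max_degree=1)
    model.fit(X, y)
    y_hat = model.predict(X)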

-CHB




>
> On 6/2/21 2:22 AM, Chris Barker wrote:
>
> Do any of you know of code for finding an optimum linear piecewise fit to
> a set of points?
>
> Something like what is described in this article:
>
> https://www.hindawi.com/journals/mpe/2015/876862/
>
> At a glance, that looked just hard enough to code up that I'm hoping
> someone has already done it :-)
>
>
> -CHB
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>
> ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
> --
>
> Vincent Schut
>
> Remote Sensing Software Engineer
>
> +31 302272679 ~ Maliebaan 22 | 3581CP | Utrecht | Netherlands
> Linkedin <https://www.linkedin.com/company/satelligence/> ~
> satelligence.com <http://www.satelligence.com>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Best fit linear piecewise function?

2021-06-01 Thread Chris Barker
Do any of you know of code for finding an optimum linear piecewise fit to a
set of points?

Something like what is described in this article:

https://www.hindawi.com/journals/mpe/2015/876862/

At a glance, that looked just hard enough to code up that I'm hoping
someone has already done it :-)

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not?

2020-04-08 Thread Chris Barker
On Wed, Apr 8, 2020 at 1:17 PM Sebastian Berg 
wrote:

> > > > But, backward compatibility aside, could we have ONLY Scalars?
> > > Well, it is hard to write functions that work on N-Dimensions
> > > (where N
> > > can be 0), if the 0-D array does not exist.
>


> So as a (silly) example, the following does not generalize to 0d, even
> though it should:
>
> def weird_normalize_by_trace_inplace(stacked_matrices):
>     """Divides matrices by their trace but retains sign
>     (works in-place, and thus e.g. not for integer arrays)
>
>     Parameters
>     ----------
>     stacked_matrices : (..., N, M) ndarray
>     """
>     assert stacked_matrices.shape[-1] == stacked_matrices.shape[-2]
>
>     trace = np.trace(stacked_matrices, axis1=-2, axis2=-1)
>     trace[trace < 0] *= -1
>     stacked_matrices /= trace
>
> Sure that function does not make sense and you could rewrite it, but
> the fact is that in that function you want to conditionally modify
> trace in-place, but trace can be 0d and the "conditional" modification
> breaks down.
>

I guess that's what I'm getting at -- there is always an endpoint to
reducing the rank. A function that's designed to work on a "stack" of
something doesn't have to work on a single something, when it can, instead,
work on a "stack" of height one.

Isn't the trace of a matrix always a scalar? And thus the trace(s) of a
stack of matrices would always be 1-D?

So that function should do something like:

stacked_matrices.shape = (-1, M, M)

yes?

and then it would always work.
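
i.e. something like this (an untested sketch, reusing the names from
Sebastian's example; note the trace also needs its dimensions expanded to
broadcast against the stack):

    stacked_matrices = stacked_matrices.reshape(-1, M, M)
    trace = np.trace(stacked_matrices, axis1=-2, axis2=-1)  # always 1-d now
    trace[trace < 0] *= -1                # the conditional write now works
    stacked_matrices /= trace[:, None, None]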

Again, backwards compatibility, but there is a reason the np.atleast_*()
functions exist -- you often need to make sure your inputs have the
dimensionality expected.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not?

2020-04-08 Thread Chris Barker
Sorry to have fallen off the numpy grid for a bit, but:

On Mon, Mar 23, 2020 at 1:37 PM Sebastian Berg 
wrote:

> On Mon, 2020-03-23 at 11:45 -0700, Chris Barker wrote:
> > But, backward compatibility aside, could we have ONLY Scalars?
>


> Well, it is hard to write functions that work on N-Dimensions (where N
> can be 0), if the 0-D array does not exist. You can get away with
> scalars in most cases, because they pretend to be arrays in most cases
> (aside from mutability).
>


> But I am pretty sure we have a bunch of cases that need
> `res = np.asarray(res)` simply because `res` is N-D but could then be
> silently converted to a scalar. E.g. see
> https://github.com/numpy/numpy/issues/13105 for an issue about this
> (although it does not actually list any specific problems).
>

I'm not sure this is unsolvable (again, backwards compatibility aside) --
after all, one of the key issues is that it's undetermined what the rank of
array(a_scalar) should be -- 0-d is the only unambiguous answer, but then
it's not really an array in the usual sense anyway. So in theory, we could
disallow that conversion without specifying a rank.
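
e.g. as it stands now:

    >>> import numpy as np
    >>> np.asarray(3.0).ndim    # a scalar silently becomes a 0-d array
    0
    >>> np.asarray(3.0).shape
    ()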

At the end of the day, there has to be some endpoint on how far you can
reduce the rank of an array and have it work -- why not have 1 be the lower
limit?

-CHB







> - Sebastian
>
>
> > There is certainly a need for more numpy-like scalars: more than the
> > built
> > in data types, and some handy attributes and methods, like dtype,
> > .itemsize, etc. But could we make an enhanced scalar that had
> > everything we
> > actually need from a zero-d array?
> >
> > The key point would be mutability -- but do we really need mutable
> > scalars?
> > I can't think of any time I've needed that, when I couldn't have used
> > a 1-d
> > array of length 1.
> >
> > Is there a use case for zero-d arrays that could not be met with an
> > enhanced scalar?
> >
> > -CHB
> >
> >
> >
> >
> >
> >
> >
> > On Mon, Feb 24, 2020 at 12:30 PM Allan Haldane <
> > allanhald...@gmail.com>
> > wrote:
> >
> > > I have some thoughts on scalars from playing with ndarray ducktypes
> > > (__array_function__), eg a MaskedArray ndarray-ducktype, for which
> > > I
> > > wanted an associated "MaskedScalar" type.
> > >
> > > In summary, the ways scalars currently work makes ducktyping
> > > (duck-scalars) difficult:
> > >
> > >   * numpy scalar types are not subclassable, so my duck-scalars
> > > aren't
> > > subclasses of numpy scalars and aren't in the type hierarchy
> > >   * even if scalars were subclassable, I would have to subclass
> > > each
> > > scalar datatype individually to make masked versions
> > >   * lots of code checks `isinstance(var, np.float64)`, which breaks
> > > for my duck-scalars
> > >   * it was difficult to distinguish between a duck-scalar and a
> > > duck-0d
> > > array. The method I used in the end seems hacky.
> > >
> > > This has led to some daydreams about how scalars should work, and
> > > also
> > > led me last to read through your NEPs 40/41 with specific focus on
> > > what
> > > you said about scalars, and was about to post there until I saw
> > > this
> > > discussion. I agree with what you said in the NEPs about not making
> > > scalars be dtype instances.
> > >
> > > Here is what ducktypes led me to:
> > >
> > > If we are able to do something like define a `np.numpy_scalar` type
> > > covering all numpy scalars, which has a `.dtype` attribute like you
> > > describe in the NEPs, then that would seem to solve the ducktype
> > > problems above. Ducktype implementors would need to make a "duck-
> > > scalar"
> > > type in parallel to their "duck-ndarray" type, but I found that to
> > > be
> > > pretty easy using an abstract class in my MaskedArray ducktype,
> > > since
> > > the MaskedArray and MaskedScalar share a lot of behavior.
> > >
> > > A numpy_scalar type would also help solve some object-array
> > > problems if
> > > the object scalars are wrapped in the np_scalar type. A long time
> > > ago I
> > > started to try to fix up various funny/strange behaviors of object
> > > datatypes, but there are lots of special cases, and the main
> > > problem was
> > > that the returned objects (eg from indexing) were not numpy types
> > >

Re: [Numpy-discussion] Good use of __dunder__ methods in numpy

2020-03-23 Thread Chris Barker
On Thu, Mar 5, 2020 at 2:15 PM Gregory Lee  wrote:

>> If i can get a link to a file that shows how dunder methods help with
>> having cool coding APIs that would be great!
>>
>>
> You may want to take a look at PEP 465 as an example, then. If I recall
> correctly, the __matmul__ method described in it was added to the standard
> library largely with NumPy in mind.
> https://www.python.org/dev/peps/pep-0465/
>

and so were "rich comparisons", and in-place operators (at least in part).

numpy is VERY, VERY heavily built on the concept of overloading operators,
i.e. using dunders or magic methods.

I'm going to venture a guess that numpy arrays custom define every single
standard dunder -- and certainly most of them.
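
e.g. just from PEP 465 (a trivial example):

    import numpy as np

    a = np.arange(6).reshape(2, 3)
    b = np.arange(6).reshape(3, 2)

    print(a @ b)              # the @ operator dispatches to a.__matmul__(b)
    print(a.__matmul__(b))    # same thing, spelled out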

-CHB





> On Thu, Mar 5, 2020 at 10:32 PM Sebastian Berg 
>> wrote:
>>
>>> Hi,
>>>
>>> On Thu, 2020-03-05 at 11:14 +0400, Abdur-Rahmaan Janhangeer wrote:
>>> > Greetings list,
>>> >
>>> > I have a talk about dunder methods in Python
>>> >
>>> > (
>>> >
>>> https://conference.mscc.mu/speaker/67604187-57c3-4be6-987c-ea4bef388ad3
>>> > )
>>> >
>>> > and it would be nice to include Numpy in the mix. Can someone point
>>> > me to one or two use cases / file link where dunder methods help
>>> > numpy?
>>> >
>>>
>>> I am not sure in what sense you are looking for. NumPy has its own set
>>> of dunder methods (some of which should not be used super much
>>> probably), like `__array__`, `__array_interface__`, `__array_ufunc__`,
>>> `__array_function__`, `__array_finalize__`, ...
>>> So we are using `__array_*__` for numpy related dunders.
>>>
>>> Of course we use most Python defined dunders, but I am not sure that
>>> you are looking for that?
>>>
>>> Best,
>>>
>>> Sebastian
>>>
>>>
>>> > Thanks
>>> >
>>> > fun info: i am a tiny numpy contributor with a one line merge.
>>> > ___
>>> > NumPy-Discussion mailing list
>>> > NumPy-Discussion@python.org
>>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not?

2020-03-23 Thread Chris Barker
I've always found the duality of zero-d arrays and scalars confusing, and
I'm sure I'm not alone.

Having both is just plain weird.

But, backward compatibility aside, could we have ONLY Scalars?

When we index into an array, the dimensionality is reduced by one, so
indexing into a 1D array has to get us something: but the zero-d array is a
really weird object -- do we really need it?

There is certainly a need for more numpy-like scalars: more than the built
in data types, and some handy attributes and methods, like dtype,
.itemsize, etc. But could we make an enhanced scalar that had everything we
actually need from a zero-d array?

The key point would be mutability -- but do we really need mutable scalars?
I can't think of any time I've needed that, when I couldn't have used a 1-d
array of length 1.
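
To make the distinction concrete (a quick sketch of the asymmetry):

    import numpy as np

    s = np.float64(1.0)    # a numpy scalar: immutable, hashable
    z = np.array(1.0)      # a 0-d array: mutable, NOT hashable

    z[...] = 2.0           # fine -- in-place assignment on the 0-d array
    hash(s)                # fine -- scalars can be dict keys
    # s[...] = 2.0         # TypeError: scalars don't support item assignment
    # hash(z)              # TypeError: arrays are unhashable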

Is there a use case for zero-d arrays that could not be met with an
enhanced scalar?

-CHB







On Mon, Feb 24, 2020 at 12:30 PM Allan Haldane 
wrote:

> I have some thoughts on scalars from playing with ndarray ducktypes
> (__array_function__), eg a MaskedArray ndarray-ducktype, for which I
> wanted an associated "MaskedScalar" type.
>
> In summary, the ways scalars currently work makes ducktyping
> (duck-scalars) difficult:
>
>   * numpy scalar types are not subclassable, so my duck-scalars aren't
> subclasses of numpy scalars and aren't in the type hierarchy
>   * even if scalars were subclassable, I would have to subclass each
> scalar datatype individually to make masked versions
>   * lots of code checks `isinstance(var, np.float64)` which breaks
> for my duck-scalars
>   * it was difficult to distinguish between a duck-scalar and a duck-0d
> array. The method I used in the end seems hacky.
>
> This has led to some daydreams about how scalars should work, and also
> led me last to read through your NEPs 40/41 with specific focus on what
> you said about scalars, and was about to post there until I saw this
> discussion. I agree with what you said in the NEPs about not making
> scalars be dtype instances.
>
> Here is what ducktypes led me to:
>
> If we are able to do something like define a `np.numpy_scalar` type
> covering all numpy scalars, which has a `.dtype` attribute like you
> describe in the NEPs, then that would seem to solve the ducktype
> problems above. Ducktype implementors would need to make a "duck-scalar"
> type in parallel to their "duck-ndarray" type, but I found that to be
> pretty easy using an abstract class in my MaskedArray ducktype, since
> the MaskedArray and MaskedScalar share a lot of behavior.
>
> A numpy_scalar type would also help solve some object-array problems if
> the object scalars are wrapped in the np_scalar type. A long time ago I
> started to try to fix up various funny/strange behaviors of object
> datatypes, but there are lots of special cases, and the main problem was
> that the returned objects (eg from indexing) were not numpy types and
> did not support numpy attributes or indexing. Wrapping the returned
> object in `np.numpy_scalar` might add an extra slight annoyance to
> people who want to unwrap the object, but I think it would make object
> arrays less buggy and make code using object arrays easier to reason
> about and debug.
>
> Finally, a few random votes/comments based on the other emails on the list:
>
> I think scalars have a place in numpy (rather than just reusing 0d
> arrays), since there is a clear use in having hashable, immutable
> scalars. Structured scalars should probably be immutable.
>
> I agree with your suggestion that scalars should not be indexable. Thus,
> my duck-scalars (and proposed numpy_scalar) would not be indexable.
> However, I think they should encode their datatype though a .dtype
> attribute like ndarrays, rather than by inheritance.
>
> Also, something to think about is that currently numpy scalars satisfy
> the property `isinstance(np.float64(1), float)`, i.e they are within the
> python numerical type hierarchy. 0d arrays do not have this property. My
> proposal above would break this. I'm not sure what to think about
> whether this is a good property to maintain or not.
>
> Cheers,
> Allan
>
>
>
> On 2/21/20 8:37 PM, Sebastian Berg wrote:
> > Hi all,
> >
> > When we create new datatypes, we have the option to make new choices
> > for the new datatypes [0] (not the existing ones).
> >
> > The question is: Should every NumPy datatype have a scalar associated
> > and should operations like indexing return a scalar or a 0-D array?
> >
> > This is in my opinion a complex, almost philosophical, question, and we
> > do not have to settle anything for a long time. But, if we do not
> > decide a direction before we have many new datatypes the decision will
> > make itself...
> > So happy about any ideas, even if its just a gut feeling :).
> >
> > There are various points. I would like to mostly ignore the technical
> > ones, but I am listing them anyway here:
> >
> >   * Scalars are faster (although that 

Re: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation

2019-09-17 Thread Chris Barker
On Tue, Sep 17, 2019 at 6:56 AM Peter Andreas Entschev 
wrote:

> I agree with your point and understand how the current text may be
> misleading, so we shall make it clearer in the NEP (as done in
> https://github.com/numpy/numpy/pull/14529) that both are valid ways:
>
> * Have a genuine implementation of __array__ (like Dask, as pointed
> out by Stephan); or
> * Raise an exception (as CuPy does).
>

Great -- sounds like we're all (well, three of us anyway) on the same
page.

Just need to sort out the text.

-CHB




>
> Thanks for opening the PR, I will comment there as well.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation

2019-09-16 Thread Chris Barker
Here's a PR with a different discussion of __array__:

https://github.com/numpy/numpy/pull/14529

-CHB


On Mon, Sep 16, 2019 at 3:23 PM Chris Barker  wrote:

> OK -- I *finally* got it:
>
> when you pass an arbitrary object into np.asarray(), it will create an
> object array scalar with the object in it.
>
> So yes, I can see that you may want to raise a TypeError instead, so that
> users don't get an object array scalar when they were expecting to get an
> array-like object.
>
> So it's probably a good idea to recommend that when a class implements
> __duckarray__ that it also implements __array__, which can either raise an
> exception or return an ndarray.
>
> -CHB
>
>
> On Mon, Sep 16, 2019 at 3:11 PM Chris Barker 
> wrote:
>
>> On Mon, Sep 16, 2019 at 2:27 PM Stephan Hoyer  wrote:
>>
>>> On Mon, Sep 16, 2019 at 1:45 PM Peter Andreas Entschev <
>>> pe...@entschev.com> wrote:
>>>
>>>> What would be the use case for a duck-array to implement __array__ and
>>>> return a NumPy array?
>>>
>>>
>>
>>> Dask arrays are a good example. They will want to implement
>>> __duck_array__ (or whatever we call it) because they support duck typed
>>> versions of NumPy operations. They also (already) implement __array__, so
>>> they can be converted into NumPy arrays as a fallback. This is convenient for
>>> moderately sized dask arrays, e.g., so you can pass one into a matplotlib
>>> function.
>>>
>>
>> Exactly.
>>
>> And I have implemented __array__ in classes that are NOT duck arrays at
>> all (an image class, for instance). But I also can see wanting to support
>> both:
>>
>> use me as a duck array
>> and
>> convert me into a proper numpy array.
>>
>> OK -- looking again at the NEP, I see this suggested implementation:
>>
>> def duckarray(array_like):
>>     if hasattr(array_like, '__duckarray__'):
>>         return array_like.__duckarray__()
>>     return np.asarray(array_like)
>>
>> So I see the point now: if a user wants a duck array -- they may not want
>> to accidentally coerce this object to a real array (potentially expensive).
>>
>> But in this case, asarray() will only get called (and thus __array__ will
>> only get called) if __duckarray__ is not implemented. So the only reason
>> to implement __array__ and raise an Exception is so that users will get
>> that exception if they specifically call asarray() -- why should they get
>> that??
>>
>> I'm working on a PR with suggestion for this.
>>
>> -CHB
>>
>> --
>>
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR(206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115   (206) 526-6317   main reception
>>
>> chris.bar...@noaa.gov
>>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation

2019-09-16 Thread Chris Barker
OK -- I *finally* got it:

when you pass an arbitrary object into np.asarray(), it will create an
object array scalar with the object in it.

So yes, I can see that you may want to raise a TypeError instead, so that
users don't get an object array scalar when they were expecting to get an
array-like object.

So it's probably a good idea to recommend that when a class implements
__duckarray__ that it also implements __array__, which can either raise an
exception or return an ndarray.

-CHB


On Mon, Sep 16, 2019 at 3:11 PM Chris Barker  wrote:

> On Mon, Sep 16, 2019 at 2:27 PM Stephan Hoyer  wrote:
>
>> On Mon, Sep 16, 2019 at 1:45 PM Peter Andreas Entschev <
>> pe...@entschev.com> wrote:
>>
>>> What would be the use case for a duck-array to implement __array__ and
>>> return a NumPy array?
>>
>>
>
>> Dask arrays are a good example. They will want to implement
>> __duck_array__ (or whatever we call it) because they support duck typed
>> versions of NumPy operations. They also (already) implement __array__, so
>> they can be converted into NumPy arrays as a fallback. This is convenient for
>> moderately sized dask arrays, e.g., so you can pass one into a matplotlib
>> function.
>>
>
> Exactly.
>
> And I have implemented __array__ in classes that are NOT duck arrays at
> all (an image class, for instance). But I also can see wanting to support
> both:
>
> use me as a duck array
> and
> convert me into a proper numpy array.
>
> OK -- looking again at the NEP, I see this suggested implementation:
>
> def duckarray(array_like):
>     if hasattr(array_like, '__duckarray__'):
>         return array_like.__duckarray__()
>     return np.asarray(array_like)
>
> So I see the point now: if a user wants a duck array -- they may not want
> to accidentally coerce this object to a real array (potentially expensive).
>
> But in this case, asarray() will only get called (and thus __array__ will
> only get called) if __duckarray__ is not implemented. So the only reason
> to implement __array__ and raise an Exception is so that users will get
> that exception if they specifically call asarray() -- why should they get
> that??
>
> I'm working on a PR with suggestion for this.
>
> -CHB
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation

2019-09-16 Thread Chris Barker
On Mon, Sep 16, 2019 at 2:27 PM Stephan Hoyer  wrote:

> On Mon, Sep 16, 2019 at 1:45 PM Peter Andreas Entschev 
> wrote:
>
>> What would be the use case for a duck-array to implement __array__ and
>> return a NumPy array?
>
>

> Dask arrays are a good example. They will want to implement __duck_array__
> (or whatever we call it) because they support duck typed versions of NumPy
> operations. They also (already) implement __array__, so they can be converted
> into NumPy arrays as a fallback. This is convenient for moderately sized
> dask arrays, e.g., so you can pass one into a matplotlib function.
>

Exactly.

And I have implemented __array__ in classes that are NOT duck arrays at all
(an image class, for instance). But I also can see wanting to support both:

use me as a duck array
and
convert me into a proper numpy array.

OK -- looking again at the NEP, I see this suggested implementation:

def duckarray(array_like):
    if hasattr(array_like, '__duckarray__'):
        return array_like.__duckarray__()
    return np.asarray(array_like)

So I see the point now: if a user wants a duck array -- they may not want
to accidentally coerce this object to a real array (potentially expensive).

But in this case, asarray() will only get called (and thus __array__ will
only get called) if __duckarray__ is not implemented. So the only reason
to implement __array__ and raise an Exception is so that users will get
that exception if they specifically call asarray() -- why should they get
that??

I'm working on a PR with suggestion for this.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation

2019-09-16 Thread Chris Barker
On Mon, Sep 16, 2019 at 1:46 PM Peter Andreas Entschev 
wrote:

> What would be the use case for a duck-array to implement __array__ and
> return a NumPy array?


some users need a genuine, actual numpy array (for passing to Cython code,
for example).
If __array__ is not implemented, how can they get that from an array-like
object??

Only the author of the array-like object knows how best to make a numpy
array out of it.

> Unless I'm missing something, this seems
> redundant and one should just use array/asarray functions then.


But if the object does not implement __array__, then users can't use the
array/asarray functions!


> This
> would also prevent error-handling, what if the developer intentionally
> wants a NumPy-like array (e.g., the original array passed to the
> duckarray function) or an exception (instead of coercing to a NumPy
> array)?
>

I'm really confused now -- if an end-user wants a duckarray, they should
call duckarray() -- if they want an actual numpy array, they should call
asarray().

Why would anyone want an Exception? If you don't want an array, then don't
call asarray().

If you call duckarray(), and the object has not implemented __duckarray__,
then you will get an exception -- which you should.

If you call __array__(), and __array__ has not been implemented, then you
will get an exception.

What is the potential problem here?

Which makes me think -- why should duck arrays ever implement an __array__
method that raises an Exception? Why not just not implement it? (unless you
want to add some helpful error message -- which I did for the example in my
PR.)
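
i.e. roughly this kind of thing (a purely hypothetical duck array -- the
class name and message are made up, and duckarray()/__duckarray__ are the
NEP 30 proposal, not anything in numpy today):

    import numpy as np

    class MyDuckArray:
        def __duckarray__(self):
            return self

        def __array__(self, dtype=None):
            raise TypeError(
                "MyDuckArray cannot be coerced to an ndarray; "
                "pass it to duckarray() instead"
            )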

(PR to the numpy repo in progress)

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to Capitalize numpy?

2019-09-16 Thread Chris Barker
Thanks Joe, looks like everyone agrees:

In text, NumPy it is.

-CHB



On Mon, Sep 16, 2019 at 2:41 PM Joe Harrington  wrote:

> Here are my thoughts on textual capitalization (at first, I thought you
> wanted to raise money!):
>
> We all agree that in code, it is "numpy".  If you don't use that, it
> throws an error.  If, in text, we keep "numpy" with a forced lower-case
> letter at the start, it is just one more oddball to remember.  It is even
> weirder in titles and the beginnings of sentences.  I'd strongly like not
> to be weird that way.  A few packages are, it's annoying, and it doesn't
> much earn them any goodwill. The default among people who are not "in the
> know" will be to do what they're used to.  Let's give them what they're
> used to, a proper noun with initial (at least) capital.
>
> Likewise, I object to preferring a particular font.  What fonts to use for
> the names of things like software packages is a decision for publications
> to make.  A journal or manual might make fine distinctions and demand
> several different, specific fonts, while a popular publication might prefer
> not to do that.  Leave the typesetting to the editors of the publications.
> We can certainly adopt a standard for our publications (docs, web pages,
> etc.), but we should state explicitly that others can do as they like.
>
> It's not an acronym, so that leaves the options of "Numpy" and "NumPy".
> It would be great, easy to remember, consistent for others, etc., if NumPy
> and SciPy were capitalized the same way and were pronounced the same (I
> still occasionally hear "numpee").  So, I would favor "NumPy" to go along
> with "SciPy", and let the context choose the font.
>
> --jh--
>
>
> On 9/16/19 9:09 PM, Chris Barker wrote:
>
> Trivial note:
>
> On the subject of naming things (spelling things??) -- should it be:
>
> numpy
> or
> Numpy
> or
> NumPy
> ?
>
> All three are in the draft NEP 30 ( mostly "NumPy", I noticed this when
> reading/copy editing the NEP) . Is there an "official" capitalization?
>
> My preference, would be to use "numpy", and where practicable, use a
> "computer" font -- i.e. ``numpy`` in RST.
>
> But if there is consensus already for anything else, that's fine, I'd just
> like to know what it is.
>
> -CHB
>
>
>
> On Mon, Aug 12, 2019 at 4:02 AM Peter Andreas Entschev 
> wrote:
>
>> Apologies for the late reply. I've opened a new PR
>> https://github.com/numpy/numpy/pull/14257 with the changes requested
>> on clarifying the text. After reading the detailed description, I've
>> decided to add a subsection "Scope" to clarify the scope where NEP-30
>> would be useful. I think the inclusion of this new subsection
>> complements the "Detail description" forming a complete text w.r.t.
>> motivation of the NEP, but feel free to point out disagreements with
>> my suggestion. I've also added a new section "Usage" pointing out how
>> one would use duck array in replacement to np.asarray where relevant.
>>
>> Regarding the naming discussion, I must say I like the idea of keeping
>> the __array_ prefix, but it seems like that is going to be difficult
>> given that none of the existing ideas so far play very nicely with
>> that. So if the general consensus is to go with __numpy_like__, I
>> would also update the NEP to reflect that changes. FWIW, I
>> particularly neither like nor dislike __numpy_like__, but I don't have
>> any better suggestions than that or keeping the current naming.
>>
>> Best,
>> Peter
>>
>> On Thu, Aug 8, 2019 at 3:40 AM Stephan Hoyer  wrote:
>> >
>> >
>> >
>> > On Wed, Aug 7, 2019 at 6:18 PM Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>> >>
>> >>
>> >>
>> >> On Wed, Aug 7, 2019 at 7:10 PM Stephan Hoyer  wrote:
>> >>>
>> >>> On Wed, Aug 7, 2019 at 5:11 PM Ralf Gommers 
>> wrote:
>> >>>>
>> >>>>
>> >>>> On Mon, Aug 5, 2019 at 6:18 PM Stephan Hoyer 
>> wrote:
>> >>>>>
>> >>>>> On Mon, Aug 5, 2019 at 2:48 PM Ralf Gommers 
>> wrote:
>> >>>>>
>> >>>>>>
>> >>>>>> The NEP currently does not say who this is meant for. Would you
>> expect libraries like SciPy to adopt it for example?
>> >>>>>>
>> >>>>>> The NEP also (understandably) punts o

Re: [Numpy-discussion] How to Capitalize numpy?

2019-09-16 Thread Chris Barker
got it, thanks.

I've fixed that typo in a PR I'm working on, too.

-CHB


On Mon, Sep 16, 2019 at 2:41 PM Ralf Gommers  wrote:

>
>
> On Mon, Sep 16, 2019 at 1:42 PM Peter Andreas Entschev 
> wrote:
>
>> My answer to that: "NumPy". Reference: logo at the top of
>> https://numpy.org/neps/index.html .
>>
>
> Yes, NumPy is the right capitalization
>
>
>
>> In NEP-30 [1], I've used "NumPy" everywhere, except for references to
>> code, repos, etc., where "numpy" is used. I see there's one occurrence
>> of "Numpy", which was definitely a typo and I had not noticed it until
>> now, but I will address this on a future update, thanks for pointing
>> that out.
>>
>> [1] https://numpy.org/neps/nep-0030-duck-array-protocol.html
>>
>> On Mon, Sep 16, 2019 at 9:09 PM Chris Barker 
>> wrote:
>> >
>> > Trivial note:
>> >
>> > On the subject of naming things (spelling things??) -- should it be:
>> >
>> > numpy
>> > or
>> > Numpy
>> > or
>> > NumPy
>> > ?
>> >
>> > All three are in the draft NEP 30 ( mostly "NumPy", I noticed this when
>> reading/copy editing the NEP) . Is there an "official" capitalization?
>> >
>> > My preference, would be to use "numpy", and where practicable, use a
>> "computer" font -- i.e. ``numpy`` in RST.
>> >
>> > But if there is consensus already for anything else, that's fine, I'd
>> just like to know what it is.
>> >
>> > -CHB
>> >
>> >
>> >
>> > On Mon, Aug 12, 2019 at 4:02 AM Peter Andreas Entschev <
>> pe...@entschev.com> wrote:
>> >>
>> >> Apologies for the late reply. I've opened a new PR
>> >> https://github.com/numpy/numpy/pull/14257 with the changes requested
>> >> on clarifying the text. After reading the detailed description, I've
>> >> decided to add a subsection "Scope" to clarify the scope where NEP-30
>> >> would be useful. I think the inclusion of this new subsection
>> >> complements the "Detail description" forming a complete text w.r.t.
>> >> motivation of the NEP, but feel free to point out disagreements with
>> >> my suggestion. I've also added a new section "Usage" pointing out how
>> >> one would use duck array in replacement to np.asarray where relevant.
>> >>
>> >> Regarding the naming discussion, I must say I like the idea of keeping
>> >> the __array_ prefix, but it seems like that is going to be difficult
>> >> given that none of the existing ideas so far play very nicely with
>> >> that. So if the general consensus is to go with __numpy_like__, I
>> >> would also update the NEP to reflect that changes. FWIW, I
>> >> particularly neither like nor dislike __numpy_like__, but I don't have
>> >> any better suggestions than that or keeping the current naming.
>> >>
>> >> Best,
>> >> Peter
>> >>
>> >> On Thu, Aug 8, 2019 at 3:40 AM Stephan Hoyer  wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Aug 7, 2019 at 6:18 PM Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Wed, Aug 7, 2019 at 7:10 PM Stephan Hoyer 
>> wrote:
>> >> >>>
>> >> >>> On Wed, Aug 7, 2019 at 5:11 PM Ralf Gommers <
>> ralf.gomm...@gmail.com> wrote:
>> >> >>>>
>> >> >>>>
>> >> >>>> On Mon, Aug 5, 2019 at 6:18 PM Stephan Hoyer 
>> wrote:
>> >> >>>>>
>> >> >>>>> On Mon, Aug 5, 2019 at 2:48 PM Ralf Gommers <
>> ralf.gomm...@gmail.com> wrote:
>> >> >>>>>
>> >> >>>>>>
>> >> >>>>>> The NEP currently does not say who this is meant for. Would you
>> expect libraries like SciPy to adopt it for example?
>> >> >>>>>>
>> >> >>>>>> The NEP also (understandably) punts on the question of when
>> something is a valid duck array. If you want this to be widely used, that
>> will need an answer or at least some rough guidance though. For example, we
>> would expect a duck array to have a mean() method, but probably not a ptp()
>> method. A libr

[Numpy-discussion] Add a total_seconds() method to timedelta64?

2019-09-16 Thread Chris Barker
I just noticed that there is no obvious way to convert a timedelta64 to
seconds (or some other easy unit) as a number.

The stdlib datetime.timedelta has a .total_seconds() method for doing that.
I think it's a handy thing to have.

Looking at StackOverflow (and others), I see people suggesting things like:

a_timedelta.astype(np.float) / 1e6

This seems a really bad idea, as it's assuming the timedelta is storing
microseconds.

The "proper" way to do it also suggested:

a_timedelta / np.timedelta64(1, 's')

This is, in fact, a much better way to do it, and allows you to specify
other units if you like: "ms", "us", etc.
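
e.g.:

    import numpy as np

    td = np.timedelta64(90, 's')

    td / np.timedelta64(1, 's')     # 90.0 -- total seconds, as a float
    td / np.timedelta64(1, 'ms')    # 90000.0
    td / np.timedelta64(1, 'm')     # 1.5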

There was semi-recently a discussion thread on python-ideas about adding
other methods to timedelta (e.g. .total_hours, .total_minutes). That was
pretty much rejected (or petered out anyway), and some argued that dividing
by a timedelta of the unit you want is the "right" way to do it anyway
(some argued that .total_seconds() never should have been added).

Personally I understand the "correctness" of dividing by a unit timedelta,
but "practicality beats purity", and the discoverability of a method or two
really makes it easier on folks.

That being said, if folks don't want to add .total_seconds() and the like,
we should add a bit to the docs about this, suggesting the division
approach.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] How to Capitalize numpy?

2019-09-16 Thread Chris Barker
Trivial note:

On the subject of naming things (spelling things??) -- should it be:

numpy
or
Numpy
or
NumPy
?

All three are in the draft NEP 30 (mostly "NumPy" -- I noticed this when
reading/copy editing the NEP). Is there an "official" capitalization?

My preference, would be to use "numpy", and where practicable, use a
"computer" font -- i.e. ``numpy`` in RST.

But if there is consensus already for anything else, that's fine, I'd just
like to know what it is.

-CHB



On Mon, Aug 12, 2019 at 4:02 AM Peter Andreas Entschev 
wrote:

> Apologies for the late reply. I've opened a new PR
> https://github.com/numpy/numpy/pull/14257 with the changes requested
> on clarifying the text. After reading the detailed description, I've
> decided to add a subsection "Scope" to clarify the scope where NEP-30
> would be useful. I think the inclusion of this new subsection
> complements the "Detail description" forming a complete text w.r.t.
> motivation of the NEP, but feel free to point out disagreements with
> my suggestion. I've also added a new section "Usage" pointing out how
> one would use duck array in replacement to np.asarray where relevant.
>
> Regarding the naming discussion, I must say I like the idea of keeping
> the __array_ prefix, but it seems like that is going to be difficult
> given that none of the existing ideas so far play very nicely with
> that. So if the general consensus is to go with __numpy_like__, I
> would also update the NEP to reflect that changes. FWIW, I
> particularly neither like nor dislike __numpy_like__, but I don't have
> any better suggestions than that or keeping the current naming.
>
> Best,
> Peter
>
> On Thu, Aug 8, 2019 at 3:40 AM Stephan Hoyer  wrote:
> >
> >
> >
> > On Wed, Aug 7, 2019 at 6:18 PM Charles R Harris <
> charlesr.har...@gmail.com> wrote:
> >>
> >>
> >>
> >> On Wed, Aug 7, 2019 at 7:10 PM Stephan Hoyer  wrote:
> >>>
> >>> On Wed, Aug 7, 2019 at 5:11 PM Ralf Gommers 
> wrote:
> 
> 
>  On Mon, Aug 5, 2019 at 6:18 PM Stephan Hoyer 
> wrote:
> >
> > On Mon, Aug 5, 2019 at 2:48 PM Ralf Gommers 
> wrote:
> >
> >>
> >> The NEP currently does not say who this is meant for. Would you
> expect libraries like SciPy to adopt it for example?
> >>
> >> The NEP also (understandably) punts on the question of when
> something is a valid duck array. If you want this to be widely used, that
> will need an answer or at least some rough guidance though. For example, we
> would expect a duck array to have a mean() method, but probably not a ptp()
> method. A library author who wants to use np.duckarray() needs to know,
> because she can't test with all existing and future duck array
> implementations.
> >
> >
> > I think this is covered in NEP-22 already.
> 
> 
>  It's not really. We discussed this briefly in the community call
> today, Peter said he will try to add some text.
> 
>  We should not add new functions to NumPy without indicating who is
> supposed to use this, and what need it fills / problem it solves. It seems
> pretty clear to me that it's mostly aimed at library authors rather than
> end users. And also that mature libraries like SciPy may not immediately
> adopt it, because it's too fuzzy - so it's new libraries first, mature
> libraries after the dust has settled a bit (I think).
> >>>
> >>>
> >>> I totally agree -- we definitely should clarify this in the docstring
> and elsewhere in the docs. An example in the new doc page on "Writing
> custom array containers" (
> https://numpy.org/devdocs/user/basics.dispatch.html) would also probably
> be appropriate.
> >>>
> >
> > As discussed there, I don't think NumPy is in a good position to
> pronounce decisive APIs at this time. I would welcome efforts to try, but I
> don't think that's essential for now.
> 
> 
>  There's no need to pronounce a decisive API that fully covers duck
> array. Note that RNumPy is an attempt in that direction (not a full one,
> but way better than nothing). In the NEP/docs, at least saying something
> along the lines of "if you implement this, we recommend the following
> strategy: check if a function is present in Dask, CuPy and Sparse. If so,
> it's reasonable to expect any duck array to work here. If not, we suggest
> you indicate in your docstring what kinds of duck arrays are accepted, or
> what properties they need to have". That's a spec by implementation, which
> is less than ideal but better than saying nothing.
> >>>
> >>>
> >>> OK, I agree here as well -- some guidance is better than nothing.
> >>>
> >>> Two other minor notes on this NEP, concerning naming:
> >>> 1. We should have a brief note on why we settled on the name "duck
> array". Namely, as discussed in NEP-22, we don't love the "duck" jargon,
> but we couldn't come up with anything better since NumPy already uses
> "array like" and "any array" for different purposes.
> >>> 2. The protocol should use *something* more clearly 

Re: [Numpy-discussion] converting a C bytes array to two dimensional numpy array

2019-07-19 Thread Chris Barker - NOAA Federal
You can also directly build a numpy array from a pointer with the numpy
API.

And I recommend Cython as an interface to make these things easy.

This does mean you'd need to have the numpy lib at build time, which may
be a downside.

-CHB


Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115(206) 526-6317
main reception

On Jul 16, 2019, at 5:48 AM, Derek Homeier <
de...@astro.physik.uni-goettingen.de> wrote:

On 16 Jul 2019, at 9:30 am, Omry Levy  wrote:


I have a question, regarding conversion of C (unsigned char *) buffer to a
two dimensional numpy array


this is what i am doing:

1) I get a C network buffer of unsigned char *  let's call it the source
buffer

the size of the source buffer is:

W * H * 2  bytes


2)  I am using PyByteArray_FromStringAndSize() to convert the source buffer
(a C unsigned char *) to python bytes array.

a = PyByteArray_FromStringAndSize(source buffer, W * H * 2)


3) i am using numpy.frombuffer   to convert the python bytes array to a 1
dimensional numpy array of size W *H *2 bytes

b = numpy.frombuffer(a, dtype = np.uint8)


4) i am creating a 2 dimensional numpy array from (3) when each element in
that array is made of 2 bytes from the python bytes array

c = b.view(np.uint16).reshape((H, W))


Is there a way to optimize this some how ?

Can you suggest a faster and better solution ?


The PyByteArray conversion seems unnecessary - if you can access your input
as a buffer,
calling np.frombuffer on it directly with the correct dtype should work
just as well, and you
can reshape it on the fly:

c = np.frombuffer(source_buffer, dtype=np.uint16[, count=W*H]).reshape((H, W))

The optional ‘count’ argument would only be required if you cannot simply
read the buffer
to its end.

HTH,
   Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] defining a NumPy API standard?

2019-06-04 Thread Chris Barker
One little point here:



>   * np.ndarray.cumprod: low importance -> prefer np.multiply.accumulate
>>
>
I think that's an example of something that *should* be part of the numpy
API, but should be implemented as a mixin, based on np.multiply.accumulate.

As I'm still a bit confused about the goal here, that means that:

Users should still use `.cumprod`, but implementers of numpy-like packages
should implement `.multiply.accumulate`, and not directly `cumprod`, but
rather use the numpy ABC, or however it is implemented.
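
Something like this, maybe (a hand-wavy sketch of the mixin idea -- not any
existing numpy class or API):

    import numpy as np

    class CumprodMixin:
        # implementers provide multiply.accumulate support (e.g. via the
        # ufunc machinery); the derived method then comes for free:
        def cumprod(self, axis=0):
            return np.multiply.accumulate(self, axis=axis)

e.g. np.multiply.accumulate(np.array([1, 2, 3, 4])) gives
array([ 1,  2,  6, 24]) -- the same result as .cumprod().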

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] defining a NumPy API standard?

2019-06-02 Thread Chris Barker
On Sun, Jun 2, 2019 at 3:45 AM Dashamir Hoxha  wrote:

>
> Would it be useful if we could integrate the documentation system with a
> discussion forum (like Discourse.org)? Each function can be linked to its
> own discussion topic, where users and developers can discuss about the
> function, upvote or downvote it etc. This kind of discussion seems to be a
> bit more structured than a mailing list discussion.
>

We could make a GitHub repo for a document, and use issues to separately
discuss each topic.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] defining a NumPy API standard?

2019-06-02 Thread Chris Barker - NOAA Federal
> Exactly. This is great, thanks Marten. I agree with pretty much everything in 
> this list.

For my part, a few things immediately popped out at me that I disagree with. ;-)

Which does not mean it isn’t a useful exercise, but it does mean we
should expect a fair bit of debate.

But I do think we should be clear as to what the point is:

I think it could be helpful for clarifying for new and long standing
users of numpy what the “numpythonic” way to use numpy is.

I think this is very closely tied to the duck typing discussion.

But for guiding implementations of “numpy-like” libraries, not so
much: they are going to implement the features their users need —
whether it’s “officially” part of the numpy API is a minor concern.
Unless there is an official “Standard”, but it doesn’t sound like
anyone has that in mind.

I’m also a bit confused as to the scope: is this effort about the
python API only? In which case, I’m not sure how it relates to
libraries in/for other languages. Or only about those that provide a
Python binding?

When I first read the topic of this thread, I expected it to be about
the C API — it would be nice to clearly define what parts of the C API
are considered public and stable. (Though maybe that’s already done —
I do get numpy API deprecation warnings at times..)

-CHB
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] GSoD - Technical Writter

2019-05-19 Thread Chris Barker - NOAA Federal
> a responsive web page may not be an absolute requirement, but still it
> may be nice to be able to read the docs from a tablet or smartphone.
>
> Unfortunately I am not familiar yet with Sphinx, but I hope that it can be
> integrated with Jekyll or Hugo, and then one of their templates can be used.


Sphinx is powerful, featureful, and the standard doc system for Python.
Let's just stick with that.

But there are a LOT of themes available for Sphinx — I'm sure there are
responsive ones out there that could be used or adapted.

http://www.sphinx-doc.org/en/stable/theming.html

You might check out the bootstrap theme:

https://github.com/ryan-roemer/sphinx-bootstrap-theme

-CHB



About the content of the User Guide etc. I don't see any obvious
improvement that is needed (maybe because I have not read them yet). One
thing that may help is making the code examples interactive, so that the
readers can play with them and see how the results change. For example this
may be useful: https://github.com/RunestoneInteractive/RunestoneComponents

The two changes that I have suggested above seem more like engineering work
(for improving the documentation infrastructure), than documentation work.
For making content that can be easily grasped by beginners, I think
that it should be presented as a series of problems and their solutions. In
other words, don't show the users the features and their details, but ask
them to solve a simple problem, and then show them how to solve it with
NumPy/SciPy and its features. This would make it more attractive, because
people usually don't like to read manuals from beginning to end. This
is a job that can be done by teachers for their students, keeping in
mind the level of their students and what they actually want them to learn.
I have noticed that there are already some lectures, or books, or tutorials
like this. This is a creative work, with a specific target audience in
mind, so I can't pretend that I can possibly do something useful about this
in a short time (2-3 months). But of course the links to the existing
resources can be made more visible and reachable from the main page of the
website.

Best regards,
Dashamir

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Style guide for numpy code?

2019-05-14 Thread Chris Barker - NOAA Federal
> Utility functions not
> requested in the assignment and that the user will never see can have
> reduced docstrings if the functions are simple and obvious, but at least
> give the one-line summary.
>
> (k) If you modify an existing function, you must either make a Git entry
> or, if it is not under revision control, include a Revision History section
> in your docstring and record your name, the date, the version number, your
> email, and the nature of the change you made.
>
> (l) Choose variable names that are meaningful and consistent in style.
> Document your style either at the head of a module or in a separate text
> file for the project.  For example, if you use CamelCaps with initial
> capital, say that.  If you reserve initial capitals for classes, say that.
> If you use underscores for variable subscripts and camelCaps for the base
> variables, say that.  If you accept some other style and build on that, say
> that.  There are too many good reasons to have such styles for only one to
> be the community standard.  If certain kinds of values should get the same
> variable or base variable, such as fundamental constants or things like
> amplitudes, say that.
>
> (j) It's best if variables that will appear in formulae are short, so more
> terms can fit in one 80 character line.
>
> Overall, having and following a style makes code easier to read.  And, as
> an added bonus, if you take care to be consistent, you will write slower,
> view your code more times, and catch more bugs as you write them.  Thus,
> for codes of any significant size, writing pedantically commented and
> aligned code is almost always faster than blast coding, if you include
> debugging time.
>
> Did you catch both bugs in item h?
>
> --jh--
>
> On 5/9/19 11:25 AM, Chris Barker - NOAA Federal 
>  wrote:
>
> Do any of you know of a style guide for computational / numpy code?
>
> I don't mean code that will go into numpy itself, but rather, users code
> that uses numpy (and scipy, and...)
>
> I know about (am a proponent of) PEP8, but it doesn’t address the unique
> needs of scientific programming.
>
> This is mostly about variable names. In scientific code, we often want:
>
> - variable names that match the math notation- so single character names,
> maybe upper or lower case to mean different things ( in ocean wave
> mechanics, often “h” is the water depth, and “H” is the wave height)
>
> -to distinguish between scalar, vector, and matrix values — often
> UpperCase means an array or matrix, for instance.
>
> But despite (or because of) these unique needs, a style guide would be
> really helpful.
>
> Anyone have one? Or even any notes on what you do yourself?
>
> Thanks,
> -CHB
>
>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy finding local tests on import?!?!

2019-05-10 Thread Chris Barker
TL;DR:

This issue appears to have been fixed in numpy 1.15 (at least, I didn't
test 1.14)

However, I also had some issues in my environment that I also fixed, so it
may be that numpy's behavior hasn't changed -- I don't have the energy to
test now.

And it doesn't hurt to have this in the archives in case someone else runs
into the problem.

Read on if you care about weird behaviour with the testing package in numpy
1.13

Numpy appears to be both running tests on import (or at least the
runner), and finding local tests that are not numpy's.

I found this issue (closed without a resolution):

https://github.com/numpy/numpy/issues/11457

which is related -- but it's about the import time of numpy.testing, and
not about errors/issues from that import. But maybe the import process has
been changed in newer numpys.

What I did, and what I got:

I am trying to debug what looks like a numpy-related issue in a project.

So one thing I did was try to import numpy and check __version__:

python -c "import numpy; print(numpy.__version__)"

very weird barf:


  File
"/Users/chris.barker/miniconda2/envs/gridded/lib/python2.7/unittest/runner.py",
line 4, in 
import time
  File "time.py", line 7, in 
import netCDF4 as nc4
  File
"/Users/chris.barker/miniconda2/envs/gridded/lib/python2.7/site-packages/netCDF4/__init__.py",
line 3, in 
from ._netCDF4 import *
  File "include/netCDF4.pxi", line 728, in init netCDF4._netCDF4
(netCDF4/_netCDF4.c:83784)
AttributeError: 'module' object has no attribute 'ndarray

I get the same thing if I fire up the interpreter and then import numpy

as the error seemed to come from:

unittest/runner.py

I had a hunch.

I was, in fact, running with my current working directory in the package
dir of my project, and there is a test package in that dir

I cd out of that, and presto! numpy imports fine:

$ python -c "import numpy; print(numpy.__version__)"
1.13.1

OK, that's a kinda old numpy -- but it's the minimum required by my
project. (though I can probably update that -- I'll do that soon)

So it appears that the test runner is looking in the current working dir
(or, I suppose, sys.path) for packages called tests -- this seems like a
broken system: unless you are running the tests explicitly from the command
line, it shouldn't look in the cwd, and it probably shouldn't ever look in
all of sys.path.

But my bigger confusion here is -- why the heck is the test runner being
run at ALL on a simple import?!

If this has been fixed / changed in newer numpys, then OK -- I'll update my
dependencies.

-CHB



-

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Style guide for numpy code?

2019-05-09 Thread Chris Barker - NOAA Federal
Oops,

Somehow that got sent before I was done. (Like my use of the passive voice
there?)

Here is a complete message:

Do any of you know of a style guide for computational / numpy code?

I don't mean code that will go into numpy itself, but rather, users code
that uses numpy (and scipy, and...)

I know about (am a proponent of) PEP8, but it doesn’t address the unique
needs of scientific programming.

This is mostly about variable names. In scientific code, we often want:

- variable names that match the math notation — so single-character names,
maybe upper or lower case to mean different things (in ocean wave
mechanics, often “h” is the water depth, and “H” is the wave height)

- to distinguish between scalar, vector, and matrix values — often UpperCase
means an array or matrix, for instance.
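
For illustration, here is what such a convention might look like (a
hypothetical sketch -- the names and the wave-mechanics formula are just
examples, not a proposal):

import numpy as np

# hypothetical convention: lower case for scalars, upper case for arrays,
# with names chosen to match the math notation of the field
h = 10.0                                # water depth (scalar, meters)
H = np.array([1.2, 2.5, 0.8])           # wave heights (array, meters)
k = 2 * np.pi / 50.0                    # wavenumber for a 50 m wavelength
c = np.sqrt(9.81 / k * np.tanh(k * h))  # phase speed from the dispersion relation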

But despite (or because of) these unique needs, a style guide would be
really helpful.

Anyone have one? Or even any notes on what you do yourself?

Thanks,
-CHB




-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Style guide for numpy code?

2019-05-08 Thread Chris Barker
Hey all,

Do any of you know of a style guide for computational / numpy code?

I don't mean code that will go into numpy itself, but rather, users code
that uses numpy (and scipy, and...)

I know about (am a proponent

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] grant proposal for core scientific Python projects (rejected)

2019-05-04 Thread Chris Barker - NOAA Federal
for 20+ years ;), but you can now say you know
>>> yet another person at NASA who has no idea this even exists ... :)
>>> Not only do I not know of that, but I know of NASA policies that make
>>> it very difficult for NASA civil servants to contribute to open source
>>> projects -- quite hypocritical, given the amount of open source
>>> code that NASA (like all other large organizations) depends critically
>>> on, but it's a fact.
>>>
>>> Cheers,
>>> Steve Waterbury
>>>
>>> (CLEARLY **NOT** SPEAKING IN ANY OFFICIAL CAPACITY FOR NASA OR
>>> THE U.S. GOVERNMENT AS A WHOLE!  Hence the personal email
>>> address. :)
>>>
>>> On 5/2/19 9:31 PM, Chris Barker - NOAA Federal wrote:
>>>
>>> Sounds like this is a NASA specific thing, in which case, I guess
>>> someone at NASA would need to step up.
>>>
>>> I’m afraid I know no pythonistas at NASA.
>>>
>>> But I’ll poke around NOAA to see if there’s anything similar.
>>>
>>> -CHB
>>>
>>> On Apr 25, 2019, at 1:04 PM, Ralf Gommers 
>>> wrote:
>>>
>>>
>>>
>>> On Sat, Apr 20, 2019 at 12:41 PM Ralf Gommers 
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Apr 18, 2019 at 10:03 PM Joe Harrington 
>>>> wrote:
>>>>
>>>>
>>>>> 3. There's such a thing as a share-in-savings contract at NASA, in
>>>>> which
>>>>> you calculate a savings, such as from avoided costs of licensing IDL
>>>>> or
>>>>> Matlab, and say you'll develop a replacement for that product that
>>>>> costs
>>>>> less, in exchange for a portion of the savings.  These are rare and
>>>>> few
>>>>> people know about them, but one presenter to the committee did discuss
>>>>> them and thought they'd be appropriate.  I've always felt that we
>>>>> could
>>>>> get a chunk of change this way, and was surprised to find that the
>>>>> approach exists and has a name.  About 3 of 4 people I talk to at
>>>>> NASA
>>>>> have no idea this even exists, though, and I haven't pursued it to its
>>>>> logical end to see if it's viable.
>>>>>
>>>>
>>>> I've heard of these. Definitely worth looking into.
>>>>
>>>
>>> It seems to be hard to find any information about these share-in-savings
>>> contracts. The closest thing I found is this:
>>> https://www.federalregister.gov/documents/2018/06/22/2018-13463/nasa-federal-acquisition-regulation-supplement-removal-of-reference-to-the-shared-savings-policy-and
>>>
>>> It is called "Shared Savings" there, and was replaced last year by
>>> something called "Value Engineering Change Proposal". If anyone can comment
>>> on whether that's the same thing as Joe meant and whether this is worth
>>> following up on, that would be very helpful.
>>>
>>> Cheers,
>>> Ralf
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>>
>>>
>>>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] grant proposal for core scientific Python projects (rejected)

2019-05-04 Thread Chris Barker - NOAA Federal
On May 4, 2019, at 9:00 AM, Ralf Gommers

Okay never mind, this is apparently happening already:
https://hackmd.io/YbxTpC1ZT_aEapTqydmHCA. Please jump in there instead:)


Slightly different focus than I had in mind, but yes, it makes sense to
join that effort.

-CHB



Ralf


>  Cheers,
> Ralf
>
>>
>> -CHB
>>
>> --
>>
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR(206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115   (206) 526-6317   main reception
>>
>> chris.bar...@noaa.gov
>>
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] grant proposal for core scientific Python projects (rejected)

2019-05-03 Thread Chris Barker
On Fri, May 3, 2019 at 9:56 AM Stephen Waterbury 
wrote:

> Sure, I would be interested to discuss, let's try to meet up there.
>
OK, that's two of us :-)

NumFocus folk: Should we take this off the list and talk about a BoF or
something at SciPy?

-CHB





> Steve
>
> On 5/3/19 12:23 PM, Chris Barker wrote:
>
> On Thu, May 2, 2019 at 11:51 PM Ralf Gommers 
> wrote:
>
>> On Fri, May 3, 2019 at 3:49 AM Stephen Waterbury 
>> wrote:
>>
>>> P.S.  If anyone wants to continue this discussion at SciPy 2019,
>>> I will be there (on my own nickel!  ;) ...
>>>
>>
> So will I (on NOAA's nickel, which I am grateful for)
>
> Maybe we should hold a BoF, or even something more formal, on Government
> support for SciPY Stack development?
>
> -CHB
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] grant proposal for core scientific Python projects (rejected)

2019-05-03 Thread Chris Barker
On Thu, May 2, 2019 at 11:51 PM Ralf Gommers  wrote:

> On Fri, May 3, 2019 at 3:49 AM Stephen Waterbury 
> wrote:
>
>> P.S.  If anyone wants to continue this discussion at SciPy 2019,
>> I will be there (on my own nickel!  ;) ...
>>
>
So will I (on NOAA's nickel, which I am grateful for)

Maybe we should hold a BoF, or even something more formal, on government
support for SciPy Stack development?

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] grant proposal for core scientific Python projects (rejected)

2019-05-02 Thread Chris Barker - NOAA Federal
Sounds like this is a NASA specific thing, in which case, I guess someone
at NASA would need to step up.

I’m afraid I know no pythonistas at NASA.

But I’ll poke around NOAA to see if there’s anything similar.

-CHB

On Apr 25, 2019, at 1:04 PM, Ralf Gommers  wrote:



On Sat, Apr 20, 2019 at 12:41 PM Ralf Gommers 
wrote:

>
>
> On Thu, Apr 18, 2019 at 10:03 PM Joe Harrington 
> wrote:
>
>
>> 3. There's such a thing as a share-in-savings contract at NASA, in which
>> you calculate a savings, such as from avoided costs of licensing IDL or
>> Matlab, and say you'll develop a replacement for that product that costs
>> less, in exchange for a portion of the savings.  These are rare and few
>> people know about them, but one presenter to the committee did discuss
>> them and thought they'd be appropriate.  I've always felt that we could
>> get a chunk of change this way, and was surprised to find that the
>> approach exists and has a name.  About 3 of 4 people I talk to at NASA
>> have no idea this even exists, though, and I haven't pursued it to its
>> logical end to see if it's viable.
>>
>
> I've heard of these. Definitely worth looking into.
>

It seems to be hard to find any information about these share-in-savings
contracts. The closest thing I found is this:
https://www.federalregister.gov/documents/2018/06/22/2018-13463/nasa-federal-acquisition-regulation-supplement-removal-of-reference-to-the-shared-savings-policy-and

It is called "Shared Savings" there, and was replaced last year by
something called "Value Engineering Change Proposal". If anyone can comment
on whether that's the same thing as Joe meant and whether this is worth
following up on, that would be very helpful.

Cheers,
Ralf

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Boolean arrays with nulls?

2019-04-22 Thread Chris Barker
On Thu, Apr 18, 2019 at 10:52 AM Stuart Reynolds 
wrote:

> Is float8 a thing?
>

no, but np.float16 is -- so at least it's only twice as much memory as you
need :-)

array([ nan,  inf, -inf], dtype=float16)

I think masked arrays are going to cost just as much, as they need to carry
the mask.
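
To make that concrete, a quick sketch (the values here are just for
illustration):

import numpy as np

# a "nullable boolean" faked with float16: 0.0, 1.0, and nan for missing
x = np.array([1.0, 0.0, np.nan], dtype=np.float16)
print(np.isnan(x))   # [False False  True] -- which entries are "null"
print(x.nbytes)      # 6 -- two bytes per value

# the masked-array alternative: a byte of bool data plus a byte of mask
m = np.ma.masked_array([True, False, True], mask=[False, False, True])
print(m.nbytes)      # 3 -- data only; the mask costs as much again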

-CHB



>
> On Thu, Apr 18, 2019 at 9:46 AM Stefan van der Walt 
> wrote:
>
>> Hi Stuart,
>>
>> On Thu, 18 Apr 2019 09:12:31 -0700, Stuart Reynolds wrote:
>> > Is there an efficient way to represent bool arrays with null entries?
>>
>> You can use the bool dtype:
>>
>> In [5]: x = np.array([True, False, True])
>>
>>
>>
>> In [6]: x
>>
>>
>> Out[6]: array([ True, False,  True])
>>
>> In [7]: x.dtype
>>
>>
>> Out[7]: dtype('bool')
>>
>> You should note that this stores one True/False value per byte, so it is
>> not optimal in terms of memory use.  There is no easy way to do
>> bit-arrays with NumPy, because we use strides to determine how to move
>> from one memory location to the next.
>>
>> See also:
>> https://www.reddit.com/r/Python/comments/5oatp5/one_bit_data_type_in_numpy/
>>
>> > What I’m hoping for is that there’s a structure that is ‘viewed’ as
>> > nan-able float data, but backed but a more efficient structures
>> > internally.
>>
>> There are good implementations of this idea, such as:
>>
>> https://github.com/ilanschnell/bitarray
>>
>> Those structures cannot typically utilize the NumPy machinery, though.
>> With the new array function interface, you should at least be able to
>> build something that has something close to the NumPy API.
>>
>> Best regards,
>> Stéfan
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SPAM]Re: introducing Numpy.net, a pure C# implementation of Numpy

2019-03-27 Thread Chris Barker
On Mon, Mar 18, 2019 at 1:19 PM Paul Hobson  wrote:

>
>> I'm a civil engineer who adopted Python early in his career and became
> the "data guy" in the office pretty early on. Our company's IT department
> manages lots of Windows Servers running SQL Server. In my case, running
> python apps on our infrastructure just isn't feasible or supported by the
> IT department.
>

Just curious -- does it have to be C#? Or could it be any CLR application
-- e.g. IronPython?

I imagine you could build a web service pretty easily in IronPython --
though AFAIK, the attempts at getting numpy support (and thus Pandas, etc)
never panned out.

> The point of all of this is that in those situations, having a numpy-like
> library would be very nice indeed. I'm very excited to hear that the OP's
> work has been open sourced.
>

I wonder if the OP's work could be used to make a numpy for IronPython
native to the CLR 

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy-discussion

2019-01-29 Thread Chris Barker - NOAA Federal
> On Jan 29, 2019, at 12:15 AM, Matti Picus  wrote:

>>  Perhaps you could suggest they add this explanation (or if it is in our 
>> documentation point out where you think would be appropriate) since it seems 
>> many people have problems with dtypes and overflow.

arange() is particularly problematic, as it defaults (in the common
case) to integers, which are less familiar to new users than float64,
which numpy uses as the default dtype in most places.

And many (most) uses of arange() would be better served by linspace() anyway.
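
A quick sketch of the kind of surprise this causes (assuming a platform
where the default integer is 64-bit):

import numpy as np

a = np.arange(100)           # integer dtype by default
print((a ** 10)[-1])         # overflows int64 and silently wraps to a wrong (negative) value
b = np.linspace(0, 99, 100)  # float64 by default
print((b ** 10)[-1])         # ~9.04e19, the expected magnitude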

-CHB


>
>
> Matti
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] asanyarray vs. asarray

2018-10-30 Thread Chris Barker
On Tue, Oct 30, 2018 at 2:22 PM, Stephan Hoyer  wrote:

> The Liskov substitution principle (LSP) suggests that the set of
> reasonable ndarray subclasses are exactly those that could also in
> principle correspond to a new dtype. Of np.ndarray subclasses in
> wide-spread use, I think only the various "array with units" types come
> close satisfying this criteria. They only fall short insofar as they
> present a misleading dtype (without unit information).
>

How about subclasses that only add functionality? My only use case of
subclassing is exactly that:

I have a "bounding box" object (probably could have been called a
rectangle) that is a subclass of ndarray, is always shape (2,2), and has
various methods for merging two such boxes, adding a point, etc.

I did it that way, 'cause I had a lot of code already that simply used a
(2,2) array to represent a bounding box, and I wanted all that code to
still work.

I have had zero problems with it.

Maybe that's too trivial to be worth talking about, but this kind of use
case can be handy.

It is a bit awkward to write the code, though -- it would be nice to have a
cleaner API for this sort of subclassing (not that I have any idea how to
do that)
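
For what it's worth, a minimal sketch of that kind of subclass (the class
name and API here are hypothetical stand-ins, not my actual code):

import numpy as np

class BBox(np.ndarray):
    """A (2, 2) array [[xmin, ymin], [xmax, ymax]] with a bit of extra API."""

    def __new__(cls, data):
        arr = np.asarray(data, dtype=float).view(cls)
        if arr.shape != (2, 2):
            raise ValueError("BBox must be shape (2, 2)")
        return arr

    def merge(self, other):
        # smallest box containing both boxes
        other = np.asarray(other)
        return BBox([np.minimum(self[0], other[0]),
                     np.maximum(self[1], other[1])])

b1 = BBox([[0.0, 0.0], [1.0, 1.0]])
b2 = BBox([[0.5, 0.5], [2.0, 2.0]])
print(b1.merge(b2))   # [[0. 0.] [2. 2.]]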

> The main problem with subclassing for numpy.ndarray is that it guarantees
> too much: a large set of operations/methods along with a specific memory
> layout exposed as part of its public API.
>

This is a big deal -- we really have two concepts here:
 - a Python class (type) with certain behaviors in Python code
 - a wrapper around a strided memory block.

maybe it's possible to be clear about that distinction:

"Duck Arrays" are the Python API

Maybe a C-API object would be useful that shares the memory layout, but
could have completely different functionality at the Python level.

- CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] asanyarray vs. asarray

2018-10-29 Thread Chris Barker
On Fri, Oct 26, 2018 at 7:12 PM, Travis Oliphant 
wrote:


> I agree that we can stop bashing subclasses in general. The problem with
> numpy subclasses is that they were made without adherence to SOLID:
> https://en.wikipedia.org/wiki/SOLID.  In particular the Liskov
> substitution principle:  https://en.wikipedia.org/wiki/
> Liskov_substitution_principle .
>

...


> did not properly apply them in creating np.matrix which clearly violates
> the substitution principle.
>

So -- could a matrix subclass be made "properly"? or is that an example of
something that should not have been a subclass?

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A minor milestone

2018-09-08 Thread Chris Barker
There are probably a LOT of Windows users getting numpy from conda as well.

(I know my CI's and users do...)

It'd be nice if there was some way to track real usage!

-CHB


On Sat, Sep 8, 2018 at 3:44 PM, Charles R Harris 
wrote:

>
>
> On Fri, Sep 7, 2018 at 11:16 PM Andrew Nelson  wrote:
>
>> >  but on Travis I install it half a dozen times every day.
>>
>> Good point. I wonder if there's any way to take that into account when
>> considering whether to drop versions.
>>
>> On Sat, 8 Sep 2018 at 15:14, Nathaniel Smith  wrote:
>>
>>> On Fri, Sep 7, 2018 at 6:33 PM, Charles R Harris
>>>  wrote:
>>> > Thanks for the link. It would be nice to improve the Windows numbers,
>>> Linux
>>> > is still very dominant. I suppose that might be an artifact of the
>>> systems
>>> > used by developers as opposed to end users. It would be a different
>>> open
>>> > source world if Microsoft had always released their compilers for free
>>> and
>>> > kept them current with the evolving ISO specs.
>>>
>>> Well, keep in mind also that it's counting installs, not users...
>>> people destroy and reinstall Linux systems a *lot* more often than
>>> they do Windows/macOS systems, what with clouds and containers and CI
>>> systems and all. On my personal laptop I install numpy maybe once per
>>> release, but on Travis I install it half a dozen times every day.
>>>
>>>
> Would be interesting if the travisCI and appveyor downloads could be
> separated out.
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Add pybind11 to docs about writing binding code

2018-08-20 Thread Chris Barker
On Mon, Aug 20, 2018 at 8:57 AM, Neal Becker  wrote:

> I'm confused, do you have a link or example showing how to use
> xtensor-python without pybind11?
>

I think you may have it backwards:

"""
The Python bindings for xtensor are based on the pybind11 C++ library,
which enables seemless interoperability between C++ and Python.
"""

So no, you can't use xtensor-python without pybind11 -- I think what was
suggested was that you *could* use xtensor-python without using xtensor on
the C++ side. i.e. xtensor-python is a higher-level binding system than
pybind11 alone, rather than just bindings for xtensor. And thus belongs in
the docs about binding tools.

Which makes me want to take a closer look at it...

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Dropping 32-bit from macOS wheels

2018-08-14 Thread Chris Barker
On Tue, Aug 14, 2018 at 2:17 AM, Matthew Brett 
wrote:

> We are planning to drop 32-bit compatibility from the numpy macOS
> wheels.


+1 -- it really is time.

I note that python.org has finally gone 64 bit only -- at least for the
default download:

"""
For 3.7.0, we provide two binary installer options for download. The
default variant is 64-bit-only and works on macOS 10.9 (Mavericks) and
later systems.
"""

granted, it'll be quite a while before everyone is running 3.7+, but still.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Taking back control of the #numpy irc channel

2018-08-10 Thread Chris Barker
On Wed, Aug 8, 2018 at 9:06 AM, Sebastian Berg

> If someone is unhappy with us two being the main
> contact/people who have those right on freenode,


On the contrary, thanks much for taking this on!

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] pytest, fixture and parametrize

2018-08-09 Thread Chris Barker - NOAA Federal
> On Aug 8, 2018, at 5:40 PM, Matti Picus  wrote:
> We have these guidelines http://www.numpy.org/devdocs/reference/testing.html,

Thanks Matti — that’s clearly the place to document best practices.

>  It was updated for pytest in the 1.15 release, but could use some more 
> editing and refinement.

Giving it a quick read right now, it clearly has not embraced pytest (yet?).

Before we try to update the doc, we should have some guidelines as to the goals.

At the broad level — do we want to make use of the pytest testing
Framework, or just use pytest as a test runner?

I would say that it’s a no-brainer, except that the numpy testing
framework already has a lot of the key features of pytest that we
need.

Options:

1) Use pytest only for test running

2) Use pytest features that numpy.testing does not have a good replacement for.

3) prefer pytest features for new tests.

In all cases, I think we should limit ourselves to the less magical
pytest features.

BTW, I see this:

“Setup and teardown functions to functions and methods are known as
“fixtures”, and their use is not encouraged.”

First, is that a typo? I don’t understand what it means. Though I
think I know what a fixture is.

Second — why are fixtures discouraged?

Third, if fixtures are discouraged, rather than banned, we should say
when it is OK to use one.
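
For concreteness, here is roughly what a simple fixture looks like in
pytest (a sketch; note that np.random.default_rng assumes a fairly recent
NumPy):

import numpy as np
import pytest

@pytest.fixture
def rng():
    # a fresh, seeded generator for every test that requests it
    return np.random.default_rng(12345)

def test_mean_is_near_zero(rng):
    x = rng.normal(size=1000)
    assert abs(x.mean()) < 0.2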

-CHB

> Matti
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] pytest, fixture and parametrize

2018-08-08 Thread Chris Barker
On Wed, Aug 8, 2018 at 9:38 AM, Evgeni Burovski 
wrote:

> Stdlib unittest supports self.assertRaises context manager from python 3.1
>

but that requires using unittest :-)

On Wed, Aug 8, 2018, 7:30 PM Eric Wieser 
> wrote:
>
>> You forget that we already have that feature in our testing layer,
>>
>> with np.testing.assert_raises(AnException):
>> pass
>>
>>
fair enough -- I wouldn't re-write that now, but as it's there already, it
may make sense to use it.
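
For comparison, the two spellings side by side (a minimal sketch):

import numpy as np
import pytest

def test_bad_reshape_numpy_style():
    with np.testing.assert_raises(ValueError):
        np.arange(3).reshape(2, 2)

def test_bad_reshape_pytest_style():
    with pytest.raises(ValueError):
        np.arange(3).reshape(2, 2)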

Perhaps we need a doc that lays out the preferred testing utilities.

Worthy of a NEP? Or just a README or something in the code base?

Personally, I think a commitment to pytest is the best way to go -- but
there are a lot of legacy tests, so there will be a jumble.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] pytest, fixture and parametrize

2018-08-08 Thread Chris Barker - NOAA Federal
> as to whether we should use pytest fixtures and parametrized classes and
> functions.

Absolutely!

 > the disadvantage is being dependent on pytest as unittest does not
support that functionality.

Which is the whole point of using pytest, yes?

I’m very opinionated about this, but I really dislike unittest — it’s
simply way too Java-y — makes the easy things harder than they should
be, and is missing critical should-be-easy features.

I moved to pure pytest a few years ago, and have been very happy about
it.  In fact, I recently converted some d unittest code to pure
pytest, and it was literally about 1/4 as much code.

The only reason I can see to avoid pytest features is that it’s not in
the standard lib — but it’s not a run-time dependency, and it’s easy
to install and well supported — so all good.

I suppose we may want to avoid some of pytest’s more magical esoteric
features, but certainly not core functionality like fixtures and
parameterized tests.
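
As an example of the kind of core functionality I mean, a parametrized
test collapses a family of near-identical tests into one (a sketch):

import numpy as np
import pytest

@pytest.mark.parametrize("dtype", [np.int32, np.int64, np.float32, np.float64])
def test_zeros_sum(dtype):
    # one test function, run once per dtype
    assert np.zeros(10, dtype=dtype).sum() == 0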

-CHB
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adoption of a Code of Conduct

2018-08-07 Thread Chris Barker
On Mon, Aug 6, 2018 at 5:30 PM, Matthew Harrigan  wrote:

> It's also key to note the specific phrasing -- it is *diversity* that is
>> honored, whereas we would (and do) welcome diverse individuals.
>>
>
> I'm afraid I miss your point.  I understand that diversity is what is
> being honoured in the current CoC, and that is my central issue.  My issue
> is not so much diversity, but more that honour is not the right word.  We
> all agree (I think/hope) that we should and do welcome diverse
> individuals.  That actually paraphrases my suggested edit:
>
> Though no list can hope to be comprehensive, we explicitly *welcome*
> diversity in: age, culture, ethnicity, genotype, gender identity or
> expression, language, national origin, neurotype, phenotype, political
> beliefs, profession, race, religion, sexual orientation, socioeconomic
> status, subculture and technical ability.
>

I think the authors were explicitly using a stronger word: diversity is not
just welcome, it is more than welcome -- it is honored -- that is, it's a
good thing that we explicitly want to support.


> Practically speaking I don't think my edit means much.  I can't think of a
> situation where someone is friendly, welcoming, and respectful to everyone
> yet should be referred referred to CoC committee for failing to honour
> diversity.  One goal of the CoC should be to make sure that diverse people
> from potentially marginalized or targeted groups feel welcome and my edit
> addresses that more directly than the original.  But in principle the
> difference, to me at least, is stark.  Thank you for considering my view.
>
>
> On Mon, Aug 6, 2018 at 1:58 PM, Chris Barker 
> wrote:
>
>>
>> On August 4, 2018 00:23:44 Matthew Harrigan 
>>> wrote:
>>>
>>>> One concern I have is the phrase "explicitly honour" in "we explicitly
>>>> honour diversity in: age, culture, ...".  Honour is a curious word choice.
>>>> honour <https://www.dictionary.com/browse/honour> is defined as, among
>>>> other things, "to worship", "high public esteem; fame; glory", and "a
>>>> source of credit or distinction".
>>>>
>>>
I think that last one is, in fact, the point.

Anyway, I for one think it's fine either way, but would suggest that any
minor changes like this be made to the SciPy CoC (if at all), and that
numpy uses the same one.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adoption of a Code of Conduct

2018-08-06 Thread Chris Barker
> On August 4, 2018 00:23:44 Matthew Harrigan 
> wrote:
>
>> One concern I have is the phrase "explicitly honour" in "we explicitly
>> honour diversity in: age, culture, ...".  Honour is a curious word choice.
>> honour  is defined as, among
>> other things, "to worship", "high public esteem; fame; glory", and "a
>> source of credit or distinction".  I would object to some of those
>> interpretations.  Also its not clear to me how honouring diversity relates
>> to conduct.  I would definitely agree to follow the other parts of the
>> CoC and also to welcome others regardless of where they fall on the various
>> axes of diversity.  "Explicitly welcome" is better and much more closely
>> related to conduct IMO.
>>
>
> While honor may be a slightly strange choice, I don't think it is as
> strange as this specific definition makes it out to be. You also say "I
> honor my promise", i.e., I take it seriously, and it has meaning to me.
>
> Diversity has meaning to our community (it enriches us, both
> intellectually and otherwise) and should be cherished.
>

It's also key to note the specific phrasing -- it is *diversity* that is
honored, whereas we would (and do) welcome diverse individuals.

So I like the phasing as it is.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adoption of a Code of Conduct

2018-08-03 Thread Chris Barker
On Fri, Aug 3, 2018 at 12:45 PM, Stefan van der Walt 
wrote:

> I'll note that at least the Contributor Covenant is pretty vague about
>> enforcement:
>>
>> """
>> All complaints will be reviewed and investigated and will result in a
>> response that is deemed necessary and appropriate to the circumstances.
>> """
>>
>> I'd think refining THAT part for the project may provide the benefits of
>> discussion...
>>
>
> But the SciPy CoC has a whole additional document that goes into further
> detail on this specific issue, so let's not concern ourselves with the
> weaknesses of the Covenant (there are plenty),
>

Actually, I did not intend that to be highlighting a limitation in the
Covenant, but rather pointing out that there is plenty to discuss, even if
one does adopt an existing CoC.

But as Ralf points out, that discussion has been had in the context of
scipy, so I agree -- numpy should adopt scipy's CoC and be done with it.

In fact, if someone still feels strongly that "political beliefs" should be
removed, then it's probably better to bring that up in the context of
scipy, rather than numpy -- as has been said, it is practically the same
community.

To the point where the scipy developers guide and the numpy developers
guide are published on the same web site.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adoption of a Code of Conduct

2018-08-03 Thread Chris Barker
On Fri, Aug 3, 2018 at 11:33 AM, Nelle Varoquaux 
wrote:

I think what matters in code of conduct is community buy-in and the
> discussions around it, more than the document itself.
>

This is a really good point. Though I think a community could still have
that discussion around whether and which CoC to adopt, rather than the
bike-shedding of the document itself.

And the reality is that a small sub-fraction of the community takes part in
the conversation anyway.

I'm very much on the fence about whether this thread has been truly
helpful, for instance, though it's certainly got me trolling the web
reading about the issue -- which I probably would not have if this were
simply a: "should we adopt the NumFocos CoC" thread...

By off-loading the discussion and writing process to someone else, you are
> missing most of the benefits of codes of conducts.
>

well, when reading about CoCs, it seems a large part of their benefit is not
to the existing community, but rather what they project to the rest of the
world, particularly possible new contributors.


> This is also the reason why I think codes of conduct should be revisited
> regularly.
>

That is a good idea, yes.

I'll note that at least the Contributor Covenant is pretty vague about
enforcement:

"""
All complaints will be reviewed and investigated and will result in a
response that is deemed necessary and appropriate to the circumstances.
"""

I'd think refining THAT part for the project may provide the benefits of
discussion...

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adoption of a Code of Conduct

2018-08-03 Thread Chris Barker
On Fri, Aug 3, 2018 at 11:20 AM, Chris Barker  wrote:

> Given Jupyter, numpy, scipy, matplotlib?, etc, are all working on a CoC --
> maybe we could have NumFocus take a lead on this for the whole community?
>

or adopt an existing one, like maybe:

The Contributor Covenant <http://www.contributor-covenant.org/> was adopted
by several prominent open source projects, including Atom, AngularJS,
Eclipse, and even Rails. According to Github, total adoption of the
Contributor Covenant is nearing an astounding ten thousand open source
projects.

I'm trying to figure out why numpy (or any project, really) has either
unique needs or people better qualified to write a CoC than any other
project or community. So much like OSS licences -- it's much better to pick
an established one than write your own.

For the record, the Covenant does have a laundry list of "classes", that
does not include political belief, but does mention "political" here:

"""
Examples of unacceptable behavior by participants include:
...
Trolling, insulting/derogatory comments, and personal or political attacks
 ...
"""

-CHB


Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adoption of a Code of Conduct

2018-08-03 Thread Chris Barker
One other thought:

Given Jupyter, numpy, scipy, matplotlib?, etc, are all working on a CoC --
maybe we could have NumFocus take a lead on this for the whole community?

I think most (all?) of the NumFocus projects have essentially the same
goals in this regard.

-CHB





On Fri, Aug 3, 2018 at 9:44 AM, Chris Barker  wrote:

> On Fri, Aug 3, 2018 at 8:59 AM, Hameer Abbasi 
> wrote
>>
>>
>> I’ve created a PR, and I’ve kept the language “not too stern”.
>> https://github.com/scipy/scipy/pull/9109
>>
>
> Thanks -- for ease of this thread, the sentence Hameer added is:
>
> "We expect that you will extend the same courtesy and open-mindedness
> towards other members of the SciPy community."
>
> LGTM
>
> -CHB
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adoption of a Code of Conduct

2018-08-03 Thread Chris Barker
On Fri, Aug 3, 2018 at 8:59 AM, Hameer Abbasi 
wrote
>
>
> I’ve created a PR, and I’ve kept the language “not too stern”.
> https://github.com/scipy/scipy/pull/9109
>

Thanks -- for ease of this thread, the sentence Hameer added is:

"We expect that you will extend the same courtesy and open-mindedness
towards other members of the SciPy community."

LGTM

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Roadmap proposal, v3

2018-07-25 Thread Chris Barker - NOAA Federal
> Obviously the string dtype proposal in the roadmap is only a sketch at this 
> point :).
>
> I do think that options listed currently (encoded strings with fixed-width 
> storage and variable length strings) cover the breadth of proposals from last 
> time. We may not want to implement all of them in NumPy, but I think we can 
> agree that there are use cases for all them, even if only as external dtypes?

Maybe :-) — but I totally agree that more complete handling of strings
should be on the roadmap.

> Would it help to add "and/or" after the first bullet? Mostly I care about 
> having like to have "improve string dtypes" in some form on the roadmap, and 
> thought it would be helpful to list the concrete proposals that I recall.

Sure, something like "and/or" that makes it clear that the details are
yet to be determined would be great.

> The actual design choices (especially if we proposal to change any default 
> behavior) will certainly need a NEP.

Then that will be the place to hash out the details — perfect.

I just got a little concerned that a not-well-vetted solution was
getting nailed down in the roadmap.

-CHB
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Roadmap proposal, v3

2018-07-25 Thread Chris Barker - NOAA Federal
Great work, thanks!

I see this:


 “- Fixed width encoded strings (utf8, latin1, ...)”

And a bit of discussion in the PR.

But I think there are key questions to be addressed in handling strings in
numpy. I know we’ve had a lot of discussion about it on this list over the
years — but is there a place that has captured that discussion, and/or where
we can start a new one?

For example, I am very wary of putting a non-fixed-width encoding (e.g.
UTF-8) in a fixed-width field.
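
To make the worry concrete: a single code point can take one to four bytes
in UTF-8, so fixed width in bytes is not fixed width in characters:

>>> [len(c.encode("utf-8")) for c in "aé€😀"]
[1, 2, 3, 4]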

But this PR is not the place to discuss that.

-CHB



Sent from my iPhone

On Jul 24, 2018, at 3:21 PM, Hameer Abbasi 
wrote:

Hey Stefan/Ralf/Stephan,

This looks nice, generally what the community agrees on. Great work, and
thanks for putting this together.

Best regards,
Hameer Abbasi
Sent from Astro for Mac

On 24. Jul 2018 at 21:04, Stefan van der Walt  wrote:


Hi everyone,

Please take a look at the latest roadmap proposal:

https://github.com/numpy/numpy/pull/11611

This is a living document, so can easily be modified in the future, but
we'd like to get in place a document that corresponds fairly closely
with current community priorities.

Best regards,
Stéfan
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Dropping Python 3.4 support for NumPy 1.16

2018-06-13 Thread Chris Barker
>
> I think NumPy 1.16 would be a good time to drop Python 3.4 support.
>>
>
+1

Using python3 before 3.5 was still kinda "bleeding edge" -- so projects are
more likely to be actively upgrading.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Chris Barker
On Fri, Jun 1, 2018 at 9:46 AM, Chris Barker  wrote:

> numpy is also quite a bit slower than raw python for math with (very)
> small arrays:
>

Doing a bit more experimentation, the advantage is with pure python for
over 10 elements (I got bored...). But I noticed that the time for the numpy
computation is pretty much constant from 2 up to around 100 elements. Which
implies that the bulk of the issue is with "startup" costs, rather than
fancy indexing or anything like that -- so maybe a shortcut wouldn't be
helpful.

Note if you use a list comp (the pythonic translation of an array
operation) the crossover point is about 15 elements (in my tests, on my
machine...).

In [90]: % timeit t2 = [x * 10 for x in t]

920 ns ± 4.88 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

-CHB




> In [31]: % timeit t2 = (t[0] * 10, t[1] * 10)
> 162 ns ± 0.79 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>
> In [32]: a
> Out[32]: array([ 3.4,  5.6])
>
> In [33]: % timeit a2 = a * 10
> 941 ns ± 7.95 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
>
> (I often want to do this sort of thing, not for performance, but for ease
> of computation -- say you have two or three coordinates that represent a
> point -- it's really nice to be able to scale or shift with array
> operations, rather than all that indexing -- but it is pretty slow with
> numpy.)
>
> I've wondered if numpy could be optimized for small 1D arrays, and maybe
> even 2d arrays with a small fixed second dimension (N x 2, N x 3), by
> special-casing / short-cutting those cases.
>
> It would require some careful profiling to see if it would help, but it
> sure seems possible.
>
> And maybe scalars could be fit into the same system.
>
> -CHB
>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-01 Thread Chris Barker
On Fri, Jun 1, 2018 at 4:43 AM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:


>  one thing that always slightly annoyed me is that numpy math is way
> slower for scalars than python math
>

numpy is also quite a bit slower than raw python for math with (very) small
arrays:

In [31]: % timeit t2 = (t[0] * 10, t[1] * 10)
162 ns ± 0.79 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [32]: a
Out[32]: array([ 3.4,  5.6])

In [33]: % timeit a2 = a * 10
941 ns ± 7.95 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)


(I often want to do this sort of thing, not for performance, but for ease
of computation -- say you have two or three coordinates that represent a
point -- it's really nice to be able to scale or shift with array
operations, rather than all that indexing -- but it is pretty slow with
numpy.)

I've wondered if numpy could be optimized for small 1D arrays, and maybe
even 2d arrays with a small fixed second dimension (N x 2, N x 3), by
special-casing / short-cutting those cases.

It would require some careful profiling to see if it would help, but it
sure seems possible.

And maybe scalars could be fit into the same system.

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] best way of speeding up a filtering-like algorithm

2018-03-29 Thread Chris Barker
sorry, not enough time to look closely, but a couple general comments:

On Wed, Mar 28, 2018 at 5:56 PM, Moroney, Catherine M (398E) <
catherine.m.moro...@jpl.nasa.gov> wrote:

> I have the following sample code (pretty simple algorithm that uses a
> rolling filter window) and am wondering what the best way is of speeding it
> up.  I tried rewriting it in Cython by pre-declaring the variables but that
> didn’t buy me a lot of time.  Then I rewrote it in Fortran (and compiled it
> with f2py) and now it’s lightning fast.
>

if done right, Cython should be almost as fast as Fortran, and just as fast
if you use "restrict" correctly (which I hope can be done in Cython):

https://en.wikipedia.org/wiki/Pointer_aliasing


> But I would still like to know if I could rewrite it in pure
> python/numpy/scipy
>

you can use stride_tricks to make arrays "appear" to be N+1 D, to implement
windows without actually duplicating the data, and then use array
operations on them. This can buy a lot of speed, but will not be as fast
(by a factor of 10 or so) as Cython or Fortran

see:

https://github.com/PythonCHB/IRIS_Python_Class/blob/master/Numpy/code/filter_example.py
for an example in 1D
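
For instance, a rolling mean along those lines might look like this (a
sketch using as_strided -- fine for read-only use, but never write through
such a view):

import numpy as np
from numpy.lib.stride_tricks import as_strided

def moving_mean(a, w):
    # view `a` as overlapping length-w windows -- no data is copied
    s = a.strides[0]
    windows = as_strided(a, shape=(a.size - w + 1, w), strides=(s, s))
    return windows.mean(axis=1)

print(moving_mean(np.arange(10.0), 3))   # [1. 2. 3. 4. 5. 6. 7. 8.]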



> or in Cython and get a similar speedup.
>
>
see above -- a direct port of your Fortran code to Cython should get you
within a factor of two or so of the Fortran, and then using "restrict" to
let the compiler know your pointers aren't aliased should get you the rest
of the way.

Here is an example of an Automatic Gain Control filter in 1D, implemented
in numpy with stride_tricks, and in C, Cython, and Fortran.

https://github.com/PythonCHB/IRIS_Python_Class/tree/master/Interfacing_C/agc_example

Note that in that example, I never got C or Cython as fast as Fortran --
but I think using "restrict" in the C would do it.

HTH,

-CHB



>
> Here is the raw Python code:
>
>
>
> def mixed_coastline_slow(nsidc, radius, count, mask=None):
>
>
>
> nsidc_copy = numpy.copy(nsidc)
>
>
>
> if (mask is None):
>
> idx_coastline = numpy.where(nsidc_copy == NSIDC_COASTLINE_MIXED)
>
> else:
>
> idx_coastline = numpy.where(mask & (nsidc_copy ==
> NSIDC_COASTLINE_MIXED))
>
>
>
> for (irow0, icol0) in zip(idx_coastline[0], idx_coastline[1]):
>
>
>
> rows = ( max(irow0-radius, 0), min(irow0+radius+1,
> nsidc_copy.shape[0]) )
>
> cols = ( max(icol0-radius, 0), min(icol0+radius+1,
> nsidc_copy.shape[1]) )
>
> window = nsidc[rows[0]:rows[1], cols[0]:cols[1]]
>
>
>
> npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True,
> False).sum()
>
> nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window <=
> NSIDC_FRESHSNOW), \
>
> True, False).sum()
>
>
>
> if (100.0*nsnowice/npoints >= count):
>
>  nsidc_copy[irow0, icol0] = MISR_SEAICE_THRESHOLD
>
>
>
> return nsidc_copy
>
>
>
> and here is my attempt at Cython-izing it:
>
>
>
> import numpy
>
> cimport numpy as cnumpy
>
> cimport cython
>
>
>
> cdef int NSIDC_SIZE  = 721
>
> cdef int NSIDC_NO_SNOW = 0
>
> cdef int NSIDC_ALL_SNOW = 100
>
> cdef int NSIDC_FRESHSNOW = 103
>
> cdef int NSIDC_PERMSNOW  = 101
>
> cdef int NSIDC_SEAICE_LOW  = 1
>
> cdef int NSIDC_SEAICE_HIGH = 100
>
> cdef int NSIDC_COASTLINE_MIXED = 252
>
> cdef int NSIDC_SUSPECT_ICE = 253
>
>
>
> cdef int MISR_SEAICE_THRESHOLD = 6
>
>
>
> def mixed_coastline(cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc, int
> radius, int count):
>
>
>
>  cdef int irow, icol, irow1, irow2, icol1, icol2, npoints, nsnowice
>
>  cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] nsidc2 \
>
> = numpy.empty(shape=(NSIDC_SIZE, NSIDC_SIZE), dtype=numpy.uint8)
>
>  cdef cnumpy.ndarray[cnumpy.uint8_t, ndim=2] window \
>
> = numpy.empty(shape=(2*radius+1, 2*radius+1), dtype=numpy.uint8)
>
>
>
>  nsidc2 = numpy.copy(nsidc)
>
>
>
>  idx_coastline = numpy.where(nsidc2 == NSIDC_COASTLINE_MIXED)
>
>
>
>  for (irow, icol) in zip(idx_coastline[0], idx_coastline[1]):
>
>
>
>   irow1 = max(irow-radius, 0)
>
>   irow2 = min(irow+radius+1, NSIDC_SIZE)
>
>   icol1 = max(icol-radius, 0)
>
>   icol2 = min(icol+radius+1, NSIDC_SIZE)
>
>   window = nsidc[irow1:irow2, icol1:icol2]
>
>
>
>   npoints = numpy.where(window != NSIDC_COASTLINE_MIXED, True,
> False).sum()
>
>   nsnowice = numpy.where( (window >= NSIDC_SEAICE_LOW) & (window
> <= NSIDC_FRESHSNOW), \
>
>   True, False).sum()
>
>
>
>   if (100.0*nsnowice/npoints >= count):
>
>nsidc2[irow, icol] = MISR_SEAICE_THRESHOLD
>
>
>
>  return nsidc2
>
>
>
> Thanks in advance for any advice!
>
>
>
> Catherine
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division

Re: [Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

2018-03-10 Thread Chris Barker
On Sat, Mar 10, 2018 at 1:27 PM, Matthew Rocklin  wrote:

> I'm very glad to see this discussion.
>

me too, but


> I think that coming up with a single definition of array-like may be
> difficult, and that we might end up wanting to embrace duck typing instead.
>

exactly -- I think there is a clear line between "uses the numpy memory
layout" and the Python API. But the python API is pretty darn big, and many
"array_ish" classes implement only part of it, and may even implement some
parts a bit differently. So it's really hard to have "one" definition, except
"Python API exactly like an ndarray" -- and I'm wondering how useful that is.

It seems to me that different array-like classes will implement different
> mixtures of features.  It may be difficult to pin down a single definition
> that includes anything except for the most basic attributes (shape and
> dtype?).
>

or a minimum set -- but again, how useful??


> Storage objects like h5py (support getitem in a numpy-like way)
>

Exactly -- though I don't know about h5py, but netCDF4 variables support a
useful subset of ndarray, but do "fancy indexing" differently -- so are they
ndarray_ish? -- sorry to coin yet another term :-)


> I can imagine authors of both groups saying that they should qualify as
> array-like because downstream projects that consume them should not convert
> them to numpy arrays in important contexts.
>

indeed. My solution so far is to define my own duck types "asarraylike"
that checks for the actual methods I need:

https://github.com/NOAA-ORR-ERD/gridded/blob/master/gridded/utilities.py

which has:

must_have = ['dtype', 'shape', 'ndim', '__len__', '__getitem__',
             '__getattribute__']

def isarraylike(obj):
    """
    tests if obj acts enough like an array to be used in gridded.
    This should catch netCDF4 variables and numpy arrays, at least, etc.
    Note: these won't check if the attributes required actually work right.
    """
    for attr in must_have:
        if not hasattr(obj, attr):
            return False
    return True

def asarraylike(obj):
    """
    If it satisfies the requirements of pyugrid the object is returned as is.
    If not, then numpy's array() will be called on it.
    :param obj: The object to check if it's like an array
    """
    return obj if isarraylike(obj) else np.array(obj)

It's possible that we could come up with semi-standard "groupings" of
attributes to produce "levels" of compatibility, or maybe not levels, but
independent groupings, so you could specify which groupings you need in a
given instance.


> The name "duck arrays" that we sometimes use doesn't necessarily mean
> "quack like an ndarray" but might actually mean a number of different
> things in different contexts.  Making a single class or predicate for duck
> arrays may not be as effective as we want.  Instead, it might be that we
> need a number of different protocols like `__array_mat_vec__` or 
> `__array_slice__`
> that downstream projects can check instead.  I can imagine cases where I
> want to check only "can I use this thing to multiply against arrays" or
> "can I get numpy arrays out of this thing with numpy slicing" rather than
> "is this thing array-like" because I may genuinely not care about most of
> the functionality in a blessed definition of "array-like".
>

exactly.

but maybe we won't know until we try.

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] improving arange()? introducing fma()?

2018-02-23 Thread Chris Barker
On Fri, Feb 9, 2018 at 1:16 PM, Matthew Harrigan  wrote:

> I apologize if I'm missing something basic, but why are floats being
> accumulated in the first place?  Can't arange and linspace operations with
> floats be done internally similar to `start + np.arange(num_steps) *
> step_size`?  I.e. always accumulate (really increment) integers to limit
> errors.
>

I haven't looked at the arange() code, but linspace does not accumulate
floats -- which is why it's already almost as good as it can be. As regards
a fused multiply-add, it does have to do a single multiply-add operation
for each value (as per your example code), so we may be able to save a ULP
there.

The problem with arange() is that the definition is poorly specified:

start + (step_num * step) while value < stop.

Even without fp issues, it's weird if (stop - start) / step is not an
integer -- the "final" step will not be the same as the rest.

Say you want a "grid" with fully integer values. if the step is just right,
all is easy:

In [72]: np.arange(0, 11, 2)
Out[72]: array([ 0,  2,  4,  6,  8, 10])

(this is assuming you want 10 as the end point.)

but then:

In [73]: np.arange(0, 11, 3)
Out[73]: array([0, 3, 6, 9])

but I wanted 10 as an end point. so:

In [74]: np.arange(0, 13, 3)
Out[74]: array([ 0,  3,  6,  9, 12])

hmm, that's not right either. Of course it's not -- you can't get 10 as an
end point, 'cause it's not a multiple of the step. With integers, you CAN
require that the end point be a multiple of the step, but with fp, you
can't require that it be EXACTLY a multiple, because either the end point
or the step may not be exactly representable, even if you do the math with
no loss of precision. And now you are stuck with the user figuring out for
themselves whether the closest fp representation of the end point is
slightly larger or smaller than the real value, so the < check will work.
NOT good.

This is why arange is simply not the tool to use.

Making a grid, you usually want to specify the end points and the number of
steps, which is almost what linspace does. Or, _maybe_ you want to specify
the step and the number of steps, and accept that the end point may not be
exactly what you "expect". There is no built-in function for this in numpy.
Maybe there should be, but it's pretty easy to write, as you show above --
see the sketch below.
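
Something like this minimal sketch (the function name is just illustrative):

import numpy as np

def grid_from_step(start, step, num_steps):
    # num_steps gaps -> num_steps + 1 points; the end point is whatever
    # start + num_steps * step rounds to (one multiply of error per point).
    return start + np.arange(num_steps + 1) * step

grid_from_step(-115.0, 0.01, 7001)   # last point lands very close to -44.99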

Anyone that understands FP better than I do:

In the above code, you are multiplying the step by an integer -- is there
any precision loss when you do that??

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] improving arange()? introducing fma()?

2018-02-22 Thread Chris Barker
On Thu, Feb 22, 2018 at 11:57 AM, Sebastian Berg  wrote:

> > First, you are right...Decimal is not the right module for this. I
> > think instead I should use the 'fractions' module for loading grid
> > spec information from strings (command-line, configs, etc). The
> > tricky part is getting the yaml reader to use it instead of
> > converting to a float under the hood.
>

I'm not sure fractions is any better (or necessary, anyway) -- in the end,
you need floats, so the inherent limitations of floats aren't the problem.
In your original use-case, you wanted a 32 bit float grid in the end, so
doing the calculations in 64 bit float and then downcasting is as good as
you're going to get, and easy and fast. And I suppose you could use 128 bit
float if you want to get to 64 bit in the end -- not as easy, and python
itself doesn't have it.

>  The
> tricky part is getting the yaml reader to use it instead of
> converting to a float under the hood.

64 bit floats support about 15 decimal digits -- are your string-based
sources providing more than that? If not, then the 64 bit float version is
as good as it's going to get.

> Second, what has been pointed out about the implementation of arange
> > actually helps to explain some oddities I have encountered. In some
> > situations, I have found that it was better for me to produce the
> > reversed sequence, and then reverse that array back and use it.
>

interesting -- though I'd still say "don't use arange" is the "correct"
answer.


> > Third, it would be nice to do what we can to improve arange()'s
> > results. Would we be ok with a PR that uses fma() if it is available,
> > but then falls back on a regular multiply and add if it isn't
> > available, or are we going to need to implement it ourselves for
> > consistency?
>

I would think calling fma() if supported would be fine -- if there is an
easy macro to check whether it's there. I don't know if numpy has a policy
about this sort of thing, but I'm pretty sure everywhere else, the final
details of computation fall back to the hardware/compiler/library (i.e.
Intel uses extended precision fp, other platforms don't, etc.), so I can't
see that having a slightly more accurate computation in arange on some
platforms and not others would cause a problem. If any of the tests are
checking to that level of accuracy, they should be fixed :-)

>   2. It sounds *not* like a fix, but rather a
>  "make it slightly less bad, but it is still awful"
>

exactly -- better accuracy is a good thing, but it's not the source of the
problem here -- the source of the problem is inherent to FP, and/or a poorly
specified goal. Having arange or linspace lose a couple fewer ULPs isn't
going to change anything.


> Using fma inside linspace might make linspace a bit more exact
> possible, and would be a good thing, though I am not sure we have a
> policy yet for something that is only used sometimes,


see above -- nor do I, but it seems like a fine idea to me.


> It also would be nice to add stable summation to numpy in general (in
> whatever way), which maybe is half related but on nobody's specific
> todo list.


I recall a conversation on this list a (long) while back about compensated
summation (Kahan summation) -- I guess nothing ever came of it?
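
For reference, the textbook compensated-summation loop is tiny -- a
pure-Python sketch:

def kahan_sum(values):
    # Kahan summation: `c` carries the low-order bits that a plain
    # running sum would lose at each addition.
    total = 0.0
    c = 0.0
    for x in values:
        y = x - c
        t = total + y
        c = (t - total) - y
        total = t
    return total

kahan_sum([0.1] * 10)   # closer to 1.0 than a plain left-to-right sum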

> Lastly, there definitely needs to be a better tool for grid making.
> > The problem appears easy at first, but it is fraught with many
> > pitfalls and subtle issues. It is easy to say, "always use
> > linspace()", but if the user doesn't have the number of pixels, they
> > will need to calculate that using --- gasp! -- floating point
> > numbers, which could result in the wrong answer.


agreed -- this tends to be an inherently over-specified problem:

min_value
max_value
spacing
number_of_grid_spaces

That is four values, and only three independent ones.

arange() looks like it uses: min_value, max_value, spacing -- but it
doesn't really (see previous discussion) so not the right tool for anything.

linspace() uses:  min_value, max_value, (number_of_grid_spaces + 1), which
is about as good as you can get (except for that annoying 1).

But what if you are given min_value, spacing, number_of_grid_spaces?

Maybe we need a function for that?? (which I think would simply be:

np.arange(number_of_grid_spaces + 1) * spacing

Which is why we probably don't need a function :-) (note that that's only
error of one multiplication per grid point)

Or maybe a way to take all four values, and return a "best fit" grid. The
problem with that is that it's over-specified, and it may not be only fp
error that makes it not fit. What should the code do???

So Ben:

What is the problem you are trying to solve?  -- I'm still confused. What
information do you have to define the grid? Maybe all we need are docs for
how to best compute a grid with given specifications? And point to them in
the arange() and linspace() docstrings.

-CHB

I once wanted to add a "step" argument to linspace, but 

Re: [Numpy-discussion] improving arange()? introducing fma()?

2018-02-22 Thread Chris Barker
@Ben: Have you found a solution to your problem? Are there things we could
do in numpy to make it better?

-CHB


On Mon, Feb 12, 2018 at 9:33 AM, Chris Barker <chris.bar...@noaa.gov> wrote:

> I think it's all been said, but a few comments:
>
> On Sun, Feb 11, 2018 at 2:19 PM, Nils Becker <nilsc.bec...@gmail.com>
> wrote:
>
>> Generating equidistantly spaced grids is simply not always possible.
>>
>
> exactly -- and linspace gives pretty much the best possible result,
> guaranteeing that the start and end points are exact, and the spacing is
> within an ULP or two (maybe we could make that within 1 ULP always, but not
> sure that's worth it).
>
>
>> The reason is that the absolute spacing of the possible floating point
>> numbers depends on their magnitude [1].
>>
>
> Also that the exact spacing may not be exactly representable in FP -- so
> you have to have at least one space that's a bit off to get the end points
> right (or have the endpoints not exact).
>
>
>> If you - for some reason - want the same grid spacing everywhere you may
>> choose an appropriate new spacing.
>>
>
> well, yeah, but usually you are trying to fit to some other constraint.
> I'm still confused as to where these couple of ULPs actually cause
> problems, unless you are doing inappropriate FP comparisons elsewhere.
>
> Curiously, either by design or accident, arange() seems to do something
>> similar as was mentioned by Eric. It creates a new grid spacing by adding
>> and subtracting the starting point of the grid. This often has similar
>> effect as adding and subtracting N*dx (e.g. if the grid is symmetric around
>> 0.0). Consequently, arange() seems to trade keeping the grid spacing
>> constant for a larger error in the grid size and consequently in the end
>> point.
>>
>
> interesting -- but it actually makes sense -- that is the definition of
> arange(), borrowed from range(), which was designed for integers, and, in
> fact, pretty much mirrored the classic C index for loop:
>
>
> for (int i=0; i<N; i++) {
> ...
>
>
> or in python:
>
> i = start
> while i < stop:
>     i += step
>
> The problem here is that termination criteria -- i < stop -- that is the
> definition of the function, and works just fine for integers (where it came
> from), but with FP, even with no error accumulation, stop may not be
> exactly representable, so you could end up with a value for your last item
> that is about (stop-step), or you could end up with a value that is a
> couple ULPs less than stop -- essentially including the end point when you
> weren't supposed to.
>
> The truth is, making a floating point range() was simply a bad idea to
> begin with -- it's not the way to define a range of numbers in floating
> point. Whiuch is why the docs now say "When using a non-integer step, such
> as 0.1, the results will often not
> be consistent.  It is better to use ``linspace`` for these cases."
>
> Ben wants a way to define grids in a consistent way -- make sense. And
> yes, sometimes, the original source you are trying to match (like GDAL)
> provides a starting point and step. But with FP, that is simply
> problematic. If:
>
> start + step*num_steps != stop
>
> exactly in floating point, then you'll need to do the math one way or
> another to get what you want -- and I'm not sure anyone but the user knows
> what they want -- do you want step to be as exact as possible, or do you
> want stop to be as exact as possible?
>
> All that being said -- if arange() could be made a tiny bit more accurate
> with fma or other numerical technique, why not? it won't solve the
> problem, but if someone writes and tests the code (and it does not require
> compiler or hardware features that aren't supported everywhere numpy
> compiles), then sure. (Same for linspace, though I'm not sure it's possible)
>
> There is one other option: a new function (or option) that makes a grid
> from a specification of: start, step, num_points. If that is really a
> common use case (that is, you don't care exactly what the end-point is),
> then it might be handy to have it as a utility.
>
> We could also have an arange-like function that, rather than < stop, would
> do "close to" stop. Someone that understands FP better than I might be able
> to compute what the expected error might be, and find the closest end point
> within that error. But I think that's a bad specification -- (stop - start)
> / step may be nowhere near an integer -- then what is the function supposed
> to do??
>
>
> BTW: I kind of wish that linspace specified the number of steps, rather
> than the number of points, that is

Re: [Numpy-discussion] improving arange()? introducing fma()?

2018-02-12 Thread Chris Barker
I think it's all been said, but a few comments:

On Sun, Feb 11, 2018 at 2:19 PM, Nils Becker  wrote:

> Generating equidistantly spaced grids is simply not always possible.
>

exactly -- and linspace gives pretty much the best possible result,
guaranteeing that the start and end points are exact, and the spacing is
within an ULP or two (maybe we could make that within 1 ULP always, but not
sure that's worth it).


> The reason is that the absolute spacing of the possible floating point
> numbers depends on their magnitude [1].
>

Also that the exact spacing may not be exactly representable in FP -- so
you have to have at least one space that's a bit off to get the end points
right (or have the endpoints not exact).


> If you - for some reason - want the same grid spacing everywhere you may
> choose an appropriate new spacing.
>

well, yeah, but usually you are trying to fit to some other constraint. I'm
still confused as to where these couple of ULPs actually cause problems,
unless you are doing inappropriate FP comparisons elsewhere.

> Curiously, either by design or accident, arange() seems to do something
> similar as was mentioned by Eric. It creates a new grid spacing by adding
> and subtracting the starting point of the grid. This often has similar
> effect as adding and subtracting N*dx (e.g. if the grid is symmetric around
> 0.0). Consequently, arange() seems to trade keeping the grid spacing
> constant for a larger error in the grid size and consequently in the end
> point.
>

interesting -- but it actually makes sense -- that is the definition of
arange(), borrowed from range(), which was designed for integers, and, in
fact, pretty much mirrored the classic C index for loop:


for (int i=0; i<N; i++) { ...

> 1. Comparison to calculations with decimal can be difficult as not all
> simple decimal step sizes are exactly representable as finite floating
> point numbers.

yeah, this is what I mean by inappropriate use of Decimal -- decimal is not
inherently "more accurate" than fp -- it just can represent _decimal_
numbers exactly, which we are all used to -- we want 1 / 10 to be exact,
but don't mind that 1 / 3 isn't.

Decimal also provides variable precision -- so it can be handy for that. I
kinda wish Python had an arbitrary precision binary floating point built
in...

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer


Re: [Numpy-discussion] improving arange()? introducing fma()?

2018-02-09 Thread Chris Barker
On Wed, Feb 7, 2018 at 12:09 AM, Ralf Gommers 
wrote:
>
>  It is partly a plea for some development of numerically accurate
>> functions for computing lat/lon grids from a combination of inputs: bounds,
>> counts, and resolutions.
>>
>
Can you be more specific about what problems you've run into -- I work with
lat-lon grids all the time, and have never had a problem.

float32 degrees gives you about 1 meter accuracy or better, so I can see
how losing a few digits might be an issue, though I would argue that you
maybe shouldn't use float32 if you are worried about anything close to 1m
accuracy... -- or shift to a relative coordinate system of some sort.

>> I have been playing around with the decimal package a bit lately,
>>
>
sigh. decimal is so often looked at as a solution to a problem it isn't
designed for. lat-lon is natively sexagesimal -- maybe we need that dtype
:-)

what you get from decimal is variable precision -- maybe a binary variable
precision lib is a better answer -- that would be a good thing to have easy
access to in numpy, but in this case, if you want better accuracy in a
computation that will end up in float32, just use float64.

>> and I discovered the concept of "fused multiply-add" operations for
>> improved accuracy. I have come to realize that fma operations could be used
>> to greatly improve the accuracy of linspace() and arange().
>>
>
arange() is problematic for non-integer use anyway, by its very definition
(getting the "end point" correct requires the right step, even without FP
error).

and would it really help with linspace? it's computing a delta with one
division in fp, then multiplying it by an integer (represented in fp --
why? why not keep that an integer till the multiply?).

>> In particular, I have been needing improved results for computing
>> latitude/longitude grids, which tend to be done in float32's to save memory
>> (at least, this is true in data I come across).
>>
>
> If you care about saving memory *and* accuracy, wouldn't it make more
> sense to do your computations in float64, and convert to float32 at the
> end?
>

that does seem to be the easy option :-)


>> Now, to the crux of my problem. It is next to impossible to generate a
>> non-trivial numpy array of coordinates, even in double precision, without
>> hitting significant numerical errors.
>>
>
I'm confused, the example you posted doesn't have significant errors...


> Which has lead me down the path of using the decimal package (which
>> doesn't play very nicely with numpy because of the lack of casting rules
>> for it). Consider the following:
>> ```
>> $ cat test_fma.py
>> from __future__ import print_function
>> import numpy as np
>> res = np.float32(0.01)
>> cnt = 7001
>> x0 = np.float32(-115.0)
>> x1 = res * cnt + x0
>> print("res * cnt + x0 = %.16f" % x1)
>> x = np.arange(-115.0, -44.99 + (res / 2), 0.01, dtype='float32')
>> print("len(arange()): %d  arange()[-1]: %16f" % (len(x), x[-1]))
>> x = np.linspace(-115.0, -44.99, cnt, dtype='float32')
>> print("linspace()[-1]: %.16f" % x[-1])
>>
>> $ python test_fma.py
>> res * cnt + x0 = -44.9900015648454428
>> len(arange()): 7002  arange()[-1]:   -44.975044
>> linspace()[-1]: -44.9900016784667969
>> ```
>> arange just produces silly results (puts out an extra element... adding
>> half of the resolution is typically mentioned as a solution on mailing
>> lists to get around arange()'s limitations -- I personally don't do this).
>>
>
The real solution is "don't do that" -- arange is not the right tool for
the job.

Then there is this:

res * cnt + x0 = -44.9900015648454428
linspace()[-1]: -44.9900016784667969

that's as good as you are ever going to get with 32 bit floats...

Though I just noticed something about your numbers -- there should be a
nice even base ten delta if you have 7001 gaps -- but linspace produces N
points, not N gaps -- so maybe you want:


In [17]: l = np.linspace(-115.0, -44.99, 7002)

In [18]: l[:5]
Out[18]: array([-115.  , -114.99, -114.98, -114.97, -114.96])

In [19]: l[-5:]
Out[19]: array([-45.03, -45.02, -45.01, -45.  , -44.99])


or, in float32 -- not as pretty:


In [20]: l = np.linspace(-115.0, -44.99, 7002, dtype=np.float32)

In [21]: l[:5]
Out[21]:
array([-115., -114.98999786, -114.98000336, -114.97000122,
       -114.9508], dtype=float32)

In [22]: l[-5:]
Out[22]: array([-45.02999878, -45.0246, -45.00999832, -45.,
                -44.99000168], dtype=float32)


but still as good as you get with float32, and exactly the same result as
computing in float64 and converting:



In [25]: l = np.linspace(-115.0, -44.99, 7002).astype(np.float32)

In [26]: l[:5]
Out[26]:
array([-115., -114.98999786, -114.98000336, -114.97000122,
       -114.9508], dtype=float32)

In [27]: l[-5:]
Out[27]: array([-45.02999878, -45.0246, -45.00999832, -45.,
                -44.99000168], dtype=float32)



>> So, does it make any sense to improve arange by utilizing 

Re: [Numpy-discussion] Extending C with Python

2018-01-31 Thread Chris Barker
I'm guessing you could use Cython to make this easier. It's usually used
for calling C from Python, but can do the sandwich in both directions...

Just a thought -- it will help with some of that boilerplate code...

-CHB




On Tue, Jan 30, 2018 at 10:57 PM, Jialin Liu  wrote:

> Amazing! It works! Thank you Robert.
>
> I've been stuck with this many days.
>
> Best,
> Jialin
> LBNL/NERSC
>
> On Tue, Jan 30, 2018 at 10:52 PM, Robert Kern 
> wrote:
>
>> On Wed, Jan 31, 2018 at 3:25 PM, Jialin Liu  wrote:
>>
>>> Hello,
>>> I'm extending C with python (which is opposite way of what people
>>> usually do, extending python with C), I'm currently stuck in passing a C
>>> array to python layer, could anyone plz advise?
>>>
>>> I have a C buffer in my C code and want to pass it to a python function.
>>> In the C code, I have:
>>>
>>> npy_intp  dims [2];
 dims[0] = 10;
 dims[1] = 20;
 import_array();
 npy_intp m=2;
 PyObject * py_dims = PyArray_SimpleNewFromData(1, &m, NPY_INT16, (void *)dims);  // I also tried NPY_INT
 PyObject_CallMethod(pInstance, method_name, "O", py_dims);
>>>
>>>
>>> In the Python code, I want to just print that array:
>>>
>>> def f(self, dims):
>>>
>>>print ("np array:%d,%d"%(dims[0],dims[1]))
>>>
>>>
>>>
>>> But it only prints the first number correctly, i.e., dims[0]. The second
>>> number is always 0.
>>>
>>
>> The correct typecode would be NPY_INTP.
>>
>> --
>> Robert Kern
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-30 Thread Chris Barker
On Mon, Jan 29, 2018 at 7:44 PM, Allan Haldane 
wrote:

> I suggest that if we want to allow either means over fields, or conversion
> of a n-D structured array to an n+1-D regular ndarray, we should add a
> dedicated function to do so in numpy.lib.recfunctions
> which does not depend on the binary representation of the array.
>

IIUC, the core use-case of structured dtypes is binary compatibility with
external systems (arrays of C structs, mostly) -- at least that's how I use
them :-)

In which case, "conversion of a n-D structured array to an n+1-D regular
ndarray" is an important feature -- actually even more important if you
don't use recarrays

So yes, let's have a utility to make that easy.
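
For what it's worth, the view trick below is a minimal sketch of such a
conversion -- valid only because every field shares one dtype and the array
is packed; a general utility would need to handle more cases. (Later numpy
versions did grow numpy.lib.recfunctions.structured_to_unstructured for
exactly this.)

import numpy as np

a = np.zeros(4, dtype=[('x', '<f8'), ('y', '<f8')])

# View the two like-typed fields as a regular (4, 2) float64 ndarray,
# with no copy -- only safe for packed, uniform-dtype structured arrays.
plain = a.view((np.float64, 2))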

as for recarrays -- are we that far from having them be robust and useful?
If not, why not keep them around, fix the few issues, but explicitly
not try to extend them into more dataframe-like domains?

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-26 Thread Chris Barker
On Fri, Jan 26, 2018 at 2:35 PM, Allan Haldane 
wrote:

> As I remember, numpy has some fairly convoluted code for array creation
> which tries to make sense of various nested lists/tuples/ndarray
> combinations. It makes a difference for structured arrays and object
> arrays. I don't remember the details right now, but I know in some cases
> the rule is "If it's a Python list, recurse, otherwise assume it is an
> object array".
>

that's at least explainable, and the "try to figure out what the user
means" array creation is pretty much an impossible problem, so what we've
got is probably about as good as it can get.

> > These points make me think that instead of a `.totuple` method, this
> > might be more suitable as a new function in np.lib.recfunctions.
> >
> > I don't seem to have that module -- and I'm running 1.14.0 -- is this a
> > new idea?
>
> Sorry, I didn't specify it correctly. It is "numpy.lib.recfunctions".
>

thanks -- found it.


> Also, the functions in that module encourage "pandas-like" use of
> structured arrays, but I'm not sure they should be used that way. I've
> been thinking they should be primarily used for binary interfaces
> with/to numpy, eg to talk to C programs or to read complicated binary
> files.
>

that's my use-case. And I agree -- if you really want to do that kind of
thing, pandas is the way to go.

I thought recarrays were pretty cool back in the day, but pandas is a much
better option.

So I pretty much only use structured arrays for data exchange with C
code

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-26 Thread Chris Barker
On Fri, Jan 26, 2018 at 10:48 AM, Allan Haldane 
wrote:

> > What do folks think about a totuple() method — even before this I’ve
> > wanted that. But in this case, it seems particularly useful.
>


> Two thoughts:
>
> 1. `totuple` makes most sense for 2d arrays. But what should it do for
> 1d or 3+d arrays? I suppose it could make the last dimension a tuple, so
> 1d arrays would give a list of tuples of size 1.
>

I was thinking it would be exactly like .tolist() but with tuples -- so
you'd get tuples all the way down (or is that turtles?)

In this use case, it would have saved me the generator expression:

(tuple(r) for r in arr)

not a huge deal, but it would be nice to not have to write that, and to
have the looping be in C with no intermediate array generation.
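
Something like this pure-Python sketch (so still Python-speed looping --
the point of a built-in would be doing this in C); `totuple` here is
hypothetical, not an existing ndarray method:

import numpy as np

def totuple(arr):
    # Like ndarray.tolist(), but tuples all the way down.
    if isinstance(arr, np.ndarray):
        return tuple(totuple(x) for x in arr)
    return arr

totuple(np.arange(6).reshape(2, 3))   # ((0, 1, 2), (3, 4, 5)), as numpy scalars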

> 2. structured array's .tolist() already returns a list of tuples. If we
> have a 2d structured array, would it add one more layer of tuples?


no -- why? it would return a tuple of tuples instead.


> That
> would raise an exception if read back in by `np.array` with the same dtype.
>

Hmm -- indeed, if the top-level structure is a tuple, the array constructor
gets confused:

This works fine -- as it should:


In [84]: new_full = np.array(full.tolist(), full.dtype)


But this does not:


In [85]: new_full = np.array(tuple(full.tolist()), full.dtype)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-85-...> in <module>()
----> 1 new_full = np.array(tuple(full.tolist()), full.dtype)

ValueError: could not assign tuple of length 4 to structure with 2 fields.

I was hoping it would dig down to the inner structures looking for a match
to the dtype, rather than looking at the type of the top level. Oh well.

So yeah, not sure where you would go from tuple to list -- probably at the
bottom level, but that may not always be unambiguous.

> These points make me think that instead of a `.totuple` method, this
> might be more suitable as a new function in np.lib.recfunctions.


I don't seem to have that module -- and I'm running 1.14.0 -- is this a new
idea?


> If the
> goal is to help manipulate structured arrays, that submodule is
> appropriate since it already has other functions do manipulate fields in
> similar ways. What about calling it `pack_last_axis`?
>
> def pack_last_axis(arr, names=None):
>     if arr.names:
>         return arr
>     names = names or ['f{}'.format(i) for i in range(arr.shape[-1])]
>     return arr.view([(n, arr.dtype) for n in names]).squeeze(-1)
>
> Then you could do:
>
> >>> pack_last_axis(uv).tolist()
>
> to get a list of tuples.
>

not sure what the idea is here -- in my example, I had a regular 2-d array,
so no names:

In [90]: pack_last_axis(uv)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-90-...> in <module>()
----> 1 pack_last_axis(uv)

<ipython-input-...> in pack_last_axis(arr, names)
      1 def pack_last_axis(arr, names=None):
----> 2     if arr.names:
      3         return arr
      4     names = names or ['f{}'.format(i) for i in range(arr.shape[-1])]
      5     return arr.view([(n, arr.dtype) for n in names]).squeeze(-1)

AttributeError: 'numpy.ndarray' object has no attribute 'names'


So maybe you meant something like:


In [95]: def pack_last_axis(arr, names=None):
    ...:     try:
    ...:         arr.names
    ...:         return arr
    ...:     except AttributeError:
    ...:         names = names or ['f{}'.format(i) for i in range(arr.shape[-1])]
    ...:         return arr.view([(n, arr.dtype) for n in names]).squeeze(-1)

which does work, but seems like a convoluted way to get tuples!

However, I didn't actually need tuples, I needed something I could pack
into a structured array, and this does work, without the tolist:

full = np.array(zip(time, pack_last_axis(uv)), dtype=dt)


So maybe that is the way to go.

I'm not sure I'd have thought to look for this function, but what can you
do?

Thanks for your attention to this,

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-25 Thread Chris Barker - NOAA Federal
> On Jan 25, 2018, at 4:06 PM, Allan Haldane  wrote:

>> 1) This is a known change with good reason?

> The
> change occurred because the old assignment behavior was dangerous, and
> was not doing what you thought.

OK, that’s a good reason!

>> A) improve the error message.
>
> Good idea. I'll see if we can do it for 1.14.1.

What do folks think about a totuple() method — even before this I’ve
wanted that. But in this case, it seems particularly useful.

-CHB
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Setting custom dtypes and 1.14

2018-01-25 Thread Chris Barker
Hi all,

I'm pretty sure this is the same thing as recently discussed on this list
about 1.14, but to confirm:

I had failures in my code with an upgrade for 1.14 -- turns out it was a
single line in a single test fixture, so no big deal, but a regression just
the same, with no deprecation warning.

I was essentially doing this:

In [48]: dt
Out[48]: dtype([('time', '<f8'), ('uv', '<f8', (2,))])

In [49]: full = np.array(zip(time, uv), dtype=dt)


ValueError: setting an array element with a sequence.


It took some poking, but the solution was to do:

full = np.array(zip(time, (tuple(w) for w in uv)), dtype=dt)

That is, convert the values to nested tuples, rather than an array in a
tuple, or a list in a tuple.
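
To make the tuple requirement concrete, a minimal sketch with an assumed
dtype and toy data (field names and shapes are just illustrative):

import numpy as np

dt = np.dtype([('time', '<f8'), ('uv', '<f8', (2,))])
time = np.array([0.0, 1.0])
uv = np.array([[1.0, 2.0], [3.0, 4.0]])

# works: each record is a tuple, with the nested field also a tuple
full = np.array([(t, tuple(w)) for t, w in zip(time, uv)], dtype=dt)

# raises "ValueError: setting an array element with a sequence." under 1.14,
# because the inner value is an ndarray rather than a tuple:
# full = np.array([(t, w) for t, w in zip(time, uv)], dtype=dt)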

As I said, my problem is solved, but to confirm:

1) This is a known change with good reason?

2) My solution was the best (only) one -- the only way to set a nested
dtype like that is with tuples?

If so, then I think we should:

A) improve the error message.

"ValueError: setting an array element with a sequence."

is not really clear -- I spent a while trying to figure out how I could set
a nested dtype like that without a sequence -- and I was actually using an
ndarray, so it wasn't even a generic sequence. And a tuple is a sequence,
too...

I had a vague recollection that in some circumstances, numpy treats tuples
and lists (and arrays) differently (fancy indexing??), so I tried the tuple
thing and that worked. But I've been around numpy a long time -- that could
have been very very confusing to many people.

So could the message be changed to something like:

"ValueError: setting an array element with a generic sequence. Only the
tuple type can be used in this context."

or something like that -- I'm not sure where else this same error message
might pop up, so that could be totally inappropriate.


2) maybe add a .totuple() method to ndarray, much like the .tolist() method?
That would have been handy here.


-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Using np.frombuffer and cffi.buffer on array of C structs (problem with struct member padding)

2018-01-25 Thread Chris Barker - NOAA Federal
The numpy dtype constructor takes an “align” keyword that will pad it for
you.


https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dtype.html

-CHB
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Using np.frombuffer and cffi.buffer on array of C structs (problem with struct member padding)

2018-01-25 Thread Chris Barker - NOAA Federal
Sent from my iPhone

> On Jan 25, 2018, at 6:51 AM, Joe  wrote:
>
> Hello,
>
> how I could dynamically handle the dtype of a structured array when reading
> an array of C structs with np.frombuffer (regarding the member padding in the 
> struct).
>
> So far I manually adjusted the dtype of the structured array and added a 
> field for the padding,
> this works on a small scale.
> The structs are not defined within my code, but within a third party and
> basically I am looking for no-worry hassle free way to handle this, because 
> there are a lot of structs
>
> Is there some smart way to do this in Numpy?

The numpy dtype constructor takes an “align” keyword that will pad it for you.
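
A minimal sketch of what align=True does (the padding shown matches the
typical C compiler layout for this case):

import numpy as np

packed  = np.dtype([('a', 'u1'), ('b', '<f8')])              # itemsize 9, 'b' at offset 1
aligned = np.dtype([('a', 'u1'), ('b', '<f8')], align=True)  # itemsize 16, 'b' at offset 8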

However, if these structs are coming from a lib compiled by a third
party, I'm not sure you can count on the alignment rules being the
same.

So maybe you will need to use the cffi functions :-(

-CHB


>
> So far the best approach seems to parse the struct with the cffi functions
> ffi.sizeof(), ffi.offsetof() and maybe ffi.alignof() to find out where the 
> padding
> happens and add it dynamically to the dtype. But maybe someone has a
> smarter idea how to solve this.
>
> You can find a more detailed description and a working example here:
>
> https://stackoverflow.com/questions/48423725/how-to-handle-member-padding-in-struct-when-reading-cffi-buffer-with-numpy-fromb
>
> Kind regards,
> Joe
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy 1.14.0rc1 release

2017-12-14 Thread Chris Barker - NOAA Federal
Thanks Chuck!

And a huge thanks to that awesome list of contributors!!!

-Chris

Sent from my iPhone

On Dec 13, 2017, at 2:55 PM, Charles R Harris 
wrote:

Hi All,

On behalf of the NumPy team, I am pleased to announce NumPy 1.14.0rc1. Numpy
1.14.0rc1 is the result of seven months of work and contains a large number
of bug fixes and new features, along with several changes with potential
compatibility issues. The major change that users will notice are the
stylistic changes in the way numpy arrays and scalars are printed, a change
that will affect doctests. See the release notes for details on how to
preserve the old style printing when needed.

A major decision affecting future development concerns the schedule for
dropping Python 2.7 support in the runup to 2020. The decision has been
made to support 2.7 for all releases made in 2018, with the last release
being designated a long term release with support for bug fixes extending
through the end of 2019. Starting from January, 2019 support for 2.7 will
be dropped in all new releases. More details can be found in the relevant
NEP

.

This release supports Python 2.7 and 3.4 - 3.6. Wheels for the pre-release
are available on PyPI. Source tarballs, zipfiles, release notes, and the
changelog are available on github
.


*Highlights*

   - The ``np.einsum`` function uses BLAS when possible

   - ``genfromtxt``, ``loadtxt``, ``fromregex`` and ``savetxt`` can now
   handle files with arbitrary Python supported encoding.

   - Major improvements to printing of NumPy arrays and scalars.


*New functions*

   - ``parametrize``: decorator added to numpy.testing


   - ``chebinterpolate``: Interpolate function at Chebyshev points.


   - ``format_float_positional`` and ``format_float_scientific`` : format
   floating-point scalars unambiguously with control of rounding and padding.


   - ``PyArray_ResolveWritebackIfCopy`` and
   ``PyArray_SetWritebackIfCopyBase``, new C-API functions useful in achieving
PyPy compatibility.


*Contributors*

A total of 101 people contributed to this release.  People with a "+" by
their names contributed a patch for the first time.

   - Alexey Brodkin +
   - Allan Haldane
   - Andras Deak +
   - Andrew Lawson +
   - Antoine Pitrou
   - Bernhard M. Wiedemann +
   - Bob Eldering +
   - Brandon Carter
   - CJ Carey
   - Charles Harris
   - Chris Lamb
   - Christoph Boeddeker +
   - Christoph Gohlke
   - Daniel Hrisca +
   - Daniel Smith
   - Danny Hermes
   - David Freese
   - David Hagen
   - David Linke +
   - David Schaefer +
   - Dillon Niederhut +
   - Egor Panfilov +
   - Emilien Kofman
   - Eric Wieser
   - Erik Bray +
   - Erik Quaeghebeur +
   - Garry Polley +
   - Gunjan +
   - Henke Adolfsson +
   - Hidehiro NAGAOKA +
   - Hong Xu +
   - Iryna Shcherbina +
   - Jaime Fernandez
   - James Bourbeau +
   - Jamie Townsend +
   - Jarrod Millman
   - Jean Helie +
   - Jeroen Demeyer +
   - John Goetz +
   - John Kirkham
   - John Zwinck
   - Jonathan Helmus
   - Joseph Fox-Rabinovitz
   - Joseph Paul Cohen +
   - Joshua Leahy +
   - Julian Taylor
   - Jörg Döpfert +
   - Keno Goertz +
   - Kevin Sheppard +
   - Kexuan Sun +
   - Konrad Kapp +
   - Kristofor Maynard +
   - Licht Takeuchi +
   - Loïc Estève
   - Lukas Mericle +
   - Marten van Kerkwijk
   - Matheus Portela +
   - Matthew Brett
   - Matti Picus
   - Michael Lamparski +
   - Michael Odintsov +
   - Michael Schnaitter +
   - Michael Seifert
   - Mike Nolta
   - Nathaniel J. Smith
   - Nelle Varoquaux +
   - Nicholas Del Grosso +
   - Nico Schlömer +
   - Oleg Zabluda +
   - Oleksandr Pavlyk
   - Pauli Virtanen
   - Pim de Haan +
   - Ralf Gommers
   - Robert T. McGibbon +
   - Roland Kaufmann
   - Sebastian Berg
   - Serhiy Storchaka +
   - Shitian Ni +
   - Spencer Hill +
   - Srinivas Reddy Thatiparthy +
   - Stefan Winkler +
   - Stephan Hoyer
   - Steven Maude +
   - SuperBo +
   - Thomas Köppe +
   - Toon Verstraelen
   - Vedant Misra +
   - Warren Weckesser
   - Wirawan Purwanto +
   - Yang Li +
   - Ziyan +
   - chaoyu3 +
   - hemildesai +
   - k_k...@yahoo.com +
   - nickdg +
   - orbit-stabilizer +
   - schnaitterm +
   - solarjoe
   - wufangjie +
   - xoviat +
   - Élie Gouzien +

Enjoy,

Chuck

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP process update

2017-12-07 Thread Chris Barker
Great idea -- thanks for pushing this forward all.

In the end, you can have the NEPs in a separate repo, and still publish
them closely with the main docs (intersphinx is pretty cool), or have them
in the same repo and publish them separately.

So I say let the folks doing the work decide what workflow works best for
them.

Comments on a couple other points:

I find myself going back to PEPs quite a bit -- mostly to understand the
hows and whys of a feature, rather than the how-to-use-it details.

And yes -- we should keep NEPs updated -- they certainly should be edited
for typos and minor clarifications, but it's particularly important if the
implementation ends up differing a bit from what was expected when the NEP
was written.

I'm not sure what the PEP policy is about this, but they are certainly
maintained with regard to typos and the like.

-CHB


On Wed, Dec 6, 2017 at 10:43 AM, Charles R Harris  wrote:

>
>
> On Wed, Dec 6, 2017 at 7:23 AM, Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> Would be great to have structure, and especially a template - ideally,
>> the latter is enough for someone to create a NEP, i.e., has lots of
>> in-template documentation.
>>
>> One thing I'd recommend thinking a little about is to what extend a
>> NEP is "frozen" after acceptance. In astropy we've seen situations
>> where it helps to clarify details later, and it may be good to think
>> beforehand what one wants. In my opinion, one should allow
>> clarifications of accepted NEPs, and major editing of ones still
>> pending (as happened for __[numpy|array]_ufunc__).
>>
>> I think the location is secondary, but for what it is worth, I'm not
>> fond of the astropy APEs being in a separate repository, mostly
>> because I like detailed discussion of everything related in the
>> project to happen in one place on github. Also, having to clone a
>> repository is yet another hurdle for doing stuff. So, I'd suggest to
>> keep the NEPs in the main repository.
>
>
> +1
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Deprecate matrices in 1.15 and remove in 1.17?

2017-12-01 Thread Chris Barker
On Thu, Nov 30, 2017 at 11:58 AM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Your point about not doing things in the python 2->3 move makes sense;
>

But this is NOT the 2->3 move -- numpy has been py3 compatible for years. At
some point, it is a really good idea to deprecate some things.

Personally, I think Matrix should have been deprecated a good while ago --
it never really worked well, and folks have been advised not to use it for
years. But anyway, once we can count on having @ then there really is no
reason to have Matrix, so it happens that dropping py2 support is the first
time we can count on that. But this is really deprecating something when we
stop support for py < 3.5, not the py2 to py3 transition.

Remember that deprecating is different than dropping. If we want to keep
Matrix around for one release after py2 is dropped, so that people can use
it once they are "forced" to move to py3, OK, but let's get clear
deprecation plan in place.

Also -- we aren't requiring people to move to py3 -- we are only requiring
people to move to py3 if they want the latest numpy features.

One last note: Guido's suggestion that libraries not take py3 as an
opportunity to change APIs was a good one, but it was also predicated on
the fact that py2+py3 support was going to be needed for a good while. So
this is really a different case. It's really a regular old deprecation --
you aren't going to have this feature in future numpy releases -- py2/3 has
nothing to do with it.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Type annotations for NumPy

2017-11-28 Thread Chris Barker - NOAA Federal
(a) it would be good if NumPy type annotations could include an
“array_like” type that allows lists, tuples, etc.


I think that would be a sequence — already supported by the Typing system.
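
A rough sketch of that idea with the existing typing module (the alias name
is just illustrative, not an agreed-on spelling):

from typing import Sequence, Union

import numpy as np

# "array_like", to a first approximation: an ndarray or any sequence of floats.
ArrayLike = Union[np.ndarray, Sequence[float]]

def scale(data: ArrayLike, factor: float) -> np.ndarray:
    return np.asarray(data) * factor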

(b) I’ve always thought (since PEP561) that it would be cool for type
annotations to replace compiler type annotations for e.g. Cython and Numba.
Is this in the realm of possibility for the future?


Well, this was brought up early in the Typing discussion, and it was made
clear that these kinds of truly static types, as needed by Cython, was a
non-goal of the project.

That being said, perhaps it could be made to work with a bunch of
additional type objects.

And we should look to Cython for ideas about how to type numpy arrays.

One note: in addition to shape (rank) and dtypes, there is contiguity and C
or F order. That may want to be considered.

-CHB
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Type annotations for NumPy

2017-11-28 Thread Chris Barker - NOAA Federal
On Nov 25, 2017, at 3:35 PM, Matthew Rocklin  wrote:

Thoughts on basing this on a more generic Array type rather than the
np.ndarray?


This would actually be more consistent with the current python typing
approach.

I can imagine other nd-array libraries (XArray, Tensorflow, Dask.array)
wanting to reuse this work.


It may be tough to come up with the right ABC though— see another recent
thread on this list.

-CHB
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support

2017-11-17 Thread Chris Barker
On Fri, Nov 17, 2017 at 4:35 AM, Peter Cock 
wrote:

> Since Konrad Hinsen no longer follows the NumPy discussion list
> for lack of time, he has not posted here - but he has commented
> about this on Twitter and written up a good blog post:
>
> http://blog.khinsen.net/posts/2017/11/16/a-plea-for-
> stability-in-the-scipy-ecosystem/
>
> In a field where scientific code is expected to last and be developed
> on a timescale of decades, the change of pace with Python 2 and 3
> is harder to handle.
>

sure -- but I do not get what the problem is here!

from his post:

"""
The disappearance of Python 2 will leave much scientific software orphaned,
and many published results irreproducible.
"""

This is an issue we should all be concerned about, and, in fact, the scipy
community has been particularly active in the reproducibility realm.

BUT: that statement makes NO SENSE. dropping Python2 support in numpy (or
any other package) means that newer versions of numpy will not run on py2
-- but if you want to reproduce results, you need to run the code WITH THE
VERSION THAT WAS USED IN THE  FIRST PLACE.

So if someone publishes something based on code written in python2.7 and
numpy 1.13, then it is not helpful for reproducibility at all for numpy
1.18 (or 2.*, or whatever we call it) to run on python2. So there is no
issue here.

Potential issues will arise post 2020, when maybe python2.7 (and numpy
1.13) will no longer run on an up to date OS. But the OS vendors do a
pretty good job of backward compatibility -- so we've got quite a few years
to go on that.

And it will also be important that older versions of packages are available
-- but as long as we don't delete the archives, that should be the case for
a good long while.

So not sure what the problem is here.

Not relevant for reproducibility, but I have always been puzzled that folks
often desperately want to run the very latest numpy on an old Python (2.6,
1.5, ...) -- if you can update your numpy, update your darn Python too!

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support

2017-11-13 Thread Chris Barker
On Fri, Nov 10, 2017 at 2:03 PM, Robert McLeod  wrote:

> Pip repo names and actual module names don't have to be the same.  One
> potential work-around would be to make a 'numpylts' repo on PyPi which is
> the 1.17 version with support for Python 2.7 and bug-fix releases as
> required.  This will still cause regressions but it's a matter of modifying
> `requirements.txt` in downstream Python 2.7 packages and not much else.
>
> E.g. in `requirements.txt`:
>
> numpy;python_version>"3.0"
> numpylts; python_version<"3.0"
>


Can't we handle this with numpy versioning?

IIUC, numpy (py3 only) and numpy (LTS) will not only support different
platforms, but also be different versions. So if you have py2 or py2+3 code
that uses numpy, it will have to specify a <= version number anyway.

Also -- I think Nathaniel's point was that wheels have the python version
baked in, so pip, when run from py2, should find the latest py2 compatible
numpy automagically.

And thanks for writing this up -- LGTM

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support

2017-11-08 Thread Chris Barker
On Wed, Nov 8, 2017 at 11:08 AM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:

>
> Would dropping python2 support for windows earlier than the other
> platforms a reasonable approach?
>

no. I'm not a Windows fan myself, but it is a HUGE fraction of the userbase.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-07 Thread Chris Barker
On Mon, Nov 6, 2017 at 4:28 PM, Stephan Hoyer  wrote:

>
>> What's needed, though, is not just a single ABC. Some thought and design
>> needs to go into segmenting the ndarray API to declare certain behaviors,
>> just like was done for collections:
>>
>> https://docs.python.org/3/library/collections.abc.html
>>
>> You don't just have a single ABC declaring a collection, but rather "I am
>> a mapping" or "I am a mutable sequence". It's more of a pain for developers
>> to properly specify things, but this is not a bad thing to actually give
>> code some thought.
>>
>
> I agree, it would be nice to nail down a hierarchy of duck-arrays, if
> possible. Although, there are quite a few options, so I don't know how
> doable this is.
>

Exactly -- there are an exponential number of options...


> Well, to get the ball rolling a bit, the key thing that matplotlib needs
> to know is if `shape`, `reshape`, 'size', broadcasting, and logical
> indexing is respected. So, I see three possible abc's here: one for
> attribute access (things like `shape` and `size`) and another for shape
> manipulations (broadcasting and reshape, and assignment to .shape).


I think we're going to get into a string of ABCs:

ArrayLikeForMPL_ABC

etc, etc.


> And then a third abc for indexing support, although, I am not sure how
> that could get implemented...


This is the really tricky one -- all ABCs really check is the existence of
methods -- making sure they behave the same way is up to the developer of
the ducktype.

which is OK, but will require discipline.
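
To make that concrete, here's a rough sketch of what one of those ABCs
might look like (names and details hypothetical -- and note it checks
existence only):

from abc import ABC, abstractmethod

class HasShape(ABC):
    # hypothetical ABC for the attribute-access piece of the ndarray API
    @property
    @abstractmethod
    def shape(self): ...

    @property
    @abstractmethod
    def size(self): ...

    @classmethod
    def __subclasshook__(cls, C):
        # pure duck-type check: any class exposing shape and size "counts"
        if cls is HasShape:
            return all(hasattr(C, a) for a in ("shape", "size"))
        return NotImplemented

issubclass(np.ndarray, HasShape) then comes out True -- but nothing here
verifies that shape and size actually behave like ndarray's.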

But indexing, specifically fancy indexing, is another matter -- I'm not
sure if there is even a way with an ABC to check for what types of indexing
are supported, but we'd still have the problem of whether the semantics are
the same!

For example, I work with netcdf variable objects, which are partly
duck-typed as ndarrays, but I think n-dimensional fancy indexing works
differently... how in the world do you detect that with an ABC???

> For the shapes and reshaping, I wrote a ShapedLikeNDArray mixin/ABC
> for astropy, which may be a useful starting point as it also provides
> a way to implement the methods ndarray uses to reshape and get
> elements: see
> https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L863


Sounds like a good starting point for discussion.

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support

2017-11-07 Thread Chris Barker
On Mon, Nov 6, 2017 at 6:14 PM, Charles R Harris 
wrote:

> Also -- if py2.7 continues to see the use I expect it will well past when
>>> pyton.org officially drops it, I wouldn't be surprised if a Python2.7
>>> Windows build based on a newer compiler would come along -- perhaps by
>>> Anaconda or conda-forge, or ???
>>>
>>
>> I suspect that this will indeed happen. I am aware of multiple companies
>> following this path already (building python + numpy themselves with a
>> newer MS compiler).
>>
>
> I think Anaconda is talking about distributing a compiler, but what that
> will be on windows is anyone's guess. When we drop 2.7, there is a lot of
> compatibility crud that it would be nice to get rid of, and if we do that
> then NumPy will no longer compile against 2.7. I suspect some companies
> have just been putting off the task of upgrading to Python 3, which should
> be pretty straight forward these days apart from system code that needs to
> do a lot of work with bytes.
>

I agree, and if there is a compelling reason to upgrade, folks WILL do it.
But I've been amazed over the years at folks' desire to stick with what
they have! And I'm guilty too, anything new I start with py3, but older
larger codebases are still py2, I just can't find the energy to spend the
week or so it would probably take to update everything...

But in the original post, the Windows Compiler issue was mentioned, so
there seem to be two reasons to drop py2:

A) wanting to use py3-only features.
B) wanting to use newer C (C++?) compiler features.

I suggest we be clear about which of these is driving the decisions, and
explicit about the goals. That is, if (A) is critical, we don't even have
to talk about (B)

But we could choose to do (B) without doing (A) -- I suspect there will be
a user base for that

-CHB




-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal of timeline for dropping Python 2.7 support

2017-11-06 Thread Chris Barker
On Sun, Nov 5, 2017 at 10:25 AM, Charles R Harris  wrote:


>  the timeline I've been playing with is to keep Python 2.7 support through
> 2018, which given our current pace, would be for NumPy 1.15 and 1.16. After
> that 1.16 would become a long term support release with backports of
> critical bug fixes
>

+1

I think py2.7 is going to be around for a long time yet -- which means we
really do want to keep the long term support -- which may be quite some
time. But that doesn't mean people insisting on not upgrading Python need
to get the latest and greatest numpy.

Also -- if py2.7 continues to see the use I expect it will well past when
python.org officially drops it, I wouldn't be surprised if a Python2.7
Windows build based on a newer compiler would come along -- perhaps by
Anaconda or conda-forge, or ???

If that happens, I suppose we could re-visit 2.7 support. Though it sure
would be nice to clean up the dang Unicode stuff for good, too!

In short, if it makes it easier for numpy to move forward, let's do it!

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-06 Thread Chris Barker
On Sat, Nov 4, 2017 at 6:47 AM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

>
> You just summarized excellently why I'm on a quest to change `asarray`
> to `asanyarray` within numpy


+1 -- we should all be using asanyarray() most of the time. However, a
couple of notes:

asarray() pre-dates asanyarray() by a LOT. asanyarray was added to better
handle subclasses, but there is a lot of legacy code out there.

And legacy coders -- I know that I still usually use asarray without
thinking about it -- sorry!
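
A quick illustration of the difference, with a minimal made-up subclass:

import numpy as np

class MyArray(np.ndarray):
    pass

a = np.arange(4).view(MyArray)
type(np.asarray(a))     # numpy.ndarray -- the subclass is stripped
type(np.asanyarray(a))  # MyArray -- the subclass is passed through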

Obviously, this covers only ndarray
> subclasses, not duck types, though I guess in principle one could use
> the ABC registration mechanism mentioned above to let those types pass
> through.
>

The trick there is: what does it mean to be duck-typed to an ndarray?
For many applications it's critical that the C API be the same, so
duck-typing doesn't really apply.

And in other cases, it only needs to support a small portion of the numpy
API. In essence, there are an almost infinite number of possible ABCs for
an ndarray...

For my part, I've been known to write custom "array_like" code -- it checks
for the handful of methods I know I need to use, and I test it against the
small handful of duck-typed arrays that I know I want my code to work with.

Klunky, and maybe we could come up with a standard way to do it and include
that in numpy, but I'm not sure that ABCs are the way to do it.
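
Something along these lines, say (the attribute list is whatever your code
actually touches -- the names here are made up):

def is_array_like(obj, required=("shape", "dtype", "__getitem__")):
    # existence check only -- says nothing about the semantics!
    return all(hasattr(obj, attr) for attr in required)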


-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] List comprehension and loops performances with NumPy arrays

2017-10-10 Thread Chris Barker - NOAA Federal
Andrea,

One note: transposing is almost free -- it just rearranges the strides --
i.e. changes how the array is interpreted. It doesn't actually move the
data around.
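
You can see it in the strides (the itemsizes here assume default 64-bit ints):

import numpy as np
a = np.arange(6).reshape(2, 3)
a.strides                  # (24, 8)
a.T.strides                # (8, 24) -- same buffer, strides swapped
np.shares_memory(a, a.T)   # True: no data was copied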

-CHB

Sent from my iPhone

On Oct 7, 2017, at 2:58 AM, Andrea Gavana  wrote:

Apologies, correct timeit code this time (I had gotten the wrong shape for
the output matrix in the loop case):

if __name__ == '__main__':

    repeat = 1000
    items = [Item('item_%d' % (i+1)) for i in xrange(500)]

    output = numpy.asarray([item.do_something() for item in items]).T
    statements = ['''
output = numpy.asarray([item.do_something() for item in items]).T
''',
                  '''
output = numpy.empty((8, 500))
for i, item in enumerate(items):
    output[:, i] = item.do_something()
''']

    methods = ['List Comprehension', 'Empty plus Loop   ']
    setup = 'from __main__ import numpy, items'

    for stmnt, method in zip(statements, methods):

        elapsed = timeit.repeat(stmnt, setup=setup, number=1, repeat=repeat)
        minv, maxv, meanv = min(elapsed), max(elapsed), numpy.mean(elapsed)
        elapsed.sort()
        best_of_3 = numpy.mean(elapsed[0:3])
        result = numpy.asarray((minv, maxv, meanv, best_of_3)) * repeat

        print method, ': MIN: %0.2f ms , MAX: %0.2f ms , MEAN: %0.2f ms , BEST OF 3: %0.2f ms' % tuple(result.tolist())


Results are the same as before...



On 7 October 2017 at 11:52, Andrea Gavana  wrote:

> Hi All,
>
> I have this little snippet of code:
>
> import timeit
> import numpy
>
> class Item(object):
>
>     def __init__(self, name):
>         self.name = name
>         self.values = numpy.random.rand(8, 1)
>
>     def do_something(self):
>         sv = self.values.sum(axis=0)
>         array = numpy.empty((8, ))
>         f = numpy.dot(0.5*numpy.ones((8, )), self.values)[0]
>         array.fill(f)
>         return array
>
>
> In my real application, the method do_something does a bit more than that,
> but I believe the snippet is enough to start playing with it. What I have
> is a list of (on average) 500-1,000 classes Item, and I am trying to
> retrieve the output of do_something for each of them in a single, big 2D
> numpy array.
>
> My current approach is to use list comprehension like this:
>
> output = numpy.asarray([item.do_something() for item in items]).T
>
> (Note: I need the transposed of that 2D array, always).
>
> But then I though: why not preallocating the output array and make a
> simple loop:
>
> output = numpy.empty((500, 8))
> for i, item in enumerate(items):
> output[i, :] = item.do_something()
>
>
> I was expecting this version to be marginally faster - as the previous one
> has to call asarray and then transpose the matrix, but I was in for a
> surprise:
>
> if __name__ == '__main__':
>
>     repeat = 1000
>     items = [Item('item_%d' % (i+1)) for i in xrange(500)]
>
>     statements = ['''
> output = numpy.asarray([item.do_something() for item in items]).T
> ''',
>                   '''
> output = numpy.empty((500, 8))
> for i, item in enumerate(items):
>     output[i, :] = item.do_something()
> ''']
>
>     methods = ['List Comprehension', 'Empty plus Loop   ']
>
>     setup = 'from __main__ import numpy, items'
>
>     for stmnt, method in zip(statements, methods):
>
>         elapsed = timeit.repeat(stmnt, setup=setup, number=1, repeat=repeat)
>         minv, maxv, meanv = min(elapsed), max(elapsed), numpy.mean(elapsed)
>         elapsed.sort()
>         best_of_3 = numpy.mean(elapsed[0:3])
>         result = numpy.asarray((minv, maxv, meanv, best_of_3)) * repeat
>
>         print method, ': MIN: %0.2f ms , MAX: %0.2f ms , MEAN: %0.2f ms , BEST OF 3: %0.2f ms' % tuple(result.tolist())
>
>
> I get this:
>
> List Comprehension : MIN: 7.32 ms , MAX: 9.13 ms , MEAN: 7.85 ms , BEST OF 3: 7.33 ms
> Empty plus Loop    : MIN: 7.99 ms , MAX: 9.57 ms , MEAN: 8.31 ms , BEST OF 3: 8.01 ms
>
>
> Now, I know that list comprehensions are renowned for being insanely fast,
> but I though that doing asarray plus transpose would by far defeat their
> advantage, especially since the list comprehension is used to call a
> method, not to do some simple arithmetic inside it...
>
> I guess I am missing something obvious here... oh, and if anyone has
> suggestions about how to improve my crappy code (performance wise), please
> feel free to add your thoughts.
>
> Thank you.
>
> Andrea.
>
>
>
>
>
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] converting list of int16 values to bitmask and back to list of int32\float values

2017-09-19 Thread Chris Barker
not sure what you are getting from:

Modbus.read_input_registers()

but if it is a binary stream then you can put it all in one numpy array
(probably type uint8 (byte)).

then you can manipulate the type with arr.view(), arr.astype() and arr.byteswap()

view() will tell numpy to interpret the same block of data as a different
type (astype() makes a converted copy).

You also may be able to create the array with np.fromstring() or
np.frombuffer() in the first place.
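
Something like this sketch, maybe (the register values here are made up --
two big-endian 16-bit registers holding one 32-bit float):

import numpy as np

regs = [16457, 3879]               # e.g. payload.registers from pymodbus
raw = np.array(regs, dtype='>u2')  # big-endian 16-bit words
np.frombuffer(raw.tobytes(), dtype='>f4')  # -> array([ 3.1415927], ...)
# for little-endian word order, swap the register pairs first:
# raw = raw.reshape(-1, 2)[:, ::-1]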

-CHB





On Thu, Sep 14, 2017 at 10:11 AM, Nissim Derdiger 
wrote:

> Hi all!
>
> I'm writing a Modbus TCP client using *pymodbus3* library.
>
> When asking for some parameters, the response is always a list of int16.
>
> In order to make the values usable, I need to transfer them into 32bit
> bites, than put them in the correct order (big\little endian wise), and
> then to cast them back to the desired format (usually int32 or float)
>
> I've solved it with a pretty naïve code, but I'm guessing there must be a
> more elegant and fast way to solve it with NumPy.
>
> Your help would be very much appreciated!
>
> Nissim.
>
>
>
> My code:
>
> def Read(StartAddress, NumOfRegisters, FunctionCode, ParameterType, BitOrder):
>
>     # select the Parameters format
>     PrmFormat = 'f'  # default is float
>     if ParameterType == 'int':
>         PrmFormat = 'i'
>
>     # select the endian state - maybe move to the connect function?
>     endian = '<I'
>     if BitOrder == 'little':
>         endian = '>I'
>
>     # start asking for the payload
>     payload = None
>     while payload == None:
>         payload = Modbus.read_input_registers(StartAddress, NumOfRegisters)
>
>     # parse the answer
>     ResultRegisters = []
>
>     # convert the returned registers from list of int16 to list of 32 bits bitmasks
>     for reg in range(int(NumOfRegisters / 2)):
>         ResultRegisters[reg] = (struct.pack(endian, payload.registers[2 * reg]) +
>                                 struct.pack(endian, payload.registers[2 * reg + 1]))
>
>     # convert this list to a list with the real parameter format
>     for reg in range(len(ResultRegisters)):
>         ResultRegisters[reg] = struct.unpack(PrmFormat, ResultRegisters[reg])
>
>     # return results
>     return ResultRegisters
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Only integer scalar arrays can be converted to a scalar index

2017-09-15 Thread Chris Barker - NOAA Federal
No thoughts on optimizing memory, but that indexing error probably comes
from np.mean producing float results. An astype call should make that work.

-CHB

Sent from my iPhone

On Sep 15, 2017, at 5:51 PM, Robert McLeod  wrote:


On Fri, Sep 15, 2017 at 2:37 PM, Elliot Hallmark 
wrote:

> Nope. Numpy only works on in memory arrays. You can determine your own
> chunking strategy using hdf5, or something like dask can figure that
> strategy out for you. With numpy you might worry about not accidentally
> making duplicates or intermediate arrays, but that's the extent of memory
> optimization you can do in numpy itself.
>

NumPy does have its own memory map variant on ndarray:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html



-- 
Robert McLeod, Ph.D.
robbmcl...@gmail.com
robbmcl...@protonmail.com

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] quantile() or percentile()

2017-08-14 Thread Chris Barker
+1 on quantile()
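
i.e., the proposed spelling next to the existing one:

np.percentile(x, 25)    # existing: percent in [0, 100]
np.quantile(x, 0.25)    # proposed: fraction in [0, 1], same result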

-CHB


On Sun, Aug 13, 2017 at 6:28 AM, Charles R Harris  wrote:

>
>
> On Thu, Aug 10, 2017 at 3:08 PM, Eric Wieser 
> wrote:
>
>> Let’s try and keep this on topic - most replies to this message has been
>> about #9211, which is an orthogonal issue.
>>
>> There are two main questions here:
>>
>>1. Would the community prefer to use np.quantile(x, 0.25) instead of 
>> np.percentile(x,
>>25), if they had the choice
>>2. Is this desirable enough to justify increasing the API surface?
>>
>> The general consensus on the github issue answers yes to 1, but is
>> neutral on 2. It would be good to get more opinions.
>>
>
> I think a quantile function would be natural and desirable.
>
> 
>
> Chuck
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] pytest and degrees of separation.

2017-07-11 Thread Chris Barker
On Tue, Jul 11, 2017 at 5:04 PM, Thomas Caswell  wrote:

> Going with option 2 is probably the best option so that you can use pytest
> fixtures and parameterization.
>

I agree -- those are worth a lot!
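
For anyone who hasn't used them: parameterization is the kind of thing
that's painful to replicate by hand (a trivial sketch):

import numpy as np
import pytest

@pytest.mark.parametrize("dtype", [np.float32, np.float64, np.int64])
def test_ones_sum(dtype):
    # runs once per dtype, each reported as a separate test
    assert np.ones(10, dtype=dtype).sum() == 10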

-CHB



> Might be worth looking at how Matplotlib re-arranged things on our master
> branch to maintain back-compatibility with nose-specific tools that were
> used by down-stream projects.
>
> Tom
>
> On Tue, Jul 11, 2017 at 4:22 PM Sebastian Berg 
> wrote:
>
>> On Tue, 2017-07-11 at 14:49 -0600, Charles R Harris wrote:
>> > Hi All,
>> >
>> > Just looking for opinions and feedback on the need to keep NumPy from
>> > having a hard nose/pytest dependency. The options as I see them are:
>> >
>> > pytest is never imported until the tests are run -- current practice
>> > with nose
>> > pytest is never imported unless the testfiles are imported -- what I
>> > would like
>> > pytest is imported together when numpy is -- what we need to avoid.
>> > Currently the approach has been 1), but I think 2) makes more sense
>> > and allows more flexibility.
>>
>>
>> I am not quite sure about everything here. My guess is we can do
>> whatever we want when it comes to our own tests, and I don't mind just
>> switching everything to pytest (I for one am happy as long as I can run
>> `runtests.py` ;)).
>> When it comes to the utils we provide, those should keep working
>> without nose/pytest if they worked before without it I think.
>>
>> My guess is that all your options do that, so I think we should take
>> the one that gives the nicest maintainable code :). Though can't say I
>> looked enough into it to really make a well educated decision, that
>> probably means your option 2.
>>
>> - Sebastian
>>
>>
>>
>> > Thoughts?
>> > Chuck
>> > ___
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@python.org
>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] record data previous to Numpy use

2017-07-06 Thread Chris Barker
On Thu, Jul 6, 2017 at 10:55 AM,  wrote:
>
> It's is just a reflexion, but for huge files one solution might be to
> split/write/build first the array in a dedicated file (2x o(n) iterations -
> one to identify the blocks size - additional one to get and write), and
> then to load it in memory and work with numpy -
>

I may have your use case confused, but if you have a huge file with
multiple "blocks" in it, there shouldn't be any problem with loading it in
one go -- start at the top of the file and load one block at a time
(accumulating in a list) -- then you only have the memory overhead issues
for one block at a time, should be no problem.
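
i.e. something like this sketch (the file layout here is made up: a header
line, then a count, then that many values):

import numpy as np

blocks = []
with open("huge_file.txt") as f:
    for header in f:        # each block starts with a header line
        n = int(next(f))    # ... then a line giving the number of values
        blocks.append(np.array([float(next(f)) for _ in range(n)]))
# only one block's worth of list overhead is alive at a time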

> at this stage the dimension is known and some packages will be fast and
> more adapted (pandas or astropy as suggested).
>
pandas at least is designed to read variations of CSV files, not sure you
could use the optimized part to read an array out of part of an open file
from a particular point or not.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Making a 1.13.2 release

2017-07-06 Thread Chris Barker
On Thu, Jul 6, 2017 at 6:10 AM, Charles R Harris 
wrote:

> I've delayed the NumPy 1.13.2 release hoping for Python 3.6.2 to show up
> fixing #29943   so we can close #9272
> , but the Python release has
> been delayed to July 11 (expected). The Python problem means that NumPy
> compiled with Python 3.6.1 will not run in Python 3.6.0.
>

If it's compiled against 3.6.0 will it work fine with 3.6.1? and probably
3.6.2 as well?

If so, it would be nice to do it that way, if Matthew doesn't mind :-)

But either way, it'll be good to get it out.

Thanks!

-CHB



> However, I've also been asked to have a bugfixed version of 1.13 available
> for Scipy 2017 next week. At this point it looks like the best thing to do
> is release 1.13.1 compiled with Python 3.6.1 and ask folks to upgrade
> Python if they have a problem, and then release 1.13.2 as soon as 3.6.2 is
> released.
>
> Thoughts?
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] record data previous to Numpy use

2017-07-06 Thread Chris Barker
OK, you have two performance "issues"

1) memory use: IF you need to read a file to build a numpy array, and don't
know how big it is when you start, you need to accumulate the values
first, and then make an array out of them. And numpy arrays are fixed size,
so they cannot efficiently accumulate values.

The usual way to handle this is to read the data into a list with .append()
or the like, and then make an array from it. This is quite fast -- lists
are fast and efficient at appending values. However, you are then storing
(at least) a pointer and a python float object for each value, which is a
lot more memory than a single float value in a numpy array, and you need to
make the array from it, which means you have the full list and all its
python floats AND the array in memory at once.
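
i.e. the standard pattern (a sketch):

values = []
with open("data.txt") as f:          # some plain-text file of numbers
    for line in f:
        values.extend(float(tok) for tok in line.split())
arr = np.asarray(values)             # list and array briefly coexist here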

Frankly, computers have a lot of memory these days, so this is a non-issue
in most cases.

Nonetheless, a while back I wrote an extendable numpy array object to
address just this issue. You can find the code on gitHub here:

https://github.com/PythonCHB/NumpyExtras/blob/master/numpy_extras/accumulator.py

I have not tested it with recent numpy's but I expect it still works fine.
It's also py2, but wouldn't take much to port.

In practice, it uses less memory than the "build a list, then make it into
an array" approach, but isn't any faster, unless you add (.extend) a bunch
of values at once, rather than one at a time. (If you do it one at a time,
the whole python-float-to-numpy-float conversion and function call overhead
takes just as long.)

But it will generally be as fast or faster than using a list, and use
less memory, so a fine basis for a big ascii file reader.

However, it looks like while your files may be huge, they hold a number of
arrays, so each array may not be large enough to bother with any of this.

2) parsing and converting overhead -- for the most part, python/numpy text
file reading code reads the text into a python string, converts it to python
number objects, then puts them in a list or converts them to native numbers
in an array. This whole process is a bit slow (though reading files is slow
anyway, so usually not worth worrying about, which is why the built-in file
reading methods do this). To improve this, you need to use code that reads
the file and parses it in C, and puts it straight into a numpy array
without passing through python. This is what the pandas (and I assume
astropy) text file readers do.

But if you don't want those dependencies, there is the "fromfile()"
function in numpy -- it is not very robust, but if your files are
well-formed, then it is quite fast. So your code would look something like:

with open(the_filename) as infile:
    while True:
        line = infile.readline()
        if not line:
            break
        # work with line to figure out the next block
        if ready_to_read_a_block:
            arr = np.fromfile(infile, dtype=np.int32, count=num_values, sep=' ')
            # sep specifies that you are reading text, not binary!
            arr.shape = the_shape_it_should_be


But Robert is right -- get it to work with the "usual" methods -- i.e. put
numbers in a list, then make an array out of it -- first, and then worry
about making it faster.

-CHB


On Thu, Jul 6, 2017 at 1:49 AM,  wrote:

> Dear All
>
>
> First of all thanks for the answers and the information’s (I’ll ding into
> it) and let me trying to add comments on what I want to :
>
>1. My asci file mainly contains data (float and int) in a single column
>2. (it is not always the case but I can easily manage it – as well I
>saw I can use ‘spli’ instruction if necessary)
>3. Comments/texts indicates the beginning of a bloc immediately
>followed by the number of sub-blocs
>4. So I need to read/record all the values in order to build a matrix
>before working on it (using Numpy & vectorization)
>   - The columns 2 and 3 have been added for further treatments
>   - The ‘0’ values will be specifically treated afterward
>
>
> Numpy won’t be a problem I guess (I did some basic tests and I’m quite
> confident) on how to proceed, but I’m really blocked on data records … I
> trying to find a way to efficiently read and record data in a matrix:
>
>- avoiding dynamic memory allocation (here using ‘append’ in python
>meaning, not np),
>- dealing with huge asci file: the latest file I get contains more
>than *60 million of lines*
>
>
> Please find in attachment an extract of the input format
> (‘example_of_input’), and the matrix I’m trying to create and manage with
> Numpy
>
>
> Thanks again for your time
>
> Paul
>
>
> ###
>
> ##BEGIN *-> line number x in the original file*
>
> 42   *-> indicates the number of sub-blocs*
>
> 1 *-> number of the 1rst sub-bloc*
>
> 6 *-> gives how many value belong to the sub bloc*
>
> 12
>
> 47
>
> 2
>
> 46
>
> 3
>
> 51
>
> ….
>
> 13  * -> another type of sub-bloc with 25 

Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-05 Thread Chris Barker
On Mon, Jul 3, 2017 at 4:27 PM, Stephan Hoyer  wrote:

> If someone who does subclasses/array-likes or so (e.g. like Stefan
>> Hoyer ;)) and is interested, and also we do some
>> teleconferencing/chatting (and I have time) I might be interested
>> in discussing and possibly trying to develop the new indexer ideas,
>> which I feel are pretty far, but I got stuck on how to get subclasses
>> right.
>
>
> I am off course very happy to discuss this (online or via teleconference,
> sadly I won't be at scipy), but to be clear I use array likes, not
> subclasses. I think Marten van Kerkwijk is the last one who thinks that is
> still a good idea :).
>

Indeed -- I thought the community more or less had decided that duck-typing
was THE way to make something that could be plugged in where a numpy array
is expected.

Along those lines, there was some discussion of having a set of utilities
(or maybe eve3n an ABC?) that would make it easier to create a ndarray-like
object.

That is, the boilerplate needed for multi-dimensional indexing and slicing,
etc...

That could be a nice little sprint-able project.
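
e.g. a mixin that derives the easy attributes from .shape, so a duck-array
author only has to write the hard parts (a sketch -- names hypothetical):

import numpy as np

class DuckArrayMixin:
    # assumes the class defines .shape (and __getitem__ for indexing)
    @property
    def ndim(self):
        return len(self.shape)

    @property
    def size(self):
        return int(np.prod(self.shape))

    def __len__(self):
        return self.shape[0]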

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Array and string interoperability

2017-06-06 Thread Chris Barker
On Mon, Jun 5, 2017 at 3:59 PM, Mikhail V  wrote:

> -- classify by "forward/backward" conversion:
> For this time consider only forward, i.e. I copy data from string
> to numpy array
>
> -- classify by " bytes  vs  ordinals ":
>
> a)  bytes:  If I need raw bytes - in this case e.g.
>
>   B = bytes(s.encode())
>

no need to call "bytes" -- encode() returns a bytes object:

In [1]: s = "this is a simple ascii-only string"

In [2]: b = s.encode()

In [3]: type(b)

Out[3]: bytes

In [4]: b

Out[4]: b'this is a simple ascii-only string'


>
> will do it. then I can copy data to array. So currently there are methods
> coverings this. If I understand correctly the data extracted corresponds
> to utf-??  byte feed, i.e. non-constant byte-length of chars (1 up to
> 4 bytes per char for
> the 'wide' unicode, correct me if I am wrong).
>

In [5]: s.encode?
Docstring:
S.encode(encoding='utf-8', errors='strict') -> bytes

So the default is utf-8, but you can set any encoding you want (that python
supports)

 In [6]: s.encode('utf-16')

Out[6]: b'\xff\xfet\x00h\x00i\x00s\x00 \x00i\x00s\x00 \x00a\x00
\x00s\x00i\x00m\x00p\x00l\x00e\x00
\x00a\x00s\x00c\x00i\x00i\x00-\x00o\x00n\x00l\x00y\x00
\x00s\x00t\x00r\x00i\x00n\x00g\x00'



> b):  I need *ordinals*
>   Yes, I need ordinals, so for the bytes() method, if a Python 3
> string contains only
>   basic ascii, I can so or so convert to bytes then to integer array
> and the length will
>   be the same 1byte for each char.
>   Although syntactically seen, and with slicing, this will look e.g. like:
>
> s= "012 abc"
> B = bytes(s.encode())  # convert to bytes
> k  = len(s)
> arr = np.zeros(k,"u1")   # init empty array length k
> arr[0:2] = list(B[0:2])
> print ("my array: ", arr)
> ->
> my array:  [48 49  0  0  0  0  0]
>

This can be done more cleanly:

In [15]: s= "012 abc"

In [16]: b = s.encode('ascii')

# you want to use the ascii encoding so you don't get utf-8 cruft if there
# are non-ascii characters
# you could use latin-1 too (or any other one-byte-per-char encoding)

In [17]: arr = np.fromstring(b, np.uint8)
# this is using fromstring() in its old py definition -- treat the
# contents as bytes
# -- it really should be called "frombytes()"
# you could also use:

In [22]: np.frombuffer(b, dtype=np.uint8)
Out[22]: array([48, 49, 50, 32, 97, 98, 99], dtype=uint8)

In [19]: print(arr)
[48 49 50 32 97 98 99]

# you got the ordinals

In [20]: "".join([chr(i) for i in arr])
Out[20]: '012 abc'

# yes, they are the right ones...



> Result seems correct. Note that I also need to use list(B), otherwise
> the slicing does not work (fills both values with 1, no idea where 1
> comes from).
>

that is odd -- I can't explain it right now either...


> Or I can write e.g.:
> arr[0:2] = np.fromstring(B[0:2], "u1")
>
> But looks indeed like a 'hack' and not so simple.
>

is the above OK?


> -- classify "what is maximal ordinal value in the string"
> Well, say, I don't know what is maximal ordinal, e.g. here I take
> 3 Cyrillic letters instead of 'abc':
>
> s= "012 АБВ"
> k  = len(s)
> arr = np.zeros(k,"u4")   # init empty 32 bit array length k
> arr[:] = np.fromstring(np.array(s),"u4")
> ->
> [  48   49   50   32 1040 1041 1042]
>

so this is making a numpy string, which is UCS-4 encoded unicode -- i.e.
4 bytes per character. Then you are converting that to a 4-byte unsigned
int. But no need to do it with fromstring:

In [52]: s
Out[52]: '012 АБВ'

In [53]: s_arr.reshape((1,)).view(np.uint32)
Out[53]: array([  48,   49,   50,   32, 1040, 1041, 1042], dtype=uint32)

we need the reshape() because .view does not work with array scalars -- not
sure why not?

> This gives correct results indeed. So I get my ordinals as expected.
> So this is better/preferred way, right?
>

I would maybe do it more "directly" -- i.e. use python's string to do the
encoding:

In [64]: s
Out[64]: '012 АБВ'

In [67]: np.fromstring(s.encode('U32'), dtype=np.uint32)
Out[67]: array([65279, 48, 49, 50, 32, 1040, 1041, 1042], dtype=uint32)

that first value is the byte-order mark (I think...), you can strip it off
with:

In [68]: np.fromstring(s.encode('U32')[4:], dtype=np.uint32)
Out[68]: array([  48,   49,   50,   32, 1040, 1041, 1042], dtype=uint32)

or, probably better simply specify the byte order in the encoding:

In [69]: np.fromstring(s.encode('UTF-32LE'), dtype=np.uint32)
Out[69]: array([  48,   49,   50,   32, 1040, 1041, 1042], dtype=uint32)

arr = np.ordinals(s)
> arr[0:2] = np.ordinals(s[0:2])  # with slicing
>
> or, e.g. in such format:
>
> arr = np.copystr(s)
> arr[0:2] = np.copystr(s[0:2])
>

I don't think any of this is necessary -- the UCS4 (Or UTF-32) "encoding"
is pretty much the ordinals anyway.

As you notices, if you make a numpy unicode string array, and change the
dtype to unsigned int32, you get what you want.

You really don't want to mess with any of this unless you understand
unicode and encodings anyway.

Though it 

Re: [Numpy-discussion] Array and string interoperability

2017-06-05 Thread Chris Barker
On Mon, Jun 5, 2017 at 1:51 PM, Thomas Jollans  wrote:

> > and overloading fromstring() to mean both "binary dump of data" and
> > "parse the text" due to whether the sep argument is set was always a
> > bad idea :-(
> >
> > .. and fromstring(s, sep=a_sep_char)
>
> As it happens, this is pretty much what stdlib bytearray does since 3.2
> (http://bugs.python.org/issue8990)


I'm not sure that the array.array.fromstring() ever parsed the data string
as text, did it?

Anyway, this is what array.array now has:
array.frombytes(s)

Appends items from the string, interpreting the string as an array of
machine values (as if it had been read from a file using the
fromfile() method).
New in version 3.2: fromstring() is renamed to frombytes() for clarity.

array.fromfile(f, n)

Read n items (as machine values) from the file object f and append them to
the end of the array. If less than n items are available, EOFError is
raised, but the items that were available are still inserted into the
array. f must be a real built-in file object; something else with a read()
method won’t do.

array.fromstring()

Deprecated alias for frombytes().

I think numpy should do the same. And frombytes() should remove the "sep"
parameter. If someone wants to write a fast, efficient, simple text parser,
then it should get a new name: fromtext() maybe??? And the fromfile() sep
argument should be deprecated as well, for the same reasons.

array also has:

array.fromunicode(s)

Extends this array with data from the given unicode string. The array must
be a type 'u' array; otherwise a ValueError is raised.
Use array.frombytes(unicodestring.encode(enc)) to append Unicode data to an
array of some other type.

which I think would be better supported by:

np.frombytes(str.encode('UCS-4'), dtype=uint32)

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Array and string interoperability

2017-06-05 Thread Chris Barker
Just a few notes:

> However, the fact that this works for bytestrings on Python 3 is, in my
> humble opinion, ridiculous:
>
> >>> np.array(b'100', 'u1') # b'100' IS NOT TEXT
> array(100, dtype=uint8)
>

Yes, that is a mis-feature -- I think due to bytes and str being the
same type in py2 -- so on py3, numpy continues to treat a bytes object
as also a 1-byte-per-char string, depending on context. And users want to
be able to write numpy code that will run the same on py2 and py3, so we
kinda need this kind of thing.

Makes me think that an optional "pure-py-3" mode for numpy might be a good
idea. If that flag is set, your code will only run on py3 (or at least
might run differently).


> > Further thoughts:
> > If trying to create "u1" array from a Pyhton 3 string, question is,
> > whether it should throw an error, I think yes,


well, you can pass numbers > 255 into a u1 already:

In [96]: np.array(456, dtype='u1')
Out[96]: array(200, dtype=uint8)

and it does the wrap-around overflow thing... so why not?


> and in this case
> > "u4" type should be explicitly specified by initialisation, I suppose.
> > And e.g. translation from unicode to extended ascii (Latin1) or whatever
> > should be done on Python side  or with explicit translation.
>

absolutely!

If you ask me, passing a unicode string to fromstring with sep='' (i.e.
> to parse binary data) should ALWAYS raise an error: the semantics only
> make sense for strings of bytes.
>

exactly -- we really should have a "frombytes()" alias for fromstring() and
it should only work for actual bytes objects (strings on py2, naturally).

and overloading fromstring() to mean both "binary dump of data" and "parse
the text" due to whether the sep argument is set was always a bad idea :-(

.. and fromstring(s, sep=a_sep_char)

has been semi broken (or at least not robust) forever anyway.

> Currently, there appears to be some UTF-8 conversion going on, which
> creates potentially unexpected results:
>
> >>> s = 'αβγδ'
> >>> a = np.fromstring(s, 'u1')
> >>> a
> array([206, 177, 206, 178, 206, 179, 206, 180], dtype=uint8)
> >>> assert len(a) * a.dtype.itemsize  == len(s)
> Traceback (most recent call last):
>   File "", line 1, in 
> AssertionError
> >>>
>
> This is, apparently (https://github.com/numpy/numpy/issues/2152), due to
> how the internals of Python deal with unicode strings in C code, and not
> due to anything numpy is doing.
>

exactly -- py3 strings are a pretty nifty implementation of unicode text --
they have nothing to do with storing binary data, and should not be used
that way. There is essentially no reason you would ever want to pass the
actual binary representation to any other code.

fromstring should be re-named frombytes, and it should raise an exception
if you pass something other than a bytes object (or maybe a memoryview or
other binary container?)

we might want to keep fromstring() for parsing strings, but only if it were
fixed...

> IMHO calling fromstring(..., sep='') with a unicode string should be
> deprecated and perhaps eventually forbidden. (Or fixed, but that would
> break backwards compatibility)


agreed.

> Python3 assumes 4-byte strings but in reality most of the time
> > we deal with 1-byte strings, so there is huge waste of resources
> > when dealing with 4-bytes. For many serious projects it is just not
> needed.
>
> That's quite enough anglo-centrism, thank you. For when you need byte
> strings, Python 3 has a type for that. For when your strings contain
> text, bytes with no information on encoding are not enough.
>

There was a big thread about this recently -- it seems to have not quite
come to a conclusion. But anglo-centrism aside, there is substantial demand
for a "smaller" way to store mostly-ascii text.

I _think_ the conversation was steering toward an encoding-specified string
dtype, so us anglo-centric folks could use latin-1 or utf-8.

But someone would need to write the code.

-CHB

> There can be some convenience methods for ascii operations,
> > like eg char.toupper(), but currently they don't seem to work with
> integer
> > arrays so why not make those potentially useful methots usable
> > and make them work on normal integer arrays?
> I don't know what you're doing, but I don't think numpy is normally the
> right tool for text manipulation...
>

I agree here. But if one were to add such a thing (vectorized string
operations) -- I'd think the thing to do would be to wrap (or port) the
python string methods. But it should only work for actual string dtypes, of
course.

note that another part of the discussion previously suggested that we have
a dtype that wraps a native python string object -- then you'd get it all for
free. This is essentially an object array with strings in it, which you can
do now.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 

Re: [Numpy-discussion] UC Berkeley hiring developers to work on NumPy

2017-05-14 Thread Chris Barker - NOAA Federal
Awesome! This is really great news.

Does this mean several person-years of funding are secured?

-CHB

> On May 13, 2017, at 10:47 PM, Nathaniel Smith  wrote:
>
> Hi all,
>
> As some of you know, I've been working for... quite some time now to
> try to secure funding for NumPy. So I'm excited that I can now
> officially announce that BIDS [1] is planning to hire several folks
> specifically to work on NumPy. These will full time positions at UC
> Berkeley, postdoc or staff, with probably 2 year (initial) contracts,
> and the general goal will be to work on some of the major priorities
> we identified at the last dev meeting: more flexible dtypes, better
> interoperation with other array libraries, paying down technical debt,
> and so forth. Though I'm sure the details will change as we start to
> dig into things and engage with the community.
>
> More details soon; universities move slowly, so nothing's going to
> happen immediately. But this is definitely happening and I wanted to
> get something out publicly before the conference season starts – so if
> you're someone who might be interested in coming to work with me and
> the other awesome folks at BIDS, then this is a heads-up: drop me a
> line and we can chat! I'll be at PyCon next week if anyone happens to
> be there. And feel free to spread the word.
>
> -n
>
> [1] http://bids.berkeley.edu/
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] proposal: smaller representation of string arrays

2017-04-26 Thread Chris Barker
On Wed, Apr 26, 2017 at 4:30 PM, Stephan Hoyer  wrote:

>
> Sorry, I remain unconvinced (for the reasons that Robert, Nathaniel and
> myself have already given), but we seem to be talking past each other here.
>

yeah -- I think it's not clear what the use cases we are talking about are.


> I am still -1 on any new string encoding support unless that includes at
> least UTF-8, with length indicated by the number of bytes.
>

I've said multiple times that utf-8 support is key to any "exchange binary
data" use case (memory mapping?) -- so yes, absolutely.

I _think_ this may be some of the source for the confusion:

The name of this thread is: "proposal: smaller representation of string
arrays".

And I got the impression, maybe mistaken, that folks were suggesting that
internally encoding strings in numpy as "UTF-8, with length indicated by
the number of bytes." was THE solution to the

" the 'U' dtype takes up way too much memory, particularly  for
mostly-ascii data" problem.

I do not think it is a good solution to that problem.

I think a good solution to that problem is latin-1 encoding. (bear with me
here...)

But a bunch of folks have brought up that while we're messing around with
string encoding, let's solve another problem:

* Exchanging unicode text at the binary level with other systems that
generally don't use UCS-4.

For THAT -- utf-8 is critical.

But if I understand Julian's proposal -- he wants to create a parameterized
text dtype that you can set the encoding on, and then numpy will use the
encoding (and python's machinery) to encode / decode when passing to/from
python strings.
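
So usage might look something like this (the syntax is entirely
hypothetical -- just to illustrate the idea, this is not an actual numpy
dtype spec):

a = np.empty(10, dtype='S16[latin-1]')   # 16 bytes per item, latin-1
b = np.empty(10, dtype='S64[utf-8]')     # 64 bytes per item, utf-8
a[0] = 'café'   # encoded on assignment, decoded back to str on access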

It seems this would support all our desires:

I'd get a latin-1 encoded type for compact representation of mostly-ascii
data.

Thomas would get latin-1 for binary interchange with mostly-ascii data

The HDF-5 folks would get utf-8 for binary interchange (if we can work out
the null-padding issue)

Even folks that had weird JAVA or Windows-generated UTF-16 data files could
do the binary interchange thing

I'm now lost as to what the hang-up is.

-CHB

PS: null padding is a pain, python strings seem to preserve the zeros, which
is odd -- is there a unicode code-point at \x00?

But you can use it to strip properly with the unicode sandwich:

In [63]: ut16 = text.encode('utf-16') + b'\x00\x00\x00\x00\x00\x00'

In [64]: ut16.decode('utf-16')
Out[64]: 'some text\x00\x00\x00'

In [65]: ut16.decode('utf-16').strip('\x00')
Out[65]: 'some text'

In [66]: ut16.decode('utf-16').strip('\x00').encode('utf-16')
Out[66]: b'\xff\xfes\x00o\x00m\x00e\x00 \x00t\x00e\x00x\x00t\x00'

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] proposal: smaller representation of string arrays

2017-04-26 Thread Chris Barker
On Wed, Apr 26, 2017 at 10:45 AM, Robert Kern  wrote:

> >>> > The maximum length of an UTF-8 character is 4 bytes, so we could use
> that to size arrays by character length. The advantage over UTF-32 is that
> it is easily compressible, probably by a factor of 4 in many cases.
>

isn't UTF-32 pretty compressible also? lots of zeros in there

here's an example with pure ascii Lorem Ipsum text:

In [17]: len(text)
Out[17]: 446


In [18]: len(utf8)
Out[18]: 446

# the same -- it's pure ascii

In [20]: len(utf32)
Out[20]: 1788

# four times a big -- of course.

In [22]: len(bz2.compress(utf8))
Out[22]: 302

# so from 446 to 302, not that great -- probably it would be better for
longer text
# -- but are we compressing whole arrays or individual strings?

In [23]: len(bz2.compress(utf32))
Out[23]: 319

# almost as good as the compressed utf-8

And I'm guessing it would be even closer with more non-ascii characters.

OK -- turns out I'm wrong -- here it is with Greek -- not a lot of ascii
characters:

In [29]: len(text)
Out[29]: 672

In [30]: utf8 = text.encode("utf-8")

In [31]: len(utf8)
Out[31]: 1180

# not bad, really -- still smaller than utf-16 :-)

In [33]: len(bz2.compress(utf8))
Out[33]: 495

# pretty good then -- better than 50%

In [34]: utf32 = text.encode("utf-32")
In [35]: len(utf32)

Out[35]: 2692


In [36]: len(bz2.compress(utf32))
Out[36]: 515

# still not quite as good as utf-8, but close.

So: utf-8 compresses better than utf-32, but only by a little bit -- at
least with bz2.

But it is a lot smaller uncompressed.

>>> The major use case that we have for a UTF-8 array is HDF5, and it
> specifies the width in bytes, not Unicode characters.
> >>
> >> It's not just HDF5. Counting bytes is the Right Way to measure the size
> of UTF-8 encoded text:
> >> http://utf8everywhere.org/#myths
>

It's really the only way with utf-8 -- which is why it is an impedance
mismatch with python strings.


>> I also firmly believe (though clearly this is not universally agreed
> upon) that UTF-8 is the Right Way to encode strings for *non-legacy*
> applications.
>

fortunately, we don't need to agree to that to agree that:


> So if we're adding any new string encodings, it needs to be one of them.
>

Yup -- the most important one to add -- I don't think it is "The Right Way"
for all applications -- but it IS "The Right Way" for text interchange.

And regardless of what any of us think -- it is widely used.

> (1) object arrays of strings. (We have these already; whether a
> strings-only specialization would permit useful things like string-oriented
> ufuncs is a question for someone who's willing to implement one.)
>

This is the right way to get variable length strings -- but I'm concerned
that it doesn't mesh well with numpy uses like npz files, raw dumping of
array data, etc. It should not be the only way to get proper Unicode
support, nor the default when you do:

array(["this", "that"])


> > (2) a dtype for fixed byte-size, specified-encoding, NULL-padded data.
> All python encodings should be permitted. An additional function to
> truncate encoded data without mangling the encoding would be handy.
>

I think necessary -- at least when you pass in a python string...


> I think it makes more sense for this to be NULL-padded than
> NULL-terminated but it may be necessary to support both; note that
> NULL-termination is complicated for encodings like UCS4.
>

Is it, if you know it's UCS4? Or at least know the size of the code unit (I
think that's the term)?


> This also includes the legacy UCS4 strings as a special case.
>

what's special about them? I think the only thing should be that they are
the default.

> > (3) a dtype for fixed-length byte strings. This doesn't look very
> different from an array of dtype u8, but given we have the bytes type,
> accessing the data this way makes sense.
>
> The void dtype is already there for this general purpose and mostly works,
> with a few niggles.
>

I'd never noticed that! And if I had I never would have guessed I could use
it that way.


> If it worked more transparently and perhaps rigorously with `bytes`, then
> it would be quite suitable.
>

Then we should fix a bit of those things -- and call it something like
"bytes", please.

-CHB

>
> --

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] proposal: smaller representation of string arrays

2017-04-26 Thread Chris Barker - NOAA Federal
> > I DO recommend Latin-1 As a default encoding ONLY for  "mostly ascii, with 
> > a few extra characters" data. With all the sloppiness over the years, there 
> > are way to many files like that.
>
> That sloppiness that you mention is precisely the "unknown encoding" problem.

Exactly -- but from a practicality beats purity perspective, there is
a difference between "I have no idea whatsoever" and "I know it is
mostly ascii, and European, but there are some extra characters in
there"

Latin-1 has proven very useful for that case.
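
An illustration of why: latin-1 maps every byte to a character, so "mostly
ascii with a few extras" round-trips losslessly:

b = b'caf\xe9 r\xe9sum\xe9'        # mostly-ascii bytes with a few accents
s = b.decode('latin-1')            # never raises
s.encode('latin-1') == b           # True: lossless round-trip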

I suppose in most cases ascii with errors='replace' would be a good
choice, but I'd still rather not throw out potentially useful
information.

> Your previous advocacy has also touched on using latin-1 to decode existing 
> files with unknown encodings as well. If you want to advocate for using 
> latin-1 only for the creation of new data, maybe stop talking about existing 
> files? :-)

Yeah, I've been very unfocused in this discussion -- sorry about that.

> > Note: the primary use-case I have in mind is working with ascii text in 
> > numpy arrays efficiently-- folks have called for that. All I'm saying is 
> > use Latin-1 instead of ascii -- that buys you some useful extra characters.
>
> For that use case, the alternative in play isn't ASCII, it's UTF-8, which 
> buys you a whole bunch of useful extra characters. ;-)

UTF-8 does not match the character-oriented Python text model. Plenty
of people argue that that isn't the "correct" model for Unicode text
-- maybe so, but it is the model python 3 has chosen. I wrote a much
longer rant about that earlier.

So I think the easy-to-access numpy string dtypes, and particularly the
defaults, should match it.

It's become clear in this discussion that there is a strong desire to
support a numpy dtype that stores text in particular binary formats
(i.e. encodings). Rather than choose one or two, we might as well
support all encodings supported by python.

In that case, we'll have utf-8 for those that know they want that, and
we'll have latin-1 for those that incorrectly think they want that :-)

So what remains is to decide is implementation, syntax, and defaults.

Let's keep in mind that most of us on this list, and in this
discussion, are the folks that write interface code and the like. But
most numpy users are not as tuned in to the internals. So defaults
should be set to best support the more "naive" user.

> If all we do is add a latin-1 dtype for people to use to create new
> in-memory data, then someone is going to use it to read existing data in 
> unknown or ambiguous encodings.

If we add every encoding known to man someone is going to use Latin-1
to read unknown encodings. Indeed, as we've all pointed out, there is
no correct encoding with which to read unknown encodings.

Frankly, if we have UTF-8 under the hood, I think people are even MORE
likely to use it inappropriately-- it's quite scary how many people
think UTF-8 == Unicode, and think all you need to do is "use utf-8",
and you don't need to change any of the rest of your code. Oh, and
once you've done that, you can use your existing ASCII-only tests and
think you have a working application :-)

-CHB
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] proposal: smaller representation of string arrays

2017-04-25 Thread Chris Barker - NOAA Federal
> On Apr 25, 2017, at 12:38 PM, Nathaniel Smith  wrote:

> Eh... First, on Windows and MacOS, filenames are natively Unicode.

Yeah, though once they are stored in a text file -- who the heck
knows? That may be simply unsolvable.
> s. And then from in Python, if you want to actually work with those filenames 
> you need to either have a bytestring type or else a Unicode type that uses 
> surrogateescape to represent the non-ascii characters.


> IMO if you have filenames that are arbitrary bytestrings and you need to 
> represent this properly, you should just use bytestrings -- really, they're 
> perfectly friendly :-).

I thought the Python file (and Path) APIs all required (Unicode)
strings? That was the whole complaint!

And no, bytestrings are not perfectly friendly in py3.

This got really complicated and sidetracked, but all I'm suggesting is
that if we have a 1-byte-per-char string type, with a fixed encoding,
that that encoding be Latin-1, rather than ASCII.

That's it, really.

Having a settable encoding would work fine, too.

-CHB
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

