[Numpy-discussion] Behaviour of copy for structured dtypes with gaps

2019-04-11 Thread Marten van Kerkwijk
Hi All,

An issue [1] about the copying of arrays with structured dtype raised a
question about what the expected behaviour is: does copy always preserve
the dtype as is, or should it remove padding?

Specifically, consider an array with a structure with many fields, say 'a'
to 'z'. Since numpy 1.16, if one does `a[['a', 'z']]`, a view will be
returned. In this case, its dtype will include a large offset. Now, if we
copy this view, should the result have exactly the same dtype, including
the large offset (i.e., the copy takes as much memory as the original full
array), or should the padding be removed? From the discussion so far, it
seems the logic has boiled down to a choice between:

(1) Copy is a contract that the dtype will not vary (e.g., we also do not
change endianness);

(2) Copy is a contract that any access to the data in the array will return
exactly the same result, without wasting memory and possibly optimized for
access with different strides. E.g., `array[::10].copy()` also compacts the
result.

An argument in favour of (2) is that, before numpy 1.16, `a[['a',
'z']].copy()` did return an array without padding. Of course, this relied
on `a[['a', 'z']]` already returning a copy without padding, but still this
is a regression.

More generally, there should at least be a clear way to get the compact
copy. Also, it would make sense for things like `np.save` to remove any
padding (it currently does not).
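For concreteness, here is a small sketch of the situation (the field names
are made up; the compacting step uses `numpy.lib.recfunctions.repack_fields`,
which already exists for this purpose):

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# A structured dtype with many 8-byte fields, 'a' through 'z'.
dt = np.dtype([(ch, 'f8') for ch in 'abcdefghijklmnopqrstuvwxyz'])
a = np.zeros(3, dtype=dt)

view = a[['a', 'z']]          # since numpy 1.16: a view, not a copy
print(view.dtype.itemsize)    # 208 -- the full itemsize, padding included

packed = rfn.repack_fields(view)
print(packed.dtype.itemsize)  # 16 -- padding removed
```

The question is whether `view.copy()` should behave like the padded view or
like the repacked result.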

What do people think? All the best,

Marten

[1] https://github.com/numpy/numpy/issues/13299
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] adding Quansight Labs as institutional partner

2019-04-11 Thread Ralf Gommers
On Tue, Apr 9, 2019 at 6:25 PM Ralf Gommers  wrote:

>
>
> Thanks Alan, good questions. The donations via the Flipcause site go to
> NumFOCUS. NumFOCUS is a 501(c)3 and NumPy's fiscal sponsor, so any
> individual or institution that wants to donate to NumPy should preferably
> donate to NumFOCUS. That way your donation is tax-deductible if you're in
> the US, and it can be used in a way that the NumPy Steering Council prefers.
>
> Quansight Labs is not a nonprofit and it doesn't make much sense for it to
> focus on donations. That said, it does have a very capable team, so could
> contract with NumFOCUS to do work on NumPy, if the NumPy Steering Council
> thinks that's in NumPy's best interest (e.g. for developing a particular
> feature).
>
>
> On Tue, Apr 9, 2019 at 6:05 PM Alan Isaac  wrote:
>
>> Under the section "How will we fund this?" in the first Quansight link,
>> there is no category of "individual and institutional donations".
>> I noticed this because the question recently arose at my university,
>> how can the university occasionally donate to NumPy development?
>>
>
> We should talk :) Anything we can do as a project to make that easier
> should be done. Also if you need an invoice or purchase order from
> NumFOCUS, I believe that can be easily arranged (and has been arranged for
> other projects in the past).
>
> In order for this to happen, the recipient of the donation and the
>> intended use of the funds must be transparently documented.  As an
>> example, suppose one goes to numpy.org and scrolls down (!?) to
>> the "Donate to Numpy" button.  It is entirely unclear what that
>> means, and clicking the button leads to a flipcause site that fails
>> to clarify.
>
>
> The numpy.org donation button needs to be overhauled anyway, because
> NumFOCUS is switching away from Flipcause. At the same time we can clarify
> on that page where donations go and how we then decide to use that funding.
>

Here is a PR with updates to the numpy.org front page:
https://github.com/numpy/numpy.org/pull/20. I think it contains all the
essentials (governance, roadmap, where the funds go). A larger website
overhaul is also in order, but that's hard to do right now. However, if
there is still information missing that people really look for when
considering a donation, I'd love to know so I can add that straight away.

Cheers,
Ralf



>
>> I suspect many academic institutions would be interested
>> in making occasional, modest contributions toward NumPy development,
>> if the recipient and intended uses were entirely transparent.
>>
>
> I think academic institutions or the people in them may have a lot of
> goodwill towards NumPy; however, as a project we historically have been very
> bad at communicating needs and asking for donations. That button doesn't
> really do much; our average donation level is like $50/month. I actually
> would like to improve that. E.g. if we have a good story and ask people
> whose research relies on NumPy (or other core projects) to build say a 0.5%
> software support item in their grant requests, that could turn into a
> decent revenue stream, which will then help with maintenance and
> accelerating development of new features on our roadmap.
>
> Cheers,
> Ralf
>
>
>
>> Cheers, Alan Isaac
>>
>>
>> On 4/9/2019 3:10 AM, Ralf Gommers wrote:
>> > Hi all,
>> >
>> > Last week I joined Quansight. In Quansight Labs I will be working on
>> increasing the contributions to core SciPy/PyData projects, and will also
>> have funded time to work on NumPy and
>> > other projects myself. Hameer Abbasi has had and will continue to have
>> funded time to work on NumPy as well. So I have submitted a pull request
>> (gh-13289) to add Quansight Labs as
>> > an Institutional Partner (we list those at
>> https://docs.scipy.org/doc/numpy/dev/governance/people.html#institutional-partners,
>> currently only BIDS).
>> >
>> > Both Travis and I wrote blog posts about where we want to go with
>> Quansight Labs. Given the relevance to NumPy I thought it would be
>> appropriate to reference those posts here:
>> > -
>> https://www.quansight.com/single-post/2019/04/02/Welcoming-Ralf-Gommers-as-Director-of-Quansight-Labs
>> > <
>> https://www.quansight.com/single-post/2019/04/02/Welcoming-Ralf-Gommers-as-Director-of-Quansight-Labs
>> >
>> > - https://labs.quansight.org/blog/2019/4/joining-labs/
>> >
>> > Any feedback, suggestion or idea is very welcome.
>> >
>> > Cheers,
>> > Ralf


Re: [Numpy-discussion] Behaviour of copy for structured dtypes with gaps

2019-04-11 Thread Stefan van der Walt
Hi Marten,

On Thu, 11 Apr 2019 09:51:10 -0400, Marten van Kerkwijk wrote:
> From the discussion so far, it
> seems the logic has boiled down to a choice between:
> 
> (1) Copy is a contract that the dtype will not vary (e.g., we also do not
> change endianness);
> 
> (2) Copy is a contract that any access to the data in the array will return
> exactly the same result, without wasting memory and possibly optimized for
> access with different strides. E.g., `array[::10].copy()` also compacts the
> result.

I think you'll get different answers, depending on whom you ask—those
interested in low-level memory layout, vs those who use the higher-level
API.  Given that higher-level API use is much more common, I would lean
in the direction of option (2).

From that perspective, we already don't make consistency guarantees about memory
layout and other flags.  E.g.,

In [16]: x = np.arange(12).reshape((3, 4))

In [17]: x.strides
Out[17]: (32, 8)

In [18]: x[::2, 1::2].strides
Out[18]: (64, 16)

In [19]: np.copy(x[::2, 1::2]).strides
Out[19]: (16, 8)

Not to mention this odd copy contract:

>>> x = np.array([[1,2,3],[4,5,6]], order='F')
>>> print(np.copy(x).flags['C_CONTIGUOUS'])
False
>>> print(x.copy().flags['C_CONTIGUOUS'])
True


The objection about arrays that don't behave identically in [0] feels
somewhat arbitrary to me.  As shown above, you can always find attributes
that differ between a copied array and the original.

The user's expectation is that they'll get an array that behaves the
same way as the original, not one that is byte-for-byte compatible.  The
most common use case is to make sure that the original array doesn't get
overwritten.

Just to play devil's advocate with myself: if you do choose option (2),
how would you go about making an identical memory copy of the original array?
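One possible answer, sketched here under the assumption that the view's base
array is itself contiguous: copy the whole underlying buffer and rebuild a
view with the same shape, strides, and byte offset.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)[::2, 1::2]   # a non-contiguous view

# Copy the entire base buffer, then reconstruct an identically strided
# view into the copy.  Only valid when the base is contiguous, so that
# base.copy() reproduces its byte layout exactly.
base = x.base if x.base is not None else x
buf = base.copy()
offset = (x.__array_interface__['data'][0]
          - base.__array_interface__['data'][0])
replica = np.ndarray(x.shape, dtype=x.dtype, buffer=buf,
                     offset=offset, strides=x.strides)
print(replica.strides == x.strides)   # True
print((replica == x).all())           # True
```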

Best regards,
Stéfan


[0] https://github.com/numpy/numpy/issues/13299#issuecomment-481912827


Re: [Numpy-discussion] Behaviour of copy for structured dtypes with gaps

2019-04-11 Thread Travis Oliphant
I agree with Stefan that option (2) is what NumPy should go with for `.copy()`.

If you want an identical memory copy, you should get the `.data`
attribute and do something with that buffer.
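A rough sketch of that idea (the field names and offsets below are invented
for illustration): snapshot the raw bytes, padding included, and reinterpret
them with the identical padded dtype.

```python
import numpy as np

# Invented padded dtype: two 8-byte fields with a 192-byte gap.
dt = np.dtype({'names': ['a', 'z'], 'formats': ['f8', 'f8'],
               'offsets': [0, 200], 'itemsize': 208})
x = np.zeros(2, dtype=dt)

raw = x.tobytes()                       # byte-for-byte, padding included
print(len(raw))                         # 416 == 2 * 208

y = np.frombuffer(raw, dtype=x.dtype)   # same padded dtype, same bytes
print(y.dtype == x.dtype)               # True
```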

My $0.02

-Travis




Re: [Numpy-discussion] Behaviour of copy for structured dtypes with gaps

2019-04-11 Thread Nathaniel Smith
My concern would be that to implement (2), I think .copy() has to
either special-case certain dtypes, or else we have to add some kind
of "simplify for copy" operation to the dtype protocol. These both add
architectural complexity, so maybe it's better to avoid it unless we
have a compelling reason?




-- 
Nathaniel J. Smith -- https://vorpus.org