Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-07 Thread Ryan May
On Fri, Jul 7, 2017 at 4:27 PM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Hi All,
>
> I doubt I'm really the last one thinking ndarray subclassing is a good
> idea, but as that was stated, I feel I should at least pipe in. It
> seems to me there are both a perceived problem -- the two subclasses
> that numpy provides, `matrix` and `MaskedArray`, are problematic in
> ways that have very little to do with subclassing being a bad idea --
> and a real one, following from the fact that numpy was written at a
> time when python's inheritance system was not as well developed as it
> is now.
>
> Though based on my experience with Quantity, I'd also argue that the
> more annoying problems are not so much with `ndarray` itself, but
> rather with the helper functions. Ufuncs were not so bad -- they
> really just needed a better override mechanism, which __array_ufunc__
> now provides -- but for quite a few of the other functions subclassing
> was clearly an afterthought. `MaskedArray` provides a nice example of
> this, with its many special `np.ma` routines, introducing huge
> duplication and thus lots of duplicated bugs (which Eric has been
> patiently fixing...). Indeed, `MaskedArray` is also a much better
> example than `ndarray` of a class that is really hard to subclass
> (even though, conceptually, it should be a far easier one).
>
> All that said, duck-type arrays make a lot of sense, and e.g. the
> slicing and shaping methods are easily emulated, especially if one's
> underlying data are stored in `ndarray`. For astropy's version of a
> relevant mixin, see
> http://docs.astropy.org/en/stable/api/astropy.utils.misc.ShapedLikeNDArray.html


My biggest problem with subclassing as it exists now is that subclasses
don't survive the first encounter with np.asarray (or np.array). So much
code written to work with numpy uses that as a band-aid (e.g. for
handling lists) that in my experience it's 50/50 whether passing a
subclass to a function will actually behave as expected -- even if
there's no good reason it shouldn't.
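
A minimal illustration of what I mean, using MaskedArray as the subclass:

import numpy as np

m = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
print(type(np.asarray(m)))     # <class 'numpy.ndarray'> -- the mask is gone
print(type(np.asanyarray(m)))  # MaskedArray -- asanyarray keeps the subclass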

Ryan

-- 
Ryan May


Re: [Numpy-discussion] Scipy 2017 NumPy sprint

2017-07-07 Thread Marten van Kerkwijk
Hi All,

I doubt I'm really the last one thinking ndarray subclassing is a good
idea, but as that was stated, I feel I should at least pipe in. It
seems to me there are both a perceived problem -- the two subclasses
that numpy provides, `matrix` and `MaskedArray`, are problematic in
ways that have very little to do with subclassing being a bad idea --
and a real one, following from the fact that numpy was written at a
time when python's inheritance system was not as well developed as it
is now.

Though based on my experience with Quantity, I'd also argue that the
more annoying problems are not so much with `ndarray` itself, but
rather with the helper functions. Ufuncs were not so bad -- they
really just needed a better override mechanism, which __array_ufunc__
now provides -- but for quite a few of the other functions subclassing
was clearly an afterthought. `MaskedArray` provides a nice example of
this, with its many special `np.ma` routines, introducing huge
duplication and thus lots of duplicated bugs (which Eric has been
patiently fixing...). Indeed, `MaskedArray` is also a much better
example than `ndarray` of a class that is really hard to subclass
(even though, conceptually, it should be a far easier one).

All that said, duck-type arrays make a lot of sense, and e.g. the
slicing and shaping methods are easily emulated, especially if one's
underlying data are stored in `ndarray`. For astropy's version of a
relevant mixin, see
http://docs.astropy.org/en/stable/api/astropy.utils.misc.ShapedLikeNDArray.html
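
For a flavour of how little is needed, here is a toy duck array (the
class and its unit handling are made up purely for illustration) that
stores its data in an ndarray, emulates shaping, and supports ufuncs
via __array_ufunc__:

import numpy as np

class UnitArray:
    # Toy duck array: an ndarray plus a unit label (illustration only).
    def __init__(self, data, unit=''):
        self.data = np.asarray(data)
        self.unit = unit

    @property
    def shape(self):
        # Shaping attributes simply delegate to the underlying ndarray.
        return self.data.shape

    def __getitem__(self, item):
        return UnitArray(self.data[item], self.unit)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap UnitArray inputs, apply the ufunc, rewrap the result.
        arrays = [i.data if isinstance(i, UnitArray) else i for i in inputs]
        result = getattr(ufunc, method)(*arrays, **kwargs)
        return UnitArray(result, self.unit)  # real code would combine units

a = UnitArray([1., 2.], unit='m')
b = np.add(a, a)        # dispatches to __array_ufunc__
print(b.data, b.unit)   # [2. 4.] m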

All the best,

Marten


Re: [Numpy-discussion] Polynomial silent breakage with 1.13

2017-07-07 Thread Matthew Brett
Hi,

On Fri, Jul 7, 2017 at 6:14 PM, Eric Wieser wrote:
> That’s a regression, and it’s on me, in 8762.
>
> That was a side effect of a fix for the weird behaviour here.
>
> I think we need to fix this in 1.13.2, so we should file an issue about it.

Thanks for the feedback.  Do you want to file an issue, or should I?

Cheers,

Matthew


[Numpy-discussion] Polynomial silent breakage with 1.13

2017-07-07 Thread Matthew Brett
Hi,

Our (nipy's) test suite just failed with the upgrade to numpy 1.13,
and the cause boiled down to this:

```
import numpy as np

poly = np.poly1d([1])
poly.c[0] *= 2
print(poly.c)
```

Numpy 1.12 gives (to me) expected output:

[2]

Numpy 1.13 gives (to me) unexpected output:

[1]

The problem is caused by the fact that `poly.c` now returns a *copy* of
the actual coefficient array - I think in an attempt to stop us from
modifying the coefficients directly.

I can't see any deprecation warnings with `-W always`.

The pain point here is that code that used to give the right answer
has now (I believe silently) switched to giving the wrong answer.
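
For reference, a workaround (assuming the copy behaviour described
above) is to take the copy, modify it, and rebuild the polynomial:

```
import numpy as np

poly = np.poly1d([1])
c = poly.c           # in 1.13 this is a copy of the coefficients
c[0] *= 2
poly = np.poly1d(c)  # rebuild with the modified coefficients
print(poly.c)        # [2]
```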

Cheers,

Matthew


Re: [Numpy-discussion] record data previous to Numpy use

2017-07-07 Thread Derek Homeier
On 07 Jul 2017, at 4:24 PM, paul.carr...@free.fr wrote:
> 
> PS: I'd like to use the following code, which is much more familiar to me :-)
> 
> COMP_list = np.asarray(COMP_list, dtype = np.float64) 
> i = np.arange(1,NumberOfRecords,2)
> COMP_list = np.delete(COMP_list,i)
> 
Not sure about the background of this, but if you want to remove every
second entry (if NumberOfRecords is the full length of the list, that
is), it would always be preferable to make the changes to the list, or
even better, to extract only the entries you want:

COMP_list = np.asarray(COMP_list[::2], dtype=np.float64)
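
A quick check that the slicing version matches the delete-based one
(assuming NumberOfRecords == len(COMP_list); the sample data is made up):

import numpy as np

COMP_list = list(range(7))  # stand-in data, so NumberOfRecords = 7
via_delete = np.delete(np.asarray(COMP_list, dtype=np.float64),
                       np.arange(1, 7, 2))
via_slice = np.asarray(COMP_list[::2], dtype=np.float64)
assert (via_delete == via_slice).all()  # both give [0., 2., 4., 6.]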

Have a good weekend

Derek



Re: [Numpy-discussion] record data previous to Numpy use

2017-07-07 Thread paul . carrico
Hi (all),

Once again I would like to thank the community for the support.

I am progressing in moving my code to Python.

In my mind some parts remain quite ugly (and burn my eyes), but it
works and I'll optimize it in the future; so far I can work with the
data in a single read.

I built some blocks in a text file and used Astropy to read it (works
fine now - I'll test pandas as a next step).

Not finished yet, but significant progress compared to yesterday :-)

Have a good weekend,

Paul

PS: I'd like to use the following code, which is much more familiar to
me :-)

COMP_list = np.asarray(COMP_list, dtype = np.float64) 
i = np.arange(1,NumberOfRecords,2)
COMP_list = np.delete(COMP_list,i) 

On 2017-07-07 12:04, Derek Homeier wrote:

> On 7 Jul 2017, at 1:59 am, Chris Barker wrote:
> 
>> On Thu, Jul 6, 2017 at 10:55 AM,  wrote:
>> It's just a reflection, but for huge files one solution might be to
>> split/write/build the array first in a dedicated file (2x O(n)
>> iterations - one to identify the block sizes - an additional one to
>> get and write), and then to load it in memory and work with numpy -
>> 
>> I may have your use case confused, but if you have a huge file with
>> multiple "blocks" in it, there shouldn't be any problem with loading
>> it in one go -- start at the top of the file and load one block at a
>> time (accumulating in a list) -- then you only have the memory
>> overhead issues for one block at a time, should be no problem.
>> 
>> at this stage the dimension is known and some packages will be fast
>> and more adapted (pandas or astropy as suggested).
>> 
>> pandas at least is designed to read variations of CSV files, not sure
>> you could use the optimized part to read an array out of part of an
>> open file from a particular point or not.
> The fragmented structure indeed would probably be the biggest
> challenge, although astropy, while it cannot read from an open file
> handle, at least should be able to directly parse a block of input
> lines, e.g. collected with readline() in a list. Guess pandas could do
> the same.
> Alternatively the line positions of the blocks could be passed
> directly to the data_start and data_end keywords, but that would
> require opening and at least partially reading the file multiple
> times. In fact, if the blocks are relatively small, the overhead may
> be too large to make it worth using the faster parsers - if you look
> at the timing notebooks I had linked to earlier, it takes at least
> ~100 input lines before they show any speed gains over genfromtxt,
> and ~1000 to see roughly linear scaling. In that case writing your own
> customised reader could be the best option after all.
> 
> Cheers,
> Derek


Re: [Numpy-discussion] record data previous to Numpy use

2017-07-07 Thread Derek Homeier
On 7 Jul 2017, at 1:59 am, Chris Barker wrote:
> 
> On Thu, Jul 6, 2017 at 10:55 AM,  wrote:
> It's just a reflection, but for huge files one solution might be to
> split/write/build the array first in a dedicated file (2x O(n)
> iterations - one to identify the block sizes - an additional one to
> get and write), and then to load it in memory and work with numpy -
> 
> 
> I may have your use case confused, but if you have a huge file with
> multiple "blocks" in it, there shouldn't be any problem with loading
> it in one go -- start at the top of the file and load one block at a
> time (accumulating in a list) -- then you only have the memory
> overhead issues for one block at a time, should be no problem.
> 
> at this stage the dimension is known and some packages will be fast
> and more adapted (pandas or astropy as suggested).
> 
> pandas at least is designed to read variations of CSV files, not sure
> you could use the optimized part to read an array out of part of an
> open file from a particular point or not.
> 
The fragmented structure indeed would probably be the biggest
challenge, although astropy, while it cannot read from an open file
handle, at least should be able to directly parse a block of input
lines, e.g. collected with readline() in a list. Guess pandas could do
the same.
Alternatively the line positions of the blocks could be passed directly
to the data_start and data_end keywords, but that would require opening
and at least partially reading the file multiple times. In fact, if the
blocks are relatively small, the overhead may be too large to make it
worth using the faster parsers - if you look at the timing notebooks I
had linked to earlier, it takes at least ~100 input lines before they
show any speed gains over genfromtxt, and ~1000 to see roughly linear
scaling. In that case writing your own customised reader could be the
best option after all.
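
To make the readline-collection idea concrete, a rough sketch (the file
name and the end-of-block sentinel are made up here; astropy's
ascii.read accepts a list of strings as its input table):

from astropy.io import ascii

blocks = []
lines = []
with open('data.txt') as fh:         # file name made up for illustration
    for line in fh:
        if line.startswith('#END'):  # hypothetical end-of-block marker
            blocks.append(ascii.read(lines))
            lines = []
        else:
            lines.append(line)
if lines:                            # parse a trailing block, if any
    blocks.append(ascii.read(lines))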

Cheers,
Derek