Re: [Numpy-discussion] correct sizeof for ndarray

2015-10-20 Thread Allan Haldane

On 10/20/2015 12:05 AM, Jason Newton wrote:

Hi folks,

I noticed an unexpected behavior of itemsize for structures with offsets
that are larger than those of a packed structure in memory.  This matters
when parsing in-memory structures from C and elsewhere (recently an
HDF5/h5py detail got me for a bit).

So what is the correct way to get the "sizeof" a structure?  AFAIK this is
the size of the last item + its offset.  If this doesn't exist...
shouldn't it?

Thanks,
Jason


Hi Jason,

The 'itemsize' attribute of a dtype object is probably what you're 
looking for. It gives the total size of one element in bytes, including 
any padding.


"last item + it's offset" is not a reliable way to get the itemsize 
because "aligned" (and other) structures can have trailing padding, just 
like C structs:


>>> np.dtype('i4,u1', align=True).itemsize
8

The documentation on all this is a little scattered right now, but there 
are hints in the arrays.dtypes reference page and the dtype docstring.
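To make the padding concrete, here is a small sketch contrasting the naive "last item + offset" formula with itemsize for the aligned dtype above (the field names 'f0'/'f1' are NumPy's auto-generated defaults):

```python
import numpy as np

# Aligned struct dtype: int32 followed by uint8, like a C struct with
# natural alignment -- trailing padding is added after the last field.
dt = np.dtype('i4,u1', align=True)

# Naive "sizeof": offset of the last field plus its size.
last_dtype, last_offset = dt.fields[dt.names[-1]]
naive_size = last_offset + last_dtype.itemsize

print(naive_size)   # 5 -- misses the trailing padding
print(dt.itemsize)  # 8 -- the true sizeof, matching the C struct
```

So dt.itemsize is the value to use when computing buffer sizes or strides for C interop.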


Cheers,
Allan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-20 Thread josef.pktd
On Mon, Oct 19, 2015 at 9:51 PM,  wrote:

>
>
> On Mon, Oct 19, 2015 at 9:15 PM, Nathaniel Smith  wrote:
>
>> On Mon, Oct 19, 2015 at 5:55 AM,   wrote:
>> >
>> >
>> > On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith  wrote:
>> >>
>> >> On Sun, Oct 18, 2015 at 9:35 PM,   wrote:
>> >> >>>> np.column_stack((np.ones(10), np.ones(10))).flags
>> >> >   C_CONTIGUOUS : True
>> >> >   F_CONTIGUOUS : False
>> >> >
>> >> >>>> np.__version__
>> >> > '1.9.2rc1'
>> >> >
>> >> >
>> >> > on my notebook which has numpy 1.6.1 it is f_contiguous
>> >> >
>> >> >
>> >> > I was just trying to optimize a loop over variable adjustment in
>> >> > regression,
>> >> > and found out that we lost fortran contiguity.
>> >> >
>> >> > I always thought column_stack is for fortran usage (linalg)
>> >> >
>> >> > What's the alternative?
>> >> > column_stack was one of my favorite commands, and I always assumed we
>> >> > have
>> >> > in statsmodels the right memory layout to call the linalg libraries.
>> >> >
>> >> > ("assumed" means we don't have timing nor unit tests for it.)
>> >>
>> >> In general practice no numpy functions make any guarantee about memory
>> >> layout, unless that's explicitly a documented part of their contract
>> >> (e.g. 'ascontiguousarray', or some functions that take an order= argument
>> >> -- I say "some" b/c there are functions like 'reshape' that take an
>> >> argument called order= that doesn't actually refer to memory layout).
>> >> This isn't so much an official policy as just a fact of life -- if
>> >> no-one has any idea that someone is depending on some memory
>> >> layout detail then there's no way to realize that we've broken
>> >> something. (But it is a good policy IMO.)
>> >
>> >
>> > I understand that in general.
>> >
>> > However, I always thought column_stack is an array creation function
>> > which has a guaranteed memory layout. And since it stacks by columns,
>> > I thought that its order is always Fortran.
>> > And I thought the fact that it doesn't have an order keyword yet was
>> > just a missing extension.
>>
>> I guess I don't know what to say except that I'm sorry to hear that
>> and sorry that no-one noticed until several releases later.
>>
>
>
> Were there more contiguity changes in 0.10?
> I just saw a large number of test errors and failures in statespace models
> which are heavily based on cython code where it's not just a question of
> performance.
>
> I don't know yet what's going on, but I just saw that we have some
> explicit tests for fortran contiguity which just started to fail.
>
>
>
>
>>
>> >> If this kind of problem gets caught during a pre-release cycle then we
>> >> generally do try to fix it, because we try not to break code, but if
>> >> it's been broken for 2 full releases then there's not much we can do --
>> >> we can't go back in time to fix it so it sounds like you're stuck
>> >> working around the problem no matter what (unless you want to refuse
>> >> to support 1.9.0 through 1.10.1, which I assume you don't... worst
>> >> case, you just have to do a global search replace of np.column_stack
>> >> with statsmodels.utils.column_stack_f, right?).
>> >>
>> >> And the regression issue seems like the only real argument for
>> >> changing it back -- we'd never guarantee f-contiguity here if starting
>> >> from a blank slate, I think?
>> >
>> >
>> > When the cat is out of the bag, the downstream developer writes
>> > compatibility code or helper functions.
>> >
>> > I will do that for at least the parts I know are intentionally designed
>> > for F memory order.
>> >
>> > ---
>> >
>> > statsmodels doesn't really check or consistently optimize the memory
>> order,
>> > except in some cython functions.
>> > But, I thought we should be doing quite well with getting Fortran
>> ordered
>> > arrays. I only paid attention where we have more extensive loops
>> internally.
>> >
>> > Nathaniel, does patsy guarantee memory layout (F-contiguous) when
>> creating
>> > design matrices?
>>
>> I never thought about it :-). So: no, it looks like right now patsy
>> usually returns C-order matrices (or really, whatever np.empty or
>> np.repeat returns), and there aren't any particular guarantees that
>> this will continue to be the case in the future.
>>
>> Is returning matrices in F-contiguous layout really important? Should
>> there be a return_type="fortran_matrix" option or something like that?
>>
>
> I don't know, yet. My intuition was that it would be better because we
> feed the arrays directly to pinv/SVD or QR which, I think, expect
> Fortran-contiguous input by default.
>
> However, my intuition might not be correct, and it might not make much
> difference in a single OLS estimation.
>

I did some quick timing checks of pinv and qr, and the Fortran-ordered
version is only about 5% to 15% faster and uses about the same amount of
memory (watching the Task Manager). So, nothing to get excited about.

Re: [Numpy-discussion] Behavior of numpy.copy with sub-classes

2015-10-20 Thread Benjamin Root
In many other parts of numpy, calling the numpy function that had an
equivalent array method would result in the method being called. I would
certainly be surprised if the copy() method behaved differently from the
np.copy() function.

Now it is time for me to do some grepping of my code-bases...
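For concreteness, a minimal sketch of the discrepancy under discussion (behavior as of the NumPy versions in this thread):

```python
import numpy as np

x = np.ma.array([1, 2, 3], mask=[0, 1, 0])

# The free function silently drops the subclass...
print(type(np.copy(x)))  # <class 'numpy.ndarray'>

# ...while the method preserves it, mask and all.
print(type(x.copy()))    # <class 'numpy.ma.core.MaskedArray'>
```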

On Mon, Oct 19, 2015 at 10:40 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:

>
>
> On Mon, Oct 19, 2015 at 8:28 PM, Nathan Goldbaum 
> wrote:
>
>>
>>
>> On Mon, Oct 19, 2015 at 7:23 PM, Jonathan Helmus 
>> wrote:
>>
>>> In GitHub issue #3474, a number of us have started a conversation on how
>>> NumPy's copy function should behave when passed an instance which is a
>>> sub-class of the array class.  Specifically, the issue began by noting that
>>> when a MaskedArray is passed to np.copy, the sub-class is not passed
>>> through but rather an ndarray is returned.
>>>
>>> I suggested adding a "subok" parameter which controls how sub-classes
>>> are handled and others suggested having the function call a copy method on
>>> duck arrays.  The "subok" parameter is implemented in PR #6509 as an
>>> example. Both of these options would change the API of numpy.copy and
>>> possibly break backwards compatibility.  Do others have an opinion of how
>>> np.copy should handle sub-classes?
>>>
>>> For a concrete example of this behavior and possible changes, what type
>>> should copy_x be in the following snippet:
>>>
>>> import numpy as np
>>> x = np.ma.array([1,2,3])
>>> copy_x = np.copy(x)
>>>
>>
>> FWIW, it looks like np.copy() is never used in our code to work with the
>> ndarray subclass we maintain in yt. Instead we use the copy() method much
>> more often, and that returns the appropriate type. I guess it makes sense
>> to have the type of the return value of np.copy() agree with the type of
>> the copy() member function.
>>
>> That said, breaking backwards compatibility here before numpy 2.0 might
> very well break real code. It might be worth searching e.g. github for all
>> instances of np.copy() to see if they're dealing with subclasses.
>>
>
> The problem with github searches is that there are a ton of numpy forks.
> ISTR once finding a method to avoid them, but can't remember what it was.
> If anyone knows how to do that, I'd appreciate learning.
>
> Chuck
>
>
>