Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-20 Thread josef.pktd
On Mon, Oct 19, 2015 at 9:51 PM,  wrote:

>
>
> On Mon, Oct 19, 2015 at 9:15 PM, Nathaniel Smith  wrote:
>
>> On Mon, Oct 19, 2015 at 5:55 AM,   wrote:
>> >
>> >
>> > On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith  wrote:
>> >>
>> >> On Sun, Oct 18, 2015 at 9:35 PM,   wrote:
>> >>  np.column_stack((np.ones(10), np.ones(10))).flags
>> >> >   C_CONTIGUOUS : True
>> >> >   F_CONTIGUOUS : False
>> >> >
>> >>  np.__version__
>> >> > '1.9.2rc1'
>> >> >
>> >> >
>> >> > on my notebook which has numpy 1.6.1 it is f_contiguous
>> >> >
>> >> >
>> >> > I was just trying to optimize a loop over variable adjustment in
>> >> > regression,
>> >> > and found out that we lost fortran contiguity.
>> >> >
>> >> > I always thought column_stack is for fortran usage (linalg)
>> >> >
>> >> > What's the alternative?
>> >> > column_stack was one of my favorite commands, and I always assumed we
>> >> > have
>> >> > in statsmodels the right memory layout to call the linalg libraries.
>> >> >
>> >> > ("assumed" means we don't have timing nor unit tests for it.)
>> >>
>> >> In general practice no numpy functions make any guarantee about memory
>> >> layout, unless that's explicitly a documented part of their contract
>> >> (e.g. 'ascontiguous', or some functions that take an order= argument
>> >> -- I say "some" b/c there are functions like 'reshape' that take an
>> >> argument called order= that doesn't actually refer to memory layout).
>> >> This isn't so much an official policy as just a fact of life -- if
>> >> no-one has any idea that the someone is depending on some memory
>> >> layout detail then there's no way to realize that we've broken
>> >> something. (But it is a good policy IMO.)
>> >
>> >
>> > I understand that in general.
>> >
>> > However, I always thought column_stack is a array creation function
>> which
>> > have guaranteed memory layout. And since it's stacking by columns I
>> thought
>> > that order is always Fortran.
>> > And the fact that it doesn't have an order keyword yet, I thought is
>> just a
>> > missing extension.
>>
>> I guess I don't know what to say except that I'm sorry to hear that
>> and sorry that no-one noticed until several releases later.
>>
>
>
> Were there more contiguity changes in 0.10?
> I just saw a large number of test errors and failures in statespace models
> which are heavily based on cython code where it's not just a question of
> performance.
>
> I don't know yet what's going on, but I just saw that we have some
> explicit tests for fortran contiguity which just started to fail.
>
>
>
>
>>
>> >> If this kind of problem gets caught during a pre-release cycle then we
>> >> generally do try to fix it, because we try not to break code, but if
>> >> it's been broken for 2 full releases then there's no much we can do --
>> >> we can't go back in time to fix it so it sounds like you're stuck
>> >> working around the problem no matter what (unless you want to refuse
>> >> to support 1.9.0 through 1.10.1, which I assume you don't... worst
>> >> case, you just have to do a global search replace of np.column_stack
>> >> with statsmodels.utils.column_stack_f, right?).
>> >>
>> >> And the regression issue seems like the only real argument for
>> >> changing it back -- we'd never guarantee f-contiguity here if starting
>> >> from a blank slate, I think?
>> >
>> >
>> > When the cat is out of the bag, the down stream developer writes
>> > compatibility code or helper functions.
>> >
>> > I will do that at at least the parts I know are intentionally designed
>> for F
>> > memory order.
>> >
>> > ---
>> >
>> > statsmodels doesn't really check or consistently optimize the memory
>> order,
>> > except in some cython functions.
>> > But, I thought we should be doing quite well with getting Fortran
>> ordered
>> > arrays. I only paid attention where we have more extensive loops
>> internally.
>> >
>> > Nathniel, Does patsy guarantee memory layout (F-contiguous) when
>> creating
>> > design matrices?
>>
>> I never thought about it :-). So: no, it looks like right now patsy
>> usually returns C-order matrices (or really, whatever np.empty or
>> np.repeat returns), and there aren't any particular guarantees that
>> this will continue to be the case in the future.
>>
>> Is returning matrices in F-contiguous layout really important? Should
>> there be a return_type="fortran_matrix" option or something like that?
>>
>
> I don't know, yet. My intuition was that it would be better because we
> feed the arrays directly to pinv/SVD or QR which, I think, require by
> default Fortran contiguous.
>
> However, my intuition might not be correct, and it might not make much
> difference in a single OLS estimation.
>

I did some quick timing checks of pinv and qr, and the Fortran ordered is
only about 5% to 15% faster and uses about the same amount of memory
(watching the Task manager). So, nothing to get excited 

Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread Nathaniel Smith
On Sun, Oct 18, 2015 at 9:35 PM,   wrote:
 np.column_stack((np.ones(10), np.ones(10))).flags
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : False
>
 np.__version__
> '1.9.2rc1'
>
>
> on my notebook which has numpy 1.6.1 it is f_contiguous
>
>
> I was just trying to optimize a loop over variable adjustment in regression,
> and found out that we lost fortran contiguity.
>
> I always thought column_stack is for fortran usage (linalg)
>
> What's the alternative?
> column_stack was one of my favorite commands, and I always assumed we have
> in statsmodels the right memory layout to call the linalg libraries.
>
> ("assumed" means we don't have timing nor unit tests for it.)

In general practice no numpy functions make any guarantee about memory
layout, unless that's explicitly a documented part of their contract
(e.g. 'ascontiguous', or some functions that take an order= argument
-- I say "some" b/c there are functions like 'reshape' that take an
argument called order= that doesn't actually refer to memory layout).
This isn't so much an official policy as just a fact of life -- if
no-one has any idea that the someone is depending on some memory
layout detail then there's no way to realize that we've broken
something. (But it is a good policy IMO.)

If this kind of problem gets caught during a pre-release cycle then we
generally do try to fix it, because we try not to break code, but if
it's been broken for 2 full releases then there's no much we can do --
we can't go back in time to fix it so it sounds like you're stuck
working around the problem no matter what (unless you want to refuse
to support 1.9.0 through 1.10.1, which I assume you don't... worst
case, you just have to do a global search replace of np.column_stack
with statsmodels.utils.column_stack_f, right?).

And the regression issue seems like the only real argument for
changing it back -- we'd never guarantee f-contiguity here if starting
from a blank slate, I think?

-n

-- 
Nathaniel J. Smith -- http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith  wrote:

> On Sun, Oct 18, 2015 at 9:35 PM,   wrote:
>  np.column_stack((np.ones(10), np.ones(10))).flags
> >   C_CONTIGUOUS : True
> >   F_CONTIGUOUS : False
> >
>  np.__version__
> > '1.9.2rc1'
> >
> >
> > on my notebook which has numpy 1.6.1 it is f_contiguous
> >
> >
> > I was just trying to optimize a loop over variable adjustment in
> regression,
> > and found out that we lost fortran contiguity.
> >
> > I always thought column_stack is for fortran usage (linalg)
> >
> > What's the alternative?
> > column_stack was one of my favorite commands, and I always assumed we
> have
> > in statsmodels the right memory layout to call the linalg libraries.
> >
> > ("assumed" means we don't have timing nor unit tests for it.)
>
> In general practice no numpy functions make any guarantee about memory
> layout, unless that's explicitly a documented part of their contract
> (e.g. 'ascontiguous', or some functions that take an order= argument
> -- I say "some" b/c there are functions like 'reshape' that take an
> argument called order= that doesn't actually refer to memory layout).
> This isn't so much an official policy as just a fact of life -- if
> no-one has any idea that the someone is depending on some memory
> layout detail then there's no way to realize that we've broken
> something. (But it is a good policy IMO.)
>

I understand that in general.

However, I always thought column_stack is a array creation function which
have guaranteed memory layout. And since it's stacking by columns I thought
that order is always Fortran.
And the fact that it doesn't have an order keyword yet, I thought is just a
missing extension.



>
> If this kind of problem gets caught during a pre-release cycle then we
> generally do try to fix it, because we try not to break code, but if
> it's been broken for 2 full releases then there's no much we can do --
> we can't go back in time to fix it so it sounds like you're stuck
> working around the problem no matter what (unless you want to refuse
> to support 1.9.0 through 1.10.1, which I assume you don't... worst
> case, you just have to do a global search replace of np.column_stack
> with statsmodels.utils.column_stack_f, right?).
>
> And the regression issue seems like the only real argument for
> changing it back -- we'd never guarantee f-contiguity here if starting
> from a blank slate, I think?
>

When the cat is out of the bag, the down stream developer writes
compatibility code or helper functions.

I will do that at at least the parts I know are intentionally designed for
F memory order.

---

statsmodels doesn't really check or consistently optimize the memory order,
except in some cython functions.
But, I thought we should be doing quite well with getting Fortran ordered
arrays. I only paid attention where we have more extensive loops internally.

Nathniel, Does patsy guarantee memory layout (F-contiguous) when creating
design matrices?

Josef



>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 5:16 AM, Sebastian Berg 
wrote:

> On Mo, 2015-10-19 at 01:34 -0400, josef.p...@gmail.com wrote:
> >
>
> 
>
>
> >
> > It looks like in 1.9 it depends on the order of the 2-d arrays, which
> > it didn't do in 1.6
> >
>
> Yes, it uses concatenate, and concatenate probably changed in 1.7 to use
> "K" (since "K" did not really exists before 1.7 IIRC).
> Not sure what we can do about it, the order is not something that is
> easily fixed unless explicitly given. It might be optimized (as in this
> case I would guess).
> Whether or not doing the fastest route for these kind of functions is
> faster for the user is of course impossible to know, we can only hope
> that in most cases it is better.
> If someone has an idea how to decide I am all ears, but I think all we
> can do is put in asserts/tests in the downstream code if it relies
> heavily on the order (or just copy, if the order is wrong) :(, another
> example is change of the output order in advanced indexing in some
> cases, it makes it faster sometimes, and probably slower in others, what
> is right seems very much non-trivial.
>

To understand the reason:

Is this to have more efficient memory access during copying?

AFAIU, column_stack needs to create a new array which has to be either F or
C contiguous, so we always have to pick one of the two. With a large number
of 1d arrays it seemed more "intuitive" to me to copy them by columns.

Josef



>
> - Sebastian
>
>
> >
> > >>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags
> >   C_CONTIGUOUS : False
> >   F_CONTIGUOUS : True
> >   OWNDATA : True
> >   WRITEABLE : True
> >   ALIGNED : True
> >   UPDATEIFCOPY : False
> >
> >
> >
> >
> > which means the default order looks more like "K" now, not "C", IIUC
> >
> >
> > Josef
> >
> >
> >
> >
> >
> > Josef
> >
> >
> >
> >
> >
> > Stephan
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
> >
> >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread Sebastian Berg
On Mo, 2015-10-19 at 01:34 -0400, josef.p...@gmail.com wrote:
> 




> 
> It looks like in 1.9 it depends on the order of the 2-d arrays, which
> it didn't do in 1.6
> 

Yes, it uses concatenate, and concatenate probably changed in 1.7 to use
"K" (since "K" did not really exists before 1.7 IIRC).
Not sure what we can do about it, the order is not something that is
easily fixed unless explicitly given. It might be optimized (as in this
case I would guess).
Whether or not doing the fastest route for these kind of functions is
faster for the user is of course impossible to know, we can only hope
that in most cases it is better.
If someone has an idea how to decide I am all ears, but I think all we
can do is put in asserts/tests in the downstream code if it relies
heavily on the order (or just copy, if the order is wrong) :(, another
example is change of the output order in advanced indexing in some
cases, it makes it faster sometimes, and probably slower in others, what
is right seems very much non-trivial.

- Sebastian


> 
> >>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : True
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
> 
> 
> 
> 
> which means the default order looks more like "K" now, not "C", IIUC
> 
> 
> Josef
> 
> 
>  
> 
> 
> Josef
>  
> 
> 
> 
> 
> Stephan
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> 
> 
> 
> 
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion



signature.asc
Description: This is a digitally signed message part
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 9:00 AM,  wrote:

>
>
> On Mon, Oct 19, 2015 at 5:16 AM, Sebastian Berg <
> sebast...@sipsolutions.net> wrote:
>
>> On Mo, 2015-10-19 at 01:34 -0400, josef.p...@gmail.com wrote:
>> >
>>
>> 
>>
>>
>> >
>> > It looks like in 1.9 it depends on the order of the 2-d arrays, which
>> > it didn't do in 1.6
>> >
>>
>> Yes, it uses concatenate, and concatenate probably changed in 1.7 to use
>> "K" (since "K" did not really exists before 1.7 IIRC).
>> Not sure what we can do about it, the order is not something that is
>> easily fixed unless explicitly given. It might be optimized (as in this
>> case I would guess).
>> Whether or not doing the fastest route for these kind of functions is
>> faster for the user is of course impossible to know, we can only hope
>> that in most cases it is better.
>> If someone has an idea how to decide I am all ears, but I think all we
>> can do is put in asserts/tests in the downstream code if it relies
>> heavily on the order (or just copy, if the order is wrong) :(, another
>> example is change of the output order in advanced indexing in some
>> cases, it makes it faster sometimes, and probably slower in others, what
>> is right seems very much non-trivial.
>>
>
> To understand the reason:
>
> Is this to have more efficient memory access during copying?
>
> AFAIU, column_stack needs to create a new array which has to be either F
> or C contiguous, so we always have to pick one of the two. With a large
> number of 1d arrays it seemed more "intuitive" to me to copy them by
> columns.
>


just as background

I was mainly surprised last night about having my long held beliefs
shattered. I skipped numpy 1.7 and 1.8 in my development environment and
still need to catch up now that I use 1.9 as my main numpy version.

I might have to update a bit my "folk wisdom", which is not codified
anywhere and doesn't have unit tests.

For example, the improvement iteration for Fortran contiguous or not C or F
contiguous arrays sounded very useful, but I never checked if it would
affect us.

Josef



>
> Josef
>
>
>
>>
>> - Sebastian
>>
>>
>> >
>> > >>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags
>> >   C_CONTIGUOUS : False
>> >   F_CONTIGUOUS : True
>> >   OWNDATA : True
>> >   WRITEABLE : True
>> >   ALIGNED : True
>> >   UPDATEIFCOPY : False
>> >
>> >
>> >
>> >
>> > which means the default order looks more like "K" now, not "C", IIUC
>> >
>> >
>> > Josef
>> >
>> >
>> >
>> >
>> >
>> > Josef
>> >
>> >
>> >
>> >
>> >
>> > Stephan
>> >
>> > ___
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> >
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> >
>> >
>> >
>> >
>> > ___
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 9:15 PM, Nathaniel Smith  wrote:

> On Mon, Oct 19, 2015 at 5:55 AM,   wrote:
> >
> >
> > On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith  wrote:
> >>
> >> On Sun, Oct 18, 2015 at 9:35 PM,   wrote:
> >>  np.column_stack((np.ones(10), np.ones(10))).flags
> >> >   C_CONTIGUOUS : True
> >> >   F_CONTIGUOUS : False
> >> >
> >>  np.__version__
> >> > '1.9.2rc1'
> >> >
> >> >
> >> > on my notebook which has numpy 1.6.1 it is f_contiguous
> >> >
> >> >
> >> > I was just trying to optimize a loop over variable adjustment in
> >> > regression,
> >> > and found out that we lost fortran contiguity.
> >> >
> >> > I always thought column_stack is for fortran usage (linalg)
> >> >
> >> > What's the alternative?
> >> > column_stack was one of my favorite commands, and I always assumed we
> >> > have
> >> > in statsmodels the right memory layout to call the linalg libraries.
> >> >
> >> > ("assumed" means we don't have timing nor unit tests for it.)
> >>
> >> In general practice no numpy functions make any guarantee about memory
> >> layout, unless that's explicitly a documented part of their contract
> >> (e.g. 'ascontiguous', or some functions that take an order= argument
> >> -- I say "some" b/c there are functions like 'reshape' that take an
> >> argument called order= that doesn't actually refer to memory layout).
> >> This isn't so much an official policy as just a fact of life -- if
> >> no-one has any idea that the someone is depending on some memory
> >> layout detail then there's no way to realize that we've broken
> >> something. (But it is a good policy IMO.)
> >
> >
> > I understand that in general.
> >
> > However, I always thought column_stack is a array creation function which
> > have guaranteed memory layout. And since it's stacking by columns I
> thought
> > that order is always Fortran.
> > And the fact that it doesn't have an order keyword yet, I thought is
> just a
> > missing extension.
>
> I guess I don't know what to say except that I'm sorry to hear that
> and sorry that no-one noticed until several releases later.
>


Were there more contiguity changes in 0.10?
I just saw a large number of test errors and failures in statespace models
which are heavily based on cython code where it's not just a question of
performance.

I don't know yet what's going on, but I just saw that we have some explicit
tests for fortran contiguity which just started to fail.




>
> >> If this kind of problem gets caught during a pre-release cycle then we
> >> generally do try to fix it, because we try not to break code, but if
> >> it's been broken for 2 full releases then there's no much we can do --
> >> we can't go back in time to fix it so it sounds like you're stuck
> >> working around the problem no matter what (unless you want to refuse
> >> to support 1.9.0 through 1.10.1, which I assume you don't... worst
> >> case, you just have to do a global search replace of np.column_stack
> >> with statsmodels.utils.column_stack_f, right?).
> >>
> >> And the regression issue seems like the only real argument for
> >> changing it back -- we'd never guarantee f-contiguity here if starting
> >> from a blank slate, I think?
> >
> >
> > When the cat is out of the bag, the down stream developer writes
> > compatibility code or helper functions.
> >
> > I will do that at at least the parts I know are intentionally designed
> for F
> > memory order.
> >
> > ---
> >
> > statsmodels doesn't really check or consistently optimize the memory
> order,
> > except in some cython functions.
> > But, I thought we should be doing quite well with getting Fortran ordered
> > arrays. I only paid attention where we have more extensive loops
> internally.
> >
> > Nathniel, Does patsy guarantee memory layout (F-contiguous) when creating
> > design matrices?
>
> I never thought about it :-). So: no, it looks like right now patsy
> usually returns C-order matrices (or really, whatever np.empty or
> np.repeat returns), and there aren't any particular guarantees that
> this will continue to be the case in the future.
>
> Is returning matrices in F-contiguous layout really important? Should
> there be a return_type="fortran_matrix" option or something like that?
>

I don't know, yet. My intuition was that it would be better because we feed
the arrays directly to pinv/SVD or QR which, I think, require by default
Fortran contiguous.

However, my intuition might not be correct, and it might not make much
difference in a single OLS estimation.

There are a few critical loops in variable selection that I'm planning to
investigate to see how much it matters.
Memory optimization was never high in our priority compared to expanding
the functionality overall, but reading the Julia mailing list is starting
to worry me a bit. :)

(I'm even starting to see the reason for multiple dispatch.)

Josef


>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org

Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread Nathaniel Smith
On Mon, Oct 19, 2015 at 5:55 AM,   wrote:
>
>
> On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith  wrote:
>>
>> On Sun, Oct 18, 2015 at 9:35 PM,   wrote:
>>  np.column_stack((np.ones(10), np.ones(10))).flags
>> >   C_CONTIGUOUS : True
>> >   F_CONTIGUOUS : False
>> >
>>  np.__version__
>> > '1.9.2rc1'
>> >
>> >
>> > on my notebook which has numpy 1.6.1 it is f_contiguous
>> >
>> >
>> > I was just trying to optimize a loop over variable adjustment in
>> > regression,
>> > and found out that we lost fortran contiguity.
>> >
>> > I always thought column_stack is for fortran usage (linalg)
>> >
>> > What's the alternative?
>> > column_stack was one of my favorite commands, and I always assumed we
>> > have
>> > in statsmodels the right memory layout to call the linalg libraries.
>> >
>> > ("assumed" means we don't have timing nor unit tests for it.)
>>
>> In general practice no numpy functions make any guarantee about memory
>> layout, unless that's explicitly a documented part of their contract
>> (e.g. 'ascontiguous', or some functions that take an order= argument
>> -- I say "some" b/c there are functions like 'reshape' that take an
>> argument called order= that doesn't actually refer to memory layout).
>> This isn't so much an official policy as just a fact of life -- if
>> no-one has any idea that the someone is depending on some memory
>> layout detail then there's no way to realize that we've broken
>> something. (But it is a good policy IMO.)
>
>
> I understand that in general.
>
> However, I always thought column_stack is a array creation function which
> have guaranteed memory layout. And since it's stacking by columns I thought
> that order is always Fortran.
> And the fact that it doesn't have an order keyword yet, I thought is just a
> missing extension.

I guess I don't know what to say except that I'm sorry to hear that
and sorry that no-one noticed until several releases later.

>> If this kind of problem gets caught during a pre-release cycle then we
>> generally do try to fix it, because we try not to break code, but if
>> it's been broken for 2 full releases then there's no much we can do --
>> we can't go back in time to fix it so it sounds like you're stuck
>> working around the problem no matter what (unless you want to refuse
>> to support 1.9.0 through 1.10.1, which I assume you don't... worst
>> case, you just have to do a global search replace of np.column_stack
>> with statsmodels.utils.column_stack_f, right?).
>>
>> And the regression issue seems like the only real argument for
>> changing it back -- we'd never guarantee f-contiguity here if starting
>> from a blank slate, I think?
>
>
> When the cat is out of the bag, the down stream developer writes
> compatibility code or helper functions.
>
> I will do that at at least the parts I know are intentionally designed for F
> memory order.
>
> ---
>
> statsmodels doesn't really check or consistently optimize the memory order,
> except in some cython functions.
> But, I thought we should be doing quite well with getting Fortran ordered
> arrays. I only paid attention where we have more extensive loops internally.
>
> Nathniel, Does patsy guarantee memory layout (F-contiguous) when creating
> design matrices?

I never thought about it :-). So: no, it looks like right now patsy
usually returns C-order matrices (or really, whatever np.empty or
np.repeat returns), and there aren't any particular guarantees that
this will continue to be the case in the future.

Is returning matrices in F-contiguous layout really important? Should
there be a return_type="fortran_matrix" option or something like that?

-n

-- 
Nathaniel J. Smith -- http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-18 Thread josef.pktd
On Mon, Oct 19, 2015 at 1:10 AM, Stephan Hoyer  wrote:

> Looking at the git logs, column_stack appears to have been that way
> (creating a new array with concatenate) since at least NumPy 0.9.2, way
> back in January 2006:
> https://github.com/numpy/numpy/blob/v0.9.2/numpy/lib/shape_base.py#L271
>

Then it must have been changed somewhere else between 1.6.1 amd 1.9.2rc1

I have my notebook and my desktop with different numpy and python versions
next to each other and I don't see a typo in my command.

I assume python 2.7 versus python 3.4 doesn't make a difference.

--

>>> np.column_stack((np.ones(10), np.ones(10))).flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

>>> np.__version__
'1.6.1'
>>> import sys
>>> sys.version
'2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]'



>>> np.column_stack((np.ones(10), np.ones(10))).flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

>>> np.__version__
'1.9.2rc1'
>>> import sys
>>> sys.version
'3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit
(AMD64)]'

---

comparing all flags, owndata also has changed, but I don't think that has
any effect

Josef


>
>
> Stephan
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-18 Thread josef.pktd
On Mon, Oct 19, 2015 at 1:27 AM,  wrote:

>
>
> On Mon, Oct 19, 2015 at 1:10 AM, Stephan Hoyer  wrote:
>
>> Looking at the git logs, column_stack appears to have been that way
>> (creating a new array with concatenate) since at least NumPy 0.9.2, way
>> back in January 2006:
>> https://github.com/numpy/numpy/blob/v0.9.2/numpy/lib/shape_base.py#L271
>>
>
> Then it must have been changed somewhere else between 1.6.1 amd 1.9.2rc1
>
> I have my notebook and my desktop with different numpy and python versions
> next to each other and I don't see a typo in my command.
>
> I assume python 2.7 versus python 3.4 doesn't make a difference.
>
> --
>
> >>> np.column_stack((np.ones(10), np.ones(10))).flags
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : True
>   OWNDATA : False
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
>
> >>> np.__version__
> '1.6.1'
> >>> import sys
> >>> sys.version
> '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]'
>
> 
>
> >>> np.column_stack((np.ones(10), np.ones(10))).flags
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : False
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
>
> >>> np.__version__
> '1.9.2rc1'
> >>> import sys
> >>> sys.version
> '3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit
> (AMD64)]'
>
> ---
>
> comparing all flags, owndata also has changed, but I don't think that has
> any effect
>

qualification

It looks like in 1.9 it depends on the order of the 2-d arrays, which it
didn't do in 1.6

>>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False


which means the default order looks more like "K" now, not "C", IIUC

Josef



>
> Josef
>
>
>>
>>
>> Stephan
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-18 Thread josef.pktd
On Mon, Oct 19, 2015 at 12:35 AM,  wrote:

> >>> np.column_stack((np.ones(10), np.ones(10))).flags
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : False
>
> >>> np.__version__
> '1.9.2rc1'
>
>
> on my notebook which has numpy 1.6.1 it is f_contiguous
>
>
> I was just trying to optimize a loop over variable adjustment in
> regression, and found out that we lost fortran contiguity.
>
> I always thought column_stack is for fortran usage (linalg)
>
> What's the alternative?
> column_stack was one of my favorite commands, and I always assumed we have
> in statsmodels the right memory layout to call the linalg libraries.
>
> ("assumed" means we don't have timing nor unit tests for it.)
>

What's the difference between using array and column_stack except for a
transpose and memory order?

my current usecase is copying columns on top of each other

#exog0 = np.column_stack((np.ones(nobs), x0, x0s2))

exog0 = np.array((np.ones(nobs), x0, x0s2)).T

exog_opt = exog0.copy(order='F')


the following part is in a loop, followed by some linear algebra for OLS,
res_optim is a scalar parameter.
exog_opt[:, -1] = np.clip(exog0[:, k] + res_optim, 0, np.inf)

Are my assumption on memory access correct, or is there a better way?

(I have quite a bit code in statsmodels that is optimized for fortran
ordered memory layout especially for sequential regression, under the
assumption that column_stack provides that Fortran order.)

Also, do I need to start timing and memory benchmarking or is it obvious
that a loop

for k in range(maxi):
x = arr[:, :k]


depends on memory order?

Josef


>
> Josef
>
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-18 Thread Stephan Hoyer
Looking at the git logs, column_stack appears to have been that way
(creating a new array with concatenate) since at least NumPy 0.9.2, way
back in January 2006:
https://github.com/numpy/numpy/blob/v0.9.2/numpy/lib/shape_base.py#L271

Stephan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion