On Mon, Oct 19, 2015 at 9:51 PM, wrote:
>
>
> On Mon, Oct 19, 2015 at 9:15 PM, Nathaniel Smith wrote:
>
>> On Mon, Oct 19, 2015 at 5:55 AM, wrote:
>> >
>> >
>> > On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith wrote:
>> >>
>> >> On Sun, Oct 18, 2015 at 9:35 PM, wrote:
>> >> np.column_stack((np.ones(10), np.ones(10))).flags
>> >> > C_CONTIGUOUS : True
>> >> > F_CONTIGUOUS : False
>> >> >
>> >> np.__version__
>> >> > '1.9.2rc1'
>> >> >
>> >> >
>> >> > on my notebook which has numpy 1.6.1 it is f_contiguous
>> >> >
>> >> >
>> >> > I was just trying to optimize a loop over variable adjustment in
>> >> > regression,
>> >> > and found out that we lost fortran contiguity.
>> >> >
>> >> > I always thought column_stack is for fortran usage (linalg)
>> >> >
>> >> > What's the alternative?
>> >> > column_stack was one of my favorite commands, and I always assumed we
>> >> > have
>> >> > in statsmodels the right memory layout to call the linalg libraries.
>> >> >
>> >> > ("assumed" means we don't have timing nor unit tests for it.)
>> >>
>> >> In general practice no numpy functions make any guarantee about memory
>> >> layout, unless that's explicitly a documented part of their contract
>> >> (e.g. 'ascontiguous', or some functions that take an order= argument
>> >> -- I say "some" b/c there are functions like 'reshape' that take an
>> >> argument called order= that doesn't actually refer to memory layout).
>> >> This isn't so much an official policy as just a fact of life -- if
>> >> no-one has any idea that the someone is depending on some memory
>> >> layout detail then there's no way to realize that we've broken
>> >> something. (But it is a good policy IMO.)
>> >
>> >
>> > I understand that in general.
>> >
>> > However, I always thought column_stack is a array creation function
>> which
>> > have guaranteed memory layout. And since it's stacking by columns I
>> thought
>> > that order is always Fortran.
>> > And the fact that it doesn't have an order keyword yet, I thought is
>> just a
>> > missing extension.
>>
>> I guess I don't know what to say except that I'm sorry to hear that
>> and sorry that no-one noticed until several releases later.
>>
>
>
> Were there more contiguity changes in 0.10?
> I just saw a large number of test errors and failures in statespace models
> which are heavily based on cython code where it's not just a question of
> performance.
>
> I don't know yet what's going on, but I just saw that we have some
> explicit tests for fortran contiguity which just started to fail.
>
>
>
>
>>
>> >> If this kind of problem gets caught during a pre-release cycle then we
>> >> generally do try to fix it, because we try not to break code, but if
>> >> it's been broken for 2 full releases then there's no much we can do --
>> >> we can't go back in time to fix it so it sounds like you're stuck
>> >> working around the problem no matter what (unless you want to refuse
>> >> to support 1.9.0 through 1.10.1, which I assume you don't... worst
>> >> case, you just have to do a global search replace of np.column_stack
>> >> with statsmodels.utils.column_stack_f, right?).
>> >>
>> >> And the regression issue seems like the only real argument for
>> >> changing it back -- we'd never guarantee f-contiguity here if starting
>> >> from a blank slate, I think?
>> >
>> >
>> > When the cat is out of the bag, the down stream developer writes
>> > compatibility code or helper functions.
>> >
>> > I will do that at at least the parts I know are intentionally designed
>> for F
>> > memory order.
>> >
>> > ---
>> >
>> > statsmodels doesn't really check or consistently optimize the memory
>> order,
>> > except in some cython functions.
>> > But, I thought we should be doing quite well with getting Fortran
>> ordered
>> > arrays. I only paid attention where we have more extensive loops
>> internally.
>> >
>> > Nathniel, Does patsy guarantee memory layout (F-contiguous) when
>> creating
>> > design matrices?
>>
>> I never thought about it :-). So: no, it looks like right now patsy
>> usually returns C-order matrices (or really, whatever np.empty or
>> np.repeat returns), and there aren't any particular guarantees that
>> this will continue to be the case in the future.
>>
>> Is returning matrices in F-contiguous layout really important? Should
>> there be a return_type="fortran_matrix" option or something like that?
>>
>
> I don't know, yet. My intuition was that it would be better because we
> feed the arrays directly to pinv/SVD or QR which, I think, require by
> default Fortran contiguous.
>
> However, my intuition might not be correct, and it might not make much
> difference in a single OLS estimation.
>
I did some quick timing checks of pinv and qr, and the Fortran ordered is
only about 5% to 15% faster and uses about the same amount of memory
(watching the Task manager). So, nothing to get excited about.
I used functions like this where xf is either C or Fortran contiguous
def funcl_f():
fo