Re: [Numpy-discussion] when did column_stack become C-contiguous?
On Mon, Oct 19, 2015 at 9:51 PM,wrote: > > > On Mon, Oct 19, 2015 at 9:15 PM, Nathaniel Smith wrote: > >> On Mon, Oct 19, 2015 at 5:55 AM, wrote: >> > >> > >> > On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith wrote: >> >> >> >> On Sun, Oct 18, 2015 at 9:35 PM, wrote: >> >> np.column_stack((np.ones(10), np.ones(10))).flags >> >> > C_CONTIGUOUS : True >> >> > F_CONTIGUOUS : False >> >> > >> >> np.__version__ >> >> > '1.9.2rc1' >> >> > >> >> > >> >> > on my notebook which has numpy 1.6.1 it is f_contiguous >> >> > >> >> > >> >> > I was just trying to optimize a loop over variable adjustment in >> >> > regression, >> >> > and found out that we lost fortran contiguity. >> >> > >> >> > I always thought column_stack is for fortran usage (linalg) >> >> > >> >> > What's the alternative? >> >> > column_stack was one of my favorite commands, and I always assumed we >> >> > have >> >> > in statsmodels the right memory layout to call the linalg libraries. >> >> > >> >> > ("assumed" means we don't have timing nor unit tests for it.) >> >> >> >> In general practice no numpy functions make any guarantee about memory >> >> layout, unless that's explicitly a documented part of their contract >> >> (e.g. 'ascontiguous', or some functions that take an order= argument >> >> -- I say "some" b/c there are functions like 'reshape' that take an >> >> argument called order= that doesn't actually refer to memory layout). >> >> This isn't so much an official policy as just a fact of life -- if >> >> no-one has any idea that the someone is depending on some memory >> >> layout detail then there's no way to realize that we've broken >> >> something. (But it is a good policy IMO.) >> > >> > >> > I understand that in general. >> > >> > However, I always thought column_stack is a array creation function >> which >> > have guaranteed memory layout. And since it's stacking by columns I >> thought >> > that order is always Fortran. >> > And the fact that it doesn't have an order keyword yet, I thought is >> just a >> > missing extension. >> >> I guess I don't know what to say except that I'm sorry to hear that >> and sorry that no-one noticed until several releases later. >> > > > Were there more contiguity changes in 0.10? > I just saw a large number of test errors and failures in statespace models > which are heavily based on cython code where it's not just a question of > performance. > > I don't know yet what's going on, but I just saw that we have some > explicit tests for fortran contiguity which just started to fail. > > > > >> >> >> If this kind of problem gets caught during a pre-release cycle then we >> >> generally do try to fix it, because we try not to break code, but if >> >> it's been broken for 2 full releases then there's no much we can do -- >> >> we can't go back in time to fix it so it sounds like you're stuck >> >> working around the problem no matter what (unless you want to refuse >> >> to support 1.9.0 through 1.10.1, which I assume you don't... worst >> >> case, you just have to do a global search replace of np.column_stack >> >> with statsmodels.utils.column_stack_f, right?). >> >> >> >> And the regression issue seems like the only real argument for >> >> changing it back -- we'd never guarantee f-contiguity here if starting >> >> from a blank slate, I think? >> > >> > >> > When the cat is out of the bag, the down stream developer writes >> > compatibility code or helper functions. >> > >> > I will do that at at least the parts I know are intentionally designed >> for F >> > memory order. >> > >> > --- >> > >> > statsmodels doesn't really check or consistently optimize the memory >> order, >> > except in some cython functions. >> > But, I thought we should be doing quite well with getting Fortran >> ordered >> > arrays. I only paid attention where we have more extensive loops >> internally. >> > >> > Nathniel, Does patsy guarantee memory layout (F-contiguous) when >> creating >> > design matrices? >> >> I never thought about it :-). So: no, it looks like right now patsy >> usually returns C-order matrices (or really, whatever np.empty or >> np.repeat returns), and there aren't any particular guarantees that >> this will continue to be the case in the future. >> >> Is returning matrices in F-contiguous layout really important? Should >> there be a return_type="fortran_matrix" option or something like that? >> > > I don't know, yet. My intuition was that it would be better because we > feed the arrays directly to pinv/SVD or QR which, I think, require by > default Fortran contiguous. > > However, my intuition might not be correct, and it might not make much > difference in a single OLS estimation. > I did some quick timing checks of pinv and qr, and the Fortran ordered is only about 5% to 15% faster and uses about the same amount of memory (watching the Task manager). So, nothing to get excited
Re: [Numpy-discussion] when did column_stack become C-contiguous?
On Sun, Oct 18, 2015 at 9:35 PM,wrote: np.column_stack((np.ones(10), np.ones(10))).flags > C_CONTIGUOUS : True > F_CONTIGUOUS : False > np.__version__ > '1.9.2rc1' > > > on my notebook which has numpy 1.6.1 it is f_contiguous > > > I was just trying to optimize a loop over variable adjustment in regression, > and found out that we lost fortran contiguity. > > I always thought column_stack is for fortran usage (linalg) > > What's the alternative? > column_stack was one of my favorite commands, and I always assumed we have > in statsmodels the right memory layout to call the linalg libraries. > > ("assumed" means we don't have timing nor unit tests for it.) In general practice no numpy functions make any guarantee about memory layout, unless that's explicitly a documented part of their contract (e.g. 'ascontiguous', or some functions that take an order= argument -- I say "some" b/c there are functions like 'reshape' that take an argument called order= that doesn't actually refer to memory layout). This isn't so much an official policy as just a fact of life -- if no-one has any idea that the someone is depending on some memory layout detail then there's no way to realize that we've broken something. (But it is a good policy IMO.) If this kind of problem gets caught during a pre-release cycle then we generally do try to fix it, because we try not to break code, but if it's been broken for 2 full releases then there's no much we can do -- we can't go back in time to fix it so it sounds like you're stuck working around the problem no matter what (unless you want to refuse to support 1.9.0 through 1.10.1, which I assume you don't... worst case, you just have to do a global search replace of np.column_stack with statsmodels.utils.column_stack_f, right?). And the regression issue seems like the only real argument for changing it back -- we'd never guarantee f-contiguity here if starting from a blank slate, I think? -n -- Nathaniel J. Smith -- http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] when did column_stack become C-contiguous?
On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smithwrote: > On Sun, Oct 18, 2015 at 9:35 PM, wrote: > np.column_stack((np.ones(10), np.ones(10))).flags > > C_CONTIGUOUS : True > > F_CONTIGUOUS : False > > > np.__version__ > > '1.9.2rc1' > > > > > > on my notebook which has numpy 1.6.1 it is f_contiguous > > > > > > I was just trying to optimize a loop over variable adjustment in > regression, > > and found out that we lost fortran contiguity. > > > > I always thought column_stack is for fortran usage (linalg) > > > > What's the alternative? > > column_stack was one of my favorite commands, and I always assumed we > have > > in statsmodels the right memory layout to call the linalg libraries. > > > > ("assumed" means we don't have timing nor unit tests for it.) > > In general practice no numpy functions make any guarantee about memory > layout, unless that's explicitly a documented part of their contract > (e.g. 'ascontiguous', or some functions that take an order= argument > -- I say "some" b/c there are functions like 'reshape' that take an > argument called order= that doesn't actually refer to memory layout). > This isn't so much an official policy as just a fact of life -- if > no-one has any idea that the someone is depending on some memory > layout detail then there's no way to realize that we've broken > something. (But it is a good policy IMO.) > I understand that in general. However, I always thought column_stack is a array creation function which have guaranteed memory layout. And since it's stacking by columns I thought that order is always Fortran. And the fact that it doesn't have an order keyword yet, I thought is just a missing extension. > > If this kind of problem gets caught during a pre-release cycle then we > generally do try to fix it, because we try not to break code, but if > it's been broken for 2 full releases then there's no much we can do -- > we can't go back in time to fix it so it sounds like you're stuck > working around the problem no matter what (unless you want to refuse > to support 1.9.0 through 1.10.1, which I assume you don't... worst > case, you just have to do a global search replace of np.column_stack > with statsmodels.utils.column_stack_f, right?). > > And the regression issue seems like the only real argument for > changing it back -- we'd never guarantee f-contiguity here if starting > from a blank slate, I think? > When the cat is out of the bag, the down stream developer writes compatibility code or helper functions. I will do that at at least the parts I know are intentionally designed for F memory order. --- statsmodels doesn't really check or consistently optimize the memory order, except in some cython functions. But, I thought we should be doing quite well with getting Fortran ordered arrays. I only paid attention where we have more extensive loops internally. Nathniel, Does patsy guarantee memory layout (F-contiguous) when creating design matrices? Josef > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] when did column_stack become C-contiguous?
On Mon, Oct 19, 2015 at 5:16 AM, Sebastian Bergwrote: > On Mo, 2015-10-19 at 01:34 -0400, josef.p...@gmail.com wrote: > > > > > > > > > > It looks like in 1.9 it depends on the order of the 2-d arrays, which > > it didn't do in 1.6 > > > > Yes, it uses concatenate, and concatenate probably changed in 1.7 to use > "K" (since "K" did not really exists before 1.7 IIRC). > Not sure what we can do about it, the order is not something that is > easily fixed unless explicitly given. It might be optimized (as in this > case I would guess). > Whether or not doing the fastest route for these kind of functions is > faster for the user is of course impossible to know, we can only hope > that in most cases it is better. > If someone has an idea how to decide I am all ears, but I think all we > can do is put in asserts/tests in the downstream code if it relies > heavily on the order (or just copy, if the order is wrong) :(, another > example is change of the output order in advanced indexing in some > cases, it makes it faster sometimes, and probably slower in others, what > is right seems very much non-trivial. > To understand the reason: Is this to have more efficient memory access during copying? AFAIU, column_stack needs to create a new array which has to be either F or C contiguous, so we always have to pick one of the two. With a large number of 1d arrays it seemed more "intuitive" to me to copy them by columns. Josef > > - Sebastian > > > > > > >>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags > > C_CONTIGUOUS : False > > F_CONTIGUOUS : True > > OWNDATA : True > > WRITEABLE : True > > ALIGNED : True > > UPDATEIFCOPY : False > > > > > > > > > > which means the default order looks more like "K" now, not "C", IIUC > > > > > > Josef > > > > > > > > > > > > Josef > > > > > > > > > > > > Stephan > > > > ___ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > ___ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] when did column_stack become C-contiguous?
On Mo, 2015-10-19 at 01:34 -0400, josef.p...@gmail.com wrote: > > > It looks like in 1.9 it depends on the order of the 2-d arrays, which > it didn't do in 1.6 > Yes, it uses concatenate, and concatenate probably changed in 1.7 to use "K" (since "K" did not really exists before 1.7 IIRC). Not sure what we can do about it, the order is not something that is easily fixed unless explicitly given. It might be optimized (as in this case I would guess). Whether or not doing the fastest route for these kind of functions is faster for the user is of course impossible to know, we can only hope that in most cases it is better. If someone has an idea how to decide I am all ears, but I think all we can do is put in asserts/tests in the downstream code if it relies heavily on the order (or just copy, if the order is wrong) :(, another example is change of the output order in advanced indexing in some cases, it makes it faster sometimes, and probably slower in others, what is right seems very much non-trivial. - Sebastian > > >>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags > C_CONTIGUOUS : False > F_CONTIGUOUS : True > OWNDATA : True > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > > > > which means the default order looks more like "K" now, not "C", IIUC > > > Josef > > > > > > Josef > > > > > > Stephan > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion signature.asc Description: This is a digitally signed message part ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] when did column_stack become C-contiguous?
On Mon, Oct 19, 2015 at 9:00 AM,wrote: > > > On Mon, Oct 19, 2015 at 5:16 AM, Sebastian Berg < > sebast...@sipsolutions.net> wrote: > >> On Mo, 2015-10-19 at 01:34 -0400, josef.p...@gmail.com wrote: >> > >> >> >> >> >> > >> > It looks like in 1.9 it depends on the order of the 2-d arrays, which >> > it didn't do in 1.6 >> > >> >> Yes, it uses concatenate, and concatenate probably changed in 1.7 to use >> "K" (since "K" did not really exists before 1.7 IIRC). >> Not sure what we can do about it, the order is not something that is >> easily fixed unless explicitly given. It might be optimized (as in this >> case I would guess). >> Whether or not doing the fastest route for these kind of functions is >> faster for the user is of course impossible to know, we can only hope >> that in most cases it is better. >> If someone has an idea how to decide I am all ears, but I think all we >> can do is put in asserts/tests in the downstream code if it relies >> heavily on the order (or just copy, if the order is wrong) :(, another >> example is change of the output order in advanced indexing in some >> cases, it makes it faster sometimes, and probably slower in others, what >> is right seems very much non-trivial. >> > > To understand the reason: > > Is this to have more efficient memory access during copying? > > AFAIU, column_stack needs to create a new array which has to be either F > or C contiguous, so we always have to pick one of the two. With a large > number of 1d arrays it seemed more "intuitive" to me to copy them by > columns. > just as background I was mainly surprised last night about having my long held beliefs shattered. I skipped numpy 1.7 and 1.8 in my development environment and still need to catch up now that I use 1.9 as my main numpy version. I might have to update a bit my "folk wisdom", which is not codified anywhere and doesn't have unit tests. For example, the improvement iteration for Fortran contiguous or not C or F contiguous arrays sounded very useful, but I never checked if it would affect us. Josef > > Josef > > > >> >> - Sebastian >> >> >> > >> > >>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags >> > C_CONTIGUOUS : False >> > F_CONTIGUOUS : True >> > OWNDATA : True >> > WRITEABLE : True >> > ALIGNED : True >> > UPDATEIFCOPY : False >> > >> > >> > >> > >> > which means the default order looks more like "K" now, not "C", IIUC >> > >> > >> > Josef >> > >> > >> > >> > >> > >> > Josef >> > >> > >> > >> > >> > >> > Stephan >> > >> > ___ >> > NumPy-Discussion mailing list >> > NumPy-Discussion@scipy.org >> > >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > >> > >> > >> > ___ >> > NumPy-Discussion mailing list >> > NumPy-Discussion@scipy.org >> > https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] when did column_stack become C-contiguous?
On Mon, Oct 19, 2015 at 9:15 PM, Nathaniel Smithwrote: > On Mon, Oct 19, 2015 at 5:55 AM, wrote: > > > > > > On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith wrote: > >> > >> On Sun, Oct 18, 2015 at 9:35 PM, wrote: > >> np.column_stack((np.ones(10), np.ones(10))).flags > >> > C_CONTIGUOUS : True > >> > F_CONTIGUOUS : False > >> > > >> np.__version__ > >> > '1.9.2rc1' > >> > > >> > > >> > on my notebook which has numpy 1.6.1 it is f_contiguous > >> > > >> > > >> > I was just trying to optimize a loop over variable adjustment in > >> > regression, > >> > and found out that we lost fortran contiguity. > >> > > >> > I always thought column_stack is for fortran usage (linalg) > >> > > >> > What's the alternative? > >> > column_stack was one of my favorite commands, and I always assumed we > >> > have > >> > in statsmodels the right memory layout to call the linalg libraries. > >> > > >> > ("assumed" means we don't have timing nor unit tests for it.) > >> > >> In general practice no numpy functions make any guarantee about memory > >> layout, unless that's explicitly a documented part of their contract > >> (e.g. 'ascontiguous', or some functions that take an order= argument > >> -- I say "some" b/c there are functions like 'reshape' that take an > >> argument called order= that doesn't actually refer to memory layout). > >> This isn't so much an official policy as just a fact of life -- if > >> no-one has any idea that the someone is depending on some memory > >> layout detail then there's no way to realize that we've broken > >> something. (But it is a good policy IMO.) > > > > > > I understand that in general. > > > > However, I always thought column_stack is a array creation function which > > have guaranteed memory layout. And since it's stacking by columns I > thought > > that order is always Fortran. > > And the fact that it doesn't have an order keyword yet, I thought is > just a > > missing extension. > > I guess I don't know what to say except that I'm sorry to hear that > and sorry that no-one noticed until several releases later. > Were there more contiguity changes in 0.10? I just saw a large number of test errors and failures in statespace models which are heavily based on cython code where it's not just a question of performance. I don't know yet what's going on, but I just saw that we have some explicit tests for fortran contiguity which just started to fail. > > >> If this kind of problem gets caught during a pre-release cycle then we > >> generally do try to fix it, because we try not to break code, but if > >> it's been broken for 2 full releases then there's no much we can do -- > >> we can't go back in time to fix it so it sounds like you're stuck > >> working around the problem no matter what (unless you want to refuse > >> to support 1.9.0 through 1.10.1, which I assume you don't... worst > >> case, you just have to do a global search replace of np.column_stack > >> with statsmodels.utils.column_stack_f, right?). > >> > >> And the regression issue seems like the only real argument for > >> changing it back -- we'd never guarantee f-contiguity here if starting > >> from a blank slate, I think? > > > > > > When the cat is out of the bag, the down stream developer writes > > compatibility code or helper functions. > > > > I will do that at at least the parts I know are intentionally designed > for F > > memory order. > > > > --- > > > > statsmodels doesn't really check or consistently optimize the memory > order, > > except in some cython functions. > > But, I thought we should be doing quite well with getting Fortran ordered > > arrays. I only paid attention where we have more extensive loops > internally. > > > > Nathniel, Does patsy guarantee memory layout (F-contiguous) when creating > > design matrices? > > I never thought about it :-). So: no, it looks like right now patsy > usually returns C-order matrices (or really, whatever np.empty or > np.repeat returns), and there aren't any particular guarantees that > this will continue to be the case in the future. > > Is returning matrices in F-contiguous layout really important? Should > there be a return_type="fortran_matrix" option or something like that? > I don't know, yet. My intuition was that it would be better because we feed the arrays directly to pinv/SVD or QR which, I think, require by default Fortran contiguous. However, my intuition might not be correct, and it might not make much difference in a single OLS estimation. There are a few critical loops in variable selection that I'm planning to investigate to see how much it matters. Memory optimization was never high in our priority compared to expanding the functionality overall, but reading the Julia mailing list is starting to worry me a bit. :) (I'm even starting to see the reason for multiple dispatch.) Josef > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org
Re: [Numpy-discussion] when did column_stack become C-contiguous?
On Mon, Oct 19, 2015 at 5:55 AM,wrote: > > > On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith wrote: >> >> On Sun, Oct 18, 2015 at 9:35 PM, wrote: >> np.column_stack((np.ones(10), np.ones(10))).flags >> > C_CONTIGUOUS : True >> > F_CONTIGUOUS : False >> > >> np.__version__ >> > '1.9.2rc1' >> > >> > >> > on my notebook which has numpy 1.6.1 it is f_contiguous >> > >> > >> > I was just trying to optimize a loop over variable adjustment in >> > regression, >> > and found out that we lost fortran contiguity. >> > >> > I always thought column_stack is for fortran usage (linalg) >> > >> > What's the alternative? >> > column_stack was one of my favorite commands, and I always assumed we >> > have >> > in statsmodels the right memory layout to call the linalg libraries. >> > >> > ("assumed" means we don't have timing nor unit tests for it.) >> >> In general practice no numpy functions make any guarantee about memory >> layout, unless that's explicitly a documented part of their contract >> (e.g. 'ascontiguous', or some functions that take an order= argument >> -- I say "some" b/c there are functions like 'reshape' that take an >> argument called order= that doesn't actually refer to memory layout). >> This isn't so much an official policy as just a fact of life -- if >> no-one has any idea that the someone is depending on some memory >> layout detail then there's no way to realize that we've broken >> something. (But it is a good policy IMO.) > > > I understand that in general. > > However, I always thought column_stack is a array creation function which > have guaranteed memory layout. And since it's stacking by columns I thought > that order is always Fortran. > And the fact that it doesn't have an order keyword yet, I thought is just a > missing extension. I guess I don't know what to say except that I'm sorry to hear that and sorry that no-one noticed until several releases later. >> If this kind of problem gets caught during a pre-release cycle then we >> generally do try to fix it, because we try not to break code, but if >> it's been broken for 2 full releases then there's no much we can do -- >> we can't go back in time to fix it so it sounds like you're stuck >> working around the problem no matter what (unless you want to refuse >> to support 1.9.0 through 1.10.1, which I assume you don't... worst >> case, you just have to do a global search replace of np.column_stack >> with statsmodels.utils.column_stack_f, right?). >> >> And the regression issue seems like the only real argument for >> changing it back -- we'd never guarantee f-contiguity here if starting >> from a blank slate, I think? > > > When the cat is out of the bag, the down stream developer writes > compatibility code or helper functions. > > I will do that at at least the parts I know are intentionally designed for F > memory order. > > --- > > statsmodels doesn't really check or consistently optimize the memory order, > except in some cython functions. > But, I thought we should be doing quite well with getting Fortran ordered > arrays. I only paid attention where we have more extensive loops internally. > > Nathniel, Does patsy guarantee memory layout (F-contiguous) when creating > design matrices? I never thought about it :-). So: no, it looks like right now patsy usually returns C-order matrices (or really, whatever np.empty or np.repeat returns), and there aren't any particular guarantees that this will continue to be the case in the future. Is returning matrices in F-contiguous layout really important? Should there be a return_type="fortran_matrix" option or something like that? -n -- Nathaniel J. Smith -- http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] when did column_stack become C-contiguous?
On Mon, Oct 19, 2015 at 1:10 AM, Stephan Hoyerwrote: > Looking at the git logs, column_stack appears to have been that way > (creating a new array with concatenate) since at least NumPy 0.9.2, way > back in January 2006: > https://github.com/numpy/numpy/blob/v0.9.2/numpy/lib/shape_base.py#L271 > Then it must have been changed somewhere else between 1.6.1 amd 1.9.2rc1 I have my notebook and my desktop with different numpy and python versions next to each other and I don't see a typo in my command. I assume python 2.7 versus python 3.4 doesn't make a difference. -- >>> np.column_stack((np.ones(10), np.ones(10))).flags C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False >>> np.__version__ '1.6.1' >>> import sys >>> sys.version '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]' >>> np.column_stack((np.ones(10), np.ones(10))).flags C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False >>> np.__version__ '1.9.2rc1' >>> import sys >>> sys.version '3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)]' --- comparing all flags, owndata also has changed, but I don't think that has any effect Josef > > > Stephan > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] when did column_stack become C-contiguous?
On Mon, Oct 19, 2015 at 1:27 AM,wrote: > > > On Mon, Oct 19, 2015 at 1:10 AM, Stephan Hoyer wrote: > >> Looking at the git logs, column_stack appears to have been that way >> (creating a new array with concatenate) since at least NumPy 0.9.2, way >> back in January 2006: >> https://github.com/numpy/numpy/blob/v0.9.2/numpy/lib/shape_base.py#L271 >> > > Then it must have been changed somewhere else between 1.6.1 amd 1.9.2rc1 > > I have my notebook and my desktop with different numpy and python versions > next to each other and I don't see a typo in my command. > > I assume python 2.7 versus python 3.4 doesn't make a difference. > > -- > > >>> np.column_stack((np.ones(10), np.ones(10))).flags > C_CONTIGUOUS : False > F_CONTIGUOUS : True > OWNDATA : False > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > >>> np.__version__ > '1.6.1' > >>> import sys > >>> sys.version > '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]' > > > > >>> np.column_stack((np.ones(10), np.ones(10))).flags > C_CONTIGUOUS : True > F_CONTIGUOUS : False > OWNDATA : True > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > >>> np.__version__ > '1.9.2rc1' > >>> import sys > >>> sys.version > '3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit > (AMD64)]' > > --- > > comparing all flags, owndata also has changed, but I don't think that has > any effect > qualification It looks like in 1.9 it depends on the order of the 2-d arrays, which it didn't do in 1.6 >>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False which means the default order looks more like "K" now, not "C", IIUC Josef > > Josef > > >> >> >> Stephan >> >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] when did column_stack become C-contiguous?
On Mon, Oct 19, 2015 at 12:35 AM,wrote: > >>> np.column_stack((np.ones(10), np.ones(10))).flags > C_CONTIGUOUS : True > F_CONTIGUOUS : False > > >>> np.__version__ > '1.9.2rc1' > > > on my notebook which has numpy 1.6.1 it is f_contiguous > > > I was just trying to optimize a loop over variable adjustment in > regression, and found out that we lost fortran contiguity. > > I always thought column_stack is for fortran usage (linalg) > > What's the alternative? > column_stack was one of my favorite commands, and I always assumed we have > in statsmodels the right memory layout to call the linalg libraries. > > ("assumed" means we don't have timing nor unit tests for it.) > What's the difference between using array and column_stack except for a transpose and memory order? my current usecase is copying columns on top of each other #exog0 = np.column_stack((np.ones(nobs), x0, x0s2)) exog0 = np.array((np.ones(nobs), x0, x0s2)).T exog_opt = exog0.copy(order='F') the following part is in a loop, followed by some linear algebra for OLS, res_optim is a scalar parameter. exog_opt[:, -1] = np.clip(exog0[:, k] + res_optim, 0, np.inf) Are my assumption on memory access correct, or is there a better way? (I have quite a bit code in statsmodels that is optimized for fortran ordered memory layout especially for sequential regression, under the assumption that column_stack provides that Fortran order.) Also, do I need to start timing and memory benchmarking or is it obvious that a loop for k in range(maxi): x = arr[:, :k] depends on memory order? Josef > > Josef > > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] when did column_stack become C-contiguous?
Looking at the git logs, column_stack appears to have been that way (creating a new array with concatenate) since at least NumPy 0.9.2, way back in January 2006: https://github.com/numpy/numpy/blob/v0.9.2/numpy/lib/shape_base.py#L271 Stephan ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion