Re: [Numpy-discussion] Change default order to Fortran order
Hi, On Mon, Aug 3, 2015 at 3:13 PM, Gregory Lee grle...@gmail.com wrote: I agree that often you don't need to worry about the memory order. However, it is not uncommon in medical imaging to go back and forth between a 2D or 3D image representation and a 1D array representation (e.g. as often used in image reconstruction algorithms). I found that the main time it was necessary to pay careful attention to the memory layout was when converting Matlab scripts that involve reshaping operations. Yes, good point.A typical example would be this kind of thing: # data is a 4D array with time / volume axis last data_2d = data.reshape((-1, data.shape[-1]) For MATLAB, the columns of this array would (by default) have the values on the first axis fastest changing, then the second, then the third, whereas numpy's default is the other way round. I find I usually don't have to worry about this, because I'm later going to do: data_processed_4d = data_2d.reshape(data.shape) which will reverse the previous reshape in the correct way. But in any case - this is not directly to do with the array memory layout. You will get the same output from reshape whether the memory layout of `data` was Fortran or C. Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change default order to Fortran order
On Mon Aug 3 16:26:10 2015 GMT+0200, Matthew Brett wrote: Hi, On Mon, Aug 3, 2015 at 3:13 PM, Gregory Lee grle...@gmail.com wrote: I agree that often you don't need to worry about the memory order. However, it is not uncommon in medical imaging to go back and forth between a 2D or 3D image representation and a 1D array representation (e.g. as often used in image reconstruction algorithms). I found that the main time it was necessary to pay careful attention to the memory layout was when converting Matlab scripts that involve reshaping operations. Yes, good point.A typical example would be this kind of thing: # data is a 4D array with time / volume axis last data_2d = data.reshape((-1, data.shape[-1]) For MATLAB, the columns of this array would (by default) have the values on the first axis fastest changing, then the second, then the third, whereas numpy's default is the other way round. I find I usually don't have to worry about this, because I'm later going to do: data_processed_4d = data_2d.reshape(data.shape) which will reverse the previous reshape in the correct way. But in any case - this is not directly to do with the array memory layout. You will get the same output from reshape whether the memory layout of `data` was Fortran or C. Just as a remark. Reshape has an (iteration not really memory) order parameter, thou it may do more copies if those do not match. - Sebastian Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change default order to Fortran order
Matthew Brett matthew.br...@gmail.com wrote: Sure, but to avoid confusion, maybe move the discussion of image indexing order to another thread? I think this thread is about memory layout, which is a different issue. It is actually a bit convoluted and not completely orthogonal. Memory layout does not matter for 2d ndexing, i.e. (x,y) vs. (row, column), if you are careful when iterating, but it does matter for Nd indexing. There is a reason to prefer (x,y,z,t,r) in column major order or (recording, time, slice, row, column) in row major order. Otherwise you can get very inefficient memory traversals. Then if you work with visualization libraries that expects (x,y,z) and column major order, e.g. ITK, VTK and OpenGL, this is really what you want to use. And the choise of indexing (x,y,z) cannot be seen as independent of the memory layout. Remember, it is not just a matter of mapping coordinates to pixels. The data sets are so large in MRI processing that memory layout does matter. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change default order to Fortran order
On Mon, Aug 3, 2015 at 5:01 PM, Sturla Molden sturla.mol...@gmail.com wrote: Matthew Brett matthew.br...@gmail.com wrote: Sure, but to avoid confusion, maybe move the discussion of image indexing order to another thread? I think this thread is about memory layout, which is a different issue. It is actually a bit convoluted and not completely orthogonal. Memory layout does not matter for 2d ndexing, i.e. (x,y) vs. (row, column), if you are careful when iterating, but it does matter for Nd indexing. There is a reason to prefer (x,y,z,t,r) in column major order or (recording, time, slice, row, column) in row major order. Otherwise you can get very inefficient memory traversals. Then if you work with visualization libraries that expects (x,y,z) and column major order, e.g. ITK, VTK and OpenGL, this is really what you want to use. And the choise of indexing (x,y,z) cannot be seen as independent of the memory layout. Remember, it is not just a matter of mapping coordinates to pixels. The data sets are so large in MRI processing that memory layout does matter. I completely agree that memory layout can affect performance, and this can be important. On the other hand, I think you agree that the relationship of axis order to memory layout is just a matter of mapping coordinates to pixels. So you can change the axis ordering without changing the memory layout and the memory layout without changing the axis ordering. Of course you could argue that it would be simpler to fuse the two issues, and enforce one memory layout - say Fortran.The result might well be easier think about, but it wouldn't be much like numpy, and it would have lots of performance and memory disadvantages. Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.?
On Sun, Aug 2, 2015 at 5:13 AM, Sturla Molden sturla.mol...@gmail.com wrote: A long is only machine word wide on posix, in windows its not. Actually it is the opposite. A pointer is 64 bit on AMD64, but the native integer and pointer offset is only 32 bit. But it does not matter because it is int that should be machine word sized, not long, which it is on both platforms. All this illustrates that there is a lot of platform independence and complexity to the standard C types. I suppose it's a good thing -- you can use something like int in C code, and presto! more precision in the future when you re-compile on a newer system. However, for any code that needs some kind of binary compatibility between systems (or is dynamic, like python -- i.e. types are declared at run-time, not compile time), the fixed width types are a lot safer (or at least easier to reason about). So we have tow issue with numpy: 1) confusing python types with C types -- e.g. np.int is currently a python integer, NOT a C int -- I think this is a litte too confusing, and should be depricated. (and np.long -- even more confusing!!!) 2) The vagaries of the standard C types: int, long, etc (spelled np.intc, which is a int32 on my machine, anyway) [NOTE: is there a C long dtype? I can't find it at the moment...] It's probably a good idea to keep these, particularly for interfacing with C code (like my example of calling C code that use int). Though it would be good to make sure the docstring make it clear what they are. However, Id like to see a recommended practice of using sized types wherevver you can: uint8 int32 float32 float54 etc not sure how to propagate that practice, but I'd love to see it become common. Should we add aliases for the stdint names? np.int_32_t, etc??? might be good to adhere to an established standard. -CHB Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change default order to Fortran order
On Sun, Aug 2, 2015 at 1:46 PM, Sturla Molden sturla.mol...@gmail.com wrote: On 02/08/15 22:28, Bryan Van de Ven wrote: And to eliminate the order kwarg, use functools.partial to patch the zeros function (or any others, as needed): This will probably break code that depends on NumPy, like SciPy and scikit-image. But if NumPy is all that matters, sure go ahead and monkey patch. Otherwise keep the patched functions in another namespace. Id be really careful about this -- sure it's annoying, but a kind of global change of behavior could wreak havok. I'd create a set of Fortran-order constructors -- if it were me, I do: fzeros fones, etc. but you could, I suppose, create a namespace and ut hem all there, then create a fnumpy that would write those over: import numpy as np and away you go -- but that wouldn't change any code that imports numpy in the usual way. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [pytables-dev] ANN: PyTables 3.2.1 released
Good job. Thanks Andrea -- Antonio Valentino Il giorno 04/ago/2015, alle ore 02:38, Andrea Bedini and...@andreabedini.com ha scritto: === Announcing PyTables 3.2.1 === We are happy to announce PyTables 3.2.1. What's new == This is a bug fix release. It contains a fix for a segv fault in indexesextension.keysort(). In case you want to know more in detail what has changed in this version, please refer to: http://www.pytables.org/release_notes.html For an online version of the manual, visit: http://www.pytables.org/usersguide/index.html What it is? === PyTables is a library for managing hierarchical datasets and designed to efficiently cope with extremely large amounts of data with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and NumPy package for achieving maximum throughput and convenient use. PyTables includes OPSI, a new indexing technology, allowing to perform data lookups in tables exceeding 10 gigarows (10**10 rows) in less than a tenth of a second. Resources = About PyTables: http://www.pytables.org About the HDF5 library: http://hdfgroup.org/HDF5/ About NumPy: http://numpy.scipy.org/ Acknowledgments === Thanks to many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for a (incomplete) list of contributors. Most specially, a lot of kudos go to the HDF5 and NumPy makers. Without them, PyTables simply would not exist. Share your experience = Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. **Enjoy data!** -- The PyTables Developers -- You received this message because you are subscribed to the Google Groups pytables-dev group. To unsubscribe from this group and stop receiving emails from it, send an email to pytables-dev+unsubscr...@googlegroups.com. Visit this group at http://groups.google.com/group/pytables-dev. For more options, visit https://groups.google.com/d/optout. signature.asc Description: Message signed with OpenPGP using GPGMail ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change default order to Fortran order
This is very good discussion. Thank you all for replying. I can see the fundamental difference is that I always think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column) . I am coming from MRI (magnetic resonance imaging) research, and I can assure you that the entire MRI community is using (x, y, z), including books, journal papers, conference abstracts, presentations, everything. We even talk about what we called logical x/y/z and physical x/y/z, and the rotation matrix that converts the two coordinate systems. The radiologists are also used to (x, y, z). For example, we always say my image is 256 by 256 by 20 slices, and we never say 20 by 256 by 256. So, basically, at least in MRI, we always talk about an image as I(x, y, z), and we always assume that x is the fastest changing index. That's why I prefer column-major (because it is more natural). Of course, I can totally get my work done by using row-major, I just have to always remind myself write last dimension index first when coding. I actually have done this before, and I found it would be so much easier if just using column-major. Kang On 08/02/15, Juan Nunez-Iglesias jni.s...@gmail.com wrote: Hi Kang, Feel free to come chat about your application on the scikit-image list [1]! I'll note that we've been through the array order discussion many times there and even have a doc page about it [2]. The short version is that you'll save yourself a lot of pain by starting to think of your images as (plane, row, column) instead of (x, y, z). The syntax actually becomes friendlier too. For example, to do something to each slice of data, you do: for plane in image: plane += foo instead of for z in image.shape[2]: image[:, :, z] += foo for example. Juan. [1] scikit-im...@googlegroups.com [2] http://scikit-image.org/docs/dev/user_guide/numpy_images.html#coordinate-conventions PS: As to the renamed Fortran-ordered numpy, may I suggest funpy. The F is for Fortran and the fun is for all the fun you'll have maintaining it. =P On Mon, 3 Aug 2015 at 6:28 am Daniel Sank sank.dan...@gmail.com wrote: Kang, Thank you for explaining your motivation. It's clear from your last note, as you said, that your desire for column-first indexing has nothing to do with in-memory data layout. That being the case, I strongly urge you to just use bare numpy and do not use the fortran_zeros function I recommended before. Changing the in-memory layout via the order keyword in numpy.zeros will not change the way indexing works at all. You gain absolutely nothing by changing the in-memory order unless you are writing some C or Fortran code which will interact with the data in memory. To see what I mean, consider the following examples: x = np.array([1, 2, 3], [4, 5, 6]]) x.shape (2, 3) and x = np.array([1, 2, 3], [4, 5, 6]], order='F') x.shape (2, 3) You see that changing the in-memory order has nothing whatsoever to do with the array's shape or how you access it. You will see run time error. Depending on environment, you may get useful error message (i.e. index out of range), but sometimes you just get bad image results. Could you give a very simple example of what you mean? I can't think of how this could ever happen and your fear here makes me think there's a fundamental misunderstanding about how array operations in numpy and other programming languages work. As an example, iteration in numpy goes through the first index: x = np.array([[1, 2, 3], [4, 5, 6]]) for foo in x: ... Inside the for loop, foo takes on the values [1, 2, 3] on the first iteration and [4, 5, 6] on the second. If you want to iterate through the columns just do this instead x = np.array([[1, 2, 3], [4, 5, 6]]) for foo in x.T: ... If your complaint is that you want np.array([[1, 2, 3], [4, 5, 6]]) to produce an array with shape (3, 2) then you should own up to the fact that the array constructor expects it the other way around and do this x = np.array([[1, 2, 3], [4, 5, 6]]).T instead. This is infinity times better than trying to write a shim function or patch numpy because with .T you're using (fast) built-in functionality which other people your code will understand. The real message here is that whether the first index runs over rows or columns is actually meaningless. The only places the row versus column issue has any meaning is when doing input/output (in which case you should use the transpose if you actually need it), or when doing iteration. One thing that would make sense if you're reading from a binary file format which uses column-major format would be to write your own reader function: def read_fortran_style_binary_file(file): return np.fromfile(file).T
Re: [Numpy-discussion] 1.10.x is branched
On Mon, Aug 3, 2015 at 5:22 AM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, Numpy 1.10.x is branched. There is still some cleanup to do before the alpha release, but that should be coming in a couple of days. http://mail.scipy.org/mailman/listinfo/numpy-discussion Thanks Chuck. Looks like it's shaping up nicely. Ralf ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change default order to Fortran order
On Aug 2, 2015 11:06 PM, Kang Wang kwan...@wisc.edu wrote: This is very good discussion. Thank you all for replying. I can see the fundamental difference is that I always think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column) . I am coming from MRI (magnetic resonance imaging) research, and I can assure you that the entire MRI community is using (x, y, z), including books, journal papers, conference abstracts, presentations, everything. We even talk about what we called logical x/y/z and physical x/y/z, and the rotation matrix that converts the two coordinate systems. The radiologists are also used to (x, y, z). For example, we always say my image is 256 by 256 by 20 slices, and we never say 20 by 256 by 256. So, basically, at least in MRI, we always talk about an image as I(x, y, z), and we always assume that x is the fastest changing index. That's why I prefer column-major (because it is more natural). Of course, I can totally get my work done by using row-major, I just have to always remind myself write last dimension index first when coding. I actually have done this before, and I found it would be so much easier if just using column-major. Why not just use I[x, y, z] like you're used to, and let the computer worry about the physical layout in memory? Sometimes this will be Fortran order and sometimes C order and sometimes something else, but you don't have to know or care; 99% of the time it doesn't matter. The worst case is that when you use a python wrapper to call into a library that can only handle Fortran order, then the wrapper will quietly have to convert the memory order around and it will be slightly slower than if you had used Fortran order in the first place. But in practice you'll barely ever notice this, and when you do, *then* you can tell numpy explicitly what memory layout you want in the situation where it matters. General principle: do what's easiest for the programmer, not what's easiest for the computer. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change default order to Fortran order
Hi, On Mon, Aug 3, 2015 at 8:09 AM, Nathaniel Smith n...@pobox.com wrote: On Aug 2, 2015 11:06 PM, Kang Wang kwan...@wisc.edu wrote: This is very good discussion. Thank you all for replying. I can see the fundamental difference is that I always think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column) . I am coming from MRI (magnetic resonance imaging) research, and I can assure you that the entire MRI community is using (x, y, z), including books, journal papers, conference abstracts, presentations, everything. We even talk about what we called logical x/y/z and physical x/y/z, and the rotation matrix that converts the two coordinate systems. The radiologists are also used to (x, y, z). For example, we always say my image is 256 by 256 by 20 slices, and we never say 20 by 256 by 256. So, basically, at least in MRI, we always talk about an image as I(x, y, z), and we always assume that x is the fastest changing index. That's why I prefer column-major (because it is more natural). Of course, I can totally get my work done by using row-major, I just have to always remind myself write last dimension index first when coding. I actually have done this before, and I found it would be so much easier if just using column-major. Why not just use I[x, y, z] like you're used to, and let the computer worry about the physical layout in memory? Sometimes this will be Fortran order and sometimes C order and sometimes something else, but you don't have to know or care; 99% of the time it doesn't matter. The worst case is that when you use a python wrapper to call into a library that can only handle Fortran order, then the wrapper will quietly have to convert the memory order around and it will be slightly slower than if you had used Fortran order in the first place. But in practice you'll barely ever notice this, and when you do, *then* you can tell numpy explicitly what memory layout you want in the situation where it matters. Yes - if you are using numpy, you really have to look numpy in the eye and say: I will let you worry about the array element order in memory, and in return, you promise to make indexing work as I would expect Just for example, let's say you loaded an MRI image into memory: In [1]: import nibabel In [2]: img = nibabel.load('my_mri.nii') In [3]: data = img.get_data() Because NIfTI images are Fortran memory layout, this happens to be the memory layout you get for your array: In [4]: data.flags Out[4]: C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False But now - in Python - all I care about is what data I have on the first, second, third axes. For example, I could do this: In [5]: data_copy = data.copy() This has exactly the same values as the original array, and at the same index positions: In [7]: import numpy as np In [8]: np.all(data == data) Out[8]: memmap(True, dtype=bool) but I now have a C memory layout array. In [9]: data_copy.flags Out[9]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False Worse than that, if I slice my original data array, then I get an array that is neither C- or Fortran- compatible in memory: In [10]: data_view = data[:, :, ::2] In [11]: data_view.flags Out[11]: C_CONTIGUOUS : False F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False So - if you want every array to be Fortran-contiguous in memory, I would not start with numpy at all, I would write your own array library. The alternative - or the numpy way - is to give up on enforcing a particular layout in memory, until you need to pass an array to some C or C++ or Fortran code that needs some particular layout, in which case you get your extension code to copy the array into the required layout on entry. Of course this is what numpy itself has to do when interfacing with external libraries like BLAS or LAPACK. Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change default order to Fortran order
Juan Nunez-Iglesias jni.s...@gmail.com wrote: The short version is that you'll save yourself a lot of pain by starting to think of your images as (plane, row, column) instead of (x, y, z). There are several things to consider here. 1. The vertices in computer graphics (OpenGL) are (x,y,z). 2. OpenGL rotation matrices and projection matrice are stored in column major order. 3. OpenGL frame buffers are indexed (x,y) in column major order with (0,0) being lower left. 4. ITK and VTK depends on OpenGL and are thus using column major order. 5. Those who use Matlab or Fortran in addition to Python prefer column major order. 6. BLAS and LAPACK use column major order. 7. The common notation in image prorcessing (as opposed to computer graphics in geberal) is indexing (row, column), in row major order, with (0,0) being upper left. All in all, this is a strong case for prefering column major order and the common mathematical notation (x,y,z). Also notice how the ususal notation in image pricessing differs from OpenGL. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change default order to Fortran order
well its really great idea. i can help on python but i don't have any knowledge on fortran. On Sun, Aug 2, 2015 at 7:25 PM, Kang Wang kwan...@wisc.edu wrote: Hi, I am an imaging researcher, and a new Python user. My first Python project is to somehow modify NumPy source code such that everything is Fortran column-major by default. I read about the information in the link below, but for us, the fact is that *we absolutely want to use Fortran column major, and we want to make it default. Explicitly writing order = 'F' all over the place is not acceptable to us*. http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues I tried searching in this email list, as well as google search in general. However, I have not found anything useful. This must be a common request/need, I believe. Can anyone provide any insight/help? Thank you very much, Kang -- *Kang Wang, Ph.D.* Highland Ave., Room 1113 Madison, WI 53705-2275 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.?
On Mon, Aug 3, 2015 at 11:05 AM, Sturla Molden sturla.mol...@gmail.com wrote: On 03/08/15 18:25, Chris Barker wrote: [NOTE: is there a C long dtype? I can't find it at the moment...] There is, it is called np.int. well, IIUC, np.int is the python integer type, which is a C long in all the implemtations of cPython that I know about -- but is that a guarantee? in the future as well? For instance, if it were up to me, I'd use an int_64_t on all 64 bit platforms, rather than having that odd 32 bit on Windows, 64 bit on *nix silliness This just illustrates the problem... So another minor proposal: add a numpy.longc type, which would be platform C long. (and probably just an alias to something already there). -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.?
On 03/08/15 18:25, Chris Barker wrote: 2) The vagaries of the standard C types: int, long, etc (spelled np.intc, which is a int32 on my machine, anyway) [NOTE: is there a C long dtype? I can't find it at the moment...] There is, it is called np.int. This just illustrates the problem... Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.?
On 08/03/2015 12:25 PM, Chris Barker wrote: 2) The vagaries of the standard C types: int, long, etc (spelled np.intc, which is a int32 on my machine, anyway) [NOTE: is there a C long dtype? I can't find it at the moment...] Numpy does define the platform dependent C integer types short, long, longlong and their unsigned versions according to the docs. size_t is the same size as intc. Even though float and double are virtually always IEEE single and double precision, maybe for consistency we should also define np.floatc, np.doublec and np.longdoublec? Allan ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change default order to Fortran order
On Mon, Aug 3, 2015 at 10:24 AM, Matthew Brett matthew.br...@gmail.com wrote: On Mon, Aug 3, 2015 at 5:01 PM, Sturla Molden sturla.mol...@gmail.com wrote: Matthew Brett matthew.br...@gmail.com wrote: Sure, but to avoid confusion, maybe move the discussion of image indexing order to another thread? I think this thread is about memory layout, which is a different issue. It is actually a bit convoluted and not completely orthogonal. Memory layout does not matter for 2d ndexing, i.e. (x,y) vs. (row, column), if you are careful when iterating, but it does matter for Nd indexing. There is a reason to prefer (x,y,z,t,r) in column major order or (recording, time, slice, row, column) in row major order. Otherwise you can get very inefficient memory traversals. Then if you work with visualization libraries that expects (x,y,z) and column major order, e.g. ITK, VTK and OpenGL, this is really what you want to use. And the choise of indexing (x,y,z) cannot be seen as independent of the memory layout. Remember, it is not just a matter of mapping coordinates to pixels. The data sets are so large in MRI processing that memory layout does matter. I completely agree that memory layout can affect performance, and this can be important. On the other hand, I think you agree that the relationship of axis order to memory layout is just a matter of mapping coordinates to pixels. So you can change the axis ordering without changing the memory layout and the memory layout without changing the axis ordering. Of course you could argue that it would be simpler to fuse the two issues, and enforce one memory layout - say Fortran.The result might well be easier think about, but it wouldn't be much like numpy, and it would have lots of performance and memory disadvantages. Cheers, I would also strongly suggest that once you have decided on a convention that you thoroughly document it somewhere. That will not only help you, but anyone who later needs to maintain the code will bless you rather than d*nm you to eternal torture by the seven demons of Ipsos. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change default order to Fortran order
On Mon Aug 3 10:49:35 2015 GMT+0200, Matthew Brett wrote: Hi, On Mon, Aug 3, 2015 at 8:09 AM, Nathaniel Smith n...@pobox.com wrote: On Aug 2, 2015 11:06 PM, Kang Wang kwan...@wisc.edu wrote: This is very good discussion. Thank you all for replying. I can see the fundamental difference is that I always think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column) . I am coming from MRI (magnetic resonance imaging) research, and I can assure you that the entire MRI community is using (x, y, z), including books, journal papers, conference abstracts, presentations, everything. We even talk about what we called logical x/y/z and physical x/y/z, and the rotation matrix that converts the two coordinate systems. The radiologists are also used to (x, y, z). For example, we always say my image is 256 by 256 by 20 slices, and we never say 20 by 256 by 256. So, basically, at least in MRI, we always talk about an image as I(x, y, z), and we always assume that x is the fastest changing index. That's why I prefer column-major (because it is more natural). Of course, I can totally get my work done by using row-major, I just have to always remind myself write last dimension index first when coding. I actually have done this before, and I found it would be so much easier if just using column-major. Why not just use I[x, y, z] like you're used to, and let the computer worry about the physical layout in memory? Sometimes this will be Fortran order and sometimes C order and sometimes something else, but you don't have to know or care; 99% of the time it doesn't matter. The worst case is that when you use a python wrapper to call into a library that can only handle Fortran order, then the wrapper will quietly have to convert the memory order around and it will be slightly slower than if you had used Fortran order in the first place. But in practice you'll barely ever notice this, and when you do, *then* you can tell numpy explicitly what memory layout you want in the situation where it matters. Yes - if you are using numpy, you really have to look numpy in the eye and say: I will let you worry about the array element order in memory, and in return, you promise to make indexing work as I would expect Just for example, let's say you loaded an MRI image into memory: In [1]: import nibabel In [2]: img = nibabel.load('my_mri.nii') In [3]: data = img.get_data() Because NIfTI images are Fortran memory layout, this happens to be the memory layout you get for your array: In [4]: data.flags Out[4]: C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False But now - in Python - all I care about is what data I have on the first, second, third axes. For example, I could do this: In [5]: data_copy = data.copy() This has exactly the same values as the original array, and at the same index positions: In [7]: import numpy as np In [8]: np.all(data == data) Out[8]: memmap(True, dtype=bool) but I now have a C memory layout array. In [9]: data_copy.flags Out[9]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False Yeah, I would like to second those arguments. Most of the time, there is no need to worry about layout. For large chunks you allocate, it may make sense for speed, etc. So you can alias creation functions. Generally, I would suggest to simply not worry about the memory layout. Also do not *trust* the layout for most function returns. If you need a specific layout to interface other code, always check what you got it. -Sebastian Worse than that, if I slice my original data array, then I get an array that is neither C- or Fortran- compatible in memory: In [10]: data_view = data[:, :, ::2] In [11]: data_view.flags Out[11]: C_CONTIGUOUS : False F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False So - if you want every array to be Fortran-contiguous in memory, I would not start with numpy at all, I would write your own array library. The alternative - or the numpy way - is to give up on enforcing a particular layout in memory, until you need to pass an array to some C or C++ or Fortran code that needs some particular layout, in which case you get your extension code to copy the array into the required layout on entry. Of course this is what numpy itself has to do when interfacing with external libraries like BLAS or LAPACK. Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change default order to Fortran order
I agree that often you don't need to worry about the memory order. However, it is not uncommon in medical imaging to go back and forth between a 2D or 3D image representation and a 1D array representation (e.g. as often used in image reconstruction algorithms). I found that the main time it was necessary to pay careful attention to the memory layout was when converting Matlab scripts that involve reshaping operations. On Mon, Aug 3, 2015 at 8:02 AM, Sebastian Berg sebast...@sipsolutions.net wrote: On Mon Aug 3 10:49:35 2015 GMT+0200, Matthew Brett wrote: Hi, On Mon, Aug 3, 2015 at 8:09 AM, Nathaniel Smith n...@pobox.com wrote: On Aug 2, 2015 11:06 PM, Kang Wang kwan...@wisc.edu wrote: This is very good discussion. Thank you all for replying. I can see the fundamental difference is that I always think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column) . I am coming from MRI (magnetic resonance imaging) research, and I can assure you that the entire MRI community is using (x, y, z), including books, journal papers, conference abstracts, presentations, everything. We even talk about what we called logical x/y/z and physical x/y/z, and the rotation matrix that converts the two coordinate systems. The radiologists are also used to (x, y, z). For example, we always say my image is 256 by 256 by 20 slices, and we never say 20 by 256 by 256. So, basically, at least in MRI, we always talk about an image as I(x, y, z), and we always assume that x is the fastest changing index. That's why I prefer column-major (because it is more natural). Of course, I can totally get my work done by using row-major, I just have to always remind myself write last dimension index first when coding. I actually have done this before, and I found it would be so much easier if just using column-major. Why not just use I[x, y, z] like you're used to, and let the computer worry about the physical layout in memory? Sometimes this will be Fortran order and sometimes C order and sometimes something else, but you don't have to know or care; 99% of the time it doesn't matter. The worst case is that when you use a python wrapper to call into a library that can only handle Fortran order, then the wrapper will quietly have to convert the memory order around and it will be slightly slower than if you had used Fortran order in the first place. But in practice you'll barely ever notice this, and when you do, *then* you can tell numpy explicitly what memory layout you want in the situation where it matters. Yes - if you are using numpy, you really have to look numpy in the eye and say: I will let you worry about the array element order in memory, and in return, you promise to make indexing work as I would expect Just for example, let's say you loaded an MRI image into memory: In [1]: import nibabel In [2]: img = nibabel.load('my_mri.nii') In [3]: data = img.get_data() Because NIfTI images are Fortran memory layout, this happens to be the memory layout you get for your array: In [4]: data.flags Out[4]: C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False But now - in Python - all I care about is what data I have on the first, second, third axes. For example, I could do this: In [5]: data_copy = data.copy() This has exactly the same values as the original array, and at the same index positions: In [7]: import numpy as np In [8]: np.all(data == data) Out[8]: memmap(True, dtype=bool) but I now have a C memory layout array. In [9]: data_copy.flags Out[9]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False Yeah, I would like to second those arguments. Most of the time, there is no need to worry about layout. For large chunks you allocate, it may make sense for speed, etc. So you can alias creation functions. Generally, I would suggest to simply not worry about the memory layout. Also do not *trust* the layout for most function returns. If you need a specific layout to interface other code, always check what you got it. -Sebastian Worse than that, if I slice my original data array, then I get an array that is neither C- or Fortran- compatible in memory: In [10]: data_view = data[:, :, ::2] In [11]: data_view.flags Out[11]: C_CONTIGUOUS : False F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False So - if you want every array to be Fortran-contiguous in memory, I would not start with numpy at all, I would write your own array library. The alternative - or the numpy way - is to give up on enforcing a particular layout in memory, until you need to pass an array to some C or C++ or Fortran
Re: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.?
On 03/08/15 20:51, Chris Barker wrote: well, IIUC, np.int http://np.int is the python integer type, which is a C long in all the implemtations of cPython that I know about -- but is that a guarantee?in the future as well? It is a Python int on Python 2. On Python 3 dtype=np.int means the dtype will be C long, because a Python int has no size limit. But np.int aliases Python int. And creating an array with dype=int therefore does not create an array of Python int, it creates an array of C long. To actually get dtype=int we have to write dtype=object, which is just crazy. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion