Re: [Numpy-discussion] Change default order to Fortran order

2015-08-03 Thread Matthew Brett
Hi,

On Mon, Aug 3, 2015 at 3:13 PM, Gregory Lee grle...@gmail.com wrote:
 I agree that often you don't need to worry about the memory order.  However,
 it is not uncommon in medical imaging to go back and forth between a 2D or
 3D image representation and a 1D array representation (e.g. as often used in
 image reconstruction algorithms).  I found that the main time it was
 necessary to pay careful attention to the memory layout was when converting
 Matlab scripts that involve reshaping operations.

Yes, good point. A typical example would be this kind of thing:

# data is a 4D array with time / volume axis last
data_2d = data.reshape((-1, data.shape[-1]))

For MATLAB, the columns of this array would (by default) have the
values on the first axis fastest changing, then the second, then the
third, whereas numpy's default is the other way round.

I find I usually don't have to worry about this, because I'm later going to do:

data_processed_4d  = data_2d.reshape(data.shape)

which will reverse the previous reshape in the correct way.

But in any case - this is not directly to do with the array memory
layout.  You will get the same output from reshape whether the memory
layout of `data` was Fortran or C.
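That point can be checked directly (a small sketch with made-up shapes, not from the original thread):

```python
import numpy as np

# A small 4D array with the time / volume axis last, in both memory layouts.
data_c = np.arange(2 * 3 * 4 * 5).reshape((2, 3, 4, 5))
data_f = np.asfortranarray(data_c)

# reshape is defined by index iteration order (C by default), not by memory
# layout, so the 2D result is element-for-element identical for both inputs.
flat_c = data_c.reshape((-1, data_c.shape[-1]))
flat_f = data_f.reshape((-1, data_f.shape[-1]))
assert np.array_equal(flat_c, flat_f)
```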

Cheers,

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Change default order to Fortran order

2015-08-03 Thread Sebastian Berg
On Mon Aug 3 16:26:10 2015 GMT+0200, Matthew Brett wrote:
 Hi,
 
 On Mon, Aug 3, 2015 at 3:13 PM, Gregory Lee grle...@gmail.com wrote:
  I agree that often you don't need to worry about the memory order.  However,
  it is not uncommon in medical imaging to go back and forth between a 2D or
  3D image representation and a 1D array representation (e.g. as often used in
  image reconstruction algorithms).  I found that the main time it was
  necessary to pay careful attention to the memory layout was when converting
  Matlab scripts that involve reshaping operations.
 
 Yes, good point. A typical example would be this kind of thing:
 
 # data is a 4D array with time / volume axis last
 data_2d = data.reshape((-1, data.shape[-1]))
 
 For MATLAB, the columns of this array would (by default) have the
 values on the first axis fastest changing, then the second, then the
 third, whereas numpy's default is the other way round.
 
 I find I usually don't have to worry about this, because I'm later going to 
 do:
 
 data_processed_4d  = data_2d.reshape(data.shape)
 
 which will reverse the previous reshape in the correct way.
 
 But in any case - this is not directly to do with the array memory
 layout.  You will get the same output from reshape whether the memory
 layout of `data` was Fortran or C.


Just as a remark: reshape has an order parameter (it controls iteration
order, not memory order), though it may make extra copies if the two orders do not match.
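A minimal sketch of that order parameter on a small array:

```python
import numpy as np

a = np.arange(6).reshape((2, 3))  # [[0, 1, 2], [3, 4, 5]]

# order='C' (default) iterates with the last index fastest;
# order='F' iterates with the first index fastest, like MATLAB's reshape.
c_style = a.reshape((3, 2))             # [[0, 1], [2, 3], [4, 5]]
f_style = a.reshape((3, 2), order='F')  # [[0, 4], [3, 2], [1, 5]]

# For a C-contiguous input, the order='C' reshape is just a view, while
# the order='F' reshape must copy -- Sebastian's point about extra copies.
```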

- Sebastian
 
 Cheers,
 
 Matthew


Re: [Numpy-discussion] Change default order to Fortran order

2015-08-03 Thread Sturla Molden
Matthew Brett matthew.br...@gmail.com wrote:

 Sure, but to avoid confusion, maybe move the discussion of image
 indexing order to another thread?
 
 I think this thread is about memory layout, which is a different issue.

It is actually a bit convoluted and not completely orthogonal. Memory
layout does not matter for 2d indexing, i.e. (x,y) vs. (row, column), if you
are careful when iterating, but it does matter for Nd indexing. There is a
reason to prefer (x,y,z,t,r) in column major order or (recording, time,
slice, row, column) in row major order. Otherwise you can get very
inefficient memory traversals. Then if you work with visualization
libraries that expect (x,y,z) and column major order, e.g. ITK, VTK and
OpenGL, this is really what you want to use. And the choice of indexing
(x,y,z) cannot be seen as independent of the memory layout. Remember, it is
not just a matter of mapping coordinates to pixels. The data sets are so
large in MRI processing that memory layout does matter.

Sturla



Re: [Numpy-discussion] Change default order to Fortran order

2015-08-03 Thread Matthew Brett
On Mon, Aug 3, 2015 at 5:01 PM, Sturla Molden sturla.mol...@gmail.com wrote:
 Matthew Brett matthew.br...@gmail.com wrote:

 Sure, but to avoid confusion, maybe move the discussion of image
 indexing order to another thread?

 I think this thread is about memory layout, which is a different issue.

 It is actually a bit convoluted and not completely orthogonal. Memory
 layout does not matter for 2d indexing, i.e. (x,y) vs. (row, column), if you
 are careful when iterating, but it does matter for Nd indexing. There is a
 reason to prefer (x,y,z,t,r) in column major order or (recording, time,
 slice, row, column) in row major order. Otherwise you can get very
 inefficient memory traversals. Then if you work with visualization
 libraries that expect (x,y,z) and column major order, e.g. ITK, VTK and
 OpenGL, this is really what you want to use. And the choice of indexing
 (x,y,z) cannot be seen as independent of the memory layout. Remember, it is
 not just a matter of mapping coordinates to pixels. The data sets are so
 large in MRI processing that memory layout does matter.

I completely agree that memory layout can affect performance, and this
can be important.

On the other hand, I think you agree that the relationship of axis
order to memory layout is just a matter of mapping coordinates to
pixels.

So you can change the axis ordering without changing the memory layout
and the memory layout without changing the axis ordering.
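A short sketch of that independence:

```python
import numpy as np

a = np.zeros((4, 3))  # C-order memory by default

# Change the axis ordering without changing the memory layout:
# a transpose is just a view with swapped strides; no data moves.
t = a.T
assert t.shape == (3, 4) and t.base is a

# Change the memory layout without changing the axis ordering:
f = np.asfortranarray(a)
assert f.shape == a.shape and f.flags['F_CONTIGUOUS']
```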

Of course you could argue that it would be simpler to fuse the two
issues, and enforce one memory layout - say Fortran. The result
might well be easier to think about, but it wouldn't be much like numpy,
and it would have lots of performance and memory disadvantages.

Cheers,

Matthew


Re: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.?

2015-08-03 Thread Chris Barker
On Sun, Aug 2, 2015 at 5:13 AM, Sturla Molden sturla.mol...@gmail.com
wrote:

  A long is only machine word wide on POSIX; on Windows it's not.

 Actually it is the opposite. A pointer is 64 bit on AMD64, but the
 native integer and pointer offset is only 32 bit. But it does not matter
 because it is int that should be machine word sized, not long, which it
 is on both platforms.


All this illustrates that there is a lot of platform independence and
complexity to the standard C types.

I suppose it's a good thing -- you can use something like int in C code,
and presto! more precision in the future when you re-compile on a newer
system.

However, for any code that needs some kind of binary compatibility between
systems (or is dynamic, like python -- i.e. types are declared at run-time,
not compile time), the fixed width types are a lot safer (or at least
easier to reason about). So we have two issues with numpy:

1) confusing python types with C types -- e.g. np.int is currently a python
integer, NOT a C int -- I think this is a little too confusing, and should
be deprecated. (and np.long -- even more confusing!!!)

2) The vagaries of the standard C types: int, long, etc (spelled np.intc,
which is an int32 on my machine, anyway)
[NOTE: is there a C long dtype? I can't find it at the moment...]

It's probably a good idea to keep these, particularly for interfacing with
C code (like my example of calling C code that use int). Though it would be
good to make sure the docstring make it clear what they are.

However, I'd like to see a recommended practice of using sized types
wherever you can:

uint8
int32
float32
float64
etc

not sure how to propagate that practice, but I'd love to see it become
common.
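For example, the sized aliases pin down the width on every platform, while the C-named types do not:

```python
import numpy as np

# Fixed-width dtypes say exactly what you get, regardless of platform:
a = np.zeros(3, dtype=np.int32)
b = np.zeros(3, dtype=np.float64)
assert a.dtype.itemsize == 4 and b.dtype.itemsize == 8

# The C-named types follow the platform's C compiler; np.intc is a C int
# (32 bits on common platforms, but that is a platform fact, not a guarantee).
print(np.dtype(np.intc).itemsize)
```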

Should we add aliases for the stdint names?

np.int_32_t, etc???

might be good to adhere to an established standard.



-CHB

 Sturla





-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Change default order to Fortran order

2015-08-03 Thread Chris Barker
On Sun, Aug 2, 2015 at 1:46 PM, Sturla Molden sturla.mol...@gmail.com
wrote:

 On 02/08/15 22:28, Bryan Van de Ven wrote:
  And to eliminate the order kwarg, use functools.partial to patch the
 zeros function (or any others, as needed):

 This will probably break code that depends on NumPy, like SciPy and
 scikit-image. But if NumPy is all that matters, sure go ahead and monkey
 patch. Otherwise keep the patched functions in another namespace.


I'd be really careful about this -- sure it's annoying, but a kind of
global change of behavior could wreak havoc.

I'd create a set of Fortran-order constructors -- if it were me, I'd do:

fzeros
fones

etc.

but you could, I suppose, create a namespace and put them all there, then
create an fnumpy that would write those over:

import numpy as np

and away you go -- but that wouldn't change any code that imports numpy in
the usual way.
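A sketch of those wrappers (the names fzeros/fones/fempty are hypothetical, as above):

```python
import functools
import numpy as np

# Plain wrappers in your own namespace: code that imports numpy the
# usual way is unaffected, so nothing downstream can break.
fzeros = functools.partial(np.zeros, order='F')
fones = functools.partial(np.ones, order='F')
fempty = functools.partial(np.empty, order='F')

a = fzeros((3, 4))
assert a.flags['F_CONTIGUOUS']
```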

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] [pytables-dev] ANN: PyTables 3.2.1 released

2015-08-03 Thread Antonio Valentino
Good job.
Thanks Andrea

--
Antonio Valentino



 On 4 Aug 2015, at 02:38, Andrea Bedini 
 and...@andreabedini.com wrote:
 
 ===
 Announcing PyTables 3.2.1
 ===
 
 We are happy to announce PyTables 3.2.1.
 
 
 What's new
 ==
 
 This is a bug fix release. It contains a fix for a segfault in
 indexesextension.keysort().
 
 In case you want to know more in detail what has changed in this
 version, please refer to: http://www.pytables.org/release_notes.html
 
 For an online version of the manual, visit:
 http://www.pytables.org/usersguide/index.html
 
 
 What it is?
 ===
 
 PyTables is a library for managing hierarchical datasets and
 designed to efficiently cope with extremely large amounts of data with
 support for full 64-bit file addressing.  PyTables runs on top of
 the HDF5 library and NumPy package for achieving maximum throughput and
 convenient use.  PyTables includes OPSI, a new indexing technology,
 allowing one to perform data lookups in tables exceeding 10 gigarows
 (10**10 rows) in less than a tenth of a second.
 
 
 Resources
 =
 
 About PyTables: http://www.pytables.org
 
 About the HDF5 library: http://hdfgroup.org/HDF5/
 
 About NumPy: http://numpy.scipy.org/
 
 
 Acknowledgments
 ===
 
 Thanks to many users who provided feature improvements, patches, bug
 reports, support and suggestions.  See the ``THANKS`` file in the
 distribution package for an (incomplete) list of contributors.  Most
 specially, a lot of kudos go to the HDF5 and NumPy makers.
 Without them, PyTables simply would not exist.
 
 
 Share your experience
 =
 
 Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.
 
 
 
 
  **Enjoy data!**
 
  -- The PyTables Developers
 
 
 --
 You received this message because you are subscribed to the Google Groups 
 pytables-dev group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to pytables-dev+unsubscr...@googlegroups.com.
 Visit this group at http://groups.google.com/group/pytables-dev.
 For more options, visit https://groups.google.com/d/optout.





Re: [Numpy-discussion] Change default order to Fortran order

2015-08-03 Thread Kang Wang
This is very good discussion. Thank you all for replying.

I can see the fundamental difference is that I always think/talk/read/write a 
3D image as I(x, y, z), not (plane, row, column) . I am coming from MRI 
(magnetic resonance imaging) research, and I can assure you that the entire MRI 
community is using (x, y, z), including books, journal papers, conference 
abstracts, presentations, everything. We even talk about what we called 
logical x/y/z and physical x/y/z, and the rotation matrix that converts the 
two coordinate systems. The radiologists are also used to (x, y, z). For 
example, we always say my image is 256 by 256 by 20 slices, and we never say 
20 by 256 by 256.


So, basically, at least in MRI, we always talk about an image as I(x, y, z), 
and we always assume that x is the fastest changing index. That's why I 
prefer column-major (because it is more natural).


Of course, I can totally get my work done by using row-major, I just have to 
always remind myself write last dimension index first when coding. I actually 
have done this before, and I found it would be so much easier if just using 
column-major.


Kang

On 08/02/15, Juan Nunez-Iglesias  jni.s...@gmail.com wrote:
 Hi Kang,
 
 Feel free to come chat about your application on the scikit-image list [1]! 
 I'll note that we've been through the array order discussion many times there 
 and even have a doc page about it [2].
 
 The short version is that you'll save yourself a lot of pain by starting to 
 think of your images as (plane, row, column) instead of (x, y, z). The syntax 
 actually becomes friendlier too. For example, to do something to each slice 
 of data, you do:
 
 for plane in image:
  plane += foo
 
 
 instead of
 
 
 for z in range(image.shape[2]):
  image[:, :, z] += foo
 
 
 for example.
 
 
 Juan.
 
 
 [1] scikit-im...@googlegroups.com
 [2] 
 http://scikit-image.org/docs/dev/user_guide/numpy_images.html#coordinate-conventions
 
 
 PS: As to the renamed Fortran-ordered numpy, may I suggest funpy. The F is 
 for Fortran and the fun is for all the fun you'll have maintaining it. =P
 
 On Mon, 3 Aug 2015 at 6:28 am Daniel Sank sank.dan...@gmail.com wrote:
 
 
  Kang,
  
  Thank you for explaining your motivation. It's clear from your last note, 
  as you said, that your desire for column-first indexing has nothing to do 
  with in-memory data layout. That being the case, I strongly urge you to 
  just use bare numpy and do not use the fortran_zeros function I 
  recommended before. Changing the in-memory layout via the order keyword 
  in numpy.zeros will not change the way indexing works at all. You gain 
  absolutely nothing by changing the in-memory order unless you are writing 
  some C or Fortran code which will interact with the data in memory.
  
  
  To see what I mean, consider the following examples:
  
  
  x = np.array([[1, 2, 3], [4, 5, 6]])
  x.shape
   (2, 3)
  
  
  and
  
  
  x = np.array([[1, 2, 3], [4, 5, 6]], order='F')
  x.shape
   (2, 3)
  
  
  
  You see that changing the in-memory order has nothing whatsoever to do with 
  the array's shape or how you access it.
  
  
  
   You will see a run-time error. Depending on the environment, you may get a useful 
   error message (e.g. index out of range), but sometimes you just get bad image results.
  
  
  
  Could you give a very simple example of what you mean? I can't think of how 
  this could ever happen and your fear here makes me think there's a 
  fundamental misunderstanding about how array operations in numpy and other 
  programming languages work. As an example, iteration in numpy goes through 
  the first index:
  
  
  x = np.array([[1, 2, 3], [4, 5, 6]])
  for foo in x:
   ...
  
  
  Inside the for loop, foo takes on the values [1, 2, 3] on the first 
  iteration and [4, 5, 6] on the second. If you want to iterate through the 
  columns just do this instead
  
  
  x = np.array([[1, 2, 3], [4, 5, 6]])
  for foo in x.T:
   ...
  
  
  
  If your complaint is that you want np.array([[1, 2, 3], [4, 5, 6]]) to 
  produce an array with shape (3, 2) then you should own up to the fact that 
  the array constructor expects it the other way around and do this
  
  
  
  x = np.array([[1, 2, 3], [4, 5, 6]]).T
  
  
  
  instead. This is infinity times better than trying to write a shim function 
  or patch numpy because with .T you're using (fast) built-in functionality 
  which other people reading your code will understand.
  
  The real message here is that whether the first index runs over rows or 
  columns is actually meaningless. The only places the row versus column 
  issue has any meaning is when doing input/output (in which case you should 
  use the transpose if you actually need it), or when doing iteration. One 
  thing that would make sense if you're reading from a binary file format 
  which uses column-major format would be to write your own reader function:
  
  
  
  def read_fortran_style_binary_file(file, shape, dtype):
   # np.fromfile returns a flat 1D array (on which .T is a no-op), so the
   # column-major shape has to be restored explicitly with order='F'.
   return np.fromfile(file, dtype=dtype).reshape(shape, order='F')
  
  
  

Re: [Numpy-discussion] 1.10.x is branched

2015-08-03 Thread Ralf Gommers
On Mon, Aug 3, 2015 at 5:22 AM, Charles R Harris charlesr.har...@gmail.com
wrote:

 Hi All,

 Numpy 1.10.x is branched. There is still some cleanup to do before the
 alpha release, but that should be coming in a couple of days.


Thanks Chuck. Looks like it's shaping up nicely.

Ralf


Re: [Numpy-discussion] Change default order to Fortran order

2015-08-03 Thread Nathaniel Smith
On Aug 2, 2015 11:06 PM, Kang Wang kwan...@wisc.edu wrote:

 This is very good discussion. Thank you all for replying.

 I can see the fundamental difference is that I always
think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column) .
I am coming from MRI (magnetic resonance imaging) research, and I can
assure you that the entire MRI community is using (x, y, z), including
books, journal papers, conference abstracts, presentations, everything. We
even talk about what we called logical x/y/z and physical x/y/z, and
the rotation matrix that converts the two coordinate systems. The
radiologists are also used to (x, y, z). For example, we always say my
image is 256 by 256 by 20 slices, and we never say 20 by 256 by 256.

 So, basically, at least in MRI, we always talk about an image as I(x, y,
z), and we always assume that x is the fastest changing index. That's why
I prefer column-major (because it is more natural).

 Of course, I can totally get my work done by using row-major, I just have
to always remind myself write last dimension index first when coding. I
actually have done this before, and I found it would be so much easier if
just using column-major.

Why not just use I[x, y, z] like you're used to, and let the computer worry
about the physical layout in memory? Sometimes this will be Fortran order
and sometimes C order and sometimes something else, but you don't have to
know or care; 99% of the time it doesn't matter. The worst case is that
when you use a python wrapper to call into a library that can only handle
Fortran order, then the wrapper will quietly have to convert the memory
order around and it will be slightly slower than if you had used Fortran
order in the first place. But in practice you'll barely ever notice this,
and when you do, *then* you can tell numpy explicitly what memory layout
you want in the situation where it matters.

General principle: do what's easiest for the programmer, not what's easiest
for the computer.
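In code, the boundary conversion Nathaniel describes is a single call (a sketch with an arbitrary MRI-like shape):

```python
import numpy as np

img = np.zeros((256, 256, 20))  # index as img[x, y, z]; default C layout

# ... index img[x, y, z] freely; the layout is numpy's problem ...

# Only when handing data to an order-sensitive library do you pin the
# layout; asfortranarray copies only if the array isn't already F-ordered.
buf = np.asfortranarray(img)
assert buf.flags['F_CONTIGUOUS']
```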

-n


Re: [Numpy-discussion] Change default order to Fortran order

2015-08-03 Thread Matthew Brett
Hi,

On Mon, Aug 3, 2015 at 8:09 AM, Nathaniel Smith n...@pobox.com wrote:
 On Aug 2, 2015 11:06 PM, Kang Wang kwan...@wisc.edu wrote:

 This is very good discussion. Thank you all for replying.

 I can see the fundamental difference is that I always
 think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column) . I
 am coming from MRI (magnetic resonance imaging) research, and I can assure
 you that the entire MRI community is using (x, y, z), including books,
 journal papers, conference abstracts, presentations, everything. We even
 talk about what we called logical x/y/z and physical x/y/z, and the
 rotation matrix that converts the two coordinate systems. The radiologists
 are also used to (x, y, z). For example, we always say my image is 256 by
 256 by 20 slices, and we never say 20 by 256 by 256.

 So, basically, at least in MRI, we always talk about an image as I(x, y,
 z), and we always assume that x is the fastest changing index. That's why
 I prefer column-major (because it is more natural).

 Of course, I can totally get my work done by using row-major, I just have
 to always remind myself write last dimension index first when coding. I
 actually have done this before, and I found it would be so much easier if
 just using column-major.

 Why not just use I[x, y, z] like you're used to, and let the computer worry
 about the physical layout in memory? Sometimes this will be Fortran order
 and sometimes C order and sometimes something else, but you don't have to
 know or care; 99% of the time it doesn't matter. The worst case is that when
 you use a python wrapper to call into a library that can only handle Fortran
 order, then the wrapper will quietly have to convert the memory order around
 and it will be slightly slower than if you had used Fortran order in the
 first place. But in practice you'll barely ever notice this, and when you
 do, *then* you can tell numpy explicitly what memory layout you want in the
 situation where it matters.

Yes - if you are using numpy, you really have to look numpy in the eye and say:

I will let you worry about the array element order in memory, and in
return, you promise to make indexing work as I would expect

Just for example, let's say you loaded an MRI image into memory:

In [1]: import nibabel
In [2]: img = nibabel.load('my_mri.nii')
In [3]: data = img.get_data()

Because NIfTI images are Fortran memory layout, this happens to be the
memory layout you get for your array:

In [4]: data.flags
Out[4]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

But now - in Python - all I care about is what data I have on the
first, second, third axes.  For example, I could do this:

In [5]: data_copy = data.copy()

This has exactly the same values as the original array, and at the
same index positions:

In [7]: import numpy as np
In [8]: np.all(data == data_copy)
Out[8]: memmap(True, dtype=bool)

but I now have a C memory layout array.

In [9]: data_copy.flags
Out[9]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

Worse than that, if I slice my original data array, then I get an
array that is neither C- nor Fortran-contiguous in memory:

In [10]: data_view = data[:, :, ::2]
In [11]: data_view.flags
Out[11]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

So - if you want every array to be Fortran-contiguous in memory, I
would not start with numpy at all, I would write your own array
library.

The alternative - or the numpy way - is to give up on enforcing a
particular layout in memory, until you need to pass an array to some C
or C++ or Fortran code that needs some particular layout, in which
case you get your extension code to copy the array into the required
layout on entry.   Of course this is what numpy itself has to do when
interfacing with external libraries like BLAS or LAPACK.

Cheers,

Matthew


Re: [Numpy-discussion] Change default order to Fortran order

2015-08-03 Thread Sturla Molden
Juan Nunez-Iglesias jni.s...@gmail.com wrote:

 The short version is that you'll save yourself a lot of pain by starting to
 think of your images as (plane, row, column) instead of (x, y, z). 

There are several things to consider here.

1. The vertices in computer graphics (OpenGL) are (x,y,z).

2. OpenGL rotation matrices and projection matrices are stored in column
major order.

3. OpenGL frame buffers are indexed (x,y) in column major order with (0,0)
being lower left.

4. ITK and VTK depend on OpenGL and are thus using column major order.

5. Those who use Matlab or Fortran in addition to Python prefer column
major order.

6. BLAS and LAPACK use column major order.

7. The common notation in image processing (as opposed to computer
graphics in general) is indexing (row, column), in row major order, with
(0,0) being upper left.

All in all, this is a strong case for preferring column major order and the
common mathematical notation (x,y,z).

Also notice how the usual notation in image processing differs from
OpenGL.
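The strides make the layout difference concrete (a sketch with an arbitrary MRI-like shape):

```python
import numpy as np

# The same (x, y, z) index convention under the two memory layouts:
vol_c = np.zeros((256, 256, 20))             # row major: z varies fastest
vol_f = np.zeros((256, 256, 20), order='F')  # column major: x varies fastest

# Strides (bytes to step one index) show which axis is contiguous in memory,
# and hence which traversal order is cache-friendly:
assert vol_c.strides[-1] == vol_c.itemsize   # last axis is the fast one
assert vol_f.strides[0] == vol_f.itemsize    # first axis (x) is the fast one
```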


Sturla



Re: [Numpy-discussion] Change default order to Fortran order

2015-08-03 Thread SMRUTI RANJAN SAHOO
Well, it's really a great idea. I can help on Python, but I don't have any
knowledge of Fortran.

On Sun, Aug 2, 2015 at 7:25 PM, Kang Wang kwan...@wisc.edu wrote:

 Hi,

 I am an imaging researcher, and a new Python user. My first Python project
 is to somehow modify NumPy source code such that everything is Fortran
 column-major by default.

 I read about the information in the link below, but for us, the fact is
 that *we absolutely want to use Fortran column major, and we want to
 make it default. Explicitly writing  order = 'F'  all over the place is
 not acceptable to us*.

 http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues

 I tried searching in this email list, as well as google search in general.
 However, I have not found anything useful. This must be a common
 request/need, I believe.

 Can anyone provide any insight/help?

 Thank you very much,

 Kang

 --
 *Kang Wang, Ph.D.*
  Highland Ave., Room 1113
 Madison, WI 53705-2275
 




Re: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.?

2015-08-03 Thread Chris Barker
On Mon, Aug 3, 2015 at 11:05 AM, Sturla Molden sturla.mol...@gmail.com
wrote:

 On 03/08/15 18:25, Chris Barker wrote:



   [NOTE: is there a C long dtype? I can't find it at the moment...]

 There is, it is called np.int.


well, IIUC, np.int is the python integer type, which is a C long in all
the implementations of CPython that I know about -- but is that a
guarantee, in the future as well? For instance, if it were up to me, I'd
use an int64_t on all 64 bit platforms, rather than having that odd 32 bit
on Windows, 64 bit on *nix silliness.

This just illustrates the problem...


So another minor proposal: add a numpy.longc type, which would be the
platform C long (and probably just an alias to something already there).
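For what it's worth, the platform C long is already reachable through its single-character type code (and, in the NumPy of this era, as np.int_):

```python
import numpy as np

# 'l' is the C type code for long; its width follows the platform ABI
# (commonly 8 bytes on 64-bit Linux/macOS, 4 bytes on 64-bit Windows).
long_dt = np.dtype('l')
print(long_dt.name, long_dt.itemsize)
```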

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.?

2015-08-03 Thread Sturla Molden
On 03/08/15 18:25, Chris Barker wrote:

 2) The vagaries of the standard C types: int, long, etc (spelled
 np.intc, which is an int32 on my machine, anyway)
  [NOTE: is there a C long dtype? I can't find it at the moment...]

There is, it is called np.int. This just illustrates the problem...


Sturla



Re: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.?

2015-08-03 Thread Allan Haldane
On 08/03/2015 12:25 PM, Chris Barker wrote:
 2) The vagaries of the standard C types:  int, long, etc (spelled
 np.intc, which is a int32 on my machine, anyway)
 [NOTE: is there a C long dtype? I can't find it at the moment...]

Numpy does define the platform dependent C integer types short, long,
longlong and their unsigned versions according to the docs. size_t is
the same size as intp.

Even though float and double are virtually always IEEE single and double
precision, maybe for consistency we should also define np.floatc,
np.doublec and np.longdoublec?
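C-named floating-point dtypes do already exist under slightly different spellings (np.single, np.double, np.longdouble), which may be what's wanted here:

```python
import numpy as np

# C float and double map onto the sized IEEE aliases on current platforms:
assert np.dtype(np.single) == np.dtype(np.float32)
assert np.dtype(np.double) == np.dtype(np.float64)

# long double is genuinely platform dependent (8, 12, or 16 bytes):
print(np.dtype(np.longdouble).itemsize)
```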

Allan


Re: [Numpy-discussion] Change default order to Fortran order

2015-08-03 Thread Charles R Harris
On Mon, Aug 3, 2015 at 10:24 AM, Matthew Brett matthew.br...@gmail.com
wrote:

 On Mon, Aug 3, 2015 at 5:01 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:
  Matthew Brett matthew.br...@gmail.com wrote:
 
  Sure, but to avoid confusion, maybe move the discussion of image
  indexing order to another thread?
 
  I think this thread is about memory layout, which is a different issue.
 
  It is actually a bit convoluted and not completely orthogonal. Memory
  layout does not matter for 2d indexing, i.e. (x,y) vs. (row, column), if
 you
  are careful when iterating, but it does matter for Nd indexing. There is
 a
  reason to prefer (x,y,z,t,r) in column major order or (recording, time,
  slice, row, column) in row major order. Otherwise you can get very
  inefficient memory traversals. Then if you work with visualization
  libraries that expect (x,y,z) and column major order, e.g. ITK, VTK and
  OpenGL, this is really what you want to use. And the choice of indexing
  (x,y,z) cannot be seen as independent of the memory layout. Remember, it
 is
  not just a matter of mapping coordinates to pixels. The data sets are so
  large in MRI processing that memory layout does matter.

 I completely agree that memory layout can affect performance, and this
 can be important.

 On the other hand, I think you agree that the relationship of axis
 order to memory layout is just a matter of mapping coordinates to
 pixels.

 So you can change the axis ordering without changing the memory layout
 and the memory layout without changing the axis ordering.

 Of course you could argue that it would be simpler to fuse the two
 issues, and enforce one memory layout - say Fortran. The result
 might well be easier to think about, but it wouldn't be much like numpy,
 and it would have lots of performance and memory disadvantages.

 Cheers,


I would also strongly suggest that once you have decided on a convention
that you thoroughly document it somewhere. That will not only help you, but
anyone who later needs to maintain the code will bless you rather than d*nm
you to eternal torture by the seven demons of Ipsos.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Change default order to Fortran order

2015-08-03 Thread Sebastian Berg
On Mon Aug 3 10:49:35 2015 GMT+0200, Matthew Brett wrote:
 Hi,
 
 On Mon, Aug 3, 2015 at 8:09 AM, Nathaniel Smith n...@pobox.com wrote:
  On Aug 2, 2015 11:06 PM, Kang Wang kwan...@wisc.edu wrote:
 
  This is very good discussion. Thank you all for replying.
 
  I can see the fundamental difference is that I always
  think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column).
  I am coming from MRI (magnetic resonance imaging) research, and I can assure
  you that the entire MRI community is using (x, y, z), including books,
  journal papers, conference abstracts, presentations, everything. We even
  talk about what we called logical x/y/z and physical x/y/z, and the
  rotation matrix that converts the two coordinate systems. The radiologists
  are also used to (x, y, z). For example, we always say my image is 256 by
  256 by 20 slices, and we never say 20 by 256 by 256.
 
  So, basically, at least in MRI, we always talk about an image as I(x, y,
  z), and we always assume that x is the fastest changing index. That's why
  I prefer column-major (because it is more natural).
 
  Of course, I can totally get my work done by using row-major, I just have
  to always remind myself write last dimension index first when coding. I
  actually have done this before, and I found it would be so much easier if
  just using column-major.
 
  Why not just use I[x, y, z] like you're used to, and let the computer worry
  about the physical layout in memory? Sometimes this will be Fortran order
  and sometimes C order and sometimes something else, but you don't have to
  know or care; 99% of the time it doesn't matter. The worst case is that when
  you use a python wrapper to call into a library that can only handle Fortran
  order, then the wrapper will quietly have to convert the memory order around
  and it will be slightly slower than if you had used Fortran order in the
  first place. But in practice you'll barely ever notice this, and when you
  do, *then* you can tell numpy explicitly what memory layout you want in the
  situation where it matters.
 
 Yes - if you are using numpy, you really have to look numpy in the eye and 
 say:
 
 I will let you worry about the array element order in memory, and in
 return, you promise to make indexing work as I would expect
 
 Just for example, let's say you loaded an MRI image into memory:
 
 In [1]: import nibabel
 In [2]: img = nibabel.load('my_mri.nii')
 In [3]: data = img.get_data()
 
 Because NIfTI images are Fortran memory layout, this happens to be the
 memory layout you get for your array:
 
 In [4]: data.flags
 Out[4]:
   C_CONTIGUOUS : False
   F_CONTIGUOUS : True
   OWNDATA : False
   WRITEABLE : True
   ALIGNED : True
   UPDATEIFCOPY : False
 
 But now - in Python - all I care about is what data I have on the
 first, second, third axes.  For example, I could do this:
 
 In [5]: data_copy = data.copy()
 
 This has exactly the same values as the original array, and at the
 same index positions:
 
 In [7]: import numpy as np
 In [8]: np.all(data == data_copy)
 Out[8]: memmap(True, dtype=bool)
 
 but I now have a C memory layout array.
 
 In [9]: data_copy.flags
 Out[9]:
   C_CONTIGUOUS : True
   F_CONTIGUOUS : False
   OWNDATA : True
   WRITEABLE : True
   ALIGNED : True
   UPDATEIFCOPY : False


Yeah, I would like to second those arguments. Most of the time, there is no 
need to worry about layout. For the large chunks you allocate, it may make 
sense for speed, etc., so you can alias the creation functions.  Generally, 
I would suggest simply not worrying about the memory layout. Also, do not 
*trust* the layout of most function returns. If you need a specific layout 
to interface with other code, always check what you got.
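
A minimal sketch (my illustration, not Sebastian's code) of "always check
what you got": copy only when the callee actually needs a different layout.
In current NumPy, `np.asfortranarray` returns its argument unchanged when
the layout is already right.

```python
import numpy as np

a = np.ones((4, 5))                     # C-contiguous by default
assert not a.flags['F_CONTIGUOUS']

b = np.asfortranarray(a)                # copies, since a is not F-ordered
assert b.flags['F_CONTIGUOUS'] and b is not a

c = np.asfortranarray(b)                # already F-ordered: no copy made
assert c is b
```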

-Sebastian 

 
 Worse than that, if I slice my original data array, then I get an
 array that is neither C- nor Fortran-compatible in memory:
 
 In [10]: data_view = data[:, :, ::2]
 In [11]: data_view.flags
 Out[11]:
   C_CONTIGUOUS : False
   F_CONTIGUOUS : False
   OWNDATA : False
   WRITEABLE : True
   ALIGNED : True
   UPDATEIFCOPY : False
 
 So - if you want every array to be Fortran-contiguous in memory, I
 would not start with numpy at all, I would write your own array
 library.
 
 The alternative - or the numpy way - is to give up on enforcing a
 particular layout in memory, until you need to pass an array to some C
 or C++ or Fortran code that needs some particular layout, in which
 case you get your extension code to copy the array into the required
 layout on entry.   Of course this is what numpy itself has to do when
 interfacing with external libraries like BLAS or LAPACK.
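
 A sketch (mine, not from the mail) of what such a wrapper typically does
 on entry when the callee needs C-contiguous memory: for a non-contiguous
 view this copies; for an already-contiguous array it costs nothing.

```python
import numpy as np

data = np.zeros((4, 4, 8), order='F')
view = data[:, :, ::2]                     # neither C- nor F-contiguous
assert not view.flags['C_CONTIGUOUS'] and not view.flags['F_CONTIGUOUS']

buf = np.ascontiguousarray(view)           # copy into C layout on entry
assert buf.flags['C_CONTIGUOUS']
assert (buf == view).all()                 # same values, new layout
```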
 
 Cheers,
 
 Matthew
 



Re: [Numpy-discussion] Change default order to Fortran order

2015-08-03 Thread Gregory Lee
I agree that often you don't need to worry about the memory order.
However, it is not uncommon in medical imaging to go back and forth between
a 2D or 3D image representation and a 1D array representation (e.g. as
often used in image reconstruction algorithms).  I found that the main time
it was necessary to pay careful attention to the memory layout was when
converting Matlab scripts that involve reshaping operations.
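
A small sketch (my example, not from the mail) of the pitfall when porting
Matlab reshapes: reshape's `order` argument controls which index varies
fastest, independently of how the array is stored in memory.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)        # [[0, 1, 2], [3, 4, 5]]

# NumPy default: C order, last index varies fastest
assert a.reshape(-1).tolist() == [0, 1, 2, 3, 4, 5]

# Matlab-style: Fortran order, first index varies fastest
assert a.reshape(-1, order='F').tolist() == [0, 3, 1, 4, 2, 5]
```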

On Mon, Aug 3, 2015 at 8:02 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Mon Aug 3 10:49:35 2015 GMT+0200, Matthew Brett wrote:
  Hi,
 
  On Mon, Aug 3, 2015 at 8:09 AM, Nathaniel Smith n...@pobox.com wrote:
   On Aug 2, 2015 11:06 PM, Kang Wang kwan...@wisc.edu wrote:
  
   This is very good discussion. Thank you all for replying.
  
   I can see the fundamental difference is that I always
   think/talk/read/write a 3D image as I(x, y, z), not (plane, row,
   column). I am coming from MRI (magnetic resonance imaging) research,
   and I can assure you that the entire MRI community is using (x, y, z),
   including books, journal papers, conference abstracts, presentations,
   everything. We even talk about what we called logical x/y/z and
   physical x/y/z, and the rotation matrix that converts the two
   coordinate systems. The radiologists are also used to (x, y, z). For
   example, we always say my image is 256 by 256 by 20 slices, and we
   never say 20 by 256 by 256.
  
   So, basically, at least in MRI, we always talk about an image as
   I(x, y, z), and we always assume that x is the fastest changing index.
   That's why I prefer column-major (because it is more natural).
  
   Of course, I can totally get my work done by using row-major, I just
   have to always remind myself write last dimension index first when
   coding. I actually have done this before, and I found it would be so
   much easier if just using column-major.
  
   Why not just use I[x, y, z] like you're used to, and let the computer
   worry about the physical layout in memory? Sometimes this will be
   Fortran order and sometimes C order and sometimes something else, but
   you don't have to know or care; 99% of the time it doesn't matter. The
   worst case is that when you use a python wrapper to call into a library
   that can only handle Fortran order, then the wrapper will quietly have
   to convert the memory order around and it will be slightly slower than
   if you had used Fortran order in the first place. But in practice
   you'll barely ever notice this, and when you do, *then* you can tell
   numpy explicitly what memory layout you want in the situation where it
   matters.
 
  Yes - if you are using numpy, you really have to look numpy in the eye
 and say:
 
  I will let you worry about the array element order in memory, and in
  return, you promise to make indexing work as I would expect
 
  Just for example, let's say you loaded an MRI image into memory:
 
  In [1]: import nibabel
  In [2]: img = nibabel.load('my_mri.nii')
  In [3]: data = img.get_data()
 
  Because NIfTI images are Fortran memory layout, this happens to be the
  memory layout you get for your array:
 
  In [4]: data.flags
  Out[4]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
 
  But now - in Python - all I care about is what data I have on the
  first, second, third axes.  For example, I could do this:
 
  In [5]: data_copy = data.copy()
 
  This has exactly the same values as the original array, and at the
  same index positions:
 
  In [7]: import numpy as np
  In [8]: np.all(data == data_copy)
  Out[8]: memmap(True, dtype=bool)
 
  but I now have a C memory layout array.
 
  In [9]: data_copy.flags
  Out[9]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
 

 Yeah, I would like to second those arguments. Most of the time, there is
 no need to worry about layout. For the large chunks you allocate, it may
 make sense for speed, etc., so you can alias the creation functions.
 Generally, I would suggest simply not worrying about the memory layout.
 Also, do not *trust* the layout of most function returns. If you need a
 specific layout to interface with other code, always check what you got.

 -Sebastian


  Worse than that, if I slice my original data array, then I get an
  array that is neither C- nor Fortran-compatible in memory:
 
  In [10]: data_view = data[:, :, ::2]
  In [11]: data_view.flags
  Out[11]:
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
 
  So - if you want every array to be Fortran-contiguous in memory, I
  would not start with numpy at all, I would write your own array
  library.
 
  The alternative - or the numpy way - is to give up on enforcing a
  particular layout in memory, until you need to pass an array to some C
  or C++ or Fortran 

Re: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.?

2015-08-03 Thread Sturla Molden
On 03/08/15 20:51, Chris Barker wrote:

 well, IIUC, np.int is the python integer type, which is a C long in all
 the implementations of cPython that I know about -- but is that a
 guarantee? In the future as well?

It is a Python int on Python 2.

On Python 3, dtype=np.int means the dtype will be C long, because a 
Python int has no size limit. But np.int aliases Python int. And 
creating an array with dtype=int therefore does not create an array of 
Python int, it creates an array of C long. To actually get an array of 
Python int we have to write dtype=object, which is just crazy.
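
A sketch illustrating the point above (my example; note that the np.int
alias itself was later deprecated in NumPy 1.20 and removed in 1.24, so
plain `int` is used here):

```python
import numpy as np

# dtype=int produces a fixed-size C integer, not arbitrary-precision
# Python ints:
a = np.array([1, 2, 3], dtype=int)
assert a.dtype.kind == 'i'             # signed C integer (e.g. int64)
assert type(a[0]) is not int           # elements are numpy scalars

# To actually store Python int objects you need dtype=object:
b = np.array([1, 2, 3], dtype=object)
assert b.dtype == object
assert type(b[0]) is int               # genuine Python ints
```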


Sturla


