Re: [Numpy-discussion] object array alignment issues

2009-10-20 Thread Michael Droettboom
Thanks.  It's passing the related unit tests on Sparc SunOS 5, and Linux 
x86.

Cheers,
Mike

Charles R Harris wrote:
>
>
> On Tue, Oct 20, 2009 at 10:13 AM, Charles R Harris 
> mailto:charlesr.har...@gmail.com>> wrote:
>
>
>
> On Tue, Oct 20, 2009 at 7:02 AM, Michael Droettboom
> mailto:md...@stsci.edu>> wrote:
>
> I've resubmitted the patch without whitespace-only changes.
>
> For what it's worth, I had followed the directions here:
>
> http://projects.scipy.org/numpy/wiki/EmacsSetup
>
> which say to perform "untabify" and "whitespace-cleanup".  Are
> those not
> current?  I had added these to my pre-save hooks under my
> numpy tree.
>
>
> The problem is that hard tabs have crept into the file. The strict
> approach in this case is to make two patches: the first cleans up
> the hard tabs, the second fixes the problems.
>
> How about I fix up the hard tabs and then you can make another patch?
>
>
> I applied the patch. Can you test it?
>
> Chuck
>
> 
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>   

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-20 Thread Charles R Harris
On Tue, Oct 20, 2009 at 10:13 AM, Charles R Harris <
charlesr.har...@gmail.com> wrote:

>
>
> On Tue, Oct 20, 2009 at 7:02 AM, Michael Droettboom wrote:
>
>> I've resubmitted the patch without whitespace-only changes.
>>
>> For what it's worth, I had followed the directions here:
>>
>> http://projects.scipy.org/numpy/wiki/EmacsSetup
>>
>> which say to perform "untabify" and "whitespace-cleanup".  Are those not
>> current?  I had added these to my pre-save hooks under my numpy tree.
>>
>>
> The problem is that hard tabs have crept into the file. The strict approach
> in this case is to make two patches: the first cleans up the hard tabs, the
> second fixes the problems.
>
> How about I fix up the hard tabs and then you can make another patch?
>
>
I applied the patch. Can you test it?

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-20 Thread Charles R Harris
On Tue, Oct 20, 2009 at 7:02 AM, Michael Droettboom  wrote:

> I've resubmitted the patch without whitespace-only changes.
>
> For what it's worth, I had followed the directions here:
>
> http://projects.scipy.org/numpy/wiki/EmacsSetup
>
> which say to perform "untabify" and "whitespace-cleanup".  Are those not
> current?  I had added these to my pre-save hooks under my numpy tree.
>
>
The problem is that hard tabs have crept into the file. The strict approach
in this case is to make two patches: the first cleans up the hard tabs, the
second fixes the problems.

How about I fix up the hard tabs and then you can make another patch?



Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-20 Thread Michael Droettboom
I've resubmitted the patch without whitespace-only changes.

For what it's worth, I had followed the directions here:

http://projects.scipy.org/numpy/wiki/EmacsSetup

which say to perform "untabify" and "whitespace-cleanup".  Are those not 
current?  I had added these to my pre-save hooks under my numpy tree.

Cheers,
Mike

Charles R Harris wrote:
>
>
> On Mon, Oct 19, 2009 at 4:36 PM, Robert Kern  > wrote:
>
> On Mon, Oct 19, 2009 at 17:28, Charles R Harris
> mailto:charlesr.har...@gmail.com>> wrote:
> >
> > On Mon, Oct 19, 2009 at 3:55 PM, Travis Oliphant
> mailto:oliph...@enthought.com>>
> > wrote:
>
> >> Right now, though, the patch has too many white-space only
> changes in
> >> it.  Could you submit a new patch that removes those changes?
> >
> > The old whitespace is hard tabs and needs to be replaced anyway.
> The new
> > whitespace doesn't always get the indentation right, however.
> That file
> > needs a style/whitespace cleanup.
>
> That's fine, but whitespace cleanup needs to be done in commits that
> are separate from the functional changes.
>
>
> I agree, but it can be tricky to preserve hard tabs when your editor 
> uses spaces and has hard tabs set to 8 spaces. That file is on my 
> cleanup list anyway, I'll try to get to it this weekend.
>
> Chuck
>
> 
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>   

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-19 Thread Charles R Harris
On Mon, Oct 19, 2009 at 4:36 PM, Robert Kern  wrote:

> On Mon, Oct 19, 2009 at 17:28, Charles R Harris
>  wrote:
> >
> > On Mon, Oct 19, 2009 at 3:55 PM, Travis Oliphant  >
> > wrote:
>
> >> Right now, though, the patch has too many white-space only changes in
> >> it.  Could you submit a new patch that removes those changes?
> >
> > The old whitespace is hard tabs and needs to be replaced anyway. The new
> > whitespace doesn't always get the indentation right, however. That file
> > needs a style/whitespace cleanup.
>
> That's fine, but whitespace cleanup needs to be done in commits that
> are separate from the functional changes.
>
>
I agree, but it can be tricky to preserve hard tabs when your editor uses
spaces and has hard tabs set to 8 spaces. That file is on my cleanup list
anyway, I'll try to get to it this weekend.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-19 Thread Robert Kern
On Mon, Oct 19, 2009 at 17:28, Charles R Harris
 wrote:
>
> On Mon, Oct 19, 2009 at 3:55 PM, Travis Oliphant 
> wrote:

>> Right now, though, the patch has too many white-space only changes in
>> it.  Could you submit a new patch that removes those changes?
>
> The old whitespace is hard tabs and needs to be replaced anyway. The new
> whitespace doesn't always get the indentation right, however. That file
> needs a style/whitespace cleanup.

That's fine, but whitespace cleanup needs to be done in commits that
are separate from the functional changes.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-19 Thread Charles R Harris
On Mon, Oct 19, 2009 at 3:55 PM, Travis Oliphant wrote:

>
> On Oct 19, 2009, at 9:55 AM, Michael Droettboom wrote:
>
> > I've filed a bug and attached a patch:
> >
> > http://projects.scipy.org/numpy/ticket/1267
> >
> > No guarantees that I've found all of the alignment issues.  I did a
> > grep
> > for "PyObject **" to find possible locations where PyObject * in
> > arrays
> > were being dereferenced.  If I could write a unit test to make it fall
> > over on Solaris, then I fixed it, otherwise I left it alone.  For
> > example, there are places where misaligned dereferencing is
> > theoretically possible (OBJECT_dot, OBJECT_compare), but a higher
> > level
> > function already did a "BEHAVED" array cast.  In those cases I added a
> > unit test so hopefully we'll be able to catch it in the future if the
> > caller no longer ensures well-behavedness.
>
>
> This patch looks great technically.  Thank you for tracking this down
> and correcting my error.
>
> Right now, though, the patch has too many white-space only changes in
> it.  Could you submit a new patch that removes those changes?
>
>
The old whitespace is hard tabs and needs to be replaced anyway. The new
whitespace doesn't always get the indentation right, however. That file
needs a style/whitespace cleanup.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-19 Thread Travis Oliphant

On Oct 19, 2009, at 9:55 AM, Michael Droettboom wrote:

> I've filed a bug and attached a patch:
>
> http://projects.scipy.org/numpy/ticket/1267
>
> No guarantees that I've found all of the alignment issues.  I did a  
> grep
> for "PyObject **" to find possible locations where PyObject * in  
> arrays
> were being dereferenced.  If I could write a unit test to make it fall
> over on Solaris, then I fixed it, otherwise I left it alone.  For
> example, there are places where misaligned dereferencing is
> theoretically possible (OBJECT_dot, OBJECT_compare), but a higher  
> level
> function already did a "BEHAVED" array cast.  In those cases I added a
> unit test so hopefully we'll be able to catch it in the future if the
> caller no longer ensures well-behavedness.


This patch looks great technically.  Thank you for tracking this down  
and correcting my error.

Right now, though, the patch has too many white-space only changes in  
it.  Could you submit a new patch that removes those changes?

Thanks,

-Travis

--
Travis Oliphant
Enthought Inc.
1-512-536-1057
http://www.enthought.com
oliph...@enthought.com





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-19 Thread Michael Droettboom
I've filed a bug and attached a patch:

http://projects.scipy.org/numpy/ticket/1267

No guarantees that I've found all of the alignment issues.  I did a grep 
for "PyObject **" to find possible locations where PyObject * in arrays 
were being dereferenced.  If I could write a unit test to make it fall 
over on Solaris, then I fixed it, otherwise I left it alone.  For 
example, there are places where misaligned dereferencing is 
theoretically possible (OBJECT_dot, OBJECT_compare), but a higher level 
function already did a "BEHAVED" array cast.  In those cases I added a 
unit test so hopefully we'll be able to catch it in the future if the 
caller no longer ensures well-behavedness.

The unit tests are passing with this patch on Sparc (SunOS 5.8), x86 
(RHEL 4) and x86_64 (RHEL 4).  Those of you who care about less common 
architectures may want to try the patch out.  Since I don't know the 
alignment requirements of all of the supported platforms, I erred on the 
side of caution: only x86 and x86_64 will perform unaligned pointer 
dereferencing -- Everything else will use the slower-but-sure-to-work 
memcpy approach.  That can easily be changed in npy_cpu.h if necessary.

Mike

Charles R Harris wrote:
>
>
> On Sun, Oct 18, 2009 at 6:04 AM, Michael Droettboom  > wrote:
>
> On 10/16/2009 11:35 PM, Travis Oliphant wrote:
> >
> > On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote:
> >
> >> I recently committed a regression test and bugfix for object
> pointers in
> >> record arrays of unaligned size (meaning where each record is not a
> >> multiple of sizeof(PyObject **)).
> >>
> >> For example:
> >>
> >>a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')])
> >>a2 = np.zeros((10,), 'S10')
> >># This copying would segfault
> >>a1['o'] = a2
> >>
> >> http://projects.scipy.org/numpy/ticket/1198
> >>
> >> Unfortunately, this unit test has opened up a whole hornet's
> nest of
> >> alignment issues on Solaris.  The various reference counting
> functions
> >> (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object
> pointers,
> >> for instance.  Interestingly, there are comments in there saying
> >> "handles misaligned data" (eg. line 190), but in fact it
> doesn't, and
> >> doesn't look to me like it would.  But I won't rule out a
> mistake in
> >> building it on my part.
> >
> > Thanks for this bug report.  It would be very helpful if you
> could
> > provide the line number where the code is giving a bus error and
> > explain why you think the code in question does not handle
> misaligned
> > data (it still seems like it should to me --- but perhaps I must be
> > missing something --- I don't have a Solaris box to test on).
> > Perhaps, the real problem is elsewhere (such as other places
> where the
> > mistake of forgetting about striding needing to be aligned also
> before
> > pursuing the fast alignment path that you pointed out in another
> place
> > of code).
> >
> > This was the thinking for why the code (that I think is in question)
> > should handle mis-aligned data:
> >
> > 1) pointers that are not aligned to the correct size need to be
> copied
> > to an aligned memory area before being de-referenced.
> > 2) static variables defined in a function will be aligned by the C
> > compiler.
> >
> > So, what the code in refcnt.c does is to copy the value in the NumPy
> > data-area (i.e. pointed to by it->dataptr) to another memory
> location
> > (the stack variable temp), dereference it and then increment it's
> > reference count.
> >
> > 196:  temp = (PyObject **)it->dataptr;
> > 197:  Py_XINCREF(*temp);
> This is exactly an instance that fails.  Let's say we have a
> PyObject at
> an aligned location 0x4000 (PyObjects themselves always seem to be
> aligned -- I strongly suspect CPython is enforcing that).  Then,
> we can
> create a recarray such that some of the PyObject*'s in it are at
> unaligned locations.  For example, if the dtype is 'O,c', you have a
> record stride of 5 which creates unaligned PyObject*'s:
>
>ccc
>0123456789abcde
> ^^
>
> Now in the code above, let's assume that it->dataptr points to an
> unaligned location, 0x8005.  Assigning it to temp puts the same
> unaligned value in temp, 0x8005.  That is:
>
> &temp == 0x1000 /* The location of temp *is* on the stack and
> aligned */
>temp == 0x8005 /* But its value as a pointer points to an unaligned
> memory location */
>*temp == 0x4000 /* Dereferencing it should get us back to the
> original
>   PyObject * pointer, but dereferencing an
> unaligned memory location
>   

Re: [Numpy-discussion] object array alignment issues

2009-10-18 Thread Charles R Harris
On Sun, Oct 18, 2009 at 6:04 AM, Michael Droettboom  wrote:

> On 10/16/2009 11:35 PM, Travis Oliphant wrote:
> >
> > On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote:
> >
> >> I recently committed a regression test and bugfix for object pointers in
> >> record arrays of unaligned size (meaning where each record is not a
> >> multiple of sizeof(PyObject **)).
> >>
> >> For example:
> >>
> >>a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')])
> >>a2 = np.zeros((10,), 'S10')
> >># This copying would segfault
> >>a1['o'] = a2
> >>
> >> http://projects.scipy.org/numpy/ticket/1198
> >>
> >> Unfortunately, this unit test has opened up a whole hornet's nest of
> >> alignment issues on Solaris.  The various reference counting functions
> >> (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers,
> >> for instance.  Interestingly, there are comments in there saying
> >> "handles misaligned data" (eg. line 190), but in fact it doesn't, and
> >> doesn't look to me like it would.  But I won't rule out a mistake in
> >> building it on my part.
> >
> > Thanks for this bug report.  It would be very helpful if you could
> > provide the line number where the code is giving a bus error and
> > explain why you think the code in question does not handle misaligned
> > data (it still seems like it should to me --- but perhaps I must be
> > missing something --- I don't have a Solaris box to test on).
> > Perhaps, the real problem is elsewhere (such as other places where the
> > mistake of forgetting about striding needing to be aligned also before
> > pursuing the fast alignment path that you pointed out in another place
> > of code).
> >
> > This was the thinking for why the code (that I think is in question)
> > should handle mis-aligned data:
> >
> > 1) pointers that are not aligned to the correct size need to be copied
> > to an aligned memory area before being de-referenced.
> > 2) static variables defined in a function will be aligned by the C
> > compiler.
> >
> > So, what the code in refcnt.c does is to copy the value in the NumPy
> > data-area (i.e. pointed to by it->dataptr) to another memory location
> > (the stack variable temp), dereference it and then increment it's
> > reference count.
> >
> > 196:  temp = (PyObject **)it->dataptr;
> > 197:  Py_XINCREF(*temp);
> This is exactly an instance that fails.  Let's say we have a PyObject at
> an aligned location 0x4000 (PyObjects themselves always seem to be
> aligned -- I strongly suspect CPython is enforcing that).  Then, we can
> create a recarray such that some of the PyObject*'s in it are at
> unaligned locations.  For example, if the dtype is 'O,c', you have a
> record stride of 5 which creates unaligned PyObject*'s:
>
>ccc
>0123456789abcde
> ^^
>
> Now in the code above, let's assume that it->dataptr points to an
> unaligned location, 0x8005.  Assigning it to temp puts the same
> unaligned value in temp, 0x8005.  That is:
>
> &temp == 0x1000 /* The location of temp *is* on the stack and aligned */
>temp == 0x8005 /* But its value as a pointer points to an unaligned
> memory location */
>*temp == 0x4000 /* Dereferencing it should get us back to the original
>   PyObject * pointer, but dereferencing an
> unaligned memory location
>   fails with a bus error on Solaris */
>
> So the bus error occurs on line 197.
>
> Note that something like:
>
>PyObject* temp;
>temp = *(PyObject **)it->dataptr;
>
> would also fail.
>
> The solution (this is what works for me, though there may be a better way):
>
> PyObject *temp; /* NB: temp is now a (PyObject *), not a (PyObject
> **) */
> /* memcpy works byte-by-byte, so can handle an unaligned assignment */
> memcpy(&temp, it->dataptr, sizeof(PyObject *));
> Py_XINCREF(temp);
>
> I'm proposing adding a macro which on Intel/AMD would be defined as:
>
> #define COPY_PYOBJECT_PTR(dst, src) (*(dst) = *(src))
>
> and on alignment-required platforms as:
>
> #define COPY_PYOBJECT_PTR(dst, src) (memcpy((dst), (src),
> sizeof(PyObject *))
>
> and it would be used something like:
>
> COPY_PYOBJECT_PTR(&temp, it->dataptr);
>
>
This looks right to me, but I'll let Travis sign off on it.



Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-18 Thread Michael Droettboom
On 10/16/2009 11:35 PM, Travis Oliphant wrote:
>
> On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote:
>
>> I recently committed a regression test and bugfix for object pointers in
>> record arrays of unaligned size (meaning where each record is not a
>> multiple of sizeof(PyObject **)).
>>
>> For example:
>>
>>a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')])
>>a2 = np.zeros((10,), 'S10')
>># This copying would segfault
>>a1['o'] = a2
>>
>> http://projects.scipy.org/numpy/ticket/1198
>>
>> Unfortunately, this unit test has opened up a whole hornet's nest of
>> alignment issues on Solaris.  The various reference counting functions
>> (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers,
>> for instance.  Interestingly, there are comments in there saying
>> "handles misaligned data" (eg. line 190), but in fact it doesn't, and
>> doesn't look to me like it would.  But I won't rule out a mistake in
>> building it on my part.
>
> Thanks for this bug report.  It would be very helpful if you could 
> provide the line number where the code is giving a bus error and 
> explain why you think the code in question does not handle misaligned 
> data (it still seems like it should to me --- but perhaps I must be 
> missing something --- I don't have a Solaris box to test on).   
> Perhaps, the real problem is elsewhere (such as other places where the 
> mistake of forgetting about striding needing to be aligned also before 
> pursuing the fast alignment path that you pointed out in another place 
> of code).
>
> This was the thinking for why the code (that I think is in question) 
> should handle mis-aligned data:
>
> 1) pointers that are not aligned to the correct size need to be copied 
> to an aligned memory area before being de-referenced.
> 2) static variables defined in a function will be aligned by the C 
> compiler.
>
> So, what the code in refcnt.c does is to copy the value in the NumPy 
> data-area (i.e. pointed to by it->dataptr) to another memory location 
> (the stack variable temp), dereference it and then increment it's 
> reference count.
>
> 196:  temp = (PyObject **)it->dataptr;
> 197:  Py_XINCREF(*temp);
This is exactly an instance that fails.  Let's say we have a PyObject at 
an aligned location 0x4000 (PyObjects themselves always seem to be 
aligned -- I strongly suspect CPython is enforcing that).  Then, we can 
create a recarray such that some of the PyObject*'s in it are at 
unaligned locations.  For example, if the dtype is 'O,c', you have a 
record stride of 5 which creates unaligned PyObject*'s:

ccc
0123456789abcde
 ^^

Now in the code above, let's assume that it->dataptr points to an 
unaligned location, 0x8005.  Assigning it to temp puts the same 
unaligned value in temp, 0x8005.  That is:

&temp == 0x1000 /* The location of temp *is* on the stack and aligned */
temp == 0x8005 /* But its value as a pointer points to an unaligned 
memory location */
*temp == 0x4000 /* Dereferencing it should get us back to the original
   PyObject * pointer, but dereferencing an 
unaligned memory location
   fails with a bus error on Solaris */

So the bus error occurs on line 197.

Note that something like:

PyObject* temp;
temp = *(PyObject **)it->dataptr;

would also fail.

The solution (this is what works for me, though there may be a better way):

 PyObject *temp; /* NB: temp is now a (PyObject *), not a (PyObject 
**) */
 /* memcpy works byte-by-byte, so can handle an unaligned assignment */
 memcpy(&temp, it->dataptr, sizeof(PyObject *));
 Py_XINCREF(temp);

I'm proposing adding a macro which on Intel/AMD would be defined as:

#define COPY_PYOBJECT_PTR(dst, src) (*(dst) = *(src))

and on alignment-required platforms as:

#define COPY_PYOBJECT_PTR(dst, src) (memcpy((dst), (src), 
sizeof(PyObject *))

and it would be used something like:

COPY_PYOBJECT_PTR(&temp, it->dataptr);

If you agree with this assessment, I'm working on a patch for all of the 
locations that require this change.  All that I've found so far are 
related to object arrays.  It seems that many places where this would be 
an issue for numeric types are already using this memcpy technique (e.g. 
*_copyswap in arraytype.c.src:1716).  I think this issue shows up in 
object arrays much more because there are many more places where the 
unaligned memory is dereferenced (in order to do reference counting).

So here's the traceback from:

a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c'), ('i', 'i'), ('c2', 
'c')])

Unfortunately, I'm having trouble getting line numbers out of the 
debugger, but "print statement debugging" tells me the inner most frame 
here is in refcount.c:

275PyObject **temp;
276Py_XINCREF(obj);
277temp = (PyObject **)optr;
278*temp = obj; /* <-- here */
279return;

My fix was:

Py_XINCREF(obj);
memcpy(optr, &obj, si

Re: [Numpy-discussion] object array alignment issues

2009-10-17 Thread Francesc Alted
A Friday 16 October 2009 18:05:05 Sturla Molden escrigué:
> Francesc Alted skrev:
> > The response is clear: avoid memcpy() if you can.  It is true that
> > memcpy() performance has improved quite a lot in latest gcc (it has been
> > quite good in Win versions since many years ago), but working with data
> > in-place (i.e. avoiding a memory copy) is always faster (and most
> > specially for large arrays that don't fit in cache processors).
> >
> > My own experiments says that, with an Intel Core2 processor the typical
> > speed- ups for avoiding memcpy() are 2x.
>
> If the underlying array is strided, I have seen the opposite as well.
> "Copy-in copy-out" is a common optimization used by Fortran compilers
> when working with strided arrays. The catch is that the work array has
> to fit in cache for this to make any sence. Anyhow, you cannot use
> memcpy for this kind of optimization - it assumes both buffers are
> contiguous. But working with arrays directly instead of copies is not
> always the faster option.

Mmh, don't know about Fortran (too many years without programming it), but in 
C it seems evident that performing a memcpy() is always slower, at least with 
modern CPUs (like the Intel Core2 that I'm using now):

In [43]: import numpy as np

In [44]: import numexpr as ne

In [45]: r = np.zeros(1e6, 'i1,i4,f8')

In [46]: f1, f2 = r['f1'], r['f2']

In [47]: f1.flags.aligned, f2.flags.aligned
Out[47]: (False, False)

In [48]: timeit f1*f2  # NumPy do copies before carrying out operations
100 loops, best of 3: 14.6 ms per loop

In [49]: timeit ne.evaluate('f1*f2')   # numexpr uses plain unaligned access
100 loops, best of 3: 5.77 ms per loop   # 2.5x faster than numpy

Using strides, the result is similar:

In [50]: f1, f2 = r['f1'][::2], r['f2'][::2]  # check with strides

In [51]: f1.flags.aligned, f2.flags.aligned
Out[51]: (False, False)

In [52]: timeit f1*f2
100 loops, best of 3: 7.52 ms per loop

In [53]: timeit ne.evaluate('f1*f2')
100 loops, best of 3: 3.96 ms per loop   # 1.9x faster than numpy

And, when using large strides so that the resulting arrays fit in cache:

In [54]: f1, f2 = r['f1'][::10], r['f2'][::10]  # big stride (fits in cache)

In [55]: timeit f1*f2
100 loops, best of 3: 3.51 ms per loop

In [56]: timeit ne.evaluate('f1*f2')
100 loops, best of 3: 2.61 ms per loop  # 34% faster than numpy

Which, although not much, still gives an advantage to the direct approach.
So, at least in C, operating with unaligned data on (modern) AMD/Intel 
processors seems to be fastest (at least in this quick-and-dirty benchmark).   
In fact, performance is very close to contiguous and aligned data:

In [58]: f1, f2 = r['f1'].copy(), r['f2'].copy()   # aligned and contiguous

In [59]: timeit f1*f2
100 loops, best of 3: 5.2 ms per loop

In [60]: timeit ne.evaluate('f1*f2')
100 loops, best of 3: 4.74 ms per loop

so 5.77 ms (unaligned data, In [49]) is not very far from 4.74 ms (aligned 
data, In [60]) and close to 'optimal' numpy performance (5.2 ms, In [59]).  
And, as I said before, the plans of AMD/Intel are to reduce this gap still 
further.

For unaligned arrays that fits in cache the results are even more dramatic:

In [61]: r = np.zeros(1e5, 'i1,i4,f8')

In [62]: f1, f2 = r['f1'], r['f2']

In [63]: timeit f1*f2
1000 loops, best of 3: 1.37 ms per loop

In [64]: timeit ne.evaluate('f1*f2')
1000 loops, best of 3: 293 µs per loop  #  4.7x speedup

but not sure why...

Cheers,

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread Charles R Harris
On Fri, Oct 16, 2009 at 9:35 PM, Travis Oliphant wrote:

>
> On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote:
>
> I recently committed a regression test and bugfix for object pointers in
> record arrays of unaligned size (meaning where each record is not a
> multiple of sizeof(PyObject **)).
>
> For example:
>
>a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')])
>a2 = np.zeros((10,), 'S10')
># This copying would segfault
>a1['o'] = a2
>
> http://projects.scipy.org/numpy/ticket/1198
>
> Unfortunately, this unit test has opened up a whole hornet's nest of
> alignment issues on Solaris.  The various reference counting functions
> (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers,
> for instance.  Interestingly, there are comments in there saying
> "handles misaligned data" (eg. line 190), but in fact it doesn't, and
> doesn't look to me like it would.  But I won't rule out a mistake in
> building it on my part.
>
>
> Thanks for this bug report.  It would be very helpful if you could
> provide the line number where the code is giving a bus error and explain why
> you think the code in question does not handle misaligned data (it still
> seems like it should to me --- but perhaps I must be missing something --- I
> don't have a Solaris box to test on).   Perhaps, the real problem is
> elsewhere (such as other places where the mistake of forgetting about
> striding needing to be aligned also before pursuing the fast alignment path
> that you pointed out in another place of code).
>
> This was the thinking for why the code (that I think is in question) should
> handle mis-aligned data:
>
> 1) pointers that are not aligned to the correct size need to be copied to
> an aligned memory area before being de-referenced.
> 2) static variables defined in a function will be aligned by the C
> compiler.
>
> So, what the code in refcnt.c does is to copy the value in the NumPy
> data-area (i.e. pointed to by it->dataptr) to another memory location (the
> stack variable temp), dereference it and then increment it's reference
> count.
>
> 196:  temp = (PyObject **)it->dataptr;
> 197:  Py_XINCREF(*temp);
>

Doesn't it->dataptr need to be copied to temp, not just assigned?

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread Travis Oliphant


On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote:

I recently committed a regression test and bugfix for object  
pointers in

record arrays of unaligned size (meaning where each record is not a
multiple of sizeof(PyObject **)).

For example:

   a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')])
   a2 = np.zeros((10,), 'S10')
   # This copying would segfault
   a1['o'] = a2

http://projects.scipy.org/numpy/ticket/1198

Unfortunately, this unit test has opened up a whole hornet's nest of
alignment issues on Solaris.  The various reference counting functions
(PyArray_INCREF etc.) in refcnt.c all fail on unaligned object  
pointers,

for instance.  Interestingly, there are comments in there saying
"handles misaligned data" (eg. line 190), but in fact it doesn't, and
doesn't look to me like it would.  But I won't rule out a mistake in
building it on my part.


Thanks for this bug report.  It would be very helpful if you could  
provide the line number where the code is giving a bus error and  
explain why you think the code in question does not handle misaligned  
data (it still seems like it should to me --- but perhaps I must be  
missing something --- I don't have a Solaris box to test on).
Perhaps, the real problem is elsewhere (such as other places where the  
mistake of forgetting about striding needing to be aligned also before  
pursuing the fast alignment path that you pointed out in another place  
of code).


This was the thinking for why the code (that I think is in question)  
should handle mis-aligned data:


1) pointers that are not aligned to the correct size need to be copied  
to an aligned memory area before being de-referenced.
2) static variables defined in a function will be aligned by the C  
compiler.


So, what the code in refcnt.c does is to copy the value in the NumPy  
data-area (i.e. pointed to by it->dataptr) to another memory location  
(the stack variable temp), dereference it and then increment it's  
reference count.


196:  temp = (PyObject **)it->dataptr;
197:  Py_XINCREF(*temp);

I'm puzzled why this should fail.The stack trace showing where  
this fails would be very useful in figuring out what to fix.



This is all independent of defining a variable to decide whether or  
not to even care about worrying about un-aligned data (which we could  
avoid worrying about on Intel and AMD).


I'm all in favor of such a flag if it would speed up code, but I don't  
see it as the central issue here.


Any more details about the bug you have found would be greatly  
appreciated.


-Travis




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread Sturla Molden
Francesc Alted skrev:
> The response is clear: avoid memcpy() if you can.  It is true that memcpy() 
> performance has improved quite a lot in latest gcc (it has been quite good in 
> Win versions since many years ago), but working with data in-place (i.e. 
> avoiding a memory copy) is always faster (and most specially for large arrays 
> that don't fit in cache processors).
>
> My own experiments says that, with an Intel Core2 processor the typical speed-
> ups for avoiding memcpy() are 2x. 
If the underlying array is strided, I have seen the opposite as well. 
"Copy-in copy-out" is a common optimization used by Fortran compilers 
when working with strided arrays. The catch is that the work array has 
to fit in cache for this to make any sence. Anyhow, you cannot use 
memcpy for this kind of optimization - it assumes both buffers are 
contiguous. But working with arrays directly instead of copies is not 
always the faster option.

S.M.















>  And I've read somewhere that both AMD and 
> Intel are trying to make unaligned operations to go even faster in next 
> architectures (the goal is that there should be no speed difference in 
> accessing aligned or unaligned data).
>
>   
>> I believe the memcpy approach is used for other unaligned parts of void
>> types. There is an inherent performance penalty there, but I don't see how
>> it can be avoided when using what are essentially packed structures. As to
>> memcpy, it's performance seems to depend on the compiler/compiler version,
>> old versions of gcc had *horrible* implementations of memcpy. I believe the
>> situation has since improved. However, I'm not sure we should be coding to
>> compiler issues unless it is unavoidable or the gain is huge.
>> 
>
> IMO, NumPy can be improved for unaligned data handling.  For example, Numexpr 
> is using this small snippet:
>
> from cpuinfo import cpu
> if cpu.is_AMD() or cpu.is_Intel():
> is_cpu_amd_intel = True
> else:
> is_cpu_amd_intel = False
>
> for detecting AMD/Intel architectures and allowing the code to avoid memcpy() 
> calls for the unaligned arrays.
>
> The above code uses the excellent ``cpuinfo.py`` module from Pearu Peterson, 
> which is distributed under NumPy, so it should not be too difficult to take 
> advantage of this for avoiding unnecessary copies in this scenario.
>
>   

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread Michael Droettboom
On 10/16/2009 07:53 AM, Pauli Virtanen wrote:
> Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote:
> [clip]
>
>> IMO, NumPy can be improved for unaligned data handling.  For example,
>> Numexpr is using this small snippet:
>>
>> from cpuinfo import cpu
>> if cpu.is_AMD() or cpu.is_Intel():
>>  is_cpu_amd_intel = True
>> else:
>>  is_cpu_amd_intel = False
>>
>> for detecting AMD/Intel architectures and allowing the code to avoid
>> memcpy() calls for the unaligned arrays.
>>
>> The above code uses the excellent ``cpuinfo.py`` module from Pearu
>> Peterson, which is distributed under NumPy, so it should not be too
>> difficult to take advantage of this for avoiding unnecessary copies in
>> this scenario.
>>  
> I suppose this kind of check is easiest to do at compile-time, and
> defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for
> those architectures for which they are not necessary.
>
>
That's close to the solution I'm arriving at.

I'm thinking of adding a macro "DEREF_UNALIGNED_PYOBJECT_PTR" which 
would do the right thing depending on the type of architecture.  There 
should be no impact on architectures that handle unaligned pointers, and 
slightly slower (but correct) performance on other architectures.

Mike

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread Francesc Alted
A Friday 16 October 2009 14:02:03 David Cournapeau escrigué:
> On Fri, Oct 16, 2009 at 8:53 PM, Pauli Virtanen  wrote:
> > Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote:
> > [clip]
> >
> >> IMO, NumPy can be improved for unaligned data handling.  For example,
> >> Numexpr is using this small snippet:
> >>
> >> from cpuinfo import cpu
> >> if cpu.is_AMD() or cpu.is_Intel():
> >>     is_cpu_amd_intel = True
> >> else:
> >>     is_cpu_amd_intel = False
> >>
> >> for detecting AMD/Intel architectures and allowing the code to avoid
> >> memcpy() calls for the unaligned arrays.
> >>
> >> The above code uses the excellent ``cpuinfo.py`` module from Pearu
> >> Peterson, which is distributed under NumPy, so it should not be too
> >> difficult to take advantage of this for avoiding unnecessary copies in
> >> this scenario.
> >
> > I suppose this kind of check is easiest to do at compile-time, and
> > defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for
> > those architectures for which they are not necessary.
>
> I wonder whether we could switch at runtime (import time) - it could
> be useful for testing.
>
> That being said, I agree that the cpu checks should be done at compile
> time - we had quite a few problems with cpuinfo in the past with new
> cpu/unhandled cpu, I think a compilation-based method is much more
> robust (and simpler) here. There are things where C is just much
> easier than python :)

Agreed.  I'm relaying in ``cpuinfo.py`` just because it provides what I need 
in an easy way.  BTW, the detection of AMD/Intel (just the vendor) processors 
seems to work flawlessly for the platforms that I've checked (but I suppose 
that you are talking about other characteristics, like SSE version, etc).

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread David Cournapeau
On Fri, Oct 16, 2009 at 8:53 PM, Pauli Virtanen  wrote:
> Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote:
> [clip]
>> IMO, NumPy can be improved for unaligned data handling.  For example,
>> Numexpr is using this small snippet:
>>
>> from cpuinfo import cpu
>> if cpu.is_AMD() or cpu.is_Intel():
>>     is_cpu_amd_intel = True
>> else:
>>     is_cpu_amd_intel = False
>>
>> for detecting AMD/Intel architectures and allowing the code to avoid
>> memcpy() calls for the unaligned arrays.
>>
>> The above code uses the excellent ``cpuinfo.py`` module from Pearu
>> Peterson, which is distributed under NumPy, so it should not be too
>> difficult to take advantage of this for avoiding unnecessary copies in
>> this scenario.
>
> I suppose this kind of check is easiest to do at compile-time, and
> defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for
> those architectures for which they are not necessary.

I wonder whether we could switch at runtime (import time) - it could
be useful for testing.

That being said, I agree that the cpu checks should be done at compile
time - we had quite a few problems with cpuinfo in the past with new
cpu/unhandled cpu, I think a compilation-based method is much more
robust (and simpler) here. There are things where C is just much
easier than python :)

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread Pauli Virtanen
Fri, 16 Oct 2009 12:07:10 +0200, Francesc Alted wrote:
[clip]
> IMO, NumPy can be improved for unaligned data handling.  For example,
> Numexpr is using this small snippet:
> 
> from cpuinfo import cpu
> if cpu.is_AMD() or cpu.is_Intel():
> is_cpu_amd_intel = True
> else:
> is_cpu_amd_intel = False
> 
> for detecting AMD/Intel architectures and allowing the code to avoid
> memcpy() calls for the unaligned arrays.
> 
> The above code uses the excellent ``cpuinfo.py`` module from Pearu
> Peterson, which is distributed under NumPy, so it should not be too
> difficult to take advantage of this for avoiding unnecessary copies in
> this scenario.

I suppose this kind of check is easiest to do at compile-time, and 
defining a -DFORCE_ALIGNED? This wouldn't cause performance penalties for 
those architectures for which they are not necessary.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread Francesc Alted
A Thursday 15 October 2009 19:00:04 Charles R Harris escrigué:
> > So, how to fix this?
> >
> > One obvious workaround is for users to pass "align=True" to the dtype
> > constructor.  This works if the dtype descriptor is a dictionary or
> > comma-separated string.  Is there a reason it couldn't be made to work
> > with the string-of-tuples form that I'm missing?  It would be marginally
> > more convenient from my application, but that's just a finesse issue.
> >
> > However, perhaps we should try to fix the underlying alignment
> > problems?  Unfortunately, it's not clear to me how to resolve them
> > without at least some performance penalty.  You either do an alignment
> > check of the pointer, and then memcpy if unaligned, or just always use
> > memcpy.  Not sure which is faster, as memcpy may have a fast path
> > already. These are object arrays anyway, so there's plenty of overhead
> > already, and I don't think this would affect regular numerical arrays.

The response is clear: avoid memcpy() if you can.  It is true that memcpy() 
performance has improved quite a lot in latest gcc (it has been quite good in 
Win versions since many years ago), but working with data in-place (i.e. 
avoiding a memory copy) is always faster (and most specially for large arrays 
that don't fit in cache processors).

My own experiments says that, with an Intel Core2 processor the typical speed-
ups for avoiding memcpy() are 2x.  And I've read somewhere that both AMD and 
Intel are trying to make unaligned operations to go even faster in next 
architectures (the goal is that there should be no speed difference in 
accessing aligned or unaligned data).

> I believe the memcpy approach is used for other unaligned parts of void
> types. There is an inherent performance penalty there, but I don't see how
> it can be avoided when using what are essentially packed structures. As to
> memcpy, it's performance seems to depend on the compiler/compiler version,
> old versions of gcc had *horrible* implementations of memcpy. I believe the
> situation has since improved. However, I'm not sure we should be coding to
> compiler issues unless it is unavoidable or the gain is huge.

IMO, NumPy can be improved for unaligned data handling.  For example, Numexpr 
is using this small snippet:

from cpuinfo import cpu
if cpu.is_AMD() or cpu.is_Intel():
is_cpu_amd_intel = True
else:
is_cpu_amd_intel = False

for detecting AMD/Intel architectures and allowing the code to avoid memcpy() 
calls for the unaligned arrays.

The above code uses the excellent ``cpuinfo.py`` module from Pearu Peterson, 
which is distributed under NumPy, so it should not be too difficult to take 
advantage of this for avoiding unnecessary copies in this scenario.

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-15 Thread Charles R Harris
On Thu, Oct 15, 2009 at 10:40 AM, Michael Droettboom wrote:

> I recently committed a regression test and bugfix for object pointers in
> record arrays of unaligned size (meaning where each record is not a
> multiple of sizeof(PyObject **)).
>
> For example:
>
>a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')])
>a2 = np.zeros((10,), 'S10')
># This copying would segfault
>a1['o'] = a2
>
> http://projects.scipy.org/numpy/ticket/1198
>
> Unfortunately, this unit test has opened up a whole hornet's nest of
> alignment issues on Solaris.


No surprise there. Good unit tests seem to routinely uncover hornet's nests
and Solaris is a platform that exercises the alignment part of the code. I
think it is great that you are finding these problems. We folks working on
Intel don't see them so much.


> The various reference counting functions
> (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers,
> for instance.  Interestingly, there are comments in there saying
> "handles misaligned data" (eg. line 190), but in fact it doesn't, and
> doesn't look to me like it would.  But I won't rule out a mistake in
> building it on my part.
>
> So, how to fix this?
>
> One obvious workaround is for users to pass "align=True" to the dtype
> constructor.  This works if the dtype descriptor is a dictionary or
> comma-separated string.  Is there a reason it couldn't be made to work
> with the string-of-tuples form that I'm missing?  It would be marginally
> more convenient from my application, but that's just a finesse issue.
>
> However, perhaps we should try to fix the underlying alignment
> problems?  Unfortunately, it's not clear to me how to resolve them
> without at least some performance penalty.  You either do an alignment
> check of the pointer, and then memcpy if unaligned, or just always use
> memcpy.  Not sure which is faster, as memcpy may have a fast path
> already. These are object arrays anyway, so there's plenty of overhead
> already, and I don't think this would affect regular numerical arrays.
>
>
I believe the memcpy approach is used for other unaligned parts of void
types. There is an inherent performance penalty there, but I don't see how
it can be avoided when using what are essentially packed structures. As to
memcpy, it's performance seems to depend on the compiler/compiler version,
old versions of gcc had *horrible* implementations of memcpy. I believe the
situation has since improved. However, I'm not sure we should be coding to
compiler issues unless it is unavoidable or the gain is huge.


> If we choose not to fix it, perhaps we should we try to warn when
> creating an unaligned recarray on platforms where it matters?  I do
> worry about having something that works perfectly well on one platform
> fail on another.
>
> In the meantime, I'll just mark the new regression test to "skip on
> Solaris".
>
>
Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] object array alignment issues

2009-10-15 Thread Michael Droettboom
I recently committed a regression test and bugfix for object pointers in 
record arrays of unaligned size (meaning where each record is not a 
multiple of sizeof(PyObject **)).

For example:

a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')])
a2 = np.zeros((10,), 'S10')
# This copying would segfault
a1['o'] = a2

http://projects.scipy.org/numpy/ticket/1198

Unfortunately, this unit test has opened up a whole hornet's nest of 
alignment issues on Solaris.  The various reference counting functions 
(PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers, 
for instance.  Interestingly, there are comments in there saying 
"handles misaligned data" (eg. line 190), but in fact it doesn't, and 
doesn't look to me like it would.  But I won't rule out a mistake in 
building it on my part.

So, how to fix this? 

One obvious workaround is for users to pass "align=True" to the dtype 
constructor.  This works if the dtype descriptor is a dictionary or 
comma-separated string.  Is there a reason it couldn't be made to work 
with the string-of-tuples form that I'm missing?  It would be marginally 
more convenient from my application, but that's just a finesse issue.

However, perhaps we should try to fix the underlying alignment 
problems?  Unfortunately, it's not clear to me how to resolve them 
without at least some performance penalty.  You either do an alignment 
check of the pointer, and then memcpy if unaligned, or just always use 
memcpy.  Not sure which is faster, as memcpy may have a fast path 
already. These are object arrays anyway, so there's plenty of overhead 
already, and I don't think this would affect regular numerical arrays. 

If we choose not to fix it, perhaps we should we try to warn when 
creating an unaligned recarray on platforms where it matters?  I do 
worry about having something that works perfectly well on one platform 
fail on another.

In the meantime, I'll just mark the new regression test to "skip on 
Solaris".

Mike

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion