Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-03-07 Thread Warren Weckesser
On Tue, Mar 6, 2012 at 4:45 PM, Chris Barker chris.bar...@noaa.gov wrote:

 On Thu, Mar 1, 2012 at 10:58 PM, Jay Bourque jayv...@gmail.com wrote:

  1. Loading text files using loadtxt/genfromtxt need a significant
  performance boost (I think at least an order of magnitude increase in
  performance is very doable based on what I've seen with Erin's recfile
 code)

  2. Improved memory usage. Memory used for reading in a text file
 shouldn’t
  be more than the file itself, and less if only reading a subset of file.

  3. Keep existing interfaces for reading text files (loadtxt, genfromtxt,
  etc). No new ones.

  4. Underlying code should keep IO iteration and transformation of data
  separate (awaiting more thoughts from Travis on this).

  5. Be able to plug in different transformations of data at low level
 (also
  awaiting more thoughts from Travis).

  6. memory mapping of text files?

  7. Eventually reduce memory usage even more by using same object for
  duplicate values in array (depends on implementing enum dtype?)

  Anything else?

 Yes -- I'd like to see the solution be able to do high-performance
 reads of a portion of a file -- not always the whole thing. I seem to
 have a number of custom text files that I need to read that are laid
 out in chunks: a bit of a header, then a block of numbers, another
 header, another block. I'm happy to read and parse the header sections
 with pure Python, but would love a way to read the blocks of numbers
 into a numpy array fast. This will probably come out of the box with
 any of the proposed solutions, as long as they start at the current
 position of a passed-in file object, can be told how much to read,
 and then leave the file pointer in the correct position.



If you are set up with Cython to build extension modules, and you don't
mind testing an unreleased and experimental reader, you can try the text
reader that I'm working on: https://github.com/WarrenWeckesser/textreader

You can read a file like this, where the first line gives the number of
rows of the following array, and that pattern repeats:

5
1.0, 2.0, 3.0
4.0, 5.0, 6.0
7.0, 8.0, 9.0
10.0, 11.0, 12.0
13.0, 14.0, 15.0
3
1.0, 1.5, 2.0, 2.5
3.0, 3.5, 4.0, 4.5
5.0, 5.5, 6.0, 6.5
1
1.0D2, 1.25D-1, 6.25D-2, 99

with code like this:

import numpy as np
from textreader import readrows

filename = 'data/multi.dat'

f = open(filename, 'r')
line = f.readline()
while len(line) > 0:
    nrows = int(line)
    a = readrows(f, np.float32, numrows=nrows, sci='D', delimiter=',')
    print "a:"
    print a
    print
    line = f.readline()


Warren
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Pierre Haessig
Hi,

Thank you very much for your insights!

On 06/03/2012 21:59, Nathaniel Smith wrote:
 Right -- R has a very impoverished type system as compared to numpy.
 There's basically four types: numeric (meaning double precision
 float), integer, logical (boolean), and character (string). And
 in practice the integer type is essentially unused, because R parses
 numbers like 1 as being floating point, not integer; the only way to
 get an integer value is to explicitly cast to it. Each of these types
 has a specific bit-pattern set aside for representing NA. And...
 that's it. It's very simple when it works, but also very limited.
I also suspected R to be less powerful in terms of types.
However, I think the fact that "it's very simple when it works" is
important to take into account. At the end of the day, when using all
the fanciness, it is not only about "can I have some NAs in my array?"
but also "how *easily* can I have some NAs in my array?". It's about
balancing the "how easy" and the "how powerful".

The ease of use is the reason for my concern about having separate
types nafloatNN and floatNN. Of course, I won't argue that not
breaking everything is even more important!

Coming back to Travis' proposition that "bit-pattern approaches to missing
data (*at least* for float64 and int32) need to be implemented", I
wonder what the amount of extra work is to go from nafloat64 to
nafloat32/16. Is there hardware support for NaN payloads with these
smaller floats? If not, or if it is too complicated, I feel it is
acceptable to say "it's too complicated" and fall back to masks. One may
have to choose between fancy types and fancy NAs...

Best,
Pierre





Re: [Numpy-discussion] addition, multiplication of a polynomial and np.{float, int}

2012-03-07 Thread Pierre Haessig
Hi,
On 06/03/2012 22:19, Charles R Harris wrote:
 Use polynomial.Polynomial and you won't have this problem.
I was not familiar with the poly1d vs. Polynomial choice.

Now, I found in the docs some more or less explicit guidelines in:
http://docs.scipy.org/doc/numpy/reference/routines.polynomials.html
"The polynomial package is newer and more complete than poly1d and the
convenience classes are better behaved in the numpy environment."

However, poly1d, which is nicely documented, doesn't mention its
competitor. Going further in this transition, do you feel it would make
sense to add a "See Also" section to the poly1d documentation?

Best,
Pierre





Re: [Numpy-discussion] Fixing PyArray_Descr flags member size, ABI vs pickling issue

2012-03-07 Thread David Cournapeau
On Tue, Mar 6, 2012 at 1:44 PM, Robert Kern robert.k...@gmail.com wrote:

 On Tue, Mar 6, 2012 at 18:25, Travis Oliphant tra...@continuum.io wrote:
  Why do we want to return a single string char instead of an int?

 I suspect just to ensure that any provided value fits in the range
 0..255. But that's easily done explicitly.


That was not even the issue in the end; my initial analysis was wrong.

In any case, I now have a new PR that fixes both the dtype.flags value and
the dtype hashing reported in #2017: https://github.com/numpy/numpy/pull/231

regards,

David


Re: [Numpy-discussion] addition, multiplication of a polynomial and np.{float, int}

2012-03-07 Thread Charles R Harris
On Wed, Mar 7, 2012 at 9:45 AM, Pierre Haessig pierre.haes...@crans.org wrote:

 Hi,
 Le 06/03/2012 22:19, Charles R Harris a écrit :
  Use polynomial.Polynomial and you won't have this problem.
 I was not familiar with the poly1d vs. Polynomial choice.

 Now, I found in the doc some more or less explicit guidelines in:
 http://docs.scipy.org/doc/numpy/reference/routines.polynomials.html
 The polynomial package is newer and more complete than poly1d and the
 convenience classes are better behaved in the numpy environment.

 However, poly1d, which is nicely documented doesn't mention the
 competitor.
 Going further in this transition, do you feel it would make sense adding
 a See Also section in poly1d function ?


That's a good idea, I'll take care of it. Note the caveat about the
coefficients going in the opposite direction.
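To make the caveat concrete, here is a small sketch (my own illustration, using only public numpy APIs) of the reversed coefficient conventions:

```python
import numpy as np
from numpy.polynomial import Polynomial

# poly1d takes coefficients highest degree first...
p_old = np.poly1d([1.0, 2.0, 3.0])       # x**2 + 2*x + 3
# ...while Polynomial takes them lowest degree first.
p_new = Polynomial([3.0, 2.0, 1.0])      # 3 + 2*x + x**2 -- the same polynomial
print(p_old(2.0), p_new(2.0))            # both evaluate to 11.0
```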

Chuck


Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Nathaniel Smith
On Wed, Mar 7, 2012 at 4:35 PM, Pierre Haessig pierre.haes...@crans.org wrote:
 Hi,

 Thank you very much for your insights!

 Le 06/03/2012 21:59, Nathaniel Smith a écrit :
 Right -- R has a very impoverished type system as compared to numpy.
 There's basically four types: numeric (meaning double precision
 float), integer, logical (boolean), and character (string). And
 in practice the integer type is essentially unused, because R parses
 numbers like 1 as being floating point, not integer; the only way to
 get an integer value is to explicitly cast to it. Each of these types
 has a specific bit-pattern set aside for representing NA. And...
 that's it. It's very simple when it works, but also very limited.
 I also suspected R to be less powerful in terms of types.
 However, I think  the fact that It's very simple when it works is
 important to take into account. At the end of the day, when using all
 the fanciness it is not only about can I have some NAs in my array ?
 but also how *easily* can I have some NAs in my array ?. It's about
 balancing the how easy and the how powerful.

 The easyness-of-use is the reason of my concern about having separate
 types nafloatNN and floatNN. Of course, I won't argue that not
 breaking everything is even more important !!

It's a good point, I just don't see how we can really tell what the
trade-offs are at this point. You should bring this up again once more
of the big picture stuff is hammered out.

 Coming back to Travis proposition bit-pattern approaches to missing
 data (*at least* for float64 and int32) need to be implemented., I
 wonder what is the amount of extra work to go from nafloat64 to
 nafloat32/16 ? Is there an hardware support NaN payloads with these
 smaller floats ? If not, or if it is too complicated, I feel it is
 acceptable to say it's too complicated and fall back to mask. One may
 have to choose between fancy types and fancy NAs...

All modern floating point formats can represent NaNs with payloads, so
in principle there's no difficulty in supporting NA the same way for
all of them. If you're using float16 because you want to offload
computation to a GPU then I would test carefully before trusting the
GPU to handle NaNs correctly, and there may need to be a bit of care
to make sure that casts between these types properly map NAs to NAs,
but generally it should be fine.
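For the curious, the bit layout being referred to can be poked at directly from Python. This is an illustrative sketch of mine (the helper `nan_with_payload` is hypothetical, not part of the NA work):

```python
import math
import struct

def nan_with_payload(payload):
    # IEEE 754 binary64: sign(1) | exponent(11) | mantissa(52).
    # All exponent bits set plus the quiet bit (top mantissa bit) gives
    # a quiet NaN; the remaining 51 mantissa bits can carry a payload.
    bits = 0x7FF8000000000000 | (payload & ((1 << 51) - 1))
    return struct.unpack('<d', struct.pack('<Q', bits))[0]

x = nan_with_payload(0x1234)
print(math.isnan(x))                 # True
# On typical hardware the payload survives a round trip through memory --
# exactly the property a bit-pattern NA relies on:
bits = struct.unpack('<Q', struct.pack('<d', x))[0]
print(hex(bits & ((1 << 51) - 1)))
```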

-- Nathaniel


Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Charles R Harris
On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig pierre.haes...@crans.org wrote:

 Hi,

 Thank you very much for your insights!

 Le 06/03/2012 21:59, Nathaniel Smith a écrit :
  Right -- R has a very impoverished type system as compared to numpy.
  There's basically four types: numeric (meaning double precision
  float), integer, logical (boolean), and character (string). And
  in practice the integer type is essentially unused, because R parses
  numbers like 1 as being floating point, not integer; the only way to
  get an integer value is to explicitly cast to it. Each of these types
  has a specific bit-pattern set aside for representing NA. And...
  that's it. It's very simple when it works, but also very limited.
 I also suspected R to be less powerful in terms of types.
 However, I think  the fact that It's very simple when it works is
 important to take into account. At the end of the day, when using all
 the fanciness it is not only about can I have some NAs in my array ?
 but also how *easily* can I have some NAs in my array ?. It's about
 balancing the how easy and the how powerful.

 The easyness-of-use is the reason of my concern about having separate
 types nafloatNN and floatNN. Of course, I won't argue that not
 breaking everything is even more important !!

 Coming back to Travis proposition bit-pattern approaches to missing
 data (*at least* for float64 and int32) need to be implemented., I
 wonder what is the amount of extra work to go from nafloat64 to
 nafloat32/16 ? Is there an hardware support NaN payloads with these
 smaller floats ? If not, or if it is too complicated, I feel it is
 acceptable to say it's too complicated and fall back to mask. One may
 have to choose between fancy types and fancy NAs...


I'm in agreement here, and that was a major consideration in making a
'masked' implementation first. Also, different folks adopt different values
for 'missing' data, and distributing one or several masks along with the
data is another common practice.

One inconvenience I have run into with the current API is that it should be
easier to clear the mask from an 'ignored' value without taking a new view
or assigning known data. So maybe two types of masks (different payloads)
or an additional flag could be helpful. The process of assigning masks
could also be made a bit easier than using fancy indexing.

Chuck


[Numpy-discussion] checking for c compiler during build

2012-03-07 Thread Skipper Seabold
Is there a way to use numpy.distutils to programmatically check for a C
compiler at build time in a platform-independent way?


Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Lluís
Charles R Harris writes:
[...]
 One inconvenience I have run into with the current API is that is should be
 easier to clear the mask from an ignored value without taking a new view or
 assigning known data.

AFAIR, the inability to directly access a mask attribute was intentional to
make bit-patterns and masks indistinguishable from the POV of the array user.

What's the workflow that leads you to un-ignore specific elements?


 So maybe two types of masks (different payloads), or an additional flag could
 be helpful.

Do you mean different NA values? If that's the case, I think it was taken into
account when implementing the current mechanisms (and was also mentioned in the
NEP), so that it could be supported by both bit-patterns and masks (as one of
the main design points was to make them indistinguishable in the common case).

I think the name was "parametrized dtypes".


 The process of assigning masks could also be made a bit easier than using
 fancy indexing.

I don't get what you mean here, sorry.

Do you mean here that this is too cumbersome to use?

 a[a > 5] = np.NA

(obviously oversimplified example where everything looks sufficiently simple :))
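For comparison, the long-standing numpy.ma API already separates 'ignored' from destroyed data; here is a rough sketch of the un-ignore workflow being discussed (numpy.ma used only as an analogy, not the NA implementation itself):

```python
import numpy as np
import numpy.ma as ma

a = ma.masked_array([1.0, 2.0, 7.0, 9.0])
a[a > 5] = ma.masked          # ignore the large values...
print(a)                      # [1.0 2.0 -- --]

a.mask[2] = False             # ...then un-ignore one element directly
print(a)                      # [1.0 2.0 7.0 --]  (the 7.0 was never destroyed)
```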




Lluis

-- 
 And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer.
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


[Numpy-discussion] [enhancement] sum_angle() and sum_polar()

2012-03-07 Thread Robert Jördens
Hi everyone,
I am proposing to add the following two functions to
numpy/lib/twodim_base.py:

sum_angle() computes the sum of a 2-d array along an angled axis
sum_polar() computes the sum of a 2-d array along radial lines or
along azimuthal circles

https://github.com/numpy/numpy/pull/230

Comments?

When I was looking for a solution to these problems of calculating
special sums of 2-d arrays I could not find anything and it took me a
while to figure out a (hopefully) useful and consistent algorithm.
I can see how one would extend these to higher dimensions but that
would preclude using bincount() to do the heavy lifting.
Looking at some other functions, the doctests might need to be split
into real examples and unittests.
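For readers who haven't seen the bincount() trick, here is a rough sketch of the 45-degree case (my own illustrative reconstruction, not the code in the PR): each element (i, j) is labeled with its anti-diagonal index i + j, and bincount() forms one weighted sum per label.

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)

# Label every element with its anti-diagonal index i + j, then let
# bincount() do the heavy lifting: one weighted sum per label.
i, j = np.indices(a.shape)
diag_sums = np.bincount((i + j).ravel(), weights=a.ravel())
print(diag_sums)   # sums of the 6 anti-diagonals: 0, 5, 15, 18, 17, 11
```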

Best,

-- 
Robert Jordens.


Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Charles R Harris
On Wed, Mar 7, 2012 at 11:21 AM, Lluís xscr...@gmx.net wrote:

 Charles R Harris writes:
 [...]
  One inconvenience I have run into with the current API is that is should
 be
  easier to clear the mask from an ignored value without taking a new
 view or
  assigning known data.

 AFAIR, the inability to directly access a mask attribute was intentional
 to
 make bit-patterns and masks indistinguishable from the POV of the array
 user.

 What's the workflow that leads you to un-ignore specific elements?



Because they are not 'unknown', just (temporarily) 'ignored'. This might be
the case if you are experimenting with what happens if certain data is left
out of a fit. The current implementation tries to handle both of these
cases, and can do so; I would just like the 'ignored' use to be more
convenient than it is.


  So maybe two types of masks (different payloads), or an additional flag
 could
  be helpful.

 Do you mean different NA values? If that's the case, I think it was taken
 into
 account when implementing the current mechanisms (and was also mentioned
 in the
 NEP), so that it could be supported by both bit-patterns and masks (as one
 of
 the main design points was to make them indistinguishable in the common
 case).


No, the mask as currently implemented is eight bits and can be extended to
handle different mask values, aka, payloads.


 I think the name was parametrized dtypes.


They don't interest me in the least. But that is a whole different area of
discussion.



  The process of assigning masks could also be made a bit easier than using
  fancy indexing.

 I don't get what you mean here, sorry.


Suppose I receive a data set, say an HDF file, that also includes a mask.
I'd like to load the data and apply the mask directly, without doing
something like

data[mask] = np.NA
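With numpy.ma, at least, that direct route exists today. A sketch with hypothetical stand-ins for the arrays read from the file:

```python
import numpy as np
import numpy.ma as ma

# Stand-ins for arrays read from the same HDF file:
data = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, True, False, True])

# Attach the mask wholesale: no per-element assignment pass over the data,
# and with copy=False (the default) the data buffer is shared, not copied.
m = ma.masked_array(data, mask=mask)
print(ma.count(m))   # 2 unmasked values
```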


Do you mean here that this is too cumbersome to use?

 a[a > 5] = np.NA

 (obviously oversimplified example where everything looks sufficiently
 simple :))


Mostly speed and memory.

Chuck


[Numpy-discussion] Github key audit.

2012-03-07 Thread Charles R Harris
Hi All,

Many here have probably received the message from GitHub about push/pull
access being blocked until you have audited your ssh keys. To generate a
key fingerprint on Fedora, I did the following:

charris@f16 ~$ ssh-keygen -l -f .ssh/id_dsa.pub

I don't know how this looks for those of you using Windows.

The compromise looks to be a 'whitehat' attack telling the Rails developers
to FIX THE G*DD*MNED PROBLEM. More here: GitHub compromised,
http://lwn.net/Articles/485162/

Chuck


Re: [Numpy-discussion] checking for c compiler during build

2012-03-07 Thread Skipper Seabold
On Wed, Mar 7, 2012 at 12:35 PM, Skipper Seabold jsseab...@gmail.com wrote:
 Is there a way to use numpy.distutils to programmatically check for a C
 compiler at build time in a platform-independent way?

Wading through the numpy/distutils code some more. Would something as
simple as this work all the time? It seems to do the trick for me.

from distutils.dist import Distribution
from distutils.command.config import config as distutils_config
from distutils import log

dummy_c_text = r'''
void do_nothing(void);
int main(void) {
    do_nothing();
    return 0;
}
'''

def has_c_compiler():
    c = distutils_config(Distribution())
    try:
        # try_compile returns True on success, False on compile failure
        return c.try_compile(dummy_c_text)
    except Exception:
        # no usable compiler found at all
        log.info("No C compiler detected.")
        return False

Skipper


Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Nathaniel Smith
On Wed, Mar 7, 2012 at 5:17 PM, Charles R Harris
charlesr.har...@gmail.com wrote:
 On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig pierre.haes...@crans.org
 Coming back to Travis proposition bit-pattern approaches to missing
 data (*at least* for float64 and int32) need to be implemented., I
 wonder what is the amount of extra work to go from nafloat64 to
 nafloat32/16 ? Is there an hardware support NaN payloads with these
 smaller floats ? If not, or if it is too complicated, I feel it is
 acceptable to say it's too complicated and fall back to mask. One may
 have to choose between fancy types and fancy NAs...

 I'm in agreement here, and that was a major consideration in making a
 'masked' implementation first.

When it comes to missing data, bitpatterns can do everything that
masks can do, are no more complicated to implement, and have better
performance characteristics.

 Also, different folks adopt different values
 for 'missing' data, and distributing one or several masks along with the
 data is another common practice.

True, but not really relevant to the current debate, because you have
to handle such issues as part of your general data import workflow
anyway, and none of these is any more complicated no matter which
implementations are available.

 One inconvenience I have run into with the current API is that is should be
 easier to clear the mask from an ignored value without taking a new view
 or assigning known data. So maybe two types of masks (different payloads),
 or an additional flag could be helpful. The process of assigning masks could
 also be made a bit easier than using fancy indexing.

So this, uh... this was actually the whole goal of the alterNEP
design for masks -- making all this stuff easy for people (like you,
apparently?) that want support for ignored values, separately from
missing data, and want a nice clean API for it. Basically having a
separate .mask attribute which was an ordinary, assignable array
broadcastable to the attached array's shape. Nobody seemed interested
in talking about it much then but maybe there's interest now?

-- Nathaniel


Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Charles R Harris
On Wed, Mar 7, 2012 at 12:26 PM, Nathaniel Smith n...@pobox.com wrote:

 On Wed, Mar 7, 2012 at 5:17 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
  On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig pierre.haes...@crans.org
 
  Coming back to Travis proposition bit-pattern approaches to missing
  data (*at least* for float64 and int32) need to be implemented., I
  wonder what is the amount of extra work to go from nafloat64 to
  nafloat32/16 ? Is there an hardware support NaN payloads with these
  smaller floats ? If not, or if it is too complicated, I feel it is
  acceptable to say it's too complicated and fall back to mask. One may
  have to choose between fancy types and fancy NAs...
 
  I'm in agreement here, and that was a major consideration in making a
  'masked' implementation first.

 When it comes to missing data, bitpatterns can do everything that
 masks can do, are no more complicated to implement, and have better
 performance characteristics.


Maybe for float; for other things, no. And we have lots of other things. The
performance argument is a strawman, and it *isn't* easier to implement.


  Also, different folks adopt different values
  for 'missing' data, and distributing one or several masks along with the
  data is another common practice.

 True, but not really relevant to the current debate, because you have
 to handle such issues as part of your general data import workflow
 anyway, and none of these is any more complicated no matter which
 implementations are available.

  One inconvenience I have run into with the current API is that is should
 be
  easier to clear the mask from an ignored value without taking a new
 view
  or assigning known data. So maybe two types of masks (different
 payloads),
  or an additional flag could be helpful. The process of assigning masks
 could
  also be made a bit easier than using fancy indexing.

 So this, uh... this was actually the whole goal of the alterNEP
 design for masks -- making all this stuff easy for people (like you,
 apparently?) that want support for ignored values, separately from
 missing data, and want a nice clean API for it. Basically having a
 separate .mask attribute which was an ordinary, assignable array
 broadcastable to the attached array's shape. Nobody seemed interested
 in talking about it much then but maybe there's interest now?


Come off it, Nathaniel, the problem is minor and fixable. The intent of the
initial implementation was to discover such things. These things are less
accessible with the current API *precisely* because of the feedback from R
users. It didn't start that way.

We now have something to evolve into what we want. That is a heck of a lot
more useful than endless discussion.

Chuck


Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Benjamin Root
On Wed, Mar 7, 2012 at 1:26 PM, Nathaniel Smith n...@pobox.com wrote:

 On Wed, Mar 7, 2012 at 5:17 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
  On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig pierre.haes...@crans.org
 
  Coming back to Travis proposition bit-pattern approaches to missing
  data (*at least* for float64 and int32) need to be implemented., I
  wonder what is the amount of extra work to go from nafloat64 to
  nafloat32/16 ? Is there an hardware support NaN payloads with these
  smaller floats ? If not, or if it is too complicated, I feel it is
  acceptable to say it's too complicated and fall back to mask. One may
  have to choose between fancy types and fancy NAs...
 
  I'm in agreement here, and that was a major consideration in making a
  'masked' implementation first.

 When it comes to missing data, bitpatterns can do everything that
 masks can do, are no more complicated to implement, and have better
 performance characteristics.


Not true. Bitpatterns inherently destroy the data, while masks do not.
For matplotlib, we can not use bitpatterns because they could overwrite
user data (or we would have to copy the data). I would imagine other
extension writers would have similar issues when they need to play around
with input data in a safe manner.

Also, I doubt that the performance characteristics for strings and integers
are the same as they are for masks.

Ben Root


Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Matthew Brett
Hi,

On Wed, Mar 7, 2012 at 11:37 AM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Wed, Mar 7, 2012 at 12:26 PM, Nathaniel Smith n...@pobox.com wrote:

 On Wed, Mar 7, 2012 at 5:17 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
  On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig
  pierre.haes...@crans.org
  Coming back to Travis proposition bit-pattern approaches to missing
  data (*at least* for float64 and int32) need to be implemented., I
  wonder what is the amount of extra work to go from nafloat64 to
  nafloat32/16 ? Is there an hardware support NaN payloads with these
  smaller floats ? If not, or if it is too complicated, I feel it is
  acceptable to say it's too complicated and fall back to mask. One may
  have to choose between fancy types and fancy NAs...
 
  I'm in agreement here, and that was a major consideration in making a
  'masked' implementation first.

 When it comes to missing data, bitpatterns can do everything that
 masks can do, are no more complicated to implement, and have better
 performance characteristics.


 Maybe for float, for other things, no. And we have lots of otherthings. The
 performance is a strawman, and it *isn't* easier to implement.


  Also, different folks adopt different values
  for 'missing' data, and distributing one or several masks along with the
  data is another common practice.

 True, but not really relevant to the current debate, because you have
 to handle such issues as part of your general data import workflow
 anyway, and none of these is any more complicated no matter which
 implementations are available.

  One inconvenience I have run into with the current API is that is should
  be
  easier to clear the mask from an ignored value without taking a new
  view
  or assigning known data. So maybe two types of masks (different
  payloads),
  or an additional flag could be helpful. The process of assigning masks
  could
  also be made a bit easier than using fancy indexing.

 So this, uh... this was actually the whole goal of the alterNEP
 design for masks -- making all this stuff easy for people (like you,
 apparently?) that want support for ignored values, separately from
 missing data, and want a nice clean API for it. Basically having a
 separate .mask attribute which was an ordinary, assignable array
 broadcastable to the attached array's shape. Nobody seemed interested
 in talking about it much then but maybe there's interest now?


 Come off it, Nathaniel, the problem is minor and fixable. The intent of the
 initial implementation was to discover such things. These things are less
 accessible with the current API *precisely* because of the feedback from R
 users. It didn't start that way.

 We now have something to evolve into what we want. That is a heck of a lot
 more useful than endless discussion.

The endless discussion is for the following reason:

- The discussion was never adequately resolved.

The discussion was never adequately resolved because there was not
enough work done to understand the various arguments.   In particular,
you've several times said things that indicate to me, as to Nathaniel,
that you either have not read or have not understood the points that
Nathaniel was making.

Travis' recent email - to me - also indicates that there is still a
genuine problem here that has not been adequately explored.

There is no future in trying to stop discussion, and trying to do so
will only prolong it and make it less useful.  It will make the
discussion - endless.

If you want to help - read the alterNEP, respond to it directly, and
further the discussion by engaged debate.

Best,

Matthew


Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Eric Firing
On 03/07/2012 09:26 AM, Nathaniel Smith wrote:
 On Wed, Mar 7, 2012 at 5:17 PM, Charles R Harris
 charlesr.har...@gmail.com  wrote:
 On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig pierre.haes...@crans.org
 Coming back to Travis proposition bit-pattern approaches to missing
 data (*at least* for float64 and int32) need to be implemented., I
 wonder what is the amount of extra work to go from nafloat64 to
 nafloat32/16 ? Is there an hardware support NaN payloads with these
 smaller floats ? If not, or if it is too complicated, I feel it is
 acceptable to say it's too complicated and fall back to mask. One may
 have to choose between fancy types and fancy NAs...

 I'm in agreement here, and that was a major consideration in making a
 'masked' implementation first.

 When it comes to missing data, bitpatterns can do everything that
 masks can do, are no more complicated to implement, and have better
 performance characteristics.

 Also, different folks adopt different values
 for 'missing' data, and distributing one or several masks along with the
 data is another common practice.

 True, but not really relevant to the current debate, because you have
 to handle such issues as part of your general data import workflow
 anyway, and none of these is any more complicated no matter which
 implementations are available.

 One inconvenience I have run into with the current API is that is should be
 easier to clear the mask from an ignored value without taking a new view
 or assigning known data. So maybe two types of masks (different payloads),
 or an additional flag could be helpful. The process of assigning masks could
 also be made a bit easier than using fancy indexing.

 So this, uh... this was actually the whole goal of the alterNEP
 design for masks -- making all this stuff easy for people (like you,
 apparently?) that want support for ignored values, separately from
 missing data, and want a nice clean API for it. Basically having a
 separate .mask attribute which was an ordinary, assignable array
 broadcastable to the attached array's shape. Nobody seemed interested
 in talking about it much then but maybe there's interest now?

In other words, good low-level support for numpy.ma functionality?  With 
a migration path so that a separate numpy.ma might wither away?  Yes, 
there is interest; this is exactly what I think is needed for my own 
style of applications (which I think are common at least in geoscience), 
and for matplotlib.  The question is how to achieve it as simply and 
cleanly as possible while also satisfying the needs of the R users, and 
while making it easy for matplotlib, for example, to handle *any* 
reasonable input: ma, other masking, nan, or NA-bitpattern.
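[Editorial sketch of the kind of input normalization Eric describes — the helper name and the NaN-as-mask convention are assumptions, not matplotlib API:]

```python
import numpy as np
import numpy.ma as ma

def as_data_and_mask(a):
    """Normalize masked-array, NaN-coded, or plain input to (data, bool mask)."""
    if isinstance(a, ma.MaskedArray):
        # getmaskarray always returns a full boolean array, even for nomask
        return np.asarray(a.data), ma.getmaskarray(a)
    a = np.asarray(a, dtype=float)
    return a, np.isnan(a)

d, m = as_data_and_mask(ma.masked_array([1.0, 2.0], mask=[False, True]))
d2, m2 = as_data_and_mask([1.0, float("nan")])
assert m.tolist() == [False, True] and m2.tolist() == [False, True]
```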

It may be that a rather pragmatic approach to implementation will prove 
better than a highly idealized set of data models.  Or, it may be that a 
dual approach is best, in which the 'flag value' missing data 
implementation is tightly bound to the R model and the mask 
implementation is explicitly designed for the numpy.ma model. In any 
case, a reasonable level of agreement on the goals is needed.  I presume 
Travis's involvement will facilitate a clarification of the goals and of 
the implementation; and I expect that much of Mark's work will end up 
serving well, even if much needs to be added and the API evolves 
considerably.

Eric


 -- Nathaniel



Re: [Numpy-discussion] (2012) Accessing LAPACK and BLAS from the numpy C API

2012-03-07 Thread V. Armando Solé

On 06/03/2012 20:57, Sturla Molden wrote:

On 05.03.2012 14:26, V. Armando Solé wrote:


In 2009 there was a thread in this mailing list concerning the access to
BLAS from C extension modules.

If I have properly understood the thread:

http://mail.scipy.org/pipermail/numpy-discussion/2009-November/046567.html

the answer by then was that those functions were not exposed (only f2py
functions).

I just wanted to know if the situation has changed since 2009 because it
is not uncommon that to optimize some operations one has to sooner or
later access BLAS functions that are already wrapped in numpy (either
from ATLAS, from the Intel MKL, ...)

Why do you want to do this? It does not make your life easier to use
NumPy or SciPy's Python wrappers from C. Just use BLAS directly from C
instead.

Wow! It certainly makes my life much, much easier. I can compile and 
distribute my python extension *even without having ATLAS, BLAS or MKL 
installed*.
Please note I am not using the python wrappers from C. That would make 
no sense. I am using the underlying libraries supplied with python from C.


I had already used the information Robert Kern provided on the 2009 
thread and obtained the PyCObject as:


from scipy.linalg.blas import fblas
dgemm = fblas.dgemm._cpointer
sgemm = fblas.sgemm._cpointer

but I did not find a way to obtain those pointers from numpy. That was 
the goal of my post. My extension needs SciPy installed just to fetch 
the pointer. It would be very nice to have a way to get similar 
information from numpy.


I have made a test on a Debian machine with BLAS installed but no 
ATLAS -> extension slow but working.
Then the system maintainer installed ATLAS -> the extension flies. 
So, one can distribute a python extension that works on its own but that 
can take profit of any advanced library the end user might have installed.


Your point of view is valid if one is not going to distribute the 
extension module but I *have to* distribute the module for Linux and for 
windows. To have a proper fortran compiler for windows 64 bit compatible 
with python is already an issue. If I have to distribute my own ATLAS or 
MKL then it gets even worse. All those issues are solved just by using 
the pointer to the function.


Concerning licenses, if the end user has the right to use MKL, then he 
has the right to use it via my extension. It is not me who is using MKL.


Armando
PS. The only issue I see with the whole approach is safety because the 
extension might be used to call some nasty function.





[Numpy-discussion] use for missing (ignored) data?

2012-03-07 Thread Neal Becker
I'm wondering what is the use for the ignored data feature?

I can use:

A[valid_A_indexes] = whatever

to process only the 'non-ignored' portions of A.  So at least some simple cases 
of ignored data are already supported without introducing a new type.

OTOH:

w = A[valid_A_indexes]

will copy A's data, and subsequent use of 

w[:] = something

will not update A.

Is this the reason for wanting the ignored data feature?



Re: [Numpy-discussion] use for missing (ignored) data?

2012-03-07 Thread Charles R Harris
On Wed, Mar 7, 2012 at 1:05 PM, Neal Becker ndbeck...@gmail.com wrote:

 I'm wondering what is the use for the ignored data feature?

 I can use:

 A[valid_A_indexes] = whatever

 to process only the 'non-ignored' portions of A.  So at least some simple
 cases
 of ignored data are already supported without introducing a new type.

 OTOH:

 w = A[valid_A_indexes]

 will copy A's data, and subsequent use of

 w[:] = something

 will not update A.

 Is this the reason for wanting the ignored data feature?


Suppose you are working with plotted data and want to turn points on/off by
clicking on them interactively to see how that affects a fit. Why make
multiple copies, change sizes, destroy data, and all that nonsense? Just
have the click update the mask and redraw.
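[Editorial sketch of Chuck's point with numpy.ma — masking hides a point from a fit without destroying it; the `fit` helper is made up for illustration:]

```python
import numpy as np
import numpy.ma as ma

x = np.arange(5.0)
y = ma.masked_array(2 * x + 1)
y[2] = 100.0                      # an outlier the user wants to click away

def fit(xv, yv):
    # least-squares line through the unmasked points only
    good = ~ma.getmaskarray(yv)
    return np.polyfit(xv[good], yv.compressed(), 1)

y[2] = ma.masked                  # "click": hide the point, data stays intact
slope, intercept = fit(x, y)
assert abs(slope - 2.0) < 1e-6

y.mask[2] = False                 # "click" again: the original value is back
assert y[2] == 100.0
```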

Chuck


Re: [Numpy-discussion] use for missing (ignored) data?

2012-03-07 Thread Neal Becker
Charles R Harris wrote:

 On Wed, Mar 7, 2012 at 1:05 PM, Neal Becker ndbeck...@gmail.com wrote:
 
 I'm wondering what is the use for the ignored data feature?

 I can use:

 A[valid_A_indexes] = whatever

 to process only the 'non-ignored' portions of A.  So at least some simple
 cases
 of ignored data are already supported without introducing a new type.

 OTOH:

 w = A[valid_A_indexes]

 will copy A's data, and subsequent use of

 w[:] = something

 will not update A.

 Is this the reason for wanting the ignored data feature?

 
 Suppose you are working with plotted data and want to turn points on/off by
 clicking on them interactively to see how that affects a fit. Why make
 multiple copies, change sizes, destroy data, and all that nonsense? Just
 have the click update the mask and redraw.
 
 Chuck

But does

some_func (A[valid_data_mask])

actually perform a copy?



Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Pierre Haessig
Hi,
On 07/03/2012 20:57, Eric Firing wrote:
 In other words, good low-level support for numpy.ma functionality?
Coming back to *existing* ma support, I was just wondering whether it
was now possible to np.save a masked array.
(I'm using numpy 1.5)
In the end, this is the most annoying problem I have with the existing
ma module which otherwise is pretty useful to me. I'm happy not to need
to process 100% of my data though.

Best,
Pierre





Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Eric Firing
On 03/07/2012 11:15 AM, Pierre Haessig wrote:
 Hi,
 On 07/03/2012 20:57, Eric Firing wrote:
 In other words, good low-level support for numpy.ma functionality?
 Coming back to *existing* ma support, I was just wondering whether it
 was now possible to np.save a masked array.
 (I'm using numpy 1.5)

No, not with the mask preserved.  This is one of the improvements I am 
hoping for with the upcoming missing data work.
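[A common workaround in the meantime — the mask is not preserved by np.save, so store data and mask side by side; the file name is arbitrary:]

```python
import os
import tempfile
import numpy as np
import numpy.ma as ma

m = ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# np.save keeps only the data, so save data and mask as separate arrays
path = os.path.join(tempfile.mkdtemp(), "masked.npz")
np.savez(path, data=m.filled(np.nan), mask=ma.getmaskarray(m))

f = np.load(path)
m2 = ma.masked_array(f["data"], mask=f["mask"])
assert m2[0] == 1.0 and bool(ma.getmaskarray(m2)[1])
```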

Eric

 In the end, this is the most annoying problem I have with the existing
 ma module which otherwise is pretty useful to me. I'm happy not to need
 to process 100% of my data though.

 Best,
 Pierre






Re: [Numpy-discussion] use for missing (ignored) data?

2012-03-07 Thread Benjamin Root
On Wednesday, March 7, 2012, Neal Becker ndbeck...@gmail.com wrote:
 Charles R Harris wrote:

 On Wed, Mar 7, 2012 at 1:05 PM, Neal Becker ndbeck...@gmail.com wrote:

 I'm wondering what is the use for the ignored data feature?

 I can use:

 A[valid_A_indexes] = whatever

 to process only the 'non-ignored' portions of A.  So at least some
simple
 cases
 of ignored data are already supported without introducing a new type.

 OTOH:

 w = A[valid_A_indexes]

 will copy A's data, and subsequent use of

 w[:] = something

 will not update A.

 Is this the reason for wanting the ignored data feature?


 Suppose you are working with plotted data and want to turn points on/off
by
 clicking on them interactively to see how that affects a fit. Why make
 multiple copies, change sizes, destroy data, and all that nonsense? Just
 have the click update the mask and redraw.

 Chuck

 But does

 some_func (A[valid_data_mask])

 actually perform a copy?



Yes! If it isn't sliced, or accessed by a scalar index, then you are given
a copy.  Fancy indexing and Boolean indexing will not return a view.

Note that assignment to a Boolean-indexed array from a scalar is
special-cased. I.e.,

A[valid_points] = 5

will do what you expect. But,

A[valid_points] += 5

may not, IIRC.
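[A quick check of that behaviour, against a recent NumPy:]

```python
import numpy as np

A = np.arange(6)
mask = A % 2 == 0

w = A[mask]      # boolean indexing returns a copy, not a view
w[:] = 99        # so writing into w leaves A untouched
assert A[0] == 0

A[mask] = 5      # assigning through the mask does update A in place
assert A[0] == 5
```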

Ben Root


Re: [Numpy-discussion] addition, multiplication of a polynomial and np.{float, int}

2012-03-07 Thread Pierre Haessig
Hi Charles,
On 07/03/2012 18:00, Charles R Harris wrote:

 That's a good idea, I'll take care of it. Note the caveat about the
 coefficients going in the opposite direction.
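[The caveat in question, illustrated — both classes are in NumPy, with opposite coefficient ordering:]

```python
import numpy as np
from numpy.polynomial import Polynomial

p = Polynomial([1, 2, 3])   # coefficients low-to-high: 1 + 2x + 3x**2
q = np.poly1d([3, 2, 1])    # poly1d takes them high-to-low: same polynomial
assert p(2.0) == q(2.0) == 17.0
```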
Great! In the meantime I changed the root polynomials reference a bit
to emphasize the new Polynomial class.

http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.polynomials.rst/

I feel that the preexisting text was targeting people with prior
knowledge of Poly1d rather than people looking for polynomials in
NumPy in general. (Poly1d is unfortunately the number 1 Google result
for "polynomials numpy"...)

Please make sure I didn't write something completely stupid. Also, I
tried to include links using :doc: but the editor complains. I hope it
will work after a sphinx compilation. There is a build bot, isn't there?
The aim of the first link is that the user-in-a-hurry ends up on the
best piece of documentation available, namely "Using the Convenience
Classes", after a short intro. Does this sound good?

Best,
Pierre





Re: [Numpy-discussion] Github key audit.

2012-03-07 Thread josef . pktd
On Wed, Mar 7, 2012 at 1:54 PM, Charles R Harris
charlesr.har...@gmail.com wrote:
 Hi All,

 Many here have probably received the message from github about push/pull
 access being blocked until you have auditied your ssh keys. To generate a
 key fingerprint on fedora, I did the following:

 $charris@f16 ~$ ssh-keygen -l -f .ssh/id_dsa.pub

 I don't know how this looks for those of you using Windows.

I didn't do much, just confirmed with github that my ssh key is really
mine, which git gui was nice enough to tell me.

Josef


 The compromise looks to be a 'whitehat' attack telling the rails developers
 to FIX THE G*DD*MNED PROBLEM. More here: Github compromised.

 Chuck



Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Nathaniel Smith
On Wed, Mar 7, 2012 at 7:37 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Wed, Mar 7, 2012 at 12:26 PM, Nathaniel Smith n...@pobox.com wrote:
 When it comes to missing data, bitpatterns can do everything that
 masks can do, are no more complicated to implement, and have better
 performance characteristics.


 Maybe for float, for other things, no. And we have lots of other things.

It would be easier to discuss this if you'd, like, discuss :-(. If you
know of some advantage that masks have over bitpatterns when it comes
to missing data, can you please share it, instead of just asserting
it?

Not that I'm immune... I perhaps should have been more explicit
myself, when I said performance characteristics, let me clarify that
I was thinking of both speed (for floats) and memory (for
most-but-not-all things).
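[Editorial sketch of what a float64 bitpattern NA looks like at the bit level — here a quiet NaN carrying payload 1954, echoing R's payload choice for NA_real_; the helper names are made up, and R's actual bit pattern differs slightly:]

```python
import struct

# quiet-NaN exponent/quiet bit (0x7FF8...) plus distinguished payload 1954
NA_BITS = 0x7FF80000000007A2

def make_na():
    # reinterpret the 64-bit pattern as a double, no arithmetic involved
    return struct.unpack("<d", struct.pack("<Q", NA_BITS))[0]

def is_na(x):
    return struct.unpack("<Q", struct.pack("<d", x))[0] == NA_BITS

na = make_na()
assert na != na                  # still a NaN as far as the FPU is concerned
assert is_na(na)
assert not is_na(float("nan"))   # ordinary NaNs keep their own payload
```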

 The
 performance is a strawman,

How many users need to speak up to say that this is a serious problem
they have with the current implementation before you stop calling it a
strawman? Because when Wes says that it's not going to fly for his
stats/econometics cases, and the neuroimaging folk like Gary and Matt
say it's not going to fly for their use cases... surely just waving
that away is a bit dismissive?

I'm not saying that we *have* to implement bitpatterns because
performance is *the most important feature* -- I'm just saying, well,
what I said. For *missing data use* cases, bitpatterns have better
performance characteristics than masks. If we decide that these use
cases are important, then we should take this into account and weigh
it against other considerations. Maybe what you think is that these
use cases shouldn't be the focus of this feature and it should focus
on the ignored use cases instead? That would be a legitimate
argument... but if that's what you want to say, say it, don't just
dismiss your users!

 and it *isn't* easier to implement.

If I thought bitpatterns would be easier to implement, I would have
said so... What I said was that they're not harder. You have some
extra complexity, mostly in casting, and some reduced complexity -- no
need to allocate and manipulate the mask. (E.g., simple same-type
assignments and slicing require special casing for masks, but not for
bitpatterns.) In many places the complexity is identical -- printing
routines need to check for either special bitpatterns or masked
values, whatever. Ufunc loops need to either find the appropriate part
of the mask, or create a temporary mask buffer by calling a dtype
func, whatever. On net they seem about equivalent, complexity-wise.

...I assume you disagree with this analysis, since I've said it
before, wrote up a sketch for how the implementation would work at the
C level, etc., and you continue to claim that simplicity is a
compelling advantage for the masked approach. But I still don't know
why you think that :-(.

  Also, different folks adopt different values
  for 'missing' data, and distributing one or several masks along with the
  data is another common practice.

 True, but not really relevant to the current debate, because you have
 to handle such issues as part of your general data import workflow
 anyway, and none of these is any more complicated no matter which
 implementations are available.

  One inconvenience I have run into with the current API is that it should
  be
  easier to clear the mask from an ignored value without taking a new
  view
  or assigning known data. So maybe two types of masks (different
  payloads),
  or an additional flag could be helpful. The process of assigning masks
  could
  also be made a bit easier than using fancy indexing.

 So this, uh... this was actually the whole goal of the alterNEP
 design for masks -- making all this stuff easy for people (like you,
 apparently?) that want support for ignored values, separately from
 missing data, and want a nice clean API for it. Basically having a
 separate .mask attribute which was an ordinary, assignable array
 broadcastable to the attached array's shape. Nobody seemed interested
 in talking about it much then but maybe there's interest now?


 Come off it, Nathaniel, the problem is minor and fixable. The intent of the
 initial implementation was to discover such things.

Implementation can be wonderful, I absolutely agree. But you
understand that I'd be more impressed by this example if your
discovery weren't something I had been arguing for since before the
implementation began :-).

 These things are less
 accessible with the current API *precisely* because of the feedback from R
 users. It didn't start that way.

 We now have something to evolve into what we want. That is a heck of a lot
 more useful than endless discussion.

No, you are still missing the point completely! There is no what *we*
want, because what you want is different than what I want. The
masking stuff in the alterNEP was an attempt to give people like you
who wanted ignored support what they wanted, and the bitpattern
stuff was to 

Re: [Numpy-discussion] Missing data again

2012-03-07 Thread Nathaniel Smith
On Wed, Mar 7, 2012 at 7:39 PM, Benjamin Root ben.r...@ou.edu wrote:
 On Wed, Mar 7, 2012 at 1:26 PM, Nathaniel Smith n...@pobox.com wrote:
 When it comes to missing data, bitpatterns can do everything that
 masks can do, are no more complicated to implement, and have better
 performance characteristics.


 Not true.  bitpatterns inherently destroys the data, while masks do not.

Yes, that's why I only wrote that this is true for missing data, not
in general :-). If you have data that is being destroyed, then that's
not missing data, by definition. We don't have consensus yet on
whether that's the use case we are aiming for, but it's the one that
Pierre was worrying about.

 For matplotlib, we can not use bitpatterns because it could over-write user
 data (or we have to copy the data).  I would imagine other extension writers
 would have similar issues when they need to play around with input data in a
 safe manner.

Right. You clearly need some sort of masking, either an explicit mask
array that you keep somewhere, or one that gets attached to the
underlying ndarray in some non-destructive way.

 Also, I doubt that the performance characteristics for strings and integers
 are the same as it is for masks.

Not sure what you mean by this, but I'd be happy to hear more.

-- Nathaniel


Re: [Numpy-discussion] use for missing (ignored) data?

2012-03-07 Thread Nathaniel Smith
On Wed, Mar 7, 2012 at 8:05 PM, Neal Becker ndbeck...@gmail.com wrote:
 I'm wondering what is the use for the ignored data feature?

 I can use:

 A[valid_A_indexes] = whatever

 to process only the 'non-ignored' portions of A.  So at least some simple 
 cases
 of ignored data are already supported without introducing a new type.

 OTOH:

 w = A[valid_A_indexes]

 will copy A's data, and subsequent use of

 w[:] = something

 will not update A.

 Is this the reason for wanting the ignored data feature?

Hi Neal,

There are a few reasons that I know of why people want more support
from numpy for ignored data/masks, specifically (as opposed to missing
data or other related concepts):

1) If you're often working on some subset of your data, then it's
convenient to set the mask once and have it stay in effect for further
operations. Anything you can accomplish this way can also be
accomplished by keeping an explicit mask array and using it for
indexing by hand, but in some situations it may be more convenient
not to.

2) Operating on subsets of an array without making a copy. Like
Benjamin pointed out, indexing with a mask makes a copy. This is slow,
and what's worse, people who work with large data sets (e.g., big fMRI
volumes) may not have enough memory to afford such a copy. This
problem can be solved by using the new where= argument to ufuncs
(which skips the copy). (But then see (1) -- passing where= to a bunch
of functions takes more typing than just setting it once and leaving
it.)
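[What the where= form looks like in practice — a sketch against a NumPy with ufunc where= support:]

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
keep = np.array([True, False, True, False])

# in-place add applied only where the mask is True; no copy of `a` is made
np.add(a, 10.0, out=a, where=keep)
assert a.tolist() == [11.0, 2.0, 13.0, 4.0]
```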

3) Suppose there's a 3rd-party function that takes an array --
borrowing Charles example, say it's draw_points(arr). Now you want to
apply it to just a subset of your data, and want to avoid a copy. It
would be nice if the original author had made it draw_points(arr,
mask), but they didn't. Well, if you have masking built in to your
array type, then maybe you can call this as draw_points(masked_arr)
and it will Just Work. I.e., maybe people who aren't thinking about
masking will sometimes write code that accidentally works with masking
anyway. I'm not sure how much I'd trust this, but I guess it's nice
when it happens. And if it does work, then implementing the show/hide
point functionality will be easier. (And if it doesn't work, and
masking is built into numpy.ndarray, then maybe you can use this to
argue with the original author that this is a bug, not just a missing
feature. Again, I'm not sure if this is a good thing on net: one could
argue that people shouldn't be forced to think about masking every
time they write any function, just in case it becomes relevant later.
But certainly it'd be useful sometimes.)

There may be other motivations that I'm not aware of, of course.

-- Nathaniel


[Numpy-discussion] Casting rules changed in trunk?

2012-03-07 Thread Matthew Brett
Hi,

I noticed a casting change running the test suite on our image reader,
nibabel: 
https://github.com/nipy/nibabel/blob/master/nibabel/tests/test_casting.py

For this script:

<pre>
import numpy as np

Adata = np.zeros((2,), dtype=np.uint8)
Bdata = np.zeros((2,), dtype=np.int16)
Bzero = np.int16(0)
Bbig = np.int16(256)

print np.__version__
print 'Array add', (Adata + Bdata).dtype
print 'Scalar 0 add', (Adata + Bzero).dtype
print 'Scalar 256 add', (Adata + Bbig).dtype
</pre>

1.4.1
Array add int16
Scalar 0 add uint8
Scalar 256 add uint8

1.5.1
Array add int16
Scalar 0 add uint8
Scalar 256 add uint8

1.6.1
Array add int16
Scalar 0 add uint8
Scalar 256 add int16

1.7.0.dev-aae5b0a
Array add int16
Scalar 0 add uint8
Scalar 256 add uint16

I can understand the uint8 outputs from numpy < 1.6 - the rule being
not to upcast for scalars.

I can understand the int16 output from 1.6.1 on the basis that the
value is outside uint8 range and therefore we might prefer a type that
can handle values from both uint8 and int16.

Was the current change intended?  It has the following odd effect:

In [5]: Adata + np.int16(257)
Out[5]: array([257, 257], dtype=uint16)

In [7]: Adata + np.int16(-257)
Out[7]: array([-257, -257], dtype=int16)

In [8]: Adata - np.int16(257)
Out[8]: array([65279, 65279], dtype=uint16)

but I guess you can argue that there are odd effects for other choices too,
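[For the array-with-array case, at least, promotion is value-independent and can be checked directly — a sketch; `promote_types` is available from NumPy 1.6 on:]

```python
import numpy as np

# array + array ignores values: uint8 combined with int16 gives int16,
# matching the "Array add" line in every version above
assert np.promote_types(np.uint8, np.int16) == np.dtype(np.int16)
# the scalar cases are exactly the value-dependent part that shifted
# between 1.5, 1.6, and the 1.7 dev line
```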

Best,

Matthew


[Numpy-discussion] subclassing array in c

2012-03-07 Thread Christoph Gohle
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi, 

I have been struggling for quite some time now. Desperate as I am, now I need 
help. 

I was trying to subclass ndarrays in a c extension (see code below) and do 
constantly get segfaults. I have been checking my INCREF and DECREF stuff up 
and down but can't find the error. Probably I got something completely wrong... 
anybody able to help?

Thanks,
Christoph
- -
#include <Python.h>
#include <structmember.h>

#include <numpy/arrayobject.h>

static PyObject *SpamError;

typedef struct {
    PyArrayObject base;
    PyDictObject *unit;
} UnitArrayObject;

PyTypeObject unitArrayObjectType;

static int
checkunit(PyObject *unit1, PyObject *unit2) {
    return PyObject_Compare(unit1, unit2);
}

static PyObject *
unitArray_new(PyTypeObject *cls, PyObject *args, PyObject *kwargs) {
    PyObject *data = NULL,
             *unit = NULL;
    PyArray_Descr *dtype = NULL;
    PyObject *res = NULL, *tmp = NULL;

    if (!PyArg_ParseTuple(args, "OO|O&", &data, &unit,
                          PyArray_DescrConverter, &dtype)) {
        Py_XDECREF(dtype);
        return NULL;
    }

    res = PyArray_FromAny(data, dtype, 0, 0, NPY_ENSURECOPY, NULL);
    if (res == NULL) {
        Py_XDECREF(dtype);
        //TODO: raise exception?
        return NULL;
    }

    if (PyObject_IsInstance(data, (PyObject *)cls)) {
        if (unit != NULL &&
            !checkunit((PyObject *)((UnitArrayObject *)data)->unit, unit)) {
            Py_XDECREF(res);
            //TODO: raise exception
            return NULL;
        }
    } else {
        if (PyObject_IsTrue(unit)) {
            tmp = res;
            res = PyArray_View((PyArrayObject *)res, NULL,
                               &unitArrayObjectType);
            if (tmp != res) {
                Py_XDECREF(tmp);
            }
            ((UnitArrayObject *)res)->unit = (PyDictObject *)unit;
            Py_INCREF(unit);
            if (unit != NULL) {
            }
        }
    }
    return res;
}

static PyObject *
unitArray__array_finalize__(PyObject *new, PyObject *args) {
    PyObject *attr = NULL, *tmp = NULL;
    PyObject *parent = NULL;

    if (!PyArg_ParseTuple(args, "O", &parent)) {
        return NULL;
    }
    if (parent != NULL) {
        attr = PyObject_GetAttrString(parent, "unit");
        if (attr == NULL) {
            //parent has no 'unit' so we make a new empty one
            attr = PyDict_New();
            PyErr_Clear();
        }
    }
    tmp = (PyObject *)((UnitArrayObject *)new)->unit;
    ((UnitArrayObject *)new)->unit = (PyDictObject *)attr;

    Py_INCREF(Py_None);
    return Py_None;
}

static PyObject *
unitArray__array_wrap__(PyObject *self, PyObject *args) {
    PyObject *array = NULL, *context = NULL;

    if (!PyArg_ParseTuple(args, "OO", &array, &context)) {
        //TODO: raise exception
        return NULL;
    }

    printf("%s", PyString_AsString(PyObject_Str(context)));

    Py_INCREF(array);
    return array;
}


static PyMethodDef unitArrayMethods[] = {
    {"__array_finalize__", unitArray__array_finalize__, METH_VARARGS,
     "array finalize method"},
    {"__array_wrap__", unitArray__array_wrap__, METH_VARARGS,
     "array wrap method"},
    {NULL, NULL, 0, NULL}
};

static PyMemberDef unitArrayMembers[] = {
    {"unit", T_OBJECT, offsetof(UnitArrayObject, unit), 0,
     "dictionary containing unit info."},
    {NULL, 0, 0, 0, NULL}
};

PyTypeObject unitArrayObjectType = {
    PyObject_HEAD_INIT(NULL)
    0,                          /* ob_size        */
    "spam.UnitArray",           /* tp_name        */
    sizeof(UnitArrayObject),    /* tp_basicsize   */
    0,                          /* tp_itemsize    */
    0,                          /* tp_dealloc     */
    0,                          /* tp_print       */
    0,                          /* tp_getattr     */
    0,                          /* tp_setattr     */
    0,                          /* tp_compare     */
    0,                          /* tp_repr        */
    0,                          /* tp_as_number   */
    0,                          /* tp_as_sequence */
    0,                          /* tp_as_mapping  */
    0,                          /* tp_hash        */
    0,                          /* tp_call        */
    0,                          /* tp_str         */
    0,                          /* tp_getattro    */
    0,                          /* tp_setattro    */
    0,                          /* tp_as_buffer   */
    Py_TPFLAGS_DEFAULT,         /* tp_flags       */
    "A numpy array with units", /* tp_doc         */
    0,                          /* tp_traverse    */
    0,                          /* tp_clear       */
    0,                          /* tp_richcompare */
    0,                          /* tp_weaklistoffset */
    0,                          /* tp_iter        */
    0,                          /* tp_iternext    */
    unitArrayMethods,           /* tp_methods     */

Re: [Numpy-discussion] subclassing array in c

2012-03-07 Thread Val Kalatsky
Seeing the backtrace would be helpful.
Can you do whatever leads to the segfault
from python run from gdb?
Val

On Wed, Mar 7, 2012 at 7:04 PM, Christoph Gohle
christoph.go...@mpq.mpg.dewrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Hi,

 I have been struggeling for quite some time now. Desperate as I am, now I
 need help.

 I was trying to subclass ndarrays in a c extension (see code below) and do
 constantly get segfaults. I have been checking my INCREF and DECREF stuff
 up and down but can't find the error. Probably I got something completely
 wrong... anybody able to help?

 Thanks,
 Christoph
 [quoted C source trimmed; see the original message above]

Re: [Numpy-discussion] subclassing array in c

2012-03-07 Thread Val Kalatsky
Tried it on my Ubuntu 10.10 box, no problem:

1) Saved as spampub.c
2) Compiled with (setup.py attached): python setup.py build_ext -i
3) Tested from ipython:
In [1]: import spampub
In [2]: ua=spampub.UnitArray([0,1,2,3.0],'liter')
In [3]: ua
Out[3]: UnitArray([ 0.,  1.,  2.,  3.])
In [4]: ua.unit
Out[4]: 'liter'


On Wed, Mar 7, 2012 at 7:15 PM, Val Kalatsky kalat...@gmail.com wrote:


 Seeing the backtrace would be helpful.
 Can you do whatever leads to the segfault
 from python run from gdb?
 Val


 On Wed, Mar 7, 2012 at 7:04 PM, Christoph Gohle 
 christoph.go...@mpq.mpg.de wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Hi,

 I have been struggling for quite some time now. Desperate as I am, I now
 need help.

 I was trying to subclass ndarrays in a C extension (see code below) and
 constantly get segfaults. I have been checking my INCREF and DECREF
 stuff up and down but can't find the error. Probably I got something
 completely wrong... anybody able to help?

 Thanks,
 Christoph
 - -
 #include <Python.h>
 #include <structmember.h>

 #include <numpy/arrayobject.h>

 static PyObject *SpamError;

 typedef struct {
  PyArrayObject base;
  PyDictObject* unit;
 } UnitArrayObject;

 PyTypeObject unitArrayObjectType;

 static int
 checkunit(PyObject *unit1, PyObject *unit2) {
  return PyObject_Compare(unit1, unit2);
 }

 static PyObject *
 unitArray_new(PyTypeObject *cls, PyObject *args, PyObject *kwargs) {
PyObject *data = NULL,
 *unit = NULL;
  PyArray_Descr* dtype = NULL;
  PyObject *res = NULL, *tmp = NULL;

if (!PyArg_ParseTuple(args, "OO|O&", &data, &unit,
 PyArray_DescrConverter, &dtype)) {
Py_XDECREF(dtype);
return NULL;
}

  res = PyArray_FromAny(data, dtype, 0, 0, NPY_ENSURECOPY, NULL);
if (res == NULL) {
Py_XDECREF(dtype);
//TODO: raise exception?
return NULL;
}

if (PyObject_IsInstance(data, (PyObject*)cls)) {
if (unit != NULL &&
 !checkunit((PyObject*)((UnitArrayObject*)data)->unit, unit)) {
Py_XDECREF(res);
//TODO: raise exception
return NULL;
}
} else {
if (PyObject_IsTrue(unit)) {
  tmp = res;
res = PyArray_View((PyArrayObject*)res, NULL,
 &unitArrayObjectType);
  if (tmp!=res) {
Py_XDECREF(tmp);
  }
  ((UnitArrayObject*)res)->unit = (PyDictObject*)unit;
  Py_INCREF(unit);
  if (unit!=NULL) {
  }
}
}
return res;
 }

 static PyObject*
 unitArray__array_finalize__(PyObject* new, PyObject* args) {
PyObject *attr = NULL, *tmp = NULL;
  PyObject *parent = NULL;

  if (!PyArg_ParseTuple(args, "O", &parent)) {
return NULL;
  }
   if (parent != NULL) {
 attr = PyObject_GetAttrString(parent, "unit");
 if (attr == NULL) {
//parent has no 'unit' so we make a new empty one
   attr = PyDict_New();
   PyErr_Clear();
 }
   }
  tmp = (PyObject*)((UnitArrayObject*)new)->unit;
((UnitArrayObject*)new)->unit = (PyDictObject*)attr;

  Py_INCREF(Py_None);
  return Py_None;
 }

 static PyObject*
 unitArray__array_wrap__(PyObject *self, PyObject *args) {
PyObject *array = NULL, *context = NULL;

if (!PyArg_ParseTuple(args, "OO", &array, &context)) {
//TODO: raise exception
return NULL;
}

printf("%s", PyString_AsString(PyObject_Str(context)));

  Py_INCREF(array);
  return array;
 }


 static PyMethodDef unitArrayMethods[] = {
  {"__array_finalize__", unitArray__array_finalize__, METH_VARARGS,
 "array finalize method"},
  {"__array_wrap__", unitArray__array_wrap__, METH_VARARGS,
 "array wrap method"},
  {NULL, NULL, 0, NULL}
 };

 static PyMemberDef unitArrayMembers[] = {
  {"unit", T_OBJECT, offsetof(UnitArrayObject, unit), 0,
 "dictionary containing unit info."},
  {NULL, 0, 0, 0, NULL}
 };

 PyTypeObject unitArrayObjectType = {
PyObject_HEAD_INIT(NULL)
0,  /* ob_size*/
"spam.UnitArray",   /* tp_name*/
sizeof(UnitArrayObject),/* tp_basicsize   */
0,  /* tp_itemsize*/
0,  /* tp_dealloc */
0,  /* tp_print   */
0,  /* tp_getattr */
0,  /* tp_setattr */
0,  /* tp_compare */
0,  /* tp_repr*/
0,  /* tp_as_number   */
0,  /* tp_as_sequence */
0,  /* tp_as_mapping  */
0,  /* tp_hash*/
0,  /* tp_call*/
  

Re: [Numpy-discussion] subclassing array in c

2012-03-07 Thread Christoph Gohlke
FWIW, this crashes on Windows with numpy 1.6.1 but not numpy 1.7-git 
debug build.

Christoph Gohlke


On 3/7/2012 5:36 PM, Val Kalatsky wrote:

 Tried it on my Ubuntu 10.10 box, no problem:

 1) Saved as spampub.c
 2) Compiled with (setup.py attached): python setup.py build_ext -i
 3) Tested from ipython:
 In [1]: import spampub
 In [2]: ua=spampub.UnitArray([0,1,2,3.0],'liter')
 In [3]: ua
 Out[3]: UnitArray([ 0.,  1.,  2.,  3.])
 In [4]: ua.unit
 Out[4]: 'liter'


  On Wed, Mar 7, 2012 at 7:15 PM, Val Kalatsky kalat...@gmail.com wrote:


 Seeing the backtrace would be helpful.
 Can you do whatever leads to the segfault
 from python run from gdb?
 Val


  On Wed, Mar 7, 2012 at 7:04 PM, Christoph Gohle
  christoph.go...@mpq.mpg.de wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Hi,

  I have been struggling for quite some time now. Desperate as I
  am, I now need help.

  I was trying to subclass ndarrays in a C extension (see code
  below) and constantly get segfaults. I have been checking my
  INCREF and DECREF stuff up and down but can't find the error.
  Probably I got something completely wrong... anybody able to help?

 Thanks,
 Christoph
 - -
  #include <Python.h>
  #include <structmember.h>

  #include <numpy/arrayobject.h>

 static PyObject *SpamError;

 typedef struct {
   PyArrayObject base;
   PyDictObject* unit;
 } UnitArrayObject;

 PyTypeObject unitArrayObjectType;

 static int
 checkunit(PyObject *unit1, PyObject *unit2) {
   return PyObject_Compare(unit1, unit2);
 }

 static PyObject *
 unitArray_new(PyTypeObject *cls, PyObject *args, PyObject *kwargs) {
 PyObject *data = NULL,
  *unit = NULL;
   PyArray_Descr* dtype = NULL;
   PyObject *res = NULL, *tmp = NULL;

  if (!PyArg_ParseTuple(args, "OO|O&", &data, &unit,
  PyArray_DescrConverter, &dtype)) {
  Py_XDECREF(dtype);
  return NULL;
  }

   res = PyArray_FromAny(data, dtype, 0, 0, NPY_ENSURECOPY, NULL);
 if (res == NULL) {
 Py_XDECREF(dtype);
 //TODO: raise exception?
 return NULL;
 }

  if (PyObject_IsInstance(data, (PyObject*)cls)) {
  if (unit != NULL &&
  !checkunit((PyObject*)((UnitArrayObject*)data)->unit, unit)) {
  Py_XDECREF(res);
  //TODO: raise exception
  return NULL;
  }
  } else {
  if (PyObject_IsTrue(unit)) {
    tmp = res;
  res = PyArray_View((PyArrayObject*)res,
  NULL, &unitArrayObjectType);
    if (tmp!=res) {
  Py_XDECREF(tmp);
    }
    ((UnitArrayObject*)res)->unit = (PyDictObject*)unit;
    Py_INCREF(unit);
   if (unit!=NULL) {
   }
 }
 }
 return res;
 }

 static PyObject*
 unitArray__array_finalize__(PyObject* new, PyObject* args) {
 PyObject *attr = NULL, *tmp = NULL;
   PyObject *parent = NULL;

    if (!PyArg_ParseTuple(args, "O", &parent)) {
  return NULL;
    }
 if (parent != NULL) {
   attr = PyObject_GetAttrString(parent, "unit");
   if (attr == NULL) {
  //parent has no 'unit' so we make a new empty one
 attr = PyDict_New();
 PyErr_Clear();
   }
 }
    tmp = (PyObject*)((UnitArrayObject*)new)->unit;
  ((UnitArrayObject*)new)->unit = (PyDictObject*)attr;

   Py_INCREF(Py_None);
   return Py_None;
 }

 static PyObject*
 unitArray__array_wrap__(PyObject *self, PyObject *args) {
 PyObject *array = NULL, *context = NULL;

  if (!PyArg_ParseTuple(args, "OO", &array, &context)) {
  //TODO: raise exception
  return NULL;
  }

  printf("%s", PyString_AsString(PyObject_Str(context)));

   Py_INCREF(array);
   return array;
 }


 static PyMethodDef unitArrayMethods[] = {
    {"__array_finalize__", unitArray__array_finalize__,
  METH_VARARGS, "array finalize method"},
    {"__array_wrap__", unitArray__array_wrap__, METH_VARARGS,
  "array wrap method"},
   {NULL, NULL, 0, NULL}
 };

 static PyMemberDef 
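
For comparison, here is the pure-Python equivalent of what the C extension attempts: an ndarray subclass whose `unit` attribute survives views and ufunc results via `__array_finalize__`. This is a sketch following the standard subclassing pattern, not the poster's code:

```python
import numpy as np

class UnitArray(np.ndarray):
    """ndarray subclass carrying a `unit` attribute (Python-level sketch)."""

    def __new__(cls, data, unit=None, dtype=None):
        # View the input data as this subclass and attach the unit.
        obj = np.asarray(data, dtype=dtype).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # Called on explicit construction, views, and ufunc outputs;
        # propagate the unit from the parent array when there is one.
        if obj is None:
            return
        self.unit = getattr(obj, 'unit', None)

ua = UnitArray([0, 1, 2, 3.0], 'liter')
print(ua.unit)          # liter
print((ua * 2).unit)    # liter, propagated via __array_finalize__
```

Getting the reference counting for exactly this propagation right by hand is what the C version above struggles with.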

Re: [Numpy-discussion] use for missing (ignored) data?

2012-03-07 Thread Benjamin Root
On Wednesday, March 7, 2012, Nathaniel Smith n...@pobox.com wrote:
 On Wed, Mar 7, 2012 at 8:05 PM, Neal Becker ndbeck...@gmail.com wrote:
 I'm wondering what is the use for the ignored data feature?

 I can use:

 A[valid_A_indexes] = whatever

 to process only the 'non-ignored' portions of A.  So at least some
simple cases
 of ignored data are already supported without introducing a new type.

 OTOH:

 w = A[valid_A_indexes]

 will copy A's data, and subsequent use of

 w[:] = something

 will not update A.

 Is this the reason for wanting the ignored data feature?

 Hi Neal,

 There are a few reasons that I know of why people want more support
 from numpy for ignored data/masks, specifically (as opposed to missing
 data or other related concepts):

 1) If you're often working on some subset of your data, then it's
 convenient to set the mask once and have it stay in effect for further
 operations. Anything you can accomplish this way can also be
 accomplished by keeping an explicit mask array and using it for
 indexing by hand, but in some situations it may be more convenient
 not to.

 2) Operating on subsets of an array without making a copy. Like
 Benjamin pointed out, indexing with a mask makes a copy. This is slow,
 and what's worse, people who work with large data sets (e.g., big fMRI
 volumes) may not have enough memory to afford such a copy. This
 problem can be solved by using the new where= argument to ufuncs
 (which skips the copy). (But then see (1) -- passing where= to a bunch
 of functions takes more typing than just setting it once and leaving
 it.)

 3) Suppose there's a 3rd-party function that takes an array --
 borrowing Charles example, say it's draw_points(arr). Now you want to
 apply it to just a subset of your data, and want to avoid a copy. It
 would be nice if the original author had made it draw_points(arr,
 mask), but they didn't. Well, if you have masking built in to your
 array type, then maybe you can call this as draw_points(masked_arr)
 and it will Just Work. I.e., maybe people who aren't thinking about
 masking will sometimes write code that accidentally works with masking
 anyway. I'm not sure how much I'd trust this, but I guess it's nice
 when it happens. And if it does work, then implementing the show/hide
 point functionality will be easier. (And if it doesn't work, and
 masking is built into numpy.ndarray, then maybe you can use this to
 argue with the original author that this is a bug, not just a missing
 feature. Again, I'm not sure if this is a good thing on net: one could
 argue that people shouldn't be forced to think about masking every
 time they write any function, just in case it becomes relevant later.
 But certainly it'd be useful sometimes.)

 There may be other motivations that I'm not aware of, of course.

 -- Nathaniel


I think you got most of the motivations right. I would say on the last
point that extension authors should be able to say "does not support NA!".
 The important thing is that it makes things more up-front.

An additional motivation concerns mathematical operations.
 Personally, I hate getting bitten by a function that takes the max() of an
array that has a NaN in it.  In addition, what about adding two arrays
together that may or may not have different masks?  This has been the major
advantage of np.ma.  And it all mostly works.

Cheers!
Ben Root
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
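
The copy-vs-view and where= points discussed in this thread can be made concrete with a small sketch (array values are illustrative; the where= ufunc argument requires numpy >= 1.7):

```python
import numpy as np

A = np.arange(6, dtype=float)
mask = A > 2

# 1) Fancy indexing makes a copy: writing to w does NOT update A.
w = A[mask]
w[:] = 99.0
assert A[3] == 3.0  # A is unchanged

# 2) Assigning through the mask directly DOES update A, in place.
A[mask] = 0.0

# 3) where= lets a ufunc operate on a subset without an intermediate
#    copy; slots where mask is False keep the value already in `out`.
out = np.full_like(A, -1.0)
np.add(A, 10.0, out=out, where=mask)

# 4) np.ma carries the mask with the array, so reductions skip
#    masked (here: invalid) entries automatically.
m = np.ma.masked_invalid([1.0, np.nan, 4.0])
assert m.max() == 4.0
```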


Re: [Numpy-discussion] subclassing array in c

2012-03-07 Thread Christoph Gohle
Dear Val,

I agree that more detail is needed. Sorry for that it was late yesterday.

 I am running Python 2.6.1 with a numpy development branch 
(numpy-2.0.0.dev_20101104-py2.6-macosx-10.6-universal.egg). Maybe I should 
switch to a release?

I compile with your setup.py using 'python setup.py build_ext -i' and run the 
commands below in ipython. As you can see, I don't get a crash for a single 
call to the constructor, consistent with your observation. And it looks like 
one has to use a dict as the unit to make it crash.

Can you make sense out of that?


In [1]: import spampub

In [4]: spampub.UnitArray(1,{'s':1})
Out[4]: UnitArray(1)

In [6]: a=[spampub.UnitArray(i,{'s':i}) for i in xrange(1000)]
Segmentation fault

backtrace is the following:

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0021
Crashed Thread:  0  Dispatch queue: com.apple.main-thread

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0   org.python.python   0x0001000b0b8e PyObject_GC_Track + 
1473
1   org.python.python   0x0001000b1255 _PyObject_GC_Malloc 
+ 191
2   org.python.python   0x0001000b12d0 _PyObject_GC_New + 21
3   org.python.python   0x00010003a856 PyDict_New + 116
4   org.python.python   0x00010003a99c _PyDict_NewPresized 
+ 24
5   org.python.python   0x0001000880cb PyEval_EvalFrameEx + 
11033
6   org.python.python   0x00010008acce PyEval_EvalCodeEx + 
1803
7   org.python.python   0x00010008735d PyEval_EvalFrameEx + 
7595
8   org.python.python   0x00010008acce PyEval_EvalCodeEx + 
1803
9   org.python.python   0x00010008935e PyEval_EvalFrameEx + 
15788
10  org.python.python   0x00010008acce PyEval_EvalCodeEx + 
1803
11  org.python.python   0x00010008935e PyEval_EvalFrameEx + 
15788
12  org.python.python   0x0001000892e1 PyEval_EvalFrameEx + 
15663
13  org.python.python   0x00010008acce PyEval_EvalCodeEx + 
1803
14  org.python.python   0x00010008935e PyEval_EvalFrameEx + 
15788
15  org.python.python   0x00010008acce PyEval_EvalCodeEx + 
1803
16  org.python.python   0x00010008935e PyEval_EvalFrameEx + 
15788
17  org.python.python   0x00010008acce PyEval_EvalCodeEx + 
1803
18  org.python.python   0x00010008935e PyEval_EvalFrameEx + 
15788
19  org.python.python   0x00010008acce PyEval_EvalCodeEx + 
1803
20  org.python.python   0x00010008935e PyEval_EvalFrameEx + 
15788
21  org.python.python   0x00010008acce PyEval_EvalCodeEx + 
1803
22  org.python.python   0x00010008ad61 PyEval_EvalCode + 54
23  org.python.python   0x0001000a265a Py_CompileString + 78
24  org.python.python   0x0001000a2723 PyRun_FileExFlags + 
150
25  org.python.python   0x0001000a423d 
PyRun_SimpleFileExFlags + 704
26  org.python.python   0x0001000b0286 Py_Main + 2718
27  org.python.python.app   0x00010e6c start + 52

Am 08.03.2012 um 02:36 schrieb Val Kalatsky:

 
 Tried it on my Ubuntu 10.10 box, no problem:
 
 1) Saved as spampub.c
 2) Compiled with (setup.py attached): python setup.py build_ext -i
 3) Tested from ipython:
 In [1]: import spampub
 In [2]: ua=spampub.UnitArray([0,1,2,3.0],'liter')
 In [3]: ua
 Out[3]: UnitArray([ 0.,  1.,  2.,  3.])
 In [4]: ua.unit
 Out[4]: 'liter'
 
 
 On Wed, Mar 7, 2012 at 7:15 PM, Val Kalatsky kalat...@gmail.com wrote:
 
 Seeing the backtrace would be helpful. 
 Can you do whatever leads to the segfault 
 from python run from gdb?
 Val
 
 
 On Wed, Mar 7, 2012 at 7:04 PM, Christoph Gohle christoph.go...@mpq.mpg.de 
 wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Hi,
 
  I have been struggling for quite some time now. Desperate as I am, I now 
  need help.
  
  I was trying to subclass ndarrays in a C extension (see code below) and 
  constantly get segfaults. I have been checking my INCREF and DECREF stuff up 
  and down but can't find the error. Probably I got something completely 
  wrong... anybody able to help?
 
 Thanks,
 Christoph
 - -
  #include <Python.h>
  #include <structmember.h>
  
  #include <numpy/arrayobject.h>
 
 static PyObject *SpamError;
 
 typedef struct {
  PyArrayObject base;
  PyDictObject* unit;
 } UnitArrayObject;
 
 PyTypeObject unitArrayObjectType;
 
 static int
 checkunit(PyObject *unit1, PyObject *unit2) {
  return PyObject_Compare(unit1, unit2);
 }
 
 static PyObject *
 unitArray_new(PyTypeObject *cls, PyObject *args, PyObject *kwargs) {
PyObject *data = NULL,
 *unit = NULL;
  PyArray_Descr* dtype = NULL;
  PyObject *res = NULL, *tmp = NULL;