Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers
On Tue, Mar 6, 2012 at 4:45 PM, Chris Barker chris.bar...@noaa.gov wrote:
> On Thu, Mar 1, 2012 at 10:58 PM, Jay Bourque jayv...@gmail.com wrote:
>> 1. Loading text files using loadtxt/genfromtxt needs a significant performance boost (I think at least an order of magnitude increase in performance is very doable based on what I've seen with Erin's recfile code)
>> 2. Improved memory usage. Memory used for reading in a text file shouldn't be more than the file itself, and less if only reading a subset of the file.
>> 3. Keep existing interfaces for reading text files (loadtxt, genfromtxt, etc.). No new ones.
>> 4. Underlying code should keep IO iteration and transformation of data separate (awaiting more thoughts from Travis on this).
>> 5. Be able to plug in different transformations of data at a low level (also awaiting more thoughts from Travis).
>> 6. Memory mapping of text files?
>> 7. Eventually reduce memory usage even more by using the same object for duplicate values in the array (depends on implementing an enum dtype?)
>>
>> Anything else?
>
> Yes -- I'd like to see the solution be able to do high-performance reads of a portion of a file -- not always the whole thing. I seem to have a number of custom text files that I need to read that are laid out in chunks: a bit of a header, then a block of numbers, another header, another block. I'm happy to read and parse the header sections with pure Python, but would love a way to read the blocks of numbers into a numpy array fast. This will probably come out of the box with any of the proposed solutions, as long as they start at the current position of a passed-in file object, can be told how much to read, and then leave the file pointer in the correct position.

If you are set up with Cython to build extension modules, and you don't mind testing an unreleased and experimental reader, you can try the text reader that I'm working on: https://github.com/WarrenWeckesser/textreader

You can read a file like this, where the first line gives the number of rows of the following array, and that pattern repeats:

5
1.0, 2.0, 3.0
4.0, 5.0, 6.0
7.0, 8.0, 9.0
10.0, 11.0, 12.0
13.0, 14.0, 15.0
3
1.0, 1.5, 2.0, 2.5
3.0, 3.5, 4.0, 4.5
5.0, 5.5, 6.0, 6.5
1
1.0D2, 1.25D-1, 6.25D-2, 99

with code like this:

import numpy as np
from textreader import readrows

filename = 'data/multi.dat'
f = open(filename, 'r')
line = f.readline()
while len(line) > 0:
    nrows = int(line)
    a = readrows(f, np.float32, numrows=nrows, sci='D', delimiter=',')
    print 'a:'
    print a
    print
    line = f.readline()

Warren
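For readers without Cython set up, the same header/block loop can be written with plain numpy.loadtxt over a sliced file iterator. This is only a rough sketch and much slower than a compiled reader, and loadtxt has no equivalent of sci='D', so it assumes the blocks use ordinary 'E'-style floats:

from itertools import islice
import numpy as np

# Read the repeating "count line, then that many CSV rows" layout.
with open('data/multi.dat', 'r') as f:
    blocks = []
    for header in f:
        nrows = int(header)
        # islice feeds loadtxt exactly nrows lines and leaves the file
        # iterator positioned at the next header line.
        blocks.append(np.loadtxt(islice(f, nrows), delimiter=','))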
Re: [Numpy-discussion] Missing data again
Hi,

Thank you very much for your insights!

On 06/03/2012 21:59, Nathaniel Smith wrote:
> Right -- R has a very impoverished type system as compared to numpy. There's basically four types: "numeric" (meaning double precision float), integer, logical (boolean), and character (string). And in practice the integer type is essentially unused, because R parses numbers like 1 as being floating point, not integer; the only way to get an integer value is to explicitly cast to it. Each of these types has a specific bit-pattern set aside for representing NA. And... that's it. It's very simple when it works, but also very limited.

I also suspected R to be less powerful in terms of types. However, I think the fact that "it's very simple when it works" is important to take into account. At the end of the day, when using all the fanciness, it is not only about "can I have some NAs in my array?" but also "how *easily* can I have some NAs in my array?". It's about balancing the "how easy" and the "how powerful". The ease of use is the reason for my concern about having separate types nafloatNN and floatNN. Of course, I won't argue that not breaking everything is even more important!!

Coming back to Travis's proposition "bit-pattern approaches to missing data (*at least* for float64 and int32) need to be implemented", I wonder what is the amount of extra work to go from nafloat64 to nafloat32/16? Is there hardware support for NaN payloads with these smaller floats? If not, or if it is too complicated, I feel it is acceptable to say "it's too complicated" and fall back to masks. One may have to choose between fancy types and fancy NAs...

Best,
Pierre
Re: [Numpy-discussion] addition, multiplication of a polynomial and np.{float, int}
Hi,

On 06/03/2012 22:19, Charles R Harris wrote:
> Use polynomial.Polynomial and you won't have this problem.

I was not familiar with the poly1d vs. Polynomial choice. Now, I found some more or less explicit guidelines in the doc, at:
http://docs.scipy.org/doc/numpy/reference/routines.polynomials.html
"The polynomial package is newer and more complete than poly1d and the convenience classes are better behaved in the numpy environment."
However, poly1d, which is nicely documented, doesn't mention the competitor. Going further in this transition, do you feel it would make sense to add a "See Also" section to the poly1d documentation?

Best,
Pierre
Re: [Numpy-discussion] Fixing PyArray_Descr flags member size, ABI vs pickling issue
On Tue, Mar 6, 2012 at 1:44 PM, Robert Kern robert.k...@gmail.com wrote:
> On Tue, Mar 6, 2012 at 18:25, Travis Oliphant tra...@continuum.io wrote:
>> Why do we want to return a single string char instead of an int?
>
> I suspect just to ensure that any provided value fits in the range 0..255. But that's easily done explicitly.

That was not even the issue in the end; my initial analysis was wrong. In any case, I now have a new PR that fixes both the dtype.flags value and the dtype hashing reported in #2017: https://github.com/numpy/numpy/pull/231

regards,
David
Re: [Numpy-discussion] addition, multiplication of a polynomial and np.{float, int}
On Wed, Mar 7, 2012 at 9:45 AM, Pierre Haessig pierre.haes...@crans.org wrote:
> I was not familiar with the poly1d vs. Polynomial choice. Now, I found some more or less explicit guidelines in the doc, at:
> http://docs.scipy.org/doc/numpy/reference/routines.polynomials.html
> "The polynomial package is newer and more complete than poly1d and the convenience classes are better behaved in the numpy environment."
> However, poly1d, which is nicely documented, doesn't mention the competitor. Going further in this transition, do you feel it would make sense to add a "See Also" section to the poly1d documentation?

That's a good idea, I'll take care of it. Note the caveat about the coefficients going in the opposite direction.

Chuck
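To make the caveat concrete, here is a quick sketch of the reversed coefficient conventions of the two classes:

import numpy as np
from numpy.polynomial import Polynomial

p_old = np.poly1d([1, 2, 3])     # highest degree first: x**2 + 2*x + 3
p_new = Polynomial([3, 2, 1])    # lowest degree first:  3 + 2*x + x**2

print(p_old(2.0))  # 11.0
print(p_new(2.0))  # 11.0 -- same polynomial, coefficients reversed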
Re: [Numpy-discussion] Missing data again
On Wed, Mar 7, 2012 at 4:35 PM, Pierre Haessig pierre.haes...@crans.org wrote:
> [...]
> I also suspected R to be less powerful in terms of types. However, I think the fact that "it's very simple when it works" is important to take into account. At the end of the day, when using all the fanciness, it is not only about "can I have some NAs in my array?" but also "how *easily* can I have some NAs in my array?". It's about balancing the "how easy" and the "how powerful". The ease of use is the reason for my concern about having separate types nafloatNN and floatNN. Of course, I won't argue that not breaking everything is even more important!!

It's a good point, I just don't see how we can really tell what the trade-offs are at this point. You should bring this up again once more of the big picture stuff is hammered out.

> Coming back to Travis's proposition "bit-pattern approaches to missing data (*at least* for float64 and int32) need to be implemented", I wonder what is the amount of extra work to go from nafloat64 to nafloat32/16? Is there hardware support for NaN payloads with these smaller floats? If not, or if it is too complicated, I feel it is acceptable to say "it's too complicated" and fall back to masks. One may have to choose between fancy types and fancy NAs...

All modern floating point formats can represent NaNs with payloads, so in principle there's no difficulty in supporting NA the same way for all of them. If you're using float16 because you want to offload computation to a GPU, then I would test carefully before trusting the GPU to handle NaNs correctly, and there may need to be a bit of care to make sure that casts between these types properly map NAs to NAs, but generally it should be fine.

-- Nathaniel
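As a concrete sketch of the bit-pattern idea for float64 (the payload value below is made up for illustration, not whatever numpy would actually reserve):

import numpy as np

# A quiet NaN with a nonzero payload: exponent bits all ones, mantissa
# nonzero. The payload bits distinguish "NA" from any other NaN.
NA_BITS = np.uint64(0x7FF80000000007A2)  # hypothetical NA pattern

a = np.zeros(4, dtype=np.float64)
a.view(np.uint64)[1] = NA_BITS          # mark element 1 as NA in place

print(np.isnan(a))                      # [False  True False False]
print(a.view(np.uint64) == NA_BITS)     # exact payload match finds the NA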
Re: [Numpy-discussion] Missing data again
On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig pierre.haes...@crans.org wrote:
> [...]
> Coming back to Travis's proposition "bit-pattern approaches to missing data (*at least* for float64 and int32) need to be implemented", I wonder what is the amount of extra work to go from nafloat64 to nafloat32/16? Is there hardware support for NaN payloads with these smaller floats? If not, or if it is too complicated, I feel it is acceptable to say "it's too complicated" and fall back to masks. One may have to choose between fancy types and fancy NAs...

I'm in agreement here, and that was a major consideration in making a 'masked' implementation first. Also, different folks adopt different values for 'missing' data, and distributing one or several masks along with the data is another common practice.

One inconvenience I have run into with the current API is that it should be easier to clear the mask from an ignored value without taking a new view or assigning known data. So maybe two types of masks (different payloads), or an additional flag, could be helpful. The process of assigning masks could also be made a bit easier than using fancy indexing.

Chuck
[Numpy-discussion] checking for c compiler during build
Is there a way to use numpy.distutils to programmatically check for a C compiler at build time in a platform-independent way?
Re: [Numpy-discussion] Missing data again
Charles R Harris writes:
[...]
> One inconvenience I have run into with the current API is that it should be easier to clear the mask from an ignored value without taking a new view or assigning known data.

AFAIR, the inability to directly access a mask attribute was intentional, to make bit-patterns and masks indistinguishable from the POV of the array user. What's the workflow that leads you to un-ignore specific elements?

> So maybe two types of masks (different payloads), or an additional flag could be helpful.

Do you mean different NA values? If that's the case, I think it was taken into account when implementing the current mechanisms (and was also mentioned in the NEP), so that it could be supported by both bit-patterns and masks (as one of the main design points was to make them indistinguishable in the common case). I think the name was "parametrized dtypes".

> The process of assigning masks could also be made a bit easier than using fancy indexing.

I don't get what you mean here, sorry. Do you mean that this is too cumbersome to use?

    a[a > 5] = np.NA

(obviously an oversimplified example where everything looks sufficiently simple :))

Lluis

--
And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer. -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth
[Numpy-discussion] [enhancement] sum_angle() and sum_polar()
Hi everyone,

I am proposing to add the two following functions to numpy/lib/twodim_base.py:

sum_angle(): computes the sum of a 2-d array along an angled axis
sum_polar(): computes the sum of a 2-d array along radial lines or along azimuthal circles

https://github.com/numpy/numpy/pull/230

Comments? When I was looking for a solution to these problems of calculating special sums of 2-d arrays, I could not find anything, and it took me a while to figure out a (hopefully) useful and consistent algorithm. I can see how one would extend these to higher dimensions, but that would preclude using bincount() to do the heavy lifting. Looking at some other functions, the doctests might need to be split into real examples and unittests.

Best,
-- Robert Jordens.
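For those who haven't seen the bincount() trick, here is the idea for the simplest "angled axis", the anti-diagonal. This is only an illustration of the underlying mechanism, not the code in the PR:

import numpy as np

a = np.arange(12).reshape(3, 4)
i, j = np.indices(a.shape)

# i + j is constant along each anti-diagonal, so it can serve as a bin
# label; bincount with weights then sums the array along those lines.
sums = np.bincount((i + j).ravel(), weights=a.ravel())
print(sums)  # one entry per anti-diagonal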
Re: [Numpy-discussion] Missing data again
On Wed, Mar 7, 2012 at 11:21 AM, Lluís xscr...@gmx.net wrote:
> AFAIR, the inability to directly access a mask attribute was intentional, to make bit-patterns and masks indistinguishable from the POV of the array user. What's the workflow that leads you to un-ignore specific elements?

Because they are not 'unknown', just (temporarily) 'ignored'. This might be the case if you are experimenting with what happens if certain data is left out of a fit. The current implementation tries to handle both these cases, and can do so; I would just like the 'ignored' use to be more convenient than it is.

> Do you mean different NA values? If that's the case, I think it was taken into account when implementing the current mechanisms (and was also mentioned in the NEP), so that it could be supported by both bit-patterns and masks (as one of the main design points was to make them indistinguishable in the common case).

No, the mask as currently implemented is eight bits and can be extended to handle different mask values, aka payloads.

> I think the name was "parametrized dtypes".

They don't interest me in the least. But that is a whole different area of discussion.

> I don't get what you mean here, sorry.

Suppose I receive a data set, say an hdf file, that also includes a mask. I'd like to load the data and apply the mask directly without doing something like

    data[mask] = np.NA

> Do you mean that this is too cumbersome to use?
>
>     a[a > 5] = np.NA
>
> (obviously an oversimplified example where everything looks sufficiently simple :))

Mostly speed and memory.

Chuck
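For comparison, the existing numpy.ma module already supports attaching an externally supplied mask directly, with no fancy-indexed assignment pass over the data (a sketch with made-up arrays standing in for the HDF file contents):

import numpy as np
import numpy.ma as ma

data = np.arange(6, dtype=float)                 # as loaded from the file
mask = np.array([0, 1, 0, 0, 1, 0], dtype=bool)  # mask shipped alongside

a = ma.masked_array(data, mask=mask)  # mask attached, data not copied
a.mask[1] = False                     # un-ignore: original value intact
print(a)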
[Numpy-discussion] Github key audit.
Hi All,

Many here have probably received the message from github about push/pull access being blocked until you have audited your ssh keys. To generate a key fingerprint on fedora, I did the following:

charris@f16 ~$ ssh-keygen -l -f .ssh/id_dsa.pub

I don't know how this looks for those of you using windows. The compromise looks to be a 'whitehat' attack telling the rails developers to FIX THE G*DD*MNED PROBLEM. More here: "Github compromised", http://lwn.net/Articles/485162/

Chuck
Re: [Numpy-discussion] checking for c compiler during build
On Wed, Mar 7, 2012 at 12:35 PM, Skipper Seabold jsseab...@gmail.com wrote:
> Is there a way to use numpy.distutils to programmatically check for a C compiler at build time in a platform-independent way?

Wading through the numpy/distutils code some more. Would something as simple as this work all the time? Seems to do the trick for me.

from distutils.dist import Distribution
from distutils.command.config import config as distutils_config
from distutils import log

# A trivial program; try_compile only compiles (no link), so the
# undefined do_nothing() is fine.
dummy_c_text = r'''
void do_nothing(void);
int main(void) {
    do_nothing();
    return 0;
}
'''

def has_c_compiler():
    c = distutils_config(Distribution())
    try:
        success = c.try_compile(dummy_c_text)
        return True
    except Exception:
        log.info("No C compiler detected.")
        return False

Skipper
Re: [Numpy-discussion] Missing data again
On Wed, Mar 7, 2012 at 5:17 PM, Charles R Harris charlesr.har...@gmail.com wrote:
> I'm in agreement here, and that was a major consideration in making a 'masked' implementation first.

When it comes to missing data, bitpatterns can do everything that masks can do, are no more complicated to implement, and have better performance characteristics.

> Also, different folks adopt different values for 'missing' data, and distributing one or several masks along with the data is another common practice.

True, but not really relevant to the current debate, because you have to handle such issues as part of your general data import workflow anyway, and none of these is any more complicated no matter which implementations are available.

> One inconvenience I have run into with the current API is that it should be easier to clear the mask from an ignored value without taking a new view or assigning known data. So maybe two types of masks (different payloads), or an additional flag, could be helpful. The process of assigning masks could also be made a bit easier than using fancy indexing.

So this, uh... this was actually the whole goal of the alterNEP design for masks -- making all this stuff easy for people (like you, apparently?) who want support for ignored values, separately from missing data, and want a nice clean API for it. Basically having a separate .mask attribute which was an ordinary, assignable array broadcastable to the attached array's shape. Nobody seemed interested in talking about it much then, but maybe there's interest now?

-- Nathaniel
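A toy sketch of the shape of that API (purely illustrative; this is not the alterNEP implementation, just a plain-Python stand-in for the idea):

import numpy as np

class Ignorable(object):
    # Data plus an ordinary, assignable boolean mask (True = visible).
    def __init__(self, data):
        self.data = np.asarray(data)
        self.mask = np.ones(self.data.shape, dtype=bool)

    def compressed(self):
        # the non-ignored values, as a 1-d copy
        return self.data[self.mask]

a = Ignorable([1.0, 2.0, 3.0])
a.mask[1] = False       # ignore an element; the datum is untouched
a.mask[1] = True        # un-ignore it again -- no view, no reassignment
print(a.compressed())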
Re: [Numpy-discussion] Missing data again
On Wed, Mar 7, 2012 at 12:26 PM, Nathaniel Smith n...@pobox.com wrote:
> When it comes to missing data, bitpatterns can do everything that masks can do, are no more complicated to implement, and have better performance characteristics.

Maybe for float; for other things, no. And we have lots of other things. The performance is a strawman, and it *isn't* easier to implement.

> So this, uh... this was actually the whole goal of the alterNEP design for masks -- making all this stuff easy for people (like you, apparently?) who want support for ignored values, separately from missing data, and want a nice clean API for it. Basically having a separate .mask attribute which was an ordinary, assignable array broadcastable to the attached array's shape. Nobody seemed interested in talking about it much then, but maybe there's interest now?

Come off it, Nathaniel, the problem is minor and fixable. The intent of the initial implementation was to discover such things. These things are less accessible with the current API *precisely* because of the feedback from R users. It didn't start that way. We now have something to evolve into what we want. That is a heck of a lot more useful than endless discussion.

Chuck
Re: [Numpy-discussion] Missing data again
On Wed, Mar 7, 2012 at 1:26 PM, Nathaniel Smith n...@pobox.com wrote:
> When it comes to missing data, bitpatterns can do everything that masks can do, are no more complicated to implement, and have better performance characteristics.

Not true. Bitpatterns inherently destroy the data, while masks do not. For matplotlib, we can not use bitpatterns because it could over-write user data (or we would have to copy the data). I would imagine other extension writers would have similar issues when they need to play around with input data in a safe manner.

Also, I doubt that the performance characteristics for strings and integers are the same as they are for masks.

Ben Root
Re: [Numpy-discussion] Missing data again
Hi,

On Wed, Mar 7, 2012 at 11:37 AM, Charles R Harris charlesr.har...@gmail.com wrote:
> Come off it, Nathaniel, the problem is minor and fixable. The intent of the initial implementation was to discover such things. These things are less accessible with the current API *precisely* because of the feedback from R users. It didn't start that way. We now have something to evolve into what we want. That is a heck of a lot more useful than endless discussion.

The endless discussion is for the following reason:

- The discussion was never adequately resolved.

The discussion was never adequately resolved because there was not enough work done to understand the various arguments. In particular, you've several times said things that indicate to me, as to Nathaniel, that you either have not read or have not understood the points that Nathaniel was making. Travis's recent email - to me - also indicates that there is still a genuine problem here that has not been adequately explored. There is no future in trying to stop discussion, and trying to do so will only prolong it and make it less useful. It will make the discussion - endless.

If you want to help - read the alterNEP, respond to it directly, and further the discussion by engaged debate.

Best,
Matthew
Re: [Numpy-discussion] Missing data again
On 03/07/2012 09:26 AM, Nathaniel Smith wrote:
> So this, uh... this was actually the whole goal of the alterNEP design for masks -- making all this stuff easy for people (like you, apparently?) who want support for ignored values, separately from missing data, and want a nice clean API for it. Basically having a separate .mask attribute which was an ordinary, assignable array broadcastable to the attached array's shape. Nobody seemed interested in talking about it much then, but maybe there's interest now?

In other words, good low-level support for numpy.ma functionality? With a migration path so that a separate numpy.ma might wither away? Yes, there is interest; this is exactly what I think is needed for my own style of applications (which I think are common at least in geoscience), and for matplotlib.

The question is how to achieve it as simply and cleanly as possible while also satisfying the needs of the R users, and while making it easy for matplotlib, for example, to handle *any* reasonable input: ma, other masking, nan, or NA-bitpattern. It may be that a rather pragmatic approach to implementation will prove better than a highly idealized set of data models. Or, it may be that a dual approach is best, in which the flag-value missing data implementation is tightly bound to the R model and the mask implementation is explicitly designed for the numpy.ma model.

In any case, a reasonable level of agreement on the goals is needed. I presume Travis's involvement will facilitate a clarification of the goals and of the implementation; and I expect that much of Mark's work will end up serving well, even if much needs to be added and the API evolves considerably.

Eric
Re: [Numpy-discussion] (2012) Accessing LAPACK and BLAS from the numpy C API
On 06/03/2012 20:57, Sturla Molden wrote:
> On 05.03.2012 14:26, V. Armando Solé wrote:
>> In 2009 there was a thread in this mailing list concerning the access to BLAS from C extension modules. If I have properly understood the thread:
>> http://mail.scipy.org/pipermail/numpy-discussion/2009-November/046567.html
>> the answer by then was that those functions were not exposed (only f2py functions). I just wanted to know if the situation has changed since 2009, because it is not uncommon that to optimize some operations one has to sooner or later access BLAS functions that are already wrapped in numpy (either from ATLAS, from the Intel MKL, ...)
>
> Why do you want to do this? It does not make your life easier to use NumPy or SciPy's Python wrappers from C. Just use BLAS directly from C instead.

Wow! It certainly makes my life much, much easier. I can compile and distribute my python extension *even without having ATLAS, BLAS or MKL installed*.

Please note I am not using the python wrappers from C. That would make no sense. I am using the underlying libraries supplied with python from C. I had already used the information Robert Kern provided on the 2009 thread and obtained the PyCObject as:

from scipy.linalg.blas import fblas
dgemm = fblas.dgemm._cpointer
sgemm = fblas.sgemm._cpointer

but I did not find a way to obtain those pointers from numpy. That was the goal of my post. My extension needs SciPy installed just to fetch the pointer. It would be very nice to have a way to get similar information from numpy.

I have made a test on a Debian machine with BLAS installed but no ATLAS -> extension slow but working. Then the system maintainer installed ATLAS -> the extension flies. So, one can distribute a python extension that works on its own but that can take profit of any advanced library the end user might have installed.

Your point of view is valid if one is not going to distribute the extension module, but I *have to* distribute the module for Linux and for windows. To have a proper fortran compiler for windows 64 bit compatible with python is already an issue. If I have to distribute my own ATLAS or MKL then it gets even worse. All those issues are solved just by using the pointer to the function. Concerning licenses, if the end user has the right to use MKL, then he has the right to use it via my extension. It is not me who is using MKL.

Armando

PS. The only issue I see with the whole approach is safety, because the extension might be used to call some nasty function.
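For illustration, the pointer can even be unwrapped without writing any C, using ctypes. This sketch assumes Python 2.x and the scipy layout of that era (fblas and PyCObject are both gone in later versions):

import ctypes
from scipy.linalg.blas import fblas   # old scipy module, as in the post

# PyCObject_AsVoidPtr is the CPython 2.x call for unwrapping a PyCObject.
unwrap = ctypes.pythonapi.PyCObject_AsVoidPtr
unwrap.restype = ctypes.c_void_p
unwrap.argtypes = [ctypes.py_object]

dgemm_addr = unwrap(fblas.dgemm._cpointer)  # raw BLAS function address
print(hex(dgemm_addr))
# A C extension does the same unwrap and then calls through the pointer
# using the Fortran dgemm signature.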
[Numpy-discussion] use for missing (ignored) data?
I'm wondering what is the use for the ignored data feature? I can use:

A[valid_A_indexes] = whatever

to process only the 'non-ignored' portions of A. So at least some simple cases of ignored data are already supported without introducing a new type. OTOH:

w = A[valid_A_indexes]

will copy A's data, and subsequent use of w[:] = something will not update A. Is this the reason for wanting the ignored data feature?
Re: [Numpy-discussion] use for missing (ignored) data?
On Wed, Mar 7, 2012 at 1:05 PM, Neal Becker ndbeck...@gmail.com wrote:
> I'm wondering what is the use for the ignored data feature?
> [...]
> w = A[valid_A_indexes]
>
> will copy A's data, and subsequent use of w[:] = something will not update A. Is this the reason for wanting the ignored data feature?

Suppose you are working with plotted data and want to turn points on/off by clicking on them interactively to see how that affects a fit. Why make multiple copies, change sizes, destroy data, and all that nonsense? Just have the click update the mask and redraw.

Chuck
Re: [Numpy-discussion] use for missing (ignored) data?
Charles R Harris wrote:
> Suppose you are working with plotted data and want to turn points on/off by clicking on them interactively to see how that affects a fit. Why make multiple copies, change sizes, destroy data, and all that nonsense? Just have the click update the mask and redraw.

But does some_func(A[valid_data_mask]) actually perform a copy?
Re: [Numpy-discussion] Missing data again
Hi,

On 07/03/2012 20:57, Eric Firing wrote:
> In other words, good low-level support for numpy.ma functionality?

Coming back to *existing* ma support, I was just wondering whether it is now possible to np.save a masked array. (I'm using numpy 1.5.) In the end, this is the most annoying problem I have with the existing ma module, which otherwise is pretty useful to me. I'm happy not to need to process 100% of my data, though.

Best,
Pierre
Re: [Numpy-discussion] Missing data again
On 03/07/2012 11:15 AM, Pierre Haessig wrote:
> Coming back to *existing* ma support, I was just wondering whether it is now possible to np.save a masked array. (I'm using numpy 1.5.)

No, not with the mask preserved. This is one of the improvements I am hoping for with the upcoming missing data work.

Eric
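In the meantime, a common workaround (just a sketch, nothing official) is to save the data and the mask as separate arrays and reassemble the masked array on load:

import numpy as np
import numpy.ma as ma

a = ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# np.save would silently drop the mask, so store both pieces explicitly.
np.savez('a.npz', data=a.data, mask=a.mask)

f = np.load('a.npz')
b = ma.masked_array(f['data'], mask=f['mask'])  # mask restored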
Re: [Numpy-discussion] use for missing (ignored) data?
On Wednesday, March 7, 2012, Neal Becker ndbeck...@gmail.com wrote:
> But does some_func(A[valid_data_mask]) actually perform a copy?

Yes! If it isn't sliced, or accessed by a scalar index, then you are given a copy. Fancy indexing and Boolean indexing will not return a view.

Note that assignment to a Boolean-indexed array by a scalar is special-cased. I.e., A[valid_points] = 5 will do what you expect. But A[valid_points] += 5 may not, IIRC.

Ben Root
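A quick demonstration of the copy semantics being described (plain numpy, nothing version-specific):

import numpy as np

A = np.arange(6)
mask = A % 2 == 0

w = A[mask]    # boolean indexing returns a copy...
w[:] = 99      # ...so writing to w leaves A untouched
print(A)       # [0 1 2 3 4 5]

A[mask] = 99   # but assigning through the index does update A
print(A)       # [99  1 99  3 99  5]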
Re: [Numpy-discussion] addition, multiplication of a polynomial and np.{float, int}
Hi Charles,

On 07/03/2012 18:00, Charles R Harris wrote:
> That's a good idea, I'll take care of it. Note the caveat about the coefficients going in the opposite direction.

Great! In the meantime I changed a bit the root "polynomials" reference page to emphasize the new Polynomial class:
http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.polynomials.rst/

I feel that the preexisting text was targeting people with preexisting knowledge of poly1d rather than people looking for polynomials in NumPy in general. (poly1d is unfortunately the number 1 Google result for "polynomials numpy"...) Please make sure I didn't write something completely stupid.

Also, I tried to include links using :doc: but the editor complains. I hope it will work after a sphinx compilation. There is a build bot, isn't there? The aim of the first link is that the user-in-a-hurry ends up on the best piece of documentation available, namely "Using the Convenience Classes", after a short intro. Does this sound good?

Best,
Pierre
Re: [Numpy-discussion] Github key audit.
On Wed, Mar 7, 2012 at 1:54 PM, Charles R Harris charlesr.har...@gmail.com wrote:
> Many here have probably received the message from github about push/pull access being blocked until you have audited your ssh keys. To generate a key fingerprint on fedora, I did the following:
>
> charris@f16 ~$ ssh-keygen -l -f .ssh/id_dsa.pub
>
> I don't know how this looks for those of you using windows.

I didn't do much, just confirmed with github that my ssh key is really mine, which git gui was nice enough to tell me.

Josef
Re: [Numpy-discussion] Missing data again
On Wed, Mar 7, 2012 at 7:37 PM, Charles R Harris charlesr.har...@gmail.com wrote:
> On Wed, Mar 7, 2012 at 12:26 PM, Nathaniel Smith n...@pobox.com wrote:
>> When it comes to missing data, bitpatterns can do everything that masks can do, are no more complicated to implement, and have better performance characteristics.
>
> Maybe for float; for other things, no. And we have lots of other things.

It would be easier to discuss this if you'd, like, discuss :-(. If you know of some advantage that masks have over bitpatterns when it comes to missing data, can you please share it, instead of just asserting it?

Not that I'm immune... I perhaps should have been more explicit myself: when I said "performance characteristics", let me clarify that I was thinking of both speed (for floats) and memory (for most-but-not-all things).

> The performance is a strawman,

How many users need to speak up to say that this is a serious problem they have with the current implementation before you stop calling it a strawman? Because when Wes says that it's not going to fly for his stats/econometrics cases, and the neuroimaging folk like Gary and Matt say it's not going to fly for their use cases... surely just waving that away is a bit dismissive?

I'm not saying that we *have* to implement bitpatterns because performance is *the most important feature* -- I'm just saying, well, what I said. For *missing data* use cases, bitpatterns have better performance characteristics than masks. If we decide that these use cases are important, then we should take this into account and weigh it against other considerations. Maybe what you think is that these use cases shouldn't be the focus of this feature and it should focus on the "ignored" use cases instead? That would be a legitimate argument... but if that's what you want to say, say it, don't just dismiss your users!

> and it *isn't* easier to implement.

If I thought bitpatterns would be easier to implement, I would have said so... What I said was that they're not harder. You have some extra complexity, mostly in casting, and some reduced complexity -- no need to allocate and manipulate the mask. (E.g., simple same-type assignments and slicing require special casing for masks, but not for bitpatterns.) In many places the complexity is identical -- printing routines need to check for either special bitpatterns or masked values, whatever. Ufunc loops need to either find the appropriate part of the mask, or create a temporary mask buffer by calling a dtype func, whatever. On net they seem about equivalent, complexity-wise.

...I assume you disagree with this analysis, since I've said it before, wrote up a sketch for how the implementation would work at the C level, etc., and you continue to claim that simplicity is a compelling advantage for the masked approach. But I still don't know why you think that :-(.

>> So this, uh... this was actually the whole goal of the alterNEP design for masks [...]
>
> Come off it, Nathaniel, the problem is minor and fixable. The intent of the initial implementation was to discover such things.

Implementation can be wonderful, I absolutely agree. But you understand that I'd be more impressed by this example if your "discovery" weren't something I had been arguing for since before the implementation began :-).

> These things are less accessible with the current API *precisely* because of the feedback from R users. It didn't start that way. We now have something to evolve into what we want. That is a heck of a lot more useful than endless discussion.

No, you are still missing the point completely! There is no "what *we* want", because what you want is different than what I want. The masking stuff in the alterNEP was an attempt to give people like you who wanted "ignored" support what they wanted, and the bitpattern stuff was to
Re: [Numpy-discussion] Missing data again
On Wed, Mar 7, 2012 at 7:39 PM, Benjamin Root ben.r...@ou.edu wrote:
> On Wed, Mar 7, 2012 at 1:26 PM, Nathaniel Smith n...@pobox.com wrote:
>> When it comes to missing data, bitpatterns can do everything that masks can do, are no more complicated to implement, and have better performance characteristics.
>
> Not true. Bitpatterns inherently destroy the data, while masks do not.

Yes, that's why I only wrote that this is true for missing data, not in general :-). If you have data that is being destroyed, then that's not missing data, by definition. We don't have consensus yet on whether that's the use case we are aiming for, but it's the one that Pierre was worrying about.

> For matplotlib, we can not use bitpatterns because it could over-write user data (or we would have to copy the data). I would imagine other extension writers would have similar issues when they need to play around with input data in a safe manner.

Right. You clearly need some sort of masking, either an explicit mask array that you keep somewhere, or one that gets attached to the underlying ndarray in some non-destructive way.

> Also, I doubt that the performance characteristics for strings and integers are the same as they are for masks.

Not sure what you mean by this, but I'd be happy to hear more.

-- Nathaniel
Re: [Numpy-discussion] use for missing (ignored) data?
On Wed, Mar 7, 2012 at 8:05 PM, Neal Becker ndbeck...@gmail.com wrote:
> I'm wondering what is the use for the ignored data feature?
> [...]
> w = A[valid_A_indexes]
>
> will copy A's data, and subsequent use of w[:] = something will not update A. Is this the reason for wanting the ignored data feature?

Hi Neal,

There are a few reasons that I know of why people want more support from numpy for ignored data/masks, specifically (as opposed to missing data or other related concepts):

1) If you're often working on some subset of your data, then it's convenient to set the mask once and have it stay in effect for further operations. Anything you can accomplish this way can also be accomplished by keeping an explicit mask array and using it for indexing by hand, but in some situations it may be more convenient not to.

2) Operating on subsets of an array without making a copy. Like Benjamin pointed out, indexing with a mask makes a copy. This is slow, and what's worse, people who work with large data sets (e.g., big fMRI volumes) may not have enough memory to afford such a copy. This problem can be solved by using the new where= argument to ufuncs, which skips the copy; see the sketch after this message. (But then see (1) -- passing where= to a bunch of functions takes more typing than just setting it once and leaving it.)

3) Suppose there's a 3rd-party function that takes an array -- borrowing Charles's example, say it's draw_points(arr). Now you want to apply it to just a subset of your data, and want to avoid a copy. It would be nice if the original author had made it draw_points(arr, mask), but they didn't. Well, if you have masking built in to your array type, then maybe you can call this as draw_points(masked_arr) and it will Just Work. I.e., maybe people who aren't thinking about masking will sometimes write code that accidentally works with masking anyway. I'm not sure how much I'd trust this, but I guess it's nice when it happens. And if it does work, then implementing the show/hide point functionality will be easier. (And if it doesn't work, and masking is built into numpy.ndarray, then maybe you can use this to argue with the original author that this is a bug, not just a missing feature. Again, I'm not sure if this is a good thing on net: one could argue that people shouldn't be forced to think about masking every time they write any function, just in case it becomes relevant later. But certainly it'd be useful sometimes.)

There may be other motivations that I'm not aware of, of course.

-- Nathaniel
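Here is a minimal sketch of the where= mechanism mentioned in (2), per the 1.7-era ufunc API; an out= array must be supplied, since elements where the condition is False are simply left untouched:

import numpy as np

a = np.arange(5, dtype=float)
mask = np.array([True, False, True, False, True])

out = a.copy()
np.add(a, 10.0, out=out, where=mask)  # operate only where mask is True
print(out)                            # [10.  1. 12.  3. 14.]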
[Numpy-discussion] Casting rules changed in trunk?
Hi,

I noticed a casting change running the test suite on our image reader, nibabel:
https://github.com/nipy/nibabel/blob/master/nibabel/tests/test_casting.py

For this script:

import numpy as np
Adata = np.zeros((2,), dtype=np.uint8)
Bdata = np.zeros((2,), dtype=np.int16)
Bzero = np.int16(0)
Bbig = np.int16(256)
print np.__version__
print 'Array add', (Adata + Bdata).dtype
print 'Scalar 0 add', (Adata + Bzero).dtype
print 'Scalar 256 add', (Adata + Bbig).dtype

I get:

1.4.1
Array add int16
Scalar 0 add uint8
Scalar 256 add uint8

1.5.1
Array add int16
Scalar 0 add uint8
Scalar 256 add uint8

1.6.1
Array add int16
Scalar 0 add uint8
Scalar 256 add int16

1.7.0.dev-aae5b0a
Array add int16
Scalar 0 add uint8
Scalar 256 add uint16

I can understand the uint8 outputs from numpy < 1.6 -- the rule being not to upcast for scalars. I can understand the int16 output from 1.6.1, on the basis that the value is outside uint8 range and therefore we might prefer a type that can handle values from both uint8 and int16. Was the current change intended? It has the following odd effect:

In [5]: Adata + np.int16(257)
Out[5]: array([257, 257], dtype=uint16)

In [7]: Adata + np.int16(-257)
Out[7]: array([-257, -257], dtype=int16)

In [8]: Adata - np.int16(257)
Out[8]: array([65279, 65279], dtype=uint16)

but I guess you can argue that there are odd effects for other choices too.

Best,
Matthew
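For reference, np.result_type (available since 1.6) exposes this value-based scalar logic directly; a small sketch, with the caveat that the second result is exactly the behavior that varies across the versions listed above:

import numpy as np

a = np.zeros((2,), dtype=np.uint8)

print(np.result_type(a, np.int16(0)))    # uint8 under the value-based rules
print(np.result_type(a, np.int16(256)))  # int16 or uint16, version-dependent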
[Numpy-discussion] subclassing array in c
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi,

I have been struggling for quite some time now. Desperate as I am, now I need help. I was trying to subclass ndarrays in a C extension (see code below) and constantly get segfaults. I have been checking my INCREF and DECREF stuff up and down but can't find the error. Probably I got something completely wrong... anybody able to help?

Thanks,
Christoph

#include <Python.h>
#include <structmember.h>
#include <numpy/arrayobject.h>

static PyObject *SpamError;

typedef struct {
    PyArrayObject base;
    PyDictObject *unit;
} UnitArrayObject;

PyTypeObject unitArrayObjectType;

static int
checkunit(PyObject *unit1, PyObject *unit2)
{
    return PyObject_Compare(unit1, unit2);
}

static PyObject *
unitArray_new(PyTypeObject *cls, PyObject *args, PyObject *kwargs)
{
    PyObject *data = NULL, *unit = NULL;
    PyArray_Descr *dtype = NULL;
    PyObject *res = NULL, *tmp = NULL;

    if (!PyArg_ParseTuple(args, "OO|O&", &data, &unit,
                          PyArray_DescrConverter, &dtype)) {
        Py_XDECREF(dtype);
        return NULL;
    }

    res = PyArray_FromAny(data, dtype, 0, 0, NPY_ENSURECOPY, NULL);
    if (res == NULL) {
        Py_XDECREF(dtype);
        /* TODO: raise exception? */
        return NULL;
    }

    if (PyObject_IsInstance(data, (PyObject *)cls)) {
        if (unit != NULL
            && !checkunit((PyObject *)((UnitArrayObject *)data)->unit, unit)) {
            Py_XDECREF(res);
            /* TODO: raise exception */
            return NULL;
        }
    }
    else {
        if (PyObject_IsTrue(unit)) {
            tmp = res;
            res = PyArray_View((PyArrayObject *)res, NULL, &unitArrayObjectType);
            if (tmp != res) {
                Py_XDECREF(tmp);
            }
            ((UnitArrayObject *)res)->unit = (PyDictObject *)unit;
            Py_INCREF(unit);
            if (unit != NULL) {
            }
        }
    }
    return res;
}

static PyObject *
unitArray__array_finalize__(PyObject *new, PyObject *args)
{
    PyObject *attr = NULL, *tmp = NULL;
    PyObject *parent = NULL;

    if (!PyArg_ParseTuple(args, "O", &parent)) {
        return NULL;
    }
    if (parent != NULL) {
        attr = PyObject_GetAttrString(parent, "unit");
        if (attr == NULL) {
            /* parent has no 'unit' so we make a new empty one */
            attr = PyDict_New();
            PyErr_Clear();
        }
    }
    tmp = (PyObject *)((UnitArrayObject *)new)->unit;
    ((UnitArrayObject *)new)->unit = (PyDictObject *)attr;
    Py_INCREF(Py_None);
    return Py_None;
}

static PyObject *
unitArray__array_wrap__(PyObject *self, PyObject *args)
{
    PyObject *array = NULL, *context = NULL;

    if (!PyArg_ParseTuple(args, "OO", &array, &context)) {
        /* TODO: raise exception */
        return NULL;
    }
    printf("%s", PyString_AsString(PyObject_Str(context)));
    Py_INCREF(array);
    return array;
}

static PyMethodDef unitArrayMethods[] = {
    {"__array_finalize__", unitArray__array_finalize__, METH_VARARGS,
     "array finalize method"},
    {"__array_wrap__", unitArray__array_wrap__, METH_VARARGS,
     "array wrap method"},
    {NULL, NULL, 0, NULL}
};

static PyMemberDef unitArrayMembers[] = {
    {"unit", T_OBJECT, offsetof(UnitArrayObject, unit), 0,
     "dictionary containing unit info."},
    {NULL, 0, 0, 0, NULL}
};

PyTypeObject unitArrayObjectType = {
    PyObject_HEAD_INIT(NULL)
    0,                          /* ob_size */
    "spam.UnitArray",           /* tp_name */
    sizeof(UnitArrayObject),    /* tp_basicsize */
    0,                          /* tp_itemsize */
    0,                          /* tp_dealloc */
    0,                          /* tp_print */
    0,                          /* tp_getattr */
    0,                          /* tp_setattr */
    0,                          /* tp_compare */
    0,                          /* tp_repr */
    0,                          /* tp_as_number */
    0,                          /* tp_as_sequence */
    0,                          /* tp_as_mapping */
    0,                          /* tp_hash */
    0,                          /* tp_call */
    0,                          /* tp_str */
    0,                          /* tp_getattro */
    0,                          /* tp_setattro */
    0,                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT,         /* tp_flags */
    "A numpy array with units", /* tp_doc */
    0,                          /* traverseproc */
    0,                          /* tp_clear */
    0,                          /* tp_richcompare */
    0,                          /* tp_weaklistoffset */
    0,                          /* tp_iter */
    0,                          /* tp_iternext */
    unitArrayMethods,           /* tp_methods */
Re: [Numpy-discussion] subclassing array in c
Seeing the backtrace would be helpful. Can you do whatever leads to the segfault from python run from gdb?
Val

On Wed, Mar 7, 2012 at 7:04 PM, Christoph Gohle christoph.go...@mpq.mpg.de wrote:
[original message and code quoted in full - snipped]
Re: [Numpy-discussion] subclassing array in c
Tried it on my Ubuntu 10.10 box, no problem:

1) Saved as spampub.c
2) Compiled with (setup.py attached): python setup.py build_ext -i
3) Tested from ipython:

In [1]: import spampub
In [2]: ua = spampub.UnitArray([0,1,2,3.0], 'liter')
In [3]: ua
Out[3]: UnitArray([ 0., 1., 2., 3.])
In [4]: ua.unit
Out[4]: 'liter'

On Wed, Mar 7, 2012 at 7:15 PM, Val Kalatsky kalat...@gmail.com wrote:
[earlier messages quoted in full - snipped]
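(The setup.py attachment is not preserved in this archive. A minimal sketch of a build script that would compile the extension in place with the command from step 2 - the module and file names are taken from Val's steps, everything else is an assumption:

# Hypothetical stand-in for the missing setup.py attachment:
# builds spampub.c against the numpy headers via
#   python setup.py build_ext -i
from distutils.core import setup, Extension

import numpy as np

setup(
    name='spampub',
    ext_modules=[
        Extension('spampub', ['spampub.c'],
                  include_dirs=[np.get_include()]),
    ],
)

Any setup.py of this shape should reproduce Val's build; the essential part is adding np.get_include() so that numpy/arrayobject.h is found.)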
Re: [Numpy-discussion] subclassing array in c
FWIW, this crashes on Windows with numpy 1.6.1 but not with a numpy 1.7-git debug build.

Christoph Gohlke

On 3/7/2012 5:36 PM, Val Kalatsky wrote:
[earlier messages quoted in full - snipped]
Re: [Numpy-discussion] use for missing (ignored) data?
On Wednesday, March 7, 2012, Nathaniel Smith n...@pobox.com wrote:

On Wed, Mar 7, 2012 at 8:05 PM, Neal Becker ndbeck...@gmail.com wrote: I'm wondering what is the use for the ignored data feature? I can use: A[valid_A_indexes] = whatever to process only the 'non-ignored' portions of A. So at least some simple cases of ignored data are already supported without introducing a new type. OTOH: w = A[valid_A_indexes] will copy A's data, and subsequent use of w[:] = something will not update A. Is this the reason for wanting the ignored data feature?

Hi Neal, There are a few reasons that I know of why people want more support from numpy for ignored data/masks specifically (as opposed to missing data or other related concepts):

1) If you're often working on some subset of your data, then it's convenient to set the mask once and have it stay in effect for further operations. Anything you can accomplish this way can also be accomplished by keeping an explicit mask array and using it for indexing by hand, but in some situations it may be more convenient not to.

2) Operating on subsets of an array without making a copy. Like Benjamin pointed out, indexing with a mask makes a copy. This is slow, and what's worse, people who work with large data sets (e.g., big fMRI volumes) may not have enough memory to afford such a copy. This problem can be solved by using the new where= argument to ufuncs, which skips the copy; a sketch follows after this message. (But then see (1) -- passing where= to a bunch of functions takes more typing than just setting it once and leaving it.)

3) Suppose there's a 3rd-party function that takes an array -- borrowing Charles' example, say it's draw_points(arr). Now you want to apply it to just a subset of your data, and want to avoid a copy. It would be nice if the original author had made it draw_points(arr, mask), but they didn't. Well, if you have masking built in to your array type, then maybe you can call this as draw_points(masked_arr) and it will Just Work. I.e., maybe people who aren't thinking about masking will sometimes write code that accidentally works with masking anyway. I'm not sure how much I'd trust this, but I guess it's nice when it happens. And if it does work, then implementing the show/hide point functionality will be easier. (And if it doesn't work, and masking is built into numpy.ndarray, then maybe you can use this to argue with the original author that this is a bug, not just a missing feature. Again, I'm not sure if this is a good thing on net: one could argue that people shouldn't be forced to think about masking every time they write any function, just in case it becomes relevant later. But certainly it'd be useful sometimes.)

There may be other motivations that I'm not aware of, of course.

-- Nathaniel

I think you got most of the motivations right. I would say on the last point that extension authors should be able to say "does not support NA!". The important thing is that it makes it more up-front.

An additional motivation concerns mathematical operations. Personally, I hate getting bitten by a function that takes a max() when I have a NaN in the array. In addition, what about adding two arrays together that may or may not have different masks? This has been the major advantage of np.ma: it All Mostly Works.

Cheers!
Ben Root
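(To make point 2 concrete, here is a small sketch of the copy-vs-no-copy distinction. The where= shown is the ufunc argument from the 1.7 development branch that Nathaniel refers to, so it won't run on released numpy at the time of this thread:

import numpy as np

a = np.arange(6, dtype=float)
mask = a > 2

# Boolean indexing makes a copy: writing into `w` leaves `a` untouched.
w = a[mask]
w[:] = 0

# The where= ufunc argument operates on the selected elements in place,
# with no intermediate copy of the selected data; unselected elements
# of `out` are left as they were.
np.add(a, 10.0, out=a, where=mask)

print a   # only the masked elements were incremented

This is exactly the trade-off in (1) versus (2): no copy, but the mask has to be threaded through every call by hand.)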
Re: [Numpy-discussion] subclassing array in c
Dear Val, I agree that more detail is needed. Sorry for that - it was late yesterday. I am running Python 2.6.1, numpy development branch (numpy-2.0.0.dev_20101104-py2.6-macosx-10.6-universal.egg). Maybe I should switch to a release?

I compiled with your setup.py using 'python setup.py build_ext -i' and ran the commands below in ipython. As you can see, I don't get a crash for a single call to the constructor, consistent with your observation. And it looks like one has to use a dict as the unit to make it crash. Can you make sense out of that?

In [1]: import spampub
In [4]: spampub.UnitArray(1, {'s': 1})
Out[4]: UnitArray(1)
In [6]: a = [spampub.UnitArray(i, {'s': i}) for i in xrange(1000)]
Segmentation fault

The backtrace is the following:

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0021
Crashed Thread: 0 Dispatch queue: com.apple.main-thread

Thread 0 Crashed: Dispatch queue: com.apple.main-thread
0 org.python.python 0x0001000b0b8e PyObject_GC_Track + 1473
1 org.python.python 0x0001000b1255 _PyObject_GC_Malloc + 191
2 org.python.python 0x0001000b12d0 _PyObject_GC_New + 21
3 org.python.python 0x00010003a856 PyDict_New + 116
4 org.python.python 0x00010003a99c _PyDict_NewPresized + 24
5 org.python.python 0x0001000880cb PyEval_EvalFrameEx + 11033
6 org.python.python 0x00010008acce PyEval_EvalCodeEx + 1803
7 org.python.python 0x00010008735d PyEval_EvalFrameEx + 7595
8 org.python.python 0x00010008acce PyEval_EvalCodeEx + 1803
9 org.python.python 0x00010008935e PyEval_EvalFrameEx + 15788
10 org.python.python 0x00010008acce PyEval_EvalCodeEx + 1803
11 org.python.python 0x00010008935e PyEval_EvalFrameEx + 15788
12 org.python.python 0x0001000892e1 PyEval_EvalFrameEx + 15663
13 org.python.python 0x00010008acce PyEval_EvalCodeEx + 1803
14 org.python.python 0x00010008935e PyEval_EvalFrameEx + 15788
15 org.python.python 0x00010008acce PyEval_EvalCodeEx + 1803
16 org.python.python 0x00010008935e PyEval_EvalFrameEx + 15788
17 org.python.python 0x00010008acce PyEval_EvalCodeEx + 1803
18 org.python.python 0x00010008935e PyEval_EvalFrameEx + 15788
19 org.python.python 0x00010008acce PyEval_EvalCodeEx + 1803
20 org.python.python 0x00010008935e PyEval_EvalFrameEx + 15788
21 org.python.python 0x00010008acce PyEval_EvalCodeEx + 1803
22 org.python.python 0x00010008ad61 PyEval_EvalCode + 54
23 org.python.python 0x0001000a265a Py_CompileString + 78
24 org.python.python 0x0001000a2723 PyRun_FileExFlags + 150
25 org.python.python 0x0001000a423d PyRun_SimpleFileExFlags + 704
26 org.python.python 0x0001000b0286 Py_Main + 2718
27 org.python.python.app 0x00010e6c start + 52

On 08.03.2012 at 02:36, Val Kalatsky wrote:
[earlier messages quoted in full - snipped]
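(A note on reading that backtrace: a fault inside PyObject_GC_Track during an unrelated PyDict_New typically points to earlier heap or refcount corruption of the GC bookkeeping, not to the allocation that finally faults. One candidate in the code posted above is __array_finalize__, which overwrites the old unit pointer without releasing it and can leave it NULL when there is no parent. Purely as an illustration of the refcounting pattern - a sketch under those assumptions, not a confirmed fix for this report:

/* Sketch of a refcount-safe __array_finalize__ for the UnitArrayObject
 * defined earlier in this thread.  Assumes the struct memory is zeroed
 * by the default tp_alloc, so the old `unit` pointer is either NULL or
 * a previously stored owned reference. */
static PyObject *
unitArray__array_finalize___safe(PyObject *new, PyObject *args)
{
    PyObject *attr = NULL;
    PyObject *parent = NULL;

    if (!PyArg_ParseTuple(args, "O", &parent)) {
        return NULL;
    }
    if (parent != NULL && parent != Py_None) {
        attr = PyObject_GetAttrString(parent, "unit");  /* new reference */
        if (attr == NULL) {
            PyErr_Clear();  /* parent simply has no 'unit' attribute */
        }
    }
    if (attr == NULL) {
        attr = PyDict_New();  /* fall back to an owned empty dict */
        if (attr == NULL) {
            return NULL;
        }
    }
    /* Release the old value before storing the new owned reference. */
    Py_XDECREF((PyObject *)((UnitArrayObject *)new)->unit);
    ((UnitArrayObject *)new)->unit = (PyDictObject *)attr;
    Py_RETURN_NONE;
}

Whether this is the actual cause of the reported crash would still need checking under a debug Python / valgrind.)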