Re: [Numpy-discussion] debian benchmarks
On Monday 05 July 2010 15:32:51, Isaac Gouy wrote:
> Sturla Molden sturla at molden.no writes:
> > It is also the kind of task where NumPy would help. It would be nice to
> > get NumPy into the shootout. At least for the sake of advertising.
>
> http://shootout.alioth.debian.org/u32/program.php?test=spectralnorm&lang=python&id=2

Let's join the game :-)

If I run the above version on my desktop computer (Intel E8600 Duo @ 3 GHz, DDR2 @ 800 MHz memory) I get:

    $ time python -OO spectralnorm-numpy.py 5500
    1.274224153

    real    0m9.724s
    user    0m9.295s
    sys     0m0.269s

which should correspond to the 12.86s in the shootout (so my machine is around 30% faster). But if I use ATLAS (3.9.25) to accelerate the linear algebra:

    $ python -OO spectralnorm-numpy.py 5500
    1.274224153

    real    0m5.862s
    user    0m5.566s
    sys     0m0.225s

Profiling then showed that building the M matrix took a lot of time. After using numexpr to improve this (see attached script), I get:

    $ python -OO spectralnorm-numpy-numexpr.py 5500
    1.274224153

    real    0m3.333s
    user    0m3.071s
    sys     0m0.163s

Interestingly, memory consumption also dropped from 480 MB to 255 MB. Finally, using Intel's MKL to take advantage of my 2 cores:

    $ python -OO spectralnorm-numpy-numexpr.py 5500
    1.274224153

    real    0m2.785s
    user    0m4.117s
    sys     0m0.139s

which is a 3.5x improvement over the initial version. This is also around 25% faster than, and consumes similar memory to, the fastest version written in pure C in the "interesting alternatives" section:

http://shootout.alioth.debian.org/u32/performance.php?test=spectralnorm#about

I suppose that, provided that Matlab also has a JIT and supports Intel's MKL, it could beat this mark too. Would any Matlab user accept the challenge?
-- 
Francesc Alted

    # The Computer Language Benchmarks Game
    # http://shootout.alioth.debian.org/
    #
    # Contributed by Sebastien Loisel
    # Fixed by Isaac Gouy
    # Sped up by Josh Goldfoot
    # Dirtily sped up by Simon Descarpentries
    # Sped up with numpy by Kittipong Piyawanno
    # More speed-up with Numexpr by Francesc Alted

    from sys import argv
    import numpy as np
    import numexpr as ne

    def spectralnorm(n):
        u = np.matrix(np.ones(n))
        j = np.arange(n, dtype=np.float64)
        M = np.empty(n*n, dtype=np.float64).reshape(n,n)
        # numexpr evaluates this string, picking up i and j from the caller
        expr = "1.0 / ((i + j) * (i + j + 1) / 2 + i + 1)"
        for i in xrange(n):
            M[i] = ne.evaluate(expr)    # fill row i of M
        MT = M.T
        for i in xrange(10):
            v = (u*MT)*M
            u = (v*MT)*M
        print "%0.9f" % ((u*v.T).sum() / (v*v.T).sum())**0.5

    spectralnorm(int(argv[1]))

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
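For readers without numexpr, the same power iteration can be sketched with plain NumPy broadcasting. This is only an illustration in Python 3 syntax, not the script attached to the message above: the function name `spectral_norm` and the `iterations` parameter are made up here, and the matrix is built in one broadcast expression instead of row by row.

```python
import numpy as np

def spectral_norm(n, iterations=10):
    """Approximate the spectral norm of the n x n benchmark matrix A,
    where A[i, j] = 1 / ((i + j) * (i + j + 1) / 2 + i + 1)."""
    i = np.arange(n, dtype=np.float64)[:, np.newaxis]  # column of row indices
    j = np.arange(n, dtype=np.float64)[np.newaxis, :]  # row of column indices
    A = 1.0 / ((i + j) * (i + j + 1) / 2 + i + 1)      # built by broadcasting
    u = np.ones(n)
    for _ in range(iterations):
        v = A.T @ (A @ u)   # v = (A^T A) u
        u = A.T @ (A @ v)   # u = (A^T A) v
    return np.sqrt(u.dot(v) / v.dot(v))

if __name__ == "__main__":
    print("%0.9f" % spectral_norm(100))
```

Building the full index grids at once costs a few temporary n x n arrays, so for n = 5500 the row-by-row numexpr loop in the original script is likely to be lighter on memory.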
Re: [Numpy-discussion] Ternary plots anywhere?
Hi Ariel,

Ariel Rokem wrote:
> Hi Armando,
>
> Here's something in that direction:
>
> http://nature.berkeley.edu/~chlewis/Sourcecode.html
>
> Hope that helps - Ariel

It really helps. It looks more complete than the only thing I had found (http://focacciaman.blogspot.com/2008/05/ternary-plotting-in-python-take-2.html)

Thanks a lot,

Armando
[Numpy-discussion] Yet another axes naming package [Was: Re: BOF notes: Fernando's proposal: NumPy ndarray with named axes]
Jonathan March writes:
> Fernando Perez proposed a NumPy enhancement, an ndarray with named axes, prototyped as DataArray by him, Mike Trumpis, Jonathan Taylor, Matthew Brett, Kilian Koepsell and Stefan van der Walt.

I haven't had a thorough look into it, but this work, as well as the others listed in the 'NdarrayWithNamedAxes' wiki page, is similar in spirit to some numpy extensions I've been developing. You can find the code and some initial documentation at:

https://people.gso.ac.upc.edu/vilanova/doc/sciexp2

I was not planning to announce it until around 1.0, as the numpy structures are still crude and lack some operations for dynamically extending the structure both in shape and in the number of fields on each record (I have some fixes that still need to be committed), but after seeing some related announcements lately, I think we all might benefit from trying to join ideas and efforts.

I'll try to briefly explain, with an example, the part that is related to numpy (that is, the third frontend that appears in the User Guide: 'plotter', which currently has documentation that is worse than poor).

Suppose you have a set of benchmarks that have been simulated with different simulator parameters, such that you have one result file for each executed combination of the variables:

  * benchmark
  * parameter1
  * parameter2

Of course, for each execution you'll also have multiple results (what I call valuenames; simply fields in a record array, in fact).

NOTE: scripts for such executions can be generated with the first frontend ('launchgen').

Then you can find and extract those results (package 'sciexp2.gather') and organize them into an N-dimensional 'Data' object (package 'sciexp2.data'), where the first dimension has (for example) the combinations of parameter1-parameter2 values, and the second dimension contains one element for each benchmark (method 'sciexp2.data.Data.reshape').
Now, you can index/slice the structure with integers (as always) _as well as_ with:

  * strings: simple indexing as well as slicing
  * filters: slicing with a stepping

These are translated into integers through the metadata (benchmark name and/or values of the 2 parameters), stored in 'sciexp2.data.Dimension' objects.

For example, to get the numbers of tests where parameter1 is between 10 and 100 and just for benchmarks named 'bench1' and 'bench2':

    data[::10 parameter1 parameter1 100,[bench1, bench2]]

There is a third package extending matplotlib that I have not uploaded (nor fully developed) that is meant to use the dimension and record metadata in the Data object, such that data can be easily plotted. It extracts labels for axes and legends from metadata, and can expand operations. For example:

  * Plot one figure for each benchmark, simply by declaring the figure as to be expanded through the 'benchmark' variable.
  * Plot multiple lines/bars/whatever with a single plot command, like "plot such and such for each benchmark", or "plot such and such for each configuration and cluster by benchmark name".

More extensive examples can be seen at the following URL, which is from a much older version that used neither numpy nor matplotlib, and provided a somewhat functional API (SIZE, CPREFETCH, RPREFETCH and SIMULATOR are execution parameters in these examples; fun starts at line 78):

https://projects.gso.ac.upc.edu/projects/sciexp2/repository/revisions/200/entry/progs/sciexp2/tags/0.5/plotter/examples/01-spec-figures.cfg

Finally, some things that have been bugging me about numpy are:

  * My 'Data' object is similar to a 'recarray', such that record elements (what I call valuenames) can be accessed as attributes. But to avoid the cost of a recarray, I use an ndarray with records. This has the unfortunate effect that valuenames cannot be accessed as attributes on a single record, but only when the object really is a 'Data' object. I tried to add some methods to numpy.void from my python code to access record fields as attributes, but of course that's not possible.

  * I'd like to associate extra information with a dtype, instead of manually carrying it around on every operation accessing a record field. Namely:

      * a description, such that it can be automatically used as axis/legend labels in matplotlib.
      * unit information, such that units of results can be automatically computed when operating with numpy, and later extracted when plotted with matplotlib. For this, existing packages like 'units' in PyPI could be used.

  * The ability to operate on whole records instead of separate record fields, such that I can write:

        b = a[0] + a[1]

    instead of:

        b_f1 = a[0]['f1'] + a[1]['f1']
        b_f2 = a[0]['f2'] + a[1]['f2']

    whenever possible.

Comments are welcome.

apa!

-- 
And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer. -- The Princess of Pure Reason, as told by
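The last wish in the message above (operating on whole records rather than on separate fields) can be approximated today with a small helper that loops over the field names. The sketch below is only an illustration under stated assumptions: `add_records` is a hypothetical helper, not a NumPy or sciexp2 function, and the field names `f1` and `f2` are taken from the example in the message.

```python
import numpy as np

def add_records(x, y):
    """Add two records field by field (hypothetical helper, not a NumPy API)."""
    if x.dtype != y.dtype:
        raise TypeError("records must share the same dtype")
    # Result has the same structured dtype; shape () for a single record.
    out = np.zeros(np.shape(x), dtype=x.dtype)
    for name in x.dtype.names:
        out[name] = x[name] + y[name]
    return out

# A structured ("record") array with two float fields.
a = np.array([(1.0, 2.0), (3.0, 4.0)], dtype=[("f1", "f8"), ("f2", "f8")])

b = add_records(a[0], a[1])   # replaces b_f1 = ..., b_f2 = ...
```

The helper makes no attempt to handle nested or non-numeric fields; it only shows that the per-field boilerplate can be hidden behind one call.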
[Numpy-discussion] OT? Distutils extension with shared libs
I know it is not directly related to numpy (even if it uses numpy.distutils), but I would like to ask you folks how you deal with code that depends on other libraries.

In the libIM7 project ( https://launchpad.net/libim7 ), I wrap code from a device manufacturer with ctypes in order to read Particle Image Velocimetry (PIV) files stored by their software (formats im7 and vc7). There is a dependency on zlib, which is easy to solve on Linux (installing the zlib-dev package in Debian). But as I also want to use it on Windows (sharing the commercial dongle amongst various colleagues is an uncomfortable solution), I am trying to configure the setup.py for both Windows and Linux. But I am new to development on Windows...

My questions are then:

  - how do you deal with dependencies in distutils?
  - what do you need to build against zlib (or another lib) on Windows using distutils?

Thanks

Fabricio
[Numpy-discussion] reverse cumsum?
Hi,

Is there a simple way to get a cumsum in reverse order? So far, the best I've come up with is to use fancy indexing twice to reverse things:

    >>> x = np.arange(10)
    >>> np.cumsum(x[np.arange(9, -1, -1)])[np.arange(9, -1, -1)]
    array([45, 45, 44, 42, 39, 35, 30, 24, 17, 9])

If it matters, I only care about the 1-d case at this point.

Thanks,

Ken
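One common way to write this without fancy indexing is to reverse with a negative-step slice on both sides of the cumsum. The sketch below assumes the 1-d case from the question; `reverse_cumsum` is just an illustrative name.

```python
import numpy as np

def reverse_cumsum(x):
    """Cumulative sum taken from the end of a 1-d array towards the start."""
    x = np.asarray(x)
    # x[::-1] is a reversed view (no copy); cumsum allocates the result,
    # and the final [::-1] flips it back into the original order.
    return x[::-1].cumsum()[::-1]

x = np.arange(10)
result = reverse_cumsum(x)
```

Slicing with `[::-1]` returns a view, so the only new array is the one produced by `cumsum`, whereas indexing with `np.arange(9, -1, -1)` copies the data each time.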
Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
On Tue, Jul 6, 2010 at 10:47 AM, Joshua Holbrook josh.holbr...@gmail.com wrote:
> I really really really want to work on this. I already forked datarray on github and did some research on What Other People Have Done ( http://jesusabdullah.github.com/2010/07/02/datarray.html ). With any luck I'll contribute something actually useful. :)
>
> Anyways!
>
> --Josh

Thanks, Josh. Also note this page, http://scipy.org/StatisticalDataStructures, which I think I already mentioned to you. The pandas and larry guys have spent a good deal of time discussing this already, especially with respect to speed and timings (i.e., DataArray will most likely need to be optimized). I think Keith already has a benchmark script somewhere(?). We have already had discussions on the pystatsmodels mailing list over at http://groups.google.ca/group/pystatsmodels, so you might want to search a bit, though I think it's time to move the discussion to the numpy list here.

And of course, please make use of the mailing list for design choices, questions, and soliciting feedback, as I think this project is of interest to many people.

Cheers,

Skipper
[Numpy-discussion] [ANN] la 0.4, the labeled array
The main class of the la package is a labeled array, larry. A larry consists of data and labels. The data is stored as a NumPy array and the labels as a list of lists (one list per dimension). Alignment by label is automatic when you add (or subtract, multiply, divide) two larrys.

The focus of this release was binary operations between unaligned larrys with user control of the join method (five available) and the fill method. A general binary function, la.binaryop(), was added, as were the convenience functions add, subtract, multiply, divide. Supporting functions such as la.align(), which aligns two larrys, were also added.

    download   http://pypi.python.org/pypi/la
    doc        http://larry.sourceforge.net
    code       http://github.com/kwgoodman/la
    list1      http://groups.google.ca/group/pystatsmodels
    list2      http://groups.google.com/group/labeled-array

RELEASE NOTES

New larry methods

- ismissing: A bool larry with element-wise marking of missing values
- take: A copy of the specified elements of a larry along an axis

New functions

- rand: Random samples from a uniform distribution
- randn: Random samples from a Gaussian distribution
- missing_marker: Return missing value marker for the given larry
- ismissing: A bool Numpy array with element-wise marking of missing values
- correlation: Correlation of two Numpy arrays along the specified axis
- split: Split into train and test data along given axis
- listmap_fill: Index map a list onto another and index of unmappable elements
- listmap_fill: Cython version of listmap_fill
- align: Align two larrys using one of five join methods
- info: la package information such as version number and HDF5 availability
- binaryop: Binary operation on two larrys with given function and join method
- add: Sum of two larrys using given join and fill methods
- subtract: Difference of two larrys using given join and fill methods
- multiply: Multiply two larrys element-wise using given join and fill methods
- divide: Divide two larrys element-wise using given join and fill methods

Enhancements

- listmap now has option to ignore unmappable elements instead of KeyError
- listmap.pyx now has option to ignore unmappable elements instead of KeyError
- larry.morph() is much faster as are methods, such as merge, that use it

Breakage from la 0.3

- Development moved from launchpad to github
- func.py and afunc.py renamed flarry.py and farray.py to match new flabel.py. Broke: from la.func import stack; Did not break: from la import stack
- Default binary operators (+, -, ...) no longer raise an error when no labels overlap

Bug fixes

- #590270 Index with 1d array bug: lar[1darray,:] worked; lar[1darray] crashed
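To make "alignment by label" concrete, here is a minimal sketch of adding two 1-d labeled arrays with an inner join, written with plain NumPy and Python lists. It does not use the la package and is not how la implements it; `align_inner` and `add_inner` are hypothetical names chosen for illustration.

```python
import numpy as np

def align_inner(data1, labels1, data2, labels2):
    """Align two 1-d labeled arrays on the labels they share (inner join).

    The order of the shared labels follows labels1.
    """
    pos2 = {lab: k for k, lab in enumerate(labels2)}       # label -> position in array 2
    idx1 = [k for k, lab in enumerate(labels1) if lab in pos2]
    common = [labels1[k] for k in idx1]
    idx2 = [pos2[lab] for lab in common]
    a = np.asarray(data1)[np.array(idx1, dtype=int)]
    b = np.asarray(data2)[np.array(idx2, dtype=int)]
    return a, b, common

def add_inner(data1, labels1, data2, labels2):
    """Add two labeled arrays after aligning them on their common labels."""
    a, b, labels = align_inner(data1, labels1, data2, labels2)
    return a + b, labels

total, labels = add_inner([1.0, 2.0, 3.0], ["a", "b", "c"],
                          [10.0, 20.0, 30.0], ["c", "a", "d"])
```

Here only the labels "a" and "c" appear in both inputs, so the sum is computed for those two and the unmatched labels are dropped. An outer join would instead keep every label and need a fill value for the gaps, which is the kind of choice the join and fill options in the release notes expose.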
Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
On Tue, Jul 6, 2010 at 7:47 AM, Joshua Holbrook josh.holbr...@gmail.com wrote:
> I really really really want to work on this. I already forked datarray on github and did some research on What Other People Have Done ( http://jesusabdullah.github.com/2010/07/02/datarray.html ). With any luck I'll contribute something actually useful. :)

I like the figure! To do label indexing on a larry you need to use lix, so lar.lix[...]
Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
On Tue, Jul 6, 2010 at 11:55 AM, Keith Goodman kwgood...@gmail.com wrote:
> On Tue, Jul 6, 2010 at 7:47 AM, Joshua Holbrook josh.holbr...@gmail.com wrote:
>> I really really really want to work on this. I already forked datarray on github and did some research on What Other People Have Done ( http://jesusabdullah.github.com/2010/07/02/datarray.html ). With any luck I'll contribute something actually useful. :)
>
> I like the figure! To do label indexing on a larry you need to use lix, so lar.lix[...]

FYI, if you didn't see it, there are also usage docs in dataarray/doc that you can build with sphinx that show a lot of the thinking and examples (they spent time looking at pandas and larry).

One question that was asked of Wes, and that I'd put to you as well, Keith: if DataArray became part of NumPy, do you think you could build larry on top of it?

Skipper
Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
I'm kinda-sorta still getting around to building/reading the sphinx docs for datarray. Like, I've gone through them before, but it was more cursory than I'd like. Honestly, I kinda let myself get caught up in trying to automate the process of getting them onto github pages.

I have to admit that I didn't 100% understand the reasoning behind not allowing integer ticks (I blame jet lag--it's a nice scapegoat). I believe it originally had to do with what you meant if you typed, say, A[3:'london']: did you mean the underlying ndarray index 3, or the outer level tick 3? I think if you didn't allow integers, then you could simply wrap your 3 in a string: A['3':'London'], so it's probably not a deal-breaker, but I would imagine that using (a) separate method(s) for label-based indexing may make allowing integer-datatyped labels feasible.

Thoughts?

--Josh

On Tue, Jul 6, 2010 at 8:23 AM, Keith Goodman kwgood...@gmail.com wrote:
> On Tue, Jul 6, 2010 at 9:13 AM, Skipper Seabold jsseab...@gmail.com wrote:
>> FYI, if you didn't see it, there are also usage docs in dataarray/doc that you can build with sphinx that show a lot of the thinking and examples (they spent time looking at pandas and larry).
>>
>> One question that was asked of Wes, that I'd propose to you as well Keith, is that if DataArray became part of NumPy, do you think you could use it to work on top of for larry?
>
> This is all very exciting. I did not know that DataArray had ticks so I never took a close look at it. After reading the sphinx doc, one question I had was how firm is the decision to not allow integer ticks? I use int ticks a lot.
Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
On Tue, Jul 6, 2010 at 8:42 AM, Skipper Seabold jsseab...@gmail.com wrote:
> On Tue, Jul 6, 2010 at 12:36 PM, Joshua Holbrook josh.holbr...@gmail.com wrote:
>> Thoughts?
>
> Would you mind bottom-posting/posting in-line to make the thread easier to follow?
>
>> On Tue, Jul 6, 2010 at 8:23 AM, Keith Goodman kwgood...@gmail.com wrote:
>>> After reading the sphinx doc, one question I had was how firm is the decision to not allow integer ticks? I use int ticks a lot.
>
> I think what Josh said is right. However, we proposed having all of the new labeled axis access pushed to a .aix (or whatever) method, so as to avoid any confusion, as the original object can be accessed just as an ndarray. I'm not sure where this leaves us vis-a-vis ints as ticks.
>
> Skipper

Sorry re: posting at-top. I guess habit surpassed observation of community norms for a second there. Whups!

My opinion on the matter is that, as a matter of purity, labels should all have the string datatype. That said, I'd imagine that passing an int as an argument would be fine, due to python's loosey-goosey attitude towards datatypes. :) That, or, y'know, str(myint).

--Josh
Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
On Tue, Jul 6, 2010 at 9:52 AM, Joshua Holbrook josh.holbr...@gmail.com wrote:
> My opinion on the matter is that, as a matter of purity, labels should all have the string datatype. That said, I'd imagine that passing an int as an argument would be fine, due to python's loosey-goosey attitude towards datatypes. :) That, or, y'know, str(myint).

Ideally (for me), the only requirement for ticks would be hashable and unique along any one axis. So, for example, datetime.date() could be a tick but a list could not be a tick (not hashable).
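The "hashable and unique along any one axis" requirement is easy to state in code. The sketch below is an illustration only, not code from datarray, larry or pandas: `validate_ticks` is a hypothetical helper that checks both conditions and returns the tick-to-position mapping that label-based indexing would need.

```python
import datetime

def validate_ticks(ticks):
    """Check that ticks are hashable and unique; return a tick -> position map."""
    ticks = list(ticks)
    positions = {}
    for pos, tick in enumerate(ticks):
        try:
            hash(tick)
        except TypeError:
            raise TypeError("tick %r is not hashable" % (tick,))
        if tick in positions:
            raise ValueError("duplicate tick %r" % (tick,))
        positions[tick] = pos
    return positions

# Dates, ints and strings all qualify as ticks under this rule.
mapping = validate_ticks([datetime.date(2010, 7, 6), 3, "london"])
```

Under this rule an integer tick such as `3` is accepted like any other hashable value, which is why a separate accessor for label-based indexing is needed to tell it apart from positional index 3.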
Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
On Tue, Jul 6, 2010 at 12:56 PM, Keith Goodman kwgood...@gmail.com wrote:
> Ideally (for me), the only requirement for ticks would be hashable and unique along any one axis. So, for example, datetime.date() could be a tick but a list could not be a tick (not hashable).

Gmail needs to really get its act together and enable bottom-posting by default. Definitely an annoyance.

There are many issues at play here so I wanted to give some of my thoughts re: building pandas, larry, etc. on top of DataArray (or whatever it is that makes its way into NumPy). I can put this on the wiki, too:

1. Giving semantic information to axes (not ticks, though)

I think this is very useful but wouldn't be immediately useful in pandas, except perhaps for moving axis names elsewhere (they are currently a part of the data structures and always have the same name). I wouldn't be immediately comfortable, say, making a pandas DataFrame a subclass of DataArray and making them implicitly interoperable. Going back and forth, e.g. between DataArray and DataFrame, *should* be an easy operation-- you could imagine using DataArray to serialize both pandas and larry objects, for example!

2. Container for axis metadata (Axis object in datarray, Index in pandas, ...)

I would be more than happy to offload the ordered set data structure onto NumPy. In pandas, Index is that container-- it's an ndarray subclass with a handful of methods and a reverse index (e.g. if you have ['d', 'b', 'a', 'c'] you have a dict somewhere with {'d' : 0, 'b' : 1,
Re: [Numpy-discussion] numpy on windows 64 bit
On 7/5/2010 4:19 AM, Robin wrote:
> On Mon, Jul 5, 2010 at 12:09 PM, David Cournapeau courn...@gmail.com wrote:
>> Short of saying what those failures are, we can't help you,
>
> Thanks for the reply... Somehow my message got truncated - I had written more detail about the errors!
>
>>> I noticed that on windows sys.maxint is the 32bit value (2147483647
>>
>> This is not surprising: sys.maxint gives you the max value of a long, which is 32 bits even on 64-bit Windows.
>
> I just got to figuring this out... But it causes some problems. The main one I'm having is that, I assume because of this, array shapes are longs instead of ints (i.e. x.shape[0] is a long). This breaks np.random.permutation(x.shape[1]), which I use all over the place (I opened a ticket for this, #1535).
>
> Something I asked in the previous mail that got lost is: what is the best cross-platform way of doing this? np.random.permutation(int(x.shape[1]))?

I proposed a fix at http://projects.scipy.org/numpy/ticket/1535. Does it work for you?

> Actually that and the problems with scipy.sparse (spsolve doesn't work) cover all of the errors I'm seeing... (I detailed those in a separate mail to the scipy list).

-- 
Christoph
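The workaround Robin suggests can be written as a small helper. This is a sketch in Python 3 syntax, where the int/long split discussed above no longer exists, so the explicit `int(...)` cast is simply harmless there; `permute_columns` and its `seed` parameter are illustrative names, not part of NumPy.

```python
import numpy as np

def permute_columns(x, seed=None):
    """Return x with its columns in random order, plus the order used."""
    rng = np.random.RandomState(seed)
    n_cols = int(x.shape[1])         # plain int, whatever integer type shape carries
    order = rng.permutation(n_cols)  # random permutation of 0 .. n_cols-1
    return x[:, order], order

x = np.arange(12).reshape(3, 4)
shuffled, order = permute_columns(x, seed=0)
```

Casting once at the call site keeps the code independent of which integer type the platform uses for array shapes.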
Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
Just to give a data point, my research group and I would be very excited at the idea of having Fernando's data arrays in NumPy. We can't offer to maintain it, because we are already fairly involved in machine learning and neuroimaging specific code, but we would be able to rely on it more in our packages, and we love it!

Gaël

On Mon, Jul 05, 2010 at 11:31:02PM -0500, Jonathan March wrote:
> Fernando Perez proposed a NumPy enhancement, an ndarray with named axes, prototyped as DataArray by him, Mike Trumpis, Jonathan Taylor, Matthew Brett, Kilian Koepsell and Stefan van der Walt.
>
> At SciPy 2010 on July 1, Fernando convened a BOF (Birds of a Feather) discussion of this proposal.
>
> The notes from this BOF can be found at: [1]http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes (linked from the Plans section of [2]http://projects.scipy.org/numpy )
>
> HELP NEEDED: Fernando does not have the resources to drive the project beyond this prototype, which already does what he needs. If this is to go anywhere, it needs people to do the work. Please step forward.
>
> References
>
> Visible links
> 1. http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes
> 2. http://projects.scipy.org/numpy

-- 
Gael Varoquaux
Research Fellow, INRIA
Laboratoire de Neuro-Imagerie Assistee par Ordinateur
NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-78-35
Mobile: ++ 33-6-28-25-64-62
http://gael-varoquaux.info
Re: [Numpy-discussion] numpy on windows 64 bit
On Tue, Jul 6, 2010 at 6:57 PM, Christoph Gohlke <cgoh...@uci.edu> wrote:
> I proposed a fix at http://projects.scipy.org/numpy/ticket/1535. Does it work for you?

Thanks very much... that looks great. Since it works with longs, it fixes my problems (I think it will also fix a couple of the failing scipy tests).

Cheers
Robin
Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
> My opinion on the matter is that, as a matter of purity, labels should all have the string datatype. That said, I'd imagine that passing an int as an argument would be fine, due to python's loosey-goosey attitude towards datatypes. :) That, or, y'know, str(myint).

That's kind of what I went for in sciexp2. Integers are maintained to index the structure, and strings are internally translated into the real integers or lists of them (e.g., a filter, see below).

All translation into the real integers happens in the Dimension object [1] (an Axis in datarray), which supports all the indexing methods in numpy (slices, iterables, etc), plus what I call filters (i.e., slicing by tick values) [2].

If you download the code, you can see the documentation for the user API in a nicer way with './sciexp2/trunk/plotter -d'.

After looking into [3], sciexp2 seems conceptually equivalent to datarray. The main difference I see is that sciexp2 supports compound ticks: for me, ticks are formed by a sequence of variables meaningful to the user, which are merged into a single unique string following a user-provided expression:

Dimension.expression - @par...@-@PARAM2@
Dimension.contents - [1-z1, 1-z2, 2-z1, 2-z5, ...]

So the user is able not only to index through tick strings (e.g., data[v1-z1]), but also to arbitrarily slice the structure according to each of the separate values of each variable (e.g., data[::PARAM1 = 3 PARAM2 == 'z6'] or any other boolean expression involving either or both of PARAM1 and PARAM2).

The other difference is that the Data object in sciexp2 also uses record arrays (but not recarrays, since the documentation mentions extra costs). The idea is that record fields contain the results of a single experiment, and experiment parameters (one variable for each experiment parameter) are arbitrarily mapped into axes/dimensions (thus, the values of experiment parameters form the ticks/indexes of that dimension).
This allows the user to store heterogeneous results on a single 'Data' object (e.g., mix integers, floats, strings, dates, etc).

As a final note, and as there is no formal documentation for the plotter part (only the API documentation), you can quickly test it with './sciexp2/plotter -i' (opens an IPython shell with everything imported). Then, suppose you have various csv files, with a header line describing each column, and path names are 'foo/bar-baz.results':

find_files(@FOO@/@b...@-@b...@.results)
extract(default_source, csv, count=LINE)
# build a Data with 1 dimension
data = from_rawdata(default_rawdata)
print data.ndim, data.dim().expression
print list(data.dim())
# reshape to multiple dimensions
rdata = data.reshape([FOO], [BAR, BAZ], [LINE])
print rdata.ndim, rdata.dim(0).expression, rdata.dim(1).expression
print list(rdata.dim(0))
print list(rdata.dim(1))
# now you can start playing with accesses to ticks (as returned by previous
# prints), lists of those, slices or filters (e.g., rdata[::FOO ==
# 'foo1'])
# you can also access record fields by means of 'data.name'
# if you put this in a file, simply execute './sciexp2/plotter -f file',
# and at the end:
shell()

apa!

Footnotes:
[1] https://projects.gso.ac.upc.edu/projects/sciexp2/repository/entry/trunk/sciexp2/data/__init__.py#L762
[2] https://projects.gso.ac.upc.edu/projects/sciexp2/repository/entry/trunk/sciexp2/data/__init__.py#L561
[3] http://jesusabdullah.github.com/2010/07/02/datarray.html

-- 
And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer.
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth
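To make the idea of compound ticks a bit more concrete, here is a toy sketch in plain Python. It is not the sciexp2 or datarray API: the class `LabeledAxis` and its methods `index` and `select` are invented for illustration, and the "-" separator simply mimics the expression quoted in the message above.

```python
class LabeledAxis:
    """Toy axis mapping compound tick strings to integer positions.

    Hypothetical helper for illustration only; not part of sciexp2 or datarray.
    """

    def __init__(self, variables, ticks):
        # variables: names of the user variables, e.g. ["PARAM1", "PARAM2"]
        # ticks: one tuple of values per position along the axis
        self.variables = list(variables)
        self.ticks = [tuple(t) for t in ticks]
        # Merge each tuple of values into a single string label.
        self.labels = ["-".join(str(v) for v in t) for t in self.ticks]
        self._index = {label: i for i, label in enumerate(self.labels)}

    def index(self, key):
        """Translate a tick string (or pass through an int) to a position."""
        if isinstance(key, int):
            return key
        return self._index[key]

    def select(self, predicate):
        """Return the positions whose variable values satisfy predicate."""
        return [i for i, t in enumerate(self.ticks)
                if predicate(dict(zip(self.variables, t)))]


axis = LabeledAxis(["PARAM1", "PARAM2"],
                   [(1, "z1"), (1, "z2"), (2, "z1"), (2, "z5")])
data = [10.0, 11.0, 12.0, 13.0]

by_label = data[axis.index("2-z1")]                 # index by tick string
picked = axis.select(lambda t: t["PARAM1"] == 2)    # filter by variable value
```

Integers still index the data directly, while strings and filters are resolved to integers in one place, which is roughly the division of labour the message describes for its Dimension object.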
Re: [Numpy-discussion] reverse cumsum?
On Tue, Jul 6, 2010 at 2:23 PM, Alan G Isaac <ais...@american.edu> wrote:
> On 7/6/2010 3:37 PM, Joshua Holbrook wrote:
>> In [10]: np.array(list(reversed(np.arange(10).cumsum())))
>> Out[10]: array([45, 36, 28, 21, 15, 10, 6, 3, 1, 0])
>
> That might appear to match the subject line but does not match the OP's example output, which was [45, 45, 44, 42, 39, 35, 30, 24, 17, 9].
>
> You are giving the equivalent of x.cumsum()[::-1], while the OP asked for the equivalent of x[::-1].cumsum()[::-1].
>
> fwiw,
> Alan Isaac

Oh snap. Good call--idk what I was thinking. Tired, I guess. :)

In that case, if you were going to use reversed(), things would get a bit nastier:

In [13]: np.array(list(reversed(np.array([9-i for i in xrange(10)]).cumsum())))
Out[13]: array([45, 45, 44, 42, 39, 35, 30, 24, 17, 9])

...which is gross enough that this approach is probably worth abandoning. I think Ken's suggestion may be the best so far...

I meant to say Alan's suggestion, i.e. x[::-1].cumsum()[::-1].
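For anyone skimming the thread, the difference between the two expressions Alan contrasts can be checked with a short sketch. The helper name `reverse_cumsum` is made up here; the input `np.arange(10)` is the one used in the thread.

```python
import numpy as np


def reverse_cumsum(x):
    """Cumulative sum accumulated from the end of the array towards the start."""
    x = np.asarray(x)
    # Reverse, accumulate, then reverse back so element i holds sum(x[i:]).
    return x[::-1].cumsum()[::-1]


x = np.arange(10)

result = reverse_cumsum(x)       # what the OP asked for
flipped = x.cumsum()[::-1]       # ordinary cumsum, merely displayed backwards
```

With this input, `result` is [45, 45, 44, 42, 39, 35, 30, 24, 17, 9] and `flipped` is [45, 36, 28, 21, 15, 10, 6, 3, 1, 0], matching the two outputs quoted above.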
Re: [Numpy-discussion] OT? Distutils extension with shared libs
On Wed, Jul 7, 2010 at 12:34 AM, si...@lma.cnrs-mrs.fr wrote:
> More precisely, the constructor provides C source code to access data and metadata with files ReadIM{7,x}.{c.h}. I wrote a tiny ctypes wrapper in order to have an object-oriented class in Python that handles reading the data files written by the constructor software.
>
> One issue is that ReadIM7.h includes zlib.h. On Linux, it is easy to install the zlib-dev package. All is simple, as it is installed in a standard directory. On Windows, of course, there are no standards. How would you then proceed? Do I have to distribute zlib.h, and also zconf.h, zlib.lib, libz.a and libz.dll.a, to get it to work?

Three solutions:
- ask your users to build the software and install zlib by themselves. On Windows, I am afraid it means you concretely limit your userbase to practically 0.
- build zlib as part of the build process, and keep zlib internally.
- include a copy of the zlib library (the binary) in the tarball.

> Other issue: with all these files in ./src, I have the following configuration:
>
> ext = Extension('_im7',
>                 sources=['src/ReadIM7.cpp', 'src/ReadIMX.cpp'],
>                 include_dirs=['src'],
>                 libraries=['zlib'],
>                 library_dirs=['src'],
>                 define_macros=[('_WIN32', None), ('BUILD_DLL', None)],
>                 extra_compile_args=['-ansi', '-pedantic', '-g', '-v'])
>
> It builds a _im7.pyd file that ctypes is not able to load, as it expects a _im7.dll file with ctypes.cdll.LoadLibrary('_im7')...

You cannot build a library loadable with ctypes with distutils or numpy.distutils. You need to implement it in distutils, or copy the code from one of the projects that implemented it.

cheers,
David
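Independently of how the library gets built, the loading side can be sketched as follows. This is a minimal sketch under stated assumptions: a plain shared library (not a Python extension module) has been built by some other means and shipped in a known directory, and the file naming follows a common per-platform convention. The helpers `shared_lib_filename` and `load_bundled_lib` are invented for illustration; only `ctypes.CDLL` is a real library call.

```python
import ctypes
import os
import sys


def shared_lib_filename(stem, platform=sys.platform):
    """Guess the file name of a shared library for the given platform.

    Assumed convention, for illustration only: 'stem.dll' on Windows,
    'libstem.dylib' on macOS, 'libstem.so' elsewhere.
    """
    if platform.startswith("win"):
        return stem + ".dll"
    if platform == "darwin":
        return "lib" + stem + ".dylib"
    return "lib" + stem + ".so"


def load_bundled_lib(stem, directory):
    """Load a shared library by explicit path instead of by bare name."""
    path = os.path.join(directory, shared_lib_filename(stem))
    # ctypes.CDLL raises OSError if the file is missing or cannot be loaded.
    return ctypes.CDLL(path)
```

Passing a full path sidesteps the platform's library search rules, so a copy of the library shipped inside the package directory is found regardless of what is installed system-wide.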
[Numpy-discussion] finfo.eps v. finfo.epsneg
np.finfo('float64').eps # returns a scalar
2.2204460492503131e-16
np.finfo('float64').epsneg # returns an array
array(1.1102230246251565e-16)

Bug or feature?

DG
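Whatever type the two attributes come back as, their values can be compared on an equal footing by converting both to plain Python floats. A small sketch, with variable names chosen here for illustration:

```python
import numpy as np

info = np.finfo(np.float64)

# float() accepts both a NumPy scalar and a 0-d array, so the
# scalar-versus-array difference disappears after conversion.
eps = float(info.eps)        # gap between 1.0 and the next larger float64
epsneg = float(info.epsneg)  # gap between 1.0 and the next smaller float64

# For IEEE double precision these are exact powers of two.
eps_is_2_pow_minus_52 = (eps == 2.0 ** -52)
epsneg_is_2_pow_minus_53 = (epsneg == 2.0 ** -53)
```

The two printed values in the message, 2.2204460492503131e-16 and 1.1102230246251565e-16, are these same powers of two.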