Re: [Numpy-discussion] debian benchmarks

2010-07-06 Thread Francesc Alted
On Monday 05 July 2010 15:32:51, Isaac Gouy wrote:
 Sturla Molden sturla at molden.no writes:
  It is also the kind of tasks where NumPy would help. It would be nice to
  get NumPy into the shootout. At least for the sake of advertising
 
 http://shootout.alioth.debian.org/u32/program.php?test=spectralnorm&lang=python&id=2

Let's join the game :-)  If I run the above version on my desktop computer 
(Intel E8600 Duo @ 3 GHz, DDR2 @ 800 MHz memory) I get:

$ time python -OO spectralnorm-numpy.py 5500
1.274224153

real    0m9.724s
user    0m9.295s
sys     0m0.269s

which should correspond to the 12.86s in shootout (so my machine is around 30% 
faster).  But, if I use ATLAS (3.9.25) so as to accelerate linear algebra:

$ python -OO spectralnorm-numpy.py 5500
1.274224153

real    0m5.862s
user    0m5.566s
sys     0m0.225s

Profiling then showed that building the M matrix took a lot of time.  After
using numexpr to improve this (see attached script), I get:

$ python -OO spectralnorm-numpy-numexpr.py 5500
1.274224153

real    0m3.333s
user    0m3.071s
sys     0m0.163s

Interestingly, memory consumption also dropped from 480 MB to 255 MB.
Finally, using Intel's MKL to take advantage of my 2 cores:

$ python -OO spectralnorm-numpy-numexpr.py 5500
1.274224153

real    0m2.785s
user    0m4.117s
sys     0m0.139s

which is a 3.5x improvement over the initial version.  This also seems to be
around 25% faster than the fastest version written in pure C in the
"interesting alternatives" section, with similar memory consumption:

http://shootout.alioth.debian.org/u32/performance.php?test=spectralnorm#about

I suppose that, given that Matlab also has a JIT and supports Intel's MKL,
it could beat this mark too.  Would any Matlab user accept the challenge?

-- 
Francesc Alted
# The Computer Language Benchmarks Game
# http://shootout.alioth.debian.org/
#
# Contributed by Sebastien Loisel
# Fixed by Isaac Gouy
# Sped up by Josh Goldfoot
# Dirtily sped up by Simon Descarpentries
# Sped up with numpy by Kittipong Piyawanno
# More speed-up with Numexpr by Francesc Alted

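# Python 2 script (xrange, print statement), as posted in 2010.

from sys import argv
import numpy as np
import numexpr as ne

def spectralnorm(n):
    u = np.matrix(np.ones(n))
    j = np.arange(n, dtype=np.float64)
    M = np.empty(n*n, dtype=np.float64).reshape(n,n)
    # numexpr takes the expression as a string and resolves i and j
    # from the calling frame each time it is evaluated
    expr = "1.0 / ((i + j) * (i + j + 1) / 2 + i + 1)"
    for i in xrange(n):
        M[i] = ne.evaluate(expr)   # fill row i of M
    MT = M.T
    for i in xrange(10):
        # two steps of power iteration with M^T M
        v = (u*MT)*M
        u = (v*MT)*M
    print "%0.9f" % ((u*v.T).sum() / (v*v.T).sum())**0.5

spectralnorm(int(argv[1]))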
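
For readers without numexpr, or on Python 3, the same computation can be sketched with plain NumPy broadcasting. This is an illustrative variant and not the script that was timed above: the name spectral_norm and the use of 1-D arrays instead of np.matrix are choices made here, and the broadcast expression allocates full n x n temporaries, which is the cost that numexpr avoids.

```python
import numpy as np

def spectral_norm(n, iterations=10):
    """Approximate the spectral norm of the n x n benchmark matrix."""
    i = np.arange(n, dtype=np.float64)[:, None]   # row index as a column vector
    j = np.arange(n, dtype=np.float64)[None, :]   # column index as a row vector
    # Same formula as the numexpr expression, evaluated for all (i, j) at once.
    M = 1.0 / ((i + j) * (i + j + 1) / 2 + i + 1)
    u = np.ones(n)
    for _ in range(iterations):
        v = M.T @ (M @ u)   # v = (M^T M) u
        u = M.T @ (M @ v)   # u = (M^T M) v
    return np.sqrt(u.dot(v) / v.dot(v))

if __name__ == "__main__":
    print("%0.9f" % spectral_norm(100))
```

Keeping the two matrix-vector products separate, rather than forming M.T @ M explicitly, keeps each iteration at O(n^2) work.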
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Ternary plots anywhere?

2010-07-06 Thread V. Armando Solé
Hi Ariel,

Ariel Rokem wrote:
 Hi Armando,

 Here's something in that direction:

 http://nature.berkeley.edu/~chlewis/Sourcecode.html 

 Hope that helps - Ariel

It really helps. It looks more complete than the only thing I had found 
(http://focacciaman.blogspot.com/2008/05/ternary-plotting-in-python-take-2.html)

Thanks a lot,

Armando




[Numpy-discussion] Yet another axes naming package [Was: Re: BOF notes: Fernando's proposal: NumPy ndarray with named axes]

2010-07-06 Thread Lluís
Jonathan March writes:

 Fernando Perez proposed a NumPy enhancement, an ndarray with named axes,
 prototyped as DataArray by him, Mike Trumpis, Jonathan Taylor, Matthew
 Brett, Kilian Koepsell and Stefan van der Walt.

I haven't had a thorough look into it, but this work as well as others listed in
the 'NdarrayWithNamedAxes' wiki page are similar in spirit to some numpy
extensions I've been developing.

You can find the code and some initial documentation at:
https://people.gso.ac.upc.edu/vilanova/doc/sciexp2

I was not planning to announce it until around 1.0, as the numpy structures are
still crude and lack some operations for dynamically extending the structure
both in shape and the number of fields on each record (I have some fixes that
still need to be committed), but after seeing some related announcements lately,
I think we all might benefit from trying to join ideas and efforts.

I'll try to briefly explain, with an example, the part that is related to numpy
(that is, the third frontend that appears in the User Guide: 'plotter', which
currently has documentation that is worse than poor).

Suppose you have a set of benchmarks that have been simulated with different
simulator parameters, such that you have one result file for each executed
combination of the variables:
  * benchmark
  * parameter1
  * parameter2

Of course, for each execution you'll also have multiple results (what I call
valuenames; simply fields in a record array, in fact).

NOTE: scripts for such executions can be generated with the first frontend
  ('launchgen').

Then you can find and extract those results (package 'sciexp2.gather') and
organize them into an N-dimensional 'Data' object (package 'sciexp2.data'),
where the first dimension has (for example) the combinations of
parameter1-parameter2 values, and the 2nd dimension contains one element for
each benchmark (method 'sciexp2.data.Data.reshape').

Now, you can index/slice the structure with integers (as always) _as well as_
with:
  * strings: simple indexing as well as slicing
  * filters: slicing with a stepping

These are translated into integers through the metadata (benchmark name and/or
values of the 2 parameters), stored in 'sciexp2.data.Dimension' objects.

For example, to get the numbers of tests where parameter1 is between 10 and 100
and just for benchmarks named 'bench1' and 'bench2':

   data[::10  parameter1  parameter1  100,[bench1, bench2]]
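
The translation from labels and filters to integer positions can be pictured with a small sketch. It is not sciexp2's API: the Dimension class below, its positions method, and the sample values are all hypothetical, the filter is written as a plain Python predicate rather than the string syntax used above, and "between 10 and 100" is assumed to be inclusive.

```python
import numpy as np

class Dimension:
    """Hypothetical stand-in: maps the labels of one axis to integer positions."""

    def __init__(self, labels):
        self.labels = list(labels)
        self._pos = {label: k for k, label in enumerate(self.labels)}

    def positions(self, selector):
        """Translate a label, a list of labels, or a predicate into integer indices."""
        if callable(selector):                      # filter: keep labels that pass
            return [k for k, lab in enumerate(self.labels) if selector(lab)]
        if isinstance(selector, (list, tuple)):     # several labels
            return [self._pos[lab] for lab in selector]
        return [self._pos[selector]]                # a single label

# First axis: values of parameter1; second axis: benchmark names.
param1 = Dimension([1, 10, 50, 100, 500])
bench = Dimension(["bench1", "bench2", "bench3"])
data = np.arange(15).reshape(5, 3)

rows = param1.positions(lambda p: 10 <= p <= 100)
cols = bench.positions(["bench1", "bench2"])
subset = data[np.ix_(rows, cols)]   # plain integer indexing on the ndarray
```

Once the metadata lookup has produced integers, everything else is ordinary ndarray indexing, so the labeled layer stays thin.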


There is a third package extending matplotlib that I have not uploaded (nor
fully developed) that is meant to use the dimension and record metadata in the
Data object, such that data can be easily plotted.

It extracts labels for axes and legends from metadata, and can expand
operations. For example:
  * Plot one figure for each benchmark simply declaring the figure as to be
expanded through the 'benchmark' variable.
  * Plot multiple lines/bars/whatever with a single plot command, like plot
such and such for each benchmark, or plot such and such for each
configuration and cluster by benchmark name.

More extensive examples can be seen on the following URL, which is from a much
older version that wasn't using numpy nor matplotlib, and provided a somewhat
functional API (SIZE, CPREFETCH, RPREFETCH and SIMULATOR are execution
parameters in these examples; fun starts at line 78):
  
https://projects.gso.ac.upc.edu/projects/sciexp2/repository/revisions/200/entry/progs/sciexp2/tags/0.5/plotter/examples/01-spec-figures.cfg


Finally, some things that have been bugging me about numpy are:

  * My 'Data' object is similar to a 'recarray', such that record elements
(what I call valuenames), can be accessed as attributes. But to avoid the
cost of a recarray, I use an ndarray with records.
This has the unfortunate effect that valuenames cannot be accessed as
attributes on a record, but only when it really is a 'Data' object.
Tried to add some methods to numpy.void from my python code to access record
fields as attributes, but of course that's not possible.

  * I'd like to associate extra information to dtype, instead of manually
carrying it around on every operation accessing a record field. Namely:
 * a description; such that it can be automatically used as axis/legend
   labels in matplotlib.
 * unit information; such that units of results can be automatically
   computed when operating with numpy, and later extracted when plotted with
   matplotlib.
   For this, existing packages like 'units' on PyPI could be used.

  * The ability to operate on whole records instead of separate record fields,
such that I can write:
    b = a[0] + a[1]
instead of:
    b_f1 = a[0]['f1'] + a[1]['f1']
    b_f2 = a[0]['f2'] + a[1]['f2']
whenever possible.
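
Until something like that exists, the per-field arithmetic can at least be wrapped once. The helper below is only a sketch of that workaround using standard NumPy structured arrays; the name add_records and the sample dtype are made up for illustration.

```python
import numpy as np

def add_records(x, y):
    """Add two records that share a dtype, field by field."""
    names = x.dtype.names
    # Build the result as a 0-d structured array with the same dtype.
    return np.array(tuple(x[name] + y[name] for name in names), dtype=x.dtype)

a = np.array([(1.0, 10), (2.5, 20)], dtype=[("f1", "f8"), ("f2", "i8")])
b = add_records(a[0], a[1])   # stands in for the desired b = a[0] + a[1]
```

The result keeps the record dtype, so b['f1'] and b['f2'] are still addressable by field name.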


Comments are welcome.

apa!

-- 
 And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer.
 -- The Princess of Pure Reason, as told by 

[Numpy-discussion] OT? Distutils extension with shared libs

2010-07-06 Thread Fabrice Silva
I know it is not directly related to numpy (even if it uses
numpy.distutils), but I would like to ask how you folks deal with code
depending on other libs.

In the libIM7 project ( https://launchpad.net/libim7 ), I wrap code from a
device manufacturer with ctypes in order to read Particle Image
Velocimetry (PIV) files stored by their software (formats im7 and vc7).

There is a dependency on zlib which is easy to solve on linux
(installing the zlib-dev package in debian). But as I also want to use it on
windows (sharing the commercial dongle amongst various colleagues is an
uncomfortable solution), I am trying to configure the setup.py for both
win and linux. But I am new to development on windows...

My questions are then:
- how do you deal with dependencies in distutils?
- what do you need to build against zlib (or another lib) in windows
using distutils ?
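
One common pattern, sketched here under assumptions, is to compute the link options per platform and hand them to the Extension definition. The keyword names libraries, include_dirs and library_dirs are the standard Extension arguments; the helper name, the zlib_root layout, the Windows library name and the module and source names in the comment are all hypothetical and would need adjusting to the real project.

```python
import sys

def zlib_build_options(platform=sys.platform, zlib_root=None):
    """Return Extension keyword arguments for linking against zlib.

    zlib_root is a hypothetical directory holding a prebuilt zlib on Windows,
    laid out as <zlib_root>/include and <zlib_root>/lib.
    """
    if platform.startswith("win"):
        options = {"libraries": ["zlib"]}          # assumed import library name
        if zlib_root is not None:
            options["include_dirs"] = [zlib_root + "/include"]
            options["library_dirs"] = [zlib_root + "/lib"]
        return options
    # On Linux the headers and libz come from the distribution's zlib dev package.
    return {"libraries": ["z"]}

# In setup.py the result would be passed on, for example:
#   Extension("libim7._im7", sources=["src/ReadIM7.c"], **zlib_build_options())
# (module and source names here are placeholders, not libim7's real layout).
```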

Thanks

Fabricio



[Numpy-discussion] reverse cumsum?

2010-07-06 Thread Ken Basye
Hi,
   Is there a simple way to get a cumsum in reverse order?  So far, the 
best I've come up with is to use fancy indexing twice to reverse things:

  >>> x = np.arange(10)
  >>> np.cumsum(x[np.arange(9, -1, -1)])[np.arange(9, -1, -1)]
  array([45, 45, 44, 42, 39, 35, 30, 24, 17,  9])

If it matters, I only care about the 1-d case at this point.

  Thanks,
  Ken
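
For the 1-d case, one simple alternative is to reverse with a negative-step slice instead of fancy indexing. The sketch below uses only standard NumPy slicing and cumsum; the variable name rev_cumsum is arbitrary.

```python
import numpy as np

x = np.arange(10)
# Reverse, accumulate, then reverse back; x[::-1] is a view, not a copy.
rev_cumsum = np.cumsum(x[::-1])[::-1]
print(rev_cumsum)
```

This gives the same array as the fancy-indexing version above without building the index arrays.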



Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-06 Thread Skipper Seabold
On Tue, Jul 6, 2010 at 10:47 AM, Joshua Holbrook
josh.holbr...@gmail.com wrote:
 I really really really want to work on this. I already forked datarray
 on github and did some research on What Other People Have Done (
 http://jesusabdullah.github.com/2010/07/02/datarray.html ). With any
 luck I'll contribute something actually useful. :)

 Anyways!

 --Josh


Thanks, Josh.  Also note this page here
http://scipy.org/StatisticalDataStructures that I think I already
mentioned to you.

The pandas and larry guys have spent a good deal of time discussing
this already, especially with respect to speed and timings (i.e.,
DataArray will most likely need to be optimized).  I think Keith
already has a benchmark script somewhere(?).  We have already had
discussions on the pystatsmodels mailing list over here
http://groups.google.ca/group/pystatsmodels, so you might want to
search a bit, though I think it's time to move the discussion to the
numpy list here.

And of course, please make use of the mailing list for design choices,
questions, and soliciting feedback, as I think this project is of
interest to many people.

Cheers,

Skipper


[Numpy-discussion] [ANN] la 0.4, the labeled array

2010-07-06 Thread Keith Goodman
The main class of the la package is a labeled array, larry. A larry
consists of data and labels. The data is stored as a NumPy array and
the labels as a list of lists (one list per dimension).

Alignment by label is automatic when you add (or subtract, multiply,
divide) two larrys.

The focus of this release was binary operations between unaligned
larrys with user control of the join method (five available) and the
fill method. A general binary function, la.binaryop(), was added as
were the convenience functions add, subtract, multiply, divide.
Supporting functions such as la.align(), which aligns two larrys, were
also added.
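
To make "alignment by label" concrete, here is a minimal sketch of an inner join on two 1-d labeled arrays. It does not use la and is not how la.align() is implemented; the function name align_inner, the restriction to one dimension, and the choice to keep the label order of the first input are assumptions made for illustration.

```python
import numpy as np

def align_inner(labels1, data1, labels2, data2):
    """Keep only labels present in both inputs, in the order of labels1."""
    pos2 = {label: k for k, label in enumerate(labels2)}
    common = [label for label in labels1 if label in pos2]
    idx1 = [k for k, label in enumerate(labels1) if label in pos2]
    idx2 = [pos2[label] for label in common]
    return common, data1[idx1], data2[idx2]

labels, x, y = align_inner(["a", "b", "c"], np.array([1.0, 2.0, 3.0]),
                           ["c", "a", "d"], np.array([30.0, 10.0, 40.0]))
total = x + y   # element-wise sum over the shared labels only
```

Other join methods would differ only in which labels are kept and in how the gaps are filled.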

download  http://pypi.python.org/pypi/la
doc  http://larry.sourceforge.net
code  http://github.com/kwgoodman/la
list1  http://groups.google.ca/group/pystatsmodels
list2  http://groups.google.com/group/labeled-array

RELEASE NOTES

New larry methods

- ismissing: A bool larry with element-wise marking of missing values
- take: A copy of the specified elements of a larry along an axis

New functions

- rand: Random samples from a uniform distribution
- randn: Random samples from a Gaussian distribution
- missing_marker: Return missing value marker for the given larry
- ismissing: A bool Numpy array with element-wise marking of missing values
- correlation: Correlation of two Numpy arrays along the specified axis
- split: Split into train and test data along given axis
- listmap_fill: Index map a list onto another and index of unmappable elements
- listmap_fill: Cython version of listmap_fill
- align: Align two larrys using one of five join methods
- info: la package information such as version number and HDF5 availability
- binaryop: Binary operation on two larrys with given function and join method
- add: Sum of two larrys using given join and fill methods
- subtract: Difference of two larrys using given join and fill methods
- multiply: Multiply two larrys element-wise using given join and fill methods
- divide: Divide two larrys element-wise using given join and fill methods

Enhancements

- listmap now has option to ignore unmappable elements instead of KeyError
- listmap.pyx now has option to ignore unmappable elements instead of KeyError
- larry.morph() is much faster as are methods, such as merge, that use it

Breakage from la 0.3

- Development moved from launchpad to github
- func.py and afunc.py renamed flarry.py and farray.py to match new flabel.py.
  Broke: from la.func import stack; Did not break: from la import stack
- Default binary operators (+, -, ...) no longer raise an error when no labels
  overlap

Bug fixes

- #590270 Index with 1d array bug: lar[1darray,:] worked; lar[1darray] crashed


Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-06 Thread Keith Goodman
On Tue, Jul 6, 2010 at 7:47 AM, Joshua Holbrook josh.holbr...@gmail.com wrote:
 I really really really want to work on this. I already forked datarray
 on github and did some research on What Other People Have Done (
 http://jesusabdullah.github.com/2010/07/02/datarray.html ). With any
 luck I'll contribute something actually useful. :)

I like the figure!

To do label indexing on a larry you need to use lix, so lar.lix[...]


Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-06 Thread Skipper Seabold
On Tue, Jul 6, 2010 at 11:55 AM, Keith Goodman kwgood...@gmail.com wrote:
 On Tue, Jul 6, 2010 at 7:47 AM, Joshua Holbrook josh.holbr...@gmail.com 
 wrote:
 I really really really want to work on this. I already forked datarray
 on github and did some research on What Other People Have Done (
 http://jesusabdullah.github.com/2010/07/02/datarray.html ). With any
 luck I'll contribute something actually useful. :)

 I like the figure!

 To do label indexing on a larry you need to use lix, so lar.lix[...]

FYI, if you didn't see it, there are also usage docs in dataarray/doc
that you can build with sphinx that show a lot of the thinking and
examples (they spent time looking at pandas and larry).

One question that was asked of Wes, and that I'd put to you as well,
Keith: if DataArray became part of NumPy, do you think you could
build larry on top of it?

Skipper


Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-06 Thread Joshua Holbrook
I'm kinda-sorta still getting around to building/reading the sphinx
docs for datarray. Like, I've gone through them before, but it was
more cursory than I'd like. Honestly, I kinda let myself get caught up
in trying to automate the process of getting them onto github pages.

I have to admit that I didn't 100% understand the reasoning behind not
allowing integer ticks (I blame jet lag--it's a nice scapegoat). I
believe it originally had to do with what you meant if you typed, say,
A[3:"london"]: did you mean the underlying ndarray index 3, or the
outer level tick 3? I think if you didn't allow integers, then you
could simply wrap your 3 in a string: A["3":"London"], so it's
probably not a deal-breaker, but I would imagine that using (a)
separate method(s) for label-based indexing may make allowing
integer-datatyped labels feasible.

Thoughts?

--Josh
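
The ambiguity with integer ticks, and the way a separate accessor sidesteps it, can be illustrated with a toy class. This is not datarray's or larry's API: the Labeled class and its by_label method are hypothetical names chosen for the sketch.

```python
import numpy as np

class Labeled:
    """Hypothetical 1-d labeled array: [] is positional, by_label() uses ticks."""

    def __init__(self, data, ticks):
        self.data = np.asarray(data)
        self._pos = {tick: k for k, tick in enumerate(ticks)}

    def __getitem__(self, index):
        return self.data[index]              # plain ndarray semantics

    def by_label(self, tick):
        return self.data[self._pos[tick]]    # tick lookup, integers allowed

arr = Labeled([10, 20, 30, 40], ticks=[3, 2, 1, 0])
positional = arr[3]          # element at position 3
labeled = arr.by_label(3)    # element whose tick is 3
```

With a single [] doing both jobs, the two lookups above could not be told apart; with two entry points, integer ticks stop being a problem.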

On Tue, Jul 6, 2010 at 8:23 AM, Keith Goodman kwgood...@gmail.com wrote:
 This is all very exciting. I did not know that DataArray had ticks so
 I never took a close look at it.

 After reading the sphinx doc, one question I had was how firm is the
 decision to not allow integer ticks? I use int ticks a lot.


Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-06 Thread Joshua Holbrook
On Tue, Jul 6, 2010 at 8:42 AM, Skipper Seabold jsseab...@gmail.com wrote:
 Would you mind bottom-posting/ posting in-line to make the thread
 easier to follow?


 I think what Josh said is right.  However, we proposed having all of
 the new labeled axis access pushed to a .aix (or whatever) method, so
 as to avoid any confusion, as the original object can be accessed just
 as an ndarray.  I'm not sure where this leaves us vis-a-vis ints as
 ticks.

 Skipper

Sorry re: posting at-top. I guess habit surpassed observation of
community norms for a second there. Whups!

My opinion on the matter is that, as a matter of purity, labels
should all have the string datatype. That said, I'd imagine that
passing an int as an argument would be fine, due to python's
loosey-goosey attitude towards datatypes. :) That, or, y'know,
str(myint).

--Josh


Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-06 Thread Keith Goodman
On Tue, Jul 6, 2010 at 9:52 AM, Joshua Holbrook josh.holbr...@gmail.com wrote:
 Sorry re: posting at-top. I guess habit surpassed observation of
 community norms for a second there. Whups!

 My opinion on the matter is that, as a matter of purity, labels
 should all have the string datatype. That said, I'd imagine that
 passing an int as an argument would be fine, due to python's
 loosey-goosey attitude towards datatypes. :) That, or, y'know,
 str(myint).

Ideally (for me), the only requirement for ticks would be hashable and
unique along any one axis. So, for example, datetime.date() could be a
tick but a list could not be a tick (not hashable).
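
That requirement is easy to state as a check. The function below is a sketch of such a validation for the ticks of one axis, written with the standard library only; check_ticks is a hypothetical name and is not part of larry or datarray.

```python
import datetime

def check_ticks(ticks):
    """Raise if any tick is unhashable or repeated along one axis."""
    seen = set()
    for tick in ticks:
        try:
            hash(tick)
        except TypeError:
            raise TypeError("tick %r is not hashable" % (tick,))
        if tick in seen:
            raise ValueError("tick %r appears more than once" % (tick,))
        seen.add(tick)
    return True

# Dates, ints and strings all pass; a list as a tick would raise TypeError.
ok = check_ticks([datetime.date(2010, 7, 6), 3, "london"])
```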


Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-06 Thread Wes McKinney
On Tue, Jul 6, 2010 at 12:56 PM, Keith Goodman kwgood...@gmail.com wrote:
 Ideally (for me), the only requirement for ticks would be hashable and
 unique along any one axis. So, for example, datetime.date() could be a
 tick but a list could not be a tick (not hashable).

Gmail really needs to get its act together and enable bottom-posting by
default. Definitely an annoyance.

There are many issues at play here so I wanted to give some of my
thoughts re: building pandas, larry, etc. on top of DataArray (or
whatever it is that makes its way into NumPy), can put this on the
wiki, too:

1. Giving semantic information to axes (not ticks, though)

I think this is very useful but wouldn't be immediately useful in
pandas except perhaps moving axis names elsewhere (which are currently
a part of the data-structures and always have the same name). I
wouldn't be immediately comfortable say, making a pandas DataFrame a
subclass of DataArray and making them implicitly interoperable. Going
back and forth e.g. from DataArray and DataFrame *should* be an easy
operation-- you could imagine using DataArray to serialize both pandas
and larry objects for example!

2. Container for axis metadata (Axis object in datarray, Index in pandas, ...)

I would be more than happy to offload the ordered set data structure
onto NumPy. In pandas, Index is that container-- it's an ndarray
subclass with a handful of methods and a reverse index (e.g. if you
have ['d', 'b', 'a', 'c'] you have a dict somewhere with {'d' : 0, 'b'
: 1, 
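
The reverse index mentioned above can be sketched in a few lines of plain Python. This only illustrates the idea of a label-to-position dict; it is not pandas' actual Index implementation, and the variable names are arbitrary.

```python
labels = ['d', 'b', 'a', 'c']
# Reverse index: label -> integer position, built once and reused for lookups.
reverse_index = {label: position for position, label in enumerate(labels)}
# Translating a list of labels into positions is then one dict lookup each.
positions = [reverse_index[label] for label in ['a', 'd']]
```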

Re: [Numpy-discussion] numpy on windows 64 bit

2010-07-06 Thread Christoph Gohlke


On 7/5/2010 4:19 AM, Robin wrote:
 On Mon, Jul 5, 2010 at 12:09 PM, David Cournapeau courn...@gmail.com wrote:

 Short of saying what those failures are, we can't help you,

 Thanks for reply... Somehow my message got truncated - I had written
 more detail about the errors!

 I noticed that on windows sys.maxint is the 32bit value (2147483647

 This is not surprising: sys.maxint gives you the max value of a long,
 which is 32 bits even on 64 bits on windows.

 I just got to figuring this out... But it makes some problems. The
 main one I'm having is that I assume because of this problem array
 shapes are longs instead of ints (ie x.shape[0] is a long).

 This breaks np.random.permutation(x.shape[1]) which I use all over the
 place (I opened a ticket for this, #1535). Something I asked in the
 previous mail that got lost is what is the best cross platform way of
 doing this?
 np.random.permutation(int(x.shape[1]))?

I proposed a fix at http://projects.scipy.org/numpy/ticket/1535. Does it 
work for you?
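
Independently of the fix in the ticket, the explicit coercion asked about above is a portable workaround. The sketch below shows it in isolation; the array shape is arbitrary, and on Python 3, where int and long are unified, the int() call is simply a no-op.

```python
import numpy as np

x = np.zeros((4, 6))
n_columns = int(x.shape[1])            # harmless where it is already an int
order = np.random.permutation(n_columns)
shuffled = x[:, order]                 # columns in random order
```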


 Actually that and the problems with scipy.sparse (spsolve doesn't
 work) cover all of the errors I'm seeing... (I detailed those in a
 separate mail to the scipy list).



--
Christoph


Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-06 Thread Gael Varoquaux

Just to give a data point, my research group and I would be very excited
at the idea of having Fernando's data arrays in Numpy. We can't offer to
maintain it, because we are already fairly involved in machine learning
and neuroimaging specific code, but we would be able to rely on it more
in our packages, and we love it!

Gaël

On Mon, Jul 05, 2010 at 11:31:02PM -0500, Jonathan March wrote:
Fernando Perez proposed a NumPy enhancement, an ndarray with named axes,
prototyped as DataArray by him, Mike Trumpis, Jonathan Taylor, Matthew
Brett, Kilian Koepsell and Stefan van der Walt.

At SciPy 2010 on July 1, Fernando convened a BOF (Birds of a Feather)
discussion of this proposal.

The notes from this BOF can be found at:
[1]http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes
(linked from the Plans section of [2]http://projects.scipy.org/numpy )

HELP NEEDED: Fernando does not have the resources to drive the project
beyond this prototype, which already does what he needs. If this is to go
anywhere, it needs people to do the work. Please step forward.

 References

Visible links
1. http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes
2. http://projects.scipy.org/numpy



-- 
Gael Varoquaux
Research Fellow, INRIA
Laboratoire de Neuro-Imagerie Assistee par Ordinateur
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone:  ++ 33-1-69-08-78-35
Mobile: ++ 33-6-28-25-64-62
http://gael-varoquaux.info


Re: [Numpy-discussion] numpy on windows 64 bit

2010-07-06 Thread Robin
On Tue, Jul 6, 2010 at 6:57 PM, Christoph Gohlke cgoh...@uci.edu wrote:

 I proposed a fix at http://projects.scipy.org/numpy/ticket/1535. Does it
 work for you?

Thanks very much... that looks great. Since it works with longs, it
fixes my problems (I think it will also fix a couple of the failing
scipy tests).

Cheers

Robin


Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

2010-07-06 Thread Lluís
 My opinion on the matter is that, as a matter of purity, labels
 should all have the string datatype. That said, I'd imagine that
 passing an int as an argument would be fine, due to python's
 loosey-goosey attitude towards datatypes. :) That, or, y'know,
 str(myint).

That's kind of what I went for in sciexp2. Integers are maintained to index the
structure, and strings are internally translated into the real integers or lists
of them (e.g., a filter, see below).

All translation into the real integers happens in the Dimension object [1] (an
Axis in datarray), which supports all the indexing methods in numpy (slices,
iterables, etc), plus what I call filters (i.e., slicing by tick values) [2]

If you download the code, you can see the documentation for the user API in a
nicer way with './sciexp2/trunk/plotter -d'.

After looking into [3], sciexp2 seems conceptually equivalent to datarray. The
main difference I see is that sciexp2 supports compound ticks, in the sense
that, for me, ticks are formed by a sequence of variables meaningful to the
user, which are merged into a single unique string following a user-provided
expression:

  Dimension.expression - @par...@-@PARAM2@
  Dimension.contents - [1-z1, 1-z2, 2-z1, 2-z5, ...]

So that the user is able not only to index through tick strings (e.g.,
data[v1-z1]), but also to arbitrarily slice the structure according to each of
the separate values of each variable (e.g., data[::PARAM1 = 3  PARAM2 ==
'z6'] or any other boolean expression involving any or both of PARAM1 and
PARAM2).
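
The compound-tick idea can be sketched roughly as below. This is a hypothetical illustration, not sciexp2's actual API: the Dimension class here, its index() and filter() methods, and the use of str.format templates are all assumptions made for the example.

```python
# Hypothetical sketch; names and behaviour are NOT taken from sciexp2.
class Dimension:
    def __init__(self, template, variables):
        # One dict of variable values per position along the axis.
        self.variables = list(variables)
        # Each tick is the template filled in with that position's values.
        self.ticks = [template.format(**v) for v in self.variables]
        # Reverse map: tick string -> integer position.
        self._pos = {t: i for i, t in enumerate(self.ticks)}

    def index(self, tick):
        """Translate a tick string into its integer position."""
        return self._pos[tick]

    def filter(self, predicate):
        """Return the positions whose variable values satisfy predicate."""
        return [i for i, v in enumerate(self.variables) if predicate(v)]


dim = Dimension("{PARAM1}-{PARAM2}", [
    {"PARAM1": 1, "PARAM2": "z1"},
    {"PARAM1": 1, "PARAM2": "z2"},
    {"PARAM1": 2, "PARAM2": "z1"},
    {"PARAM1": 2, "PARAM2": "z5"},
])
```

With this, dim.index("2-z1") translates a tick into an integer, and dim.filter(lambda v: v["PARAM1"] == 2) selects positions by the value of a single variable; either result can then be used to index an ordinary ndarray.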

The other difference is that the Data object in sciexp2 also uses record arrays
(but not recarrays, as the documentation talked about extra costs). The idea is
that record fields contain the results of a single experiment, and experiment
parameters (one variable for each experiment parameter) are arbitrarily mapped
into axis/dimensions (thus, the values of experiment parameters form the
ticks/indexes of that dimension). This allows the user to store heterogeneous
results on a single 'Data' object (e.g., mix integers, floats, strings, dates,
etc).
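
As a rough illustration of storing heterogeneous results per experiment, here is a plain NumPy structured (record) array. The field names and shape are made up for the example and say nothing about how sciexp2 lays out its Data object.

```python
import numpy as np

# Hypothetical result fields for one experiment: a float, an int and a string.
result_dtype = np.dtype([("time", "f8"), ("count", "i4"), ("status", "U8")])

# Two axes of experiment parameters (2 x 3 combinations), one record each.
results = np.zeros((2, 3), dtype=result_dtype)
results[0, 1] = (1.5, 42, "ok")

# Selecting a field gives a homogeneous array with the same shape.
times = results["time"]
```

Indexing by position selects an experiment; indexing by field name selects one kind of result across all experiments.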

As a final note, and as there is no formal documentation for the plotter part
(only the API documentation), you can quickly test it with './sciexp2/plotter
-i' (opens an IPython shell with everything imported).

Then, suppose you have various csv files, with a header line describing each
column, and path names are 'foo/bar-baz.results':

 find_files(@FOO@/@b...@-@b...@.results)
 extract(default_source, csv, count=LINE)

 # build a Data with 1 dimension
 data = from_rawdata(default_rawdata)
 print data.ndim, data.dim().expression
 print list(data.dim())

 # reshape to multiple dimensions
 rdata = data.reshape([FOO], [BAR, BAZ], [LINE])
 print rdata.ndim, rdata.dim(0).expression, rdata.dim(1).expression
 print list(rdata.dim(0))
 print list(rdata.dim(1))

 # now you can start playing with accesses to ticks (as returned by previous
 # prints), lists of those, slices or filters (e.g., rdata[::FOO ==
 # 'foo1'])

 # you can also access record fields by means of 'data.name'

 # if you put this in a file, simply execute './sciexp2/plotter -f file',
 # and at the end:
 shell()


apa!

Footnotes:
[1]  https://projects.gso.ac.upc.edu/projects/sciexp2/repository/entry/trunk/sciexp2/data/__init__.py#L762
[2]  https://projects.gso.ac.upc.edu/projects/sciexp2/repository/entry/trunk/sciexp2/data/__init__.py#L561
[3]  http://jesusabdullah.github.com/2010/07/02/datarray.html

-- 
 And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer.
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


Re: [Numpy-discussion] reverse cumsum?

2010-07-06 Thread Joshua Holbrook
On Tue, Jul 6, 2010 at 2:23 PM, Alan G Isaac ais...@american.edu wrote:
 On 7/6/2010 3:37 PM, Joshua Holbrook wrote:
 In [10]: np.array(list(reversed(np.arange(10).cumsum())))
 Out[10]: array([45, 36, 28, 21, 15, 10,  6,  3,  1,  0])



 That might appear to match the subject line
 but does not match the OP's example output,
 which was [45, 45, 44, 42, 39, 35, 30, 24, 17,  9].

 You are giving the equivalent of x.cumsum()[::-1],
 while the OP asked for the equivalent of x[::-1].cumsum()[::-1].

 fwiw,
 Alan Isaac


Oh snap. Good call--idk what I was thinking. Tired, I guess. :) In
that case, if you were going to use reversed() things would get a bit
nastier:

In [13]: np.array(list(reversed(np.array([9-i for i in xrange(10)]).cumsum())))
Out[13]: array([45, 45, 44, 42, 39, 35, 30, 24, 17,  9])

...which is gross enough that this approach is probably worth abandoning.

 I think Ken's suggestion may be the best so far...

I meant to say Alan's suggestion, i.e. x[::-1].cumsum()[::-1].
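
For the record, a short sketch comparing the two expressions discussed in this thread on x = np.arange(10). The third line is an equivalent formulation added only as a cross-check (total minus the forward cumsum, plus x); it was not proposed in the thread.

```python
import numpy as np

x = np.arange(10)

# Forward cumsum, then flipped: NOT what the OP asked for.
forward_then_flipped = x.cumsum()[::-1]

# Reverse cumsum: element i holds the sum of x[i:].
reverse_cumsum = x[::-1].cumsum()[::-1]

# Cross-check via the identity sum(x[i:]) == sum(x) - sum(x[:i+1]) + x[i].
alternative = x.sum() - x.cumsum() + x
```

Only the second and third expressions reproduce the OP's expected output.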


Re: [Numpy-discussion] OT? Distutils extension with shared libs

2010-07-06 Thread David Cournapeau
On Wed, Jul 7, 2010 at 12:34 AM,  si...@lma.cnrs-mrs.fr wrote:
 More precisely, the constructor provides C source code to access data
 and metadata with files ReadIM{7,x}.{c,h}.
 I wrote a tiny ctypes wrapper in order to have an object-oriented
 class in python that handles reading the data files written by the
 constructor software.

 One issue is that ReadIM7.h includes zlib.h. On linux, it is easy to
 install the zlib-dev package. All is simple, as it is installed in a
 standard directory. On windows, of course, there is no standard
 location. How would you then proceed? Do I have to distribute zlib.h,
 and also zconf.h, zlib.lib, libz.a and libz.dll.a needed to get it to work?

Three solutions:
 - ask your users to build the software and install zlib by
themselves. On windows, I am afraid this effectively limits your
userbase to practically zero.
 - build zlib as part of the build process, and keep zlib internally.
 - include a copy of the zlib library (the binary) in the tarball.


 Other issue: with all these files in ./src, I have the following
 configuration:
     ext = Extension('_im7',
         sources=['src/ReadIM7.cpp', 'src/ReadIMX.cpp'],
         include_dirs=['src'],\
         libraries=['zlib',],\
         library_dirs=['src'],\
         define_macros=[('_WIN32', None), ('BUILD_DLL', None)], \
         extra_compile_args=['-ansi', '-pedantic', '-g', '-v'])

 it builds a _im7.pyd file that ctypes is not able to load as it
 expects a _im7.dll file with
 ctypes.cdll.LoadLibrary('_im7')...

You cannot build a library loadable with ctypes with distutils or
numpy.distutils. You need to implement it in distutils, or copy the
code from one of the projects that implemented it.
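
On the loading side, a minimal sketch of opening a bundled shared library by explicit path with ctypes. It assumes the library has already been built by some other means and shipped next to the Python package; the helper names shared_library_name and load_bundled_library, and the stem 'im7', are hypothetical.

```python
import ctypes
import os
import sys


def shared_library_name(stem):
    """Return the conventional shared-library file name for this platform."""
    if sys.platform.startswith("win"):
        return stem + ".dll"
    if sys.platform == "darwin":
        return "lib" + stem + ".dylib"
    return "lib" + stem + ".so"


def load_bundled_library(stem, directory):
    """Load a shared library from an explicit directory via ctypes.CDLL."""
    path = os.path.join(directory, shared_library_name(stem))
    return ctypes.CDLL(path)
```

A wrapper module could then call load_bundled_library('im7', os.path.dirname(__file__)), which avoids depending on the system library search path.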

cheers,

David


[Numpy-discussion] finfo.eps v. finfo.epsneg

2010-07-06 Thread David Goldsmith
>>> np.finfo('float64').eps # returns a scalar
2.2204460492503131e-16
>>> np.finfo('float64').epsneg # returns an array
array(1.1102230246251565e-16)

Bug or feature?
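
Independent of the return types, a small sketch of what the two values mean for float64. Wrapping each in float() sidesteps the scalar-versus-array difference; the variable names are only for illustration.

```python
import numpy as np

info = np.finfo('float64')

# Coerce to plain Python floats so the container type does not matter.
eps = float(info.eps)        # gap between 1.0 and the next larger float
epsneg = float(info.epsneg)  # gap between 1.0 and the next smaller float

above_one = 1.0 + eps
below_one = 1.0 - epsneg
```

For float64, eps is 2**-52 and epsneg is 2**-53, which matches the printed values above.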

DG