[Numpy-discussion] Datarray BoF, part2

2010-07-21 Thread Keith Goodman
About a dozen people attended what was billed as a continuation of the
SciPy 2010 datarray BoF. We met at UC Berkeley on July 19 as part of
the py4science series.

A datarray is a subclass of a Numpy array that adds the ability to
label the axes and to label the elements along each axis.

We spent most of the time discussing how to index with tick labels.
The main issue is with integers: is an integer index a tick name or a
position index?

At the top level, datarrays always use regular Numpy indexing: an int
is a position, never a label. So darr[0] always returns the first
element of the datarray.

The ambiguity occurs in specialized indexing methods that allow
indexing by tick label name (because the name could be an int). To
break the ambiguity, the proposal was to provide several tick indexing
methods[1]:

1. Integers are always labels
2. Integers are never treated as labels
3. Try 1, then 2
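
The three rules above can be sketched as separate indexer methods. All names here (Axis1D, pos_from_label, etc.) are hypothetical illustrations, not the actual datarray API:

```python
# Sketch of the three proposed tick-indexing rules as separate methods.
# Names are made up for the demo; this is not the datarray API itself.

class Axis1D:
    def __init__(self, ticks):
        self.ticks = list(ticks)
        self._tick_dict = {t: i for i, t in enumerate(ticks)}  # tick -> position

    def pos_from_label(self, key):
        # Rule 1: the key is always a tick label, even if it is an int.
        return self._tick_dict[key]

    def pos_from_position(self, key):
        # Rule 2: the key is always a position, never a label.
        return key

    def pos_mixed(self, key):
        # Rule 3: try the label interpretation first, fall back to position.
        try:
            return self._tick_dict[key]
        except (KeyError, TypeError):
            return key

ax = Axis1D([10, 20, 30])             # integer ticks: the ambiguous case
data = ['a', 'b', 'c']
print(data[ax.pos_from_label(20)])    # 20 is a label -> position 1 -> 'b'
print(data[ax.pos_from_position(2)])  # 2 is a position -> 'c'
print(data[ax.pos_mixed(10)])         # 10 matches a label first -> 'a'
```

With integer ticks, rule 3 silently changes meaning depending on whether the key happens to be a tick, which is why separate explicit methods were proposed.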

We also discussed allowing axis labels to be any hashable object
(currently only strings are allowed). The main problem: integers.
Currently if an axis is labeled, say, 'time', you can do
darr.sum(axis='time'). What happens when an axis is labeled with an
int? What does the 2 in darr.sum(axis=2) refer to? A position or a
label? The same problem exists for floats, since a float is (currently)
a valid axis argument for NumPy arrays.
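
A minimal sketch of why string-only axis labels keep axis= unambiguous. resolve_axis is a hypothetical helper, not part of datarray:

```python
import numpy as np

def resolve_axis(names, axis):
    """Map an axis argument to a positional axis number.

    If `axis` is a string, it must be one of the axis names; anything
    else is passed through as a plain NumPy positional axis.  Allowing
    int (or float) axis names would make the two cases indistinguishable.
    """
    if isinstance(axis, str):
        return names.index(axis)
    return axis

names = ('time', 'space')
arr = np.arange(6).reshape(2, 3)
print(arr.sum(axis=resolve_axis(names, 'time')))  # name -> axis 0
print(arr.sum(axis=resolve_axis(names, 1)))       # plain positional axis
```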

References:
[1] 
http://github.com/fperez/datarray/commit/3c5151baa233675b355058eb3ba028d2629bece5
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Datarray BoF, part2

2010-07-21 Thread John Salvatier
I don't really know much about this topic, but what about a flag at array
creation time (or whenever you define labels) that says whether valid
indexes will be treated as labels or indexes for that array?



Re: [Numpy-discussion] Datarray BoF, part2

2010-07-21 Thread M Trumpis
On Wed, Jul 21, 2010 at 10:08 AM, Keith Goodman kwgood...@gmail.com wrote:
 On Wed, Jul 21, 2010 at 9:56 AM, John Salvatier
 jsalv...@u.washington.edu wrote:
 I don't really know much about this topic, but what about a flag at array
 creation time (or whenever you define labels) that says whether valid
 indexes will be treated as labels or indexes for that array?

 It's an interesting idea. My guess is that you'd end up having to
 check the attribute all the time when writing code:

 if dar.intaslabel:
    dar2 = dar[tickmap(i)]
 else:
    dar2 = dar[i]

My thoughts too... it's dangerous to simply have a toggle that changes
what darr.slicingobject[2] means. Much safer to have slicing options
that always apply consistent rules.

Separately, regarding the permissible axis labels, I think we must not
allow any enumerated axis labels (i.e., ints and floats). I don't
remember if there was a consensus about that yesterday. We don't have
the flexibility in the ndarray API to allow the expression
darr.method(axis=2) to mean not the 2nd dimension, but the Axis with
label==2.

Mike



Re: [Numpy-discussion] Datarray BoF, part2

2010-07-21 Thread Keith Goodman
On Wed, Jul 21, 2010 at 10:58 AM, M Trumpis mtrum...@berkeley.edu wrote:

 Separately, regarding the permissible axis labels, I think we must not
 allow any enumerated axis labels (ie, ints and floats). I don't
 remember if there was a consensus about that yesterday. We don't have
 the flexibility in the ndarray API to allow for the expression
 darr.method(axis=2) to mean not the 2nd dimension, but the Axis with
 label==2

So the axis label rule could be either:

1. str only
2. Any hashable object except int or float

#1 is looking better and better. Plus you already coded it :)


Re: [Numpy-discussion] Datarray BoF, part2

2010-07-21 Thread Keith Goodman
On Wed, Jul 21, 2010 at 11:41 AM, Vincent Davis
vinc...@vincentdavis.net wrote:
 On Wed, Jul 21, 2010 at 11:08 AM, Keith Goodman kwgood...@gmail.com wrote:
 On Wed, Jul 21, 2010 at 9:56 AM, John Salvatier
 jsalv...@u.washington.edu wrote:
 I don't really know much about this topic, but what about a flag at array
 creation time (or whenever you define labels) that says whether valid
 indexes will be treated as labels or indexes for that array?

 It's an interesting idea. My guess is that you'd end up having to
 check the attribute all the time when writing code:

 if dar.intaslabel:
    dar2 = dar[tickmap(i)]
 else:
    dar2 = dar[i]

 Obviously there are several aspects of the labels that need to be
 considered. An important one is whether an operation breaks the meaning
 of the labels. I like the idea of tickmap(i); it could have a lot of
 features, like grouping... Maybe it could even work on structured arrays.
 The flag could connect the tickmap() to the array. Then, if an operation
 performed on the array would result in the labels no longer being
 meaningful, the flag would change. In this way tickmap(i) checks the
 flags, and each axis could have a flag.
 (I am sure there is lots I am missing)

Each axis currently has a tick map: Axis._tick_dict. From a datarray
you can access it like this:

>>> from datarray import DataArray
>>> x = DataArray([1,2,3], labels=[('time', ['n1', 'n2', 'n3'])])
>>> x.axis.time._tick_dict
{'n1': 0, 'n2': 1, 'n3': 2}
>>> x.axes[0]._tick_dict
{'n1': 0, 'n2': 1, 'n3': 2}
>>> x.axis['time']._tick_dict
{'n1': 0, 'n2': 1, 'n3': 2}

On a separate note, I think having both axis and axes attribute is
confusing. Would it be possible to only have one of them? Here's a
proposal: 
http://github.com/fperez/datarray/commit/01b2d3d2082ade38ec89dbca0c070dd4fc9d9ca3#L1R330


[Numpy-discussion] recarray slow?

2010-07-21 Thread wheres pythonmonks
I have a recarray -- the first column is date.

I have the following function to compute the number of unique dates in
my data set:


def byName(): return(len(list(set(d['Date'])) ))

Question:  is the string 'Date' looked up at each iteration?  If so,
this is dumb, but explains my horrible performance.
Or, is there a better way to code the above?

Can I convert this to something indexed by column number and convert
'Date' to column number 0 upfront?  Would this help with speed?

W


Re: [Numpy-discussion] recarray slow?

2010-07-21 Thread Robert Kern
On Wed, Jul 21, 2010 at 15:12, wheres pythonmonks
wherespythonmo...@gmail.com wrote:
 I have an recarray -- the first column is date.

 I have the following function to compute the number of unique dates in
 my data set:


 def byName(): return(len(list(set(d['Date'])) ))

 Question:  is the string 'Date' looked up at each iteration?  If so,
 this is dumb, but explains my horrible performance.
 Or, is there a better way to code the above?

len(np.unique(d['Date']))

If you can come up with a self-contained example that we can
benchmark, it would help. In my examples, I don't see any hideous
performance, but my examples may be missing some crucially important
detail about your data that is causing your performance problems.
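
A self-contained sketch of the suggestion, on a made-up structured array (the field names and values here are invented for the demo):

```python
import numpy as np

# Count unique dates in a structured array without boxing each element
# into a Python object.  The array and its fields are made up for the demo.
d = np.array([('2010-07-19', 1.0),
              ('2010-07-20', 2.0),
              ('2010-07-19', 3.0),
              ('2010-07-21', 4.0)],
             dtype=[('Date', 'U10'), ('Value', 'f8')])

n_python = len(set(d['Date']))       # boxes every element; slow on big arrays
n_numpy = len(np.unique(d['Date']))  # sorts in C; stays inside NumPy
print(n_python, n_numpy)             # both report 3 unique dates
```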

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] recarray slow?

2010-07-21 Thread Pauli Virtanen
Wed, 21 Jul 2010 15:12:14 -0400, wheres pythonmonks wrote:

 I have an recarray -- the first column is date.
 
 I have the following function to compute the number of unique dates in
 my data set:
 
 
 def byName(): return(len(list(set(d['Date'])) ))

What this code does is:

1. d['Date']

   Extract an array slice containing the dates. This is fast.

2. set(d['Date'])

   Make copies of each array item, and box them into Python objects. 
   This is slow.

   Insert each of the objects in the set. Also this is somewhat slow.

3. list(set(d['Date']))

   Get each item in the set, and insert them to a new list.
   This is somewhat slow, and unnecessary if you only want to
   count.

4. len(list(set(d['Date'])))


So the slowness arises because the code is copying data around, and 
boxing it into Python objects.

You should try using Numpy functions (these don't re-box the data) to do 
this. http://docs.scipy.org/doc/numpy/reference/routines.set.html

-- 
Pauli Virtanen



Re: [Numpy-discussion] recarray slow?

2010-07-21 Thread wheres pythonmonks
Thank you very much -- better crack open a NumPy reference manual
instead of relying on my Python intuition.



Re: [Numpy-discussion] doc string linalg.solve -- works for b is matrix

2010-07-21 Thread Mark Bakker
I am using 1.3.0.
Glad to hear it is correct in 1.4.0
Sorry for bothering you with an old version, but I am very happy with this
feature!
Mark


 What version of numpy are you using?  That docstring was updated in that
 fashion about 8 mo. ago (at least in the Wiki; I'm not sure exactly when it
 was merged, but it does appear that way in version 1.4.0).

 DG

 I am using linalg.solve to solve a system of linear equations.  As I have
 to
  solve multiple systems with the same matrix, but different right-hand
 sides,
  I tried to make the right-hand side a matrix and that works fine.
  So the docstring should say:
 
  Solve the equation ``a x = b`` for ``x``.
 
  Parameters
  ----------
  a : array_like, shape (M, M)
  Input equation coefficients.
  b : array_like, shape (M,) or array_like, shape (M,N)
  N can be arbitrary size
  Equation target values.
 
  Returns
  -------
  x : array, shape (M,) or array, shape (M,N)
 
  Thanks,
 
  Mark




Re: [Numpy-discussion] recarray slow?

2010-07-21 Thread wheres pythonmonks
However: is there an automatic way to convert a named index to a position?

What about looping over tuples of my recarray:

for t in d:
date = t['Date']


I guess that the above does have to look up 'Date' each time.
But the following does not need the hash lookup for each tuple:

for t in d:
date = t[0]


Should I create a map from dtype.names, and use that to look up the
index based on the name in advance? (If I really, really want to
factor out the lookup of 'Date'.)



Re: [Numpy-discussion] recarray slow?

2010-07-21 Thread Pierre GM

On Jul 21, 2010, at 4:22 PM, wheres pythonmonks wrote:

 However: is there an automatic way to convert a named index to a position?
 
 What about looping over tuples of my recarray:
 
 for t in d:
date = t['Date']


Why don't you use zip?

 for (date, t) in zip(d['Date'], d):
     ...

That way, you save repetitive calls to __getitem__.

 Should I create a map from dtype.names(), and use that to look up the
 index based on the name in advance?  (if I really really want to
 factorize out the lookup of 'Date']


Meh. I have a bad feeling that it won't really be performant.



Re: [Numpy-discussion] recarray slow?

2010-07-21 Thread wheres pythonmonks
What about:

idx_by_name = dict(enumerate(d.dtype.names))

Then I can look up the index of the columns I want before the loop,
and then access by the index during the loop.

- W


Re: [Numpy-discussion] recarray slow?

2010-07-21 Thread Pierre GM

On Jul 21, 2010, at 4:35 PM, wheres pythonmonks wrote:

 What about:
 
 idx_by_name = dict(enumerate(d.dtype.names))
 
 Then I can look up the index of the columns I want before the loop,
 and then access by the index during the loop.

Sure. Why don't you try both approaches, time them, and document it?
I still bet that manipulating tuples of numbers might be easier and more
performant than juggling w/ the fields of a numpy.void, but that's a gut
feeling only...


Re: [Numpy-discussion] recarray slow?

2010-07-21 Thread wheres pythonmonks
My code had a bug:

idx_by_name = dict((n,i) for i,n in enumerate(d.dtype.names))
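
A short demonstration of the bug and the fix: dict(enumerate(names)) maps position to name, while swapping the pair gives the intended name-to-position map. The field names below are made up for the demo:

```python
# dict(enumerate(names)) is keyed by position; the corrected version
# is keyed by name, which is what the lookup-before-the-loop needs.
names = ('Date', 'Open', 'Close')

wrong = dict(enumerate(names))                     # {0: 'Date', 1: 'Open', ...}
right = dict((n, i) for i, n in enumerate(names))  # {'Date': 0, 'Open': 1, ...}

print(wrong[0])       # 'Date' -- keyed by position, not by name
print(right['Date'])  # 0      -- the lookup the loop actually wants
```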



On Wed, Jul 21, 2010 at 4:49 PM, Pauli Virtanen p...@iki.fi wrote:
 Wed, 21 Jul 2010 16:22:37 -0400, wheres pythonmonks wrote:
 However: is there an automatic way to convert a named index to a
 position?

 It's not really a named index -- it's a field name. Since the fields of
 an array element can be of different size, they cannot be referred to
 with an array index (in the sense that Numpy understands the concept).

 What about looping over tuples of my recarray:

 for t in d:
     date = t['Date']
     

 I guess that the above does have to lookup 'Date' each time.

 As Pierre said, you can move the lookups outside the loop.

        for date in t['Date']:
            ...

 If you want to iterate over multiple fields, it may be best to use
 itertools.izip so that you unbox a single element at a time.

 However, I'd be quite surprised if the hash lookups would actually take a
 significant part of the run time:

 1) Python dictionaries are ubiquitous and the implementation appears
   heavily optimized to be fast with strings.

 2) The hash of a Python string is cached, and computed only once.

 3) String literals are interned, and represented by a single object only:

     >>> 'Date' is 'Date'
     True

   So when running the above Python code, the hash of 'Date' is computed
   exactly once.

 4) For small dictionaries containing strings, such as the fields
   dictionary, I'd expect 1-3) to be dwarfed by the overhead involved
   in making Python function calls (PyArg_*) and interpreting the
   bytecode.

 So as the usual optimization mantra applies here: measure first :)

 Of course, if you measure and show that the expectations 1-4) are
 actually wrong, that's fine.

 --
 Pauli Virtanen



Re: [Numpy-discussion] Datarray BoF, part2

2010-07-21 Thread Rob Speer
I agree with the idea that axis labels must be strings.

Yes, this is the opposite of my position on tick labels (names), but
there's a reason: ticks are often defined by whatever data you happen
to be working with, but axis labels will in the vast majority of
situations be defined by the programmer as they're writing the code.
If the programmer wants to name something, they'll certainly be able
to do so with a string.

-- Rob



Re: [Numpy-discussion] Datarray BoF, part2

2010-07-21 Thread Skipper Seabold
+1 This is what I was thinking as well.

Skipper


Re: [Numpy-discussion] Datarray BoF, part2

2010-07-21 Thread Joshua Holbrook
Make that +2.

--Josh

On Wed, Jul 21, 2010 at 1:38 PM, Skipper Seabold jsseab...@gmail.com wrote:
 On Wed, Jul 21, 2010 at 5:32 PM, Rob Speer rsp...@mit.edu wrote:
 I agree with the idea that axis labels must be strings.

 Yes, this is the opposite of my position on tick labels (names), but
 there's a reason: ticks are often defined by whatever data you happen
 to be working with, but axis labels will in the vast majority of
 situations be defined by the programmer as they're writing the code.
 If the programmer wants to name something, they'll certainly be able
 to do so with a string.


 +1 This is what I was thinking as well.

 Skipper
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Datarray BoF, part2

2010-07-21 Thread Keith Goodman
What started the discussion was that someone wanted to have more than
one label name for one axis. So I suggested that if we allow any
hashable object as an axis label then, for example, a tuple could be
used to hold multiple names.

That would also allow a datarray to be flattened to 1d since the axis
labels could be combined into a tuple. So a 2d datarray with axis
names time and distance and ticks 't1', 't2' and 'd1', 'd2' could
flatten to

axis -- ('time', 'distance')
ticks -- [('t1', 'd1'), ('t1', 'd2'), ('t2', 'd1'), ('t2', 'd2')]

An unflatten function along with a fill value could unflatten the datarray.
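
The flattening above can be sketched in plain Python with itertools.product; this is an illustration of the idea, not datarray code:

```python
from itertools import product

# Combine per-axis names and ticks into single tuples for a 1d view.
axis_names = ('time', 'distance')
ticks = (('t1', 't2'), ('d1', 'd2'))

flat_axis = axis_names              # ('time', 'distance')
flat_ticks = list(product(*ticks))  # row-major order, matching ravel()

print(flat_axis)
print(flat_ticks)
# [('t1', 'd1'), ('t1', 'd2'), ('t2', 'd1'), ('t2', 'd2')]
```

An unflatten step would just split the tuples back apart and reshape.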


Re: [Numpy-discussion] doc string linalg.solve -- works for b is matrix

2010-07-21 Thread David Goldsmith
Take it as a reminder: when reporting an error or problem, even if it
doesn't seem relevant, always provide the version number. :-)

DG

-- 
Mathematician: noun, someone who disavows certainty when their uncertainty
set is non-empty, even if that set has measure zero.

Hope: noun, that delusive spirit which escaped Pandora's jar and, with her
lies, prevents mankind from committing a general suicide.  (As interpreted
by Robert Graves)


Re: [Numpy-discussion] lstsq functionality

2010-07-21 Thread David Warde-Farley
On 2010-07-20, at 10:16 PM, Skipper Seabold wrote:

 Out of curiosity, is there an explicit way to check if these share memory?


You could do the exact calculation (I think), but it isn't actually
implemented in NumPy; np.may_share_memory is a conservative test that
errs on the side of false positives.
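
A quick demonstration of the conservative check:

```python
import numpy as np

# np.may_share_memory means "might overlap": it can report false
# positives but will not miss a genuine overlap.
a = np.arange(10)
view = a[::2]   # a strided view into a's buffer
copy = a.copy() # a fresh, disjoint buffer

print(np.may_share_memory(a, view))  # True: same underlying buffer
print(np.may_share_memory(a, copy))  # False: provably disjoint
```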

David


[Numpy-discussion] UMFPACK interface is unexpectedly slow

2010-07-21 Thread Ioan-Alexandru Lazar
Hello everyone,

First of all, let me apologize for my earlier message; I made the mistake
of trying to indent my code using SquirrelMail's horrible interface -- and
pressing Tab and Space resulted in sending my (incomplete) e-mail to the
list. Cursed be Opera's keyboard shortcuts now :-).

I'm currently planning to use a Python-based infrastructure for our HPC
project. I've previously used NumPy and SciPy for basic scientific
computing tasks, so performance hasn't really been an issue for me until
now. At the moment I'm not too sure what to do next, though, and I was
hoping that someone with more experience in performance-related issues
could point me to a way out of this.

The trouble lies in the following piece of code:

===
w = 2 * math.pi * f
M = A - (1j*w*E)
n = M.shape[1]
B1 = numpy.zeros(n)
B2 = numpy.zeros(n)
B1[n-2] = 1.0
B2[n-1] = 1.0
# --- slow part starts here
umfpack.numeric(M)
x1 = umfpack.solve( um.UMFPACK_A, M, B1, autoTranspose = False)
x2 = umfpack.solve( um.UMFPACK_A, M, B2, autoTranspose = False)
solution = scipy.array([ [ x1[n-2], x2[n-2] ], [ x1[n-1], x2[n-1] ]])
return solution


This isn't really too much -- it's generating a system matrix via
operations that take little time, as I was expecting. Trouble is, the
solve part takes significantly more time than Octave -- about 4 times.

I'm using the stock version of UMFPACK in Ubuntu's repository; it's
compiled against standard BLAS, so it's fairly slow, but so is Octave --
so the problem isn't there.

I'm obviously doing something wrong related to memory management here,
because the memory consumption is also rocketing, but I'm not sure what
exactly it is that I'm doing wrong. Could you point me towards some
relevant documentation describing what I could do in order to improve the
performance, or give me some hint related to that?
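
One common fix for this pattern is to factor the matrix once and reuse the factorization for every right-hand side. The sketch below uses scipy.sparse.linalg.splu (SuperLU rather than UMFPACK, so timings will differ), and the small 5x5 system is made up for the demo:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Factor once, solve many times: the numeric factorization is the
# expensive step, so hoist it out of any per-right-hand-side loop.
n = 5
M = sp.csc_matrix(sp.eye(n) * 4 + sp.eye(n, k=1) + sp.eye(n, k=-1))

B1 = np.zeros(n); B1[n - 2] = 1.0
B2 = np.zeros(n); B2[n - 1] = 1.0

lu = spla.splu(M)   # numeric LU factorization, done once
x1 = lu.solve(B1)   # each solve reuses the stored factors
x2 = lu.solve(B2)

print(np.allclose(M @ x1, B1), np.allclose(M @ x2, B2))
```

The same structure applies to the UMFPACK wrapper: call the symbolic/numeric steps once per matrix, then issue all the solves.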

Best regards,
Alexandru Lazar


Re: [Numpy-discussion] UMFPACK interface is unexpectedly slow

2010-07-21 Thread Ioan-Alexandru Lazar
I hope I won't get identified as a spam bot :-). While I have not resolved
the problem itself, this is an issue that I cannot reproduce on our
cluster. I wanted to get back with some actual timings from the real
hardware we are going to be using and some details about the matrices, so
as not to chase ghosts, but this proved to be a headache saver.

It's still baffling because on the cluster I have also used stock packages
(albeit from Fedora, which is what our system administrator insists on
using) rather than my hand-compiled and optimized GotoBLAS and UMFPACK. It
didn't even occur to me to try to reproduce this on another system in the
last 4 hours I've been struggling with this, because I assumed that using
stock packages was giving me the uniformity I required. It seems I was
wrong. Nonetheless, I think it's safe to assume in this case that the
problem is not in NumPy or my code, and it would be wiser to bring this
up in Ubuntu's bug tracker.

Thanks for your patience,
Alexandru

On Thu, July 22, 2010 4:10 am, Ioan-Alexandru Lazar wrote:
 Hello everyone,

 First of all, let me apologize for my earlier message; I made the mistake
 of trying to indent my code using SquirrelMail's horrible interface -- and
 pressing Tab and Space resulted in sending my (incomplete) e-mail to the
 list. Cursed be Opera's keyboard shortcuts now :-).

 I'm currently planning to use a Python-based infrastructure for our HPC
 project. I've previously used NumPy and SciPy for basic scientific
 computing tasks, so performance hasn't been quite an issue for me until
 now. At the moment I'm not too sure what to do next, though, and I was
 hoping that someone with more experience in performance-related issues
 could point me to a way out of this.

 The trouble lies in the following piece of code:

 ===
 w = 2 * math.pi * f
 M = A - (1j*w*E)
 n = M.shape[1]
 B1 = numpy.zeros(n)
 B2 = numpy.zeros(n)
 B1[n-2] = 1.0
 B2[n-1] = 1.0
 # <-- slow part starts here
 umfpack.numeric(M)
 x1 = umfpack.solve( um.UMFPACK_A, M, B1, autoTranspose = False)
 x2 = umfpack.solve( um.UMFPACK_A, M, B2, autoTranspose = False)
 solution = scipy.array([ [ x1[n-2], x2[n-2] ], [ x1[n-1], x2[n-1] ]])
 return solution
 ===

 This isn't really too much -- it's generating a system matrix via
 operations that take little time, as I was expecting. Trouble is, the
 solve part takes significantly more time than Octave -- about 4 times.

 I'm using the stock version of UMFPACK in Ubuntu's repository; it's
 compiled against standard BLAS, so it's fairly slow, but so is Octave --
 so the problem isn't there.

 I'm obviously doing something wrong related to memory management here,
 because the memory consumption is also rocketing, but I'm not sure what
 exactly it is that I'm doing wrong. Could you point me towards some
 relevant documentation describing what I could do in order to improve the
 performance, or give me some hint related to that?

 Best regards,
 Alexandru Lazar
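
A common way to attack this workload is to factorize the matrix once and
reuse the factorization for every right-hand side. The sketch below uses
scipy.sparse.linalg.factorized (which calls UMFPACK when scikits.umfpack is
installed, and SuperLU otherwise); the matrix here is a stand-in, not the
M = A - 1j*w*E from the post:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 100
# Stand-in system matrix: strictly diagonally dominant, hence nonsingular.
# factorized() wants CSC format, so build it that way from the start.
M = (10.0 * sp.identity(n, format='csc')
     + sp.rand(n, n, density=0.05, format='csc'))

solve = spla.factorized(M)   # LU-factorize once

b1 = np.zeros(n); b1[n - 2] = 1.0
b2 = np.zeros(n); b2[n - 1] = 1.0
x1 = solve(b1)               # each call reuses the existing factorization
x2 = solve(b2)
```

With many right-hand sides, amortizing the factorization this way usually
dominates any difference between BLAS builds.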




Re: [Numpy-discussion] lstsq functionality

2010-07-21 Thread Charles R Harris
On Wed, Jul 21, 2010 at 5:44 PM, Keith Goodman kwgood...@gmail.com wrote:

 On Wed, Jul 21, 2010 at 4:35 PM, Skipper Seabold jsseab...@gmail.com
 wrote:
  On Tue, Jul 20, 2010 at 10:24 PM, Keith Goodman kwgood...@gmail.com
 wrote:
  Good point. Looks like we can get rid of 2 copies! I didn't get rid of
  the second copy but I did cut things down just to see what the timing
  was like. I also threw out the ability to handle complex numbers (only
  saves an if iscomplex statement):
 
  Just for timing purposes though right?

 Yes.

 Can someone confirm that the copy in np.linalg.lstsq

 bstar[:b.shape[0],:n_rhs] = b.copy()

 is not needed? I'm assuming that ndarray.__setitem__ never gives a
 view of the right-hand side.
 


bstar doesn't share memory with b, so there is no need for a copy. The whole
first part of lstsq could use a cleanup, there are a lot of other things
that could be made cleaner and some other bits that don't look right. Anyone
want to take a shot at it?

Chuck
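
A quick sketch confirming the point (illustrative names, not from lstsq
itself): slice assignment copies the right-hand side's data, so the explicit
`.copy()` is redundant.

```python
import numpy as np

b = np.arange(6.0).reshape(3, 2)
bstar = np.zeros((5, 2))

# __setitem__ copies b's data into bstar; it never stores a view of b
bstar[:3, :] = b
assert not np.shares_memory(bstar, b)

bstar[0, 0] = 99.0           # modifying bstar leaves b untouched
assert b[0, 0] == 0.0
```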


Re: [Numpy-discussion] lstsq functionality

2010-07-21 Thread Keith Goodman
On Wed, Jul 21, 2010 at 8:24 PM, Charles R Harris
charlesr.har...@gmail.com wrote:

 On Wed, Jul 21, 2010 at 5:44 PM, Keith Goodman kwgood...@gmail.com wrote:

 Can someone confirm that the copy in np.linalg.lstsq

 bstar[:b.shape[0],:n_rhs] = b.copy()

 is not needed? I'm assuming that ndarray.__setitem__ never gives a
 view of the right-hand side.
 

 bstar doesn't share memory with b, so there is no need for a copy. The whole
 first part of lstsq could use a cleanup, there are a lot of other things
 that could be made cleaner and some other bits that don't look right. Anyone
 want to take a shot at it?

 Chuck

Some minor tweak suggestions:

>> b = np.random.rand(1000)

>> timeit b[:,newaxis]  # actual
100 loops, best of 3: 1.03 us per loop
>> timeit b.reshape(-1,1)  # alternative
100 loops, best of 3: 445 ns per loop

>> timeit len(b.shape)   # actual
1000 loops, best of 3: 145 ns per loop
>> timeit b.ndim   # alternative
1000 loops, best of 3: 74.1 ns per loop
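
Both pairs are exact equivalents, as this sketch shows (the timings above
only measure construction cost; the results are identical):

```python
import numpy as np

b = np.random.rand(1000)

col_a = b[:, np.newaxis]     # shape (1000, 1), a view of b
col_b = b.reshape(-1, 1)     # same result, slightly cheaper to construct
assert col_a.shape == col_b.shape == (1000, 1)

# b.ndim and len(b.shape) agree; ndim avoids building the shape tuple
assert b.ndim == len(b.shape) == 1
```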


[Numpy-discussion] summarizing blocks of an array using a moving window

2010-07-21 Thread Robin Kraft
Hello all,

The short version: For a given NxN array, is there an efficient way to use a 
moving window to collect a summary statistic on a chunk of the array, and 
insert it into another array?

The long version: I am trying to resample an image loaded with GDAL into an NxN 
array. Note that this is for statistical purposes, so image quality doesn't 
matter. For the curious, the image is derived from satellite imagery and 
displays a map of hotspots of tropical deforestation at 500m resolution. I need 
to assign a count of these deforestation 'hits' to each pixel in an existing 
array of 1000m pixels.


Let's say the image looks like this: np.random.randint(0,2, 16).reshape(4,4)

array([[0, 0, 0, 1],
       [0, 0, 1, 1],
       [1, 1, 0, 0],
       [0, 0, 0, 0]])

I want to use a square, non-overlapping moving window for resampling, so that I 
get a count of all the 1's in each 2x2 window.

0, 0,   0, 1
0, 0,   1, 1        0  3
             ==>
1, 1,   0, 0        2  0
0, 0,   0, 0

In another situation with similar data I'll need the average, or the maximum 
value, etc..

My brute-force method is to loop through the rows and columns to get little 
chunks of data to process. But there must be a more elegant way to do this - 
it's probably obvious too ... (inelegant way further down).
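
One loop-free sketch of the non-overlapping window sum: split each axis into
(block index, position within block) and reduce over the within-block axes.
Variable names here are illustrative, not from the post's code.

```python
import numpy as np

img = np.array([[0, 0, 0, 1],
                [0, 0, 1, 1],
                [1, 1, 0, 0],
                [0, 0, 0, 0]])
w = 2
rows, cols = img.shape

# axes after reshape: (row block, row within block, col block, col within block);
# summing axes 3 then 1 collapses each w-by-w block to one value.
img_out = img.reshape(rows // w, w, cols // w, w).sum(axis=3).sum(axis=1)
# img_out is [[0, 3], [2, 0]], matching the worked example above
```

Swapping sum for max or mean handles the other statistics mentioned, and the
reshape is a view, so this scales to large images.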

Another way to do it would be to use np.tile to create an array called arr 
filled with blocks of [[0,1],[2,3]]. I could then use something like this to 
get the stats I want:

d0 = img[np.where(arr==0)]
d1 = img[np.where(arr==1)]
d2 = img[np.where(arr==2)]
d3 = img[np.where(arr==3)]

img_out = d0 + d1 + d2 + d3

This would be quite snappy if I could create arr efficiently. Unfortunately 
np.tile does something akin to np.hstack to create this array, so it isn't 
square and can't be reshaped appropriately (np.tile(np.arange(2**2).reshape(2, 
2), 4)):

array([[0, 1, 0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3, 2, 3]])
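
For what it's worth, np.tile does tile along every axis when given a tuple of
repetition counts rather than a scalar, which yields the square index array
described above directly (sketch for the 4x4 case):

```python
import numpy as np

# A tuple of reps tiles both dimensions: 2 repeats down, 2 across
arr = np.tile(np.arange(4).reshape(2, 2), (2, 2))
# arr is [[0, 1, 0, 1],
#         [2, 3, 2, 3],
#         [0, 1, 0, 1],
#         [2, 3, 2, 3]]
```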

Inefficient sample code below. Advice greatly appreciated!

-Robin


import numpy as np
from math import sqrt
from time import time

def rs(img_size=16, window_size=2):
    w = window_size

    # making sure the numbers work

    if img_size % sqrt(img_size) > 0:
        print "Square root of image size must be an integer."
        print "Sqrt =", sqrt(img_size)

        return

    elif (img_size / sqrt(img_size)) % w > 0:
        print "Square root of image size must be evenly divisible by", w
        print "Sqrt =", sqrt(img_size)
        print sqrt(img_size), "/", w, "=", sqrt(img_size) / w

        return

    else:

        rows = int(sqrt(img_size))
        cols = int(sqrt(img_size))

        # create fake data: ones and zeros
        img = np.random.randint(0, 2, img_size).reshape(rows, cols)

        # create empty array for resampled data
        img_out = np.zeros((rows/w, cols/w), dtype=np.int)

        t = time()

        # retrieve blocks of pixels in img
        # insert summary into img_out

        for r in xrange(0, rows, w): # skip rows based on window size
            for c in xrange(0, cols, w): # skip columns based on window size

                # calculate the sum of the values in the moving window,
                # insert value into the corresponding pixel in img_out

                img_out[r/w, c/w] = np.int(np.sum(img[r:r+w, c:c+w]))

        t1 = time()
        elapsed = t1 - t
        print "img shape:", img.shape
        print img, "\n"
        print "img_out shape:", img_out.shape
        print img_out

        print "%.7f seconds" % elapsed

rs(img_size=16, window_size=2)
rs(81, 3)
rs(100)
#rs(4800*4800) # very slow