Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-07 Thread Sturla Molden

 I also tried to install numpy with Intel MKL 9.1.
 I still used gfortran for the numpy installation, as Intel MKL 9.1 supports the
 GNU compiler.

I would suggest using GotoBLAS instead of ATLAS. It is easier to build
than ATLAS (basically no configuration), and it has even better performance
than MKL.

http://www.tacc.utexas.edu/tacc-projects/
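
A quick way to sanity-check which BLAS numpy actually linked against, and
to get a rough speed number for comparison -- a minimal sketch, assuming
nothing beyond numpy itself:

import time
import numpy as np

np.show_config()          # shows the BLAS/LAPACK numpy was built against

n = 1000
a = np.random.randn(n, n)
t0 = time.time()
np.dot(a, a)
print "%dx%d dot: %.3f s" % (n, n, time.time() - t0)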

S.M.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Repeated dot products

2010-01-07 Thread denis
On 12/12/2009 22:55, T J wrote:
 Hi,

 Suppose I have an array of shape:  (n, k, k).  In this case, I have n
 k-by-k matrices.  My goal is to compute the product of a (potentially
 large) user-specified selection (with replacement) of these matrices.
 For example,

 x = [0,1,2,1,3,3,2,1,3,2,1,5,3,2,3,5,2,5,3,2,1,3,5,6]

TJ,
   what are your n, k, len(x) ?

_dotblas.dot is fast: dot( 10x10 matrices ) takes ~ 22 usec on my G4 PPC,
which is ~ 15 clock cycles (at 700 MHz) per memory access, multiply and add.
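
For reference, the task itself in code -- a rough sketch with hypothetical
sizes: a plain left-to-right reduce, plus the pair-caching idea, where each
extra occurrence of a cached pair saves one k-by-k multiply:

import numpy as np

n, k = 8, 10
mats = np.random.randn(n, k, k)
x = [0,1,2,1,3,3,2,1,3,2,1,5,3,2,3,5,2,5,3,2,1,3,5,6]

# straightforward chained product
prod = reduce(np.dot, [mats[i] for i in x])

# cache the product of a repeated pair, e.g. (3,2), and splice it in
cache = { (3,2): np.dot(mats[3], mats[2]) }
factors, j = [], 0
while j < len(x):
    if j+1 < len(x) and (x[j], x[j+1]) in cache:
        factors.append(cache[(x[j], x[j+1])])
        j += 2
    else:
        factors.append(mats[x[j]])
        j += 1
prod2 = reduce(np.dot, factors)   # same product, fewer k-by-k multiplies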

A hack to find repeated pairs (or triples ...) follows.
Your sequence above has only (3,2) repeated, 4 times -- no win.

(Can someone give a probabilistic estimate of the number of non-overlapping
pairs in N letters from an alphabet of size A ?)
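
(A rough stab, ignoring the non-overlap constraint: each of the $N-1$
adjacent pairs takes one of $A^2$ values, so the number of occurrences of a
fixed pair is roughly Poisson with mean $\lambda = (N-1)/A^2$, and the
expected number of distinct pairs seen at least twice is about

    $E \approx A^2 \left( 1 - e^{-\lambda}(1 + \lambda) \right), \qquad \lambda = (N-1)/A^2 .$

This counts distinct repeated pairs, not the occurrence count npair that the
script below reports, and the greedy overlap removal cuts the kept
occurrences down further -- so treat it as a ballpark only.)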


#!/usr/bin/env python
# numpy-discuss 2009 12dec TJ repeated dot products

from __future__ import division
from collections import defaultdict
import numpy as np

__version__ = "2010 7jan denis"

def pairs( s, Len=2 ):
    """ repeated non-overlapping pairs (substrings, subwords)
        abracadabra -> ab ra [[0 7] [2 9]], not br
        Len=3: triples, 4 ...
    """
    # bruteforce
    # grow repeated 2 3 ... ?
    pairs = defaultdict(list)
    for j in range(len(s) - Len + 1):
        pairs[ s[j:j+Len] ].append(j)
    min2 = filter( lambda x: len(x) > 1, pairs.values() )
    min2.sort( key = lambda x: len(x), reverse=True )
    # remove overlaps --
    # (if many, during init scan would be faster)
    runs = np.zeros( len(s), np.uint8 )
    run = np.ones( Len, np.uint8 )
    run[0] = Len
    chains = []
    for ovchain in min2:
        chain = []
        for c in ovchain:
            if not runs[c:c+Len].any():
                runs[c:c+Len] = run
                chain.append(c)
        if len(chain) > 1:
            chains.append(chain)
    return (chains, runs)

#...
if __name__ == "__main__":
    import sys
    abra = "abracadabra"
    alph = 5
    randlen = 100
    randseed = 1
    exec( "\n".join( sys.argv[1:] ))  # Test= ...

    print "pairs( %s ) --" % abra
    print pairs( abra )     # ab [0, 7], ra [2, 9]
    print pairs( abra, 3 )  # abr [0, 7]

    np.random.seed( randseed )
    r = np.random.random_integers( 1, alph, randlen )
    chains, runs = pairs( tuple(r) )
    npair = sum([ len(c) for c in chains ])
    print "%d repeated pairs in %d random %d" % (npair, randlen, alph)
    # 35 repeated pairs in 100 random 5  (prob estimate this ?)
    # 25 repeated pairs in 100 random 10

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-07 Thread Christopher Barker
Sturla Molden wrote:
 I would suggest using GotoBLAS instead of ATLAS.

 http://www.tacc.utexas.edu/tacc-projects/

That does look promising -- any idea what the license is? They don't 
make it clear on the site (maybe it is once you set up a user account and 
download, but I'd rather know up front). The only reference I could find 
is from 2006:

http://www.utexas.edu/news/2006/04/12/tacc/

and in that, they refer to one of those annoying "free for academic and 
scientific use" clauses.

-Chris




-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-07 Thread Sturla Molden
 Sturla Molden wrote:
 I would suggest using GotoBLAS instead of ATLAS.

 http://www.tacc.utexas.edu/tacc-projects/

 That does look promising -- any idea what the license is? They don't
 make it clear on the site



UT TACC Research License (Source Code)



The Texas Advanced Computing Center of The University of Texas at Austin
has developed certain software and documentation that it desires to make
available without charge to anyone for academic, research, experimental or
personal use. This license is designed to guarantee freedom to use the
software for these purposes. If you wish to distribute or make other use
of the software, you may purchase a license to do so from the University
of Texas.

The accompanying source code is made available to you under the terms of
this UT TACC Research License (this "UTTRL"). By clicking the "ACCEPT"
button, or by installing or using the code, you are consenting to be bound
by this UTTRL. If you do not agree to the terms and conditions of this
license, do not click the "ACCEPT" button, and do not install or use any
part of the code.

The terms and conditions in this UTTRL not only apply to the source code
made available by UT TACC, but also to any improvements to, or derivative
works of, that source code made by you and to any object code compiled
from such source code, improvements or derivative works.

1. DEFINITIONS.

1.1 "Commercial Use" shall mean use of Software or Documentation by
Licensee for direct or indirect financial, commercial or strategic gain or
advantage, including without limitation: (a) bundling or integrating the
Software with any hardware product or another software product for
transfer, sale or license to a third party (even if distributing the
Software on separate media and not charging for the Software); (b)
providing customers with a link to the Software or a copy of the Software
for use with hardware or another software product purchased by that
customer; or (c) use in connection with the performance of services for
which Licensee is compensated.

1.2 "Derivative Products" means any improvements to, or other derivative
works of, the Software made by Licensee.

1.3 "Documentation" shall mean all manuals, user documentation, and other
related materials pertaining to the Software that are made available to
Licensee in connection with the Software.

1.4 "Licensor" shall mean The University of Texas.

1.5 "Licensee" shall mean the person or entity that has agreed to the
terms hereof and is exercising rights granted hereunder.

1.6 "Software" shall mean the computer program(s) referred to as "GotoBLAS2"
made available under this UTTRL in source code form, including any error
corrections, bug fixes, patches, updates or other modifications that
Licensor may in its sole discretion make available to Licensee from time
to time, and any object code compiled from such source code.

2. GRANT OF RIGHTS.

Subject to the terms and conditions hereunder, Licensor hereby grants to
Licensee a worldwide, non-transferable, non-exclusive license to (a)
install, use and reproduce the Software for academic, research,
experimental and personal use (but specifically excluding Commercial Use);
(b) use and modify the Software to create Derivative Products, subject to
Section 3.2; and (c) use the Documentation, if any, solely in connection
with Licensee's authorized use of the Software.

3. RESTRICTIONS; COVENANTS.

3.1 Licensee may not: (a) distribute, sub-license or otherwise transfer
copies or rights to the Software (or any portion thereof) or the
Documentation; (b) use the Software (or any portion thereof) or
Documentation for Commercial Use, or for any other use except as described
in Section 2; (c) copy the Software or Documentation other than for
archival and backup purposes; or (d) remove any product identification,
copyright, proprietary notices or labels from the Software and
Documentation. This UTTRL confers no rights upon Licensee except those
expressly granted herein.

3.2 Licensee hereby agrees that it will provide a copy of all Derivative
Products to Licensor and that its use of the Derivative Products will be
subject to all of the same terms, conditions, restrictions and limitations
on use imposed on the Software under this UTTRL. Licensee hereby grants
Licensor a worldwide, non-exclusive, royalty-free license to reproduce,
prepare derivative works of, publicly display, publicly perform,
sublicense and distribute Derivative Products. Licensee also hereby grants
Licensor a worldwide, non-exclusive, royalty-free patent license to make,
have made, use, offer to sell, sell, import and otherwise transfer the
Derivative Products under those patent claims licensable by Licensee that
are necessarily infringed by the Derivative Products.

4. PROTECTION OF SOFTWARE.

4.1 Confidentiality. The Software and Documentation are the confidential
and proprietary information of Licensor. Licensee agrees to take adequate
steps to protect the Software and Documentation from unauthorized

[Numpy-discussion] Behaviour of vdot(array2d, array1d)

2010-01-07 Thread Nikolas Tezak
Hi,
I am new to this list, but I have been using scipy for a couple of 
months now, with great satisfaction.
Currently I have a problem:

I diagonalize a Hermitian complex matrix using the eigh routine from 
scipy.linalg (this is still a numpy question, see below).
This returns the eigenvectors as columns of a 2d array.

Now I would like to project a vector onto this new basis.
I could either do:

initial_state = array(...)  # dtype=complex, shape=(dim,)
coefficients = zeros( shape=(dim,), dtype=complex)
matrix = array(...)  # dtype=complex, shape=(dim, dim)
eigenvalues, eigenvectors = eigh(matrix)
for i in xrange(dim):
    coefficients[i] = vdot(eigenvectors[:, i], initial_state)


But it seems to me after reading the documentation for vdot, that it  
should also be possible to do this without a loop:

initial_state = array(...)  # dtype=complex, shape=(dim,)

matrix = array(...)  # dtype=complex, shape=(dim, dim)
eigenvalues, eigenvectors = eigh(matrix)

coefficients = vdot( eigenvectors.transpose(), initial_state)

However when I do this, vdot raises a ValueError complaining that the  
vectors have different lengths.
It seems that vdot (as opposed to dot) cannot handle arguments with  
different shape although the documentation suggests otherwise.
I am using numpy version 1.3.0. Is this a bug or am I missing something?
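
(As a workaround sketch -- assuming the problem is that vdot flattens both
of its arguments to 1-d, which is where the length mismatch would come
from -- dot can do the projection in one call if the conjugation is done
explicitly, since dot, unlike vdot, does not conjugate its first argument.
Hypothetical test data:

import numpy as np
from scipy.linalg import eigh

dim = 4
m = np.random.randn(dim, dim) + 1j*np.random.randn(dim, dim)
matrix = m + m.conj().T          # Hermitian test matrix
initial_state = np.random.randn(dim) + 1j*np.random.randn(dim)

eigenvalues, eigenvectors = eigh(matrix)

# vdot conjugates its first argument; dot does not, so do it by hand:
coefficients = np.dot(eigenvectors.conj().T, initial_state)
)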

Regards,

Nikolas
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Pythonmac-SIG] 1.4.0 installer fails on OSX 10.6.2

2010-01-07 Thread Christopher Barker
David Cournapeau wrote:
 On Thu, Jan 7, 2010 at 1:35 AM, Christopher Barker
 In the past, I think folks have used the default
 name provided by bdist_mpkg, and those are not always clear. Something like:


 numpy1.4-osx10.4-python.org2.6-32bit.dmg
 
 The 32 bits is redundant - we support all archs supported by the
 official python binary, so python.org is enough.

True, though I was anticipating that there may be 32 and 64 bit builds 
some day.

 About osx10.4, 

As for that -- I put that in 'cause I remembered that in the past it has 
said 10.5, when, in fact 10.4 was supported. Thinking more, I think 
it's like 32 bit -- the python.org build supports 10.4, so that's all 
the information folks need.


 still don't know how to make sure we do work there with distutils. The
 whole MACOSX_DEPLOYMENT_TARGET confuses me quite a lot.

distutils should do it right, and indeed, I just tested the py2.5 and 
py2.6 binaries on my 10.4 PPC machine, and most of the tests pass on 
both (though see the note below).

I think distutils does do it right, at least if you use the latest 
version of 2.6 -- a bug was fixed there.

What OS/architecture were those built with?

 Other than
 that, the numpy 1.4.0 follows your advice, and contains the python.org
 part.

I should have looked first -- thanks, I think that will be helpful.


NOTE:
When I first installed the binary, I got a whole bunch of errors because 
'matrix' wasn't found. I recalled this issue from testing, and cleared 
out the install, then re-installed, and all was fine. I wonder if it's 
possible to have an mpkg remove anything?

Other failed tests:

======================================================================
FAIL: test_umath.test_nextafterl
----------------------------------------------------------------------
...
    return _test_nextafter(np.longdouble)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/numpy/core/tests/test_umath.py", line 852, in _test_nextafter
    assert np.nextafter(one, two) - one == eps
AssertionError


======================================================================
FAIL: test_umath.test_spacingl
----------------------------------------------------------------------
...
Traceback (most recent call last):
  ..., line 887, in test_spacingl
    return _test_spacing(np.longdouble)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/numpy/core/tests/test_umath.py", line 873, in _test_spacing
    assert np.spacing(one) == eps
AssertionError


I think both of those are known issues, and not a big deal.

-Chris




-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fromfile() for reading text (one more time!)

2010-01-07 Thread Christopher Barker
Pauli Virtanen wrote:
 Mon, 2010-01-04 at 17:05 -0800, Christopher Barker wrote:
 it also does odd things with spaces 
 embedded in the separator:

 ", $ #" matches all of:  ",$#"   ", $#"   ",$ #"

 That's a documented feature:

Fair enough.

OK, I've written a patch that allows newlines to be interpreted as 
separators in addition to whatever is specified in sep.

In the process of testing, I found again these issues, which are still 
marked as "needs decision".

http://projects.scipy.org/numpy/ticket/883

In short: what to do with missing values?

I'd like to address this bug, but I need a decision to do so.


My proposal:

Raise a ValueError on missing values.


Justification:

No function should EVER return data that is not there. Period. It is 
simply asking for hard to find bugs. Therefore:

fromstring("3, 4,,5", sep=",")

Should never, ever, return:

array([ 3.,  4.,  0.,  5.])

Which is what it does now. bad. bad. bad.




Alternatives:

   A) Raising a ValueError is the easiest way to get proper behavior. 
Folks can use a more sophisticated file reader if they want missing 
values handled. I'm willing to contribute this patch.

   B) If the dtype is a floating point type, NaN could fill in the 
missing values -- a fine idea, but you can't use it for integers, and 
zero is a really bad replacement!

   C) The user could specify what they want filled in for missing 
values. This is a fine idea, though I'm not sure I want to take the time 
to implement it (a rough sketch follows below).
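
A rough pure-Python sketch of (C) -- a hypothetical helper, not numpy API, 
just to pin down the semantics I have in mind:

import numpy as np

def fromstring_missing(s, sep=",", dtype=float, missing=np.nan):
    # like fromstring, but with an explicit fill value for empty fields;
    # raises for integer dtypes, where there is no good sentinel
    vals = []
    for tok in s.split(sep):
        tok = tok.strip()
        if tok:
            vals.append(dtype(tok))
        elif np.issubdtype(np.dtype(dtype), np.floating):
            vals.append(missing)
        else:
            raise ValueError("missing value for non-float dtype")
    return np.array(vals, dtype=dtype)

print fromstring_missing("3, 4,,5")   # [  3.   4.  NaN   5.]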

Oh, and this is a bug too, with probably the same solution:

In [20]: np.fromstring("hjba", sep=',')
Out[20]: array([ 0.])

In [26]: np.fromstring("34gytf39", sep=',')
Out[26]: array([ 34.])


One more unresolved question:

what should:

np.fromstring("3, 4, 5,", sep=",")

return?

it currently returns:

array([ 3.,  4.,  5.])

which seems a bit inconsistent with missing value handling. I also found 
a bug:

In [6]: np.fromstring("3, 4, 5 , ", sep=",")
Out[6]: array([ 3.,  4.,  5.,  0.])

so if there is some extra whitespace in there, it does return a missing 
value. With my proposal, that wouldn't happen, but you might get an 
exception. I think you should, but it'll be easier to implement my 
"allow newlines" code if not.


so, should I do (A) ?


Another question:

I've got a patch mostly working (except for the above issues) that will 
allow fromfile/string to read multiline non-whitespace separated data in 
one shot:


In [15]: str
Out[15]: '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12'

In [16]: np.fromstring(str, sep=',', allow_newlines=True)
Out[16]:
array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
 12.])


I think this is a very helpful enhancement, and, as it is a new kwarg, 
backward compatible:

1) Might it be accepted for inclusion?

2) Is the name for the flag OK: allow_newlines? It's pretty explicit, 
but also long -- I used it for the flag name in the C code, too.

3) What C datatype should I use for a boolean flag? I used a char, but I 
don't know what the numpy standard is.


-Chris












-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fromfile() for reading text (one more time!)

2010-01-07 Thread josef . pktd
On Thu, Jan 7, 2010 at 3:08 PM, Christopher Barker
chris.bar...@noaa.gov wrote:
 Pauli Virtanen wrote:
 Mon, 2010-01-04 at 17:05 -0800, Christopher Barker wrote:
 it also does odd things with spaces
 embedded in the separator:

 ", $ #" matches all of:  ",$#"   ", $#"   ",$ #"

 That's a documented feature:

 Fair enough.

 OK, I've written a patch that allows newlines to be interpreted as
 separators in addition to whatever is specified in sep.

 In the process of testing, I found again these issues, which are still
 marked as needs decision.

 http://projects.scipy.org/numpy/ticket/883

 In short: what to do with missing values?

 I'd like to address this bug, but I need a decision to do so.


 My proposal:

 Raise a ValueError on missing values.


 Justification:

 No function should EVER return data that is not there. Period. It is
 simply asking for hard to find bugs. Therefore:

 fromstring("3, 4,,5", sep=",")

 Should never, ever, return:

 array([ 3.,  4.,  0.,  5.])

 Which is what it does now. bad. bad. bad.




 Alternatives:

   A) Raising a ValueError is the easiest way to get proper behavior.
 Folks can use a more sophisticated file reader if they want missing
 values handled. I'm willing to contribute this patch.

   B) If the dtype is a floating point type, NaN could fill in the
 missing values -- a fine idea, but you can't use it for integers, and
 zero is a really bad replacement!

   C) The user could specify what they want filled in for missing
 values. This is a fine idea, though I'm not sure I want to take the time
 to implement it.

 Oh, and this is a bug too, with probably the same solution:

 In [20]: np.fromstring("hjba", sep=',')
 Out[20]: array([ 0.])

 In [26]: np.fromstring("34gytf39", sep=',')
 Out[26]: array([ 34.])


 One more unresolved question:

 what should:

 np.fromstring("3, 4, 5,", sep=",")

 return?

 it currently returns:

 array([ 3.,  4.,  5.])

 which seems a bit inconsistent with missing value handling. I also found
 a bug:

 In [6]: np.fromstring("3, 4, 5 , ", sep=",")
 Out[6]: array([ 3.,  4.,  5.,  0.])

 so if there is some extra whitespace in there, it does return a missing
 value. With my proposal, that wouldn't happen, but you might get an
 exception. I think you should, but it'll be easier to implement my
 "allow newlines" code if not.


 so, should I do (A) ?


 Another question:

 I've got a patch mostly working (except for the above issues) that will
 allow fromfile/string to read multiline non-whitespace separated data in
 one shot:


 In [15]: str
 Out[15]: '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12'

 In [16]: np.fromstring(str, sep=',', allow_newlines=True)
 Out[16]:
 array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
         12.])


 I think this is a very helpful enhancement, and, as it is a new kwarg,
 backward compatible:

 1) Might it be accepted for inclusion?

 2) Is the name for the flag OK: allow_newlines? It's pretty explicit,
 but also long -- I used it for the flag name in the C code, too.

 3) What C datatype should I use for a boolean flag? I used a char, but I
 don't know what the numpy standard is.


 -Chris



I don't know much about this, just a few more test cases:

comma and newline
str =  '1, 2, 3, 4,\n5, 6, 7, 8,\n9, 10, 11, 12'

extra comma at end of file
str =  '1, 2, 3, 4,\n5, 6, 7, 8,\n9, 10, 11, 12,'

extra newlines at end of file
str =  '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n'

It would be nice if these cases would go through without missing
values or exception, but I don't often have files that are clean
enough for fromfile().

I'm in favor of nan for missing values with floating point numbers. It
would make it easy to read correctly formatted csv files, even if the
data is not complete.

Josef










 --
 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR            (206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115       (206) 526-6317   main reception

 chris.bar...@noaa.gov
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fromfile() for reading text (one more time!)

2010-01-07 Thread Bruce Southey
On Thu, Jan 7, 2010 at 2:32 PM,  josef.p...@gmail.com wrote:
 On Thu, Jan 7, 2010 at 3:08 PM, Christopher Barker
 chris.bar...@noaa.gov wrote:
 Pauli Virtanen wrote:
 Mon, 2010-01-04 at 17:05 -0800, Christopher Barker wrote:
 it also does odd things with spaces
 embedded in the separator:

 ", $ #" matches all of:  ",$#"   ", $#"   ",$ #"

 That's a documented feature:

 Fair enough.

 OK, I've written a patch that allows newlines to be interpreted as
 separators in addition to whatever is specified in sep.

 In the process of testing, I found again these issues, which are still
 marked as needs decision.

 http://projects.scipy.org/numpy/ticket/883

 In short: what to do with missing values?

 I'd like to address this bug, but I need a decision to do so.


 My proposal:

 Raise a ValueError on missing values.


 Justification:

 No function should EVER return data that is not there. Period. It is
 simply asking for hard to find bugs. Therefore:

 fromstring("3, 4,,5", sep=",")

 Should never, ever, return:

 array([ 3.,  4.,  0.,  5.])

 Which is what it does now. bad. bad. bad.




 Alternatives:

   A) Raising a ValueError is the easiest way to get proper behavior.
 Folks can use a more sophisticated file reader if they want missing
 values handled. I'm willing to contribute this patch.

   B) If the dtype is a floating point type, NaN could fill in the
 missing values -- a fine idea, but you can't use it for integers, and
 zero is a really bad replacement!

   C) The user could specify what they want filled in for missing
 values. This is a fine idea, though I'm not sure I want to take the time
 to implement it.

 Oh, and this is a bug too, with probably the same solution:

 In [20]: np.fromstring("hjba", sep=',')
 Out[20]: array([ 0.])

 In [26]: np.fromstring("34gytf39", sep=',')
 Out[26]: array([ 34.])


 One more unresolved question:

 what should:

 np.fromstring("3, 4, 5,", sep=",")

 return?

 it currently returns:

 array([ 3.,  4.,  5.])

 which seems a bit inconsistent with missing value handling. I also found
 a bug:

 In [6]: np.fromstring("3, 4, 5 , ", sep=",")
 Out[6]: array([ 3.,  4.,  5.,  0.])

 so if there is some extra whitespace in there, it does return a missing
 value. With my proposal, that wouldn't happen, but you might get an
 exception. I think you should, but it'll be easier to implement my
 "allow newlines" code if not.


 so, should I do (A) ?


 Another question:

 I've got a patch mostly working (except for the above issues) that will
 allow fromfile/string to read multiline non-whitespace separated data in
 one shot:


 In [15]: str
 Out[15]: '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12'

 In [16]: np.fromstring(str, sep=',', allow_newlines=True)
 Out[16]:
 array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
         12.])


 I think this is a very helpful enhancement, and, as it is a new kwarg,
 backward compatible:

 1) Might it be accepted for inclusion?

 2) Is the name for the flag OK: allow_newlines? It's pretty explicit,
 but also long -- I used it for the flag name in the C code, too.

 3) What C datatype should I use for a boolean flag? I used a char, but I
 don't know what the numpy standard is.


 -Chris



 I don't know much about this, just a few more test cases

 comma and newline
 str =  '1, 2, 3, 4,\n5, 6, 7, 8,\n9, 10, 11, 12'

 extra comma at end of file
 str =  '1, 2, 3, 4,\n5, 6, 7, 8,\n9, 10, 11, 12,'

 extra newlines at end of file
 str =  '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n'

 It would be nice if these cases would go through without missing
 values or exception, but I don't often have files that are clean
 enough for fromfile().

 I'm in favor of nan for missing values with floating point numbers. It
 would make it easy to read correctly formatted csv files, even if the
 data is not complete.



Using the numpy NaN or similar (noting R's approach to missing values,
which in turn allows it to have the above functionality) is just a
very bad idea for missing values, because you always have to check
which NaN is a missing value and which was due to some numerical
calculation. It is a very bad idea because we have masked arrays that
nicely (but slowly) handle this situation.

From what I can see, you expect that fromfile() should only
split at the supplied delimiters, optionally(?) strip any whitespace,
and force a specific dtype. I would agree that the failure of any
one of these should create an exception by default rather than making the
best guess. So 'missing data' would potentially fail when forcing the
specified dtype. Thus, you should either create an exception for
invalid data (with appropriate location) or use masked arrays.

Your output from this string '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12'
actually assumes multiple delimiters, because there is no comma between
4 and 5 or between 8 and 9. So I think it would be better if fromfile
accepted multiple delimiters. In Josef's last case, how many 'missing
values' should there be?

Bruce

Re: [Numpy-discussion] fromfile() for reading text (one more time!)

2010-01-07 Thread Travis Oliphant

On Jan 7, 2010, at 2:32 PM, josef.p...@gmail.com wrote:

 On Thu, Jan 7, 2010 at 3:08 PM, Christopher Barker
 chris.bar...@noaa.gov wrote:
 Pauli Virtanen wrote:
 Mon, 2010-01-04 at 17:05 -0800, Christopher Barker wrote:
 it also does odd things with spaces
 embedded in the separator:

 ", $ #" matches all of:  ",$#"   ", $#"   ",$ #"

 That's a documented feature:

 Fair enough.

 OK, I've written a patch that allows newlines to be interpreted as
 separators in addition to whatever is specified in sep.

 In the process of testing, I found again these issues, which are still
 marked as "needs decision".

 http://projects.scipy.org/numpy/ticket/883

 In short: what to do with missing values?

 I'd like to address this bug, but I need a decision to do so.


 My proposal:

 Raise a ValueError on missing values.


 Justification:

 No function should EVER return data that is not there. Period. It is
 simply asking for hard to find bugs. Therefore:

 fromstring("3, 4,,5", sep=",")

 Should never, ever, return:

 array([ 3.,  4.,  0.,  5.])

 Which is what it does now. bad. bad. bad.




 Alternatives:

   A) Raising a ValueError is the easiest way to get proper  
 behavior.
 Folks can use a more sophisticated file reader if they want missing
 values handled. I'm willing to contribute this patch.

   B) If the dtype is a floating point type, NaN could fill in the
 missing values -- a fine idea, but you can't use it for integers, and
 zero is a really bad replacement!

   C) The user could specify what they want filled in for missing
 values. This is a fine idea, though I'm not sure I want to take the
 time to implement it.

 Oh, and this is a bug too, with probably the same solution:

 In [20]: np.fromstring("hjba", sep=',')
 Out[20]: array([ 0.])

 In [26]: np.fromstring("34gytf39", sep=',')
 Out[26]: array([ 34.])


 One more unresolved question:

 what should:

 np.fromstring("3, 4, 5,", sep=",")

 return?

 it currently returns:

 array([ 3.,  4.,  5.])

 which seems a bit inconsistent with missing value handling. I also
 found a bug:

 In [6]: np.fromstring("3, 4, 5 , ", sep=",")
 Out[6]: array([ 3.,  4.,  5.,  0.])

 so if there is some extra whitespace in there, it does return a
 missing value. With my proposal, that wouldn't happen, but you might get an
 exception. I think you should, but it'll be easier to implement my
 "allow newlines" code if not.


 so, should I do (A) ?


 Another question:

 I've got a patch mostly working (except for the above issues) that  
 will
 allow fromfile/string to read multiline non-whitespace separated  
 data in
 one shot:


 In [15]: str
 Out[15]: '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12'

 In [16]: np.fromstring(str, sep=',', allow_newlines=True)
 Out[16]:
 array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,   
 11.,
 12.])


 I think this is a very helpful enhancement, and, as it is a new  
 kwarg,
 backward compatible:

 1) Might it be accepted for inclusion?

 2) Is the name for the flag OK: allow_newlines? It's pretty  
 explicit,
 but also long -- I used it for the flag name in the C code, too.

 3) What C datatype should I use for a boolean flag? I used a char,  
 but I
 don't know what the numpy standard is.


 -Chris



 I don't know much about this, just a few more test cases

 comma and newline
 str =  '1, 2, 3, 4,\n5, 6, 7, 8,\n9, 10, 11, 12'

 extra comma at end of file
 str =  '1, 2, 3, 4,\n5, 6, 7, 8,\n9, 10, 11, 12,'

 extra newlines at end of file
 str =  '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n'

 It would be nice if these cases would go through without missing
 values or exception, but I don't often have files that are clean
 enough for fromfile().

+1 (ignoring new-lines transparently is a nice feature).  You can also  
use sscanf with weave to read most files.


 I'm in favor of nan for missing values with floating point numbers. It
 would make it easy to read correctly formatted csv files, even if the
 data is not complete.

+1   (much preferable to insert NaN or other user value than raise  
ValueError, in my opinion)

-Travis

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fromfile() for reading text (one more time!)

2010-01-07 Thread Christopher Barker
Bruce Southey wrote:
 chris.bar...@noaa.gov wrote:

 Using the numpy NaN or similar (noting R's approach to missing values,
 which in turn allows it to have the above functionality) is just a
 very bad idea for missing values, because you always have to check
 which NaN is a missing value and which was due to some numerical
 calculation.

well, this is specific to reading files, so you know where it came from. 
And the principle of fromfile() is that it is fast and simple, if you 
want masked arrays, use slower, but more full-featured methods.

However, in this case:

In [9]: np.fromstring("3, 4, NaN, 5", sep=",")
Out[9]: array([  3.,   4.,  NaN,   5.])


An actual NaN is read from the file, rather than a missing value. 
Perhaps the user does want the distinction, so maybe it should really 
only fill it in if the user asks for it, by specifying 
missing_value=np.nan or something.

 From what I can see, you expect that fromfile() should only
 split at the supplied delimiters, optionally(?) strip any whitespace

whitespace stripping is not optional.

 Your output from this string '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12'
 actually assumes multiple delimiters, because there is no comma between
 4 and 5 or between 8 and 9.

Yes, that's the point. I thought about allowing arbitrary multiple 
delimiters, but I think '\n' is a special case -- for instance, a comma 
at the end of some numbers might mean missing data, but a '\n' would not.

And I couldn't really think of a useful use-case for arbitrary multiple 
delimiters.

 In Josef's last case, how many 'missing values' should there be?

  extra newlines at end of file
  str =  '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n'

none -- exactly why I think \n is a special case.

What about:
  extra newlines in the middle of the file
  str =  '1, 2, 3, 4\n\n5, 6, 7, 8\n9, 10, 11, 12\n'

I think they should be ignored, but I hope I'm not making something that 
is too specific to my personal needs.

Travis Oliphant wrote:
 +1 (ignoring new-lines transparently is a nice feature).  You can also  
 use sscanf with weave to read most files.

right -- but that requires weave. In fact, MATLAB has a fscanf function 
that allows you to pass in a C format string, and it vectorizes it to use 
the same one over and over again until it's done. It's actually quite 
powerful and flexible. I once started with that in mind, but didn't have 
the C chops to do it. I ended up with a tool that only did doubles (come 
to think of it, MATLAB only does doubles, anyway...).

I may some day write a whole new C (or, more likely, Cython) function 
that does something like that, but for now, I'm just trying to get 
fromfile to be useful for me.


 +1   (much preferable to insert NaN or other user value than raise  
 ValueError, in my opinion)

But raise an error for integer types?

I guess this is still up the air -- no consensus yet.

Thanks,

-Chris









-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] cPickle/unPickle across archs

2010-01-07 Thread Robert Kern
On Thu, Jan 7, 2010 at 15:54, James Mazer james.ma...@yale.edu wrote:
 Hi,

 I've got a some Numeric arrays that were created without
 an explicit byte size in the initial declaration and pickled.
 Something like this:

    cPickle.dump(array(ones((3,3,)), 'f'), open('foo.pic', 'w'))

 as opposed to:

    cPickle.dump(array(ones((3,3,)), Float32), open('foo.pic', 'w'))

 This works as long as the word size doesn't change between the
 reading and writing machines.

 The data were generated under a 32bit linux kernel and now I'm trying
 to read them under a 64bit kernel, so the word size has changed and
 Numeric assumes that the 'f' type is the NATIVE float

Please note that 'f' is always a 32-bit float on any machine. Only
integers may change size.

 and 'l' type is
 the NATIVE long) and dies miserably when the native types don't match
 the actual types (which defeats the whole point of pickling, to some
 extent -- I thought that cPickle.dump/load were guaranteed to be
 invertible...)

I don't think cPickle ensures much at all. It's actually rather
fragile for persisting data over long times and between different
environments. It works better as a wire format for communication
between similar codebases when thoroughly tested on both ends. Using a
standard scientific file format for storing your important data has
always been de rigueur.

That said, it is a deficiency in Numeric that it records the native
typecode instead of a platform-neutral, explicitly sized typecode.
Unfortunately, Numeric has been deprecated for many years now, and is
not maintained. Numeric's replacement, numpy, does not have this
problem.

 I've got terabytes of data that need to be read by both 32-bit and
 64-bit machines (and it's not really feasible to scan all the files
 into new structures with explicit types on a 32-bit machine). Anybody
 have hints for addressing this problem?  I found similar questions,
 but no answers, so I'm not completely alone with this problem.

What you can do is monkeypatch the function
Numeric.array_constructor() to do the right thing for your case when
it sees a platform-specific integer typecode. Something like the
following (untested; you may need to generalize it to handle the
unsigned integer typecodes, too, if you have that kind of data):

import Numeric

i_size = Numeric.empty(0, 'i').itemsize()

def patched_array_constructor(shape, typecode, thestr,
                              Endian=Numeric.LittleEndian):
    if typecode == 'l':
        # Ensure that the length of the data matches our expectations.
        size = Numeric.product(shape)
        itemsize = len(thestr) // size
        if itemsize == i_size:
            typecode = 'i'
    if typecode == 'O':
        x = Numeric.array(thestr, 'O')
    else:
        x = Numeric.fromstring(thestr, typecode)
    x.shape = shape
    if Numeric.LittleEndian != Endian:
        return x.byteswapped()
    else:
        return x

Numeric.array_constructor = patched_array_constructor


After you have done that, cPickle.load() will use that patched
function to reconstruct the arrays and make sure that the appropriate
typecode is used to interpret the data.
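
For completeness, loading then looks the same as before -- a sketch, using 
the hypothetical file name from the original message:

import cPickle

# install the patch above first, then unpickle as usual
data = cPickle.load(open('foo.pic', 'rb'))
print data.typecode(), data.shape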

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Pythonmac-SIG] 1.4.0 installer fails on OSX 10.6.2

2010-01-07 Thread neil weisenfeld
On Wed, Jan 6, 2010 at 11:35 AM, Christopher Barker
chris.bar...@noaa.gov wrote:

 It's worse to have a binary you expect to work fail for you than to not
 have one available. IN the past, I think folks' have used the default
 name provided by bdist_mpkg, and those are not always clear. Something like:


 numpy1.4-osx10.4-python.org2.6-32bit.dmg

 or something -- even better, with a a bit more text -- would help a lot.


I agree here. Better labeling of the .dmg would indeed help, I think.

And thanks to everyone for all of the responses.  I joined the mailing
list, posted my question, and then went back to dissertation writing
for a few days.  When I looked up, there were 18 answers.

I'll try getting python from python.org and/or building it all from scratch.


Thanks again,
Neil
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fromfile() for reading text (one more time!)

2010-01-07 Thread josef . pktd
On Thu, Jan 7, 2010 at 4:45 PM, Christopher Barker
chris.bar...@noaa.gov wrote:
 Bruce Southey wrote:
 chris.bar...@noaa.gov wrote:

 Using the numpy NaN or similar (noting R's approach to missing values,
 which in turn allows it to have the above functionality) is just a
 very bad idea for missing values, because you always have to check
 which NaN is a missing value and which was due to some numerical
 calculation.

 well, this is specific to reading files, so you know where it came from.
 And the principle of fromfile() is that it is fast and simple, if you
 want masked arrays, use slower, but more full-featured methods.

 However, in this case:

 In [9]: np.fromstring("3, 4, NaN, 5", sep=",")
 Out[9]: array([  3.,   4.,  NaN,   5.])


 An actual NaN is read from the file, rather than a missing value.
 Perhaps the user does want the distinction, so maybe it should really
 only fill it in if the user asks for it, by specifying
 missing_value=np.nan or something.

 From what I can see, you expect that fromfile() should only
 split at the supplied delimiters, optionally(?) strip any whitespace

 whitespace stripping is not optional.

 Your output from this string '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12'
 actually assumes multiple delimiters, because there is no comma between
 4 and 5 or between 8 and 9.

 Yes, that's the point. I thought about allowing arbitrary multiple
 delimiters, but I think '\n' is a special case -- for instance, a comma
 at the end of some numbers might mean missing data, but a '\n' would not.

 And I couldn't really think of a useful use-case for arbitrary multiple
 delimiters.

 In Josef's last case, how many 'missing values' should there be?

   extra newlines at end of file
   str =  '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n'

 none -- exactly why I think \n is a special case.

 What about:
   extra newlines in the middle of the file
   str =  '1, 2, 3, 4\n\n5, 6, 7, 8\n9, 10, 11, 12\n'

 I think they should be ignored, but I hope I'm not making something that
 is too specific to my personal needs.

 Travis Oliphant wrote:
 +1 (ignoring new-lines transparently is a nice feature).  You can also
 use sscanf with weave to read most files.

 right -- but that requires weave. In fact, MATLAB has a fscanf function
 that allows you to pass in a C format string, and it vectorizes it to use
 the same one over and over again until it's done. It's actually quite
 powerful and flexible. I once started with that in mind, but didn't have
 the C chops to do it. I ended up with a tool that only did doubles (come
 to think of it, MATLAB only does doubles, anyway...).

 I may some day write a whole new C (or, more likely, Cython) function
 that does something like that, but for now, I'm just trying to get
 fromfile to be useful for me.


 +1   (much preferable to insert NaN or other user value than raise
 ValueError, in my opinion)

 But raise an error for integer types?

 I guess this is still up the air -- no consensus yet.

raise an exception -- I hate the silent cast of nan to integer zero; too
much debugging, and it's useless if there are real zeros.
(or use some -999 kind of thing if user-defined nan codes are allowed,
but I just work with float if I expect nans/missing values.)

Josef


 Thanks,

 -Chris









 --
 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR            (206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115       (206) 526-6317   main reception

 chris.bar...@noaa.gov
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fromfile() for reading text (one more time!)

2010-01-07 Thread Christopher Barker
josef.p...@gmail.com wrote:
 +1   (much preferrable to insert NaN or other user value than raise
 ValueError in my opinion)

 But raise an error for integer types?

 I guess this is still up the air -- no consensus yet.
 
 raise an exception, I hate the silent cast of nan to integer zero,

me too -- I'm sorry, I wasn't clear -- I'm not going to write any code 
that returns a zero for a missing value. These are the options I'd consider:

1) Have the user specify what to use for missing values, otherwise, 
raise an exception

2) Insert a NaN for floating points types, and raise an exception for 
integer types.

what's not clear is whether (2) is a good idea. As for (1), I just don't 
know if I'm going to get around to writing the code, and maybe more 
kwargs is a bad idea -- though maybe not.

Enough talk: I've got ugly C code to wade through...

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Numpy & MKL

2010-01-07 Thread Xue (Sue) Yang

I understand that Intel MKL uses the OpenMP parallel model.  Therefore I set
the environment variable

os.environ['OMP_NUM_THREADS'] = '4'

With the same test example, however, still only one CPU is used.

Do I need any special settings when I run numpy with Intel MKL (MKL 9.1)?
Perhaps the numpy developers would be able to answer this question?

I changed the name of the numpy-discussion thread to "Numpy & MKL", attempting
to draw attention from a wider range of readers.

Thanks!

Sue

On Thu, Jan 7, 2010 at 11:20 AM, Xue (Sue) Yang
x.y...@physics.usyd.edu.au wrote:

 This time, only one CPU was used.  Does it mean that our installed Intel
 MKL 9.1 is not threaded?

You would have to consult the MKL documentation - I believe you can
control how many threads are used from an environment variable. Also,
the exact build commands depend on the version of the MKL, as its
libraries often change between versions.

David

 Thank you for the reply which is useful.
 
 I also tried to install numpy with Intel MKL 9.1.
 I still used gfortran for the numpy installation, as Intel MKL 9.1 supports
 the GNU compiler.
 
 I only uncomment these lines for site.cfg in  site.cfg.example
 
 [mkl]
 library_dirs = /usr/physics/intel/mkl/lib/32
 include_dirs = /usr/physics/intel/mkl/include
 lapack_libs = mkl_lapack
 
 then I tested the numpy with
 
  $ python
  >>> import numpy
  >>> a = numpy.random.randn(6000, 6000)
  >>> numpy.dot(a, a)
 
 This time, only one CPU was used.  Does it mean that our installed
 Intel MKL 9.1 is not threaded?
 I don't think so.  We have used it for OpenMP parallelization for quite
 a while.
 
 Thanks!
 
 Sue




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy & MKL

2010-01-07 Thread David Warde-Farley
On 7-Jan-10, at 6:58 PM, Xue (Sue) Yang wrote:

 Do I need any specifications when I run numpy with intel MKL (MKL9.1)?
 numpy developers would be able to answer this question?

Are you sure you've compiled against MKL properly? What is printed by  
numpy.show_config()?

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Numpy & MKL

2010-01-07 Thread Xue (Sue) Yang
This is what I had (when I built numpy, I chose gnu compilers instead of
intel compilers),

>>> numpy.show_config()
lapack_opt_info:
libraries = ['mkl_lapack', 'mkl', 'vml', 'guide', 'pthread']
library_dirs = ['/usr/physics/intel/mkl/lib/32']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['/usr/physics/intel/mkl/include']

blas_opt_info:
libraries = ['mkl', 'vml', 'guide', 'pthread']
library_dirs = ['/usr/physics/intel/mkl/lib/32']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['/usr/physics/intel/mkl/include']

lapack_mkl_info:
libraries = ['mkl_lapack', 'mkl', 'vml', 'guide', 'pthread']
library_dirs = ['/usr/physics/intel/mkl/lib/32']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['/usr/physics/intel/mkl/include']

blas_mkl_info:
libraries = ['mkl', 'vml', 'guide', 'pthread']
library_dirs = ['/usr/physics/intel/mkl/lib/32']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['/usr/physics/intel/mkl/include']

mkl_info:
libraries = ['mkl', 'vml', 'guide', 'pthread']
library_dirs = ['/usr/physics/intel/mkl/lib/32']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['/usr/physics/intel/mkl/include']

Thanks!

Sue

 Do I need any specifications when I run numpy with intel MKL (MKL9.1)?
 numpy developers would be able to answer this question?

Are you sure you've compiled against MKL properly? What is printed by  
numpy.show_config()?

David



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] fromfile() -- help!

2010-01-07 Thread Christopher Barker
OK,

I'm trying to dig into the code and figure out how to get it to stop 
putting in zeros for missing data with fromfile()/fromstring() text reading.

It looks like the culprit is this, in arraytypes.c.src:


static int
@fname@_scan(FILE *fp, @type@ *ip, void *NPY_UNUSED(ignore),
             PyArray_Descr *NPY_UNUSED(ignored))
{
    double result;
    int ret;

    ret = NumPyOS_ascii_ftolf(fp, &result);
    *ip = (@type@) result;
    return ret;
}


If I'm reading this right, this gets called for the datatype of 
interest, and it is passed a pointer to the file that is being read.

If I have NumPyOS_ascii_ftolf right, it should return 0 if it doesn't 
successfully read a number. However, this looks like it sets the data in 
*ip even if the return value is zero.

It does pass on that return value, but, from ctors.c:

static int
fromfile_next_element(FILE **fp, void *dptr, PyArray_Descr *dtype,
                      void *NPY_UNUSED(stream_data))
{
    /* the NULL argument is for backwards-compatibility */
    return dtype->f->scanfunc(*fp, dptr, NULL, dtype);
}

just moves it on through. This is called from here:

    if (next(stream, dptr, dtype, stream_data) < 0) {
        break;
    }

which is checking for < 0, so if a zero is returned, it will just go on 
its merry way...

So, have I got that right?

Should this get fixed at that last point?

One more point, this is a bit different for fromfile and fromstring, so 
I'm getting really confused!

-Chris









-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Pythonmac-SIG] 1.4.0 installer fails on OSX 10.6.2

2010-01-07 Thread David Cournapeau
Christopher Barker wrote:
 David Cournapeau wrote:
 On Thu, Jan 7, 2010 at 1:35 AM, Christopher Barker
 In the past, I think folks' have used the default
 name provided by bdist_mpkg, and those are not always clear. Something like:


 numpy1.4-osx10.4-python.org2.6-32bit.dmg
 The 32 bits is redundant - we support all archs supported by the
 official python binary, so python.org is enough.
 
 True, though I was anticipating that there may be 32 and 64 bit builds 
 some day.

I suspect it will be exactly as today, i.e. a universal build with 64 
bits. I have not followed closely the discussion on python-dev on that 
topic, but I believe python 2.7 will still contain 64 bits as an arch.


 What OS/architecture were those built with?

Snow Leopard.

 When I first installed the binary, I got a whole bunch of errors because 
 'matrix' wasn't found. I recalled this issue from testing, and cleared 
 out the install, then re-installed, and all was fine. I wonder if it's 
 possible to have an mpkg remove anything?

pkg does not have an uninstaller -- I don't think Apple provides one; 
that's a known limitation of Mac OS X installers (although I believe 
there are 3rd-party ones).

 
 
 I think both of those are known issues, and not a big deal.

Maybe the spacing function is wrong on PPC. The underlying implementation 
is highly architecture-dependent.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy MKL

2010-01-07 Thread David Warde-Farley
On 7-Jan-10, at 8:13 PM, Xue (Sue) Yang wrote:

 This is what I had (when I built numpy, I chose gnu compilers  
 instead of
 intel compilers),

 numpy.show_config()
 lapack_opt_info:
libraries = ['mkl_lapack', 'mkl', 'vml', 'guide', 'pthread']
library_dirs = ['/usr/physics/intel/mkl/lib/32']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['/usr/physics/intel/mkl/include']

 blas_opt_info:
libraries = ['mkl', 'vml', 'guide', 'pthread']
library_dirs = ['/usr/physics/intel/mkl/lib/32']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['/usr/physics/intel/mkl/include']

 lapack_mkl_info:
libraries = ['mkl_lapack', 'mkl', 'vml', 'guide', 'pthread']
library_dirs = ['/usr/physics/intel/mkl/lib/32']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['/usr/physics/intel/mkl/include']

 blas_mkl_info:
libraries = ['mkl', 'vml', 'guide', 'pthread']
library_dirs = ['/usr/physics/intel/mkl/lib/32']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['/usr/physics/intel/mkl/include']

 mkl_info:
libraries = ['mkl', 'vml', 'guide', 'pthread']
library_dirs = ['/usr/physics/intel/mkl/lib/32']
define_macros = [('SCIPY_MKL_H', None)]
include_dirs = ['/usr/physics/intel/mkl/include']

That looks right to me... And you're sure you've set the environment  
variable before Python is run and NumPy is loaded?

Try running:
import os; print os.environ['OMP_NUM_THREADS']

and verify it's the right number.
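
(Also worth checking the ordering -- a sketch, assuming a threaded MKL
build: OpenMP typically reads OMP_NUM_THREADS when the runtime initializes,
so it has to be set before numpy/MKL is loaded, e.g.

import os
os.environ['OMP_NUM_THREADS'] = '4'   # before numpy is imported

import numpy
a = numpy.random.randn(6000, 6000)
numpy.dot(a, a)   # should use several CPUs if the MKL build is threaded

or set it in the shell before starting Python.)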

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2

2010-01-07 Thread David Warde-Farley
On 5-Jan-10, at 7:18 PM, Christopher Barker wrote:

 If distutils/setuptools could identify the python version properly,  
 then
  binary eggs and easy-install could be a solution -- but that's a  
 mess,
 too.


Long live toydist! :)

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2

2010-01-07 Thread David Warde-Farley
On 5-Jan-10, at 7:02 PM, Christopher Barker wrote:

 Pretty sure the python.org binaries are 32-bit only. I still think
 it's sensible to prefer the

waiting for the rest of this sentence... ;-)

I had meant to say 'sensible to prefer the Python.org version', though  
in reality I'm a little miffed that Python.org isn't providing Ron's  
4-way binaries, since he went to the trouble of adding support for  
building them. Grumble grumble.

 I'm not really a fan of packages polluting /usr/local, I'd rather the
 tree appear /opt/packagename

 well, /opt has kind of been co-opted by macports.

I'd forgotten about that.

 or /usr/local/packagename instead, for
 ease of removal

 wxPython gets put entirely into:

 /usr/local/lib/wxPython-unicode-2.10.8

 which isn't bad.

Ah, yeah, that isn't bad either.

 but the general approach of stash somewhere and put
 a .pth in both site-packages seems fine to me.

 OK -- what about simply punting and doing two builds: one 32-bit and
 one 64-bit? I wonder if we need 64-bit PPC at all. I know I'm running
 64-bit hardware, but never ran a 64-bit OS on it -- I wonder if anyone
 is?

I've built for ppc64 before, and in fact discovered a long-standing  
bug in the way ppc64 was detected. The fact that nobody found it  
before me is probably evidence that it is nearly never used. It could  
be useful in a minority of situations but I don't think it's going to  
be worth it for most people.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2

2010-01-07 Thread Robert Kern
On 2010-01-07, David Warde-Farley d...@cs.toronto.edu wrote:
 On 5-Jan-10, at 7:02 PM, Christopher Barker wrote:

 I'm not really a fan of packages polluting /usr/local, I'd rather the
 tree appear /opt/packagename

 well, /opt has kind of been co-opted by macports.

 I'd forgotten about that.

It's not really true, though. MacPorts took /opt/local/, but
/opt/yourbrandnamehere/ probably hasn't been.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 1.4.0 installer fails on OSX 10.6.2

2010-01-07 Thread David Cournapeau
On Fri, Jan 8, 2010 at 11:24 AM, David Warde-Farley d...@cs.toronto.edu wrote:
 On 5-Jan-10, at 7:18 PM, Christopher Barker wrote:

 If distutils/setuptools could identify the python version properly,
 then
  binary eggs and easy-install could be a solution -- but that's a
 mess,
 too.


 Long live toydist! :)

Toydist will not solve anything here. Versioning info is useless here
if it does not translate to a compatible ABI. What is required is to be
able to identify a precise python ABI: python makes that hard, Mac OS
X harder, and universal builds even harder. Things like PEP 384 may
help in the future -- as it is written by someone who actually knows
about this stuff, it will hopefully be useful.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] FIY: a (new ?) practical profiling tool on linux

2010-01-07 Thread David Cournapeau
Hi,

I don't know if many people are aware of it, but I have recently
discovered perf, a tool available from the kernel sources. It is
extremely simple to use, and very useful when looking at numpy/scipy
performance issues in compiled code. For example, I can get this kind of
result when looking at the numpy neighborhood iterator performance, in
one simple command, without special compilation flags:

 44.69%  python  /home/david/local/stow/scipy.git/lib/python2.6/site-packages/scipy/signal/sigtools.so  [.] _imp_correlate_nd_double
 39.47%  python  /home/david/local/stow/numpy-1.4.0/lib/python2.6/site-packages/numpy/core/multiarray.so  [.] get_ptr_constant
  9.98%  python  /home/david/local/stow/numpy-1.4.0/lib/python2.6/site-packages/numpy/core/multiarray.so  [.] get_ptr_simple
  0.65%  python  /usr/bin/python2.6   [.] 0x12b8a0
  0.40%  python  /usr/bin/python2.6   [.] 0x0a6662
  0.37%  python  /usr/bin/python2.6   [.] 0x04c10d
  0.32%  python  /usr/bin/python2.6   [.] PyEval_EvalFrameEx
  0.15%  python  [kernel]             [k] __d_lookup
  0.14%  python  /lib/libc-2.10.1.so  [.] _int_malloc
  0.12%  python  /usr/bin/python2.6   [.] 0x04f90e
  0.10%  python  [kernel]             [k] __link_path_walk
  0.09%  python  /usr/bin/python2.6   [.] PyObject_Malloc
  0.09%  python  /lib/ld-2.10.1.so    [.] do_lookup_x
  0.09%  python  /lib/libc-2.10.1.so  [.] __GI_memcpy
  0.08%  python  [kernel]             [k] __ticket_spin_lock
  0.07%  python  /usr/bin/python2.6   [.] PyParser_AddToken
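
For anyone who wants to reproduce this kind of report, a minimal
session looks like the following (bench.py is a made-up stand-in
workload, not the neighborhood iterator benchmark above; it assumes
the kernel perf tool is on the PATH):

# bench.py -- tiny numpy workload to profile with perf; the session is:
#
#   perf record python bench.py
#   perf report
#
import numpy as np

a = np.random.rand(200, 200)
for _ in range(500):
    a = np.dot(a, a.T)       # compiled-code hot spot for perf to attribute
    a /= np.abs(a).max()     # renormalize so values stay bounded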

And even cooler, annotated sources:


 Percent |  Source code & Disassembly of multiarray.so

         :
         :
         :
         :  Disassembly of section .text:
         :
         :  0001d8a0 <get_ptr_constant>:
         :              _coordinates[c] = bd;
         :
         :      /* set the dataptr from its current coordinates */
         :      static char*
         :      get_ptr_constant(PyArrayIterObject* _iter, npy_intp *coordinates)
         :      {
   15.69 :   1d8a0:  48 81 ec 08 01 00 00   sub    $0x108,%rsp
         :          int i;
         :          npy_intp bd, _coordinates[NPY_MAXDIMS];
         :          PyArrayNeighborhoodIterObject *niter = (PyArrayNeighborhoodIterObject*)_iter;
         :          PyArrayIterObject *p = niter->_internal_iter;
         :
         :          for(i = 0; i < niter->nd; ++i) {
    0.02 :   1d8a7:  48 83 bf 48 0a 00 00   cmpq   $0x0,0xa48(%rdi)
    0.00 :   1d8ae:  00
         :      get_ptr_constant(PyArrayIterObject* _iter, npy_intp *coordinates)
         :      {
         :          int i;
         :          npy_intp bd, _coordinates[NPY_MAXDIMS];
         :          PyArrayNeighborhoodIterObject *niter = (PyArrayNeighborhoodIterObject*)_iter;
         :          PyArrayIterObject *p = niter->_internal_iter;
    0.01 :   1d8af:  48 8b 87 50 0b 00 00   mov    0xb50(%rdi),%rax
         :
         :          for(i = 0; i < niter->nd; ++i) {
    7.92 :   1d8b6:  7e 64                  jle    1d91c <get_ptr_constant+0x7c>
         :              _INF_SET_PTR(i)
    0.01 :   1d8b8:  48 8b 0e               mov    (%rsi),%rcx
    0.00 :   1d8bb:  48 03 48 28            add    0x28(%rax),%rcx
    0.03 :   1d8bf:  48 3b 88 40 07 00 00   cmp    0x740(%rax),%rcx
    7.97 :   1d8c6:  7c 68                  jl     1d930 <get_ptr_constant+0x90>
    0.02 :   1d8c8:  45 31 c9               xor    %r9d,%r9d
    0.00 :   1d8cb:  31 d2                  xor    %edx,%edx
    0.00 :   1d8cd:  48 3b 88 48 07 00 00   cmp    0x748(%rax),%rcx
    7.75 :   1d8d4:  7e 32                  jle    1d908 <get_ptr_constant+0x68>
    0.00 :   1d8d6:  eb 58                  jmp    1d930 <get_ptr_constant+0x90>
    0.00 :   1d8d8:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
    0.00 :   1d8df:  00
    7.68 :   1d8e0:  4c 8d 42 74            lea    0x74(%rdx),%r8
    0.00 :   1d8e4:  48 8b 0c d6            mov    (%rsi,%rdx,8),%rcx
    0.00 :   1d8e8:

Re: [Numpy-discussion] fromfile() for reading text (one more time!)

2010-01-07 Thread Bruce Southey
On Thu, Jan 7, 2010 at 3:45 PM, Christopher Barker
chris.bar...@noaa.gov wrote:
 Bruce Southey wrote:
 chris.bar...@noaa.gov wrote:

 Using the numpy NaN or similar (noting R's approach to missing values,
 which in turn allows it to have the above functionality) is just a
 very bad idea for missing values, because you always have to check
 which NaN is a missing value and which was due to some numerical
 calculation.

 well, this is specific to reading files, so you know where it came from.

You can only know where it came from when you compare the original
array to the transformed one. Also, a user has to check for missing
values, or numpy has to warn the user that missing values are present
immediately after reading the data, so that the appropriate action can
be taken (like using functions that handle missing values
appropriately). That is my second problem with using codes (NaN, -9,
etc.) for missing values.
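
As a sketch of the distinction I mean, the slower, full-featured path
can keep an empty field distinguishable from a literal NaN by using a
masked array (this assumes genfromtxt's default of treating empty
fields as missing; the StringIO import is the python 2 spelling):

import numpy as np
from StringIO import StringIO

a = np.genfromtxt(StringIO("1, NaN, , 4"), delimiter=",", usemask=True)
print(a)       # [1.0 nan -- 4.0] -- only the empty field is masked
print(a.mask)  # [False False  True False] -- the literal NaN is not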



 And the principle of fromfile() is that it is fast and simple; if you
 want masked arrays, use slower but more full-featured methods.

So in that case it should fail with missing data.


 However, in this case:

 In [9]: np.fromstring("3, 4, NaN, 5", sep=",")
 Out[9]: array([  3.,   4.,  NaN,   5.])


 An actual NaN is read from the file, rather than a missing value.
 Perhaps the user does want the distinction, so maybe it should really
 only fill it in if the user asks for it, by specifying
 missing_value=np.nan or something.

Yes, that is my first problem with using predefined codes for missing
values, as you do not always know what is going to occur in the data.



From what I can see, you expect that fromfile() should only
  split at the supplied delimiters and optionally(?) strip any whitespace

 whitespace stripping is not optional.

 Your output from this string '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12'
 actually assumes multiple delimiters because there is no comma between
 4 and 5 and 8 and 9.

 Yes, that's the point. I thought about allowing arbitrary multiple
 delimiters, but I think '\n' is a special case - for instance, a comma
 at the end of some numbers might mean missing data, but a '\n' would not.

 And I couldn't really think of a useful use-case for arbitrary multiple
 delimiters.

 In Josef's last case, how many 'missing' values should there be?

   extra newlines at end of file
   str =  '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n'

 none -- exactly why I think \n is a special case.

What about '\r' and '\r\n'?


 What about:
   extra newlines in the middle of the file
   str =  '1, 2, 3, 4\n\n5, 6, 7, 8\n9, 10, 11, 12\n'

 I think they should be ignored, but I hope I'm not making something that
 is too specific to my personal needs.

Not really, it is more that I am being somewhat difficult to ensure I
understand what you actually need.

My problem with this is that you are reading one huge 1-D array (that
you can resize later) rather than a 2-D array with rows and columns
(which is what I deal with). But I agree that you can have an option
to treat '\n' or '\r' as a delimiter, though I think it should be
turned off by default.
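
For what it's worth, the resize-later step is cheap; a sketch of the
1-D-then-reshape pattern (the literal string stands in for file
contents):

import numpy as np

flat = np.fromstring("1, 2, 3, 4, 5, 6, 7, 8", sep=",")  # 1-D result
table = flat.reshape(-1, 4)  # -1 lets numpy infer the row count
print(table.shape)           # (2, 4)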



 Travis Oliphant wrote:
 +1 (ignoring new-lines transparently is a nice feature).  You can also
 use sscanf with weave to read most files.

 right -- but that requires weave. In fact, MATLAB has a fscanf function
 that allows you to pass in a C format string and it vectorizes it to use
 the same one over and over again until it's done. It's actually quite
 powerful and flexible. I once started with that in mind, but didn't have
 the C chops to do it. I ended up with a tool that only did doubles (come
 to think of it, MATLAB only does doubles, anyway...)

 I may some day write a whole new C (or, more likely, Cython) function
 that does something like that, but for now, I'm just trying to get
 fromfile to be useful for me.


 +1   (much preferable to insert NaN or other user value than raise
 ValueError in my opinion)

 But raise an error for integer types?

 I guess this is still up in the air -- no consensus yet.

 Thanks,

 -Chris


You should have a corresponding value for ints, because raising an
exception would be inconsistent with allowing floats to have a value.
If you must keep the user-defined dtype then, as Josef suggests, just
use some code, be it -999 or the most negative number supported by the
OS for the defined dtype, or just convert the ints into floats if the
user does not define a missing-value code. It would be nice to either
return the number of missing values or display a warning indicating
how many occurred.
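
As a sketch of the most-negative-sentinel idea (a convention only;
nothing numpy enforces):

import numpy as np

sentinel = np.iinfo(np.int32).min        # -2147483648 for int32
data = np.array([1, 2, sentinel, 4])
n_missing = (data == sentinel).sum()
print(n_missing)                         # 1 -- the count reported back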

Bruce
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fromfile() for reading text (one more time!)

2010-01-07 Thread josef . pktd
On Thu, Jan 7, 2010 at 11:10 PM, Bruce Southey bsout...@gmail.com wrote:
 On Thu, Jan 7, 2010 at 3:45 PM, Christopher Barker
 chris.bar...@noaa.gov wrote:
 Bruce Southey wrote:
 chris.bar...@noaa.gov wrote:

 Using the numpy NaN or similar (noting R's approach to missing values,
 which in turn allows it to have the above functionality) is just a
 very bad idea for missing values, because you always have to check
 which NaN is a missing value and which was due to some numerical
 calculation.

 well, this is specific to reading files, so you know where it came from.

 You can only know where it came from when you compare the original
 array to the transformed one. Also, a user has to check for missing
 values, or numpy has to warn the user that missing values are present
 immediately after reading the data, so that the appropriate action can
 be taken (like using functions that handle missing values
 appropriately). That is my second problem with using codes (NaN, -9,
 etc.) for missing values.



 And the principle of fromfile() is that it is fast and simple; if you
 want masked arrays, use slower but more full-featured methods.

 So in that case it should fail with missing data.


 However, in this case:

 In [9]: np.fromstring("3, 4, NaN, 5", sep=",")
 Out[9]: array([  3.,   4.,  NaN,   5.])


 An actual NaN is read from the file, rather than a missing value.
 Perhaps the user does want the distinction, so maybe it should really
 only fill it in if the user asks for it, by specifying
 missing_value=np.nan or something.

 Yes, that is my first problem with using predefined codes for missing
 values, as you do not always know what is going to occur in the data.



From what I can see, you expect that fromfile() should only
  split at the supplied delimiters and optionally(?) strip any whitespace

 whitespace stripping is not optional.

 Your output from this string '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12'
 actually assumes multiple delimiters because there is no comma between
 4 and 5 and 8 and 9.

 Yes, that's the point. I thought about allowing arbitrary multiple
 delimiters, but I think '\n' is a special case - for instance, a comma
 at the end of some numbers might mean missing data, but a '\n' would not.

 And I couldn't really think of a useful use-case for arbitrary multiple
 delimiters.

 In Josef's last case, how many 'missing' values should there be?

   extra newlines at end of file
   str =  '1, 2, 3, 4\n5, 6, 7, 8\n9, 10, 11, 12\n\n\n'

 none -- exactly why I think \n is a special case.

 What about '\r' and '\r\n'?

Yes, I forgot about this, and it will be the most common case for
Windows users like myself.

I think \r should be stripped automatically, like in non-binary
reading of files in python.
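
What I mean, as a sketch: python's universal-newline text mode already
turns '\r\n' and '\r' into '\n' before anything downstream sees them
(data.csv is a hypothetical file name; 'rU' is the python 2 flag):

f = open("data.csv", "rU")  # universal-newline mode
txt = f.read()              # no '\r' left in txt at this point
f.close()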



 What about:
   extra newlines in the middle of the file
   str =  '1, 2, 3, 4\n\n5, 6, 7, 8\n9, 10, 11, 12\n'

 I think they should be ignored, but I hope I'm not making something that
 is too specific to my personal needs.

 Not really, it is more that I am being somewhat difficult to ensure I
 understand what you actually need.

 My problem with this is that you are reading one huge 1-D array (that
 you can resize later) rather than a 2-D array with rows and columns
 (which is what I deal with). But I agree that you can have an option
 to treat '\n' or '\r' as a delimiter, though I think it should be
 turned off by default.



 Travis Oliphant wrote:
 +1 (ignoring new-lines transparently is a nice feature).  You can also
 use sscanf with weave to read most files.

 right -- but that requires weave. In fact, MATLAB has a fscanf function
 that allows you to pass in a C format string and it vectorizes it to use
 the same one over and over again until it's done. It's actually quite
 powerful and flexible. I once started with that in mind, but didn't have
 the C chops to do it. I ended up with a tool that only did doubles (come
 to think of it, MATLAB only does doubles, anyway...)

 I may some day write a whole new C (or, more likely, Cython) function
 that does something like that, but for now, I'm just trying to get
 fromfile to be useful for me.


 +1   (much preferable to insert NaN or other user value than raise
 ValueError in my opinion)

 But raise an error for integer types?

 I guess this is still up in the air -- no consensus yet.

 Thanks,

 -Chris


 You should have a corresponding value for ints, because raising an
 exception would be inconsistent with allowing floats to have a value.

No, I think different nan/missing value handling between integers and
floats is a natural distinction. There is no default nan code for
integers, but nan (and inf) are valid floating point numbers (even if
nan is not a number). And the default treatment of nans in numpy is
getting pretty good (e.g. I like the new (nan)sort).
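
As a sketch of the asymmetry (numpy 1.4; the sort behavior is the one
I am praising above):

import numpy as np

a = np.array([3.0, np.nan, 1.0])
print(np.sort(a))  # [ 1.  3.  nan] -- nans now sort to the end
# there is no integer counterpart; this would raise a ValueError:
# np.array([1, np.nan], dtype=int)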


 If you must keep the user-defined dtype then, as Josef suggests, just
 use some code, be it -999 or the most negative number supported by the
 OS for the defined dtype, or just convert the ints