[Numpy-discussion] Regression in choose()

2011-06-16 Thread Ed Schofield
Hi all,

I have been investigating the limitation of the choose() method (and
function) to 32 elements. This is a regression in recent versions of NumPy.
I have tested choose() in the following NumPy versions:

1.0.4: fine
1.1.1: bug
1.2.1: fine
1.3.0: bug
1.4.x: bug
1.5.x: bug
1.6.x: bug
Numeric 24.3: fine

(To run the tests on versions of NumPy prior to 1.4.x I used Python 2.4.3.
For the other tests I used Python 2.7.)

Here 'bug' means the choose() function has the 32-element limitation. I have
been helping an organization to port a large old Numeric-using codebase to
NumPy, and the choose() limitation in recent NumPy versions is throwing a
spanner in the works. The codebase is currently using both NumPy and Numeric
side-by-side, with Numeric only being used for its choose() function, with a
few dozen lines like this:

a = numpy.array(Numeric.choose(b, c))

Here is a simple example that triggers the bug. It is a simple extension of
the example from the choose() docstring:



import numpy as np

choices = [[0, 1, 2, 3], [10, 11, 12, 13],
           [20, 21, 22, 23], [30, 31, 32, 33]]

np.choose([2, 3, 1, 0], choices * 8)
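For what it's worth, the same selection can be expressed with plain fancy
indexing, which is not subject to the 32-array limit. This is only a
workaround sketch, not a full choose() replacement (it ignores choose()'s
mode= handling and broadcasting):

```python
import numpy as np

choices = [[0, 1, 2, 3], [10, 11, 12, 13],
           [20, 21, 22, 23], [30, 31, 32, 33]]
index = [2, 3, 1, 0]

# choose() picks choices[index[i]][i] at each position i; fancy indexing
# expresses the same selection without the argument-count limit.
c = np.asarray(choices * 8)              # 32 choice arrays, as above
result = c[index, np.arange(c.shape[1])]
print(result)                            # [20 31 12  3]
```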



A side note: the exception message (defined in
core/src/multiarray/iterators.c) is also slightly inconsistent with the
actual behaviour:

Traceback (most recent call last):
  File "chooser.py", line 6, in <module>
    np.choose([2, 3, 1, 0], choices * 8)
  File "/usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py", line 277, in choose
    return _wrapit(a, 'choose', choices, out=out, mode=mode)
  File "/usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py", line 37, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: Need between 2 and (32) array objects (inclusive).

The actual behaviour is that choose() passes with 31 objects but fails with
32 objects, so this should read "exclusive" rather than "inclusive". (And
why the parentheses around 32?)

Does anyone know what changed between 1.2.1 and 1.3.0 that introduced the
32-element limitation to choose(), and whether we might be able to lift this
limitation again for future NumPy versions? I have a couple of days to work
on a patch ... if someone can advise me how to approach this.

Best wishes,
Ed


-- 
Dr. Edward Schofield
Python Charmers
+61 (0)405 676 229
http://pythoncharmers.com
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Regression in choose()

2011-06-16 Thread Olivier Delalleau
If performance is not critical you could just write your own function for a
quick fix, doing something like
numpy.array([choices[j][i] for i, j in enumerate([2, 3, 1, 0])])

This 32-array limitation definitely looks weird to me though, doesn't seem
to make sense.

-=- Olivier

2011/6/16 Ed Schofield e...@pythoncharmers.com

 [snip]




[Numpy-discussion] [ANN] OpenOpt suite 0.34

2011-06-16 Thread Dmitrey
 Hi all,
   I'm glad to inform you about the new quarterly release 0.34 of the OOSuite
   package software (OpenOpt, FuncDesigner, SpaceFuncs, DerApproximator).

   Main changes:
   * Python 3 compatibility
   * Lots of improvements and speedup for interval calculations
   * Now interalg can obtain all solutions of a nonlinear equation (example)
   or systems of them (example) in the involved box lb_i <= x_i <= ub_i
   (bounds can be very large), possibly constrained (e.g. sin(x) +
   cos(y+x) > 0.5).
   * Many other improvements and speedup for interalg.

   See http://forum.openopt.org/viewtopic.php?id=425 for more details.

   Regards, D.


[Numpy-discussion] 3d ODR

2011-06-16 Thread Christian K.
Hi,

I need to fit a 3d surface to a point cloud. This sounds like a job for 3d 
orthogonal distance regression. Does anybody know of an implementation?

Regards, Christian K.



[Numpy-discussion] Using numpy.fromfile with structured array skipping some elements.

2011-06-16 Thread Michael Klitgaard
Hello,

I hope this is the right mailing list for numpy user questions; if
not, I'm sorry.

I'm reading a binary file with numpy.fromfile()

The binary file is constructed with some integers for checking the
data for corruption.
This is how the binary file is constructed:

Timestamp       [ 12 bytes]
[ 1 int ]       check
[ 1 single ]    Time stamp (single precision).
[ 1 int ]       check

Data chunk      [ 4*(no_sensor+2) bytes ]
[ 1 int ]       check
[ no_sensor single ]    Array of sensor readings (single precision).
[ 1 int ]       check

The file continues this way
[ Timestamp ]
[ Data chunk ]
[ Timestamp ]
[ Data chunk ]
..


no_sensor is file dependent
int = 4 bytes
single = 4 bytes


This is my current procedure
import numpy as np

f = open(file, 'rb')
f.read(size_of_header)  # The file contains a header where, e.g., the
                        # no_sensor value can be read.

dt = np.dtype([('junk0', 'i4'),
               ('timestamp', 'f4'),
               ('junk1', 'i4'),
               ('junk2', 'i4'),
               ('sensors', ('f4', no_sensor)),
               ('junk3', 'i4')])

data = np.fromfile(f, dtype=dt)

Now the data is read in and I can access it, but I have the 'junk' in
the array, which annoys me.
Is there a way to remove the junk data, or skip it with fromfile ?

Another issue is that when accessing one sensor, I do it this way:

data['sensors'][:,0]
for the first sensor. Would it be possible to just do:
data['sensors'][0] ?

Thank you!


Sincerely
Michael Klitgaard


Re: [Numpy-discussion] 3d ODR

2011-06-16 Thread eat
Hi,

On Thu, Jun 16, 2011 at 2:28 PM, Christian K. ckk...@hoc.net wrote:

 Hi,

 I need to do fit a 3d surface to a point cloud. This sounds like a job for
 3d
 orthogonal distance regression. Does anybody know of an implementation?

How about http://docs.scipy.org/doc/scipy/reference/odr.html ?
My 2 cents,
eat


 Regards, Christian K.



Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-16 Thread Bruce Southey
On 06/15/2011 07:25 PM, Sturla Molden wrote:
 On 15.06.2011 23:22, Christopher Barker wrote:
 I think the issue got confused -- the OP was not looking to speed up a
 matrix multiply, but rather to speed up a whole bunch of independent
 matrix multiplies.
 I would do it like this:

 1. Write a Fortran function that makes multiple calls to DGEMM in a do loop.
 (Or the Fortran intrinsics dot_product or matmul.)

 2. Put an OpenMP pragma around the loop  (!$omp parallel do). Invoke the
 OpenMP compiler on compilation. Use static or guided thread scheduling.

 3. Call Fortran from Python using f2py, ctypes or Cython.

 Build with a thread-safe and single-threaded BLAS library.

 That should run as fast as it gets.

 Sturla

The idea was to calculate:
innerProduct = numpy.sum(array1*array2) where array1 and array2 are, in 
general, multidimensional.

Numpy has an inner product function called np.inner 
(http://www.scipy.org/Numpy_Example_List_With_Doc#inner) which is a 
special case of tensordot. See the documentation for what it does and 
other examples. Also note that np.inner provides the cross-products 
as well. For example, you can just do:

>>> import numpy as np
>>> a = np.arange(10).reshape(2,5)
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> b = a*10
>>> b
array([[ 0, 10, 20, 30, 40],
       [50, 60, 70, 80, 90]])
>>> c = a*b
>>> c
array([[  0,  10,  40,  90, 160],
       [250, 360, 490, 640, 810]])
>>> c.sum()
2850
>>> d = np.inner(a,b)
>>> d
array([[ 300,  800],
       [ 800, 2550]])
>>> np.diag(d).sum()
2850


I do not know if numpy's multiplication, np.inner() and np.sum() are 
single-threaded and thread-safe when you link to multi-threaded blas, 
lapack or atlas libraries. But if everything is *single-threaded* and 
thread-safe, then you just create a function and use Anne's very useful 
handythread.py (http://www.scipy.org/Cookbook/Multithreading). Otherwise 
you would have to determine the best approach, such as doing nothing 
(especially if the arrays are huge), or making the functions single-threaded.
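handythread.py itself is not reproduced here, but the same pattern can be
sketched with only the standard library (ThreadPoolExecutor stands in for
Anne's helper; the arrays are made-up demo data):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Made-up demo data: pairs of arrays whose "inner product" we want.
arrays = [np.arange(12).reshape(3, 4) for _ in range(8)]
weights = [np.ones((3, 4)) for _ in range(8)]

def inner(pair):
    a1, a2 = pair
    # The operation under discussion: numpy.sum(array1 * array2).
    return float(np.sum(a1 * a2))

# Two worker threads; this only pays off if the GIL is released
# (e.g. inside BLAS) and the per-item work is large enough.
with ThreadPoolExecutor(max_workers=2) as ex:
    results = list(ex.map(inner, zip(arrays, weights)))

print(results[0])  # 66.0 (sum of 0..11)
```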

By the way, if the arrays are sufficiently small, there is a lot of 
overhead involved such that there is more time in communication than 
computation. So you have to determine the best algorithm for your case.

Bruce



Re: [Numpy-discussion] datetimes with date vs time units, local time, and time zones

2011-06-16 Thread Benjamin Root
On Wednesday, June 15, 2011, Mark Wiebe mwwi...@gmail.com wrote:
 Towards a reasonable behavior with regard to local times, I've made the 
 default repr for datetimes use the C standard library to print them in a 
 local ISO format. Combined with the ISO8601-prescribed behavior of 
 interpreting datetime strings with no timezone specifier to be in local 
 times, this allows the following cases to behave reasonably:

 >>> np.datetime64('now')
 numpy.datetime64('2011-06-15T15:16:51-0500','s')
 >>> np.datetime64('2011-06-15T18:00')
 numpy.datetime64('2011-06-15T18:00-0500','m')
 As noted in another thread, there can be some extremely surprising behavior 
 as a consequence:

 >>> np.array(['now', '2011-06-15'], dtype='M')
 array(['2011-06-15T15:18:26-0500', '2011-06-14T19:00:00-0500'],
       dtype='datetime64[s]')
 Having the 15th of June print out as 7pm on the 14th of June is probably not 
 what one would generally expect, so I've come up with an approach which 
 hopefully deals with this in a good way.

 One firm principle of the datetime in NumPy is that it is always stored as a 
 POSIX time (referencing UTC), or a TAI time. There are two categories of 
 units that can be used, which I will call date units and time units. The date 
 units are 'Y', 'M', 'W', and 'D', while the time units are 'h', 'm', 's', 
 ..., 'as'. Time zones are only applied to datetimes stored in time units, so 
 there's a qualitative difference between date and time units with respect to 
 string conversions and calendar operations.

 I would like to place an 'unsafe' casting barrier between the date units and 
 the time units, so that the above conversion from a date into a datetime will 
 raise an error instead of producing a confusing result. This only applies to 
 datetimes and not timedeltas, because for timedeltas the day -> hour case is 
 fine; it is just the year/month -> other units case which has issues, and that is 
 already treated with an 'unsafe' casting barrier.

 Two new functions will facilitate the conversions between datetimes with date 
 units and time units:
 date_as_datetime(datearray, hour, minute, second, microsecond, 
 timezone='local', unit=None, out=None), which converts the provided dates 
 into datetimes at the specified time, according to the specified timezone. If 
 'unit' is specified, it controls the output unit, otherwise it is the units 
 in 'out' or the amount of precision specified in the function.

 datetime_as_date(datetimearray, timezone='local', out=None), which converts 
 the provided datetimes into dates according to the specified timezone.
 In both functions, timezone can be any of 'UTC', 'TAI', 'local', '+/-', 
 or a datetime.tzinfo object. The latter will allow NumPy datetimes to work 
 with the pytz library for flexible time zone support.

 I would also like to extend the 'today' input string parsing to accept 
 strings like 'today 12:30' to allow a convenient way to express different 
 local times occurring today, mostly useful for interactive usage.

 I welcome any comments on this design, particularly if you can find a case 
 where this doesn't produce a reasonable behavior.
 Cheers,Mark


Is the output for the given use case above, with the mix of 'now' and a
datetime string without tz info, intended to still be correct?  I
personally have misgivings about interpreting phrases like 'now' and
'today' at this level.  I think it introduces a can of worms that
would be difficult to handle.

Consider some arbitrary set of inputs to the array function for
datetime objects.  If they all contain no tz info, then they are all
interpreted the same, as-is.  However, if even one element has 'now',
then the inputs are interpreted entirely differently.  This will
confuse people.

Just thinking out loud here: what about a case where the inputs are
such that some do not specify a tz and some others specify a mix of
timezones? Should that be any different from the case given above?

It has been awhile for me, but how different is this from Perl's
'floating' tz for its datetime module?  Maybe we could combine its
approach with your unsafe barrier for the ambiguous situations that
Perl's datetime module mentions?

Ben Root


Re: [Numpy-discussion] Using numpy.fromfile with structured array skipping some elements.

2011-06-16 Thread Sturla Molden
On 16.06.2011 15:02, Michael Klitgaard wrote:
 Now the data is read in and I can access it, but I have the 'junk' in
 the array, which annoys me.
 Is there a way to remove the junk data, or skip it with fromfile ?

Not that I know of, unless you are ok with a loop (either in Python or 
C). If it's that important (e.g. to save physical memory), you could 
also memory map the file and never reference the junk data.
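A minimal sketch of that memory-mapping approach, reusing the record layout
from the original post (the file name, header size, and sensor count are
placeholders, and a tiny demo file is written first so the snippet is
self-contained):

```python
import numpy as np

no_sensor = 8        # placeholder: would be read from the file header
size_of_header = 0   # placeholder: actual header length in bytes

dt = np.dtype([('junk0', '<i4'),
               ('timestamp', '<f4'),
               ('junk1', '<i4'),
               ('junk2', '<i4'),
               ('sensors', '<f4', (no_sensor,)),
               ('junk3', '<i4')])

# Write a few zeroed records so the example can run anywhere.
demo = np.zeros(5, dtype=dt)
demo['timestamp'] = np.arange(5)
demo.tofile('readings.bin')

# The records are mapped, not copied; the 'junk' fields simply stay
# untouched on disk unless they are referenced.
data = np.memmap('readings.bin', dtype=dt, mode='r', offset=size_of_header)

timestamps = data['timestamp']        # a view, no copy of the junk fields
first_sensor = data['sensors'][:, 0]  # all readings from sensor 0
```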

Sturla




Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-16 Thread Sturla Molden
On 16.06.2011 16:16, Bruce Southey wrote:
 The idea was to calculate:
 innerProduct =numpy.sum(array1*array2) where array1 and array2 are, in
 general, multidimensional.

If you want to do that in parallel, fast and with minimal overhead, it's 
a typical OpenMP problem. If the dimensions are unknown in advance, I'd 
probably prefer a loop in C instead of Fortran and BLAS.

Sturla


Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-16 Thread Bruce Southey
On 06/16/2011 09:30 AM, Sturla Molden wrote:
 On 16.06.2011 16:16, Bruce Southey wrote:
 The idea was to calculate:
 innerProduct =numpy.sum(array1*array2) where array1 and array2 are, in
 general, multidimensional.
  If you want to do that in parallel, fast and with minimal overhead, it's
  a typical OpenMP problem. If the dimensions are unknown in advance, I'd
  probably prefer a loop in C instead of Fortran and BLAS.

 Sturla
Remember that is what the OP wanted to do, not me.  My issue with this 
suggestion is that it may be overkill and too complex for the OP 
without knowing if there is 'sufficient' performance gain (let alone 
gaining back the time spent ensuring that the code is as correct and 
thread-safe as possible).

Bruce



Re: [Numpy-discussion] datetimes with date vs time units, local time, and time zones

2011-06-16 Thread Mark Wiebe
On Thu, Jun 16, 2011 at 9:18 AM, Benjamin Root ben.r...@ou.edu wrote:

 On Wednesday, June 15, 2011, Mark Wiebe mwwi...@gmail.com wrote:
  [snip]

 Is the output for the given usecase above with the mix of 'now' and a
 datetime string without tz info intended to still be correct?


No, that case would fail. The resolution of 'now' is seconds, and the
resolution of a date string is days, so the case would require a conversion
across the date unit/time unit boundary.


 I personally have misgivings about interpreting phrases like 'now' and
 'today' at this level.  I think it introduces a can of worms that
 would be difficult to handle.


I like the convenience it gives at the interactive prompt, but maybe a
datetime_from_string function where you can selectively enable/disable
allowing of these special values and local times can provide control over
this. This is similar to the datetime_as_string function which gives more
flexibility than simple conversion to a string.

Consider some arbitrary set of inputs to the array function for
 datetime objects.  If they all contain no tz info, then they are all
 interpreted the same, as-is.  However, if even one element has 'now',
 then the inputs are interpreted entirely differently.  This will
 confuse people.


The element 'now' has no effect on the other inputs, except to possibly
promote the unit to a seconds level of precision. All datetimes are in UTC,
and when timezone information is given, that is only used for parsing the
input, it is not preserved.

Just thinking out loud here, What about a case where the inputs are
 such that some do not specify tz and some others specify a mix of
 timezones? Should that be any different from the case given above?


I think this has the same answer, everything gets converted to UTC.

It has been awhile for me, but how different is this from Perl's
 floating tz for its datetime module?  Maybe we could combine its
 

Re: [Numpy-discussion] code review/build test for datetime business day API

2011-06-16 Thread Mark Wiebe
On Wed, Jun 15, 2011 at 5:16 PM, Derek Homeier 
de...@astro.physik.uni-goettingen.de wrote:

 On 15.06.2011, at 1:34AM, Mark Wiebe wrote:

  These functions are now fully implemented and documented. As always, code
 reviews are welcome here:
 
  https://github.com/numpy/numpy/pull/87
 
  and for those that don't want to dig into review C code, the commit for
 the documentation is here:
 
 
 https://github.com/m-paradox/numpy/commit/6b5a42a777b16812e774193b06da1b68b92bc689
 
  This is probably also another good place to do a merge to master, so if
 people could test it on Mac/Windows/other platforms that would be much
 appreciated.
 
 Probably not specifically related to the business day, but to this month's
 general datetime fix-up,
 under Python3 there are a number of test failures of the kinds quoted
 below.
 You can review a fix (plus a number of additional test extensions and
 proposed fixes for other
 failures on master) at
 https://github.com/dhomeier/numpy/compare/master...dt_tests_2to3


Thanks for the testing and proposed fixes, I've added some comments to the
commits.


 Plus on Mac OS X 10.5 PPC there is a failure due to the fact that the new
 datetime64 appears
 to _always_ return 0 for the year part on this architecture:

 >>> date='1970-03-23 20:00:00Z'
 >>> np.datetime64(date)
 np.datetime64('-03-23T21:00:00+0100','s')
 >>> np.datetime64(date, 'Y')
 np.datetime64('','Y')
 >>> np.datetime64('2011-06-16 02:03:04Z', 'D')
 np.datetime64('-06-16','D')

 I've tried to track this down in datetime.c, but unsuccessfully so (i.e. I
 could not connect it
 to any of the dts->year assignments therein).


This is definitely perplexing. Probably the first thing to check is whether
it's breaking during parsing or printing. This should always produce the
same result:

>>> np.datetime64('1970-03-23 20:00:00Z').astype('i8')
7070400

But maybe the test_days_creation is already checking that thoroughly enough.

Then, maybe printf-ing the year value at various stages of the printing,
like in set_datetimestruct_days, after convert_datetime_to_datetimestruct,
and in make_iso_8601_date. This would at least isolate where the year is
getting lost.

For the record, all tests pass with Python2.5-2.7 on OS X 10.5/10.6 i386 and
 x86_64.


Great!

Cheers,
Mark


 Cheers,
Derek


 FAIL: test_datetime_as_string (test_datetime.TestDateTime)
 ----------------------------------------------------------------------
 Traceback (most recent call last):
   File "/Users/derek/lib/python3.2/site-packages/numpy/core/tests/test_datetime.py", line 970, in test_datetime_as_string
     '1959')
   File "/Users/derek/lib/python3.2/site-packages/numpy/testing/utils.py", line 313, in assert_equal
     raise AssertionError(msg)
 AssertionError:
 Items are not equal:
  ACTUAL: b''
  DESIRED: '1959'

 ======================================================================
 FAIL: test_days_creation (test_datetime.TestDateTime)
 ----------------------------------------------------------------------
 Traceback (most recent call last):
   File "/Users/derek/lib/python3.2/site-packages/numpy/core/tests/test_datetime.py", line 287, in test_days_creation
     (1900-1970)*365 - (1970-1900)/4)
   File "/Users/derek/lib/python3.2/site-packages/numpy/testing/utils.py", line 256, in assert_equal
     return assert_array_equal(actual, desired, err_msg, verbose)
   File "/Users/derek/lib/python3.2/site-packages/numpy/testing/utils.py", line 706, in assert_array_equal
     verbose=verbose, header='Arrays are not equal')
   File "/Users/derek/lib/python3.2/site-packages/numpy/testing/utils.py", line 635, in assert_array_compare
     raise AssertionError(msg)
 AssertionError:
 Arrays are not equal

 (mismatch 100.0%)
  x: array(-25567, dtype='int64')
  y: array(-25567.5)



Re: [Numpy-discussion] 3d ODR

2011-06-16 Thread Robert Kern
On Thu, Jun 16, 2011 at 06:28, Christian K. ckk...@hoc.net wrote:
 Hi,

 I need to do fit a 3d surface to a point cloud. This sounds like a job for 3d
 orthogonal distance regression. Does anybody know of an implementation?

As eat points out, scipy.odr should do the job nicely. You should read
at least part of the ODRPACK User's Guide in addition to the scipy
docs for it:

  http://docs.scipy.org/doc/external/odrpack_guide.pdf

The FORTRAN examples there have scipy.odr translations in the unit tests:

  https://github.com/scipy/scipy/blob/master/scipy/odr/tests/test_odr.py

Exactly what you need to do depends on the details of your problem. If
your surface is of the form z=f(x,y), that's a pretty straightforward
multidimensional model. If you have a closed surface, however, you
will need to use an implicit model of the form f(x,y,z)=0, which may
or may not be easy to formulate. The unit test for the implicit case
fits a general 2D ellipse to a 2D cloud of points, as described in the
User's Guide.
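For the explicit z = f(x, y) case, a hedged sketch of what the scipy.odr
setup looks like (the plane model, synthetic data, and starting values are
mine, not taken from the guide):

```python
import numpy as np
from scipy import odr

# Explicit surface model z = f(x, y); here a plane with coefficients beta.
def plane(beta, xy):
    x, y = xy
    return beta[0] + beta[1] * x + beta[2] * y

# Synthetic point cloud scattered around z = 1 + 2x - 0.5y.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = rng.uniform(-1, 1, 200)
z = 1.0 + 2.0 * x - 0.5 * y + rng.normal(scale=0.01, size=200)

model = odr.Model(plane)
data = odr.Data(np.vstack([x, y]), z)   # multidimensional input: shape (2, n)
fit = odr.ODR(data, model, beta0=[0.0, 1.0, 1.0]).run()
print(fit.beta)  # close to [1.0, 2.0, -0.5]
```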

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-16 Thread Christopher Barker
NOTE: I'm only taking part in this discussion because it's interesting 
and I hope to learn something. I do hope the OP chimes back in to 
clarify his needs, but in the meantime...




Bruce Southey wrote:

Remember that is what the OP wanted to do, not me.


Actually, I don't think that's what the OP wanted -- I think we have a 
conflict between the need for concrete examples, and the desire to find 
a generic solution, so I think this is what the OP wants:



How to best multiprocess a _generic_ operation that needs to be 
performed on a lot of arrays. Something like:



output = []
for a in a_bunch_of_arrays:
  output.append( a_function(a) )


More specifically, a_function() is an inner product, *defined by the user*.

So there is no way to optimize the inner product itself (that will be up 
to the user), nor any way to generally convert the bunch_of_arrays to a 
single array with a single higher-dimensional operation.


In testing his approach, the OP used a numpy multiply, and a simple, 
loop-through-the elements multiply, and found that with his 
multiprocessing calls, the simple loop was a fair bit faster with two 
processors, but that the numpy one was slower with two processors. Of 
course, the looping method was much, much, slower than the numpy one in 
any case.


So Sturla's comments are probably right on:

Sturla Molden wrote:


innerProductList = pool.map(myutil.numpy_inner_product, arrayList)

1.  Here we potentially have a case of false sharing and/or mutex 
contention, as the work is too fine grained.  pool.map does not do any 
load balancing. If pool.map is to scale nicely, each work item must take 
a substantial amount of time. I suspect this is the main issue.


2. There is also the question of when the process pool is spawned. 
Though I haven't checked, I suspect it happens prior to calling 
pool.map. But if it does not, this is a factor as well, particularly on 
Windows (less so on Linux and Apple).


It didn't work well on my Mac, so it's either not an issue, or not 
Windows-specific, anyway.


3.  arrayList is serialised by pickling, which has a significant 
overhead.  It's not shared memory either, as the OP's code implies, but 
the main thing is the slowness of cPickle.


I'll bet this is a big issue, and one I'm curious about how to address. 
I have another problem where I need to multi-process, and I'd love to 
know a way to pass data to the other process and back *without* going 
through pickle. Maybe memmapped files?
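One way to sketch that memmapped-file idea with the pool.map structure the
OP used (the file path, shapes, and data here are invented, and POSIX fork
semantics are assumed so the module globals are visible in the workers):

```python
import os
import tempfile
import numpy as np
from multiprocessing import Pool

# Back the big array with a memory-mapped file so workers open a view
# of it rather than receiving a pickled copy through pool.map.
PATH = os.path.join(tempfile.gettempdir(), 'mp_demo_arrays.dat')
SHAPE = (4, 1000)

a = np.memmap(PATH, dtype='f8', mode='w+', shape=SHAPE)
a[:] = np.arange(a.size).reshape(SHAPE)  # deterministic demo data
a.flush()

def inner_product(i):
    # Only the integer index crosses the process boundary via pickle;
    # each worker re-opens the shared file read-only.
    m = np.memmap(PATH, dtype='f8', mode='r', shape=SHAPE)
    return float(np.dot(m[i], m[i]))

if __name__ == '__main__':
    with Pool(2) as pool:
        results = pool.map(inner_product, range(SHAPE[0]))
```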



IPs = N.array(innerProductList)

4.  numpy.array is a very slow function. The benchmark should preferably 
not include this overhead.


I re-ran, moving that out of the timing loop, and, indeed, it helped a 
lot, but it still takes longer with the multi-processing.


I suspect that the overhead of pickling, etc. is overwhelming the 
operation itself. That and the load balancing issue that I don't understand!


To test this, I did a little experiment -- creating a fake operation, 
one that simply returns an element from the input array -- so it should 
take next to no time, and we can time the overhead of the pickling, etc:


$ python shared_mem.py

Using 2 processes
No shared memory, numpy array multiplication took 0.124427080154 seconds
Shared memory, numpy array multiplication took 0.586215019226 seconds

No shared memory, fake array multiplication took 0.000391006469727 seconds
Shared memory, fake array multiplication took 0.54935503006 seconds

No shared memory, my array multiplication took 23.5055780411 seconds
Shared memory, my array multiplication took 13.0932741165 seconds

Bingo!

The overhead of the multi-processing takes about .54 seconds, which 
explains the slowdown for the numpy method


not so mysterious after all.

Bruce Southey wrote:

But if everything is *single-threaded* and 
thread-safe, then you just create a function and use Anne's very useful 
handythread.py (http://www.scipy.org/Cookbook/Multithreading).


This may be worth a try -- though the GIL could well get in the way.

By the way, if the arrays are sufficiently small, there is a lot of 
overhead involved such that there is more time in communication than 
computation.


yup -- clearly the case here. I wonder if it's just array size though -- 
won't cPickle time scale with array size? So it may not be size per se, 
but rather how much computation you need for a given size array.
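
It does: both the pickled size and the round-trip time grow roughly
linearly with the array's data. A quick, hedged check (absolute timings
are machine-dependent; only the trend matters):

```python
import pickle
import time

import numpy as np

def pickle_seconds(arr, repeats=10):
    """Average time to round-trip an array through pickle (protocol 2)."""
    start = time.time()
    for _ in range(repeats):
        pickle.loads(pickle.dumps(arr, protocol=2))
    return (time.time() - start) / repeats

small = np.ones((100, 100))     # ~80 KB of data
large = np.ones((1000, 1000))   # ~8 MB of data

# Both the pickled size and the round-trip time track the data size.
print(len(pickle.dumps(small, protocol=2)), len(pickle.dumps(large, protocol=2)))
print(pickle_seconds(small), pickle_seconds(large))
```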


-Chris

[I've enclosed the OP's slightly altered code]




--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


myutil.py
Description: application/python


shared_mem.py
Description: application/python
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Regression in choose()

2011-06-16 Thread Travis Oliphant
Hi Ed, 

I'm pretty sure that this bug is due to the compile-time constant 
NPY_MAXARGS defined in include/numpy/ndarraytypes.h. I suspect that the 
versions where it succeeds had a different compile-time value for that 
constant. 

NumPy uses a multi-iterator object (PyArrayMultiIterObject defined in 
ndarraytypes.h as well) to broadcast arguments together for ufuncs and for 
functions like choose.  The data-structure that it uses to do this has a static 
array of Iterator objects with room for NPY_MAXARGS iterators. I think in 
some versions this compile-time constant has been 40 or higher. Re-compiling 
NumPy with a larger value will of course require re-compilation of most 
extension modules that use the NumPy API. 

Numeric did not use this approach to broadcast the arguments to choose together 
and so likely does not have the same limitation.   It would also not be that 
difficult to modify the NumPy code to dynamically allocate the iters array when 
needed to remove the NPY_MAXARGS limitation.   In fact, I would not mind seeing 
all the NPY_MAXDIMS and NPY_MAXARGS limitations removed.   To do it well you 
would probably want to have some minimum storage-space pre-allocated (say 
NPY_MAXDIMS as 7 and NPY_MAXARGS as 10 to avoid the memory allocation in common 
cases) and just increase that space as needed dynamically.   
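
Until such a change lands, one workaround on the user side is to emulate
choose() with fancy indexing on the stacked choices array, which has no
NPY_MAXARGS limit. A sketch for the 1-D case from Ed's example
(choose_unlimited is an illustrative name, not a NumPy function):

```python
import numpy as np

def choose_unlimited(indices, choices):
    """np.choose for 1-D index arrays, without the NPY_MAXARGS cap.

    result[i] = choices[indices[i]][i], which is exactly what fancy
    indexing on the stacked choices array computes.
    """
    indices = np.asarray(indices)
    stacked = np.asarray(choices)            # shape (n_choices, n)
    return stacked[indices, np.arange(indices.shape[0])]

choices = [[0, 1, 2, 3], [10, 11, 12, 13],
           [20, 21, 22, 23], [30, 31, 32, 33]]

# 32 choice arrays -- the case that trips the limit in the affected versions:
print(choose_unlimited([2, 3, 1, 0], choices * 8))    # [20 31 12  3]
```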

This would be a nice project for someone wanting to learn the NumPy code base.

-Travis





On Jun 16, 2011, at 1:56 AM, Ed Schofield wrote:

 Hi all,
 
 I have been investigating the limitation of the choose() method (and 
 function) to 32 elements. This is a regression in recent versions of NumPy. I 
 have tested choose() in the following NumPy versions:
 
 1.0.4: fine
 1.1.1: bug
 1.2.1: fine
 1.3.0: bug
 1.4.x: bug
 1.5.x: bug
 1.6.x: bug
 Numeric 24.3: fine
 
 (To run the tests on versions of NumPy prior to 1.4.x I used Python 2.4.3. 
 For the other tests I used Python 2.7.)
 
 Here 'bug' means the choose() function has the 32-element limitation. I have 
 been helping an organization to port a large old Numeric-using codebase to 
 NumPy, and the choose() limitation in recent NumPy versions is throwing a 
 spanner in the works. The codebase is currently using both NumPy and Numeric 
 side-by-side, with Numeric only being used for its choose() function, with a 
 few dozen lines like this:
 
 a = numpy.array(Numeric.choose(b, c))
 
 Here is a simple example that triggers the bug. It is a simple extension of 
 the example from the choose() docstring:
 
 
 
 import numpy as np
 
 choices = [[0, 1, 2, 3], [10, 11, 12, 13],
   [20, 21, 22, 23], [30, 31, 32, 33]]
 
 np.choose([2, 3, 1, 0], choices * 8)
 
 
 
 A side note: the exception message (defined in 
 core/src/multiarray/iterators.c) is also slightly inconsistent with the 
 actual behaviour:
 
 Traceback (most recent call last):
   File chooser.py, line 6, in module
 np.choose([2, 3, 1, 0], choices * 8)
   File /usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py, line 
 277, in choose
 return _wrapit(a, 'choose', choices, out=out, mode=mode)
   File /usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py, line 
 37, in _wrapit
 result = getattr(asarray(obj),method)(*args, **kwds)
 ValueError: Need between 2 and (32) array objects (inclusive).
 
 The actual behaviour is that choose() passes with 31 objects but fails with 
 32 objects, so this should read exclusive rather than inclusive. (And why 
 the parentheses around 32?)
 
 Does anyone know what changed between 1.2.1 and 1.3.0 that introduced the 
 32-element limitation to choose(), and whether we might be able to lift this 
 limitation again for future NumPy versions? I have a couple of days to work 
 on a patch ... if someone can advise me how to approach this.
 
 Best wishes,
 Ed
 
 
 -- 
 Dr. Edward Schofield
 Python Charmers
 +61 (0)405 676 229
 http://pythoncharmers.com
 

---
Travis Oliphant
Enthought, Inc.
oliph...@enthought.com
1-512-536-1057
http://www.enthought.com





Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-16 Thread Robin
On Thu, Jun 16, 2011 at 6:44 PM, Christopher Barker
chris.bar...@noaa.gov wrote:
 2. There is also the question of when the process pool is spawned. Though
 I haven't checked, I suspect it happens prior to calling pool.map. But if it
 does not, this is a factor as well, particularly on Windows (less so on
 Linux and Apple).

 It didn't work well on my Mac, so it's either not an issue, or not
 Windows-specific, anyway.

I am pretty sure that the process pool is spawned when you create the
pool object instance.


 3.  arrayList is serialised by pickling, which has a significant
 overhead.  It's not shared memory either, as the OP's code implies, but the
 main thing is the slowness of cPickle.

 I'll bet this is a big issue, and one I'm curious about how to address, I
 have another problem where I need to multi-process, and I'd love to know a
 way to pass data to the other process and back *without* going through
 pickle. maybe memmapped files?

If you are on Linux or Mac then fork works nicely so you have read
only shared memory you just have to put it in a module before the fork
(so before pool = Pool() ) and then all the subprocesses can access it
without any pickling required. ie
myutil.data = listofdata
p = multiprocessing.Pool(8)
def mymapfunc(i):
return mydatafunc(myutil.data[i])

p.map(mymapfunc, range(len(myutil.data)))

Actually that won't work because mymapfunc needs to be in a module so
it can be pickled but hopefully you get the idea.

Cheers

Robin




 IPs = N.array(innerProductList)

 4.  numpy.array is a very slow function. The benchmark should preferably
 not include this overhead.

 I re-ran, moving that out of the timing loop, and, indeed, it helped a lot,
 but it still takes longer with the multi-processing.

 I suspect that the overhead of pickling, etc. is overwhelming the operation
 itself. That and the load balancing issue that I don't understand!

 To test this, I did a little experiment -- creating a fake operation, one
 that simply returns an element from the input array -- so it should take
 next to no time, and we can time the overhead of the pickling, etc:

 $ python shared_mem.py

 Using 2 processes
 No shared memory, numpy array multiplication took 0.124427080154 seconds
 Shared memory, numpy array multiplication took 0.586215019226 seconds

 No shared memory, fake array multiplication took 0.000391006469727 seconds
 Shared memory, fake array multiplication took 0.54935503006 seconds

 No shared memory, my array multiplication took 23.5055780411 seconds
 Shared memory, my array multiplication took 13.0932741165 seconds

 Bingo!

 The overhead of the multi-processing takes about .54 seconds, which explains
 the slowdown for the numpy method

 not so mysterious after all.

 Bruce Southey wrote:

 But if everything is *single-threaded* and thread-safe, then you just
 create a function and use Anne's very useful handythread.py
 (http://www.scipy.org/Cookbook/Multithreading).

 This may be worth a try -- though the GIL could well get in the way.

 By the way, if the arrays are sufficiently small, there is a lot of
 overhead involved such that there is more time in communication than
 computation.

 yup -- clearly the case here. I wonder if it's just array size though --
 won't cPickle time scale with array size? So it may not be size per se, but
 rather how much computation you need for a given size array.

 -Chris

 [I've enclosed the OP's slightly altered code]




 --
 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR            (206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115       (206) 526-6317   main reception

 chris.bar...@noaa.gov





Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-16 Thread Sturla Molden
Den 16.06.2011 18:44, skrev Christopher Barker:

 I'll bet this is a big issue, and one I'm curious about how to 
 address, I have another problem where I need to multi-process, and I'd 
 love to know a way to pass data to the other process and back 
 *without* going through pickle. maybe memmapped files?

Remember that os.fork is copy-on-write optimized. Often this is enough 
to share data.

To get data back, a possibility is to use shared memory: just memmap 
from file 0 or -1 (depending on the system) and fork.

Note that the use of fork, as opposed to multiprocessing, avoids the 
pickle overhead for NumPy arrays.

On Windows this sucks, because there is no fork system call. Here we are 
stuck with multiprocessing and pickle, even if we use shared memory.
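
For the fork-based route, a minimal sketch (POSIX only; it uses today's
Python 3 multiprocessing API, with an anonymous mmap playing the role of
Sturla's "memmap from file -1"):

```python
import mmap
import multiprocessing as mp

import numpy as np

# Anonymous shared memory: on POSIX, mmap(-1, ...) is MAP_SHARED|MAP_ANONYMOUS,
# so after fork it is genuinely shared between parent and children (writes
# propagate), unlike ordinary copy-on-write pages.
n = 4
buf = mmap.mmap(-1, n * np.dtype(np.float64).itemsize)
shared = np.frombuffer(buf, dtype=np.float64)
shared[:] = 0.0

def worker(i):
    shared[i] = i * i      # written in the child, visible to the parent

ctx = mp.get_context("fork")   # POSIX only; no pickling of the array data
with ctx.Pool(2) as pool:
    pool.map(worker, range(n))

print(shared)                  # [0. 1. 4. 9.]
```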

Sturla


Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-16 Thread Christopher Barker
Sturla Molden wrote:
 On Windows this sucks, because there is no fork system call.

Darn -- I do need to support Windows.

 Here we are 
 stuck with multiprocessing and pickle, even if we use shared memory.

What do you need to pickle if you're using shared memory?

-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-16 Thread Sturla Molden
Den 16.06.2011 19:23, skrev Robin:

 If you are on Linux or Mac then fork works nicely so you have read
 only shared memory you just have to put it in a module before the fork
 (so before pool = Pool() ) and then all the subprocesses can access it
 without any pickling required. ie
 myutil.data = listofdata
 p = multiprocessing.Pool(8)
 def mymapfunc(i):
  return mydatafunc(myutil.data[i])

 p.map(mymapfunc, range(len(myutil.data)))


There is still the issue that p.map does not do any load balancing. The 
processes in the pool might spend all the time battling for a mutex. We 
must thus arrange it so each call to mymapfunc processes a chunk of data 
instead of a single item.

This is one strategy that works well:

With n remaining work items and m processes, and c the minimum chunk 
size, let the process holding a mutex grab

max( c, n/m )

work items. Then n is reduced accordingly, and this continues until all 
items are exhausted.

Also, as processes do not work on interleaved items, we avoid false 
sharing as much as possible. (False sharing means that one processor 
will write to a cache line used by another processor, so both must stop 
what they're doing and synchronize cache with RAM.)
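
A sketch of that grabbing rule in plain Python (the mutex and dispatch
mechanics are left out; c is the minimum chunk size):

```python
def guided_chunks(n_items, n_procs, min_chunk=1):
    """Yield (start, stop) work ranges using the max(c, n/m) rule above.

    Each grab takes max(min_chunk, remaining // n_procs) items, so chunks
    start large and shrink as work runs out ('guided' scheduling, in
    OpenMP terms), balancing load without too many mutex acquisitions.
    """
    start = 0
    while start < n_items:
        remaining = n_items - start
        size = min(remaining, max(min_chunk, remaining // n_procs))
        yield start, start + size
        start += size

chunks = list(guided_chunks(100, 4, min_chunk=5))
print(chunks[0], chunks[-1])   # (0, 25) (96, 100)
```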


Sturla


Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-16 Thread Sturla Molden
Den 16.06.2011 19:47, skrev Christopher Barker:

 What do you need to pickle if you're using shared memory?

The meta-data for the ndarray (strides, shape, dtype, etc.) and the name 
of the shared memory segment. As there is no fork, we must tell the 
other process where to find the shared memory and what to do with it.
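
This can be sketched with today's stdlib: multiprocessing.shared_memory
(Python 3.8+, which did not exist in 2011 -- the shared-memory arrays
Sturla mentions filled this gap then). Only a small descriptor tuple has
to cross the process boundary:

```python
from multiprocessing import shared_memory

import numpy as np

# Parent side: place an array in named shared memory.
src = np.arange(6, dtype=np.float64).reshape(2, 3)
shm = shared_memory.SharedMemory(create=True, size=src.nbytes)
view = np.ndarray(src.shape, dtype=src.dtype, buffer=shm.buf)
view[:] = src

# This small tuple is all that has to be pickled and sent to the worker:
descriptor = (shm.name, src.shape, src.dtype.str)

# Worker side: reattach from the descriptor alone -- no array data is copied.
name, shape, dtype = descriptor
attached = shared_memory.SharedMemory(name=name)
remote = np.ndarray(shape, dtype=dtype, buffer=attached.buf)
total = float(remote.sum())
print(total)    # 15.0

# Release the numpy views before closing, or close() raises BufferError.
del remote
attached.close()
del view
shm.close()
shm.unlink()
```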

Sturla


Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-16 Thread Sturla Molden
Den 16.06.2011 19:55, skrev Sturla Molden:

 The meta-data for the ndarray (strides, shape, dtype, etc.) and the name
 of the shared memory segment. As there is no fork, we must tell the
 other process where to find the shared memory and what to do with it.


Which, by the way, is what the shared memory arrays Gaël and I made will do,
but we still have the annoying pickle overhead.

Sturla


Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-16 Thread Christopher Barker
Sturla Molden wrote:
 Den 16.06.2011 19:55, skrev Sturla Molden:
 The meta-data for the ndarray (strides, shape, dtype, etc.) and the name
 of the shared memory segment. As there is no fork, we must tell the
 other process where to find the shared memory and what to do with it.

got it, thanks.

 Which by the way is what the shared memory arrays I and Gaël made will do,
 but we still have the annoying pickle overhead.

Do you have a pointer to that code? It sounds handy.

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-16 Thread Bruce Southey

On 06/16/2011 11:44 AM, Christopher Barker wrote:
NOTE: I'm only taking part in this discussion because it's interesting 
and I hope to learn something. I do hope the OP chimes back in to 
clarify his needs, but in the meantime...




Bruce Southey wrote:

Remember that is what the OP wanted to do, not me.


Actually, I don't think that's what the OP wanted -- I think we have a 
conflict between the need for concrete examples, and the desire to 
find a generic solution, so I think this is what the OP wants:



How to best multiprocess a _generic_ operation that needs to be 
performed on a lot of arrays. Something like:



output = []
for a in a_bunch_of_arrays:
  output.append( a_function(a) )


More specifically, a_function() is an inner product, *defined by the 
user*.


So there is no way to optimize the inner product itself (that will be 
up to the user), nor any way to generally convert the bunch_of_arrays 
to a single array with a single higher-dimensional operation.


In testing his approach, the OP used a numpy multiply, and a simple, 
loop-through-the elements multiply, and found that with his 
multiprocessing calls, the simple loop was a fair bit faster with two 
processors, but that the numpy one was slower with two processors. Of 
course, the looping method was much, much, slower than the numpy one 
in any case.


So Sturla's comments are probably right on:

Sturla Molden wrote:


innerProductList = pool.map(myutil.numpy_inner_product, arrayList)

1.  Here we potentially have a case of false sharing and/or mutex 
contention, as the work is too fine grained.  pool.map does not do 
any load balancing. If pool.map is to scale nicely, each work item 
must take a substantial amount of time. I suspect this is the main 
issue.


2. There is also the question of when the process pool is spawned. 
Though I haven't checked, I suspect it happens prior to calling 
pool.map. But if it does not, this is a factor as well, particularly 
on Windows (less so on Linux and Apple).


It didn't work well on my Mac, so it's either not an issue, or not 
Windows-specific, anyway.


3.  arrayList is serialised by pickling, which has a significant 
overhead.  It's not shared memory either, as the OP's code implies, 
but the main thing is the slowness of cPickle.


I'll bet this is a big issue, and one I'm curious about how to 
address, I have another problem where I need to multi-process, and I'd 
love to know a way to pass data to the other process and back 
*without* going through pickle. maybe memmapped files?



IPs = N.array(innerProductList)

4.  numpy.array is a very slow function. The benchmark should 
preferably not include this overhead.


I re-ran, moving that out of the timing loop, and, indeed, it helped a 
lot, but it still takes longer with the multi-processing.


I suspect that the overhead of pickling, etc. is overwhelming the 
operation itself. That and the load balancing issue that I don't 
understand!


To test this, I did a little experiment -- creating a fake 
operation, one that simply returns an element from the input array -- 
so it should take next to no time, and we can time the overhead of the 
pickling, etc:


$ python shared_mem.py

Using 2 processes
No shared memory, numpy array multiplication took 0.124427080154 seconds
Shared memory, numpy array multiplication took 0.586215019226 seconds

No shared memory, fake array multiplication took 0.000391006469727 
seconds

Shared memory, fake array multiplication took 0.54935503006 seconds

No shared memory, my array multiplication took 23.5055780411 seconds
Shared memory, my array multiplication took 13.0932741165 seconds

Bingo!

The overhead of the multi-processing takes about .54 seconds, which 
explains the slowdown for the numpy method


not so mysterious after all.

Bruce Southey wrote:

But if everything is *single-threaded* and thread-safe, then you just 
create a function and use Anne's very useful handythread.py 
(http://www.scipy.org/Cookbook/Multithreading).


This may be worth a try -- though the GIL could well get in the way.

By the way, if the arrays are sufficiently small, there is a lot of 
overhead involved such that there is more time in communication than 
computation.


yup -- clearly the case here. I wonder if it's just array size though 
-- won't cPickle time scale with array size? So it may not be size 
per se, but rather how much computation you need for a given size array.


-Chris

[I've enclosed the OP's slightly altered code]






Please see:
http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056766.html
I'm doing element wise multiplication, basically innerProduct = 
numpy.sum(array1*array2) where array1 and array2 are, in general, 
multidimensional.


Thanks for the code as I forgot that it was sent.

I think there is something weird about these 

[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-16 Thread Brandt Belson
Hi all,
Thanks for the replies. As mentioned, I'm parallelizing so that I can take
many inner products simultaneously (which I agree is embarrassingly
parallel). The library I'm writing asks the user to supply a function that
takes two objects and returns their inner product. After all the discussion
though it seems this is too simplistic of an approach. Instead, I plan to
write this part of the library as if the inner product function supplied by
the user uses all available cores (with numpy and/or numexpr built with MKL
or LAPACK).

As far as using fortran or C and openMP, this probably isn't worth the time
it would take, both for me and the user.

I've tried increasing the array sizes and found the same trends, so the
slowdown isn't only because the arrays are too small to see the benefit of
multiprocessing. I wrote the code to be easy for anyone to experiment with,
so feel free to play around with what is included in the profiling, the
sizes of arrays, functions used, etc.

I also tried using handythread.foreach with arraySize = (3000,1000), and
found the following:
No shared memory, numpy array multiplication took 1.57585811615 seconds
Shared memory, numpy array multiplication took 1.25499510765 seconds
This is definitely an improvement from multiprocessing, but without knowing
any better, I was hoping to see a roughly 8x speedup on my 8-core
workstation.

Based on what Chris sent, it seems there is some large overhead caused by
multiprocessing pickling numpy arrays. To test what Robin mentioned

 If you are on Linux or Mac then fork works nicely so you have read
 only shared memory you just have to put it in a module before the fork
 (so before pool = Pool() ) and then all the subprocesses can access it
 without any pickling required. ie
 myutil.data = listofdata
 p = multiprocessing.Pool(8)
 def mymapfunc(i):
   return mydatafunc(myutil.data[i])

 p.map(mymapfunc, range(len(myutil.data)))

I tried creating the arrayList in the myutil module and using
multiprocessing to find the inner products of myutil.arrayList, however this
was still slower than not using multiprocessing, so I believe there is still
some large overhead. Here are the results:
No shared memory, numpy array multiplication took 1.55906510353 seconds
Shared memory, numpy array multiplication took 9.8242638 seconds
Shared memory, myutil.arrayList numpy array multiplication took
8.77094507217 seconds
I'm attaching this code.

I'm going to work around this numpy/multiprocessing behavior with
numpy/numexpr built with MKL or LAPACK. It would be good to know exactly
what's causing this though. It would be nice if there was a way to get the
ideal speedup via multiprocessing, regardless of the internal workings of
the single-threaded inner product function, as this was the behavior I
expected. I imagine other people might come across similar situations, but
again I'm going to try to get around this by letting MKL or LAPACK make use
of all available cores.

Thanks again,
Brandt


myutil.py
Description: Binary data


shared_mem.py
Description: Binary data


Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-16 Thread Robin
The fact that you are still passing the myutil.arrayList argument to
pool.map means the data is still being pickled. All arguments to
pool.map are pickled to be passed to the subprocess. You need to
change the function so that it accesses the data directly, then just
pass indices to pool.map.
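
Concretely, the corrected pattern looks something like this (names are
illustrative, and it relies on fork-style process creation so the arrays
are inherited rather than pickled):

```python
import multiprocessing as mp

import numpy as np

# Module-level data, created *before* the pool, so forked workers inherit it.
array_list = [np.ones((10, 10)) * i for i in range(8)]

def inner_product_at(i):
    """Workers receive only the index i; the arrays themselves never
    cross the process boundary, so nothing large is pickled."""
    a = array_list[i]
    return float(np.sum(a * a))

ctx = mp.get_context("fork")   # POSIX only
with ctx.Pool(2) as pool:
    results = pool.map(inner_product_at, range(len(array_list)))

print(results)   # 100 * i**2 for each i
```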

Cheers

Robin

On Thu, Jun 16, 2011 at 9:05 PM, Brandt Belson bbel...@princeton.edu wrote:
 Hi all,
 Thanks for the replies. As mentioned, I'm parallelizing so that I can take
 many inner products simultaneously (which I agree is embarrassingly
 parallel). The library I'm writing asks the user to supply a function that
 takes two objects and returns their inner product. After all the discussion
 though it seems this is too simplistic of an approach. Instead, I plan to
 write this part of the library as if the inner product function supplied by
 the user uses all available cores (with numpy and/or numexpr built with MKL
 or LAPACK).
 As far as using fortran or C and openMP, this probably isn't worth the time
 it would take, both for me and the user.
 I've tried increasing the array sizes and found the same trends, so the
 slowdown isn't only because the arrays are too small to see the benefit of
 multiprocessing. I wrote the code to be easy for anyone to experiment with,
 so feel free to play around with what is included in the profiling, the
 sizes of arrays, functions used, etc.
 I also tried using handythread.foreach with arraySize = (3000,1000), and
 found the following:
 No shared memory, numpy array multiplication took 1.57585811615 seconds
 Shared memory, numpy array multiplication took 1.25499510765 seconds
 This is definitely an improvement from multiprocessing, but without knowing
 any better, I was hoping to see a roughly 8x speedup on my 8-core
 workstation.
 Based on what Chris sent, it seems there is some large overhead caused by
 multiprocessing pickling numpy arrays. To test what Robin mentioned
 If you are on Linux or Mac then fork works nicely so you have read
 only shared memory you just have to put it in a module before the fork
 (so before pool = Pool() ) and then all the subprocesses can access it
 without any pickling required. ie
 myutil.data = listofdata
 p = multiprocessing.Pool(8)
 def mymapfunc(i):
   return mydatafunc(myutil.data[i])

 p.map(mymapfunc, range(len(myutil.data)))
 I tried creating the arrayList in the myutil module and using
 multiprocessing to find the inner products of myutil.arrayList, however this
 was still slower than not using multiprocessing, so I believe there is still
 some large overhead. Here are the results:
 No shared memory, numpy array multiplication took 1.55906510353 seconds
 Shared memory, numpy array multiplication took 9.8242638 seconds
 Shared memory, myutil.arrayList numpy array multiplication took
 8.77094507217 seconds
 I'm attaching this code.
 I'm going to work around this numpy/multiprocessing behavior with
 numpy/numexpr built with MKL or LAPACK. It would be good to know exactly
 what's causing this though. It would be nice if there was a way to get the
 ideal speedup via multiprocessing, regardless of the internal workings of
 the single-threaded inner product function, as this was the behavior I
 expected. I imagine other people might come across similar situations, but
 again I'm going to try to get around this by letting MKL or LAPACK make use
 of all available cores.
 Thanks again,
 Brandt




Re: [Numpy-discussion] code review/build test for datetime business day API

2011-06-16 Thread Mark Wiebe
I've received some good feedback from Chuck and Ralf on the code and
documentation, respectively, and build testing with proposed fixes for
issues from the previous merge from Derek. I believe the current set of
changes are in good shape to merge, so would like to proceed with that later
today.

Cheers,
Mark

On Tue, Jun 14, 2011 at 6:34 PM, Mark Wiebe mwwi...@gmail.com wrote:

 These functions are now fully implemented and documented. As always, code
 reviews are welcome here:

 https://github.com/numpy/numpy/pull/87

 and for those that don't want to dig into review C code, the commit for the
 documentation is here:


 https://github.com/m-paradox/numpy/commit/6b5a42a777b16812e774193b06da1b68b92bc689

 This is probably also another good place to do a merge to master, so if
 people could test it on Mac/Windows/other platforms that would be much
 appreciated.

 Thanks,
 Mark

 On Fri, Jun 10, 2011 at 5:49 PM, Mark Wiebe mwwi...@gmail.com wrote:

 I've implemented the busday_offset function with support for the weekmask
 and roll parameters, the commits are tagged 'datetime-bday' in the pull
 request here:

 https://github.com/numpy/numpy/pull/87

 -Mark


 On Thu, Jun 9, 2011 at 5:23 PM, Mark Wiebe mwwi...@gmail.com wrote:

 Here's a possible design for a business day API for numpy datetimes:


 The 'B' business day unit will be removed. All business day-related
 calculations will be done using the 'D' day unit.

 A class *BusinessDayDef* to encapsulate the definition of the business
 week and holidays. The business day functions will either take one of these
 objects, or separate weekmask and holidays parameters, to specify the
 business day definition. This class serves as both a performance
 optimization and a way to encapsulate the weekmask and holidays together,
 for example if you want to make a dictionary mapping exchange names to their
 trading days definition.

 The weekmask can be specified in a number of ways, and internally becomes
 a boolean array with 7 elements with True for the days Monday through Sunday
 which are valid business days. Some different notations for the 5-day
 week include [1,1,1,1,1,0,0], "1111100", and "MonTueWedThuFri". The holidays are
 always specified as a one-dimensional array of dtype 'M8[D]', and are
 internally used in sorted form.


 A function *is_busday*(datearray, weekmask=, holidays=, busdaydef=)
 returns a boolean array matching the input datearray, with True for the
 valid business days.

 A function *busday_offset*(datearray, offsetarray,
 roll='raise', weekmask=, holidays=, busdaydef=) which first applies the
 'roll' policy to start at a valid business date, then offsets the date by
 the number of business days specified in offsetarray. The arrays datearray
 and offsetarray are broadcast together. The 'roll' parameter can be
 'forward'/'following', 'backward'/'preceding', 'modifiedfollowing',
 'modifiedpreceding', or 'raise' (the default).

 A function *busday_count*(datearray1, datearray2, weekmask=, holidays=,
 busdaydef=) which calculates the number of business days between datearray1
 and datearray2, not including the day of datearray2.


 For example, to find the first Monday in Feb 2011,

 np.busday_offset('2011-02', 0, roll='forward', weekmask='Mon')

 or to find the number of weekdays in Feb 2011,

 np.busday_count('2011-02', '2011-03')

 This set of three functions appears to be powerful enough to express the
 business-day computations that I've been shown thus far.

 Cheers,
 Mark
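
For reference, the two examples above work as advertised against the
business-day API that NumPy eventually shipped (1.7+), whose call
signatures match this proposal:

```python
import numpy as np

# First Monday in Feb 2011: start at 2011-02-01 and roll forward to a 'Mon'.
first_monday = np.busday_offset('2011-02', 0, roll='forward', weekmask='Mon')
print(first_monday)     # 2011-02-07

# Weekdays in Feb 2011 (default Mon-Fri weekmask): 28 days = 4 exact weeks.
n_weekdays = np.busday_count('2011-02', '2011-03')
print(n_weekdays)       # 20
```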






Re: [Numpy-discussion] code review/build test for datetime business day API

2011-06-16 Thread Charles R Harris
On Thu, Jun 16, 2011 at 2:00 PM, Mark Wiebe mwwi...@gmail.com wrote:

 I've received some good feedback from Chuck and Ralf on the code and
 documentation, respectively, and build testing with proposed fixes for
 issues from the previous merge from Derek. I believe the current set of
 changes are in good shape to merge, so would like to proceed with that later
 today.


Go for it.

snip

Chuck


Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

2011-06-16 Thread Bruce Southey

On 06/16/2011 02:05 PM, Brandt Belson wrote:

Hi all,
Thanks for the replies. As mentioned, I'm parallelizing so that I can 
take many inner products simultaneously (which I agree is 
embarrassingly parallel). The library I'm writing asks the user to 
supply a function that takes two objects and returns their inner 
product. After all the discussion, though, it seems this is too 
simplistic an approach. Instead, I plan to write this part of the 
library as if the inner product function supplied by the user uses all 
available cores (with numpy and/or numexpr built with MKL or LAPACK).


As far as using fortran or C and openMP, this probably isn't worth the 
time it would take, both for me and the user.


I've tried increasing the array sizes and found the same trends, so 
the slowdown isn't only because the arrays are too small to see the 
benefit of multiprocessing. I wrote the code to be easy for anyone to 
experiment with, so feel free to play around with what is included in 
the profiling, the sizes of arrays, functions used, etc.


I also tried using handythread.foreach with arraySize = (3000,1000), 
and found the following:

No shared memory, numpy array multiplication took 1.57585811615 seconds
Shared memory, numpy array multiplication took 1.25499510765 seconds
This is definitely an improvement over multiprocessing, but without 
knowing any better, I was hoping to see a roughly 8x speedup on my 
8-core workstation.


Based on what Chris sent, it seems there is some large overhead caused 
by multiprocessing pickling numpy arrays. To test what Robin mentioned:


 If you are on Linux or Mac then fork works nicely, so you have read-only
 shared memory; you just have to put it in a module before the fork
 (so before pool = Pool()) and then all the subprocesses can access it
 without any pickling required, i.e.:

 myutil.data = listofdata
 p = multiprocessing.Pool(8)
 def mymapfunc(i):
     return mydatafunc(myutil.data[i])

 p.map(mymapfunc, range(len(myutil.data)))
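
A self-contained version of Robin's sketch (myutil.data and mydatafunc are
stand-ins here; this assumes a fork-based start method, i.e. Linux or Mac):

```python
import multiprocessing
import numpy as np

# Module-level data created BEFORE the Pool is constructed, so fork() shares
# it copy-on-write with every worker and the arrays never need pickling.
data = [np.arange(10000, dtype=float) + i for i in range(8)]

def inner(i):
    # Each worker indexes the shared list and returns one inner product.
    return float(np.dot(data[i], data[i]))

pool = multiprocessing.Pool(2)   # fork happens here, after 'data' exists
results = pool.map(inner, range(len(data)))
pool.close()
pool.join()
print(len(results))  # 8
```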

I tried creating the arrayList in the myutil module and using 
multiprocessing to find the inner products of myutil.arrayList, 
however this was still slower than not using multiprocessing, so I 
believe there is still some large overhead. Here are the results:

No shared memory, numpy array multiplication took 1.55906510353 seconds
Shared memory, numpy array multiplication took 9.8242638 seconds
Shared memory, myutil.arrayList numpy array multiplication took 
8.77094507217 seconds

I'm attaching this code.

I'm going to work around this numpy/multiprocessing behavior with 
numpy/numexpr built with MKL or LAPACK. It would be good to know 
exactly what's causing this though. It would be nice if there was a 
way to get the ideal speedup via multiprocessing, regardless of the 
internal workings of the single-threaded inner product function, as 
this was the behavior I expected. I imagine other people might come 
across similar situations, but again I'm going to try to get around 
this by letting MKL or LAPACK make use of all available cores.


Thanks again,
Brandt


I think this is not being benchmarked correctly, because there should be 
a noticeable difference when different numbers of threads are selected.


But really you should read these sources:
http://www.scipy.org/ParallelProgramming
http://stackoverflow.com/questions/5260068/multithreaded-blas-in-python-numpy

Also, numpy has extra things going on, like checks and copies, that 
probably make using np.inner() slower. Thus, your 'numpy_inner_product' 
is probably as efficient as you can get without extreme measures like 
Cython.


Bruce


Re: [Numpy-discussion] code review/build test for datetime business day API

2011-06-16 Thread Derek Homeier
Hi Mark,

On 16.06.2011, at 5:40PM, Mark Wiebe wrote:

  >>> np.datetime64('2011-06-16 02:03:04Z', 'D')
 numpy.datetime64('-06-16','D')
 
 I've tried to track this down in datetime.c, but unsuccessfully so (i.e. I 
 could not connect it
 to any of the dts->year assignments therein).
 
 This is definitely perplexing. Probably the first thing to check is whether 
 it's breaking during parsing or printing. This should always produce the same 
 result:
 
  >>> np.datetime64('1970-03-23 20:00:00Z').astype('i8')
 7070400
 
 But maybe the test_days_creation is already checking that thoroughly enough.
 
 Then, maybe printf-ing the year value at various stages of the printing, like 
 in set_datetimestruct_days, after convert_datetime_to_datetimestruct, and in 
 make_iso_8601_date. This would at least isolate where the year is getting 
 lost.
 
ok, that was a lengthy hunt, but it's in printing the string in 
make_iso_8601_date:

tmplen = snprintf(substr, sublen, "%04" NPY_INT64_FMT, dts->year);
fprintf(stderr, "printed %d[%d]: dts->year=%lld: %s\n", tmplen, sublen, 
dts->year, substr);

produces

 >>> np.datetime64('1970-03-23 20:00:00Z', 'D')
printed 4[62]: dts->year=1970: 
numpy.datetime64('-03-23','D')

It seems snprintf is not using the correct format for INT64 (as I happened to 
do in fprintf before 
realising I had to use %lld ;-) - could it be this is a general issue, which 
just does not show up 
on little-endian machines because they happen to pass the right half of the 
int64 to printf?
BTW, how is this supposed to be handled (in 4 digits) if the year is indeed 
beyond the 32-bit range 
(i.e. ~ 0.3 Hubble times...)? Just wondering if one could simply cast it to 
int32 before printing.

Cheers,
Derek



Re: [Numpy-discussion] code review/build test for datetime business day API

2011-06-16 Thread Mark Wiebe
On Thu, Jun 16, 2011 at 5:52 PM, Derek Homeier 
de...@astro.physik.uni-goettingen.de wrote:

 Hi Mark,

 On 16.06.2011, at 5:40PM, Mark Wiebe wrote:

   >>> np.datetime64('2011-06-16 02:03:04Z', 'D')
  numpy.datetime64('-06-16','D')
 
  I've tried to track this down in datetime.c, but unsuccessfully so (i.e.
 I could not connect it
  to any of the dts->year assignments therein).
 
  This is definitely perplexing. Probably the first thing to check is
 whether it's breaking during parsing or printing. This should always produce
 the same result:
 
   >>> np.datetime64('1970-03-23 20:00:00Z').astype('i8')
  7070400
 
  But maybe the test_days_creation is already checking that thoroughly
 enough.
 
  Then, maybe printf-ing the year value at various stages of the printing,
 like in set_datetimestruct_days, after convert_datetime_to_datetimestruct,
 and in make_iso_8601_date. This would at least isolate where the year is
 getting lost.
 
 ok, that was a lengthy hunt, but it's in printing the string in
 make_iso_8601_date:

tmplen = snprintf(substr, sublen, "%04" NPY_INT64_FMT, dts->year);
fprintf(stderr, "printed %d[%d]: dts->year=%lld: %s\n", tmplen, sublen,
 dts->year, substr);

 produces

  >>> np.datetime64('1970-03-23 20:00:00Z', 'D')
 printed 4[62]: dts->year=1970: 
 numpy.datetime64('-03-23','D')

 It seems snprintf is not using the correct format for INT64 (as I happened
 to do in fprintf before
 realising I had to use %lld ;-) - could it be this is a general issue,
 which just does not show up
 on little-endian machines because they happen to pass the right half of the
 int64 to printf?
 BTW, how is this supposed to be handled (in 4 digits) if the year is indeed
 beyond the 32bit range
 (i.e. ~ 0.3 Hubble times...)? Just wondering if one could simply cast it
 to int32 before print.


I'd prefer to fix the NPY_INT64_FMT macro. There's no point in having it if
it doesn't work... What is NumPy setting it to for that platform?

-Mark


 Cheers,
 Derek



Re: [Numpy-discussion] code review/build test for datetime business day API

2011-06-16 Thread Derek Homeier
On 17.06.2011, at 2:02AM, Mark Wiebe wrote:

 ok, that was a lengthy hunt, but it's in printing the string in 
 make_iso_8601_date:
 
tmplen = snprintf(substr, sublen, "%04" NPY_INT64_FMT, dts->year);
fprintf(stderr, "printed %d[%d]: dts->year=%lld: %s\n", tmplen, sublen, 
 dts->year, substr);
 
 produces
 
  >>> np.datetime64('1970-03-23 20:00:00Z', 'D')
 printed 4[62]: dts->year=1970: 
 numpy.datetime64('-03-23','D')
 
 It seems snprintf is not using the correct format for INT64 (as I happened 
 to do in fprintf before
 realising I had to use %lld ;-) - could it be this is a general issue, 
 which just does not show up
 on little-endian machines because they happen to pass the right half of the 
 int64 to printf?
 BTW, how is this supposed to be handled (in 4 digits) if the year is indeed 
 beyond the 32bit range
 (i.e. ~ 0.3 Hubble times...)? Just wondering if one could simply cast it to 
 int32 before print.
 
 I'd prefer to fix the NPY_INT64_FMT macro. There's no point in having it if 
 it doesn't work... What is NumPy setting it to for that platform?
 
Of course (just felt somewhat lost among all the #defines). It clearly seems to 
be mis-constructed on PowerPC 32:
NPY_SIZEOF_LONG is 4, thus NPY_INT64_FMT is set to NPY_LONGLONG_FMT, i.e. "Ld", 
but this does not seem to handle int64 on big-endian Macs - explicitly printing 
"%Ld", dts->year 
also produces 0. 
Changing the snprintf format to "%04lld" produces the correct output, so if 
nothing else 
avails, I suggest putting something like 

#elif (defined(__ppc__) || defined(__ppc64__))
        #define LONGLONG_FMT   "lld"
        #define ULONGLONG_FMT  "llu"
#else

into npy_common.h (or possibly simply defined(__APPLE__), since "%lld" seems to 
work on 32-bit i386 Macs just as well).

Cheers,
Derek



[Numpy-discussion] Sorting library.

2011-06-16 Thread Charles R Harris
Hi All,

Pull request 89 is a set of patches that move the numpy sorting patches into
a library. I've tested it on fedora 15 with gcc and would greatly appreciate
it if some of you with different os/arch/compilers could test it out to make
sure it works correctly.

Chuck


Re: [Numpy-discussion] Unicode characters in a numpy array

2011-06-16 Thread Charles R Harris
On Wed, Jun 15, 2011 at 1:30 PM, Gökhan Sever gokhanse...@gmail.com wrote:

 Hello,

 The following snippet works fine for a regular string and prints out
 the string without a problem.

 $ python
 Python 2.7 (r27:82500, Sep 16 2010, 18:02:00)
 [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 >>> mystr = u"öööğğğ"
 >>> mystr
 u'\xf6\xf6\xf6\u011f\u011f\u011f'
 >>> type(mystr)
 <type 'unicode'>
 >>> print mystr
 öööğğğ

 What is the correct way to print out the following array?

 >>> import numpy as np
 >>> arr = np.array(u"öööğğğ")
 >>> arr
 array(u'\xf6\xf6\xf6\u011f\u011f\u011f',
  dtype='<U6')
 >>> print arr
 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py",
 line 1379, in array_str
return array2string(a, max_line_width, precision, suppress_small,
 ' ', "", str)
  File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py",
 line 426, in array2string
lst = style(x)
 UnicodeEncodeError: 'ascii' codec can't encode characters in position
 0-5: ordinal not in range(128)


I don't know. It might be that we need to fix the printing functions for
unicode and maybe have some way to set the codec as well.

Chuck


[Numpy-discussion] genfromtxt converter question

2011-06-16 Thread gary ruben
I'm trying to read a file containing data formatted as in the
following example using genfromtxt and I'm doing something wrong. It
almost works. Can someone point out my error, or suggest a simpler
solution to the ugly converter function? I thought I'd leave in the
commented-out line for future reference, which I thought was a neat
way to get genfromtxt to show what it is trying to pass to the
converter.

import numpy as np
from StringIO import StringIO

a = StringIO('''\
 (-3.9700,-5.0400) (-1.1318,-2.5693) (-4.6027,-0.1426) (-1.4249, 1.7330)
 (-5.4797, 0.) ( 1.8585,-1.5502) ( 4.4145,-0.7638) (-0.4805,-1.1976)
 ( 0., 0.) ( 6.2673, 0.) (-0.4504,-0.0290) (-1.3467, 1.6579)
 ( 0., 0.) ( 0., 0.) (-3.5000, 0.) ( 2.5619,-3.3708)
''')

#~ b = np.genfromtxt(a, converters={1: lambda x: str(x)}, dtype=object, delimiter=18)
b = np.genfromtxt(a, converters={1: lambda x: complex(*eval(x))},
                  dtype=None, delimiter=18, usecols=range(4))

print b

--

This produces
[ (' (-3.9700,-5.0400)', (-1.1318-2.5693j), ' (-4.6027,-0.1426)', '
(-1.4249, 1.7330)')
 (' (-5.4797, 0.)', (1.8585-1.5502j), ' ( 4.4145,-0.7638)', '
(-0.4805,-1.1976)')
 (' ( 0., 0.)', (6.2673+0j), ' (-0.4504,-0.0290)', ' (-1.3467, 1.6579)')
 (' ( 0., 0.)', 0j, ' (-3.5000, 0.)', ' ( 2.5619,-3.3708)')]

which I just need to unpack into a 4x4 array, but I get an error if I
try to apply a different view.
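
One workaround (a sketch that sidesteps genfromtxt entirely, assuming the
fields really are a fixed 18 characters wide, and using small sample data
rather than the file above) is to slice each line at the field width and
build the complex values directly:

```python
import numpy as np

# Two sample rows in the same fixed-width "(real,imag)" format.
text = (" (-3.9700,-5.0400) (-1.1318,-2.5693)\n"
        " ( 6.2673, 0.0000) (-0.4504,-0.0290)\n")

def parse_line(line, width=18):
    # Slice the line into fixed-width "(re,im)" fields; eval turns each
    # field into a (re, im) tuple, which complex(*...) assembles.
    line = line.rstrip("\n")
    return [complex(*eval(line[i:i + width]))
            for i in range(0, len(line), width)]

arr = np.array([parse_line(l) for l in text.splitlines() if l.strip()])
print(arr.shape)  # (2, 2)
```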

thanks,
Gary