[Numpy-discussion] Regression in choose()
Hi all,

I have been investigating the limitation of the choose() method (and function) to 32 elements. This is a regression in recent versions of NumPy. I have tested choose() in the following NumPy versions:

    1.0.4: fine
    1.1.1: bug
    1.2.1: fine
    1.3.0: bug
    1.4.x: bug
    1.5.x: bug
    1.6.x: bug
    Numeric 24.3: fine

(To run the tests on versions of NumPy prior to 1.4.x I used Python 2.4.3. For the other tests I used Python 2.7.) Here 'bug' means the choose() function has the 32-element limitation.

I have been helping an organization to port a large old Numeric-using codebase to NumPy, and the choose() limitation in recent NumPy versions is throwing a spanner in the works. The codebase is currently using both NumPy and Numeric side by side, with Numeric only being used for its choose() function, in a few dozen lines like this:

    a = numpy.array(Numeric.choose(b, c))

Here is a simple example that triggers the bug. It is a simple extension of the example from the choose() docstring:

    import numpy as np
    choices = [[0, 1, 2, 3], [10, 11, 12, 13],
               [20, 21, 22, 23], [30, 31, 32, 33]]
    np.choose([2, 3, 1, 0], choices * 8)

A side note: the exception message (defined in core/src/multiarray/iterators.c) is also slightly inconsistent with the actual behaviour:

    Traceback (most recent call last):
      File "chooser.py", line 6, in <module>
        np.choose([2, 3, 1, 0], choices * 8)
      File "/usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py", line 277, in choose
        return _wrapit(a, 'choose', choices, out=out, mode=mode)
      File "/usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py", line 37, in _wrapit
        result = getattr(asarray(obj),method)(*args, **kwds)
    ValueError: Need between 2 and (32) array objects (inclusive).

The actual behaviour is that choose() passes with 31 objects but fails with 32 objects, so this should read "exclusive" rather than "inclusive". (And why the parentheses around 32?)
Does anyone know what changed between 1.2.1 and 1.3.0 that introduced the 32-element limitation to choose(), and whether we might be able to lift this limitation again for future NumPy versions? I have a couple of days to work on a patch ... if someone can advise me how to approach this.

Best wishes,
Ed

--
Dr. Edward Schofield
Python Charmers
+61 (0)405 676 229
http://pythoncharmers.com

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Regression in choose()
If performance is not critical you could just write your own function for a quick fix, doing something like:

    numpy.array([choices[j][i] for i, j in enumerate([2, 3, 1, 0])])

This 32-array limitation definitely looks weird to me though, doesn't seem to make sense.

-=- Olivier

2011/6/16 Ed Schofield e...@pythoncharmers.com:
> Hi all, I have been investigating the limitation of the choose() method
> (and function) to 32 elements. This is a regression in recent versions
> of NumPy. [...]
> Does anyone know what changed between 1.2.1 and 1.3.0 that introduced
> the 32-element limitation to choose(), and whether we might be able to
> lift this limitation again for future NumPy versions?
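A pure-NumPy workaround along the lines Olivier suggests can be generalized by stacking the choice arrays and using fancy indexing, which never goes through the broadcast machinery that imposes the limit. This is a sketch, not a NumPy API (the helper name is made up), and it assumes the index and every choice array share the same shape:

```python
import numpy as np

def choose_many(index, choices):
    # Hypothetical stand-in for np.choose() (mode='raise' semantics)
    # without the 32-argument cap: stack the choices and pick with
    # fancy indexing. Assumes index and each choice share one shape.
    stacked = np.asarray(choices)        # shape: (n_choices,) + index.shape
    index = np.asarray(index)
    if index.min() < 0 or index.max() >= stacked.shape[0]:
        raise ValueError("index out of range")
    # result[i, j, ...] = stacked[index[i, j, ...], i, j, ...]
    return stacked[(index,) + tuple(np.indices(index.shape))]

choices = [[0, 1, 2, 3], [10, 11, 12, 13],
           [20, 21, 22, 23], [30, 31, 32, 33]]
print(choose_many([2, 3, 1, 0], choices * 8))    # -> [20 31 12  3]
```

For small numbers of choices this agrees with np.choose(), but it also accepts the 32-choice case from Ed's example.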
[Numpy-discussion] [ANN] OpenOpt suite 0.34
Hi all,

I'm glad to inform you about the new quarterly release 0.34 of the OOSuite package software (OpenOpt, FuncDesigner, SpaceFuncs, DerApproximator).

Main changes:
* Python 3 compatibility
* Lots of improvements and speedup for interval calculations
* Now interalg can obtain all solutions of a nonlinear equation (example) or systems of them (example) in the involved box lb_i <= x_i <= ub_i (bounds can be very large), possibly constrained (e.g. sin(x) + cos(y+x) > 0.5).
* Many other improvements and speedup for interalg.

See http://forum.openopt.org/viewtopic.php?id=425 for more details.

Regards, D.
[Numpy-discussion] 3d ODR
Hi,

I need to fit a 3d surface to a point cloud. This sounds like a job for 3d orthogonal distance regression. Does anybody know of an implementation?

Regards, Christian K.
[Numpy-discussion] Using numpy.fromfile with structured array skipping some elements.
Hello,

I hope this is the right mailing list for NumPy user questions; if not, I'm sorry.

I'm reading a binary file with numpy.fromfile(). The binary file is constructed with some integers for checking the data for corruption. This is how the binary file is constructed:

    Timestamp [12 bytes]:
        [1 int]             check
        [1 single]          Time stamp (single precision).
        [1 int]             check

    Data chunk [4*(no_sensor+2) bytes]:
        [1 int]             check
        [no_sensor single]  Array of sensor readings (single precision).
        [1 int]             check

The file continues this way: [Timestamp] [Data chunk] [Timestamp] [Data chunk] ...

no_sensor is file-dependent; int = 4 bytes, single = 4 bytes.

This is my current procedure:

    f = open(file, 'rb')
    f.read(size_of_header)  # The file contains a header, where e.g. no_sensor can be read.
    dt = np.dtype([('junk0', 'i4'), ('timestamp', 'f4'), ('junk1', 'i4'),
                   ('junk2', 'i4'), ('sensors', ('f4', no_sensor)),
                   ('junk3', 'i4')])
    data = np.fromfile(f, dtype=dt)

Now the data is read in and I can access it, but I have the 'junk' in the array, which annoys me. Is there a way to remove the junk data, or skip it with fromfile?

Another issue is that when accessing one sensor, I do it this way for the first sensor:

    data['sensors'][:, 0]

Would it be possible to just do data['sensors'][0]?

Thank you!

Sincerely, Michael Klitgaard
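One way to drop the junk fields after reading is to copy only the wanted fields into a fresh structured array. A sketch, using a synthetic in-memory buffer in place of the real file (np.frombuffer here stands in for np.fromfile(f, dtype=dt); no_sensor and the record values are made up for illustration):

```python
import numpy as np

no_sensor = 3  # file-dependent; read from the header in the real code

# One record = timestamp block + data chunk, check ints included
dt = np.dtype([('junk0', '<i4'), ('timestamp', '<f4'), ('junk1', '<i4'),
               ('junk2', '<i4'), ('sensors', '<f4', (no_sensor,)),
               ('junk3', '<i4')])

# Synthetic two-record buffer standing in for the file contents
raw = np.zeros(2, dtype=dt)
raw['timestamp'] = [0.5, 1.0]
raw['sensors'] = [[1, 2, 3], [4, 5, 6]]
data = np.frombuffer(raw.tobytes(), dtype=dt)  # np.fromfile(f, dtype=dt) on the real file

# Copy only the fields of interest; the junk columns are dropped and
# the original (larger) buffer can then be garbage-collected.
clean_dt = np.dtype([('timestamp', '<f4'), ('sensors', '<f4', (no_sensor,))])
clean = np.empty(data.shape, dtype=clean_dt)
for name in clean_dt.names:
    clean[name] = data[name]

print(clean['sensors'][:, 0])   # first sensor across all records -> [1. 4.]
```

On the second question: data['sensors'][:, 0] is the right spelling, since the first axis is the record number; data['sensors'][0] would instead give all sensors of the first record.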
Re: [Numpy-discussion] 3d ODR
Hi,

On Thu, Jun 16, 2011 at 2:28 PM, Christian K. ckk...@hoc.net wrote:
> I need to fit a 3d surface to a point cloud. This sounds like a job for
> 3d orthogonal distance regression. Does anybody know of an implementation?

How about http://docs.scipy.org/doc/scipy/reference/odr.html

My 2 cents, eat
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
On 06/15/2011 07:25 PM, Sturla Molden wrote:
> Den 15.06.2011 23:22, skrev Christopher Barker:
> > I think the issue got confused -- the OP was not looking to speed up a
> > matrix multiply, but rather to speed up a whole bunch of independent
> > matrix multiplies.
>
> I would do it like this:
> 1. Write a Fortran function that makes multiple calls to DGEMM in a do
> loop. (Or Fortran intrinsics dot_product or matmul.)
> 2. Put an OpenMP pragma around the loop (!$omp parallel do). Invoke the
> OpenMP compiler on compilation. Use static or guided thread scheduling.
> 3. Call Fortran from Python using f2py, ctypes or Cython. Build with a
> thread-safe and single-threaded BLAS library.
> That should run as fast as it gets.
> Sturla

The idea was to calculate:

    innerProduct = numpy.sum(array1*array2)

where array1 and array2 are, in general, multidimensional.

NumPy has an inner product function called np.inner (http://www.scipy.org/Numpy_Example_List_With_Doc#inner) which is a special case of tensordot. See the documentation for what it does and other examples. Also note that np.inner provides the cross-products as well. For example, you can just do:

    >>> import numpy as np
    >>> a = np.arange(10).reshape(2, 5)
    >>> a
    array([[0, 1, 2, 3, 4],
           [5, 6, 7, 8, 9]])
    >>> b = a * 10
    >>> b
    array([[ 0, 10, 20, 30, 40],
           [50, 60, 70, 80, 90]])
    >>> c = a * b
    >>> c
    array([[  0,  10,  40,  90, 160],
           [250, 360, 490, 640, 810]])
    >>> c.sum()
    2850
    >>> d = np.inner(a, b)
    >>> d
    array([[ 300,  800],
           [ 800, 2550]])
    >>> np.diag(d).sum()
    2850

I do not know if numpy's multiplication, np.inner() and np.sum() are single-threaded and thread-safe when you link to multi-threaded blas, lapack or atlas libraries. But if everything is *single-threaded* and thread-safe, then you just create a function and use Anne's very useful handythread.py (http://www.scipy.org/Cookbook/Multithreading). Otherwise you would have to determine the best approach, such as doing nothing (especially if the arrays are huge), or making the functions single-threaded. By the way, if the arrays are sufficiently small, there is a lot of overhead involved, such that there is more time in communication than computation. So you have to determine the best algorithm for your case.

Bruce
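A minimal multithreaded version of the sum-of-products along the lines Bruce suggests, using the standard library's concurrent.futures in place of the handythread.py recipe (my substitution; handythread offers a similar map-like interface). Since threads share memory, no arrays are pickled:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def inner_product(pair):
    a, b = pair
    # NumPy can release the GIL inside its ufunc loops, so threads
    # may overlap for large enough arrays.
    return np.sum(a * b)

rng = np.random.default_rng(0)
pairs = [(rng.random((200, 200)), rng.random((200, 200)))
         for _ in range(8)]

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(inner_product, pairs))
```

Whether this actually runs faster than a serial loop depends on array size and on how much of the work holds the GIL, which is exactly the caveat raised above.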
Re: [Numpy-discussion] datetimes with date vs time units, local time, and time zones
On Wednesday, June 15, 2011, Mark Wiebe mwwi...@gmail.com wrote:
> Towards a reasonable behavior with regard to local times, I've made the
> default repr for datetimes use the C standard library to print them in
> a local ISO format. Combined with the ISO8601-prescribed behavior of
> interpreting datetime strings with no timezone specifier to be in local
> times, this allows the following cases to behave reasonably:
>
>     >>> np.datetime64('now')
>     numpy.datetime64('2011-06-15T15:16:51-0500','s')
>     >>> np.datetime64('2011-06-15T18:00')
>     numpy.datetime64('2011-06-15T18:00-0500','m')
>
> As noted in another thread, there can be some extremely surprising
> behavior as a consequence:
>
>     >>> np.array(['now', '2011-06-15'], dtype='M')
>     array(['2011-06-15T15:18:26-0500', '2011-06-14T19:00:00-0500'],
>           dtype='datetime64[s]')
>
> Having the 15th of June print out as 7pm on the 14th of June is probably
> not what one would generally expect, so I've come up with an approach
> which hopefully deals with this in a good way.
>
> One firm principle of the datetime in NumPy is that it is always stored
> as a POSIX time (referencing UTC), or a TAI time. There are two
> categories of units that can be used, which I will call date units and
> time units. The date units are 'Y', 'M', 'W', and 'D', while the time
> units are 'h', 'm', 's', ..., 'as'. Time zones are only applied to
> datetimes stored in time units, so there's a qualitative difference
> between date and time units with respect to string conversions and
> calendar operations. I would like to place an 'unsafe' casting barrier
> between the date units and the time units, so that the above conversion
> from a date into a datetime will raise an error instead of producing a
> confusing result. This only applies to datetimes and not timedeltas,
> because for timedeltas the day -> hour case is fine; it is just the
> year/month -> other units case which has issues, and that is already
> treated with an 'unsafe' casting barrier.
>
> Two new functions will facilitate the conversions between datetimes
> with date units and time units:
>
>     date_as_datetime(datearray, hour, minute, second, microsecond,
>                      timezone='local', unit=None, out=None)
>
> converts the provided dates into datetimes at the specified time,
> according to the specified timezone. If 'unit' is specified, it
> controls the output unit, otherwise it is the units in 'out' or the
> amount of precision specified in the function.
>
>     datetime_as_date(datetimearray, timezone='local', out=None)
>
> converts the provided datetimes into dates according to the specified
> timezone. In both functions, timezone can be any of 'UTC', 'TAI',
> 'local', '+/-', or a datetime.tzinfo object. The latter will allow
> NumPy datetimes to work with the pytz library for flexible time zone
> support.
>
> I would also like to extend the 'today' input string parsing to accept
> strings like 'today 12:30' to allow a convenient way to express
> different local times occurring today, mostly useful for interactive
> usage. I welcome any comments on this design, particularly if you can
> find a case where this doesn't produce a reasonable behavior.
>
> Cheers,
> Mark

Is the output for the given use case above with the mix of 'now' and a datetime string without tz info intended to still be correct?

I personally have misgivings about interpreting phrases like 'now' and 'today' at this level. I think it introduces a can of worms that would be difficult to handle. Consider some arbitrary set of inputs to the array function for datetime objects. If they all contain no tz info, then they are all interpreted the same, as-is. However, if even one element has 'now', then the inputs are interpreted entirely differently. This will confuse people.

Just thinking out loud here: what about a case where the inputs are such that some do not specify tz and some others specify a mix of timezones? Should that be any different from the case given above?

It has been a while for me, but how different is this from Perl's floating tz for its datetime module? Maybe we could combine its approach with your unsafe barrier for the ambiguous situations that Perl's datetime module mentions?

Ben Root
Re: [Numpy-discussion] Using numpy.fromfile with structured array skipping some elements.
Den 16.06.2011 15:02, skrev Michael Klitgaard:
> Now the data is read in and I can access it, but I have the 'junk' in
> the array, which annoys me. Is there a way to remove the junk data, or
> skip it with fromfile?

Not that I know of, unless you are ok with a loop (either in Python or C). If it's that important (e.g. to save physical memory), you could also memory-map the file and never reference the junk data.

Sturla
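A sketch of the memory-mapping suggestion: the same structured dtype works with np.memmap, the header is skipped with the offset argument, and the junk fields are simply never referenced. The header size, sensor count, and file contents below are made-up stand-ins for the real file:

```python
import numpy as np
import os, tempfile

no_sensor = 3                # from the real file's header
size_of_header = 16          # assumed header size for this sketch
dt = np.dtype([('junk0', '<i4'), ('timestamp', '<f4'), ('junk1', '<i4'),
               ('junk2', '<i4'), ('sensors', '<f4', (no_sensor,)),
               ('junk3', '<i4')])

# Build a small stand-in file: header bytes + two records
records = np.zeros(2, dtype=dt)
records['sensors'] = [[1, 2, 3], [4, 5, 6]]
fname = os.path.join(tempfile.mkdtemp(), 'data.bin')
with open(fname, 'wb') as fh:
    fh.write(b'\0' * size_of_header)
    fh.write(records.tobytes())

# Map the records; only the pages actually touched are read in,
# so the junk columns stay on disk.
data = np.memmap(fname, mode='r', offset=size_of_header, dtype=dt)
print(data['sensors'][:, 0])    # first sensor across all records -> [1. 4.]
```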
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Den 16.06.2011 16:16, skrev Bruce Southey:
> The idea was to calculate:
>     innerProduct = numpy.sum(array1*array2)
> where array1 and array2 are, in general, multidimensional.

If you want to do that in parallel, fast and with minimal overhead, it's a typical OpenMP problem. If the dimensions are unknown in advance, I'd probably prefer a loop in C instead of Fortran and BLAS.

Sturla
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
On 06/16/2011 09:30 AM, Sturla Molden wrote:
> If you want to do that in parallel, fast and with minimal overhead,
> it's a typical OpenMP problem. If the dimensions are unknown in
> advance, I'd probably prefer a loop in C instead of Fortran and BLAS.
> Sturla

Remember, that is what the OP wanted to do, not me. My issue with this is that this suggestion may be overkill and too complex for the OP, without knowing if there is 'sufficient' performance gain (let alone gaining back the time spent ensuring that the code is as correct and thread-safe as possible).

Bruce
Re: [Numpy-discussion] datetimes with date vs time units, local time, and time zones
On Thu, Jun 16, 2011 at 9:18 AM, Benjamin Root ben.r...@ou.edu wrote:
> On Wednesday, June 15, 2011, Mark Wiebe mwwi...@gmail.com wrote:
> > Towards a reasonable behavior with regard to local times, I've made
> > the default repr for datetimes use the C standard library to print
> > them in a local ISO format. [...] I welcome any comments on this
> > design, particularly if you can find a case where this doesn't
> > produce a reasonable behavior.
>
> Is the output for the given use case above with the mix of 'now' and a
> datetime string without tz info intended to still be correct?

No, that case would fail. The resolution of 'now' is seconds, and the resolution of a date string is days, so the case would require a conversion across the date unit/time unit boundary.

> I personally have misgivings about interpreting phrases like 'now' and
> 'today' at this level. I think it introduces a can of worms that would
> be difficult to handle.

I like the convenience it gives at the interactive prompt, but maybe a datetime_from_string function where you can selectively enable/disable allowing of these special values and local times can provide control over this. This is similar to the datetime_as_string function which gives more flexibility than simple conversion to a string.

> Consider some arbitrary set of inputs to the array function for
> datetime objects. If they all contain no tz info, then they are all
> interpreted the same, as-is. However, if even one element has 'now',
> then the inputs are interpreted entirely differently. This will
> confuse people.

The element 'now' has no effect on the other inputs, except to possibly promote the unit to a seconds level of precision. All datetimes are in UTC, and when timezone information is given, that is only used for parsing the input, it is not preserved.

> Just thinking out loud here: what about a case where the inputs are
> such that some do not specify tz and some others specify a mix of
> timezones? Should that be any different from the case given above?

I think this has the same answer, everything gets converted to UTC.

> It has been a while for me, but how different is this from Perl's
> floating tz for its datetime module? Maybe we could combine its
Re: [Numpy-discussion] code review/build test for datetime business day API
On Wed, Jun 15, 2011 at 5:16 PM, Derek Homeier de...@astro.physik.uni-goettingen.de wrote:
> On 15.06.2011, at 1:34AM, Mark Wiebe wrote:
> > These functions are now fully implemented and documented. As always,
> > code reviews are welcome here:
> > https://github.com/numpy/numpy/pull/87
> > and for those that don't want to dig into review C code, the commit
> > for the documentation is here:
> > https://github.com/m-paradox/numpy/commit/6b5a42a777b16812e774193b06da1b68b92bc689
> > This is probably also another good place to do a merge to master, so
> > if people could test it on Mac/Windows/other platforms that would be
> > much appreciated.
>
> Probably not specifically related to the business day API, but to this
> month's general datetime fix-up: under Python 3 there are a number of
> test failures of the kinds quoted below. You can review a fix (plus a
> number of additional test extensions and a proposed fix for other
> failures on master) at
> https://github.com/dhomeier/numpy/compare/master...dt_tests_2to3

Thanks for the testing and proposed fixes, I've added some comments to the commits.

> Plus on Mac OS X 10.5 PPC there is a failure due to the fact that the
> new datetime64 appears to _always_ return 0 for the year part on this
> architecture:
>
>     >>> date = '1970-03-23 20:00:00Z'
>     >>> np.datetime64(date)
>     np.datetime64('-03-23T21:00:00+0100','s')
>     >>> np.datetime64(date, 'Y')
>     np.datetime64('','Y')
>     >>> np.datetime64('2011-06-16 02:03:04Z', 'D')
>     np.datetime64('-06-16','D')
>
> I've tried to track this down in datetime.c, but unsuccessfully so
> (i.e. I could not connect it to any of the dts->year assignments
> therein).

This is definitely perplexing. Probably the first thing to check is whether it's breaking during parsing or printing. This should always produce the same result:

    >>> np.datetime64('1970-03-23 20:00:00Z').astype('i8')
    7070400

But maybe test_days_creation is already checking that thoroughly enough. Then, maybe try printf-ing the year value at various stages of the printing, like in set_datetimestruct_days, after convert_datetime_to_datetimestruct, and in make_iso_8601_date. This would at least isolate where the year is getting lost.

> For the record, all tests pass with Python 2.5-2.7 on OS X 10.5/10.6
> i386 and x86_64.

Great!

Cheers, Mark

> Cheers, Derek
>
> FAIL: test_datetime_as_string (test_datetime.TestDateTime)
> ----------------------------------------------------------
> Traceback (most recent call last):
>   File "/Users/derek/lib/python3.2/site-packages/numpy/core/tests/test_datetime.py", line 970, in test_datetime_as_string
>     '1959')
>   File "/Users/derek/lib/python3.2/site-packages/numpy/testing/utils.py", line 313, in assert_equal
>     raise AssertionError(msg)
> AssertionError:
> Items are not equal:
>  ACTUAL: b''
>  DESIRED: '1959'
>
> FAIL: test_days_creation (test_datetime.TestDateTime)
> ----------------------------------------------------------
> Traceback (most recent call last):
>   File "/Users/derek/lib/python3.2/site-packages/numpy/core/tests/test_datetime.py", line 287, in test_days_creation
>     (1900-1970)*365 - (1970-1900)/4)
>   File "/Users/derek/lib/python3.2/site-packages/numpy/testing/utils.py", line 256, in assert_equal
>     return assert_array_equal(actual, desired, err_msg, verbose)
>   File "/Users/derek/lib/python3.2/site-packages/numpy/testing/utils.py", line 706, in assert_array_equal
>     verbose=verbose, header='Arrays are not equal')
>   File "/Users/derek/lib/python3.2/site-packages/numpy/testing/utils.py", line 635, in assert_array_compare
>     raise AssertionError(msg)
> AssertionError:
> Arrays are not equal
> (mismatch 100.0%)
>  x: array(-25567, dtype='int64')
>  y: array(-25567.5)
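For reference, the integer payload in that parsing check works out as 81 full days plus 20 hours past the epoch; on a correctly working build (spelled here without the 'Z' suffix, which newer NumPy treats as timezone-naive anyway) it can be verified like this:

```python
import numpy as np

# 1970-03-23 is 31 + 28 + 22 = 81 full days after the epoch
t = np.datetime64('1970-03-23T20:00:00')
seconds = t.astype('i8')        # raw payload of a datetime64[s]
print(seconds)                  # 81*86400 + 20*3600 = 7070400
```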
Re: [Numpy-discussion] 3d ODR
On Thu, Jun 16, 2011 at 06:28, Christian K. ckk...@hoc.net wrote:
> I need to fit a 3d surface to a point cloud. This sounds like a job
> for 3d orthogonal distance regression. Does anybody know of an
> implementation?

As eat points out, scipy.odr should do the job nicely. You should read at least part of the ODRPACK User's Guide in addition to the scipy docs for it:

http://docs.scipy.org/doc/external/odrpack_guide.pdf

The FORTRAN examples there have scipy.odr translations in the unit tests:

https://github.com/scipy/scipy/blob/master/scipy/odr/tests/test_odr.py

Exactly what you need to do depends on the details of your problem. If your surface is of the form z = f(x, y), that's a pretty straightforward multidimensional model. If you have a closed surface, however, you will need to use an implicit model of the form f(x, y, z) = 0, which may or may not be easy to formulate. The unit test for the implicit case fits a general 2D ellipse to a 2D cloud of points, as described in the User's Guide.

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
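For the explicit z = f(x, y) case, a minimal scipy.odr sketch might look like the following; the plane model, synthetic cloud, and starting values are illustrative assumptions, not the OP's actual surface:

```python
import numpy as np
from scipy import odr

# Model: z = b0*x + b1*y + b2, with the (x, y) inputs passed as a
# (2, n) array, the layout scipy.odr expects for multidimensional x.
def plane(beta, xy):
    x, y = xy
    return beta[0] * x + beta[1] * y + beta[2]

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = rng.uniform(-1, 1, 200)
z = 2.0 * x - 0.5 * y + 3.0 + rng.normal(scale=0.01, size=200)

model = odr.Model(plane)
data = odr.Data(np.vstack([x, y]), z)
fit = odr.ODR(data, model, beta0=[1.0, 1.0, 1.0]).run()
print(fit.beta)    # should land near [2.0, -0.5, 3.0]
```

A closed surface would instead use odr.Model(..., implicit=True) with an f(x, y, z) = 0 formulation, as the message above notes.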
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
NOTE: I'm only taking part in this discussion because it's interesting and I hope to learn something. I do hope the OP chimes back in to clarify his needs, but in the meantime...

Bruce Southey wrote:
> Remember that is what the OP wanted to do, not me.

Actually, I don't think that's what the OP wanted -- I think we have a conflict between the need for concrete examples and the desire to find a generic solution. So I think this is what the OP wants: how to best multiprocess a _generic_ operation that needs to be performed on a lot of arrays. Something like:

    output = []
    for a in a_bunch_of_arrays:
        output.append(a_function(a))

More specifically, a_function() is an inner product, *defined by the user*. So there is no way to optimize the inner product itself (that will be up to the user), nor any way to generally convert the bunch_of_arrays to a single array with a single higher-dimensional operation.

In testing his approach, the OP used a numpy multiply and a simple loop-through-the-elements multiply, and found that with his multiprocessing calls, the simple loop was a fair bit faster with two processors, but that the numpy one was slower with two processors. Of course, the looping method was much, much slower than the numpy one in any case. So Sturla's comments are probably right on:

Sturla Molden wrote:
> > innerProductList = pool.map(myutil.numpy_inner_product, arrayList)
>
> 1. Here we potentially have a case of false sharing and/or mutex
> contention, as the work is too fine grained. pool.map does not do any
> load balancing. If pool.map is to scale nicely, each work item must
> take a substantial amount of time.

I suspect this is the main issue.

> 2. There is also the question of when the process pool is spawned.
> Though I haven't checked, I suspect it happens prior to calling
> pool.map. But if it does not, this is a factor as well, particularly
> on Windows (less so on Linux and Apple).

It didn't work well on my Mac, so it's either not an issue, or not Windows-specific, anyway.

> 3. arrayList is serialised by pickling, which has a significant
> overhead. It's not shared memory either, as the OP's code implies,
> but the main thing is the slowness of cPickle.

I'll bet this is a big issue, and one I'm curious about how to address. I have another problem where I need to multi-process, and I'd love to know a way to pass data to the other process and back *without* going through pickle. Maybe memmapped files?

> > IPs = N.array(innerProductList)
>
> 4. numpy.array is a very slow function. The benchmark should
> preferably not include this overhead.

I re-ran, moving that out of the timing loop, and, indeed, it helped a lot, but it still takes longer with the multi-processing. I suspect that the overhead of pickling, etc. is overwhelming the operation itself. That, and the load balancing issue that I don't understand!

To test this, I did a little experiment -- creating a fake operation, one that simply returns an element from the input array -- so it should take next to no time, and we can time the overhead of the pickling, etc.:

    $ python shared_mem.py
    Using 2 processes
    No shared memory, numpy array multiplication took 0.124427080154 seconds
    Shared memory, numpy array multiplication took 0.586215019226 seconds
    No shared memory, fake array multiplication took 0.000391006469727 seconds
    Shared memory, fake array multiplication took 0.54935503006 seconds
    No shared memory, my array multiplication took 23.5055780411 seconds
    Shared memory, my array multiplication took 13.0932741165 seconds

Bingo! The overhead of the multi-processing takes about 0.54 seconds, which explains the slowdown for the numpy method -- not so mysterious after all.

Bruce Southey wrote:
> But if everything is *single-threaded* and thread-safe, then you just
> create a function and use Anne's very useful handythread.py
> (http://www.scipy.org/Cookbook/Multithreading).

This may be worth a try -- though the GIL could well get in the way.

> By the way, if the arrays are sufficiently small, there is a lot of
> overhead involved such that there is more time in communication than
> computation.

Yup -- clearly the case here. I wonder if it's just array size, though -- won't cPickle time scale with array size? So it may not be size per se, but rather how much computation you need for a given size array.

-Chris

[I've enclosed the OP's slightly altered code: myutil.py, shared_mem.py]

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/ORR (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
chris.bar...@noaa.gov
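On the pass-data-without-pickle question: memmapped files are one route; much later Python versions (3.8+) added multiprocessing.shared_memory, which makes the idea explicit. A sketch of how an array would be shared by name rather than pickled, shown within a single process for brevity (a real worker would attach using shm.name received as a short message):

```python
import numpy as np
from multiprocessing import shared_memory

a = np.arange(12, dtype=np.float64).reshape(3, 4)

# Parent: copy the array once into a named shared-memory block
shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
buf = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
buf[:] = a

# A worker would attach by name instead of receiving a pickle:
worker = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray(a.shape, dtype=a.dtype, buffer=worker.buf)
result = float(np.sum(view * view))    # sum of squares of 0..11 = 506.0

del view, buf            # drop buffer views before closing the blocks
worker.close()
shm.close()
shm.unlink()
```

Only the block name crosses the process boundary, so the per-task overhead no longer scales with array size the way cPickle does.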
Re: [Numpy-discussion] Regression in choose()
Hi Ed, I'm pretty sure that this bug is due to the compile-time constant NPY_MAXARGS defined in include/numpy/ndarraytypes.h. I suspect that the versions where it succeeds were built with a different value for that constant. NumPy uses a multi-iterator object (PyArrayMultiIterObject, defined in ndarraytypes.h as well) to broadcast arguments together for ufuncs and for functions like choose. The data-structure that it uses to do this has a static array of Iterator objects with room for NPY_MAXARGS iterators. I think in some versions this compile-time constant has been 40 or higher. Re-compiling NumPy with that constant bumped up will of course require re-compilation of most extension modules that use the NumPy API. Numeric did not use this approach to broadcast the arguments to choose together and so likely does not have the same limitation. It would also not be that difficult to modify the NumPy code to dynamically allocate the iters array when needed to remove the NPY_MAXARGS limitation. In fact, I would not mind seeing all the NPY_MAXDIMS and NPY_MAXARGS limitations removed. To do it well you would probably want to have some minimum storage-space pre-allocated (say NPY_MAXDIMS as 7 and NPY_MAXARGS as 10 to avoid the memory allocation in common cases) and just increase that space as needed dynamically. This would be a nice project for someone wanting to learn the NumPy code base. -Travis

On Jun 16, 2011, at 1:56 AM, Ed Schofield wrote: Hi all, I have been investigating the limitation of the choose() method (and function) to 32 elements. This is a regression in recent versions of NumPy. I have tested choose() in the following NumPy versions:

1.0.4: fine
1.1.1: bug
1.2.1: fine
1.3.0: bug
1.4.x: bug
1.5.x: bug
1.6.x: bug
Numeric 24.3: fine

(To run the tests on versions of NumPy prior to 1.4.x I used Python 2.4.3. For the other tests I used Python 2.7.) Here 'bug' means the choose() function has the 32-element limitation. 
I have been helping an organization to port a large old Numeric-using codebase to NumPy, and the choose() limitation in recent NumPy versions is throwing a spanner in the works. The codebase is currently using both NumPy and Numeric side-by-side, with Numeric only being used for its choose() function, with a few dozen lines like this:

a = numpy.array(Numeric.choose(b, c))

Here is a simple example that triggers the bug. It is a simple extension of the example from the choose() docstring:

import numpy as np
choices = [[0, 1, 2, 3], [10, 11, 12, 13],
           [20, 21, 22, 23], [30, 31, 32, 33]]
np.choose([2, 3, 1, 0], choices * 8)

A side note: the exception message (defined in core/src/multiarray/iterators.c) is also slightly inconsistent with the actual behaviour:

Traceback (most recent call last):
  File "chooser.py", line 6, in <module>
    np.choose([2, 3, 1, 0], choices * 8)
  File "/usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py", line 277, in choose
    return _wrapit(a, 'choose', choices, out=out, mode=mode)
  File "/usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py", line 37, in _wrapit
    result = getattr(asarray(obj),method)(*args, **kwds)
ValueError: Need between 2 and (32) array objects (inclusive).

The actual behaviour is that choose() passes with 31 objects but fails with 32 objects, so this should read exclusive rather than inclusive. (And why the parentheses around 32?) Does anyone know what changed between 1.2.1 and 1.3.0 that introduced the 32-element limitation to choose(), and whether we might be able to lift this limitation again for future NumPy versions? I have a couple of days to work on a patch ... if someone can advise me how to approach this. Best wishes, Ed -- Dr. Edward Schofield Python Charmers +61 (0)405 676 229 http://pythoncharmers.com ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. 
oliph...@enthought.com 1-512-536-1057 http://www.enthought.com ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
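[Editor's note: Travis's explanation suggests an interim workaround for Ed's port that is not spelled out in the thread: fancy indexing performs the same selection as choose() without going through the bounded multi-iterator, so it has no NPY_MAXARGS ceiling. A sketch, using the docstring-style example from Ed's message (it does not reproduce choose()'s `mode=` clipping options):]

```python
import numpy as np

# 32 choice arrays -- enough to trip the NPY_MAXARGS limit in the
# affected builds if passed to np.choose directly.
choices = [[0, 1, 2, 3], [10, 11, 12, 13],
           [20, 21, 22, 23], [30, 31, 32, 33]] * 8
index = [2, 3, 1, 0]

# Equivalent of np.choose(index, choices): stack the choices into one
# 2-D array and pick choices[index[i]][i] by fancy indexing.  The number
# of choice arrays is now unlimited, since only two arrays are involved.
arr = np.array(choices)                        # shape (32, 4)
result = arr[index, np.arange(arr.shape[1])]
print(result)   # 20, 31, 12, 3
```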
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
On Thu, Jun 16, 2011 at 6:44 PM, Christopher Barker chris.bar...@noaa.gov wrote: 2. There is also the question of when the process pool is spawned. Though I haven't checked, I suspect it happens prior to calling pool.map. But if it does not, this is a factor as well, particularly on Windows (less so on Linux and Apple). It didn't work well on my Mac, so it's either not an issue, or not Windows-specific, anyway. I am pretty sure that the process pool is spawned when you create the pool object instance. 3. arrayList is serialised by pickling, which has a significant overhead. It's not shared memory either, as the OP's code implies, but the main thing is the slowness of cPickle. I'll bet this is a big issue, and one I'm curious about how to address, I have another problem where I need to multi-process, and I'd love to know a way to pass data to the other process and back *without* going through pickle. maybe memmapped files? If you are on Linux or Mac then fork works nicely so you have read-only shared memory; you just have to put it in a module before the fork (so before pool = Pool()) and then all the subprocesses can access it without any pickling required. ie:

myutil.data = listofdata
p = multiprocessing.Pool(8)
def mymapfunc(i):
    return mydatafunc(myutil.data[i])
p.map(mymapfunc, range(len(myutil.data)))

Actually that won't work because mymapfunc needs to be in a module so it can be pickled but hopefully you get the idea. Cheers Robin IPs = N.array(innerProductList) 4. numpy.array is a very slow function. The benchmark should preferably not include this overhead. I re-ran, moving that out of the timing loop, and, indeed, it helped a lot, but it still takes longer with the multi-processing. I suspect that the overhead of pickling, etc. is overwhelming the operation itself. That and the load balancing issue that I don't understand! 
To test this, I did a little experiment -- creating a fake operation, one that simply returns an element from the input array -- so it should take next to no time, and we can time the overhead of the pickling, etc:

$ python shared_mem.py
Using 2 processes
No shared memory, numpy array multiplication took 0.124427080154 seconds
Shared memory, numpy array multiplication took 0.586215019226 seconds
No shared memory, fake array multiplication took 0.000391006469727 seconds
Shared memory, fake array multiplication took 0.54935503006 seconds
No shared memory, my array multiplication took 23.5055780411 seconds
Shared memory, my array multiplication took 13.0932741165 seconds

Bingo! The overhead of the multi-processing takes about .54 seconds, which makes the slowdown for the numpy method not so mysterious after all. Bruce Southey wrote: But if everything is *single-threaded* and thread-safe, then you just create a function and use Anne's very useful handythread.py (http://www.scipy.org/Cookbook/Multithreading). This may be worth a try -- though the GIL could well get in the way. By the way, if the arrays are sufficiently small, there is a lot of overhead involved such that there is more time in communication than computation. yup -- clearly the case here. I wonder if it's just array size though -- won't cPickle time scale with array size? So it may not be size per se, but rather how much computation you need for a given size array. -Chris [I've enclosed the OP's slightly altered code] -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
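[Editor's note: Robin's pattern can be fleshed out as a complete script. The names (DATA, inner_product_by_index) are illustrative, not from the thread, and the sketch assumes a fork-based start method, so Linux/Mac only -- on Windows each worker would re-import the module instead of inheriting memory:]

```python
import multiprocessing
import numpy as np

# Module-level data, created *before* the Pool.  On fork-based platforms
# each worker inherits it copy-on-write, so the arrays are never pickled.
DATA = [np.arange(1000, dtype=float) * i for i in range(8)]

def inner_product_by_index(i):
    # Workers look the array up in their inherited address space;
    # only the small integer index i crosses the process boundary.
    a = DATA[i]
    return float(np.sum(a * a))

pool = multiprocessing.Pool(2)   # the fork happens here, after DATA exists
results = pool.map(inner_product_by_index, range(len(DATA)))
pool.close()
pool.join()
print(results[1])   # 332833500.0 == sum of i**2 for i < 1000
```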
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Den 16.06.2011 18:44, skrev Christopher Barker: I'll bet this is a big issue, and one I'm curious about how to address, I have another problem where I need to multi-process, and I'd love to know a way to pass data to the other process and back *without* going through pickle. maybe memmapped files? Remember that os.fork is copy-on-write optimized. Often this is enough to share data. To get data back, a possibility is to use shared memory: just memmap from file 0 or -1 (depending on the system) and fork. Note that the use of fork, as opposed to multiprocessing, avoids the pickle overhead for NumPy arrays. On Windows this sucks, because there is no fork system call. Here we are stuck with multiprocessing and pickle, even if we use shared memory. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
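[Editor's note: Sturla's memmap-and-fork suggestion can be sketched with Python's mmap module: passing fd -1 maps an anonymous shared region on Linux (on some other systems one maps /dev/zero instead, which is the "file 0 or -1" distinction above). A minimal POSIX-only illustration of getting a result back from a child without pickle:]

```python
import mmap
import os
import struct

# Anonymous shared memory: fd=-1 with the default MAP_SHARED flags gives
# a region that survives os.fork() and is shared between the processes.
shm = mmap.mmap(-1, 8)

pid = os.fork()
if pid == 0:
    # Child: write a result directly into the shared region, then exit
    # immediately (os._exit skips any cleanup/teardown in the child).
    shm.seek(0)
    shm.write(struct.pack("d", 3.14))
    os._exit(0)

# Parent: wait for the child, then read the result -- no pickling involved.
os.waitpid(pid, 0)
shm.seek(0)
value = struct.unpack("d", shm.read(8))[0]
print(value)   # 3.14
```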
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Sturla Molden wrote: On Windows this sucks, because there is no fork system call. Darn -- I do need to support Windows. Here we are stuck with multiprocessing and pickle, even if we use shared memory. What do you need to pickle if you're using shared memory? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Den 16.06.2011 19:23, skrev Robin: If you are on Linux or Mac then fork works nicely so you have read only shared memory you just have to put it in a module before the fork (so before pool = Pool() ) and then all the subprocesses can access it without any pickling required. ie:

myutil.data = listofdata
p = multiprocessing.Pool(8)
def mymapfunc(i):
    return mydatafunc(myutil.data[i])
p.map(mymapfunc, range(len(myutil.data)))

There is still the issue that p.map does not do any load balancing. The processes in the pool might spend all the time battling for a mutex. We must thus arrange it so each call to mymapfunc processes a chunk of data instead of a single item. This is one strategy that works well: With n remaining work items and m processes, and c the minimum chunk size, let the process holding a mutex grab max( c, n/m ) work items. Then n is reduced accordingly, and this continues until all items are exhausted. Also, as processes do not work on interleaved items, we avoid false sharing as much as possible. (False sharing means that one processor will write to a cache line used by another processor, so both must stop what they're doing and synchronize cache with RAM.) Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
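[Editor's note: Sturla's chunking rule -- essentially guided self-scheduling -- can be written as a small generator. The name guided_chunks is mine; the workers would pull these (start, stop) ranges under a lock:]

```python
def guided_chunks(n_items, n_workers, min_chunk=1):
    """Yield contiguous (start, stop) ranges: with n items remaining and
    m workers, the next worker grabs max(min_chunk, n // m) items, so
    chunks start large and shrink as the work runs out."""
    start = 0
    remaining = n_items
    while remaining > 0:
        grab = max(min_chunk, remaining // n_workers)
        grab = min(grab, remaining)   # never hand out more than is left
        yield (start, start + grab)
        start += grab
        remaining -= grab

chunks = list(guided_chunks(100, 4, min_chunk=5))
print(chunks[0])   # (0, 25) -- the first worker takes 100 // 4 items
```

Because each worker gets one contiguous block at a time, neighbouring items are not interleaved across processors, which is what keeps false sharing down.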
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Den 16.06.2011 19:47, skrev Christopher Barker: What do you need to pickle if you're using shared memory? The meta-data for the ndarray (strides, shape, dtype, etc.) and the name of the shared memory segment. As there is no fork, we must tell the other process where to find the shared memory and what to do with it. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
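[Editor's note: the hand-off Sturla describes -- pickle only the metadata plus a handle to the memory, never the data -- can be illustrated without any real shared segment, using an ordinary writable buffer as a stand-in:]

```python
import numpy as np

# Stand-in for a named shared-memory segment.
buf = bytearray(8 * 6)
src = np.frombuffer(buf, dtype=np.float64)
src[:] = [0, 1, 2, 3, 4, 5]

# This small dict is all that would need to be pickled and sent across:
meta = {"shape": (2, 3), "dtype": "float64"}

# "Receiving process": rebuild an ndarray view over the same buffer from
# the metadata -- no copy of the array data is made.
view = np.frombuffer(buf, dtype=meta["dtype"]).reshape(meta["shape"])
print(view[1, 2])   # 5.0
```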
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Den 16.06.2011 19:55, skrev Sturla Molden: The meta-data for the ndarray (strides, shape, dtype, etc.) and the name of the shared memory segment. As there is no fork, we must tell the other process where to find the shared memory and what to do with it. Which by the way is what the shared memory arrays I and Gaël made will do, but we still have the annoying pickle overhead. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Sturla Molden wrote: Den 16.06.2011 19:55, skrev Sturla Molden: The meta-data for the ndarray (strides, shape, dtype, etc.) and the name of the shared memory segment. As there is no fork, we must tell the other process where to find the shared memory and what to do with it. got it, thanks. Which by the way is what the shared memory arrays I and Gaël made will do, but we still have the annoying pickle overhead. Do you have a pointer to that code? It sounds handy. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
On 06/16/2011 11:44 AM, Christopher Barker wrote: NOTE: I'm only taking part in this discussion because it's interesting and I hope to learn something. I do hope the OP chimes back in to clarify his needs, but in the meantime... Bruce Southey wrote: Remember that is what the OP wanted to do, not me. Actually, I don't think that's what the OP wanted -- I think we have a conflict between the need for concrete examples, and the desire to find a generic solution, so I think this is what the OP wants: How to best multiprocess a _generic_ operation that needs to be performed on a lot of arrays. Something like:

output = []
for a in a_bunch_of_arrays:
    output.append( a_function(a) )

More specifically, a_function() is an inner product, *defined by the user*. So there is no way to optimize the inner product itself (that will be up to the user), nor any way to generally convert the bunch_of_arrays to a single array with a single higher-dimensional operation. In testing his approach, the OP used a numpy multiply, and a simple, loop-through-the elements multiply, and found that with his multiprocessing calls, the simple loop was a fair bit faster with two processors, but that the numpy one was slower with two processors. Of course, the looping method was much, much, slower than the numpy one in any case. So Sturla's comments are probably right on: Sturla Molden wrote: innerProductList = pool.map(myutil.numpy_inner_product, arrayList) 1. Here we potentially have a case of false sharing and/or mutex contention, as the work is too fine grained. pool.map does not do any load balancing. If pool.map is to scale nicely, each work item must take a substantial amount of time. I suspect this is the main issue. 2. There is also the question of when the process pool is spawned. Though I haven't checked, I suspect it happens prior to calling pool.map. But if it does not, this is a factor as well, particularly on Windows (less so on Linux and Apple). 
It didn't work well on my Mac, so it's either not an issue, or not Windows-specific, anyway. 3. arrayList is serialised by pickling, which has a significant overhead. It's not shared memory either, as the OP's code implies, but the main thing is the slowness of cPickle. I'll bet this is a big issue, and one I'm curious about how to address, I have another problem where I need to multi-process, and I'd love to know a way to pass data to the other process and back *without* going through pickle. maybe memmapped files? IPs = N.array(innerProductList) 4. numpy.array is a very slow function. The benchmark should preferably not include this overhead. I re-ran, moving that out of the timing loop, and, indeed, it helped a lot, but it still takes longer with the multi-processing. I suspect that the overhead of pickling, etc. is overwhelming the operation itself. That and the load balancing issue that I don't understand! To test this, I did a little experiment -- creating a fake operation, one that simply returns an element from the input array -- so it should take next to no time, and we can time the overhead of the pickling, etc:

$ python shared_mem.py
Using 2 processes
No shared memory, numpy array multiplication took 0.124427080154 seconds
Shared memory, numpy array multiplication took 0.586215019226 seconds
No shared memory, fake array multiplication took 0.000391006469727 seconds
Shared memory, fake array multiplication took 0.54935503006 seconds
No shared memory, my array multiplication took 23.5055780411 seconds
Shared memory, my array multiplication took 13.0932741165 seconds

Bingo! The overhead of the multi-processing takes about .54 seconds, which makes the slowdown for the numpy method not so mysterious after all. Bruce Southey wrote: But if everything is *single-threaded* and thread-safe, then you just create a function and use Anne's very useful handythread.py (http://www.scipy.org/Cookbook/Multithreading). 
This may be worth a try -- though the GIL could well get in the way. By the way, if the arrays are sufficiently small, there is a lot of overhead involved such that there is more time in communication than computation. yup -- clearly the case here. I wonder if it's just array size though -- won't cPickle time scale with array size? So it may not be size per se, but rather how much computation you need for a given size array. -Chris [I've enclosed the OP's slightly altered code] ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion Please see: http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056766.html I'm doing element wise multiplication, basically innerProduct = numpy.sum(array1*array2) where array1 and array2 are, in general, multidimensional. Thanks for the code as I forgot that it was sent. I think there is something weird about these
[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
Hi all, Thanks for the replies. As mentioned, I'm parallelizing so that I can take many inner products simultaneously (which I agree is embarrassingly parallel). The library I'm writing asks the user to supply a function that takes two objects and returns their inner product. After all the discussion though it seems this is too simplistic of an approach. Instead, I plan to write this part of the library as if the inner product function supplied by the user uses all available cores (with numpy and/or numexpr built with MKL or LAPACK). As far as using fortran or C and openMP, this probably isn't worth the time it would take, both for me and the user. I've tried increasing the array sizes and found the same trends, so the slowdown isn't only because the arrays are too small to see the benefit of multiprocessing. I wrote the code to be easy for anyone to experiment with, so feel free to play around with what is included in the profiling, the sizes of arrays, functions used, etc. I also tried using handythread.foreach with arraySize = (3000,1000), and found the following: No shared memory, numpy array multiplication took 1.57585811615 seconds Shared memory, numpy array multiplication took 1.25499510765 seconds This is definitely an improvement from multiprocessing, but without knowing any better, I was hoping to see a roughly 8x speedup on my 8-core workstation. Based on what Chris sent, it seems there is some large overhead caused by multiprocessing pickling numpy arrays. To test what Robin mentioned If you are on Linux or Mac then fork works nicely so you have read only shared memory you just have to put it in a module before the fork (so before pool = Pool() ) and then all the subprocesses can access it without any pickling required. 
ie:

myutil.data = listofdata
p = multiprocessing.Pool(8)
def mymapfunc(i):
    return mydatafunc(myutil.data[i])
p.map(mymapfunc, range(len(myutil.data)))

I tried creating the arrayList in the myutil module and using multiprocessing to find the inner products of myutil.arrayList, however this was still slower than not using multiprocessing, so I believe there is still some large overhead. Here are the results:

No shared memory, numpy array multiplication took 1.55906510353 seconds
Shared memory, numpy array multiplication took 9.8242638 seconds
Shared memory, myutil.arrayList numpy array multiplication took 8.77094507217 seconds

I'm attaching this code. I'm going to work around this numpy/multiprocessing behavior with numpy/numexpr built with MKL or LAPACK. It would be good to know exactly what's causing this though. It would be nice if there was a way to get the ideal speedup via multiprocessing, regardless of the internal workings of the single-threaded inner product function, as this was the behavior I expected. I imagine other people might come across similar situations, but again I'm going to try to get around this by letting MKL or LAPACK make use of all available cores. Thanks again, Brandt myutil.py Description: Binary data shared_mem.py Description: Binary data ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
The fact that you are still passing the myutil.arrayList argument to pool.map means the data is still being pickled. All arguments to pool.map are pickled to be passed to the subprocess. You need to change the function so that it accesses the data directly, then just pass indices to pool.map. Cheers Robin On Thu, Jun 16, 2011 at 9:05 PM, Brandt Belson bbel...@princeton.edu wrote: Hi all, Thanks for the replies. As mentioned, I'm parallelizing so that I can take many inner products simultaneously (which I agree is embarrassingly parallel). The library I'm writing asks the user to supply a function that takes two objects and returns their inner product. After all the discussion though it seems this is too simplistic of an approach. Instead, I plan to write this part of the library as if the inner product function supplied by the user uses all available cores (with numpy and/or numexpr built with MKL or LAPACK). As far as using fortran or C and openMP, this probably isn't worth the time it would take, both for me and the user. I've tried increasing the array sizes and found the same trends, so the slowdown isn't only because the arrays are too small to see the benefit of multiprocessing. I wrote the code to be easy for anyone to experiment with, so feel free to play around with what is included in the profiling, the sizes of arrays, functions used, etc. I also tried using handythread.foreach with arraySize = (3000,1000), and found the following: No shared memory, numpy array multiplication took 1.57585811615 seconds Shared memory, numpy array multiplication took 1.25499510765 seconds This is definitely an improvement from multiprocessing, but without knowing any better, I was hoping to see a roughly 8x speedup on my 8-core workstation. Based on what Chris sent, it seems there is some large overhead caused by multiprocessing pickling numpy arrays. 
To test what Robin mentioned If you are on Linux or Mac then fork works nicely so you have read only shared memory you just have to put it in a module before the fork (so before pool = Pool() ) and then all the subprocesses can access it without any pickling required. ie:

myutil.data = listofdata
p = multiprocessing.Pool(8)
def mymapfunc(i):
    return mydatafunc(myutil.data[i])
p.map(mymapfunc, range(len(myutil.data)))

I tried creating the arrayList in the myutil module and using multiprocessing to find the inner products of myutil.arrayList, however this was still slower than not using multiprocessing, so I believe there is still some large overhead. Here are the results:

No shared memory, numpy array multiplication took 1.55906510353 seconds
Shared memory, numpy array multiplication took 9.8242638 seconds
Shared memory, myutil.arrayList numpy array multiplication took 8.77094507217 seconds

I'm attaching this code. I'm going to work around this numpy/multiprocessing behavior with numpy/numexpr built with MKL or LAPACK. It would be good to know exactly what's causing this though. It would be nice if there was a way to get the ideal speedup via multiprocessing, regardless of the internal workings of the single-threaded inner product function, as this was the behavior I expected. I imagine other people might come across similar situations, but again I'm going to try to get around this by letting MKL or LAPACK make use of all available cores. Thanks again, Brandt ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] code review/build test for datetime business day API
I've received some good feedback from Chuck and Ralf on the code and documentation, respectively, and build testing with proposed fixes for issues from the previous merge from Derek. I believe the current set of changes are in good shape to merge, so would like to proceed with that later today. Cheers, Mark On Tue, Jun 14, 2011 at 6:34 PM, Mark Wiebe mwwi...@gmail.com wrote: These functions are now fully implemented and documented. As always, code reviews are welcome here: https://github.com/numpy/numpy/pull/87 and for those that don't want to dig into review C code, the commit for the documentation is here: https://github.com/m-paradox/numpy/commit/6b5a42a777b16812e774193b06da1b68b92bc689 This is probably also another good place to do a merge to master, so if people could test it on Mac/Windows/other platforms that would be much appreciated. Thanks, Mark On Fri, Jun 10, 2011 at 5:49 PM, Mark Wiebe mwwi...@gmail.com wrote: I've implemented the busday_offset function with support for the weekmask and roll parameters, the commits are tagged 'datetime-bday' in the pull request here: https://github.com/numpy/numpy/pull/87 -Mark On Thu, Jun 9, 2011 at 5:23 PM, Mark Wiebe mwwi...@gmail.com wrote: Here's a possible design for a business day API for numpy datetimes: The 'B' business day unit will be removed. All business day-related calculations will be done using the 'D' day unit. A class *BusinessDayDef* to encapsulate the definition of the business week and holidays. The business day functions will either take one of these objects, or separate weekmask and holidays parameters, to specify the business day definition. This class serves as both a performance optimization and a way to encapsulate the weekmask and holidays together, for example if you want to make a dictionary mapping exchange names to their trading days definition. 
The weekmask can be specified in a number of ways, and internally becomes a boolean array with 7 elements with True for the days Monday through Sunday which are valid business days. Some different notations for the 5-day week include [1,1,1,1,1,0,0], "1111100", and "MonTueWedThuFri". The holidays are always specified as a one-dimensional array of dtype 'M8[D]', and are internally used in sorted form. A function *is_busday*(datearray, weekmask=, holidays=, busdaydef=) returns a boolean array matching the input datearray, with True for the valid business days. A function *busday_offset*(datearray, offsetarray, roll='raise', weekmask=, holidays=, busdaydef=) which first applies the 'roll' policy to start at a valid business date, then offsets the date by the number of business days specified in offsetarray. The arrays datearray and offsetarray are broadcast together. The 'roll' parameter can be 'forward'/'following', 'backward'/'preceding', 'modifiedfollowing', 'modifiedpreceding', or 'raise' (the default). A function *busday_count*(datearray1, datearray2, weekmask=, holidays=, busdaydef=) which calculates the number of business days between datearray1 and datearray2, not including the day of datearray2. For example, to find the first Monday in Feb 2011,

np.busday_offset('2011-02', 0, roll='forward', weekmask='Mon')

or to find the number of weekdays in Feb 2011,

np.busday_count('2011-02', '2011-03')

This set of three functions appears to be powerful enough to express the business-day computations that I've been shown thus far. Cheers, Mark ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
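[Editor's note: these three functions did land in NumPy with essentially the proposed signatures (from 1.7 on), so Mark's two examples can be checked directly on any NumPy new enough to include them:]

```python
import numpy as np

# First Monday in Feb 2011: roll '2011-02-01' forward onto the next
# valid business day under a Monday-only week.
first_monday = np.busday_offset('2011-02', 0, roll='forward', weekmask='Mon')
print(first_monday)   # 2011-02-07

# Number of weekdays in Feb 2011 (half-open interval, '2011-03' excluded).
n_weekdays = np.busday_count('2011-02', '2011-03')
print(n_weekdays)     # 20
```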
Re: [Numpy-discussion] code review/build test for datetime business day API
On Thu, Jun 16, 2011 at 2:00 PM, Mark Wiebe mwwi...@gmail.com wrote: I've received some good feedback from Chuck and Ralf on the code and documentation, respectively, and build testing with proposed fixes for issues from the previous merge from Derek. I believe the current set of changes are in good shape to merge, so would like to proceed with that later today. Go for it. snip Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication
On 06/16/2011 02:05 PM, Brandt Belson wrote: Hi all, Thanks for the replies. As mentioned, I'm parallelizing so that I can take many inner products simultaneously (which I agree is embarrassingly parallel). The library I'm writing asks the user to supply a function that takes two objects and returns their inner product. After all the discussion though it seems this is too simplistic of an approach. Instead, I plan to write this part of the library as if the inner product function supplied by the user uses all available cores (with numpy and/or numexpr built with MKL or LAPACK). As far as using fortran or C and openMP, this probably isn't worth the time it would take, both for me and the user. I've tried increasing the array sizes and found the same trends, so the slowdown isn't only because the arrays are too small to see the benefit of multiprocessing. I wrote the code to be easy for anyone to experiment with, so feel free to play around with what is included in the profiling, the sizes of arrays, functions used, etc. I also tried using handythread.foreach with arraySize = (3000,1000), and found the following: No shared memory, numpy array multiplication took 1.57585811615 seconds Shared memory, numpy array multiplication took 1.25499510765 seconds This is definitely an improvement from multiprocessing, but without knowing any better, I was hoping to see a roughly 8x speedup on my 8-core workstation. Based on what Chris sent, it seems there is some large overhead caused by multiprocessing pickling numpy arrays. To test what Robin mentioned If you are on Linux or Mac then fork works nicely so you have read only shared memory you just have to put it in a module before the fork (so before pool = Pool() ) and then all the subprocesses can access it without any pickling required. 
ie:

myutil.data = listofdata
p = multiprocessing.Pool(8)
def mymapfunc(i):
    return mydatafunc(myutil.data[i])
p.map(mymapfunc, range(len(myutil.data)))

I tried creating the arrayList in the myutil module and using multiprocessing to find the inner products of myutil.arrayList, however this was still slower than not using multiprocessing, so I believe there is still some large overhead. Here are the results:

No shared memory, numpy array multiplication took 1.55906510353 seconds
Shared memory, numpy array multiplication took 9.8242638 seconds
Shared memory, myutil.arrayList numpy array multiplication took 8.77094507217 seconds

I'm attaching this code. I'm going to work around this numpy/multiprocessing behavior with numpy/numexpr built with MKL or LAPACK. It would be good to know exactly what's causing this though. It would be nice if there was a way to get the ideal speedup via multiprocessing, regardless of the internal workings of the single-threaded inner product function, as this was the behavior I expected. I imagine other people might come across similar situations, but again I'm going to try to get around this by letting MKL or LAPACK make use of all available cores. Thanks again, Brandt ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion I think this is not being benchmarked correctly because there should be a noticeable difference when different numbers of threads are selected. But really you should read these sources: http://www.scipy.org/ParallelProgramming http://stackoverflow.com/questions/5260068/multithreaded-blas-in-python-numpy Also numpy has extra things going on like checks and copies that probably make using np.inner() slower. Thus, your 'numpy_inner_product' is probably as efficient as you can get without extreme measures like cython. Bruce ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] code review/build test for datetime business day API
Hi Mark,

On 16.06.2011, at 5:40PM, Mark Wiebe wrote:

    >>> np.datetime64('2011-06-16 02:03:04Z', 'D')
    numpy.datetime64('-06-16','D')

I've tried to track this down in datetime.c, but unsuccessfully so far (i.e. I could not connect it to any of the dts->year assignments therein).

This is definitely perplexing. Probably the first thing to check is whether it's breaking during parsing or printing. This should always produce the same result:

    >>> np.datetime64('1970-03-23 20:00:00Z').astype('i8')
    7070400

But maybe the test_days_creation is already checking that thoroughly enough. Then, maybe printf-ing the year value at various stages of the printing, like in set_datetimestruct_days, after convert_datetime_to_datetimestruct, and in make_iso_8601_date, would at least isolate where the year is getting lost.

OK, that was a lengthy hunt, but it's in the printing of the string in make_iso_8601_date:

    tmplen = snprintf(substr, sublen, "%04" NPY_INT64_FMT, dts->year);
    fprintf(stderr, "printed %d[%d]: dts->year=%lld: %s\n",
            tmplen, sublen, dts->year, substr);

produces

    >>> np.datetime64('1970-03-23 20:00:00Z', 'D')
    printed 4[62]: dts->year=1970:
    numpy.datetime64('-03-23','D')

It seems snprintf is not using the correct format for int64 (as I happened to do in fprintf before realising I had to use %lld ;-). Could it be this is a general issue that just does not show up on little-endian machines, because they happen to pass the right half of the int64 to printf?

BTW, how is this supposed to be handled (in 4 digits) if the year is indeed beyond the 32-bit range (i.e. ~0.3 Hubble times...)? Just wondering if one could simply cast it to int32 before printing.

Cheers,
Derek
Re: [Numpy-discussion] code review/build test for datetime business day API
On Thu, Jun 16, 2011 at 5:52 PM, Derek Homeier de...@astro.physik.uni-goettingen.de wrote:

Hi Mark,

On 16.06.2011, at 5:40PM, Mark Wiebe wrote:

    >>> np.datetime64('2011-06-16 02:03:04Z', 'D')
    numpy.datetime64('-06-16','D')

I've tried to track this down in datetime.c, but unsuccessfully so far (i.e. I could not connect it to any of the dts->year assignments therein).

This is definitely perplexing. Probably the first thing to check is whether it's breaking during parsing or printing. This should always produce the same result:

    >>> np.datetime64('1970-03-23 20:00:00Z').astype('i8')
    7070400

But maybe the test_days_creation is already checking that thoroughly enough. Then, maybe printf-ing the year value at various stages of the printing, like in set_datetimestruct_days, after convert_datetime_to_datetimestruct, and in make_iso_8601_date, would at least isolate where the year is getting lost.

OK, that was a lengthy hunt, but it's in the printing of the string in make_iso_8601_date:

    tmplen = snprintf(substr, sublen, "%04" NPY_INT64_FMT, dts->year);
    fprintf(stderr, "printed %d[%d]: dts->year=%lld: %s\n",
            tmplen, sublen, dts->year, substr);

produces

    >>> np.datetime64('1970-03-23 20:00:00Z', 'D')
    printed 4[62]: dts->year=1970:
    numpy.datetime64('-03-23','D')

It seems snprintf is not using the correct format for int64 (as I happened to do in fprintf before realising I had to use %lld ;-). Could it be this is a general issue that just does not show up on little-endian machines, because they happen to pass the right half of the int64 to printf?

BTW, how is this supposed to be handled (in 4 digits) if the year is indeed beyond the 32-bit range (i.e. ~0.3 Hubble times...)? Just wondering if one could simply cast it to int32 before printing.

I'd prefer to fix the NPY_INT64_FMT macro. There's no point in having it if it doesn't work... What is NumPy setting it to for that platform?
-Mark

Cheers,
Derek
Re: [Numpy-discussion] code review/build test for datetime business day API
On 17.06.2011, at 2:02AM, Mark Wiebe wrote:

OK, that was a lengthy hunt, but it's in the printing of the string in make_iso_8601_date:

    tmplen = snprintf(substr, sublen, "%04" NPY_INT64_FMT, dts->year);
    fprintf(stderr, "printed %d[%d]: dts->year=%lld: %s\n",
            tmplen, sublen, dts->year, substr);

produces

    >>> np.datetime64('1970-03-23 20:00:00Z', 'D')
    printed 4[62]: dts->year=1970:
    numpy.datetime64('-03-23','D')

It seems snprintf is not using the correct format for int64 (as I happened to do in fprintf before realising I had to use %lld ;-). Could it be this is a general issue that just does not show up on little-endian machines, because they happen to pass the right half of the int64 to printf?

BTW, how is this supposed to be handled (in 4 digits) if the year is indeed beyond the 32-bit range (i.e. ~0.3 Hubble times...)? Just wondering if one could simply cast it to int32 before printing.

I'd prefer to fix the NPY_INT64_FMT macro. There's no point in having it if it doesn't work... What is NumPy setting it to for that platform?

Of course (I just felt somewhat lost among all the #defines). It clearly seems to be mis-constructed on 32-bit PowerPC: NPY_SIZEOF_LONG is 4, thus NPY_INT64_FMT is set to NPY_LONGLONG_FMT, i.e. "Ld", but this does not seem to handle int64 on big-endian Macs; explicitly printing "%Ld", dts->year also produces 0. Changing the snprintf format to "%04lld" produces the correct output, so if nothing else avails, I suggest putting something like

    #elif (defined(__ppc__) || defined(__ppc64__))
    #define LONGLONG_FMT "lld"
    #define ULONGLONG_FMT "llu"
    #else

into npy_common.h (or possibly simply defined(__APPLE__), since %lld seems to work on 32-bit i386 Macs just as well).

Cheers,
Derek
[Numpy-discussion] Sorting library.
Hi All,

Pull request 89 is a set of patches that move the numpy sorting code into a library. I've tested it on Fedora 15 with gcc and would greatly appreciate it if some of you with different os/arch/compiler combinations could test it out to make sure it works correctly.

Chuck
Re: [Numpy-discussion] Unicode characters in a numpy array
On Wed, Jun 15, 2011 at 1:30 PM, Gökhan Sever gokhanse...@gmail.com wrote:

Hello,

The following snippet works fine for a regular string and prints out the string without a problem:

    $ python
    Python 2.7 (r27:82500, Sep 16 2010, 18:02:00)
    [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> mystr = u"öööğğğ"
    >>> mystr
    u'\xf6\xf6\xf6\u011f\u011f\u011f'
    >>> type(mystr)
    <type 'unicode'>
    >>> print mystr
    öööğğğ

What is the correct way to print out the following array?

    >>> import numpy as np
    >>> arr = np.array(u"öööğğğ")
    >>> arr
    array(u'\xf6\xf6\xf6\u011f\u011f\u011f', dtype='<U6')
    >>> print arr
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line 1379, in array_str
        return array2string(a, max_line_width, precision, suppress_small, ' ', "", str)
      File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line 426, in array2string
        lst = style(x)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

I don't know. It might be that we need to fix the printing functions for unicode, and maybe have some way to set the codec as well.

Chuck
[Numpy-discussion] genfromtxt converter question
I'm trying to read a file containing data formatted as in the following example using genfromtxt, and I'm doing something wrong. It almost works. Can someone point out my error, or suggest a simpler alternative to the ugly converter function? I thought I'd leave in the commented-out line for future reference; it's a neat way to get genfromtxt to show what it is trying to pass to the converter.

    import numpy as np
    from StringIO import StringIO

    a = StringIO('''\
     (-3.9700,-5.0400) (-1.1318,-2.5693) (-4.6027,-0.1426) (-1.4249, 1.7330)
     (-5.4797, 0.)     ( 1.8585,-1.5502) ( 4.4145,-0.7638) (-0.4805,-1.1976)
     ( 0., 0.)         ( 6.2673, 0.)     (-0.4504,-0.0290) (-1.3467, 1.6579)
     ( 0., 0.)         ( 0., 0.)         (-3.5000, 0.)     ( 2.5619,-3.3708)
    ''')

    #~ b = np.genfromtxt(a, converters={1: lambda x: str(x)}, dtype=object, delimiter=18)
    b = np.genfromtxt(a, converters={1: lambda x: complex(*eval(x))},
                      dtype=None, delimiter=18, usecols=range(4))
    print b

This produces

    [ (' (-3.9700,-5.0400)', (-1.1318-2.5693j), ' (-4.6027,-0.1426)', ' (-1.4249, 1.7330)')
     (' (-5.4797, 0.)', (1.8585-1.5502j), ' ( 4.4145,-0.7638)', ' (-0.4805,-1.1976)')
     (' ( 0., 0.)', (6.2673+0j), ' (-0.4504,-0.0290)', ' (-1.3467, 1.6579)')
     (' ( 0., 0.)', 0j, ' (-3.5000, 0.)', ' ( 2.5619,-3.3708)')]

which I just need to unpack into a 4x4 array, but I get an error if I try to apply a different view.

thanks,
Gary