[Numpy-discussion] upsample or scale an array
Thanks Warren, this is great, and it even handles giant arrays just fine if you've got enough RAM.

I also just found this StackOverflow post with another solution:

a.repeat(2, axis=0).repeat(2, axis=1)

http://stackoverflow.com/questions/7525214/how-to-scale-a-numpy-array

np.kron lets you do more, but for my simple use case the repeat() method is faster and more RAM-efficient with large arrays.

In [3]: a = np.random.randint(0, 255, (2400, 2400)).astype('uint8')

In [4]: timeit a.repeat(2, axis=0).repeat(2, axis=1)
10 loops, best of 3: 182 ms per loop

In [5]: timeit np.kron(a, np.ones((2,2), dtype='uint8'))
1 loops, best of 3: 513 ms per loop

Or for a 43200x4800 array:

In [6]: a = np.random.randint(0, 255, (2400*18, 2400*2)).astype('uint8')

In [7]: timeit a.repeat(2, axis=0).repeat(2, axis=1)
1 loops, best of 3: 6.92 s per loop

In [8]: timeit np.kron(a, np.ones((2, 2), dtype='uint8'))
1 loops, best of 3: 27.8 s per loop

In this case repeat() peaked at about 1 GB of RAM usage, while np.kron hit about 1.7 GB.

Thanks again, Warren. I'd tried way too many variations on reshape and rollaxis, and should have come to the NumPy list a lot sooner!

-Robin

On Dec 3, 2011, at 12:51 AM, Warren Weckesser wrote:

> On Sat, Dec 3, 2011 at 12:35 AM, Robin Kraft wrote:
>
>> I need to take an array, derived from raster GIS data, and upsample or scale it. That is, I need to repeat each value in each dimension so that, for example, a 2x2 array becomes a 4x4 array as follows:
>>
>> [[1, 2],
>>  [3, 4]]
>>
>> becomes
>>
>> [[1,1,2,2],
>>  [1,1,2,2],
>>  [3,3,4,4],
>>  [3,3,4,4]]
>>
>> It seems like some combination of np.resize or np.repeat and reshape + rollaxis would do the trick, but I'm at a loss.
>>
>> Many thanks!
>> -Robin
>
> Just a day or so ago, Josef Perktold showed one way of accomplishing this using numpy.kron:
>
> In [14]: a = arange(12).reshape(3,4)
>
> In [15]: a
> Out[15]:
> array([[ 0,  1,  2,  3],
>        [ 4,  5,  6,  7],
>        [ 8,  9, 10, 11]])
>
> In [16]: kron(a, ones((2,2)))
> Out[16]:
> array([[ 0., 0., 1., 1., 2., 2., 3., 3.],
>        [ 0., 0., 1., 1., 2., 2., 3., 3.],
>        [ 4., 4., 5., 5., 6., 6., 7., 7.],
>        [ 4., 4., 5., 5., 6., 6., 7., 7.],
>        [ 8., 8., 9., 9., 10., 10., 11., 11.],
>        [ 8., 8., 9., 9., 10., 10., 11., 11.]])
>
> Warren

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
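The two approaches in this message (Warren's np.kron and the double repeat() Robin found) can be checked against each other. This is a minimal sketch assuming a recent NumPy. `upsample_repeat` and `upsample_kron` are hypothetical helper names, not NumPy functions. Passing the input's dtype to `np.ones` keeps the `np.kron` result in that dtype. With the default float `ones`, as in Warren's session, integer input is promoted to float64, which is why his output shows `0.`, `1.`, and so on.

```python
import numpy as np

def upsample_repeat(a, n):
    """Repeat every element n times along both axes of a 2-D array."""
    return a.repeat(n, axis=0).repeat(n, axis=1)

def upsample_kron(a, n):
    """Same result via the Kronecker product with an n x n block of ones.

    Using a.dtype for the ones block keeps the result in the input dtype;
    the default float ones would promote integer input to float64.
    """
    return np.kron(a, np.ones((n, n), dtype=a.dtype))

a = np.array([[1, 2], [3, 4]], dtype=np.uint8)
print(upsample_repeat(a, 2).tolist())
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(np.array_equal(upsample_repeat(a, 2), upsample_kron(a, 2)))  # True
```

The repeat() version only ever copies the input values. The np.kron version also performs a multiplication per output element, which is consistent with the timing gap Robin reports.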
Re: [Numpy-discussion] upsample or scale an array
You can also use numpy.tile

-=- Olivier
Re: [Numpy-discussion] upsample or scale an array
That does repeat the elements, but doesn't get them into the desired order.

In [4]: print a
[[1 2]
 [3 4]]

In [7]: np.tile(a, 4)
Out[7]:
array([[1, 2, 1, 2, 1, 2, 1, 2],
       [3, 4, 3, 4, 3, 4, 3, 4]])

In [8]: np.tile(a, 4).reshape(4,4)
Out[8]:
array([[1, 2, 1, 2],
       [1, 2, 1, 2],
       [3, 4, 3, 4],
       [3, 4, 3, 4]])

It's close, but I want to repeat the elements along the two axes, effectively stretching it by the lower right corner:

array([[1, 1, 2, 2],
       [1, 1, 2, 2],
       [3, 3, 4, 4],
       [3, 3, 4, 4]])

It would take some more reshaping/axis rolling to get there, but it seems doable. Anyone know what combination of manipulations would work with the result of np.tile?

-Robin

On Dec 3, 2011, at 11:05 AM, Olivier Delalleau wrote:

> You can also use numpy.tile
>
> -=- Olivier
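The ordering problem Robin describes comes down to what each function repeats. np.tile copies the whole array side by side, while np.repeat copies each element in place. Here is a small illustration on the 2x2 example, assuming a recent NumPy:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# np.tile repeats the array as a whole block: 1 2 1 2 ...
print(np.tile(a, 2).tolist())       # [[1, 2, 1, 2], [3, 4, 3, 4]]
# With reps=(2, 2) the block is also stacked vertically.
print(np.tile(a, (2, 2)).tolist())  # [[1, 2, 1, 2], [3, 4, 3, 4], [1, 2, 1, 2], [3, 4, 3, 4]]

# np.repeat repeats each element in place: 1 1 2 2 ...
print(a.repeat(2, axis=1).tolist())  # [[1, 1, 2, 2], [3, 3, 4, 4]]
```

Getting the "stretched" layout out of np.tile therefore needs an extra axis permutation. Repeating along each axis in turn yields it directly.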
Re: [Numpy-discussion] upsample or scale an array
Ah sorry, I hadn't read carefully enough what you were trying to achieve. I think the double repeat solution looks like your best option then.

-=- Olivier
Re: [Numpy-discussion] upsample or scale an array
On 03.12.2011, at 6:22PM, Robin Kraft wrote:

> It would take some more reshaping/axis rolling to get there, but it seems doable. Anyone know what combination of manipulations would work with the result of np.tile?

Rolling was the keyword:

np.rollaxis(np.tile(a, 4).reshape(2,2,-1), 2, 1).reshape(4,4)
[[1 1 2 2]
 [1 1 2 2]
 [3 3 4 4]
 [3 3 4 4]]

I leave the generalisation and timing up to you, but it seems for

a = np.arange(M**2).reshape(M,-1)

np.rollaxis(np.tile(a, N**2).reshape(M,N,-1), 2, 1).reshape(M*N,-1)

should do the trick.

Cheers,
Derek
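Derek's one-liner is easier to follow when each intermediate shape is spelled out. Below is a sketch wrapping it in a hypothetical helper, `upsample_tile_rollaxis`, and checking it against the double repeat(). Derek wrote it for square (M, M) input. Working through the shapes suggests it also holds for a rectangular (M, P) array, so the example uses a 2x3 input. Current NumPy documentation recommends np.moveaxis over np.rollaxis. Here `np.moveaxis(t, 2, 1)` gives the same axis order.

```python
import numpy as np

def upsample_tile_rollaxis(a, n):
    """Upsample a 2-D array by n along both axes with the tile/rollaxis trick.

    Shapes for an (M, P) input:
      np.tile(a, n*n)     -> (M, P*n*n): each row is the original row n*n times
      .reshape(M, n, -1)  -> (M, n, P*n): split each long row into n chunks
      np.rollaxis(t, 2, 1)-> (M, P*n, n): swap the last two axes
      .reshape(M*n, -1)   -> (M*n, P*n): read the result back out row by row
    """
    m = a.shape[0]
    t = np.tile(a, n * n).reshape(m, n, -1)
    return np.rollaxis(t, 2, 1).reshape(m * n, -1)

a = np.arange(6).reshape(2, 3)
out = upsample_tile_rollaxis(a, 2)
print(out.shape)  # (4, 6)
print(np.array_equal(out, a.repeat(2, axis=0).repeat(2, axis=1)))  # True
```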
Re: [Numpy-discussion] upsample or scale an array
On 03.12.2011, at 6:47PM, Olivier Delalleau wrote:

> Ah sorry, I hadn't read carefully enough what you were trying to achieve. I think the double repeat solution looks like your best option then.

Considering that it is a lot shorter than fixing the tile() result, you are probably right (I've only now looked closer at the repeat() solution ;-). I'd still be interested in the performance: since I think none of the reshape or rollaxis operations actually move any data in memory (for numpy 1.6), it might still be faster.

Cheers,
Derek
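Derek's guess about data movement can be checked directly with np.shares_memory. This sketch assumes NumPy 1.11 or later, where np.shares_memory is available, and has not been checked against 1.6. The intermediate reshape and rollaxis results are views of the tiled array. However, np.tile itself returns a new array. The final reshape also has to copy, because the rolled layout cannot be described by a single set of strides.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
n = 2

t = np.tile(a, n * n)              # np.tile returns a new array
split = t.reshape(3, n, -1)        # contiguous reshape: a view of t
rolled = np.rollaxis(split, 2, 1)  # axis permutation: still a view of t
out = rolled.reshape(3 * n, -1)    # layout not expressible by strides: a copy

print(np.shares_memory(split, t))   # True
print(np.shares_memory(rolled, t))  # True
print(np.shares_memory(out, t))     # False
```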
Re: [Numpy-discussion] upsample or scale an array
Ha! I knew it had to be possible! Thanks Derek. So for M = 1200 and N = 2 (now on my laptop):

In [70]: M = 1200

In [69]: N = 2

In [71]: a = np.random.randint(0, 255, (M**2)).reshape(M,-1)

In [76]: timeit np.rollaxis(np.tile(a, N**2).reshape(M,N,-1), 2, 1).reshape(M*N,-1)
10 loops, best of 3: 99.1 ms per loop

In [78]: timeit a.repeat(2, axis=0).repeat(2, axis=1)
10 loops, best of 3: 85.6 ms per loop

In [79]: timeit np.kron(a, np.ones((2,2), 'uint8'))
1 loops, best of 3: 521 ms per loop

It turns out np.kron and repeat are pretty straightforward for multi-dimensional data too, for example when scaling or stretching a stacked array representing pixel data over time. Nothing changes for np.kron: it handles the additional dimensionality by itself. With repeat you just tell it to operate on the last two dimensions.

So to sum up:

1) np.kron is cool for the simplicity of the code and simple scaling to N dimensions. It's also handy if you want to scale the array elements themselves too.
2) repeat() along the last N axes is a bit more intuitive (i.e. less magical) to me and has a better performance profile.
3) Derek's reshape/rolling solution is almost as fast, but it gives me a headache trying to visualize what it's actually doing. I don't want to think about adding another dimension...

Thanks for the help, folks.

Here's scaling of a hypothetical time series (i.e. 3 axes), where each sub-array represents a month.

In [26]: print a
[[[1 2]
  [3 4]]

 [[1 2]
  [3 4]]

 [[1 2]
  [3 4]]]

In [27]: np.kron(a, np.ones((2,2), dtype='uint8'))
Out[27]:
array([[[1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 4, 4],
        [3, 3, 4, 4]],

       [[1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 4, 4],
        [3, 3, 4, 4]],

       [[1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 4, 4],
        [3, 3, 4, 4]]])

In [64]: a.repeat(2, axis=1).repeat(2, axis=2)
Out[64]:
array([[[1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 4, 4],
        [3, 3, 4, 4]],

       [[1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 4, 4],
        [3, 3, 4, 4]],

       [[1, 1, 2, 2],
        [1, 1, 2, 2],
        [3, 3, 4, 4],
        [3, 3, 4, 4]]])
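Robin's summary generalises to a helper that repeats along the last few axes of a stack. This is a sketch; `upsample_last_axes` is a hypothetical name. The comparison with np.kron works because np.kron pads the shape of the lower-dimensional operand with leading ones, so a 2-D block of ones scales only the last two axes of a 3-D stack.

```python
import numpy as np

def upsample_last_axes(a, n, naxes=2):
    """Repeat each element n times along the last `naxes` axes of `a`."""
    for axis in range(a.ndim - naxes, a.ndim):
        a = a.repeat(n, axis=axis)
    return a

# A stack of three identical 2x2 "monthly" rasters, as in Robin's example.
monthly = np.tile(np.array([[1, 2], [3, 4]], dtype=np.uint8), (3, 1, 1))

via_repeat = upsample_last_axes(monthly, 2)
# The (2, 2) ones block is treated as shape (1, 2, 2), so axis 0 is untouched.
via_kron = np.kron(monthly, np.ones((2, 2), dtype=np.uint8))

print(via_repeat.shape)                      # (3, 4, 4)
print(np.array_equal(via_repeat, via_kron))  # True
```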
[Numpy-discussion] bug in PyArray_GetCastFunc
When attempting to cast to a user-defined type, PyArray_GetCastFunc looks up the cast function in the dictionary but doesn't check whether the entry exists. PyDict_GetItem returns NULL for a missing key, and passing that NULL to NpyCapsule_Check causes segfaults. Here's a patch.

Geoffrey

diff --git a/numpy/core/src/multiarray/convert_datatype.c b/numpy/core/src/multiarray/convert_datatype.c
index 818d558..4b8f38b 100644
--- a/numpy/core/src/multiarray/convert_datatype.c
+++ b/numpy/core/src/multiarray/convert_datatype.c
@@ -81,7 +81,7 @@ PyArray_GetCastFunc(PyArray_Descr *descr, int type_num)
 key = PyInt_FromLong(type_num);
 cobj = PyDict_GetItem(obj, key);
 Py_DECREF(key);
-if (NpyCapsule_Check(cobj)) {
+if (cobj && NpyCapsule_Check(cobj)) {
 castfunc = NpyCapsule_AsVoidPtr(cobj);
 }
 }
[Numpy-discussion] NumPy Governance
Hi everyone,

There have been some wonderfully vigorous discussions over the past few months that have made it clear that we need some clarity about how decisions will be made in the NumPy community.

When we were a smaller bunch of people it seemed easier to come to an agreement, and things pretty much evolved based on (mostly) consensus and who was available to actually do the work.

There is a need for a clearer structure so that we know how decisions will get made and so that code can move forward while paying attention to the current user base. There has been a steering committee structure for SciPy in the past, and I have certainly been prone to lump NumPy and SciPy together, given that I have a strong interest in and have spent a great amount of time working on both projects. Others have also spent time on both projects.

However, I think it is critical at this stage to clearly separate the projects and define a governing structure that is fair and agreeable for NumPy. SciPy has multiple modules and will probably need structure around each module independently. For now, I wanted to open up a discussion to see what people thought about NumPy's governance.

My initial thoughts:

* discussions happen as they do now on the mailing list
* a small group of developers (5-11) constitute the board, and major decisions are made by vote of that group (not just simple majority; needs at least 2/3 +1 votes)
* votes are +1/+0/-0/-1
* if a topic is difficult to resolve, it is moved off the main list and discussed on a separate board mailing list; these should be rare, but parts of the NA discussion would probably qualify
* This board mailing list is publicly viewable, but only board members may post.
* The board is renewed and adjusted each year, based on nomination and 2/3 vote of the current board, until the board is at 11.
* The chairman of the board is voted by a majority of the board and has veto power unless overridden by 3/4 of the board.
* Petitions to remove people from the board can be made by 50+ independent reverse nominations (hopefully people will just withdraw if they are no longer active).

All of these points are open for discussion. I just thought I would start the conversation. I will be much more active this next year with NumPy and will be very interested in the direction NumPy is taking. I'm hoping to discern from this conversation who else is very interested in the direction of NumPy, so that the first board can be formally constituted.

Best regards,

-Travis
[Numpy-discussion] failure to register ufunc loops for user defined types
Hello,

I'm trying to add a fixed-precision rational number dtype to numpy, and am running into an issue trying to register ufunc loops. The code in question looks like

int npy_rational = PyArray_RegisterDataType(rational_descr);
PyObject* equal = ... // extract equal object from the imported numpy module
int types[3] = {npy_rational,npy_rational,NPY_BOOL};
if (PyUFunc_RegisterLoopForType((PyUFuncObject*)equal,npy_rational,rational_ufunc_equal,types,0)<0)
    return 0;

In Python 2.6.7 with the latest numpy from git, I get

>>> from rational import *
>>> i = array([rational(5,3)])
>>> i
array([5/3], dtype=rational)
>>> equal(i,i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'equal' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

The same thing happens with (rational,rational)->rational ufuncs like multiply.

The full extension module code is here:

https://github.com/girving/poker/blob/rational/rational.cpp

I realize this isn't much information to go on, but let me know if anything comes to mind in terms of possible reasons or further tests to run. Unfortunately it looks like the ufunc ntypes and types properties aren't updated based on user-defined loops, so I'm not yet sure whether the problem is in registration or resolution.

It's also possible someone else hit this before: http://projects.scipy.org/numpy/ticket/1913.

Thanks,
Geoffrey
Re: [Numpy-discussion] NumPy Governance
Hi Travis,

Thanks very much for starting this discussion.

You have probably seen that my preference would be for all discussions to be public, in the sense that all can contribute. So it seems reasonable to me to have a 'board' as you describe, but the board should vote on the same mailing list as the rest of the discussion. Having a separate mailing list for discussion makes the separation overt between those with a granted voice and those without, and I would hope for a structure which emphasized discussion in an open forum.

Put another way, what advantage would having a separate public mailing list have?

How does this governance compare to that of, say, Linux or Python or Debian?

My worry will be that it will be too tempting to terminate discussions and proceed to resolve by vote, when voting (as Karl Vogel describes) may still do harm.

What will be the position (maybe I mean your position) on consensus as Nathaniel has described it? I feel the masked array discussion would have been more productive (and maybe shorter and more to the point) if there had been some rule of thumb that every effort is made to reach consensus before proceeding to implementation, or to a vote.

For example, in the masked array discussion, I would have liked to be able to say 'hold on, we have a rule that we try our best to reach consensus; I do not feel we have done that yet'.

See you,

Matthew
Re: [Numpy-discussion] NumPy Governance
I like the idea of trying to reach consensus first. The only point of having a board is to have some way to resolve issues should consensus not be reachable.

Believe me, I'm not that excited about a separate mailing list. It would be great if we could resolve everything on a single list.

-Travis

---
Travis Oliphant
Enthought, Inc.
oliph...@enthought.com
1-512-536-1057
http://www.enthought.com
[Numpy-discussion] Convert datetime64 to python datetime.datetime in numpy 1.6.1?
In numpy 1.6.1, what's the most straightforward way to convert a datetime64 to a python datetime.datetime? E.g. I have

In [1]: d = datetime64("2011-12-03 12:34:56.75")

In [2]: d
Out[2]: 2011-12-03 12:34:56.75

I want the same time as a datetime.datetime instance. My best hack so far is to parse repr(d) with datetime.datetime.strptime:

In [3]: import datetime

In [4]: dt = datetime.datetime.strptime(repr(d), "%Y-%m-%d %H:%M:%S.%f")

In [5]: dt
Out[5]: datetime.datetime(2011, 12, 3, 12, 34, 56, 750000)

That works (unless there are no microseconds, in which case ".%f" must be removed from the format string), but there must be a better way.

Warren
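In later NumPy releases there is a more direct route than parsing repr(d). This sketch assumes a recent NumPy; I have not checked how much of it applies to 1.6.1. Both helper names are hypothetical. The first helper casts to microsecond precision and calls `.item()`. The second is a tidier version of the strptime hack that picks the format string depending on whether fractional seconds are present.

```python
import datetime

import numpy as np

def datetime64_to_datetime(d):
    """Convert a numpy.datetime64 scalar to datetime.datetime.

    Casting to microsecond precision first means .item() yields a
    datetime.datetime; left as-is, a 'ns' value would come back as a plain
    int and a day-precision value as a datetime.date.
    """
    return d.astype("datetime64[us]").item()

def parse_timestamp(s):
    """Parse 'YYYY-MM-DD HH:MM:SS[.ffffff]', with or without fractional seconds."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in s else "%Y-%m-%d %H:%M:%S"
    return datetime.datetime.strptime(s, fmt)

d = np.datetime64("2011-12-03T12:34:56.75")
print(repr(datetime64_to_datetime(d)))
# datetime.datetime(2011, 12, 3, 12, 34, 56, 750000)
print(repr(parse_timestamp("2011-12-03 12:34:56")))
# datetime.datetime(2011, 12, 3, 12, 34, 56)
```

Note that strptime's `%f` pads on the right, so ".75" becomes 750000 microseconds rather than 75.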
Re: [Numpy-discussion] NumPy Governance
On Sat, Dec 3, 2011 at 7:18 PM, Travis Oliphant <teoliph...@gmail.com> wrote:

If the purpose of the board is to resolve controversies, the 2/3 requirement is going to cause problems. The reason majority votes are usually used, and committees are set up with an odd number of members, is that nothing gets resolved otherwise. Doing nothing is not a solution to missing consensus.

Furthermore, at the current time, I don't think there are 5 active developers, let alone 11. With hard work you might scrape together two or three. Having 5 or 11 people making decisions for the two or three actually doing the work isn't going to go over well.

I would propose a technical board of one or three people who can step in if an issue looks like it needs outside intervention. And I would suggest at least one of the members be someone from the outside but familiar with the project, say someone like Fernando. The one-member model is if we decide to go with a benevolent dictator. Note that for the smaller boards both the 2/3 and majority votes would be the same number of people ;)

Chuck
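Chuck's closing remark can be checked with a little arithmetic. This sketch interprets "2/3 of the board" as at least ceil(2n/3) yes votes out of n members and a simple majority as floor(n/2) + 1. Travis's "2/3 +1" wording admits other readings. `votes_needed` is a hypothetical helper.

```python
def votes_needed(board_size):
    """Return (simple majority, two-thirds) yes-vote counts for a board of n."""
    majority = board_size // 2 + 1
    two_thirds = -(-2 * board_size // 3)  # ceiling of 2n/3 using integer math
    return majority, two_thirds

for n in (1, 3, 5, 11):
    print(n, votes_needed(n))
# 1 (1, 1)
# 3 (2, 2)
# 5 (3, 4)
# 11 (6, 8)
```

Under this reading the two thresholds coincide for boards of one or three members, and diverge from five upward.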