[Numpy-discussion] PyArray_PutTo Question
Hi All, I am a bit new to the NumPy C-API and I am having a hard time placing results into output arrays. I am using PyArray_TakeFrom to grab an input dimension of data, then I do a calculation, then I want to pack the result back into the output... yet PyArray_PutTo does not have an axis argument like PyArray_TakeFrom does. I am grabbing by column in a two-dimensional array and I would like to pack it back the same way. I know that I can build the result in reverse, pack the columns into rows, and then reshape the output... but I am wondering why PutTo does not behave exactly like TakeFrom. The Python function numpy.put also lacks the axis argument... so I guess I can see the one-to-one reason for the omission. However, is building in reverse and reshaping the normal way to pack by column? Thanks much! MJ

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
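At the Python level, the same asymmetry can be worked around without building in reverse: take a column with an axis argument, then write the result back with a sliced assignment instead of put(). A minimal sketch (the doubling calculation is just a stand-in for the real per-column work):

```python
import numpy as np

data = np.arange(12, dtype=float).reshape(3, 4)
out = np.empty_like(data)

for j in range(data.shape[1]):
    col = np.take(data, j, axis=1)   # grab column j, like PyArray_TakeFrom with axis=1
    result = col * 2.0               # stand-in for the real calculation
    out[:, j] = result               # pack it back by column; no put() needed

# out holds the per-column results in their original orientation
```

A sliced assignment like `out[:, j] = result` is the usual idiom for "put along an axis", and avoids the pack-into-rows-then-reshape dance.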
Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm
Right indeed... I have spent a lot of time looking at this, and it seems a waste of time as the results are garbage anyway when the columns are collinear. I am just going to set a threshold, check the condition number, continue if satisfied, and return an error/warning if not. Now, what is too large? I'll poke around. TY! MJ

-----Original Message-----
From: Pauli Virtanen
Sent: Wednesday, August 31, 2011 2:00 AM
To: numpy-discussion@scipy.org
Subject: Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm

On Tue, 30 Aug 2011 15:48:18 -0700, Mark Janikas wrote:
> Last week I posted a question involving the identification of linearly dependent columns of a matrix... but now I am finding an interesting result based on the linalg.inv() function... sometimes I am able to invert a matrix that has linearly dependent columns and other times I get the LinAlgError()... this suggests that there is some kind of random component to the INV method. Is this normal?

I suspect that this is a case of floating-point rounding errors. Floating-point arithmetic is inexact, so even if a certain matrix is singular in exact arithmetic, for a computer it may still be invertible (by a given algorithm). This type of thing is not unusual in floating-point computations. The matrix condition number (`np.linalg.cond`) is a better measure of whether a matrix is invertible or not.

-- Pauli Virtanen
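The condition-number gate described above can be sketched like this (the `cond_max` cutoff is an assumption on my part; 1/machine-epsilon is a common rule of thumb, not something fixed by NumPy):

```python
import numpy as np

def safe_inv(a, cond_max=1.0 / np.finfo(float).eps):
    """Invert `a` only if its condition number is below the threshold."""
    c = np.linalg.cond(a)
    if not np.isfinite(c) or c > cond_max:
        # the cutoff is a hypothetical policy choice, not a NumPy constant
        raise np.linalg.LinAlgError("ill-conditioned matrix (cond=%g)" % c)
    return np.linalg.inv(a)

good = np.array([[2.0, 0.0], [0.0, 1.0]])
bad = np.array([[1.0, 2.0], [2.0, 4.0]])   # second row is 2x the first: singular
```

With this gate, `safe_inv(bad)` raises consistently, regardless of whether the underlying LAPACK routine happens to produce numbers for a nearly singular input.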
Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm
When I say garbage, I mean in the context of my hypothesis testing in the presence of perfect multicollinearity. I advise the user of the combination that leads to the problem and move on.

-----Original Message-----
From: Bruce Southey
Sent: Wednesday, August 31, 2011 11:11 AM
To: numpy-discussion@scipy.org
Subject: Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm

On 08/31/2011 12:56 PM, Mark Janikas wrote:
> Right indeed... I have spent a lot of time looking at this, and it seems a waste of time as the results are garbage anyway when the columns are collinear. I am just going to set a threshold, check the condition number, continue if satisfied, and return an error/warning if not. Now, what is too large? I'll poke around. TY! MJ

The results are not 'garbage' if you have collinear columns, as these have a very well-known and understandable meaning. But if you don't expect this, then you really need to examine how you are modeling or measuring your data, because that is where the problem lies. For example, if two of your measured variables are collinear, it means that those measurements are not independent as you are assuming.

Bruce
[Numpy-discussion] Question on LinAlg Inverse Algorithm
Hello All, Last week I posted a question involving the identification of linearly dependent columns of a matrix... but now I am finding an interesting result based on the linalg.inv() function... sometimes I am able to invert a matrix that has linearly dependent columns and other times I get the LinAlgError()... this suggests that there is some kind of random component to the INV method. Is this normal? Thanks much ahead of time, MJ

Mark Janikas
Product Developer
ESRI, Geoprocessing
380 New York St.
Redlands, CA 92373
909-793-2853 (2563)
mjani...@esri.com
Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm
Working on it... Give me a few minutes to get you the data. TY! MJ

-----Original Message-----
From: Christopher Jordan-Squire
Sent: Tuesday, August 30, 2011 3:57 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm

Can you give an example matrix? I'm not a numerical linear algebra expert, but I suspect that if your matrix is singular (or nearly so, in floating point) then any inverse given will look pretty wonky: huge determinant, eigenvalues, operator norm, etc.

-Chris JS
Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm
When I export to ASCII I am losing precision and getting consistent results... I will try a flat dump. More to come. TY MJ
Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm
OK... so I have been using checksums to compare, and it looks like I am getting a different value when it fails as opposed to when it passes... i.e. the input is NOT the same. When I save them to npy files and run LA.inv() I get consistent results. Now I have to track down in my code why the inputs are different. Sucks, because I keep having to dive deeper (more checksums... yeh!). But it is all linear algebra from the same input, so kinda weird that there is a divergence. Thanks for all of your help! And I'll post again when I find the culprit (probably me :-)). MJ

-----Original Message-----
From: Robert Kern
Sent: Tuesday, August 30, 2011 4:42 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm

We will also need to know the platform that you are on as well as the LAPACK library that you linked numpy against. It is the behavior of that LAPACK library that is controlling here. Standard LAPACK does sometimes use pseudorandom numbers in certain situations, but AFAICT it deterministically seeds the PRNG on every call, and I don't think it does this for any subroutine involved with inversion. But if you use an optimized LAPACK from some vendor, I don't know what they may be doing. Some optimized LAPACK/BLAS libraries may be threaded and may dynamically determine how to break up the problem based on load (I don't know of any that specifically do this, but it's a possibility).

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
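For the checksum comparisons, hashing the raw bytes of each intermediate array is one low-tech option (a sketch; MD5 here is an arbitrary choice, and bitwise equality is deliberately stricter than np.allclose):

```python
import hashlib
import numpy as np

def fingerprint(arr):
    """Hex digest of an array's raw bytes: bitwise-identical inputs match."""
    a = np.ascontiguousarray(arr)          # normalize layout before hashing
    return hashlib.md5(a.tobytes()).hexdigest()

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = x.copy()
y[0, 0] += 1e-9                            # a tiny perturbation changes the digest
```

Logging `fingerprint(...)` at each stage of the pipeline pinpoints exactly where two supposedly identical inputs diverge.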
[Numpy-discussion] Identifying Colinear Columns of a Matrix
Hello All, I am trying to identify columns of a matrix that are perfectly collinear. It is not that difficult to identify when two columns are identical or have zero variance, but I do not know how to ID when the culprit is of a higher order, i.e. columns 1 + 2 + 3 = column 4. NUM.corrcoef(matrix.T) will return NaNs when the matrix is singular, and LA.cond(matrix.T) will provide a very large condition number... but they do not tell me which columns are causing the problem. For example:

zt = numpy.array([[ 1.  ,  1.  ,  1.  ,  1.  ,  1.  ],
                  [ 0.25,  0.1 ,  0.2 ,  0.25,  0.5 ],
                  [ 0.75,  0.9 ,  0.8 ,  0.75,  0.5 ],
                  [ 3.  ,  8.  ,  0.  ,  5.  ,  0.  ]])

How can I identify that columns 0, 1, 2 are the issue because: column 1 + column 2 = column 0? Any input would be greatly appreciated. Thanks much, MJ
Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix
I actually use the VIF when the design matrix can be inverted. I do it the quick and dirty way as opposed to the step regression: 1. Calc the correlation coefficient matrix (w/o the intercept). 2. Return the diagonal of the inverse of the correlation matrix from step 1. Again, the problem lies in the multiple-column relationship... I wouldn't be able to run sub-regressions at all when the columns are perfectly collinear. MJ

-----Original Message-----
From: Skipper Seabold
Sent: Friday, August 26, 2011 10:28 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix

On Fri, Aug 26, 2011 at 1:10 PM, Mark Janikas wrote:
> I am trying to identify columns of a matrix that are perfectly collinear... but they do not tell me which columns are causing the problem.

The way that I know to do this in a regression context for (near perfect) multicollinearity is VIF. It's long been on my todo list for statsmodels. http://en.wikipedia.org/wiki/Variance_inflation_factor Maybe there are other ways with decompositions. I'd be happy to hear about them. Please post back if you write any code to do this.

Skipper
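The quick-and-dirty VIF described above (diagonal of the inverse correlation matrix) can be sketched as follows; it assumes the design matrix is invertible, and the example data are made up:

```python
import numpy as np

def quick_vif(X):
    """VIFs via the diagonal of the inverse correlation matrix (no intercept)."""
    R = np.corrcoef(X, rowvar=False)   # columns of X are the variables
    return np.diag(np.linalg.inv(R))

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))
X[:, 2] += 0.9 * X[:, 0]               # induce some (imperfect) collinearity
vifs = quick_vif(X)                    # each VIF >= 1; large values flag trouble
```

As the thread notes, this breaks down exactly when the collinearity is perfect: the correlation matrix becomes singular and the inversion fails, which is why a decomposition-based approach is needed for that case.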
Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix
I wonder if my last statement is essentially the only answer... which I wanted to avoid... Should I just use combinations of the columns and try to construct the corrcoef() (then ID whether NaNs are present), or use the condition number to ID the singularity? I just wanted to avoid the whole k! algorithm. MJ
Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix
Charles! That looks like it could be a winner! It looks like you always choose the last column of the U matrix and ID the columns that have the same values? It works when I add extra columns as well! BTW, sorry for my lack of knowledge... but what was the point of the dot multiply at the end? That they add up to essentially zero, indicating singularity? Thanks so much! MJ

From: Charles R Harris
Sent: Friday, August 26, 2011 11:04 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix

On Fri, Aug 26, 2011 at 11:41 AM, Mark Janikas wrote:
> Should I just use combinations of the columns and try to construct the corrcoef() (then ID whether NaNs are present), or use the condition number to ID the singularity? I just wanted to avoid the whole k! algorithm.

Why not svd?
In [13]: u,d,v = svd(zt)

In [14]: d
Out[14]: array([  1.01307066e+01,   1.87795095e+00,   3.03454566e-01,   3.29253945e-16])

In [15]: u[:,3]
Out[15]: array([ 0.57735027, -0.57735027, -0.57735027,  0.        ])

In [16]: dot(u[:,3], zt)
Out[16]: array([ -7.77156117e-16,  -6.66133815e-16,  -7.21644966e-16,  -7.77156117e-16,  -8.88178420e-16])

Chuck
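Chuck's approach can be wrapped into a self-contained sketch: near-zero singular values flag a linear dependency, and the large entries of the matching left-singular vector name the rows of `zt` (Mark's variables) involved. The tolerances here are my own assumptions:

```python
import numpy as np

def collinear_sets(zt, sv_tol=1e-10, entry_tol=1e-6):
    """For each near-zero singular value, list the indices of the rows of
    `zt` (Mark's variables) that participate in the linear dependency."""
    u, d, v = np.linalg.svd(zt)
    sets = []
    for k in np.where(d < sv_tol * d.max())[0]:
        vec = u[:, k]                               # direction annihilated by zt.T
        sets.append(np.where(np.abs(vec) > entry_tol)[0].tolist())
    return sets

zt = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
               [0.25, 0.1, 0.2, 0.25, 0.5],
               [0.75, 0.9, 0.8, 0.75, 0.5],
               [3.0, 8.0, 0.0, 5.0, 0.0]])
```

For this zt the function returns [[0, 1, 2]], matching "column 1 + column 2 = column 0" without any k!-sized search over column subsets.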
[Numpy-discussion] Most efficient trim of arrays
Hello All, I was wondering what the best way is to trim an array based on some values I do not want. I could use NUM.where or NUM.take... but let me give you an example:

import numpy as NUM

n = 100  # length of my dataset
data = NUM.empty((n,), float)
badRecords = []
for ind, record in enumerate(records):
    if record == someValueIDOntWant:
        badRecords.append(ind)
    else:
        data[ind] = record

Now, I want to trim my array using badRecords. I guess I want to avoid copying. Any thoughts on the best way to do it? I do not want to use lists and then subsequently array the result, as it is nice to pre-allocate the space. Thanks much, MJ
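One way to do the trim in a single pass with a single truncation at the end (a sketch; `records` and the -999.0 sentinel are stand-ins for the post's data and someValueIDOntWant):

```python
import numpy as np

records = [3.0, -999.0, 7.0, -999.0, 1.0]   # -999.0 plays the unwanted value
data = np.empty(len(records), dtype=float)

good = 0
for record in records:
    if record != -999.0:
        data[good] = record                 # compact good values to the front
        good += 1

data = data[:good]                          # trim: a view of the filled prefix
```

Because the good values are compacted as they arrive, the final slice is a view rather than a copy; alternatively, a badRecords list can be applied after the fact with np.delete(data, badRecords), at the cost of one copy.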
[Numpy-discussion] Database with Nulls to Numpy Structure
Hello All, I was hoping you could help me out with a simple little problem I am having: I am reading data from a database that contains NULL values. There is more than one field being read in, each of equal length, but if any of them are NULL in a row, then I do NOT want to include it in my numpy structure (i.e. no records for that row across fields). As the values from each field are of the same type, I can pre-allocate the space for the entire dataset (as if all were not NULL), but there may be fewer observations after accounting for the NULLs. So, do I use lists and append, then create the arrays... or do I fill up the pre-allocated empty arrays and slice off the ends? Thoughts? Thanks much... MJ

Mark Janikas
Product Engineer
ESRI, Geoprocessing
380 New York St.
Redlands, CA 92373
909-793-2853 (2563)
mjani...@esri.com
Re: [Numpy-discussion] Database with Nulls to Numpy Structure
Thanks for the input! I wonder if I can resize my own record array? I.e. one call to truncate... I'll give it a go. But resize works great as it doesn't make a copy:

In [12]: a = NUM.arange(10)

In [13]: id(a)
Out[13]: 190182896

In [14]: a.resize(5,)

In [15]: a
Out[15]: array([0, 1, 2, 3, 4])

In [16]: id(a)
Out[16]: 190182896

Whereas the slice produces a new array object, reassigning the name:

In [18]: a = a[0:2]

In [19]: id(a)
Out[19]: 189981184

Pretty nice. Pre-allocate the full space, count the number of good records... then resize. It doesn't seem much faster than using lists and then creating arrays, but memory should be better. Thanks again, and anything further would be appreciated. MJ

-----Original Message-----
From: Christopher Barker
Sent: Friday, October 02, 2009 12:34 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Database with Nulls to Numpy Structure

Mark Janikas wrote:
> So, do I use lists and append, then create the arrays... or do I fill up the pre-allocated empty arrays and slice off the ends? Thoughts? Thanks much...

Either will work. I think the decision would be based on how many null records you expect -- if it's a small fraction, then go ahead and pre-allocate the array; if it's a large fraction, then you might want to go with a list. Note: you may be able to use arr.resize() to chop it off at the end. The list method has the downside of using more memory and being a bit slower, which may be mitigated if there are lots of null records. See an upcoming email of mine for another option...

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/ORR (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
chris.bar...@noaa.gov
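On the record-array question: ndarray.resize does work on structured arrays, so the pre-allocate-and-truncate pattern is one call (a sketch; the dtype and row counts are made up, and refcheck=False just sidesteps the reference check that interactive sessions tend to trip):

```python
import numpy as np

# pre-allocate for the worst case (no NULLs), fill only the good rows
a = np.empty(5, dtype=[('id', 'i4'), ('val', 'f8')])
a[:3] = [(1, 1.5), (2, 2.5), (3, 3.5)]   # suppose only 3 rows survived the NULL check

a.resize(3, refcheck=False)              # one call to truncate, in place
```

After the resize, only the filled prefix remains, with no second allocation of the record data.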
Re: [Numpy-discussion] Timing array construction
Thanks Eric! I have a lot of array constructions in my code that use NUM.array([list of values])... I am going to replace them with the empty allocation and insertion. It is indeed twice as fast as c_ (when it matters, i.e. when N is relatively large):

N, c_, empty
100, 0.0007, 0.0230
200, 0.0007, 0.0002
400, 0.0007, 0.0002
800, 0.0020, 0.0002
1600, 0.0009, 0.0003
3200, 0.0010, 0.0003
6400, 0.0013, 0.0005
12800, 0.0058, 0.0032

-----Original Message-----
From: Eric Firing
Sent: Wednesday, April 29, 2009 11:49 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Timing array construction

Mark Janikas wrote:
> I was exploring some different ways to concatenate arrays, and using c_ is the fastest by far. Is there a difference I am missing that can account for the huge disparity?

If you really want speed, use something like this:

import numpy as np

def useEmpty(xCoords, yCoords):
    out = np.empty((len(xCoords), 2), dtype=xCoords.dtype)
    out[:,0] = xCoords
    out[:,1] = yCoords
    return out

It is quite a bit faster than using c_; more than a factor of two on my machine for all your test cases. All your methods using zip and array are doing a lot of unpacking, repacking, checking, iterating... Even the c_ method is slower than it needs to be for this case, because it is more general and flexible.

Eric
Re: [Numpy-discussion] Timing array construction
Thanks Chris and Bruce for the further input. I kindof like the c_ method because it is still relatively speedy and easy to implement. But, the empty method seems to be closest to what is actually done no matter which direction you go in... I.e. preallocate space and insert. I am in the process of ripping all of my zip calls out. The profile of my first set of techniques is already significantly better. This whole exercise has been very enlightening, as I spend so much time working on speeding up my algorithms and simple things like this should be tackled first. Thanks again! MJ -Original Message- From: numpy-discussion-boun...@scipy.org [mailto:numpy-discussion-boun...@scipy.org] On Behalf Of Christopher Barker Sent: Thursday, April 30, 2009 12:16 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Timing array construction Mark Janikas wrote: I have a lot of array constructions in my code that use NUM.array([list of values])... I am going to replace it with the empty allocation and insertion. It may not be worth it, depending on where list_of_values comes from/is. A rule of thumb may be: it's going to be slow going from a numpy array to a regular old python list or tuple, back to a numpy array. If your data is a python list already, than np.array(list) is a fine choice. def useAsArray(xCoords, yCoords): return NUM.asarray(zip(xCoords, yCoords)) Here are some of the issues with this one: zip unpacks two generic python sequences and then put the items into tuple, then puts them in a list. Essentially this: new_list = [] for i in range(len(xCoords)): new_list.append((xCoords[i], yCoords[i])) In each iteration of that loop, it's indexing into the numpy arrays, making a python object out of them, putting them into a tuple, and appending that tuple to the list, which may have to re-allocate memory a few times. 
Then the np.array() call loops through that list, unpacks each tuple, examines the python object, decides what it is, and turns it into a raw C type to put into the array. Whereas:

def useEmpty(xCoords, yCoords):
    out = np.empty((len(xCoords), 2), dtype=xCoords.dtype)
    out[:,0] = xCoords
    out[:,1] = yCoords
    return out

allocates an array of the right size and directly copies the data from xCoords and yCoords into it. That's it. You can see why it's so much faster!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/ORR (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
chris.bar...@noaa.gov

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
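Chris's preallocate-and-fill point can be checked with a small sketch (modern Python 3 / NumPy; the function names here are illustrative, not from the thread):

```python
import numpy as np

def use_zip(x, y):
    # slow path: pull every element out as a Python object, build tuples,
    # then let np.array() re-inspect each one
    return np.array(list(zip(x, y)))

def use_empty(x, y):
    # fast path: allocate once, then copy each column's buffer directly
    out = np.empty((len(x), 2), dtype=x.dtype)
    out[:, 0] = x
    out[:, 1] = y
    return out

x = np.random.normal(10, 1, 1000)
y = np.random.normal(10, 1, 1000)

# both build the same (n, 2) array; only the construction cost differs
assert np.array_equal(use_zip(x, y), use_empty(x, y))
```

Timing the two with `timeit` should reproduce the gap described above, since the zip path does per-element Python work while the empty path does two bulk copies.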
[Numpy-discussion] Timing array construction
Hello All, I was exploring some different ways to concatenate arrays, and using c_ is the fastest by far. Is there a difference I am missing that can account for the huge disparity? Obviously the zip function makes the asarray and array calls slower, but the same arguments (xCoords, yCoords) are being passed to the methods... so if there is no difference in the outputs (there doesn't appear to be), then what reason would I have to use array or asarray in this context? Thanks so much ahead of time... MJ

## Snippet ###
import numpy as NUM

def useAsArray(xCoords, yCoords):
    return NUM.asarray(zip(xCoords, yCoords))

def useArray(xCoords, yCoords):
    return NUM.array(zip(xCoords, yCoords))

def useC(xCoords, yCoords):
    return NUM.c_[xCoords, yCoords]

if __name__ == "__main__":
    from timeit import Timer
    import numpy.random as RAND
    import collections as COLL
    resAsArray = COLL.defaultdict(float)
    resArray = COLL.defaultdict(float)
    resMat = COLL.defaultdict(float)
    numTests = 0.0
    sameTests = 0.0
    N = [100, 200, 400, 800, 1600, 3200, 6400, 12800]
    for i in N:
        print "Time Join List into Array for N = " + str(i)
        xCoords = RAND.normal(10, 1, i)
        yCoords = RAND.normal(10, 1, i)
        statement = 'from __main__ import xCoords, yCoords, useAsArray'
        t1 = Timer('useAsArray(xCoords, yCoords)', statement)
        resAsArray[i] = t1.timeit(10)
        statement = 'from __main__ import xCoords, yCoords, useArray'
        t2 = Timer('useArray(xCoords, yCoords)', statement)
        resArray[i] = t2.timeit(10)
        statement = 'from __main__ import xCoords, yCoords, useC'
        t3 = Timer('useC(xCoords, yCoords)', statement)
        resMat[i] = t3.timeit(10)
    for n in N:
        print "%i, %0.4f, %0.4f, %0.4f" % (n, resAsArray[n], resArray[n], resMat[n])

### RESULT
N, useAsArray, useArray, useC
100, 0.0066, 0.0065, 0.0007
200, 0.0137, 0.0140, 0.0008
400, 0.0277, 0.0288, 0.0007
800, 0.0579, 0.0577, 0.0008
1600, 0.1175, 0.1289, 0.0009
3200, 0.2291, 0.2309, 0.0012
6400, 0.4561, 0.4564, 0.0013
12800, 0.9218, 0.9122, 0.0019

Mark Janikas
Product Engineer
ESRI, Geoprocessing
380
New York St. Redlands, CA 92373 909-793-2853 (2563) mjani...@esri.com

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Permutations in Simulations`
Hello All, I want to create an array that contains a column of permutations for each simulation:

import numpy as NUM
import numpy.random as RAND
x = NUM.arange(4.)
res = NUM.zeros((4,100))
for sim in range(100):
    res[:,sim] = RAND.permutation(x)

Is there a way to do this without a loop? Thanks so much ahead of time... MJ

Mark Janikas Product Engineer ESRI, Geoprocessing 380 New York St. Redlands, CA 92373 909-793-2853 (2563) mjani...@esri.com

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Permutations in Simulations`
Thanks to all for your replies. I want this to work on any vector, so I was thinking this...?

import numpy as np
import timeit
x = np.array([4.,5.,10.,3.,5.,6.,7.,2.,9.,1.])
nx = 10
ny = 100

def weirdshuffle4(x, ny):
    nx = len(x)
    indices = np.random.random_sample((nx,ny)).argsort(0).argsort(0)
    return x[indices]

t = timeit.Timer("weirdshuffle4(x, ny)", "from __main__ import *")
print t.timeit(100)
0.0148663153873

-----Original Message-----
From: numpy-discussion-boun...@scipy.org [mailto:numpy-discussion-boun...@scipy.org] On Behalf Of Keith Goodman
Sent: Tuesday, February 10, 2009 12:59 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Permutations in Simulations`

On Tue, Feb 10, 2009 at 12:41 PM, Keith Goodman kwgood...@gmail.com wrote:
On Tue, Feb 10, 2009 at 12:28 PM, Keith Goodman kwgood...@gmail.com wrote:
On Tue, Feb 10, 2009 at 12:18 PM, Keith Goodman kwgood...@gmail.com wrote:
On Tue, Feb 10, 2009 at 11:29 AM, Mark Janikas mjani...@esri.com wrote:

> I want to create an array that contains a column of permutations for each simulation:
>
> import numpy as NUM
> import numpy.random as RAND
> x = NUM.arange(4.)
> res = NUM.zeros((4,100))
> for sim in range(100):
>     res[:,sim] = RAND.permutation(x)
>
> Is there a way to do this without a loop? Thanks so much ahead of time.

Does this work? Might not be faster but it does avoid the loop.

import numpy as np
def weirdshuffle(nx, ny):
    x = np.ones((nx,ny)).cumsum(0, dtype=np.int) - 1
    yidx = np.ones((nx,ny)).cumsum(1, dtype=np.int) - 1
    xidx = np.random.rand(nx,ny).argsort(0).argsort(0)
    return x[xidx, yidx]

Hey, it is faster for nx=4, ny=100:

def baseshuffle(nx, ny):
    x = np.arange(nx)
    res = np.zeros((nx,ny))
    for sim in range(ny):
        res[:,sim] = np.random.permutation(x)
    return res

timeit baseshuffle(4,100)
1000 loops, best of 3: 1.11 ms per loop
timeit weirdshuffle(4,100)
1 loops, best of 3: 127 µs per loop

OK, who can cut that time in half? My first try looks clunky.
This is a little faster:

def weirdshuffle2(nx, ny):
    one = np.ones((nx,ny), dtype=np.int)
    x = one.cumsum(0)
    x -= 1
    yidx = one.cumsum(1)
    yidx -= 1
    xidx = np.random.random_sample((nx,ny)).argsort(0).argsort(0)
    return x[xidx, yidx]

timeit weirdshuffle(4,100)
1 loops, best of 3: 129 µs per loop
timeit weirdshuffle2(4,100)
1 loops, best of 3: 106 µs per loop

Sorry for all the mail.

def weirdshuffle3(nx, ny):
    return np.random.random_sample((nx,ny)).argsort(0).argsort(0)

timeit weirdshuffle(4,100)
1 loops, best of 3: 128 µs per loop
timeit weirdshuffle3(4,100)
1 loops, best of 3: 37.5 µs per loop

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Permutations in Simulations`
You are correct! Thanks to all! MJ

-----Original Message-----
From: numpy-discussion-boun...@scipy.org [mailto:numpy-discussion-boun...@scipy.org] On Behalf Of Keith Goodman
Sent: Tuesday, February 10, 2009 6:07 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Permutations in Simulations`

Yeah, good point. The second argsort isn't needed. That should speed things up. The double argsort ranks the values in the array, but we don't need that here.

On Tue, Feb 10, 2009 at 5:31 PM, josef.p...@gmail.com wrote:
> very nice. What's the purpose of the second `.argsort(0)`? Doesn't it also work without it, or am I missing something in how this works? Josef
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
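The thread's conclusion can be restated as a runnable sketch: a single `argsort` over a matrix of random draws already yields one independent permutation per column, and recent NumPy generators offer `Generator.permuted`, which shuffles along a chosen axis directly (this assumes a NumPy version that provides `default_rng` and `permuted`):

```python
import numpy as np

x = np.array([4., 5., 10., 3., 5., 6., 7., 2., 9., 1.])
nx, ny = len(x), 100

# one argsort is enough: each column of `indices` is an independent
# permutation of 0..nx-1 (the second argsort only computed ranks)
indices = np.random.random_sample((nx, ny)).argsort(0)
res = x[indices]

# recent NumPy: shuffle each column independently along axis 0
rng = np.random.default_rng()
res2 = rng.permuted(np.tile(x[:, None], (1, ny)), axis=0)

# every column of both results holds exactly the values of x, reordered
for col in (res[:, 0], res2[:, 0]):
    assert np.array_equal(np.sort(col), np.sort(x))
```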
Re: [Numpy-discussion] appending extra items to arrays
If you do not know the size of your array before you finalize it, then you should use lists whenever you can. I just cooked up a short example:

##
import timeit
import numpy as N

values = range(1)

def appendArray(values):
    result = N.array([], dtype=int)
    for value in values:
        result = N.append(result, value)
    return result

def appendList(values):
    result = []
    for value in values:
        result.append(value)
    return N.array(result)

test = timeit.Timer('appendArray(values)', 'from __main__ import appendArray, values')
t1 = test.timeit(number=10)
test2 = timeit.Timer('appendList(values)', 'from __main__ import appendList, values')
t2 = test2.timeit(number=10)
print "Total Time with array: " + str(t1)
print "Total Time with list: " + str(t2)

# Result #
Total Time with array: 2.12951189331
Total Time with list: 0.0469707035741

Hope this helps, MJ

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Adam Mercer
Sent: Thursday, October 11, 2007 7:42 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] appending extra items to arrays

On 11/10/2007, Robert Kern [EMAIL PROTECTED] wrote:
> Appending to a list then converting the list to an array is the most straightforward way to do it. If the performance of this isn't a problem, I recommend leaving it alone.

Thanks, I'll leave it as is - I was just wondering if there was a better way to do it. Cheers, Adam

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
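The quadratic-versus-linear behavior behind those timings can be restated in Python 3 (sizes here are illustrative): `np.append` allocates a new array and copies everything on every call, while `list.append` is amortized constant time.

```python
import numpy as np

def append_array(values):
    # np.append copies the whole array each call: O(n^2) work overall
    result = np.array([], dtype=int)
    for value in values:
        result = np.append(result, value)
    return result

def append_list(values):
    # list.append is amortized O(1); convert to an array once at the end
    result = []
    for value in values:
        result.append(value)
    return np.array(result)

values = range(2000)
# both produce the same array; only the cost differs
assert np.array_equal(append_array(values), append_list(values))
```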
Re: [Numpy-discussion] numpy and freeze.py
I can't be sure if your issue is related to mine, so I was wondering where/when you got your numpy build? My issue: http://projects.scipy.org/pipermail/numpy-discussion/2007-April/027000.html Travis has been kind enough to work with me on it. His changes are in the svn. So, I don't think this is an issue that has arisen due to the changes, unless you have checked numpy out recently and compiled it yourself. MJ

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Hanno Klemm
Sent: Tuesday, May 22, 2007 9:04 AM
To: numpy-discussion@scipy.org
Subject: [Numpy-discussion] numpy and freeze.py

Hi, I want to use freeze.py on code that heavily relies on numpy. If I just try

python2.5 /scratch/src/Python-2.5/Tools/freeze/freeze.py pylay.py

the make works but then I get the error:

Traceback (most recent call last):
  File "pylay.py", line 1, in <module>
    import kuvBeta4 as kuv
  File "kuvBeta4.py", line 6, in <module>
    import mfunBeta4 as mfun
  File "mfunBeta4.py", line 2, in <module>
    import numpy
  File "/glb/eu/siep_bv/proj/yot04/apps/python2.5/lib/python2.5/site-packages/numpy/__init__.py", line 39, in <module>
    import core
  File "/glb/eu/siep_bv/proj/yot04/apps/python2.5/lib/python2.5/site-packages/numpy/core/__init__.py", line 5, in <module>
    import multiarray
ImportError: No module named multiarray

Am I doing something wrong? Or does freeze.py not work with numpy?

Hanno
--
Hanno Klemm
[EMAIL PROTECTED]

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Silent install of .exe
Is there a way to silently install the numpy.exe from a Microsoft DOS prompt? Something like: numpy-1.0.2.win32-py2.4.exe -silent Thanks ahead of time... MJ Mark Janikas Product Engineer ESRI, Geoprocessing 380 New York St. Redlands, CA 92373 909-793-2853 (2563) [EMAIL PROTECTED] ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Dynamic module not initialized properly
Thanks for the info Greg. Yup, I am sorry that I had to post a thread without code to back it up; unfortunately, there just isn't a way for me to roll it into an example without the entire package being installed. This is all very good info you have provided. I'll let you know how things work out. Thanks again, MJ

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Steele, Greg
Sent: Monday, April 02, 2007 9:07 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Dynamic module not initialized properly

Mark, It is hard to comment since you have not provided much information. Your link to a previous thread brought up a post that I had sent. The issue that I encountered had to do with the multiarraymodule.c extension module. When numpy is imported, it imports this module and the static variable _multiarray_module_loaded gets set. When Python is finalized, it does not unload the multiarraymodule.c DLL. When Python is initialized again and numpy is imported again, the static variable is already set and multiarraymodule does not import correctly. Hence the error.

The way I dealt with this is a 'hack', but it worked for us. This was on a windows platform. After I finalize Python, I forcibly unload the multiarraymodule DLL using the FreeLibrary call. The C code looks like:

if (multiarray_loaded) {
    HINSTANCE hDLL = NULL;
    hDLL = LoadLibraryEx(buf, NULL, LOAD_WITH_ALTERED_SEARCH_PATH);
    FreeLibrary(hDLL);
    FreeLibrary(hDLL);
}

The two calls to FreeLibrary are needed since each call to LoadLibraryEx increments the DLL reference count. The call to LoadLibraryEx here gets a handle to the DLL. What needs to be done long term is the removal of the static variable in multiarraymodule. I don't understand the code well enough to know why it is needed, but that appears to be the crux of the issue. Another solution would be for Python to call FreeLibrary on all the DLLs during Py_Finalize.
Greg

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Mark Janikas
Sent: Friday, March 30, 2007 4:55 PM
To: Discussion of Numerical Python
Subject: [Numpy-discussion] Dynamic module not initialized properly

Hello all, I am having an issue importing numpy on subsequent (i.e. not on first load) attempts in our software. The majority of the code is written in C, C++, and I am a python developer and do not have direct access to a lot of it. This is a bit of a difficult question to ask all of you because I can't provide you a direct example. All I can do is point to a numpy thread that discusses the issue:

http://groups.google.com/group/Numpy-discussion/browse_thread/thread/32177a82deab05ae/d8eecaf494ba5ad5?lnk=st&q=dynamic+module+not+initialized+properly+numpy&rnum=1&hl=en#d8eecaf494ba5ad5

ERROR: exceptions.SystemError: dynamic module not initialized properly

What is really odd about my specific issue is that if I don't change anything in the source code, then the error doesn't pop up. Furthermore, the error doesn't show on some attempts even after I make a change. Not sure whether there is anything I can do from the scripting side (some alternative form of reload?)... or if I have to forward it along to the C developers. You have my appreciation ahead of time.

Mark Janikas Product Engineer ESRI, Geoprocessing 380 New York St. Redlands, CA 92373 909-793-2853 (2563) [EMAIL PROTECTED]

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Source install
Hello all, I have used numpy on both Mac and Windows. The latter is easily installed with the exe file. The former required the gcc program from XCode... but once installed, python setup.py install worked. I can't seem to get numpy to work on my linux machine. Can someone point me to a platform-independent doc on how to install from the source tar file? Thanks ahead of time, MJ

Mark Janikas Product Engineer ESRI, Geoprocessing 380 New York St. Redlands, CA 92373 909-793-2853 (2563) [EMAIL PROTECTED]

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Source install
Thanks Robert, sorry for the incomplete request for help. The install of numpy seems to go fine, but when I import numpy it reports that it is running from the source directory. I assume this has to do with the BLAS/ATLAS stuff I have been reading about. What I am actually trying to do is get NumPy wrapped into the install of our software program. We currently wrap Python2.4 as our scripting language, and I need a way to get numpy in our compiler. The GUI portions of our software run on Windows, but the engine works on Unix flavors. I am afraid I am not too knowledgeable about what goes on under the hood of the NumPy install. I assume I need an appropriate C compiler (where gcc fit in for Mac OSX), but I was wondering if there was an appropriate doc I should closely examine that would point me in the right direction. I hope this clears my question up a bit. Again, thanks in advance. MJ

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Robert Kern
Sent: Wednesday, February 28, 2007 11:26 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Source install

Mark Janikas wrote:
> Hello all, I have used numpy on both Mac and Windows. The latter is easily installed with the exe file. The former required the gcc program from XCode... but once installed, python setup.py install worked. I can't seem to get numpy to work on my linux machine. Can someone point me to a platform-independent doc on how to install from the source tar file? Thanks ahead of time,

We need more information from you. There is no way one can make a platform-independent doc that covers all of the cases. We need to know what you tried and exactly how it failed (i.e., we need you to copy the exact error messages and paste them into an email). If I had to guess, though, since you succeeded doing an install from source on OS X, the problem on Linux is likely that you do not have the appropriate Python development package for your system.
On RPM-based systems like Fedora Core, it is usually named something like python-devel.

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Greek Letters
Hello all, I was wondering how I could print the chi-squared symbol in python. I have been looking at the Unicode docs, but I figured I would ask for assistance here while I delve into it. Thanks for any help in advance. Mark Janikas Product Engineer ESRI, Geoprocessing 380 New York St. Redlands, CA 92373 909-793-2853 (2563) [EMAIL PROTECTED] ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Greek Letters
Thanks for all the info. That website with all the codes is great. MJ

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Zachary Pincus
Sent: Tuesday, February 20, 2007 4:18 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Greek Letters

I have found that the python 'unicode name' escape sequence, combined with the canonical list of unicode names ( http://unicode.org/Public/UNIDATA/NamesList.txt ), is a good way of getting the symbols you want and still keeping the python code legible. From the above list, we see that the symbol name we want is GREEK SMALL LETTER CHI, so:

chi = u'\N{GREEK SMALL LETTER CHI}'

will do the trick. For chi^2, use:

chi2 = u'\N{GREEK SMALL LETTER CHI}\N{SUPERSCRIPT TWO}'

Note that to print these characters, we usually need to encode them somehow. My terminal supports UTF-8, so the following works for me:

import codecs
print codecs.encode(chi2, 'utf8')

giving (if your mail reader supports utf8 and mine encodes it properly...): χ²

Zach Pincus
Program in Biomedical Informatics and Department of Biochemistry
Stanford University School of Medicine

On Feb 20, 2007, at 3:56 PM, Mark Janikas wrote:
> Hello all, I was wondering how I could print the chi-squared symbol in python. I have been looking at the Unicode docs, but I figured I would ask for assistance here while I delve into it. Thanks for any help in advance.

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Greek Letters
Oh, I am using CygWin, and the website I just went to ( http://www.cygwin.com/faq/faq_3.html ) states: "The short answer is that Cygwin is not Unicode-aware." Not sure if this is going to apply to python in general, but I suspect it will. Ugh, I dislike Windows a lot, but it pays the bills. The interesting thing to note is that the print out to the GUI interface is 'UTF-8', so it works. It just won't work on my terminal, where I do all of my testing. I might just have to put a try statement in and put a chi-square in the except. MJ

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Mark Janikas
Sent: Tuesday, February 20, 2007 5:16 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Greek Letters

Thanks Robert, but alas, I get:

>>> import sys
>>> sys.stdout.encoding
'cp437'
>>> print u'\u03a7\u00b2'.encode(sys.stdout.encoding)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\encodings\cp437.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u03a7' in position 0: character maps to <undefined>

I'll keep at it; please let me know if you have any solutions. Thanks again, MJ

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Robert Kern
Sent: Tuesday, February 20, 2007 4:20 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Greek Letters

Mark Janikas wrote:
> Hello all, I was wondering how I could print the chi-squared symbol in python. I have been looking at the Unicode docs, but I figured I would ask for assistance here while I delve into it. Thanks for any help in advance.

Print it where? To the terminal (which one?)? In HTML? With some GUI? Assuming that you have a Unicode-capable terminal, you can find out the encoding it uses by looking at sys.stdout.encoding. Encode your Unicode string with that encoding, and print it.
E.g., I use iTerm on OS X and set it to use UTF-8 as the encoding:

In [5]: import sys
In [6]: sys.stdout.encoding
Out[6]: 'UTF-8'
In [7]: print u'\u03a7\u00b2'.encode(sys.stdout.encoding)
Χ²

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
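For readers on Python 3, the encoding dance above is no longer needed: strings are Unicode by default and `print` encodes for the terminal automatically, so the named escapes from earlier in the thread just work (a minimal sketch):

```python
# Python 3: str is already Unicode; named escapes still work
chi2 = '\N{GREEK SMALL LETTER CHI}\N{SUPERSCRIPT TWO}'
print(chi2)  # renders as chi-squared on a UTF-8 terminal
assert chi2 == '\u03c7\u00b2'
```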
Re: [Numpy-discussion] fromstring, tostring slow?
Yup. It was faster to use lists for the append, then transform into an array, then transform into a binary string, rather than create empty arrays, use their append method, and then transform into a binary string. The last question on the output would then be to test the speed of using generic Python arrays, which have append methods as well. Then there would still only be the binary string conversion, as opposed to list -> numpy array -> binary string. Thanks to all for your input. MJ

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Charles R Harris
Sent: Tuesday, February 13, 2007 12:44 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] fromstring, tostring slow?

I am going to guess that a list would be faster for appending. Concat and, I suspect, append make new arrays for each use, rather like string concatenation in Python. A list, on the other hand, is no doubt optimized for adding new values. Another option might be using PyTables with extensible arrays. In any case, a bit of timing should show the way if the performance is that crucial to your application. Chuck

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fromstring, tostring slow?
I don't think I can do that because I have heterogeneous rows of data, i.e. the columns in each row are different in length. Furthermore, when reading it back in, I want to read only bytes of the info at a time so I can save memory. In this case, I only want to have one record in mem at once. Another issue has arisen from taking this routine cross-platform: namely, if I write the file on Windows I can't read it on Solaris. I assume the big-little endian issue is at hand here. I know using the struct module that I can pack using either one. Perhaps I will have to go back to the drawing board. I actually love these methods now because I get back out directly what I put in. Great kudos to the developers. MJ

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Christopher Barker
Sent: Tuesday, February 13, 2007 1:39 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] fromstring, tostring slow?

Mark Janikas wrote:
> I am finding that directly packing numpy arrays into binary using the tostring and fromstring methods

For starters, use fromfile and tofile, to save the overhead of creating an entire extra string. fromfile is a function (as it is an alternate constructor for arrays): numpy.fromfile(). ndarray.tofile() is an array method.

Enclosed is your test, including a test for tofile(). I needed to make the arrays much larger, and use time.time() rather than time.clock(), to get enough time resolution to see anything, though if you really want to be accurate, you need to use the timeit module. My results:

Using lists: 0.457561016083
Using tostring: 0.00922703742981
Using tofile: 0.00431108474731

Another note: where is the data coming from -- there may be ways to optimize this whole process if we saw that.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception [EMAIL PROTECTED] ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
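Chris's enclosed test did not survive the archive; a hedged reconstruction of the tofile-versus-tostring comparison (Python 3, with illustrative sizes and filenames; `tostring` is spelled `tobytes` in modern NumPy) might look like:

```python
import os
import tempfile
import numpy as np

a = np.random.normal(10, 1, 100000)
tmp = tempfile.mkdtemp()

# tostring/tobytes path: build an intermediate bytes object, then write it
with open(os.path.join(tmp, "via_bytes.bin"), "wb") as f:
    f.write(a.tobytes())

# tofile path: write the array's buffer directly, no intermediate string
a.tofile(os.path.join(tmp, "via_tofile.bin"))

# both files round-trip to identical data
b = np.fromfile(os.path.join(tmp, "via_tofile.bin"), dtype=a.dtype)
assert np.array_equal(a, b)
```

Wrapping each write in `timeit` should show tofile ahead, as in the results quoted above, because it skips the extra copy.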
Re: [Numpy-discussion] fromstring, tostring slow?
Yes, but does the code have the same license as NumPy? As I work for a software company, where I help with the scripting interface, I must make sure everything I use is cited and has the appropriate license. MJ

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Stefan van der Walt
Sent: Tuesday, February 13, 2007 3:52 PM
To: numpy-discussion@scipy.org
Subject: Re: [Numpy-discussion] fromstring, tostring slow?

On Tue, Feb 13, 2007 at 03:44:37PM -0800, Mark Janikas wrote:
> I don't think I can do that because I have heterogeneous rows of data, i.e. the columns in each row are different in length. Furthermore, when reading it back in, I want to read only bytes of the info at a time so I can save memory. In this case, I only want to have one record in mem at once. Another issue has arisen from taking this routine cross-platform: namely, if I write the file on Windows I can't read it on Solaris. I assume the big-little endian issue is at hand here.

Indeed. You may want to take a look at npfile, the new IO module in scipy written by Matthew Brett (you don't have to install the whole scipy to use it, just grab the file). Cheers, Stéfan

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] fromstring, tostring slow?
This is all very good info, especially the byteswap. I'll be testing it momentarily. As far as a detailed explanation of the problem: in essence, I am applying sparse matrix multiplication. The matrix I am dealing with in the manner described is n x n. Generally, this matrix is 1-20% sparse. I use it in spatial data analysis, where the matrix W represents the spatial association between n observations. The operations I perform on it are generally related to the spatial lag of a variable, or Wy, where y is an n x k matrix (usually k=1). As k is generally small, the y vector and the result vector are represented by numpy arrays. I can have n x k x 2 pieces of info in mem (usually). What I can't have is n**2. So, I store each row of W in a file as a record consisting of 3 parts:

1) row, nn (# of neighbors)
2) nhs: (nx1) vector of integers representing the columns in row[i] != 0
3) weights: (nx1) vector of floats corresponding to the index in the previous row

The first two parts of the record are known as a GAL, or geographic algorithm library. Since a lot of my W matrices have distance metrics associated with them, I added the third. I think this might be termed by someone else as an enhanced GAL. At any rate, this allows me to perform this operation on large datasets w/o running out of mem.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Christopher Barker
Sent: Tuesday, February 13, 2007 4:07 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] fromstring, tostring slow?

Mark Janikas wrote:
> I don't think I can do that because I have heterogeneous rows of data, i.e. the columns in each row are different in length.

like I said, show us your whole problem... But you don't have to write/read all the data at once with from/tofile() anyway.
Each of your rows has to be in a separate array anyway, as numpy arrays don't support ragged arrays, but each row can be written with tofile().

> Furthermore, when reading it back in, I want to read only bytes of the info at a time so I can save memory. In this case, I only want to have one record in mem at once.

you can make multiple calls to fromfile(), though you'll have to know how long each record is.

> Another issue has arisen from taking this routine cross-platform: namely, if I write the file on Windows I can't read it on Solaris. I assume the big-little endian is at hand here.

yup.

> I know using the struct module that I can pack using either one.

so can numpy. See the byteswap method, and you can specify a particular endianness with a datatype when you read with fromfile():

a = N.fromfile(DataFile, dtype=N.dtype('<d'), count=20)

reads 20 little-endian doubles from DataFile, regardless of the native endianness of the machine you're on.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/ORR (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
[EMAIL PROTECTED]

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
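The explicit-endianness advice can be made concrete: write with a byte-order-qualified dtype and read back with the same dtype, and the file is portable between little- and big-endian machines (a sketch; the filename is illustrative):

```python
import os
import tempfile
import numpy as np

a = np.arange(20, dtype=np.float64)
fname = os.path.join(tempfile.gettempdir(), "portable_doubles.bin")

# '<d' = little-endian 8-byte float, whatever the native byte order is
a.astype("<d").tofile(fname)

# reading with the same explicit dtype recovers the values on any platform
b = np.fromfile(fname, dtype="<d").astype(np.float64)
assert np.array_equal(a, b)
os.remove(fname)
```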
[Numpy-discussion] Newbie Question, Probability
Hello all, Is there a way to get probability values for the various families of distributions in numpy? I.e., a la R:

> pnorm(1.96, mean = 0, sd = 1)
[1] 0.9750021  # for the normal
> pt(1.65, df=100)
[1] 0.9489597  # for student t

Any suggestions would be greatly appreciated.

Mark Janikas Product Engineer ESRI, Geoprocessing 380 New York St. Redlands, CA 92373 909-793-2853 (2563) [EMAIL PROTECTED]

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
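NumPy itself mainly *samples* from distributions; cumulative probabilities like R's pnorm/pt live in SciPy's `scipy.stats` (e.g. `stats.norm.cdf(1.96)` and `stats.t.cdf(1.65, df=100)`). For the normal case alone, the standard library's error function is enough, as this sketch shows (`pnorm` here is a hypothetical helper named after the R call):

```python
import math

def pnorm(x, mean=0.0, sd=1.0):
    # normal CDF via the error function, mirroring R's pnorm()
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# matches R: pnorm(1.96) -> 0.9750021
assert abs(pnorm(1.96) - 0.9750021) < 1e-6

# Student t needs an incomplete beta function; with SciPy installed:
#   from scipy import stats
#   stats.t.cdf(1.65, df=100)   # matches R's pt(1.65, df=100)
```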