Re: [Numpy-discussion] finding elements that match any in a set
Michael Katz michaeladamkatz at yahoo.com writes: Yes, thanks, np.in1d is what I needed. I didn't know how to find that. Did you check in the documentation? If so, where did you check? Would you have found it if it was in the 'See also' section of where()? (http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html) I ask because people often post to the list needing in1d() after not being able to find it via the docs, so it would be nice to add references in the places people go looking for it. Neil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
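For anyone else searching for this, a minimal illustration of what np.in1d does (NumPy 1.4 and later):

import numpy as np

a = np.array([1, 5, 2, 9, 7, 5])
np.in1d(a, [2, 5, 8])
# -> array([False,  True,  True, False, False,  True], dtype=bool)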
Re: [Numpy-discussion] Boolean arrays
Ideally, I would like in1d to always be the right answer to this problem. It should be easy to put in an if statement to switch to a kern_in()-type function in the case of large ar1 but small ar2. I will do some timing tests and make a patch.

I uploaded a timing test and a patch to arraysetops.py here: http://projects.scipy.org/numpy/ticket/1603 The new in1d() uses the kern_in algorithm when it's faster, and the existing algorithm otherwise. The speedup compared to the old in1d() for cases with very large ar1 and small ar2 can be up to 10x on my laptop. If someone with commit access could take a look and apply it if ok, that would be great. Thanks, Neil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
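The patch itself is attached to the ticket; the rough idea behind the kern_in()-style algorithm for small ar2 is simply to OR together one equality mask per element of ar2. A sketch of that idea (not the actual patch):

import numpy as np

def in1d_small_ar2(ar1, ar2):
    # one pass over ar1 per element of ar2; fast when ar2 has only a few elements
    mask = np.zeros(len(ar1), dtype=bool)
    for item in ar2:
        mask |= (ar1 == item)
    return mask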
Re: [Numpy-discussion] BOF notes: Fernando's proposal : NumPy ndarray with named axes
Gael Varoquaux gael.varoquaux at normalesup.org writes: Let's say that you have a dataset that is in a 3D array, where axis 0 corresponds to days, axis 1 to hours of the day, and axis 2 to temperature. You might want to have the mean of the temperature in each day, which would be in current numpy: data.mean(axis=0) or the mean of the temperature at every hour, across the different days, which would be: data.mean(axis=1) I do such manipulation all the time, and keeping track of which axis is what is fairly tedious and error-prone. It would be much nicer to be able to write: data.ax_day.mean(axis=0) data.ax_hour.mean(axis=0) Thanks, that's a really nice description. Instead of data.ax_day.mean(axis=0) I think it would be clearer to do something like data.mean(axis='day'), but I see the motivation. Neil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
Rob Speer rspeer at MIT.EDU writes: It's not just about the rows: a 2-D datarray can also index by columns, an operation that has no equivalent in a 1-D array of records like your example. rec['305'] effectively indexes by column. This is one of the main attractions of structured/record arrays. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] BOF notes: Fernando's proposal : NumPy ndarray with named axes
Robert Kern robert.kern at gmail.com writes: Please install Fernando's datarray package, play with it, read its documentation, then come back with objections or alternatives. I really don't think you understand what is being proposed.

Well, the discussion has been pretty confusing. For mostly my benefit, here's my understanding of the proposal. Currently the only way to choose which axis of an array we want is by the indexing position. So to access a row of a 2d array (axis=0):

a[0,:] or just a[0]  # first row

and a column (axis=1):

a[:,0]  # first column

To choose an individual element along an axis we must use integer indices:

a[0,3]  # the element that is 1st along the 1st axis and 4th along the 2nd axis

Fernando's proposal would allow us to specify the axis by a name (called a label) instead of a position, and the element number by a name (called a tick) instead of an integer, while retaining the old position + integer indexing. Ticks are in effect named indices.

I can see the attraction of accessing an axis by name instead of indexing position, because it's easy to get confused over position when you've got a 2d or higher dimension array. But the utility of named indices is not so clear to me. As I understand it, these new arrays will still only be able to have a single type of data (one of float, str, int and so on). This seems to be pretty limiting. What is a use case for the new array type that can't be solved by structured/record arrays?

Sounds like it was decided at the SciPy BOF that they were a good idea, several people have implemented a version of them, and Fernando and Gael have both said they find them useful, so they must have something going for them. Maybe Fernando or Gael could share an example where arrays with named axes and indices are especially useful, for the peanut gallery's benefit? Cheers, Neil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] reduce array by computing min/max every n samples
Warren Weckesser warren.weckesser at enthought.com writes: Benjamin Root wrote: Brad, I think you are doing it the right way, but I think what is happening is that the reshape() call on the sliced array is forcing a copy to be made first. The fact that the copy has to be made twice just worsens the issue. I would save a copy of the reshape result (it is usually a view of the original data, unless a copy is forced), and then perform a min/max call on that with the appropriate axis. On that note, would it be a bad idea to have a function that returns a min/max tuple? +1. More than once I've wanted exactly such a function. I also think this would be useful. For what it's worth, IDL also has a function called minmax() that does this (e.g. http://astro.uni-tuebingen.de/software/idl/astrolib/misc/minmax.html) Neil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
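A minimal sketch of the reshape-then-reduce approach discussed above, assuming len(x) is an exact multiple of n so that the reshape can return a view rather than a copy:

import numpy as np

def block_minmax(x, n):
    # reduce x by taking the min and max of every n consecutive samples
    blocks = np.asarray(x).reshape(-1, n)
    return blocks.min(axis=1), blocks.max(axis=1)

lo, hi = block_minmax(np.arange(12), 4)
# lo -> array([0, 4, 8]), hi -> array([ 3,  7, 11])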
Re: [Numpy-discussion] chararray stripping trailing whitespace a bug?
This is an intentional feature, not a bug. Chris Ah, ok, thanks. I missed the explanation in the doc string because I'm using version 1.3 and forgot to check the web docs. For the record, this was my bug: I read a fits binary table with pyfits. One of the table fields was a chararray containing a bunch of flags ('A','B','C','D'). I tried to use in1d() to identify all entries with flags of 'C' or 'D'. So

c = pyfits_table.chararray_column
mask = np.in1d(c, ['C', 'D'])

It turns out the actual stored values in the chararray were 'A ', 'B ', 'C ' and 'D '. in1d() converts the chararray to an ndarray before performing the comparison, so none of the entries matches 'C' or 'D'. What is the best way to ensure this doesn't happen to other people? We could change the array set operations to special-case chararrays, but this seems like an ugly solution. Is it possible to change something in pyfits to avoid this? Neil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] chararray stripping trailing whitespace a bug?
This inconsistency is fixed in Numpy 1.4 (which included a major overhaul of chararrays). in1d will perform the auto whitespace-stripping on chararrays, but not on regular ndarrays of strings. Great, thanks. Pyfits continues to use chararray since not doing so would break existing code relying on this behavior. And there are many use cases where this behavior is desirable, particularly with fixed-length strings in tables. The best way to get around it from your code is to cast the chararray pyfits returns to a regular ndarray. My problem was I didn't know I needed to get around it :) But thanks for the suggestion, I'll use that in future when I need to switch between chararrays and ndarrays. Neil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
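A sketch of the strip/cast workaround mentioned above (illustrative only - the auto-stripping behaviour differs between numpy versions):

import numpy as np

c = np.char.array(['A  ', 'B  ', 'C  ', 'D  '])   # stand-in for the pyfits column
plain = np.asarray(c)                             # plain ndarray: no auto-stripping
mask = np.in1d(np.char.strip(plain), ['C', 'D'])
# mask -> array([False, False,  True,  True], dtype=bool)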
Re: [Numpy-discussion] "Match" two arrays
Shailendra shailendra.vikas at gmail.com writes: Hi All, I want to make a function which should be like this:

cordinates1=(x1,y1) # x1 and y1 are x-cord and y-cord of a large number of points
cordinates2=(x2,y2) # similar to cordinates1
indices1,indices2= match_cordinates(cordinates1,cordinates2)

(x1[indices1],y1[indices1]) matches (x2[indices2],y2[indices2]) where the definition of match is such that: if A is the closest point to B and the distance between A and B is less than delta, then it is a match. If A is the closest point to B and the distance between A and B is more than delta, then there is no match. Every point has either 1 match (the closest point) or none. Also, the sizes of cordinates1 and cordinates2 are quite large and outer should not be used. I can think of only C style code to achieve this. Can any one suggest a pythonic way of doing this? Thanks, Shailendra

A similar problem comes up when you have to match astronomical coordinates. I wrote a python + numpy function that is fast enough for my use cases - you might be able to adapt it: http://bitbucket.org/nhmc/pyserpens/src/tip/coord.py The matching function starts on line 166. Disclaimer: I haven't looked at the kdtree code yet, that might be a better approach. Neil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
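If scipy is available, a kd-tree is probably the cleanest way to express the 'closest point within delta' match described above. A sketch (this is not the coord.py implementation, and unlike a strict one-to-one match it allows a point in the second set to be matched by several points in the first):

import numpy as np
from scipy.spatial import cKDTree

def match_cordinates(cordinates1, cordinates2, delta):
    x1, y1 = cordinates1
    x2, y2 = cordinates2
    tree = cKDTree(np.column_stack([x2, y2]))
    # dist is inf and ind == len(x2) wherever no neighbour lies within delta
    dist, ind = tree.query(np.column_stack([x1, y1]), distance_upper_bound=delta)
    matched = np.isfinite(dist)
    indices1 = np.flatnonzero(matched)
    indices2 = ind[matched]
    return indices1, indices2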
Re: [Numpy-discussion] Test if one element of string array is in a defined list
Eric Emsellem eemselle at eso.org writes: Hi, I would like to test whether strings in a numpy S array are in a given list but I don't manage to do so. Any hint is welcome. === # So here is an example of what I would like to do # I have a String numpy array:

import numpy as num
Sarray = num.asarray(['test1', 'test2', 'tutu', 'toto'])
Farray = num.arange(len(Sarray))
mylist = ['tutu', 'hello', 'why']

in1d() does what you want.

import numpy as np
Sarray = np.array(['test1', 'test2', 'tutu', 'toto'])
mylist = ['tutu', 'hello', 'why']
np.in1d(Sarray, mylist)
array([False, False, True, False], dtype=bool)

Be careful of whitespace when doing string comparisons; 'tutu' != 'tutu ' (I've been burnt by this in the past). in1d() is only in more recent versions of numpy (1.4+). If you can't upgrade, you can cut and paste the in1d() and unique() routines from here: http://projects.scipy.org/numpy/browser/branches/datetime/numpy/lib/arraysetops.py to use in your own modules. Cheers, Neil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Calling routines from a Fortran library using python
Nils Wagner nwagner at iam.uni-stuttgart.de writes: Hi David, you are right. It's a proprietary library. I found a header file (*.h) including prototype declarations of externally callable procedures. How can I proceed ? Apparently you can use ctypes to access fortran libraries. See the first paragraph of: http://www.sagemath.org/doc/numerical_sage/ctypes.html You may have to convert the .a library to a .so library. Neil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
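A sketch of what the ctypes call can look like once you have a shared library - the library name, routine and signature below are made up; with most Fortran compilers the exported symbol is the lower-cased name with a trailing underscore, and all arguments are passed by reference:

import ctypes

# hypothetical: subroutine DSCALE(x, n) that scales the array x of length n in place
lib = ctypes.CDLL('./libmysolver.so')
n = ctypes.c_int(5)
x = (ctypes.c_double * 5)(*range(5))
lib.dscale_(x, ctypes.byref(n))   # Fortran name mangling: lower case + '_'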
Re: [Numpy-discussion] dtype=None as default for np.genfromtxt ?
Pierre GM pgmdevlist at gmail.com writes: It has been suggested (ticket #1262) to change the default dtype=float to dtype=None in np.genfromtxt. Any thoughts ? I agree dtype=None should be default for the reasons given in the ticket. How do we handle the backwards-incompatible change? A warning in the next release, then change it in the following release? Neil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
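For anyone unfamiliar with the behaviour under discussion: with dtype=None, genfromtxt guesses a (possibly structured) dtype from the data instead of forcing everything to float. A small illustration (written for the Python 2 of the period):

from StringIO import StringIO
import numpy as np

data = StringIO("1 2.5 abc\n3 4.0 def")
arr = np.genfromtxt(data, dtype=None, names=['i', 'x', 's'])
# arr['i'] -> integers, arr['x'] -> floats, arr['s'] -> strings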
[Numpy-discussion] Release notes for arraysetops changes
Hi, I've written some release notes (below) describing the changes to arraysetops.py. If someone with commit access could check that these sound ok and add them to the release notes file, that would be great. Cheers, Neil

New features

Improved set operations
~~~~~~~~~~~~~~~~~~~~~~~

In previous versions of NumPy some set functions (intersect1d, setxor1d, setdiff1d and setmember1d) could return incorrect results if the input arrays contained duplicate items. These now work correctly for input arrays with duplicates. setmember1d has been renamed to in1d, as with the change to accept arrays with duplicates it is no longer a set operation, and is conceptually similar to an elementwise version of the Python operator 'in'. All of these functions now accept the boolean keyword assume_unique. This is False by default, but can be set True if the input arrays are known not to contain duplicates, which can increase the functions' execution speed.

Deprecations

#. unique1d: use unique instead. unique1d raises a deprecation warning in 1.4, and will be removed in 1.5.
#. intersect1d_nu: use intersect1d instead. intersect1d_nu raises a deprecation warning in 1.4, and will be removed in 1.5.
#. setmember1d: use in1d instead. setmember1d raises a deprecation warning in 1.4, and will be removed in 1.5.

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
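To make the assume_unique keyword concrete (illustrative only):

import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([2, 4, 6])
# safe to skip the duplicate handling when both inputs are known to be duplicate-free
np.intersect1d(a, b, assume_unique=True)   # -> array([2, 4])
np.in1d(a, b, assume_unique=True)          # -> array([False,  True, False,  True], dtype=bool)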
Re: [Numpy-discussion] Deprecate np.max/np.min ?
Charles R Harris charlesr.harris at gmail.com writes: People import these functions -- yes, they shouldn't do that -- and the python builtin versions are overloaded, causing hard to locate errors. While I would love less duplication in the numpy namespace, I don't think the small gain here is worth the pain of deprecation. OTOH, one can ask, why is np.min(3, 2) allowed when np.min([3], 2) gives ValueError: axis(=2) out of bounds. It seems to me that 0-dimensional objects should accept only None as the axis? (Fixing this would also make misuse of np.min and np.max more difficult.) I think it would be better to fix this issue. np.min(3,2) should also give ValueError: axis(=2) out of bounds. Fixing this also removes any possibility of generating hard-to-find errors by overwriting the builtin min/max. (Unless there's some corner case I'm missing). Neil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] converting discrete data to unique integers
josef.pktd at gmail.com writes: Good point. With the return_inverse solution, is unique() guaranteed to give back the same array of unique values in the same (presumably sorted) order? That is, for two arrays A and B which have elements only drawn from a set S, is all(unique(A) == unique(B)) guaranteed? The code is quite clever and a bit hard to follow, but it *looks* like it will provide a stable mapping since it's using a sort. I looked at it some time ago, and from what I remember, the sort is done if return_inverse=True but for some codepath it uses set. unique always sorts, even if it uses set. So I'm pretty sure all(unique(A) == unique(B)) is guaranteed. Neil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
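A quick, deterministic check of the property in question - unique() returns the values sorted, so two arrays drawn from the same set of values give identical results:

import numpy as np

A = np.array([5, 2, 2, 9, 5])
B = np.array([9, 9, 2, 5])
np.unique(A)   # -> array([2, 5, 9])
np.unique(B)   # -> array([2, 5, 9])
np.all(np.unique(A) == np.unique(B))   # -> True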
Re: [Numpy-discussion] Help with np.where and datetime functions
John [H2O] washakie at gmail.com writes: What I am trying to do (obviously?) is find all the values of X that fall within a time range. Specifically, one point I do not understand is why the following two methods fail:

--> 196 ind = np.where( (t1 < Y[:,0] < t2) )  # same result with/without inner parens
TypeError: can't compare datetime.datetime to numpy.ndarray

OR trying the 'and' method:

--> 196 ind = np.where( (Y[:,0] > t1) and (Y[:,0] < t2) )
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Use (t1 < Y[:,0]) & (Y[:,0] < t2). The python keywords 'and' and 'or' can't be overloaded, but the bitwise operators & and | can be (and are) overloaded by arrays. Chained conditionals like a < x < b are converted to (a < x) and (x < b), which is why they don't work either. There is a proposal to enable overloadable 'and' and 'or' methods (http://www.python.org/dev/peps/pep-0335/), but I don't think it's ever got enough support to be accepted. Also, if you don't need the indices, you can just use the conditional expression as a boolean mask:

condition = (t1 < Y[:,0]) & (Y[:,0] < t2)
Y[:,0][condition]

Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
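A self-contained version of the suggestion above, assuming the time column is an object array of datetime instances (as in the thread):

import numpy as np
from datetime import datetime

t = np.array([datetime(2010, 1, d) for d in range(1, 6)], dtype=object)
x = np.arange(5.0)

t1, t2 = datetime(2010, 1, 2), datetime(2010, 1, 5)
condition = (t > t1) & (t < t2)   # '&' instead of 'and'
ind = np.where(condition)         # -> (array([2, 3]),)
x[condition]                      # -> array([ 2.,  3.])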
Re: [Numpy-discussion] Using loadtxt to read in mixed data types
Pierre GM pgmdevlist at gmail.com writes: What about 'formats':[eval(b) for b in event_format] Should it fail, try something like: dtype([(x,eval(b)) for (x,b) in zip(event_fields, event_format)]) At least you force dtype to have the same number of names and formats. You could use data = np.genfromtxt(filename, names=listofname, dtype=None) Then you just need to specify the column names, and not the dtypes (they are inferred from the data). There are probably backwards compatibility issues, but it would be great if dtype=None was the default for genfromtxt. Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Patch for review (improving arraysetops)
Robert Cimrman cimrman3 at ntc.zcu.cz writes: Hi Neil, This sounds good. If you don't have time to do it, I don't mind having a go at writing a patch to implement these changes (deprecate the existing unique1d, rename unique1d to unique and add the set approach from the old unique, and the other changes mentioned in http://projects.scipy.org/numpy/ticket/1133). That would be really great - I will not be online starting tomorrow till the end of next week (more or less), so I can really look at the issue after I return. Here's a patch that implements most of the changes suggested in the ticket, and merges unique and unique1d functionality to a single function unique in arraysetops: http://projects.scipy.org/numpy/ticket/1133 Please review it. Thanks, Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Plans for Numpy 1.4.0 and scipy 0.8.0
David Cournapeau david at ar.media.kyoto-u.ac.jp writes: (Continuing the discussion initiated in the neighborhood iterator thread) Hi, I would like to gather people's opinion on what to target for numpy 1.4.0. Are there any other features people would like to put into numpy for 1.4.0 ? I'd like to get the patch in ticket 1133 (http://projects.scipy.org/numpy/ticket/1133), or some version of it, into 1.4. It would also be great to get all the docstrings David Goldsmith and others are working on into the next release. Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Plans for Numpy 1.4.0 and scipy 0.8.0
David Cournapeau cournape at gmail.com writes: David Cournapeau wrote: (Continuing the discussion initiated in the neighborhood iterator thread) - Chuck suggested to drop python < 2.6 support from now on. I am against it without a very strong and detailed rationale, because many OS still don't have python 2.6 (RHEL, Ubuntu LTS).

I vote against dropping support for python 2.5. Personally I have no incentive to upgrade to 2.6 and am very happy with 2.5.

Will requiring python-2.6 help the developers port numpy to python-3? Can't really say at this point, but it is the suggested path to python-3. OTOH, I don't find the python 3 official transition story very convincing. I have tried to gather all the information I could find, both on the python wiki and from transition stories. To support both python 2 and 3, the suggestion is to use the 2to3 script, but it is painfully slow for big packages like numpy. And there are very few stories about porting C extensions to python 3. Another suggestion is to avoid breaking the API when transitioning to python 3. But that seems quite unrealistic. How do we deal with the removing of string/long APIs ? This will impact the numpy API as well, so how do we deal with it ?

As I understand this suggestion, they just hope external packages don't say 'Hey, if we're breaking backwards compatibility anyway, let's take the chance to do a whole lot of extra API breakage!' That way, if people have problems migrating to the new version, they know they're likely to be python 3 related. Jarrod Millman's blog post about numpy and python 3 mentions this: http://jarrodmillman.blogspot.com/2009/01/when-will-numpy-and-scipy-migrate-to.html

Also, there does not seem to be any advantages for python 3 for scientific people ?

I think there are lots of advantages in python 3 for scientific people. The new integer division alone is a huge improvement. I've been bitten by this (1/2 = 0) several times in the past, and the only reason I'm not bitten by it now is that I've trained myself to always type things like 1./x, which look ugly. The reorganisation of the standard library and the removal of duplicate ways of doing things in the core also make the language much easier to learn. This isn't a huge gain for people already familiar with Python's idiosyncrasies, but it's important for people first coming to the language. Print becoming a function would have been a pain for interactive work, but happily ipython auto-parentheses takes care of that. You could argue that moving to python 3 isn't attractive because there isn't any scientific library support, but then that's because numpy hasn't been ported to python 3 yet ;) Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
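The integer division point, spelled out for anyone who hasn't been bitten by it:

# Python 2
1 / 2       # -> 0
1. / 2      # -> 0.5
from __future__ import division
1 / 2       # -> 0.5  (the Python 3 behaviour)
1 // 2      # -> 0    (explicit floor division, same in both versions)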
Re: [Numpy-discussion] all and alltrue
Shivaraj M S shivraj.ms at gmail.com writes: Hello, I just came across the 'all' and 'alltrue' functions in fromnumeric.py. They are one and the same. IMHO, alltrue = all would be sufficient. Regards, Shivaraj

There are other duplications too:

np.all / np.alltrue
np.any / np.sometrue
np.deg2rad / np.radians
np.rad2deg / np.degrees

And maybe more I've missed. Can we deprecate alltrue and sometrue, and either deg2rad/rad2deg, or radians/degrees? They would be deprecated in 1.4 and presumably removed in 1.5. Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] improving arraysetops
What about merging unique and unique1d? They're essentially identical for an array input, but unique uses the builtin set() for non-array inputs and so is around 2x faster in this case - see below. Is it worth accepting a speed regression for unique to get rid of the function duplication? (Or can they be combined?)

unique1d can return the indices - can this be achieved by using set(), too?

No, set() can't return the indices as far as I know.

The implementation for arrays is the same already, IMHO, so I would prefer adding return_index, return_inverse to unique (automatically converting input to array, if necessary), and deprecating unique1d. We can view it also as adding the set() approach to unique1d, when the return_index, return_inverse arguments are not set, and renaming unique1d -> unique.

This sounds good. If you don't have time to do it, I don't mind having a go at writing a patch to implement these changes (deprecate the existing unique1d, rename unique1d to unique and add the set approach from the old unique, and the other changes mentioned in http://projects.scipy.org/numpy/ticket/1133).

I have found a strange bug in unique():

In [24]: l = list(np.random.randint(100, size=1000))
In [25]: %timeit np.unique(l)
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
/usr/lib64/python2.5/site-packages/IPython/iplib.py in ipmagic(self, arg_s)
    951         else:
    952             magic_args = self.var_expand(magic_args,1)
--> 953         return fn(magic_args)
    954
    955     def ipalias(self,arg_s):
/usr/lib64/python2.5/site-packages/IPython/Magic.py in magic_timeit(self, parameter_s)
   1829                 precision,
   1830                 best * scaling[order],
--> 1831                 units[order])
   1832         if tc > tc_min:
   1833             print "Compiler time: %.2f s" % tc
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 28: ordinal not in range(128)

It disappears after increasing the array size, or the integer size.

In [39]: np.__version__
Out[39]: '1.4.0.dev7047'

r.

Weird! From the error message, it looks like a problem with ipython's timeit function rather than unique. I can't reproduce it on my machine (numpy 1.4.0.dev, r7059; IPython 0.10.bzr.r1163). Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] improving arraysetops
Robert Cimrman cimrman3 at ntc.zcu.cz writes: Hi, I am starting a new thread, so that it reaches the interested people. Let us discuss improvements to arraysetops (array set operations) at [1] (allowing non-unique arrays as function arguments, better naming conventions and documentation). r. [1] http://projects.scipy.org/numpy/ticket/1133

Hi, These changes look good to me. For point (1) I think we should fold the unique and _nu code into a single function. For point (3) I like in1d - it's shorter than isin1d but is still clear. What about merging unique and unique1d? They're essentially identical for an array input, but unique uses the builtin set() for non-array inputs and so is around 2x faster in this case - see below. Is it worth accepting a speed regression for unique to get rid of the function duplication? (Or can they be combined?) Neil

In [24]: l = list(np.random.randint(100, size=1))
In [25]: %timeit np.unique1d(l)
1000 loops, best of 3: 1.9 ms per loop
In [26]: %timeit np.unique(l)
1000 loops, best of 3: 793 µs per loop
In [27]: l = list(np.random.randint(100, size=100))
In [28]: %timeit np.unique(l)
10 loops, best of 3: 78 ms per loop
In [29]: %timeit np.unique1d(l)
10 loops, best of 3: 233 ms per loop

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] setmember1d_nu
Robert Cimrman cimrman3 at ntc.zcu.cz writes: I'd really like to see the setmember1d_nu function in ticket 1036 get into numpy. There's a patch waiting for review that includes tests: http://projects.scipy.org/numpy/ticket/1036 Is there anything I can do to help get it applied? I guess I could commit it, if you review the patch and it works for you. Obviously, I cannot review it myself, but my SVN access may still work :) Thanks for the review, it is in! r. Great - thanks! People often post to the list asking for this functionality, so it's nice to get it into numpy (whatever it ends up being called). Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
Robert Cimrman cimrman3 at ntc.zcu.cz writes: Anne Archibald wrote:
1. add a keyword argument to intersect1d assume_unique; if it is not present, check for uniqueness and emit a warning if not unique
2. change the warning to an exception
Optionally:
3. change the meaning of the function to that of intersect1d_nu if the keyword argument is not present
You mean something like:

def intersect1d(ar1, ar2, assume_unique=False):
    if not assume_unique:
        return intersect1d_nu(ar1, ar2)
    else:
        ...  # the current code

intersect1d_nu could still be exported to the numpy namespace, or not.

+1 - from the user's point of view there should just be intersect1d and setmember1d (i.e. no '_nu' versions). The assume_unique keyword Robert suggests can be used if speed is a problem. I really like in1d (no underscore) as a new name for setmember1d_nu. inarray is another possibility. I don't like 'ain'; 'a' in front of 'in' detracts from readability, unlike the extra a in arange. Can we summarise the discussion in this thread and write up a short proposal about what we'd like to change in arraysetops, and how to make the changes? Then it's easy for other people to give their opinion on any changes. I can do this if no one else has time. Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Changes to arraysetops
Thanks for the summary! I'm +1 on points 1, 2 and 3. +0 for points 4 and 5 (assume_unique keyword and renaming arraysetops). Neil PS. I think you mean deprecate, not depreciate :) ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] setmember1d_nu
Hi all, I posted this message a couple of days ago, but gmane grouped it with an old thread and it hasn't shown up on the front page. So here it is again... I'd really like to see the setmember1d_nu function in ticket 1036 get into numpy. There's a patch waiting for review that includes tests: http://projects.scipy.org/numpy/ticket/1036 Is there anything I can do to help get it applied? Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] setmember1d_nu
Robert Cimrman cimrman3 at ntc.zcu.cz writes: Re-hi! Robert Cimrman wrote: Hi all, I have added to the ticket [1] a script that compares the proposed setmember1d_nu() implementations of Neil and Kim. Comments are welcome! [1] http://projects.scipy.org/numpy/ticket/1036 I have attached a patch incorporating the solution that the involved people agreed on, so review, please. best regards, r. Hi all, I'd really like to see the setmember1d_nu function in ticket 1036 get into numpy. There's a patch waiting for review that includes tests for the new function: http://projects.scipy.org/numpy/ticket/1036 Is there anything I can do to help get it applied? Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] List/location of consecutive integers
Andrea Gavana andrea.gavana at gmail.com writes: this should be a very easy question but I am trying to make a script run as fast as possible, so please bear with me if the solution is easy and I just overlooked it. That's weird, I was trying to solve exactly the same problem a couple of weeks ago for a program I was working on. It must come up a lot. I ended up with a similar solution to Josef's, but it took me more than an hour to work it out - I should have asked here first! Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
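For the archives, one compact way to split a sorted integer array into runs of consecutive values - similar in spirit to the solution discussed here, though not necessarily identical to it:

import numpy as np

a = np.array([1, 2, 3, 7, 8, 10, 11, 12])
# a run ends wherever the gap between neighbours is not exactly 1
breaks = np.where(np.diff(a) != 1)[0] + 1
runs = np.split(a, breaks)
# runs -> [array([1, 2, 3]), array([7, 8]), array([10, 11, 12])]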
Re: [Numpy-discussion] Masking an array with another array
josef.pktd at gmail.com writes: setmember1d is very fast compared to the other solutions for large b. However, setmember1d requires that both arrays only have unique elements. So it doesn't work if, for example, your first array is a data vector with membership in different groups (therefore not only uniques), and the second array is the sub group that you want to check. Note there's a patch waiting to be reviewed that adds another version of setmember1d for non-unique inputs. http://projects.scipy.org/numpy/ticket/1036 Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] intersect1d and setmember1d
Robert Kern robert.kern at gmail.com writes: Do you mind if we just add you to the THANKS.txt file, and consider you as a NumPy Developer per the LICENSE.txt as having released that code under the numpy license? If we're dotting our i's and crossing our t's legally, that's a bit more straightforward (oddly enough). No, I don't mind having it released under the numpy licence. Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] intersect1d and setmember1d
Robert Cimrman cimrman3 at ntc.zcu.cz writes: Hi Neil! I would like to add your function to arraysetops.py - is it ok? Just the name would be changed to setmember1d_nu, to follow the naming in the module (like intersect1d_nu). Thank you, r. That's fine! There's no licence attached, it's in the public domain. Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [SciPy08] Documentation BoF
- Should we have a separate User manual and a Reference manual, or only a single manual? Are there still plans to write a 10 page 'Getting started with NumPy' document? I think this would be very useful. Ideally a 'getting started' document, the docstrings, and a reference manual is all the documentation you'd need. I try to avoid reading long user manuals unless I'm forced to, but I don't know what other people do. Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Reference guide updated
Ok, thanks. I meant the amount of vertical space between lines of text - for example, the gaps between parameter values and their description, or the large spacing between both lines of text and the text boxes in the examples section. If other people agree it's a problem, I thought the spacing could be tweaked. It's not a problem that there's more than one function per page. I have been helping out with the docs (where I feel I'm qualified enough - I'm no numpy expert!). I think it will make numpy much easier to learn to have easily-accessible, comprehensive docstrings, and the documentation editor makes it very easy to contribute. I'm also learning a lot reading other people's docstrings :)

Is there a reason why there's so much vertical space between all of the text sections? I find the docstrings much easier to read in the editor:

Roughly:
* In the editor, you have one function per page, vs several functions per page in the reference
* In the editor, the blocks 'Parameters', 'Returns'... are considered as sections, while in the reference, they are Field lists (roughly).
In the end, it's only a matter of taste, of course. But you raise an interesting point, we should provide some kind of options to choose between behaviors. Please note as well that the reference guide is a work in progress: you're more than welcome to join and work with us. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Reference guide updated
A new copy of the reference guide is now available at http://mentat.za.net/numpy/refguide/ Is there a reason why there's so much vertical space between all of the text sections? I find the docstrings much easier to read in the editor: http://sd-2116.dedibox.fr/pydocweb/doc/numpy.core.fromnumeric.all/ than in the reference guide: http://mentat.za.net/numpy/refguide/routines.logic.xhtml Neil ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Making NumPy accessible to everyone (or no-one) (was Numpy-discussion Digest, Vol 19, Issue 44)
Thanks Joe for the excellent post. It mirrors my experience with Python and Numpy very eloquently, and I think it presents a good argument against the excessive use of namespaces. I'm not so worried about N. vs np. though - I use the same method Matthew Brett suggests. If I'm going to use, say, sin and cos a lot in a script such that all the np. prefixes would make the code hard to read, I'll use:

import numpy as np
from numpy import sin, cos

To those people who have invoked 'Namespaces are a honking great idea - let's do more of those', I'll cancel that with 'Flat is better than nested' :) I certainly wouldn't argue that using namespaces to separate categories of functions is always a bad thing, but I think it should only be done as a last resort. Neil

On 10/04/2008, Joe Harrington [EMAIL PROTECTED] wrote:

Absolutely. Let's please standardize on: import numpy as np import scipy as sp

I hope we do NOT standardize on these abbreviations. While a few may have discussed it at a sprint, it hasn't seen broad discussion and there are reasons to prefer the other practice (numpy as N, scipy as S, pylab as P). My reasons for saying this go back to my reasons for disliking lots of hierarchical namespaces at all: if we must have namespaces, let's minimize the visual and typing impact by making them short and visually distinct from the function names (by capitalizing them).

What concerns me about the discussion is that we are still not thinking like communications and thought-process experts, we are thinking like categorizers and accountants. The arguments we are raising don't have to do, positively or negatively, with the difficult acts of communicating with a computer and with other readers of our code. Those are the sole purposes of computer languages.

Namespaces add characters to code that have a high redundancy factor. This means they pollute code, make it slow and inaccurate to read, and make learning harder. Lines get longer and may wrap if they contain several calls. It is harder while visually scanning code to distinguish the function name if it's adjacent to a bunch of other text, particularly if that text appears commonly in the nearby code. It therefore becomes harder to spot bugs. Mathematical code becomes less and less like the math expressions we write on paper when doing derivations, making it harder to interpret and verify. You have to memorize which subpackage each function is in, which is hard to do for those functions that could naturally go in two subpackages. While many math function names are obvious, subpackage names are not. Is it .stat or .stats or .statistics? .rand or .random? .fin or .financial? Some functions have this problem, but *every* namespace name has it in spades.

The arguments people are raising are arguments related to how emotionally satisfying it is to have a place for everything and everything in its place, and to know you know everything there is to know. While we like both those things, as scientists, engineers, and mathematicians, they are almost irrelevant to coding. There is simply no reduction in readability, writeability, or debugability if you don't have namespace prefixes on everything, and knowing you know everything is easily accomplished now with the online categorized function list. We can incorporate that functionality into the doc reading apparatus (help, currently) by using keywords in ReST comments in the docstrings and providing a way for help and its friends to list the keywords and what functions are connected to them.
What nobody has said is 'if we have lots of namespaces, my code will look prettier' or 'if we have lots of namespaces, normal people will learn faster' or 'if we have lots of namespaces, my code will be easier to verify and debug'. I don't believe any of these statements to be true. Do you? Similarly, nobody has said, 'if we have lots of namespaces, I'll be a faster coder.'

There is a *very* high obnoxiousness factor in typing redundant stuff at an interpreter. It's already annoying to type N.sin instead of sin, but N.T.sin? Or worse, np.tg.sin? Now the prefix has twice the characters of the function itself! Most IDL users *hate* that you have to type 'print,' in order to inspect the contents of a variable. Yet, with multiple layers of namespaces we'd have lots more than seven extra characters on most lines of code, and unlike the IDL mess you'd have to *think* to recall what the right extra characters were for each function call, unlike just telling your hands to run the 'print,' finger macro once again.

The reasons we all like Python relate to how quick and easy it is to emit code from our fingertips that is similar to what we are thinking in our brains, compared to other languages. The brain doesn't declare variables, nor run loops over arrays. Neither does Python. When we
Re: [Numpy-discussion] Simple financial functions for NumPy
I'm just a numpy user, but for what it's worth, I would much prefer to have a single numpy namespace with as small a number of objects inside that namespace as possible. To me, 'as small as possible' means that it only includes the array and associated array manipulation functions (searchsorted, where, and the record array functions), and the various functions that operate on arrays (exp, log, sin, cos, abs, any, etc). Having a small number of objects in a single namespace means that numpy is much easier to learn for beginners, as it's easier to find the appropriate thing for what you want to do (this is also helped by reducing duplication *shakes fist at .compress* and good documentation). It's also much easier to explore with dir() to jog your memory as to what function you need for a task. If I felt I contributed enough to this list to have a '1', I would be -1 on adding financial functions to numpy.

On Fri, Apr 4, 2008 at 3:31 PM, Anne Archibald [EMAIL PROTECTED] wrote: On 04/04/2008, Alan G Isaac [EMAIL PROTECTED] wrote:

It seems to me that there are two separate issues people are talking about when they talk about packaging:
* How should functions be arranged in the namespaces? numpy.foo(), scipy.foo(), numpy.lib.financial.foo(), scikits.foo(), numkitfull.foo()?
* Which code should be distributed together? Should scipy require separate downloading and compilation from numpy?
The two questions are not completely independent - it would be horribly confusing to have the set of functions available in a given namespace depend on which packages you had installed - but for the most part it's not a problem to have several toplevel namespaces in one package (python's library is the biggest example of this I know of). For the first question, there's definitely a question about how much should be done with namespaces and how much with documentation. The second is a different story.

Personally, I would prefer if numpy and scipy were distributed together, preferably with matplotlib. Then anybody who used numpy would have available all the scipy tools and all the plotting tools; I think it would cut down on wheel reinvention and make application development easier. Teachers would not need to restrict themselves to using only functions built into numpy for fear that their students might not have scipy installed - how many students have learned to save their arrays in unportable binary formats because their teacher didn't want them to have to install scipy? I realize that this poses technical problems. For me installing scipy is just a matter of clicking on a checkbox and installing a 30 MB package, but I realize that some platforms make this much more difficult. If we can't just bundle the two, fine. But I think it is mad to consider subdividing further if we don't have to.

If these were tightly tied together, for instance in one big dll, this would be unpleasant for me. I still have people downloading stuff over 56k modems and adding an extra 30 MB to the already somewhat bloated numpy distribution would make their lives more tedious than they already are.

I think python's success is due in part to its batteries included library. The fact that you can just write a short python script with no extra dependencies that can download files from the Web, parse XML, manage subprocesses, and save persistent objects makes development much faster than if you had to forever decide between adding dependencies and reinventing the wheel.
I think numpy and scipy should follow the same principle, of coming batteries included. One thing they try to do in Python proper is think a lot more before adding stuff to the standard library. Generally packages need to exist separately for some period of time to prove their general utility and to stabilize before they get accepted. Particularly in the core, but in the library as well, they make an effort to choose a compact set of primitive operations without a lot of duplication (the old 'There should be one-- and preferably only one --obvious way to do it.'). The numpy community has, particularly of late, been rather quick to add things that seem like they *might* be useful. One of the advantages of having multiple namespaces would have been to enforce a certain amount of discipline on what went into numpy, since it would've been easier to look at and evaluate a few dozen functions that might have comprised some subpackage rather than, let's say, five hundred or so. I suspect it's too late now; numpy has chosen the path of matlab and the other array packages and is slowly accumulating nearly everything in one big flat namespace. I don't like it, but it seems pointless to fight it at this point. So in this specific case, yes, I think the financial functions should absolutely be included; whether
Re: [Numpy-discussion] Radian <--> degrees conversion
Do we really need these functions in numpy? I mean it's just multiplying/dividing the value by pi/180 (who knows why they're in the math module..). Numpy doesn't have asin, acos, or atan either (they're arcsin, arccos and arctan) so it isn't a superset of the math module. I would like there to be fewer functions in numpy, not more.

Someone on the wxPython list just pointed out that the math module now includes angle-conversion utilities:

degrees.__doc__
degrees(x) -> converts angle x from radians to degrees
radians.__doc__
radians(x) -> converts angle x from degrees to radians

Not a big deal, but handy. As I generally like to think of numpy as a superset of the math module, perhaps it should include these too. Done. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
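For reference, the numpy versions (whichever names survive) are just the pi/180 scale factor:

import numpy as np

np.radians(180.0)        # -> 3.141592653589793
np.degrees(np.pi / 2)    # -> 90.0
180.0 * np.pi / 180      # the same thing written out by hand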