Neil Crighton wrote:
> Robert Cimrman <cimrman3 <at> ntc.zcu.cz> writes:
>
>> Hi,
>>
>> I am starting a new thread, so that it reaches the interested people.
>> Let us discuss improvements to arraysetops (array set operations) at [1]
>> (allowing non-unique arrays as function arguments, better naming
>> conventions and documentation).
>>
>> r.
>>
>> [1] http://projects.scipy.org/numpy/ticket/1133
>>
>
> Hi,
>
> These changes look good to me. For point (1) I think we should fold the
> unique and _nu code into a single function. For point (3) I like in1d - it's
> shorter than isin1d but is still clear.

yes, the _nu functions will be useless then, their bodies can be moved into
the generic functions.

> What about merging unique and unique1d? They're essentially identical for an
> array input, but unique uses the builtin set() for non-array inputs and so is
> around 2x faster in this case - see below. Is it worth accepting a speed
> regression for unique to get rid of the function duplication? (Or can they be
> combined?)

unique1d can return the indices - can this be achieved by using set(), too?

The implementation for arrays is the same already, IMHO, so I would prefer
adding return_index, return_inverse to unique (automatically converting
input to array, if necessary), and deprecate unique1d.

We can view it also as adding the set() approach to unique1d, when the
return_index, return_inverse arguments are not set, and renaming unique1d ->
unique.

> Neil
>
>
> In [24]: l = list(np.random.randint(100, size=10000))
> In [25]: %timeit np.unique1d(l)
> 1000 loops, best of 3: 1.9 ms per loop
> In [26]: %timeit np.unique(l)
> 1000 loops, best of 3: 793 µs per loop
> In [27]: l = list(np.random.randint(100, size=1000000))
> In [28]: %timeit np.unique(l)
> 10 loops, best of 3: 78 ms per loop
> In [29]: %timeit np.unique1d(l)
> 10 loops, best of 3: 233 ms per loop

I have found a strange bug in unique():

In [24]: l = list(np.random.randint(100, size=1000))
In [25]: %timeit np.unique(l)
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)

/usr/lib64/python2.5/site-packages/IPython/iplib.py in ipmagic(self, arg_s)
    951         else:
    952             magic_args = self.var_expand(magic_args,1)
--> 953         return fn(magic_args)
    954
    955     def ipalias(self,arg_s):

/usr/lib64/python2.5/site-packages/IPython/Magic.py in magic_timeit(self, parameter_s)
   1829                                                   precision,
   1830                                                   best * scaling[order],
-> 1831                                                   units[order])
   1832         if tc > tc_min:
   1833             print "Compiler time: %.2f s" % tc

UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 28: ordinal not in range(128)

It disappears after increasing the array size, or the integer size.

In [39]: np.__version__
Out[39]: '1.4.0.dev7047'

r.
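
P.S. To make the merge proposal concrete, here is a rough sketch of how a combined function might look. It is only an illustration under my own assumptions, not the code in arraysetops: the name unique_sketch is hypothetical, and the choice to take the set() path only for non-array input when neither return_index nor return_inverse is requested is just one possible reading of the idea above.

```python
import numpy as np


def unique_sketch(ar, return_index=False, return_inverse=False):
    """Hypothetical merged unique/unique1d, for discussion only."""
    if not (return_index or return_inverse) and not isinstance(ar, np.ndarray):
        # Fast path for plain sequences: the builtin set() drops duplicates,
        # and sorting afterwards gives the same order as the array path.
        return np.array(sorted(set(ar)))

    # Generic path: convert the input to a flat array if necessary.
    ar = np.asarray(ar).ravel()

    # Stable sort, so the first occurrence of each value comes first.
    perm = ar.argsort(kind="mergesort")
    aux = ar[perm]

    # flag[i] is True where a new value starts in the sorted array.
    flag = np.ones(aux.shape, dtype=bool)
    flag[1:] = aux[1:] != aux[:-1]

    result = [aux[flag]]
    if return_index:
        # Positions of the first occurrences in the original input.
        result.append(perm[flag])
    if return_inverse:
        # Running count of distinct values, mapped back to the input order.
        inverse = np.empty(ar.shape, dtype=np.intp)
        inverse[perm] = np.cumsum(flag) - 1
        result.append(inverse)
    return result[0] if len(result) == 1 else tuple(result)
```

With this, unique_sketch(l) on a list keeps the set() speed, while unique_sketch(l, return_index=True, return_inverse=True) returns (values, indices, inverse) such that values[inverse] reconstructs the input.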