> > What about merging unique and unique1d? They're essentially identical for
> > an array input, but unique uses the builtin set() for non-array inputs and
> > so is around 2x faster in this case - see below. Is it worth accepting a
> > speed regression for unique to get rid of the function duplication? (Or can
> > they be combined?)
>
> unique1d can return the indices - can this be achieved by using set(), too?
>
No, set() can't return the indices as far as I know.

> The implementation for arrays is the same already, IMHO, so I would
> prefer adding return_index, return_inverse to unique (automatically
> converting input to array, if necessary), and deprecate unique1d.
>
> We can view it also as adding the set() approach to unique1d, when the
> return_index, return_inverse arguments are not set, and renaming
> unique1d -> unique.
>

This sounds good. If you don't have time to do it, I don't mind having
a go at writing a patch to implement these changes (deprecate the
existing unique1d, rename unique1d to unique and add the set approach
from the old unique, and the other changes mentioned in
http://projects.scipy.org/numpy/ticket/1133).

> I have found a strange bug in unique():
>
> In [24]: l = list(np.random.randint(100, size=1000))
>
> In [25]: %timeit np.unique(l)
> ---------------------------------------------------------------------------
> UnicodeEncodeError                        Traceback (most recent call last)
>
> /usr/lib64/python2.5/site-packages/IPython/iplib.py in ipmagic(self, arg_s)
>     951         else:
>     952             magic_args = self.var_expand(magic_args,1)
> --> 953         return fn(magic_args)
>     954
>     955     def ipalias(self,arg_s):
>
> /usr/lib64/python2.5/site-packages/IPython/Magic.py in
> magic_timeit(self, parameter_s)
>    1829                                           precision,
>    1830                                           best * scaling[order],
> -> 1831                                           units[order])
>    1832         if tc > tc_min:
>    1833             print "Compiler time: %.2f s" % tc
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in
> position 28: ordinal not in range(128)
>
> It disappears after increasing the array size, or the integer size.
> In [39]: np.__version__
> Out[39]: '1.4.0.dev7047'
>
> r.

Weird! From the error message, it looks like a problem with ipython's
timeit function rather than unique. I can't reproduce it on my machine
(numpy 1.4.0.dev, r7059; IPython 0.10.bzr.r1163).
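
For reference, here is a rough sketch of how the combined function might
look, going by the description above. The name unique_sketch and the exact
argument handling are my own assumptions for illustration, not the numpy
implementation. The set() fast path is only taken when neither return_index
nor return_inverse is requested and the input is not already an array;
everything else goes through the sort-based approach.

```python
import numpy as np


def unique_sketch(ar, return_index=False, return_inverse=False):
    """Hypothetical merged unique/unique1d (illustration only, not numpy code)."""
    if not (return_index or return_inverse) and not isinstance(ar, np.ndarray):
        # Fast path borrowed from the old unique(): set() drops duplicates.
        # Assumes a flat sequence of hashable, mutually comparable items.
        return np.array(sorted(set(ar)))

    # Sort-based path, as in unique1d: convert the input to a flat array.
    ar = np.asanyarray(ar).ravel()

    # A stable sort keeps equal items in input order, so the index reported
    # for each unique value is that of its first occurrence.
    perm = ar.argsort(kind="mergesort")
    aux = ar[perm]

    # flag[i] is True where a new value starts in the sorted array.
    flag = np.ones(aux.shape, dtype=bool)
    flag[1:] = aux[1:] != aux[:-1]

    result = (aux[flag],)
    if return_index:
        result += (perm[flag],)
    if return_inverse:
        # Position of each sorted element among the unique values,
        # scattered back to the original order.
        inverse = np.empty(ar.shape, dtype=np.intp)
        inverse[perm] = np.cumsum(flag) - 1
        result += (inverse,)
    return result[0] if len(result) == 1 else result
```

With this, unique_sketch(l) on a plain list goes through set(), while
unique_sketch(l, return_index=True) or any array input takes the sort-based
path, so the speed of the old unique is kept for the common case.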

Neil