On Wed, Mar 08, 2017 at 01:20:19AM +0000, Erik wrote: > >Part of the complexity here is that I'd like this flag to be available > >to Python code, not just a hidden internal state of the list. > > Out of interest, for what purpose? Generally, I thought Python code > should not need to worry about low-level optimisations such as this > (which are C-Python specific AIUI).
I mentioned earlier that I have code which has to track the type of list items, and swaps to a different algorithm when the types are not all the same. In practice, I just run the "mixed types" algorithm regardless, because the cost of doing a scan of the items in pure Python is too expensive relative to the time saved. Moving some of the work into the C infrastructure might change that. I'm not completely sure that this would, in fact, be useful to me, but I'd like the opportunity to experiment. I could try using a list subclass, but again, the cost of doing the type-checking in Python instead of C is (I think) prohibitive. Nevertheless, I understand that the burden of proving the usefulness of this is on me. (Or anyone else that wants to argue the case.) > A list.is_heterogeneous() method > could be implemented if it was necessary, but how would that be used? I would prefer to get the list item's type: if mylist.__type_hint__ is float: # run optimized float version ... elif mylist.__type_hint__ is int: ... else: # unoptimized version > >But also avoids bothering with an O(N) scan in some situations where > >the list really is hetrogeneous. So there's both an opportunity cost and > >a benefit. > > O(N) is worst case. It is also the best and average case. Given a homogenous list of N items, for some arbitrary N between 0 and ∞, you have to look at all N items before knowing that they're all the same type. So for the homogenous case, the best, worst and average are identically O(N). Given a hetrogeneous list of N items where the first difference is found at index K, K can range from 1 through N-1. (By definition, it cannot be at index 0: you can only detect a difference in types after checking *two* items.) The worst case is that you don't find the difference until the last item, which is O(N). The best case is that you find the difference in position 1 and bail out early. On average, you will find the first difference halfway through the list, which makes it O(N/2) but that's just O(N). (Constant factors don't matter.) If you want to call that O(1) for the best hetrogeneous case, I won't argue except to say that's rather against the spirit of Big Oh analysis in my opinion. I think it's more realistic to say its O(N) across all combinations of best/worst/average and homogeneous/hetrogeneous. But of course if you bail out early, the constant multiplier may differ. > Most of the anecdotal evidence in this thread so far seems to suggest > that heterogeneous lists are not common. May or may not be true. > Empirically, for me, it is true. Who knows? (and there is the question). That will strongly depend on where the data is coming from, but when it comes to sorting, randomly mixed types will usually fail since comparisons between different types are generally illegal: py> [1, "a"].sort() Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unorderable types: str() < int() -- Steve _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/