On 08/03/17 00:18, Steven D'Aprano wrote:
I thought about that and rejected it as an unnecessary complication.
Heterogeneous and unknown might as well be the same state: either way,
you cannot use the homogeneous-type optimization.
Knowing that the list is definitely in one of the two positive states is
not the same thing as not knowing which of those two states it is in, at
least when it comes to what one can and can't optimize cheaply :) It sort
of depends on how cheaply one can track the states though ...
Part of the complexity here is that I'd like this flag to be available
to Python code, not just a hidden internal state of the list.
Out of interest, for what purpose? Generally, I thought Python code
should not need to worry about low-level optimisations such as this
(which are CPython-specific, AIUI). A list.is_heterogeneous() method
could be implemented if it was necessary, but how would that be used?
But also avoids bothering with an O(N) scan in some situations where
the list really is heterogeneous. So there's both an opportunity cost and
a benefit.
O(N) is worst case.
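
To illustrate why O(N) is only the worst case, here is a minimal
pure-Python sketch of such a scan. The name common_type is hypothetical
and nothing like it exists in CPython; the point is only that the loop
can stop at the first element whose type differs, so a full pass happens
only when the list really is homogeneous.

```python
def common_type(items):
    """Return the one type shared by every item, or None if there is none.

    Hypothetical helper for illustration only; not a CPython API.
    """
    it = iter(items)
    try:
        first = type(next(it))
    except StopIteration:
        return None  # empty input: there is no type to report
    for item in it:
        # Exact type comparison, so bool and int count as different types.
        if type(item) is not first:
            return None  # early exit at the first mismatch
    return first  # reached only after a full O(N) pass
```

A list whose second element already has a different type is rejected
after two comparisons, regardless of its length.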
Most of the anecdotal evidence in this thread so far seems to suggest
that heterogeneous lists are not common. May or may not be true.
Empirically, for me, it is true. Who knows? (and there is the question).
Remember, we're talking about opportunities for applying an optimization
here, nothing more. You're not giving up anything: at worst, the
ordinary, unoptimized routine will run and you're no worse off than you
are today.
You are a little bit worse off: the extra overhead of checking all of
this (which is the unknown factor we're all skirting around ATM) has a
cost. So
converting a previously-heterogeneous list to a homogeneous list via a
delete or whatever has a benefit if the optimisations can then be
applied to that list many times in the future (i.e., once it becomes
recognised as homogeneous again, it benefits from optimised paths in the
interpreter).
And of course, all that depends on your use case. It might work out
better for one application over another. As you quite rightly point out,
it needs someone to measure the alternatives and work out if _overall_
it has a positive impact ...
so I'm not
a fan of the "once heterogeneous, always considered heterogeneous"
behaviour if it's cheap enough to avoid it.
It is not just a matter of the cost of tracking three states versus two.
It is a matter of the complexity of the interface.
I suppose this could be reported to Python code as None, False or the
type.
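
As a rough sketch of what a three-state flag reported as None, False or
the type might look like from Python code, assuming a toy wrapper class
rather than the real list object: TypeTrackedList, flag and resolve are
all hypothetical names, and the choice to reset a heterogeneous list to
"unknown" on deletion is one possible policy, not anything CPython does.

```python
class TypeTrackedList:
    """Toy list wrapper with a three-state homogeneity flag (hypothetical).

    flag is None  -> unknown, a scan is needed to find out
    flag is False -> known to be heterogeneous
    flag is a type -> known to be homogeneous with that element type
    """

    def __init__(self, items=()):
        self._items = list(items)
        self._flag = None  # nothing is known until someone asks

    @property
    def flag(self):
        """Report the current state without doing any scanning."""
        return self._flag

    def append(self, item):
        self._items.append(item)
        if len(self._items) == 1:
            self._flag = type(item)  # a single element is trivially homogeneous
        elif self._flag not in (None, False) and type(item) is not self._flag:
            self._flag = False  # O(1) downgrade from homogeneous

    def __delitem__(self, index):
        del self._items[index]
        if self._flag is False:
            # The removed item may have been the only odd one out, so fall
            # back to "unknown" instead of staying heterogeneous forever.
            self._flag = None

    def resolve(self):
        """Scan only if the state is unknown, then cache and return it."""
        if self._flag is None and self._items:
            first = type(self._items[0])
            same = all(type(x) is first for x in self._items)
            self._flag = first if same else False
        return self._flag
```

With this policy a list such as [1, 2, 'x'] resolves to False, goes back
to None when the 'x' is deleted, and resolves to int on the next query,
after which appends of further ints keep the flag at int for free.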
I didn't think any of this stuff would come back to Python code (I
thought we were talking about a CPython-specific implementation only).
How is this useful to Python code?
Ultimately, this is all very pie-in-the-sky unless somebody tests just
how expensive this is and whether the benefit is worthwhile.
I agree. As I said before, I'm just pointing out things I noticed while
looking at the current C code which could be picked up on if someone
wants to try implementing and benchmarking any of this.
It sort of feels like an argument, but I hope we're just violently
agreeing on a generally shared goal ;)
Regards, E.
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/