On 03/17/2013 06:32 PM, Richard Fobes wrote:
> On 3/15/2013 2:12 AM, Kristofer Munsterhjelm wrote:
>> On 03/14/2013 06:45 PM, robert bristow-johnson wrote:
>>> IRV will prevent a true spoiler (that is, a candidate with no viable
>>> chance of winning, but whose presence in the race changes who the
>>> winner is) from spoiling the election, but if the "spoiler" and the
>>> two leaders are all roughly equal going into the election, IRV can
>>> fail and *has* failed (and Burlington 2009 is that example).
>> If you think about it, even Plurality is immune to spoilers... if the
>> spoilers are small enough. More specifically, if the "spoilers" have
>> less support in total than the difference in support between party
>> number one and two, Plurality is immune to them.
>>
>> So instead of saying method X resists spoilers and Y doesn't, it seems
>> better to say that X resists larger spoilers than Y. And that raises
>> the question of how much spoiler-resistance you need. Plurality's
>> result is independent of very small spoilers. IRV's is independent of
>> somewhat larger spoilers, and Condorcet's larger still (through mutual
>> majority or independence of Smith-dominated alternatives, depending
>> on the method).
> This is a good example of the need to _quantify_ the failure rate for
> each election method for each "fairness" criterion.
>
> Just a yes-or-no checkmark -- which is the approach in the comparison
> table in the Wikipedia "Voting systems" article -- is not sufficient
> for a full comparison.
Spoiler resistance is to some degree already quantified. If a method
passes the majority criterion, then it's resistant to spoilers when a
party or candidate has a majority. A method that passes mutual majority
is resistant to spoilers outside a group that's ranked first by a
majority. Independence from Smith-dominated alternatives (ISDA) gives
resistance to spoilers not in the Smith set, and so on.
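
To make that hierarchy concrete, here's a minimal Python sketch of
computing the Smith set from ranked ballots. The ballot format and the
function name are my own invention, and it assumes complete strict
rankings:

    def smith_set(ballots, candidates):
        # wins[a][b] = number of voters ranking a above b;
        # each ballot is a complete ranking, best first.
        wins = {a: {b: 0 for b in candidates} for a in candidates}
        for ballot in ballots:
            for i, a in enumerate(ballot):
                for b in ballot[i + 1:]:
                    wins[a][b] += 1
        # a "reaches" b directly if a beats or ties b head-to-head
        reach = {a: {b: wins[a][b] >= wins[b][a] for b in candidates}
                 for a in candidates}
        # Transitive closure (Floyd-Warshall). The Smith set is exactly
        # the set of candidates that reach every other candidate.
        for k in candidates:
            for a in candidates:
                for b in candidates:
                    reach[a][b] = reach[a][b] or (reach[a][k] and reach[k][b])
        return {a for a in candidates
                if all(reach[a][b] for b in candidates if b != a)}

    # An A>B>C>A cycle with D pairwise-beaten by everybody:
    ballots = ([("A", "B", "C", "D")] * 4 + [("B", "C", "A", "D")] * 3
               + [("C", "A", "B", "D")] * 2)
    print(sorted(smith_set(ballots, "ABCD")))
    # ['A', 'B', 'C'] -- D can't spoil the result under an ISDA method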
But you have a point. In the practical view, these are only interesting
insofar as they cover enough to make the method resistant to spoilers
in general. That is, if an oracle told us that to get a multiparty
democracy where people don't think spoilers get in the way, all you
need is ISDA and everything else is icing on the cake, then we wouldn't
need to bother with anything more than ISDA. At least not unless the
voters would find it unfair *on principle* to have something that
didn't pass, say, independence from covered alternatives.
That's the division into three I've mentioned before: performance under
honesty; things that deter strategy or make it unnecessary, so that we
get to honesty in the first place; and consistency with itself (or,
more broadly, compliance with what the voters think must hold for the
method to be fair).
In all three cases, we have approximations.
Bayesian regret is an approximation to performance under honesty. It
holds if you assume certain things about what performance actually
means: how to do interpersonal comparisons, and utilitarianism[2].
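
For concreteness, here's a Monte Carlo sketch of that approximation.
The i.i.d. uniform utilities, the honest-plurality stand-in, and the
plain utilitarian sum are exactly the kind of assumptions I mean:

    import random

    def bayesian_regret(method, n_voters=99, n_cands=5, trials=2000):
        # Expected utility gap between the utilitarian-best candidate
        # and the method's winner, with honest voters and i.i.d.
        # uniform utilities.
        total = 0.0
        for _ in range(trials):
            # utils[v][c] = voter v's utility for candidate c
            utils = [[random.random() for _ in range(n_cands)]
                     for _ in range(n_voters)]
            social = [sum(u[c] for u in utils) for c in range(n_cands)]
            total += max(social) - social[method(utils)]
        return total / trials

    def honest_plurality(utils):
        # Every voter votes for their single highest-utility candidate.
        counts = [0] * len(utils[0])
        for u in utils:
            counts[u.index(max(u))] += 1
        return counts.index(max(counts))

    print(bayesian_regret(honest_plurality))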
Criteria are approximations to the other two. The good thing about
criteria is that they provide a bound. If I prove that a method passes
ISDA, then it passes ISDA outright. You don't have to worry that the
method only passes ISDA in the cases that are irrelevant to a real
election. If it passes ISDA, it passes ISDA *everywhere*[1]. And I
think that's why I try to make methods that pass many criteria: if they
pass some criterion X, then I can say "done" and move on without having
to quantify *where* they pass it. This saves a lot of detective work
determining whether the areas where they pass are the areas we care
about.
But beyond that, you're right. The approximations are not the real
things. They're proxies we use because they're easier to investigate.
And a method might seem to have contradictions when you look at every
possible ballot set, yet show none of them in the real world. For
instance, if people voted exclusively on a left-right scale, there
would always be a CW: with single-peaked preferences, the candidate
closest to the median voter beats every other candidate head-to-head.
So in those cases Condorcet passes later-no-harm, later-no-help, IIA,
and so on. We could even use Borda IRV there if that's what the people
would prefer; the various monotonicity failures wouldn't be a problem
because we'd never enter that part of the domain. And if we had some
way of knowing what level of spoiler resistance is enough (or
conversely, what isn't), then we could exclude a lot of methods for
either being too complex or for falling short of the mark.
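
The left-right claim is easy to check numerically. In a sketch like
the following (positions and counts picked arbitrarily), the
brute-force CW always turns out to be the candidate nearest the median
voter:

    import random

    def condorcet_winner(voters, cands):
        # Brute-force CW check for 1-D spatial preferences: voter v
        # prefers candidate a to b when a's position is closer to v's.
        prefers = lambda v, a, b: abs(v - cands[a]) < abs(v - cands[b])
        for a in range(len(cands)):
            if all(a == b or
                   2 * sum(prefers(v, a, b) for v in voters) > len(voters)
                   for b in range(len(cands))):
                return a
        return None

    random.seed(1)
    voters = [random.gauss(0, 1) for _ in range(101)]  # odd count, so
    cands = [random.uniform(-2, 2) for _ in range(6)]  # no exact ties
    median = sorted(voters)[len(voters) // 2]
    nearest = min(range(len(cands)), key=lambda c: abs(cands[c] - median))
    print(condorcet_winner(voters, cands) == nearest)
    # True -- the median voter theorem in action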
>> It's like reinforcing a bridge that would collapse when a cat walks
>> across it, so that it no longer does so, but it still collapses when
>> a person walks across it. Cat resistance is not enough :-)
> Great analogy. We need to start assessing _how_ _resistant_ each
> method is on each "fairness" criterion.
Yes, and these fairness criteria might not even be the same sort as the
traditional criteria. They might be more vague, like "spoiler
resistance", which a method fails when the voters complain as they did
in Burlington, and which would really be a meta-category covering
things like ISDA, mutual majority, and so on.
>> It would be really useful to know what level of resistance is enough,
>> but that data is going to be hard to gather.[...]

> Indeed, that is difficult.
Perhaps one could run mock elections in some way, or a game where
candidates distribute benefits to certain groups of voters.
Polls are reasonably good at showing behavior under honesty, I think.
But one may object that they don't show adaptation to the system in
question. Both MO and David Wetzell have used arguments of this sort,
and I think there's something to it. Consider the Range polls on
rangevoting's site: they show broad engagement and real variety, with
voters using rating values besides min and max. On the other hand,
consider YouTube, which moved from Range-style voting to
Approval-style. They presumably did so because people voted only min
and max, although I don't know that for sure. If they did, it shows
that the YouTubers adapted to Range and started voting min and max.
A game or series of mock elections would have the advantage of
including that adaptation element. However, the pressure might not be
right. It could induce too much strategy (if the game is set up so
candidates can only distribute power after each election, making it
maximally patronage-like). It could also induce too little. More
generally, we wouldn't know if we hit the realistic level. There would
be no oracle we could ask that would say "yes, with these rules, the
people will engage in just as much strategy as they would in a real
election". Still, it would be better than nothing, and we might be able
to derive bounds from it. (If, in the most patronage-based, most
zero-sum variant, people still don't massively bury, then we know they
won't in a real election, since the real election will be more "kind".
Conversely, if the voters engaged in massive burial even in the most
cooperative scenario, then we know burial will be a problem in real
elections too.)
>> And beyond that we have even harder questions of how much resistance
>> is needed to get a democratic system that works well. It seems
>> reasonable to me that advanced Condorcet will do, but praxeology
>> can only go so far. If only we had actual experimental data!
> My VoteFair site collects lots of data. I have used it to verify that
> VoteFair ranking accomplishes what it was designed to do. Not only
> has such testing been useful for refining the code for the
> single-winner portion (VoteFair popularity ranking, which is
> equivalent to the Condorcet-Kemeny method), but such testing has
> revealed that VoteFair representation ranking (which can be thought
> of as a two-seats-at-a-time PR method) also works as intended.
>
> As for praxeology ("the study of human conduct"), I also watch to see
> how people try to vote strategically. The attempts are interesting,
> but ineffective.
>
> I agree that using better ballots and better vote-counting methods in
> real situations -- using real data -- is essential for making real
> progress.
Could we use the polling data to get some information about, say,
candidate variety? I think we could, at least to some extent. We could
ask something like "how many elections with more than 20 voters have no
CW?". I think you published stats like that once, but I don't remember
what the values were.
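
Given the ballots as complete rankings, the statistic itself is
straightforward to compute. This sketch assumes a list of
(ballots, candidates) pairs, which is surely not VoteFair's actual
storage format:

    def has_cw(ballots, candidates):
        # True if some candidate beats every other head-to-head;
        # ballots are complete rankings, best first.
        def beats(a, b):
            margin = sum(1 if ballot.index(a) < ballot.index(b) else -1
                         for ballot in ballots)
            return margin > 0
        return any(all(a == b or beats(a, b) for b in candidates)
                   for a in candidates)

    def cw_failure_rate(elections, min_voters=20):
        # Share of sufficiently large elections that lack a CW.
        big = [(b, c) for b, c in elections if len(b) > min_voters]
        return sum(not has_cw(b, c) for b, c in big) / len(big)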
Perhaps you could also ask the voters some time later if they were
satisfied with the choice. That kind of "later polling" could uncover
Burlington-type breakdowns if there were any. If they could rank the
options in retrospect, it would also be possible to determine whether
they would have been satisfied with, say, IRV; but I imagine that's too
much to ask.
-
[1] There are still assumptions about the input, of course. If
everybody goes on a burial spree, the Smith set may not mean what we
think it means anymore, and then ISDA would no longer hold. Same thing
with Majority Judgement and IIA: if people vote in a comparative manner
rather than grading each candidate on an absolute scale, IIA no longer
holds for that method.
[2] I have also suggested another approximation for performance, but I
haven't made code to implement it. It's the "games AI" approximation:
you take a bunch of different game AIs (say, chess programs) and run
their suggestions through a voting method, creating a "collective AI".
The better the collective AI performs, the better the method. This
could even be done in a "world champion vs. the world" type match,
where the individual "AIs" are human players. That would be a better
metric than just using AIs, since the various advisors could then make
suggestions and influence the vote in ways that might improve play.
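
The harness itself could be very simple. In this sketch, Borda is just
a placeholder for whichever method is under test, and the engines and
move lists are made up:

    from collections import defaultdict

    def collective_move(rankings):
        # Combine several engines' ranked move suggestions with a
        # Borda count; swap in the method under test here.
        scores = defaultdict(int)
        for ranking in rankings:  # each: list of moves, best first
            for place, move in enumerate(ranking):
                scores[move] += len(ranking) - 1 - place
        return max(scores, key=scores.get)

    # Three hypothetical engines ranking the same three moves:
    print(collective_move([["e4", "d4", "c4"],
                           ["d4", "e4", "c4"],
                           ["e4", "c4", "d4"]]))  # e4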