On 03/17/2013 06:32 PM, Richard Fobes wrote:
On 3/15/2013 2:12 AM, Kristofer Munsterhjelm wrote:
On 03/14/2013 06:45 PM, robert bristow-johnson wrote:
IRV will prevent a true spoiler (that is a candidate
with no viable chance of winning, but whose presence in the race changes
who the winner is) from spoiling the election, but if the "spoiler" and
the two leaders are all roughly equal going into the election, IRV can
fail and *has* failed (and Burlington 2009 is that example).

If you think about it, even Plurality is immune to spoilers... if the
spoilers are small enough. More specifically, if the "spoilers" have
less support in total than the difference in support between party
number one and two, Plurality is immune to them.
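
The margin condition above can be written as a small check. This is only a sketch: the function name and the vote totals are made up for illustration, and it assumes every voter casts exactly one Plurality vote.

```python
def minor_candidates_cannot_spoil(tally):
    """Return True if all minor candidates together have less support than
    the gap between the two leaders, so that removing any of them (and
    moving their votes anywhere) cannot change the Plurality winner.

    `tally` maps candidate name -> first-choice votes (hypothetical data).
    """
    # Order candidates by votes, highest first; names break exact ties.
    ranked = sorted(tally, key=lambda c: (-tally[c], c))
    first, second = ranked[0], ranked[1]
    margin = tally[first] - tally[second]
    minor_total = sum(tally[c] for c in ranked[2:])
    return minor_total < margin


# Margin is 8 and the minor candidate has 7 votes: no spoiler possible.
print(minor_candidates_cannot_spoil({"A": 48, "B": 40, "C": 7}))
# Margin is 2 and the minor candidate has 12 votes: C could spoil.
print(minor_candidates_cannot_spoil({"A": 45, "B": 43, "C": 12}))
```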

So instead of saying method X resists spoilers and Y doesn't, it seems
better to say that X resists larger spoilers than Y. And that raises the
question of how much spoiler-resistance you need. Plurality's result is
independent of very small spoilers. IRV's is of somewhat larger
spoilers, and Condorcet larger still (through mutual majority or
independence of Smith-dominated alternatives, depending on the method).

This is a good example of the need to _quantify_ the failure rate for
each election method for each "fairness" criterion.

Just a yes-or-no checkmark -- which is the approach in the comparison
table in the Wikipedia "Voting systems" article -- is not sufficient for
a full comparison.

Spoiler resistance is to some degree already quantified. If a method passes the majority criterion, then it's resistant to spoilers when a party or candidate has a majority. A method that passes mutual majority is resistant to spoilers outside a group that's ranked first by a majority. Independence from Smith-dominated alternatives gives resistance to spoilers not in the Smith set; and so on.
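
To make the Smith set part concrete, here is a rough sketch of how one might compute it from ranked ballots. It assumes every ballot is a complete strict ranking, the function names are my own, and the ballots are invented. The Smith set is the smallest nonempty group of candidates who each pairwise-beat everyone outside the group.

```python
from itertools import combinations


def pairwise_counts(ballots):
    """counts[(x, y)] = number of ballots ranking x above y."""
    counts = {}
    for ranking in ballots:
        # combinations() keeps ballot order, so the first item is preferred.
        for upper, lower in combinations(ranking, 2):
            counts[(upper, lower)] = counts.get((upper, lower), 0) + 1
    return counts


def beats(counts, x, y):
    return counts.get((x, y), 0) > counts.get((y, x), 0)


def smith_set(ballots):
    candidates = sorted({c for ranking in ballots for c in ranking})
    counts = pairwise_counts(ballots)
    wins = {c: sum(beats(counts, c, d) for d in candidates if d != c)
            for c in candidates}
    # A candidate with the most pairwise wins is always in the Smith set.
    smith = {max(candidates, key=lambda c: wins[c])}
    changed = True
    while changed:
        changed = False
        for x in candidates:
            # Anyone that some current member fails to beat must be included.
            if x not in smith and any(not beats(counts, s, x) for s in smith):
                smith.add(x)
                changed = True
    return smith


# A three-way cycle among A, B, C, with D ranked last by everyone.
cycle_plus_d = [["A", "B", "C", "D"], ["B", "C", "A", "D"], ["C", "A", "B", "D"]]
print(smith_set(cycle_plus_d))
```

D is outside the Smith set here, so a method passing ISDA has to give the same result whether or not D runs.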

But you have a point. In the practical view, these are only interesting insofar as they cover enough to make the method resistant to spoilers in general. That is, if an oracle told us that to get multiparty democracy where people don't think spoilers get in the way, all you need is ISDA and everything else is icing on the cake, then we wouldn't need to bother about anything more than ISDA. At least not unless the voters would find it unfair *on principle* to have something that didn't pass, say, independence from covered alternatives.

That's the three-way division I've mentioned before: performance under honesty; things that deter strategy or make it unnecessary, so we get to honesty in the first place; and consistency with itself (or, more broadly speaking, compliance with what the voters think the method must satisfy to be fair).

In all three cases, we have approximations.

Bayesian regret is an approximation to performance under honesty. It holds if you assume certain things about what performance actually means: how to do interpersonal comparisons, and utilitarianism[2].
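
As a rough illustration of what such a calculation looks like: the regret of one election is the best achievable total utility minus the total utility of the actual winner, and Bayesian regret averages that over many simulated elections. Everything specific below is an assumption made for the sketch, not anyone's published setup: utilities are independent uniform random numbers, voters are honest, totals are plain sums (the utilitarian assumption), and the method tested is Plurality.

```python
import random


def regret(utilities, winner):
    """utilities[v][c] is voter v's utility for candidate index c."""
    n_cands = len(utilities[0])
    totals = [sum(u[c] for u in utilities) for c in range(n_cands)]
    return max(totals) - totals[winner]


def honest_plurality(utilities):
    """Each voter votes for their highest-utility candidate."""
    tally = [0] * len(utilities[0])
    for u in utilities:
        tally[u.index(max(u))] += 1
    return tally.index(max(tally))  # lowest index wins exact ties


def estimate_bayesian_regret(method, n_voters=25, n_cands=4, trials=200, seed=0):
    """Average regret of `method` over randomly generated electorates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        utilities = [[rng.random() for _ in range(n_cands)]
                     for _ in range(n_voters)]
        total += regret(utilities, method(utilities))
    return total / trials


print(estimate_bayesian_regret(honest_plurality))
```

Changing the utility model or how utilities are compared across voters changes the number, which is exactly why it is only an approximation.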

Criteria are approximations to the other two. The good thing about criteria is that they provide a bound. If I prove that a method passes independence from Smith-dominated alternatives (ISDA), then it passes ISDA outright. You don't have to worry that the method passes ISDA only in cases that are irrelevant to a real election. If it passes ISDA, it passes ISDA *everywhere*[1]. And I think that's why I try to make methods that pass many criteria, because if they pass some criterion X, then I can say "done" and move on without having to quantify *where* it is passed. This saves a lot of detective work determining whether the areas where it is passed are the areas we care about.

But beyond that, you're right. The approximations are not the real things. They're proxies we use because they're easier to investigate. And a method might seem to have contradictions when you look at every possible ballot set, yet be free of them in the real world. For instance, if people voted exclusively on a left-right scale, then Condorcet always finds a CW and so passes later-no-harm, later-no-help, IIA and so on, in these cases. In that case, we could even use Borda IRV if that's what the people would prefer. The various monotonicity failures wouldn't be a problem because we'd never get to that domain. And if we had some way of knowing what level of spoiler resistance is enough (or conversely, what isn't), then we could exclude a lot of methods for either being too complex or for not passing the mark.

It's like reinforcing a bridge that would collapse when a cat walks
across it, so that it no longer does so, but it still collapses when a
person walks across it. Cat resistance is not enough :-)

Great analogy. We need to start assessing _how_ _resistant_ each method
is to failures of each "fairness" criterion.

Yes, and these fairness criteria might not even be the same sort as the traditional criteria. They might be more vague, like "spoiler resistance", which then fails when the voters complain like they did in Burlington, and which would really be a meta-category including things like ISDA, mutual majority, and so on.

It would be really useful to know what level of resistance is enough,
but that data is going to be hard to gather.[...]

Indeed, that is difficult.

Perhaps one could make mock elections in some way, or a game where candidates distribute benefits to certain groups of voters.

Polls are reasonably good at showing behavior under honesty, I think. But one may object that they don't show adaptation to the system in question. Both MO and David Wetzell have used arguments of this sort, and I think there's something to it. Consider the Range polls on rangevoting's site. These show great support and variety, and use of ratings values besides min and max. On the other hand, consider YouTube, which moved from Range-style voting to Approval style. They presumably did this because people voted min and max, although I don't know that for sure. If they did, then that shows that the YouTubers adapted to Range and started voting min and max.

A game or series of mock elections would have the advantage that it would include that adaptation element. However, the pressure might not be right. It could induce too much strategy (if the game is set up so candidates can only distribute power after each election, thus being maximally patronage-like). It could also induce too little. More generally, we wouldn't know if we hit the realistic spot. There would be no oracle we could ask that would say "yes, with these rules, the people will engage in just as much strategy as they would in a real election". Still, it would be better than nothing and we might be able to gain bounds from it. (If in the most patronage-based, most zero-sum variant, people still don't massively bury, then we know they won't in a real election, since the real election will be more "kind". Similarly, if the voters engaged in massive burial even in the most cooperative scenario, then we know that will be a problem in real elections too.)

 > And beyond that we have even harder questions of how much resistance
 > is needed to get a democratic system that works well. It seems
 > reasonable to me that advanced Condorcet will do, but praxeology
 > can only go so far. If only we had actual experimental data!

My VoteFair site collects lots of data. I have used it to verify that
VoteFair ranking accomplishes what it was designed to do. Not only has
such testing been useful for refining the code for the single-winner
portion (VoteFair popularity ranking, which is equivalent to the
Condorcet-Kemeny method), but such testing has revealed that VoteFair
representation ranking (which can be thought of as a two-seats-at-a-time
PR method) also works as intended.

As for praxeology ("the study of human conduct"), I also watch to see
how people try to vote strategically. The attempts are interesting, but
ineffective.

I agree that using better ballots and better vote-counting methods in
real situations -- using real data -- is essential for making real progress.

Could we use the polling data to get some information about, say, candidate variety? I think we could, at least to some extent. We could ask something like "how many elections with more than 20 voters have no CW?". I think you published stats like that once, but I don't remember what the values were.
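
A statistic like that is cheap to compute once the ballots are available. The sketch below is only an illustration: the function names and the toy ballots are invented, nothing here reflects the actual VoteFair data format, and candidates left off a ballot are simply not compared with anyone on that ballot.

```python
from itertools import combinations


def condorcet_winner(ballots):
    """Return the candidate who beats every other one pairwise, else None."""
    candidates = sorted({c for ranking in ballots for c in ranking})
    net = {}  # net[(x, y)] = (voters preferring x to y) - (the reverse)
    for ranking in ballots:
        for upper, lower in combinations(ranking, 2):
            net[(upper, lower)] = net.get((upper, lower), 0) + 1
            net[(lower, upper)] = net.get((lower, upper), 0) - 1
    for c in candidates:
        if all(net.get((c, d), 0) > 0 for d in candidates if d != c):
            return c
    return None


def share_without_cw(elections, min_voters=21):
    """Fraction of elections with at least `min_voters` ballots and no CW."""
    large = [b for b in elections if len(b) >= min_voters]
    if not large:
        return None
    return sum(condorcet_winner(b) is None for b in large) / len(large)


# Toy data: a 21-voter cycle, a 21-voter unanimous election, a tiny cycle.
cycle = [["A", "B", "C"]] * 7 + [["B", "C", "A"]] * 7 + [["C", "A", "B"]] * 7
unanimous = [["A", "B", "C"]] * 21
tiny = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
print(share_without_cw([cycle, unanimous, tiny]))
```

The tiny election is skipped by the "more than 20 voters" filter, so only the two large ones count.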

Perhaps you could also ask the voters some time later if they were satisfied with the choice. That kind of "later polling" could uncover Burlington-type breakdowns if there were any. If they could rank the options in retrospect, it would also be possible to determine whether they would have been satisfied with, say, IRV; but I imagine that's too much to ask.

-

[1] There are still assumptions about the input, of course. If everybody goes on a burial spree, the Smith set may not mean what we think it means anymore, and then ISDA would no longer hold. Same thing with Majority Judgement and IIA. If people vote in a comparative manner, IIA no longer holds for it.

[2] I have also suggested another approximation for performance, but I haven't made code to implement it. It's the "games AI" approximation: you take a bunch of different games AIs (say, chess programs) and run their suggestions through a voting method, creating a "collective AI". The better the performance of the collective AI, the better the method. This could even be done in a "world champion vs the world" type match, where the individual "AIs" are human players. This would be a better metric than just using AIs, since then the various advisors could make suggestions and thus influence the vote in a way that might improve play.

----
Election-Methods mailing list - see http://electorama.com/em for list info
