Hello all,
(says the newcomer.)
I think I have found a metric for comparing multiwinner systems, at
least as these pertain to proportional representation, when all votes
are honest.
The advantage of the metric is that, if what it measures is desirable,
it gives an idea of how well the system performs - how representative
it is - and thus its best case performance. In contrast, criterion
failure shows how bad a system can get in the worst case.
The broad idea is this: The most proportional assembly is the one
which reflects the population on all issues. In other words, if a
fraction p of the population is of a certain position on a binary
opinion, it is better (ceteris paribus) for a council to have, of that
opinion, a fraction close to p than one far away from it.
Thus we could make a simulation. First, set that there are n binary
issues. Each voter then has an issue profile which consists
of n booleans. Set these randomly with different biases for each issue
(so that, for instance, on the first issue, 70% may hold the "true"
position, while on another, only 23% do).
Counting the proportion that hold the true-position for each issue
gives the popular issue profile. In general, the issue profile of a
certain subset takes the form of n numbers (for n issues), where each
number is equal to the proportion that holds the true-position for the
issue in question.
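
As a rough sketch of the above (the function names, the population
size, and the third bias value are my own illustrative assumptions,
not anything fixed by the idea itself), generating the voters and
computing an issue profile could look like this:

```python
import random

def make_population(num_voters, biases, rng):
    """One list of booleans per voter; biases[i] is the chance that a
    voter holds the true-position on issue i."""
    return [[rng.random() < b for b in biases] for _ in range(num_voters)]

def issue_profile(group):
    """Fraction of the group holding the true-position on each issue."""
    num_issues = len(group[0])
    return [sum(member[i] for member in group) / len(group)
            for i in range(num_issues)]

rng = random.Random(1)           # fixed seed, only for repeatability
biases = [0.70, 0.23, 0.50]      # 70% and 23% as in the example; 50% is made up
voters = make_population(1000, biases, rng)
popular_profile = issue_profile(voters)
```

The same issue_profile function works for any subset, so it can be
applied unchanged to a council.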
Then a perfectly representative assembly has an issue profile that is
equal to the issue profile of the people. So now we have a measure of
how well the assembly or council represents the people: the more its
issue profile differs from that of the people, the less representative
it is.
However, this presents a problem. How does one aggregate the
difference on each issue into a single score? Is a one-percent
difference on a single issue better than 1/n percent difference on all
issues? One way to solve this is to just settle on an aggregation
measure (like root-mean-square) and hope the results generalize to
other measures; another is to use Pareto domination instead, saying
that councils produced by a method A are better than councils produced
by a method B to the extent that A-councils lie strictly closer to the
population profile than B-councils do. That approach can give no
information on the cases where some issues are closer by method A and
some are closer by B (mutual nondomination).
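
To make the two options concrete, here is a small sketch. The names
are hypothetical, and I am assuming the usual reading of Pareto
domination: no farther from the population on any issue, and strictly
closer on at least one.

```python
import math

def rmse(profile, reference):
    """Root-mean-square difference between two issue profiles."""
    squared = [(p - r) ** 2 for p, r in zip(profile, reference)]
    return math.sqrt(sum(squared) / len(squared))

def pareto_dominates(a, b, reference):
    """True if profile a is no farther from the reference than b on
    every issue, and strictly closer on at least one (assumed reading)."""
    err_a = [abs(x - r) for x, r in zip(a, reference)]
    err_b = [abs(x - r) for x, r in zip(b, reference)]
    no_worse = all(ea <= eb for ea, eb in zip(err_a, err_b))
    strictly_better = any(ea < eb for ea, eb in zip(err_a, err_b))
    return no_worse and strictly_better

population = [0.70, 0.23]   # made-up two-issue popular profile
council_a = [0.65, 0.25]
council_b = [0.80, 0.40]
council_c = [0.70, 0.50]

a_beats_b = pareto_dominates(council_a, council_b, population)
# a and c each win on one issue, so neither dominates the other.
a_c_nondominated = (not pareto_dominates(council_a, council_c, population)
                    and not pareto_dominates(council_c, council_a, population))
```

As for the question of one issue versus all issues: under RMSE, an
error of d on a single issue scores d/sqrt(n), while an error of d/n
on every issue scores d/n, so this particular measure favors spreading
the error out.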
Putting all of the pieces together, to figure out the scores, a
simulation would do something like this:
    - Generate issue vectors for all of the people, and get the
      popular issue profile.
    - Choose a subset of the people as candidates.
    - Generate a ballot for each voter, covering all the candidates.
    - For a great number of random assemblies:
        - Get the issue profile of this assembly, and calculate
          the similarity measure for that with regards to
          the popular issue profile.
        - If this assembly is more similar or less similar to
          the population than any random assembly we've seen so
          far, update the best (respectively worst) record.
    - For each multiwinner election system:
        - Feed the ballots into the system.
        - Get the issue profile of the elected assembly, and
          calculate the similarity measure for that with
          regards to the popular issue profile.
        - Normalize the similarity measure with regards to the
          worst and best random councils.
        - Add the normalized similarity measure to that system's
          running total.
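
The random-assembly and normalization steps might be sketched as
follows. This is only an illustration under my own assumptions: the
linear rescaling in normalized_score is one plausible reading of
"normalize", the sizes are arbitrary, and the "elected" council is a
stand-in rather than the output of any real election method.

```python
import math
import random

def issue_profile(group):
    num_issues = len(group[0])
    return [sum(m[i] for m in group) / len(group) for i in range(num_issues)]

def rmse(profile, reference):
    squared = [(p - r) ** 2 for p, r in zip(profile, reference)]
    return math.sqrt(sum(squared) / len(squared))

def random_assembly_bounds(candidates, seats, popular, trials, rng):
    """Best (lowest) and worst (highest) RMSE over random councils."""
    best, worst = math.inf, -math.inf
    for _ in range(trials):
        council = rng.sample(candidates, seats)
        score = rmse(issue_profile(council), popular)
        best = min(best, score)
        worst = max(worst, score)
    return best, worst

def normalized_score(score, best, worst):
    """Assumed normalization: 0 at the best random council, 1 at the worst."""
    if worst == best:
        return 0.0
    return (score - best) / (worst - best)

rng = random.Random(7)
voters = [[rng.random() < b for b in (0.70, 0.23, 0.50)] for _ in range(500)]
candidates = rng.sample(voters, 20)
popular = issue_profile(voters)
best, worst = random_assembly_bounds(candidates, 5, popular, 2000, rng)

elected = candidates[:5]   # stand-in for a real method's council
result = normalized_score(rmse(issue_profile(elected), popular), best, worst)
```

With this rescaling, a method that beats every sampled random council
would score below 0, and one that does worse than all of them would
score above 1.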
To be robust, it would do this a lot of times with various population
sizes, council sizes, and issue numbers (n). Since the measure is
really a distance, 0 would be perfect (impossible most of the time),
and 1 (or infinity, depending on the measure) would be the worst
possible.
The only thing remaining is to find out how to generate ballots for
each voter. A reasonable assumption is that voters are going to prefer
the candidates who agree with them on many issues to those that agree
with them on a few. For binary issues, Hamming distance works: in the
simple model, voters rank (or rate) the candidates in order of
increasing Hamming distance from their own issue profile.
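
A minimal sketch of such a ballot generator, assuming strict rankings
with ties between equally distant candidates broken at random (the
function names are mine):

```python
import random

def hamming(a, b):
    """Number of issues on which two issue profiles disagree."""
    return sum(x != y for x, y in zip(a, b))

def ranked_ballot(voter, candidates, rng):
    """Candidate indices ordered from closest to farthest."""
    order = list(range(len(candidates)))
    rng.shuffle(order)   # random tie-breaking: the sort below is stable
    order.sort(key=lambda c: hamming(voter, candidates[c]))
    return order

rng = random.Random(3)
voter = [True, False, True]
candidates = [
    [False, True, False],   # disagrees on all three issues
    [True, False, True],    # agrees on everything
    [True, True, True],     # disagrees on one issue
]
ballot = ranked_ballot(voter, candidates, rng)
```

For a rated ballot, one could instead give each candidate a rating
such as n minus the Hamming distance.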
--
I have made a program that does this. It is simple and does not use
equal ranks (randomizing tied preferences instead), but the results
are interesting.
Worst of the lot are the majoritarian systems ported to multiwinner
systems. Those would, for a council of size k, just pick the k first
in the social order of the single-winner method. This result shouldn't
be surprising, because the straight port excludes minority opinion. Of
some curiosity, however, is that IRV does the best among those; maybe
it reflects IRV's origins in the multiwinner method STV? Or maybe
noise (resulting from nonmonotonicity and the like) brings it
closer to the results gained by just picking a random assembly.
Then come the vote-reweighted methods, like RRV. Vote-reweighted
methods can be generalized as: run a single-winner method, then
reweight those who voted for the winner, according to some function
that does not take the number of seats into account. Then run again,
and disregarding those that have already been elected, pick the next
member as the one who is closest to the top in the social ordering
output.
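
To illustrate the general shape, here is a sketch using rated ballots
and a reweighting function in the style of RRV, weight = k / (k + sum
of the voter's ratings of those already elected / maximum rating).
That formula, the function name, and the toy ballots are assumptions
for the sake of the example and need not match what my program does.

```python
def reweighted_election(ratings, seats, k=1.0, max_rating=1.0):
    """ratings[v][c] is voter v's rating of candidate c.
    Returns the elected candidate indices in order of election."""
    num_cands = len(ratings[0])
    weights = [1.0] * len(ratings)
    elected = []
    while len(elected) < seats:
        # Single-winner step: weighted rating totals of the unelected.
        totals = {c: sum(w * r[c] for w, r in zip(weights, ratings))
                  for c in range(num_cands) if c not in elected}
        elected.append(max(totals, key=totals.get))
        # Reweighting step: depends only on each voter's support for
        # those already elected, never on the number of seats.
        weights = [k / (k + sum(r[c] for c in elected) / max_rating)
                   for r in ratings]
    return elected

# Toy example: three voters back candidates 0 and 1, two back candidate 2.
ratings = [[1.0, 0.9, 0.0]] * 3 + [[0.0, 0.0, 1.0]] * 2
council = reweighted_election(ratings, seats=2)
```

In this toy case the first seat goes to candidate 0; the three voters
who supported it are then halved in weight, so the second seat goes to
candidate 2 rather than to candidate 1.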
Best of all were the "proper" methods implemented: STV (with
Senatorial rules) and QLTD-PR, which uses Woodall's QLTD instead of
IRV as its basis: it adds fractional votes until someone gets above
the quota, then reweights the voters who contributed to that one,
basing the weighting on the candidate's surplus.
According to the RMSE scores:

Majoritarian assemblies:
     Borda:              0.871528
    *Plurality:          0.256192
     Antiplurality:      0.73616
     Nauru-Borda:        0.599807
     IRV:                0.362097
     Cardinal ratings:   0.894351

Vote-reweighted assemblies:
     Borda:              0.376745
    *Plurality:          0.260454
     Antiplurality:      0.401539
     Nauru-Borda:        0.406815
     RRV (k = 1.0):      0.682116
     RRV (k = 0.5):      0.644339

Quota:
    *STV:                0.193959
    *QLTD-PR:            0.121693
     QLTD-PR (rated):    0.417813

Other:
     Random Cands:       0.364437

STV-QLTD Pareto dominance: QLTD: 236, STV: 237, nondomin: 674

"Plurality" is the weighted positional system of {1, 0, 0....}
applied to ranked ballots.
(* marks those that are better than a random assembly, on
average)
Some of the results may be due to artifacts in the voting pattern -
the simulator was a proof of concept, after all. I think that
Plurality benefits from everyone voting sincerely and from the
ballots being complete, for instance. Yet patterns emerge.
If anyone wants to experiment with the simulation program, it is here:
http://munsterhjelm.no/km/raw/pr_elect.zip . QLTD is called "Quota
Bucklin" there, as I sort of independently discovered it while trying
to make a quota-proportional form of Bucklin.
--
On second thought, it shouldn't be so surprising that
vote-reweighted methods, in general, do worse than quota-based ones.
Consider the following situation:
    20: Left > Center > Right
    20: Right > Center > Left
     1: Center > Left = Right
Condorcet would pick Center in the single-winner case. In the
situation of an assembly of two, the reasonable choice (which CPO-STV
picks) would be Left and Right.
However, vote-reweighted methods based on Condorcet would have to
start off by picking Center, since all voters start off with equal
weights. After it has done so, only one seat remains, so the assembly
cannot be divided evenly between Left and Right, and thus either
Left or Right will be favored, assuming Center supports both sides
equally.
Vote-reweighted methods that aren't based on Condorcet may pick Left
and Right, but they can only do so if they would pick either Left or
Right in the single-winner case.
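
The claim that Condorcet picks Center in this example can be checked
with a short pairwise count. This is just a sketch; the ballot
encoding (rank numbers, with equal numbers for Left = Right) is my
own choice.

```python
# Ballots as (count, ranks); a lower rank number means more preferred.
ballots = [
    (20, {"Left": 1, "Center": 2, "Right": 3}),
    (20, {"Right": 1, "Center": 2, "Left": 3}),
    (1,  {"Center": 1, "Left": 2, "Right": 2}),
]

def pairwise_support(ballots, a, b):
    """Number of voters who strictly prefer a to b."""
    return sum(count for count, ranks in ballots if ranks[a] < ranks[b])

def condorcet_winner(ballots, candidates):
    """The candidate who beats every other one-on-one, or None."""
    for c in candidates:
        if all(pairwise_support(ballots, c, other) >
               pairwise_support(ballots, other, c)
               for other in candidates if other != c):
            return c
    return None

candidates = ["Left", "Center", "Right"]
winner = condorcet_winner(ballots, candidates)
center_vs_left = (pairwise_support(ballots, "Center", "Left"),
                  pairwise_support(ballots, "Left", "Center"))
```

Center beats each wing 21 to 20, while Left and Right tie each other
20 to 20, so a Condorcet-based reweighted method necessarily spends
its first seat on Center.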
----