There's been some talk lately of publishing good reference implementations of
all the election methods we talk about in various forms of description. This
inspired me to write up my thoughts on programming election methods. It's in
the body of the message below, and also posted as a Google Doc
https://docs.google.com/document/d/1QZqBNVYJuNIIaFxE7IUB4nPZhodFHJu2JOPNy7oBiF8/edit?hl=en_US
--
Some thoughts I’ve had over the years of writing election code for simulations
or counting of real votes.
I assume an object or context of some sort. Usually this works out to the two
minimum basic functions:
ElectionObject.vote( data )
ElectionObject.getResults()
The vote() Function
voteSN( map of strings to numbers )
voteNN( map of numbers to numbers )
voteA( array of numbers )
The simplest and most unambiguous vote function API is to take a mapping from
strings (choice names) to numbers (rankings or ratings). The API should be
unambiguous about whether the numbers are rankings (1 best, 2 ok, 3 lesser,
etc) or ratings (high better). I usually name functions “voteRatings” or
“voteRankings” to reinforce this.
For efficiency of storage this often internally calls a voteNN function after
mapping choice names to choice index numbers.
Another benefit of this API is that partial ballots completion is very easy to
represent. A choice not marked on the ballot gets no membership in the mapping
passed to the vote() function. There might be A through F on the ballot, but my
vote might only be {A: 1, C: 2, E: 3}
Voting a string map also makes write-ins automatically represented. On the down
side, I have had to rearrange some election method algorithms to be able to use
this API and transparently accept write-in choices. Curious readers should read
my Condorcet implementation which grows the NxN tally array as choices are
encountered.
Often the simplest vote() function to implement is one that simply takes an
array of numbers. The fixed length array corresponds to a fixed list of
choices. I have used this form of implementation in C code for simulation of
thousands of elections per second, counting hundreds of thousands or millions
of votes per second.
Getting Results
getWinner()
getResults()
textResults()
htmlResults()
htmlExplain()
If code needs to do something based on the winner, then there should be code to
return the winner(s). Mostly I’ve needed this in simulations where I need to
count up statistics on who won and their characteristics, or to fill in a pixel
in a Yee Election Space diagram.
Displaying results for people to look at is simpler. Just present the available
data in a reasonable way. There is usually a simple table of candidate names
and a score or count of votes they got. This can be returned programattically
as a map from names to numbers and rendered by other code. For my uses I’ve
found it useful to have each election method know how to render its own output
to ascii-art tables or HTML tables. Furthermore an ‘explain’ method is a good
idea. Multi-round methods such as IRV should show the internal state on each
round. Condorcet can show the NxN tally table.
The getResults() function might do the final calculation on whatever votes have
been submitted by vote() up to that point. I write it so that it caches the
results inside the context object. Calls to vote() clear that cache. Repeated
calls to getResults() and related methods use the cached version if available.
Adavced Methods: Summability
It’s possible to have a ‘summable’ method where votes could be added up into
many groups and those group results further added together to get the final
result. Condorcet, Approval, Plurality and others are this way. They don’t need
to have the full set of ballot data in the end, just a summation will do. The
method to achieve this would look like this:
ElectionObject.addSubElection(ElectionObject sub)
On the down side, it’s hard to implement this and get it right for some
methods, and it’s not really needed. Any modern computer could hold all the
votes for every person on earth. There is no election too big. So, it’s a neat
trick, and mostly interesting to election algorithm nerds, but I probably won’t
be implementing this more than I have so far … until the next time it would
just be too cool and needs doing.
Data Formats
To go with my preferred representation of a vote as a mapping between choice
names and numbers, there are a couple good ways of storing that kind of data.
My favorite by far is URL encoded. nameA=number1&nameB=number2&nameC=number3
On the up side, it is unambiguous and very hard to misinterpret. (The only
possible misinterpretation is whether the numbers are ratings or rankings.) On
the down side, it repeats the choice name a great deal and wastes some space.
This can be mitigated by any common compression software such as gzip/zlib,
bzip2 or zip. They all do a very good job of compressing frequently repeated
substrings. Another option for compressing is to include an index of names and
votes lookup into that index.
index line: NameA=1&nameB=2&nameC=3
vote line: 1=1&2=2&3=3
URL encoding is well understood and there are high quality libraries for it in
every modern language.
Another easy format is comma separated value, or CSV. This is a common
import/export format of spreadsheets and databases. There is one header row of
choice names "name a,name b,name c", and then each vote row is just the
ranking/rating values accordingly, "1,2,3". Non-voting for a choice has to be
represented by an empty cell, ",,1"
I know Python has a good CSV library and most other languages should have
something too.
There are some formats exported by actually deployed election systems used by
places like the cities of San Francisco, CA and Burlington, VT. They are
horrendously inefficient and inconvenient to parse, and I wind up writing a
Python script to import from them and export URL encoded votes for my own
software to use.
----
Election-Methods mailing list - see http://electorama.com/em for list info