To answer my own question: I think the attached Perl script shows the difference between standard deviation and the Gini coefficient nicely. Its output:

data: 1, 2, 3, 4
std: 1.29099444873581
gini: 0.25

data: 1, 1, 1, 9
std: 4
gini: 0.5

data: 1, 1, 1, 999
std: 499
gini: 0.747005988023952

data: 1, 1, 1, 1, 1, 1, 1, 1, 1, 999
std: 315.595310484804
gini: 0.891071428571429

data: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 999
std: 188.604272031604
gini: 0.93796992481203

A higher "Gini Inequality Coefficient" is bad, and a lower standard deviation should be good, but for the last two data sets above I agree with Gini more than with the standard deviation. In those, the standard deviation falls as more data points agree with each other, while the Gini number rises to show that more people are being left far behind the top-end number.
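
A short derivation may make that trend explicit. It is not part of the script; the symbols n and T are introduced here only for illustration. Assume a data set of n values in which n - 1 values equal 1 and a single value equals T, and use the same definitions as the script (sample standard deviation with divisor n - 1, Gini as the sum of absolute differences over pairs i < j divided by n times the total):

```latex
\begin{align*}
\bar{x} &= \frac{n - 1 + T}{n} \\
s &= \sqrt{\frac{1}{n-1}\sum_{i}\left(x_i - \bar{x}\right)^2}
   = \sqrt{\frac{1}{n-1}\cdot\frac{(n-1)(T-1)^2}{n}}
   = \frac{T - 1}{\sqrt{n}} \\
G &= \frac{\sum_{i<j}\lvert x_i - x_j\rvert}{n\sum_{i} x_i}
   = \frac{(n-1)(T-1)}{n\,(n - 1 + T)}
\end{align*}
```

With T = 999 and n = 4, 10, 28 this gives s of about 499, 315.6, 188.6 and G of about 0.747, 0.891, 0.938, which matches the output above: over these sizes s shrinks like 1/sqrt(n) while G keeps climbing.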

Now to check my C code and get this implemented in my simulator.

Brian Olson
http://bolson.org/
#!/usr/bin/perl -w

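# Sample standard deviation (divides by n - 1, not n).
# Needs at least two values; a single value would divide by zero.
sub stdev(@) {
  my @v = @_;
  my $n = scalar @v;
  # Mean of the values.
  my $avg = 0.0;
  $avg += $_ for @v;
  $avg /= $n;
  # Sum of squared deviations from the mean.
  my $var = 0.0;
  for my $x (@v) {
    my $d = $avg - $x;
    $var += $d * $d;
  }
  return sqrt( $var / ( $n - 1 ) );
}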

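# Gini coefficient: sum of |v[i] - v[j]| over all pairs i < j,
# divided by (n * sum of values). Counting each pair once is why
# there is no factor of 2 in the denominator.
# Assumes the values are non-negative and not all zero.
sub gini(@) {
  my @v = @_;
  my $sum = 0.0;  # total of all values
  my $gs = 0.0;   # accumulated pairwise absolute differences
  for my $i ( 0 .. $#v ) {
    $sum += $v[$i];
    for my $j ( $i + 1 .. $#v ) {
      $gs += abs( $v[$i] - $v[$j] );
    }
  }
  return $gs / ( scalar(@v) * $sum );
}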

# Test data sets: one even spread, then one outlier among more and more 1s.
my @data = (
  [1, 2, 3, 4],
  [1, 1, 1, 9],
  [1, 1, 1, 999],
  [ (1) x 9, 999 ],
  [ (1) x 27, 999 ],
);

foreach my $x ( @data ) {
  my @xa = @{$x};
  print "data: " . join( ", ", @xa ) . "\n";
  print "std: " . stdev(@xa) . "\n";
  print "gini: " . gini(@xa) . "\n";
  print "\n";
}

----
election-methods mailing list - see http://electorama.com/em for list info