Kai Krakow <hurikha...@gmail.com> schrieb:

> Matti Nykyri <matti.nyk...@iki.fi> schrieb:
> 
>> If you are looking a mathematically perfect solution there is a simple
>> one even if your list is not in the power of 2! Take 6 bits at a time of
>> the random data. If the result is 62 or 63 you will discard the data and
>> get the next 6 bits. This selectively modifies the random data but keeps
>> the probabilities in correct balance. Now the probability for index of
>> 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0.
> 
> Why not do just something like this?
> 
> index = 0;
> while (true) {
>   index = (index + get_6bit_random()) % 62;
>   output << char_array[index];
> }
> 
> Done, no bits wasted. Should have perfect distribution also. We also don't
> have to throw away random data just to stay within unaligned boundaries.
> The unalignment is being taken over into the next loop so the "error"
> corrects itself over time (it becomes distributed over the whole set).

It is worth noting that my approach has the tendency of generating random 
characters in sequence. I think there are several possibilities to fix this, 
one being swapping two elements in the char array on each access. That fixed 
this particular problem in my tests. Here's are quick ruby snippet I played 
with:

$ cat random.rb
# danger: ugly code ahead 
dist = [0] * 62
seq = [0, 0]
chars = [*('A'..'Z'), *('a'..'z'), *('0'..'9')]
orig_chars = chars.dup

index = 0
old = nil
(62 * 10).times do
  # generate random wrapped index
  index = (index + rand(64)) % 62

  # swap two indexed positions and output the char
  chars[0], chars[index] = chars[index], chars[0]
  $stdout.write chars[0]

  # count the generated indexes
  dist[index] = dist[index] + 1

  # count if indexes were generated in sequence by looking
  # them up in the original sorted chars array
  unless old.nil?
    seq[0] = seq[0] + 1 if old < orig_chars.index(chars[0])
    seq[1] = seq[1] + 1 if old > orig_chars.index(chars[0])
  end
  old = orig_chars.index(chars[0])
end
puts

# check the average, it should always equal the loop count
# multiplicator
avg = dist.inject(:+) / 62.0

# calculate if the algorithm preferred sequences
seq = [*seq, seq[0] - seq[1]]

# output some stats
puts avg
puts dist.inspect
puts dist.map {|v| avg - v }.inspect
puts seq.inspect


-- 
Replies to list only preferred.


Reply via email to