Re: random shuffles

2006-07-26 Thread Boris Borcic
I wrote :
> 
> ...in this sense, it is clear that quicksort for instance is optimal*
> 
> It is easy to see, when you detail this algorithm, that never during its 
> run is the result of a comparison it makes, preordained by comparisons 
> already made; iow : it doesn't make superfluous or redundant comparisons.

The above is true, but...

While an algorithm may never call for a comparison that has a result 
predetermined by previous comparisons, it doesn't follow that the property 
would 
hold true for the exact same comparisons done in a different order.

In particular, any sort algorithm must have compared successive items to be 
able 
to conclude to their order, so that any system of comparisons that allows to 
sort a dataset of n items, must contain the strictly minimal system of n-1 
comparisons of successive items.

This doesn't change the conclusion that an algorithm that never does a 
comparison with a result that could be deduced from previous comparisons, will 
only sample a random comparison function up to a system of comparisons that's 
consistent with an order relation between the items.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-24 Thread Boris Borcic
Paul Rubin wrote:
> Boris Borcic <[EMAIL PROTECTED]> writes:
>> To be more convincing... assume the algorithm is optimal and calls
> 
> That assumption is not even slightly realistic.  Real-world sorting
> algorithms almost never do a precisely minimal amount of comparison.

'optimal' or 'minimal amount of comparisons' does not mean here to make just 
the 
right n-1 comparisons between successive items to verify the order of n items, 
but rather that no strict subset of the comparisons made to sort some data is 
sufficient to sort that data.

IOW "minimal" in "minimal amount of comparisons" refers not to the total 
ordering by size of the sets of comparisons made, but to the partial ordering 
by 
set inclusion of these sets.

And in this sense, it is clear that quicksort for instance is optimal*

It is easy to see, when you detail this algorithm, that never during its run is 
the result of a comparison it makes, preordained by comparisons already made; 
iow : it doesn't make superfluous or redundant comparisons.

Do you mean quicksort-like algorithms aren't real-world ?

Best, BB
--

*assuming a variant, like the following for illustration, that doesn't waste 
info on equal items.

def qsort(items,cmp) :
 if len(items)<2 : return items
 pivot = items[0]
 divide = [[pivot],[],[]]
 for item in items[1:] :
 divide[cmp(item,pivot)].append(item)
 return qsort(divide[-1],cmp)+divide[0]+qsort(divide[1],cmp)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-24 Thread Ben Sizer
Ross Ridge wrote:
> David G. Wonnacott wrote:
> > Couldn't we easily get an n*log(n) shuffle...
>
> Why are you trying to get an O(n*log(n)) shuffle when an O(n) shuffle
> algorithim is well known and implemented in Python as random.shuffle()?

I think David is referring to this: "don't you still need to use
O(log(n)) time to find and remove the item from the collection?"

The answer for him is no: as far as I know, the Python list is a
random-access structure, so looking up 2 items and swapping them runs
in constant time. You perform that N times to shuffle the sequence, so
it runs in O(N).

-- 
Ben Sizer

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-22 Thread Ross Ridge
David G. Wonnacott wrote:
> Couldn't we easily get an n*log(n) shuffle...

Why are you trying to get an O(n*log(n)) shuffle when an O(n) shuffle
algorithim is well known and implemented in Python as random.shuffle()?

 Ross Ridge

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-22 Thread danielx
David G. Wonnacott wrote:
> From: "danielx" <[EMAIL PROTECTED]>
>Date: 22 Jul 2006 01:43:30 -0700
>
>Boris Borcic wrote:
>> does
>>
>> x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
>>
>> pick a random shuffle of x with uniform distribution ?
>
> ...
>
>Let e be the element which was in the first position to begin with.
>What are its chances of being there after being "sorted"? Well, in
>order for it to still be there, it would have to survive N rounds of
>selection. In each selection, it has 1/2 chance of surviving. That
>means its total chance of survival is 1/(2**N), which is much less than
>1/N. QED!
>
>
> This proof makes sense to me if the sorting algorithm makes a random
> decision every time it swaps.
>
> Couldn't we easily get an n*log(n) shuffle by performing a _single_
> mapping of each element in the collection to a pair made up of a
> random key and its value, and then sorting based on the keys and
> stripping them out? i.e., in O(n) time, turn
>
>"2 clubs", "2 diamonds", "2 hearts", "2 spades", "3 clubs"
>
> into something like
>
>[0.395, "2 clubs"], [0.158, "2 diamonds"], [0.432, "2 hearts"], [0.192, "2 
> spades"], [0.266, "3 clubs"]
>
> and then in O(n*log(n)) time, sort it into
>
>[0.158, "2 diamonds"], [0.192, "2 spades"], [0.266, "3 clubs"], [0.395, "2 
> clubs"], [0.432, "2 hearts"]
>
> and then strip out the keys again in O(n) time?
>
> Or perhaps this has been discussed already and I missed that part? I
> just joined the list... apologies if this is a repeat.
>

Yes, that would work beautifully. You could use the key argument of
list.sort. What we are talking about though, is using the cmp argument.
I.e. will list.sort(cmp=lambda:???) be able to give you a shuffle? I
think most of us think this depends on what sort algorithm Python uses.

>
>
>You can accomplish this by doing what I will call "random selection".
>It would be like linear selection, except you wouldn't bother checking
>the head against every other element. When you want to figure out what
>to put at the head, just pick at random! I suspect this is what Python
>does in random.shuffle, since it is rather an easy to see it would
>result in something uniform (I swear I haven't looked at the source
>code for random.shuffle :P).
>
>
> But, after making the linear selection, don't you still need to use
> O(log(n)) time to find and remove the item from the collection? I
> don't know much about Python's internal data structures, but if
> there's an array in there, you need O(n) time to move up everything
> after the deleted item; if there's a list, you need O(n) time to find
> the element you picked at random; if there's a balanced tree, you
> could find it and delete it in O(log(n)) time...

I'm almost sure there's a C array back there as well (well, not
techinically an array, but something from malloc), because indexing a
Python list is supposed to be "fast". It would take constant time to
put something at the head. Just swap with the thing that's already
there. Since you have to do this swap for each position, it takes time
proportional to N.

When I originally came up with this idea, I was thinking you would not
choose a new head among previously chosen heads. But if you do allow
that, I think it will still be uniform. So my original idea was
something like this:

  1  2  3  4
# ^ head pos
stage 0: we need to choose something to put in pos 0. We can choose
anything to go there.

(swap)
  3  2  1  4
#^ head pos
stage 1: we need to choose something to put in pos 1. We can choose
among things in positions greater than or equal to 1 ie, we may choose
among 2 1 4.

etc.

This is what I meant when I said this would be like linear selection,
because once something is in its correct position, it doesn't get
moved. But you might also be able to achieve a uniform sort if in
stages 1 and up, you are still allowed to choose anything you want to
be the head. I'll let someone else figure it out :P.

>
> Help me review my undergraduate data structures here -- is there some
> way to pick each item _exactly_once_ in O(n) time?

I think I answered this in the first segment of this post. Let me know
if I don't seem clear.

> 
> Dave Wonnacott

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-22 Thread David G. Wonnacott
   From: "danielx" <[EMAIL PROTECTED]>
   Date: 22 Jul 2006 01:43:30 -0700

   Boris Borcic wrote:
   > does
   >
   > x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
   >
   > pick a random shuffle of x with uniform distribution ?

...

   Let e be the element which was in the first position to begin with.
   What are its chances of being there after being "sorted"? Well, in
   order for it to still be there, it would have to survive N rounds of
   selection. In each selection, it has 1/2 chance of surviving. That
   means its total chance of survival is 1/(2**N), which is much less than
   1/N. QED!


This proof makes sense to me if the sorting algorithm makes a random
decision every time it swaps.

Couldn't we easily get an n*log(n) shuffle by performing a _single_
mapping of each element in the collection to a pair made up of a
random key and its value, and then sorting based on the keys and
stripping them out? i.e., in O(n) time, turn

   "2 clubs", "2 diamonds", "2 hearts", "2 spades", "3 clubs"

into something like

   [0.395, "2 clubs"], [0.158, "2 diamonds"], [0.432, "2 hearts"], [0.192, "2 
spades"], [0.266, "3 clubs"]

and then in O(n*log(n)) time, sort it into

   [0.158, "2 diamonds"], [0.192, "2 spades"], [0.266, "3 clubs"], [0.395, "2 
clubs"], [0.432, "2 hearts"]

and then strip out the keys again in O(n) time?

Or perhaps this has been discussed already and I missed that part? I
just joined the list... apologies if this is a repeat.



   You can accomplish this by doing what I will call "random selection".
   It would be like linear selection, except you wouldn't bother checking
   the head against every other element. When you want to figure out what
   to put at the head, just pick at random! I suspect this is what Python
   does in random.shuffle, since it is rather an easy to see it would
   result in something uniform (I swear I haven't looked at the source
   code for random.shuffle :P).


But, after making the linear selection, don't you still need to use
O(log(n)) time to find and remove the item from the collection? I
don't know much about Python's internal data structures, but if
there's an array in there, you need O(n) time to move up everything
after the deleted item; if there's a list, you need O(n) time to find
the element you picked at random; if there's a balanced tree, you
could find it and delete it in O(log(n)) time...

Help me review my undergraduate data structures here -- is there some
way to pick each item _exactly_once_ in O(n) time?

Dave Wonnacott

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-22 Thread danielx
Boris Borcic wrote:
> does
>
> x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
>
> pick a random shuffle of x with uniform distribution ?
>
> Intuitively, assuming list.sort() does a minimal number of comparisons to
> achieve the sort, I'd say the answer is yes. But I don't feel quite 
> confortable
> with the intuition... can anyone think of a more solid argumentation ?
>
> - BB
> --
> "On naît tous les mètres du même monde"

Someone has already said this, but I'm not sure we can ignore exactly
what algorithm we are using. With that in mind, I'll just arbitrarily
choose an algorithm to use for analysis. I know you want something that
works in N*log(N) where N is the length of the list, but I'm going to
ignore that and consider selection sort for the sake of a "more solid
argument".

In that case, you would NOT achieve a "uniform" distribution. I quote
that because I'm going to make up a definition, which I hope
corresponds to the official one ;). To me, uniform will mean if we look
at any position in the list, element e has 1/N chance of being there.

Let e be the element which was in the first position to begin with.
What are its chances of being there after being "sorted"? Well, in
order for it to still be there, it would have to survive N rounds of
selection. In each selection, it has 1/2 chance of surviving. That
means its total chance of survival is 1/(2**N), which is much less than
1/N. QED!

***

After writting that down, I thought of an argument for an N*log(N)
algorithm, which would cause the resulting list to be uniformly random:
tournament sort (I think this is also called binary tree sort). How
many rounds does an element have to win in order to come out on top? A
number which approaches log2(N). This is like before, except this
element doesn't have to survive as many rounds; therefore, it's total
chance of coming out on top is 1/(2**log2(N)) ==  1/N. Hoorah!

***

After considering that, I realized that even if your idea to shuffle a
list did work (can't tell because I don't know how Python's sort
works), it would not be an optimal way to shuffle a list even though
Python uses an N*log(N) sort (at least I hope so :P). This is because
you can shuffle in time proportional the the length of the list.

You can accomplish this by doing what I will call "random selection".
It would be like linear selection, except you wouldn't bother checking
the head against every other element. When you want to figure out what
to put at the head, just pick at random! I suspect this is what Python
does in random.shuffle, since it is rather an easy to see it would
result in something uniform (I swear I haven't looked at the source
code for random.shuffle :P).

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-21 Thread bryanjugglercryptographer

Boris Borcic wrote:
> does
>
> x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
>
> pick a random shuffle of x with uniform distribution ?
>
> Intuitively, assuming list.sort() does a minimal number of comparisons to
> achieve the sort, I'd say the answer is yes.

You would be mistaken (except for the trivial case of 2 elements).

In m uniform, independant, random 2-way choices, there are 2**m
equally probable outcomes. We can map multiple random outcomes
to the same final output, but each will still have probability of the
form
n/2**m, where n is an integer.

A random permutation requires that we generate outputs with
probability 1/(n!). For n>2, we cannot reach the probability using a
limited number of 2-way choices.


Have you ever looked at the problem of making a perfectly uniform
1-in-3 choice when the only source of randomness is a perfect random
bit generator? The algorithms terminate with probability 1, but are
non-terminators in that there is no finite number of steps in which
they must terminate.


-- 
--Bryan

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-21 Thread Raymond Hettinger

Boris Borcic wrote:
> does
>
> x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
>
> pick a random shuffle of x with uniform distribution ?
>
> Intuitively, assuming list.sort() does a minimal number of comparisons to
> achieve the sort, I'd say the answer is yes. But I don't feel quite 
> confortable
> with the intuition... can anyone think of a more solid argumentation ?


Try this:

x.sort(key=lambda x: random.random())


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-21 Thread Paul Rubin
Boris Borcic <[EMAIL PROTECTED]> writes:
> To be more convincing... assume the algorithm is optimal and calls

That assumption is not even slightly realistic.  Real-world sorting
algorithms almost never do a precisely minimal amount of comparison.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-21 Thread Boris Borcic
Boris Borcic wrote:
> Paul Rubin wrote:
>> Boris Borcic <[EMAIL PROTECTED]> writes:
>>> x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
>>> pick a random shuffle of x with uniform distribution ?
>>
>> You really can't assume anything like that.  Sorting assumes an order
>> relation on the items being sorted, which means if a < b and b < c,
>> then a < c.  If your comparison operation doesn't preserve that
>> property then the sort algorithm could do anything, including looping
>> forever or crashing.
> 
> Not if it does the minimum number of comparisons to achieve the sort, in 
> which case it won't ever call cmp() if the result is pre-determined by 
> the order relation and previous comparisons, so that it will never get 
> from this comparison function a system of answers that's not consistent 
> with an order relation. That's obvious at least in the case where 
> random.random() == 0.5 never occurs (and at first sight even the latter 
> case shouldn't change it).

To be more convincing... assume the algorithm is optimal and calls
cmp(x,y). Either the previous calls (+ assuming order relation) give no 
conclusion as to the possible result of the call, in which case the result 
can't 
contradict an order relation; or the previous calls (+assumption) provide only 
partial information; iow the algorithm knows for example x<=y and needs to 
determine which of xy".

But is it possible for a sort algorithm assuming an order relation to attain eg 
x<=y but neither xhttp://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-21 Thread Boris Borcic
Boris Borcic wrote:
> Paul Rubin wrote:
>> Boris Borcic <[EMAIL PROTECTED]> writes:
>>> x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
>>> pick a random shuffle of x with uniform distribution ?
>>
>> You really can't assume anything like that.  Sorting assumes an order
>> relation on the items being sorted, which means if a < b and b < c,
>> then a < c.  If your comparison operation doesn't preserve that
>> property then the sort algorithm could do anything, including looping
>> forever or crashing.
> 
> Not if it does the minimum number of comparisons to achieve the sort, in 
> which case it won't ever call cmp() if the result is pre-determined by 
> the order relation and previous comparisons, so that it will never get 

read "by /the assumption of an order relation/ and previous comparisons"

> from this comparison function a system of answers that's not consistent 
> with an order relation. That's obvious at least in the case where 
> random.random() == 0.5 never occurs (and at first sight even the latter 
> case shouldn't change it).

- this - the idea that /if/ the algorithm was optimal it would only sample the 
random comparison function up to a system of answers consistent with an order 
relation - is actually what prompted my question,
iow "is it good for anything ?"

Best, BB
--
"On naît tous les mètres du même monde"
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-21 Thread Boris Borcic
Thanks for these details. BB

Tim Peters wrote:
> [ Boris Borcic]
>> x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
>>
>> pick a random shuffle of x with uniform distribution ?
> 
> Say len(x) == N.  With Python's current sort, the conjecture is true
> if and only if N <= 2.
> 
>> Intuitively, assuming list.sort() does a minimal number of comparisons to
>> achieve the sort,
> 
> No idea what that could mean in a rigorous, algorithm-independent way.
> 
>> I'd say the answer is yes. But I don't feel quite confortable
>> with the intuition... can anyone think of a more solid argumentation ?
> 
> If a list is already sorted, or reverse-sorted, the minimum number of
> comparisons needed to determine that is N-1, and Python's current sort
> achieves that.  It first looks for the longest ascending or descending
> run.  If comparison outcomes are random, it will decide "yes, already
> sorted" with probability 1/2**(N-1), and likewise for reverse-sorted.
> When N > 2, those are larger than the "correct" probabilities 1/(N!).
> So., e.g., when N=3, the sort will leave the list alone 1/4th of the
> time (and will reverse the list in-place another 1/4th).  That makes
> the identity and reversal permutations much more likely than "they
> should be", and the bias is worse as N increases.
> 
> Of course random.shuffle(x) is the intended way to effect a
> permutation uniformly at random.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-21 Thread Boris Borcic
Paul Rubin wrote:
> Boris Borcic <[EMAIL PROTECTED]> writes:
>> x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
>> pick a random shuffle of x with uniform distribution ?
> 
> You really can't assume anything like that.  Sorting assumes an order
> relation on the items being sorted, which means if a < b and b < c,
> then a < c.  If your comparison operation doesn't preserve that
> property then the sort algorithm could do anything, including looping
> forever or crashing.

Not if it does the minimum number of comparisons to achieve the sort, in which 
case it won't ever call cmp() if the result is pre-determined by the order 
relation and previous comparisons, so that it will never get from this 
comparison function a system of answers that's not consistent with an order 
relation. That's obvious at least in the case where random.random() == 0.5 
never 
occurs (and at first sight even the latter case shouldn't change it).

Best, BB
--
"On naît tous les mètres du même monde"



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-21 Thread Paul Rubin
Boris Borcic <[EMAIL PROTECTED]> writes:
> x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
> pick a random shuffle of x with uniform distribution ?

You really can't assume anything like that.  Sorting assumes an order
relation on the items being sorted, which means if a < b and b < c,
then a < c.  If your comparison operation doesn't preserve that
property then the sort algorithm could do anything, including looping
forever or crashing.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-21 Thread Tim Peters
[ Boris Borcic]
> x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
>
> pick a random shuffle of x with uniform distribution ?

Say len(x) == N.  With Python's current sort, the conjecture is true
if and only if N <= 2.

> Intuitively, assuming list.sort() does a minimal number of comparisons to
> achieve the sort,

No idea what that could mean in a rigorous, algorithm-independent way.

> I'd say the answer is yes. But I don't feel quite confortable
> with the intuition... can anyone think of a more solid argumentation ?

If a list is already sorted, or reverse-sorted, the minimum number of
comparisons needed to determine that is N-1, and Python's current sort
achieves that.  It first looks for the longest ascending or descending
run.  If comparison outcomes are random, it will decide "yes, already
sorted" with probability 1/2**(N-1), and likewise for reverse-sorted.
When N > 2, those are larger than the "correct" probabilities 1/(N!).
So., e.g., when N=3, the sort will leave the list alone 1/4th of the
time (and will reverse the list in-place another 1/4th).  That makes
the identity and reversal permutations much more likely than "they
should be", and the bias is worse as N increases.

Of course random.shuffle(x) is the intended way to effect a
permutation uniformly at random.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-21 Thread Iain King

Dustan wrote:
> Boris Borcic wrote:
> > does
> >
> > x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
> >
> > pick a random shuffle of x with uniform distribution ?
> >
> > Intuitively, assuming list.sort() does a minimal number of comparisons to
> > achieve the sort, I'd say the answer is yes. But I don't feel quite 
> > confortable
> > with the intuition... can anyone think of a more solid argumentation ?
>
> Why not use the supplied shuffle method?
>
> random.shuffle(x)

or check out this thread:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/766f4dcc92ff6545?tvc=2&q=shuffle

Iain

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-21 Thread Dustan

Boris Borcic wrote:
> does
>
> x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
>
> pick a random shuffle of x with uniform distribution ?
>
> Intuitively, assuming list.sort() does a minimal number of comparisons to
> achieve the sort, I'd say the answer is yes. But I don't feel quite 
> confortable
> with the intuition... can anyone think of a more solid argumentation ?

Why not use the supplied shuffle method?

random.shuffle(x)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random shuffles

2006-07-21 Thread Peter Otten
Boris Borcic wrote:

> does
> 
> x.sort(cmp = lambda x,y : cmp(random.random(),0.5))
> 
> pick a random shuffle of x with uniform distribution ?
> 
> Intuitively, assuming list.sort() does a minimal number of comparisons to
> achieve the sort, I'd say the answer is yes. But I don't feel quite
> confortable with the intuition... can anyone think of a more solid
> argumentation ?

Anecdotal evidence suggests the answer is no:

>>> histo = {}
>>> for i in xrange(1000):
... t = tuple(sorted(range(3), lambda x, y: cmp(random.random(), 0.5)))
... histo[t] = histo.get(t, 0) + 1
...
>>> sorted(histo.values())
[60, 62, 64, 122, 334, 358]

versus:

>>> histo = {}
>>> for i in xrange(1000):
... t = tuple(sorted(range(3), key=lambda x: random.random()))
... histo[t] = histo.get(t, 0) + 1
...
>>> sorted(histo.values())
[147, 158, 160, 164, 183, 188]

Peter
-- 
http://mail.python.org/mailman/listinfo/python-list