Re: 'contains?' on 'sorted-set-by' based on the comparator only? (possible bug?)

Cedric Greevey Sat, 24 Mar 2012 19:28:21 -0700

On Sat, Mar 24, 2012 at 9:39 PM, Cedric Greevey <cgree...@gmail.com> wrote:
> In increasing order of difficulty...
>
> Option 1:
>
> Extend your comparator to sort first on the key you're actually
> interested in, then if that key isn't different on the others
> more-or-less arbitrarily.


Or use this:

(defn po->to [po]
  (let [m (atom #{})]
    (fn [a b]
      (or
        (po a b)
        (and
          (not (po b a))
          (or
            (< (hash a) (hash b))
            (and
              (== (hash a) (hash b))
              (not= a b)
              (or (contains? @m [a b])
                (and
                  (not (contains? @m [b a]))
                  (swap! m conj [a b]))))))))))

(defn sorted-set-po [po & keys]
  (apply sorted-set-by (po->to po) keys))

Input: a partial order (may be true for a b, true for b a, or false for both).

Output: a sorted-set that uses a total order built from the partial
order. For things where the po says neither a < b nor b < a, it will
first compare their hashes. If those don't collide, the order of a and
b is decided that way. If they do, and a and b are not actually equal,
it will arbitrarily pick one to be "less than" the other and remember
its decision so that the ordering of unequal, hash-colliding items
will remain consistent.

Notes:
1. The memory use can grow over time due to the memoization-like
behavior using the atom. However, this only occurs for items with both
order and hash collisions, which should be extraordinarily infrequent.
(If hashes were uniform on [0,2^32-1] you'd need an average of around
65,000 items before, per the birthday paradox, a single collision was
likely within your collection. And then it would probably be for a
pair of items that po considered unequal.) So the memory use growth
should be small (for collections of thousands to millions of items) to
nonexistent (for smaller ones).

2. Two sorted-set-pos with the same items, where they were added in
different sequences, may not sort them in the same order, though this
should be rare (again tied to the frequency of order+hash collisions).
But this is not really a problem since, if you're sorting with a
partial order, it makes sense to consider the ordering within equality
buckets of the partial order to be undefined (the way ordering of an
entire hash-set is undefined) and thus does not make sense to make
program semantics depend on the ordering. Also, all sets compare equal
purely by having the same contents (I checked APersistentSet.java to
be sure) so the two sorted-set-pos would still compare equal despite
traversing their elements in slightly different sequences.

Here's a clooj REPL session testing po->to. Note that the last tests
intentionally induce an order+hash collision of unequal items and
check that the same instance of a po->to produces consistent results
when given the colliding items in either order (that is, it always
returns logical true when given the items in one order and always
returns logical false when given them in the opposite order). The very
last tests verify that it always returns false for equal items.

user=>
(def foo (po->to #(< (:a %1) (:a %2))))
#'user/foo

user=>
(foo {:a 1 :b 1} {:a 2 :b 1})
true

user=>
(foo {:a 1 :b 1} {:a 2 :b 1})
true

user=>
(foo {:a 1 :b 1} {:a 2 :b 1})
true

user=>
(foo {:a 2 :b 1} {:a 1 :b 1})
false

user=>
(foo {:a 2 :b 1} {:a 1 :b 1})
false

user=>
(foo {:a 1 :b 1} {:a 1 :b 2})
true

user=>
(foo {:a 1 :b 1} {:a 1 :b 2})
true

user=>
(foo {:a 1 :b 1} {:a 1 :b 2})
true

user=>
(foo {:a 1 :b 2} {:a 1 :b 1})
false

user=>
(foo {:a 1 :b 2} {:a 1 :b 1})
false

user=>
(foo {:a 1 :b 2} {:a 1 :b 1})
false

user=>
(hash {:a 1 :b 1 :c 1})
-1253235521

user=>
(hash {:a 1 :b 1 :c 2})
-1253235522

user=>
(hash {:a 1 :b 2 :c 1})
-1253235520

user=>
(hash {:a 1 :b 0 :c 1})
-1253235522

user=>
(foo {:a 1 :b 1 :c 2} {:a 1 :b 0 :c 1})
#{[{:a 1, :c 2, :b 1} {:a 1, :c 1, :b 0}]}

user=>
(foo {:a 1 :b 1 :c 2} {:a 1 :b 0 :c 1})
true

user=>
(foo {:a 1 :b 1 :c 2} {:a 1 :b 0 :c 1})
true

user=>
(foo {:a 1 :b 0 :c 1} {:a 1 :b 1 :c 2})
false

user=>
(foo {:a 1 :b 0 :c 1} {:a 1 :b 1 :c 2})
false

user=>
(foo {:a 1 :b 0 :c 1} {:a 1 :b 1 :c 2})
false

user=>
(def bar (po->to #(< (:a %1) (:a %2))))
#'user/bar

user=>
(bar {:a 1 :b 0 :c 1} {:a 1 :b 1 :c 2})
#{[{:a 1, :c 1, :b 0} {:a 1, :c 2, :b 1}]}

user=>
(bar {:a 1 :b 0 :c 1} {:a 1 :b 1 :c 2})
true

user=>
(bar {:a 1 :b 1 :c 2} {:a 1 :b 0 :c 1})
false

user=>
(bar {:a 1 :b 1 :c 2} {:a 1 :b 0 :c 1})
false

user=>
(bar {:a 1 :b 1 :c 2} {:a 1 :b 1 :c 2})
false

user=>
(bar {:a 1 :b 1 :c 2} {:a 1 :b 1 :c 2})
false

user=>
(bar {:a 1 :b 0 :c 1} {:a 1 :b 0 :c 1})
false

user=>
(bar {:a 1 :b 0 :c 1} {:a 1 :b 0 :c 1})
false

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: 'contains?' on 'sorted-set-by' based on the comparator only? (possible bug?)

Reply via email to