Sorting gotcha

2011-08-23 Thread Mark Engelberg
I had always assumed that vectors were sorted lexicographically.  In
other words, you sort on the first element, and then refine by the
second element, and so on.  I was surprised tonight to discover that
is not the case.

 (compare abc b); Strings are compared lexicographically
-1
 (compare (vec abc) (vec b)); Vectors are not
1

It turns out that shorter vectors are always considered to be less
than longer vectors.  Lexicographic behavior only kicks in among
vectors of the same length.  Not at all what I expected, so I wanted
to make sure everyone knew about this behavior.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Sorting gotcha

2011-08-23 Thread Ben Smith-Mannschott
On Tue, Aug 23, 2011 at 09:44, Mark Engelberg mark.engelb...@gmail.com wrote:
 I had always assumed that vectors were sorted lexicographically.  In
 other words, you sort on the first element, and then refine by the
 second element, and so on.  I was surprised tonight to discover that
 is not the case.

 (compare abc b)        ; Strings are compared lexicographically
 -1
 (compare (vec abc) (vec b))    ; Vectors are not
 1

 It turns out that shorter vectors are always considered to be less
 than longer vectors.  Lexicographic behavior only kicks in among
 vectors of the same length.  Not at all what I expected, so I wanted
 to make sure everyone knew about this behavior.

Yea, confirmed.

$ git describe
clojure-1.3.0-beta1-6-g82c7254

$ java -jar target/clojure-1.3.0-master-SNAPSHOT.jar
Clojure 1.3.0-master-SNAPSHOT

user= (defn sort* [  more ] (sort more))

;; strings: always sort lexicographically

user= (sort* abc b)
(abc b)

;; vector: short sorts before long

user= (sort* (vec abc) (vec b))
([\b] [\a \b \c])

;; vector: same length sorts lexicographically

user= (sort* (vec abc) (vec abb))
([\a \b \b] [\a \b \c])

user= (sort* (vec abc) (vec abd))
([\a \b \c] [\a \b \d])

;; seq: does not implement comparable

user= (sort* (seq abc) (seq b))
ClassCastException clojure.lang.StringSeq cannot be cast to
java.lang.Comparable  clojure.lang.Util.compare (Util.java:92)

user= (sort* (seq (vec abc)) (seq (vec b)))
ClassCastException clojure.lang.PersistentVector$ChunkedSeq cannot be
cast to java.lang.Comparable  clojure.lang.Util.compare (Util.java:92)

;; list: does not implement comparable

user= (sort* (apply list abc) (apply list b))
ClassCastException clojure.lang.PersistentList cannot be cast to
java.lang.Comparable  clojure.lang.Util.compare (Util.java:92)


It seems a shame that Clojure doesn't provide universal lexicographical
comparability of sequential things, much as it provides equality for
sequential things:

user= (= (list \a \b \c) (vector \a \b \c))
true

Perhaps there's a good reason for this (law of unintended consequences,
anyone?) that I'm not seeing.

// Ben

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en