On Sat, May 14, 2011 at 1:32 PM, Base <basselh...@gmail.com> wrote:
> Hi All-
>
> I am sure I am just missing this in contrib somewhere , but....
>
> I am trying to find a subvector such that it returns the rest of the
> vector at the first occurrence of a value
>
> (def v ['a 'b 'c 'd 'e 'f 'g])
>
> (my-fun 'c v)
> =>  ['c 'd 'e 'f 'g]

Jonathan's subvec-at-val is good if the data is unsorted. If the items
are going to be sorted, however, then you can use binary search, which
will be O(log n) instead of O(n).

Something like:

(defn rest-from-sorted
  ([thing v comp]
    (rest-from-sorted thing v 0 (count v) (count v) comp))
  ([thing v start end cnt comp]
    (let [i (+ start (quot (- end start) 2))]
      (cond
        (= (v i) thing) (rest-from-sorted thing v (dec i) cnt)
        (== i start) []
        :else (if (comp (v i) thing)
                  (recur thing v (inc i) end cnt comp)
                  (recur thing v start i cnt comp)))))
  ([thing v i cnt]
    (if (< i 0)
      v
      (if (= (v i) thing)
        (recur thing v (dec i) cnt)
        (subvec v (inc i) cnt))))
  ([thing v]
    (let [comp (if (number? thing)
                 <
                 #(neg? (.compareTo %1 %2)))]
      (rest-from-sorted thing v 0 (count v) (count v) comp))))

The first sub-function is one of two intended entry points and just
calls the second.

The second has some added parameters and does binary search until it
finds some occurrence of thing, or proves that (if v was sorted) thing
wasn't in there. In the latter case it returns an empty vector; in the
former it calls the third, which just walks backward until it hits
either the start of the vector or an element different from thing, to
find the start of the run of copies of thing.

The "comp" function should implement < on the vector's elements. The
fourth sub-function is the other entry point and uses a sensible
default: < itself if the "thing" to look for is a number, and Java's
.compareTo otherwise.

user=> (rest-from-sorted "foo" ["a" "big" "foo" "grumpy" "honorable"
                                "jackal" "quasimodo" "xylophone"])
["foo" "grumpy" "honorable" "jackal" "quasimodo" "xylophone"]
user=> (rest-from-sorted 15 [4 8 15 15 15 16 16 23 42])
[15 15 15 16 16 23 42]

As expected, the binary search is faster than subvec-at-val (which I
took the liberty of debugging) for long, sorted vectors. In fact, it's
2x as fast for a sorted vector of 1000 strings, and subvec-at-val is
about 2x faster than drop-while, where it's nearly a worst-case for
drop-while (first occurrence of target is near the end of the vector):

user=> (def v (vec (sort (map str (range 1000)))))
#'user/v
user=> (def q (atom nil))
#'user/q
user=> (do
         (time
           (dotimes [_ 1e4]
             (reset! q (doall (drop-while (partial not= "987") v)))))
        @q)
"Elapsed time: 4695.8418 msecs"
("987" "988" "989" "99" "990" "991" "992" "993" "994" "995" "996"
  "997" "998" "999")
user=> (do
         (time
           (dotimes [_ 1e4]
             (reset! q (subvec-at-val "987" v))))
        @q)
"Elapsed time: 2291.36256 msecs"
["987" "988" "989" "99" "990" "991" "992" "993" "994" "995" "996"
 "997" "998" "999"]
user=> (do
         (time
           (dotimes [_ 1e4]
             (reset! q (rest-from-sorted "987" v))))
         @q)
"Elapsed time: 1211.11344 msecs"
["987" "988" "989" "99" "990" "991" "992" "993" "994" "995" "996"
 "997" "998" "999"]

Note: I had to fix a bug in subvec-at-val ("val" -> "x" in arglist) to
get it to work at all, and to wrap the drop-while result in a doall to
properly time it (drop-while, and therefore the second component of
split-with, is lazy).

A second disadvantage to drop-while is that it returns a sequence
rather than a subvec. If you want random access you have to call vec
on it and then it takes up extra memory compared to a subvec; likewise
if you hold onto the head for any reason; otherwise there's little
memory advantage to the lazy sequence over the subvec, if any.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to