Hi,

dedupe is almost what you need; you can copy its source from clojure.core
and modify it slightly:

(defn dedupe-by
  "Like dedupe, but treats consecutive elements as duplicates when (f element)
  is equal, keeping the first element of each run. Returns a transducer when
  no collection is provided."
  ([f]
   (fn [rf]
     (let [pv (volatile! ::none)]
       (fn
         ([] (rf))
         ([result] (rf result))
         ([result input]
          (let [prior @pv
                cv (f input)]
            (vreset! pv cv)
            (if (= prior cv)
              result
              (rf result input))))))))
  ([f coll] (sequence (dedupe-by f) coll)))


You can then just say `(dedupe-by :value xs)`.
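
One caveat: dedupe-by keeps the first element of each run in sequence order. Since your series is sorted newest first, that is the most recent event of each run, whereas your test expects the chronologically earliest one (the last in sequence order). A minimal sketch of one way to get that, assuming the value can be read with `:value` as above; the sample data is made up for illustration:

```clojure
(defn consolidate-events
  "Returns a lazy seq with one event per run of consecutive events sharing
  the same :value, keeping the last event of each run in sequence order
  (the chronologically earliest one when the series is sorted newest first)."
  [series]
  ;; partition-by is lazy and groups consecutive events with equal :value,
  ;; so no deep recursion is involved.
  (map last (partition-by :value series)))

;; Hypothetical sample data, shaped like the events in the quoted message
;; but with small integers standing in for the timestamps.
(def sample-events
  [{:timestamp 5 :value 8000.0}
   {:timestamp 4 :value 6316.0}
   {:timestamp 3 :value 6316.0}
   {:timestamp 2 :value 6316.0}
   {:timestamp 1 :value 5263.0}])

(consolidate-events sample-events)
;; => ({:timestamp 5, :value 8000.0}
;;     {:timestamp 2, :value 6316.0}
;;     {:timestamp 1, :value 5263.0})

;; The same idea as a transducer, building a vector eagerly:
(into [] (comp (partition-by :value) (map last)) sample-events)
```

Swapping `last` for `first` gives the same result as `(dedupe-by :value xs)`. Each run is held in memory while it is being collected, which should be fine unless a single run of equal values is itself enormous.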

HTH

On Tuesday, May 17, 2016 at 11:47:06 AM UTC+2, Simon Brooke wrote:
>
> I'm having trouble with writing a function
>
>    1. in idiomatic clojure
>    2. which doesn't blow the stack
>
> The problem is I have a time series of events e.g.
>
> ({:idhistory 78758272, :timestamp #inst 
> "2016-03-31T19:34:27.313000000-00:00", :nameid 5637, :stringvalue nil, 
> :value 8000.0} 
>  {:idhistory 78756591, :timestamp #inst 
> "2016-03-31T19:33:31.697000000-00:00", :nameid 5637, :stringvalue nil, 
> :value 7368.0} 
>  {:idhistory 78754249, :timestamp #inst 
> "2016-03-31T19:32:17.100000000-00:00", :nameid 5637, :stringvalue nil, 
> :value 6316.0} 
>  {:idhistory 78753165, :timestamp #inst 
> "2016-03-31T19:31:41.843000000-00:00", :nameid 5637, :stringvalue nil, 
> :value 5263.0} 
>  {:idhistory 78751187, :timestamp #inst 
> "2016-03-31T19:30:36.213000000-00:00", :nameid 5637, :stringvalue nil, 
> :value 4211.0}
>  {:idhistory 78749476, :timestamp #inst 
> "2016-03-31T19:29:41.363000000-00:00", :nameid 5637, :stringvalue nil, 
> :value 3158.0} ...)
>
> which is to say, each event is a map, and each event has two critical 
> keys, :timestamp and :value. The series is sorted in descending order by 
> timestamp, i.e. most recent event first. These series are of up to millions 
> of events; the average length of the series is about half a million events. 
> However, many contain successive events at which the value does not change, 
> and where the value doesn't change I want to retain only the first event.
>
> So far what I've got is:
>
> (defn consolidate-events
>   "Return a time series like this `series`, but without those events whose 
> value is
>    identical to the value of the preceding event."
>   [series]
>   (let [[car cadr & cddr] series]
>     (cond
>       (empty? series) series
>       (=
>         (get-value-for-event car)
>         (get-value-for-event cadr)) (consolidate-events (rest series))
>       true (cons car (consolidate-events (rest series))))))
>
>
> Obviously, with millions of events or even merely hundreds of thousands, a 
> recursive function blows the stack. Furthermore, this one isn't even tail 
> call optimisable. I tried creating an inner function which I naively 
> thought should be tail call optimisable, but it fails 'Can only recur from 
> tail position':
>
> (defn consolidate-events
>   "Return a time series like this `series`, but without those events whose 
> value is
>   identical to the value of the preceding event."
>   [series]
>   (remove
>     nil?
>     (let [inner (fn [series]
>                   (let [[car cadr & cddr] series]
>                     (if
>                       (not (empty? series))
>                       ;; then
>                       (cons
>                         (if
>                           (= (get-value-for-event car)
>                              (get-value-for-event cadr))
>                           ;; then
>                           nil
>                           ;; else
>                           car)
>                         (if
>                           (not (empty? series))
>                           (recur (rest series)))))))]
>     (inner series))))
>
>
> Test for the function is as follows:
>
> (deftest consolidate-events-test
>   (testing "consolidate-events"
>     (let [s1 [{:timestamp #inst "2016-03-31T19:34:27.313000000-00:00", 
> :value 8000.0}
>               {:timestamp #inst "2016-03-31T19:33:31.697000000-00:00", 
> :value 7368.0}
>               {:timestamp #inst "2016-03-31T19:32:17.100000000-00:00", 
> :value 6316.0}
>               {:timestamp #inst "2016-03-31T19:31:41.843000000-00:00", 
> :value 5263.0}
>               {:timestamp #inst "2016-03-31T19:30:36.213000000-00:00", 
> :value 4211.0}
>               {:timestamp #inst "2016-03-31T19:29:41.363000000-00:00", 
> :value 3158.0}]
>           s2 [{:timestamp #inst "2016-03-31T19:34:27.313000000-00:00", 
> :value 8000.0}
>               {:timestamp #inst "2016-03-31T19:33:31.697000000-00:00", 
> :value 7368.0}
>               {:timestamp #inst "2016-03-31T19:33:17.100000000-00:00", 
> :value 6316.0}
>               {:timestamp #inst "2016-03-31T19:32:27.100000000-00:00", 
> :value 6316.0}
>               {:timestamp #inst "2016-03-31T19:32:17.100000000-00:00", 
> :value 6316.0}
>               {:timestamp #inst "2016-03-31T19:31:41.843000000-00:00", 
> :value 5263.0}
>               {:timestamp #inst "2016-03-31T19:30:36.213000000-00:00", 
> :value 4211.0}
>               {:timestamp #inst "2016-03-31T19:29:41.363000000-00:00", 
> :value 3158.0}]]
>       (is (= s1 (consolidate-events s1)) "There are no events in s1 that 
> can be consolidated")
>       (is (= s1 (consolidate-events s2)) "When consolidated, s2 = s1")
>       (is (not (= s2 (consolidate-events s2))) "When consolidated, s2 no 
> longer equals s2"))))
>
>
> Any help gratefully accepted! 
>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en