Re: Michael newbee challange nr 1
On Nov 8, 12:33 pm, Michael Jaaka michael.ja...@googlemail.com wrote: now I would like get such effect that callbackListener will be called twice for the example collection. once with tom and iterator (or a lazy seq) to lazy evaluated collection of (32 and 2333) and second with anne and iterator for collection of (12 and 55) snip Note that all must be evaluated lazily. Unless the keys in your sequence will always be sorted, I don't believe you can do this lazily; in order to generate a sequence of the values for a given key in your collection, you would have to traverse the entire collection first in order to check each key, i.e. if you had: [[ tom 32 ] [ anne 12 ] [ anne 55 ] [ tom 2333 ]] Then you'd need to get right to the end of the list in order to generate the sequence of values for tom. One, non-lazy, way to accomplish what you're after would be: (use '[clojure.contrib.seq-utils :only (group-by)]) (def *s* [[tom 32] [tom 2333] [anne 12] [anne 55]]) (defn callback-listener [k v] (println (str key: k , value: v))) (doseq [[k v] (map (fn [[k v]] [k (map second v)]) (group-by first *s*))] (callback-listener k v)) Note that since you used the name callback-listener, I'm assuming that it has side effects, which is why I used doseq. If callback- listener does _not_ have side effects (i.e. if it performs some computation upon it's arguments and simply yields a result, then you could use: (defn callback-fn [[k v]] (compute-some-value-from-arguments k v)) (map (comp callback-fn (fn [[k v]] [k (map second v)])) (group-by first *s*)) which would yield a sequence of the result of each call to callback- fn2. Note that while the resulting sequence would, technically, be lazy, group-by is not, so you would still consume the entire input sequence up front. Now, _if_ you can guarantee that your input sequence will always be sorted by key (that is, if you know for a fact that all the entries for a given key will be consecutive, as in your example: [tom tom anne anne], rather than mine above: [tom anne tom anne]), then you could use partition-by instead, which would (I believe) let you achieve full laziness: (use '[clojure.contrib.seq-utils :only (partition-by)]) (map (comp callback-fn (fn [part] [(ffirst part) (map second part)])) (partition-by first *s*)) Hope this helps, -David --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: Michael newbee challange nr 1
Keys are always sorted. So once a key stops appearing it won't appear again. The first solution seems to be the right one, because all values are processed in a sequence manner in a lazy way. 2009/11/8 DTH dth...@gmail.com On Nov 8, 12:33 pm, Michael Jaaka michael.ja...@googlemail.com wrote: now I would like get such effect that callbackListener will be called twice for the example collection. once with tom and iterator (or a lazy seq) to lazy evaluated collection of (32 and 2333) and second with anne and iterator for collection of (12 and 55) snip Note that all must be evaluated lazily. Unless the keys in your sequence will always be sorted, I don't believe you can do this lazily; in order to generate a sequence of the values for a given key in your collection, you would have to traverse the entire collection first in order to check each key, i.e. if you had: [[ tom 32 ] [ anne 12 ] [ anne 55 ] [ tom 2333 ]] Then you'd need to get right to the end of the list in order to generate the sequence of values for tom. One, non-lazy, way to accomplish what you're after would be: (use '[clojure.contrib.seq-utils :only (group-by)]) (def *s* [[tom 32] [tom 2333] [anne 12] [anne 55]]) (defn callback-listener [k v] (println (str key: k , value: v))) (doseq [[k v] (map (fn [[k v]] [k (map second v)]) (group-by first *s*))] (callback-listener k v)) Note that since you used the name callback-listener, I'm assuming that it has side effects, which is why I used doseq. If callback- listener does _not_ have side effects (i.e. if it performs some computation upon it's arguments and simply yields a result, then you could use: (defn callback-fn [[k v]] (compute-some-value-from-arguments k v)) (map (comp callback-fn (fn [[k v]] [k (map second v)])) (group-by first *s*)) which would yield a sequence of the result of each call to callback- fn2. Note that while the resulting sequence would, technically, be lazy, group-by is not, so you would still consume the entire input sequence up front. Now, _if_ you can guarantee that your input sequence will always be sorted by key (that is, if you know for a fact that all the entries for a given key will be consecutive, as in your example: [tom tom anne anne], rather than mine above: [tom anne tom anne]), then you could use partition-by instead, which would (I believe) let you achieve full laziness: (use '[clojure.contrib.seq-utils :only (partition-by)]) (map (comp callback-fn (fn [part] [(ffirst part) (map second part)])) (partition-by first *s*)) Hope this helps, -David --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: Michael newbee challange nr 1
By the first I mean this done by Meikel Brandmeyer. However I will check later also this one: (use '[clojure.contrib.seq-utils :only (partition-by)]) (map (comp callback-fn (fn [part] [(ffirst part) (map second part)])) (partition-by first *s*)) and the other question is if I have (def c [ [ 1 2 ] [ 3 4 ] ]) and want to get lazily [ 2 4 ] (values of tuplets of a sequence) will be this a correct (map #(- % fnext) c ) way? 2009/11/9 Michael Jaaka michael.ja...@googlemail.com Keys are always sorted. So once a key stops appearing it won't appear again. The first solution seems to be the right one, because all values are processed in a sequence manner in a lazy way. 2009/11/8 DTH dth...@gmail.com On Nov 8, 12:33 pm, Michael Jaaka michael.ja...@googlemail.com wrote: now I would like get such effect that callbackListener will be called twice for the example collection. once with tom and iterator (or a lazy seq) to lazy evaluated collection of (32 and 2333) and second with anne and iterator for collection of (12 and 55) snip Note that all must be evaluated lazily. Unless the keys in your sequence will always be sorted, I don't believe you can do this lazily; in order to generate a sequence of the values for a given key in your collection, you would have to traverse the entire collection first in order to check each key, i.e. if you had: [[ tom 32 ] [ anne 12 ] [ anne 55 ] [ tom 2333 ]] Then you'd need to get right to the end of the list in order to generate the sequence of values for tom. One, non-lazy, way to accomplish what you're after would be: (use '[clojure.contrib.seq-utils :only (group-by)]) (def *s* [[tom 32] [tom 2333] [anne 12] [anne 55]]) (defn callback-listener [k v] (println (str key: k , value: v))) (doseq [[k v] (map (fn [[k v]] [k (map second v)]) (group-by first *s*))] (callback-listener k v)) Note that since you used the name callback-listener, I'm assuming that it has side effects, which is why I used doseq. If callback- listener does _not_ have side effects (i.e. if it performs some computation upon it's arguments and simply yields a result, then you could use: (defn callback-fn [[k v]] (compute-some-value-from-arguments k v)) (map (comp callback-fn (fn [[k v]] [k (map second v)])) (group-by first *s*)) which would yield a sequence of the result of each call to callback- fn2. Note that while the resulting sequence would, technically, be lazy, group-by is not, so you would still consume the entire input sequence up front. Now, _if_ you can guarantee that your input sequence will always be sorted by key (that is, if you know for a fact that all the entries for a given key will be consecutive, as in your example: [tom tom anne anne], rather than mine above: [tom anne tom anne]), then you could use partition-by instead, which would (I believe) let you achieve full laziness: (use '[clojure.contrib.seq-utils :only (partition-by)]) (map (comp callback-fn (fn [part] [(ffirst part) (map second part)])) (partition-by first *s*)) Hope this helps, -David --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: Michael newbee challange nr 1
Hi, On Nov 9, 3:14 pm, Michael Jaaka michael.ja...@googlemail.com wrote: and the other question is if I have (def c [ [ 1 2 ] [ 3 4 ] ]) and want to get lazily [ 2 4 ] (values of tuplets of a sequence) will be this a correct (map #(- % fnext) c ) way? (map second c) is what you want. This is missing from my solution, btw. The vs should go through exactly this map call: (map second vs). Sincerely Meikel --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---
Re: Michael newbee challange nr 1
Hi, Am 08.11.2009 um 13:33 schrieb Michael Jaaka: Hi! How would you solve such problem: I have a collection of pairs (key, value) - [ [ tom 32 ] [ tom 2333 ] [ anne 12 ] [ anne 55 ] ] As you can see keys can occur more than once, also that collection is very large so it should be evaluated lazily (for example the collection cames from stream input). I have function with two arguments key and sequence of its values, the pseudo code would be: callbackListener(string key, iterator values) now I would like get such effect that callbackListener will be called twice for the example collection. once with tom and iterator (or a lazy seq) to lazy evaluated collection of (32 and 2333) and second with anne and iterator for collection of (12 and 55) In Java it is easy because the language is imperative. But how about such logic in Clojure? Note that all must be evaluated lazily. Obviously, this cannot be done completely lazy since changing the first element is not a stopping time. You'll always need at least one lookahead to determine the change of the key and you'll have to realize the input to find all values for a key. (defn grouped-seq [input] (lazy-seq (when-let [s (seq input)] (let [k (ffirst s) [vs r] (split-with #(- % first (= k)) s)] (cons [k vs] (grouped-seq r)) (map #(apply callbackListener %) (grouped-seq [[tom 32] [tom 2333] [anne 12] [anne 55]])) Note: if you try to avoid realizing all values of a given key, you'll need two sequences: one to keep track of the key and of for the values. While callbackListener realises the values, the key sequence will retain the head of the input causing all values to stay in memory. If you want really laziness, you'll have to traverse the input twice over independent input streams. So if you have a file containing the data, you'll have to read the file twice from independent streams to get full laziness. Sincerely Meikel smime.p7s Description: S/MIME cryptographic signature
Re: Michael newbee challange nr 1
On Sun, Nov 8, 2009 at 7:33 AM, Michael Jaaka michael.ja...@googlemail.comwrote: Hi! How would you solve such problem: I have a collection of pairs (key, value) - [ [ tom 32 ] [ tom 2333 ] [ anne 12 ] [ anne 55 ] ] As you can see keys can occur more than once, also that collection is very large so it should be evaluated lazily (for example the collection cames from stream input). I have function with two arguments key and sequence of its values, the pseudo code would be: callbackListener(string key, iterator values) now I would like get such effect that callbackListener will be called twice for the example collection. once with tom and iterator (or a lazy seq) to lazy evaluated collection of (32 and 2333) and second with anne and iterator for collection of (12 and 55) Something related to (let [ks (distinct (map first s)) vals-for-key (fn [k] (map second (filter #(= (first %) k) s)))] (doseq [k ks] (callback k (vals-for-key k or to collect and return the return values of the callbacks, use for instead of doseq. The problem is that this will realize s and hold onto its head, as Meikel points out. To make it really lazy you need something more like (defn do-it [callback producer] (let [ks (fn [p] (distinct (map first (p vals-for-key (fn [k p] (map second (filter #(= (first %) k) (p] (doseq [k (ks p)] (callback k (vals-for-key k p and call this with two arguments, the first the callback that will take a key and a lazy seq of the associated values, and the second a function that will produce anew a lazy sequence of key-value pairs from your disk file each time it's called, with the property that the lazy-seq clause that returns nil when the sequence is exhausted also has the side effect of closing the input stream so as not to leak file handles. This will grovel over the file N+1 times where N is the number of distinct keys, though, and will have in general two input streams open on the file at the same time, one of them crawling along discovering more distinct keys, and the other finding all the values for one key, then for the next, etc. It will also build up a hashset of all of the keys over time, hidden inside the distinct function's returned seq. So, if the summed lengths of the distinct keys is going to be big enough to blow the heap all on its own, you've got real problems and you'll probably need to use a scratch disk file. In that case I'd recommend a more imperative style with a loop/recur and lazy-seq, where the outer loop starts with a copy of the input file, runs the inner loop, and then deletes its input and reruns itself with the file generated by the inner loop (hence starting with a *copy* of the input file), and the inner loop reads the first key from the current input file, creates an empty temp file to be the current output file, then calls the callback with a lazy seq object whose realization actually advances through the input file, copying entries to the output file that are not matching the current key and producing lazy seq elements for entries that do match. The idea is that if your input file foo.dat reads [[x 1] [y 2] [x 3] [z 4] [y 5]] it will make a copy foo1.tmp, then call the callback with x and a lazy seq whose realization emits (1 3) while also generating foo2.tmp = [[y 2] [z 4] [y 5]]; then the outer loop deletes foo1.tmp and repeats on foo2.tmp and the callback is called with y and a lazy seq whose realization emits (2 5) while generating foo3.tmp = [[z 4]]; and then the outer loop deletes foo2.tmp and the callback is called with z and a lazy seq (4) whose realization creates an empty foo4.tmp; then the outer loop deletes foo3.tmp, sees foo4.tmp is empty, and deletes foo4.tmp and terminates. If you want the callback's return values, you'll need to make the outer loop another lazy-seq. I'm a bit uncomfortable with lazy-seq bodies that have side effects. You might want to consider if you need to be using a RDBMS instead of just disk files for whatever it is you're doing. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~--~~~~--~~--~--~---