Re: Michael newbee challange nr 1

2009-11-09 Thread DTH

On Nov 8, 12:33 pm, Michael Jaaka michael.ja...@googlemail.com
wrote:

 now I would like get such effect that callbackListener will be called twice
 for the example collection.

 once with tom and iterator (or a lazy seq) to lazy evaluated collection of
 (32 and 2333)
 and second with anne and iterator for collection of (12 and 55)
snip
 Note that all must be evaluated lazily.

Unless the keys in your sequence will always be sorted, I don't
believe you can do this lazily; in order to generate a sequence of the
values for a given key in your collection, you would have to traverse
the entire collection first in order to check each key, i.e. if you
had:

[[ tom 32 ] [ anne 12 ] [ anne 55 ] [ tom 2333 ]]

Then you'd need to get right to the end of the list in order to
generate the sequence of values for tom.

One, non-lazy, way to accomplish what you're after would be:

(use '[clojure.contrib.seq-utils :only (group-by)])

(def *s* [[tom 32] [tom 2333] [anne 12] [anne 55]])

(defn callback-listener
  [k v]
  (println (str key:  k , value:  v)))

(doseq [[k v] (map (fn [[k v]] [k (map second v)])
   (group-by first *s*))]
  (callback-listener k v))

Note that since you used the name callback-listener, I'm assuming
that it has side effects, which is why I used doseq.  If callback-
listener does _not_ have side effects (i.e. if it performs some
computation upon it's arguments and simply yields a result, then you
could use:

(defn callback-fn
  [[k v]]
  (compute-some-value-from-arguments k v))

(map (comp callback-fn (fn [[k v]] [k (map second v)]))
 (group-by first *s*))

which would yield a sequence of the result of each call to callback-
fn2.  Note that while the resulting sequence would, technically, be
lazy, group-by is not, so you would still consume the entire input
sequence up front.

Now, _if_ you can guarantee that your input sequence will always be
sorted by key (that is, if you know for a fact that all the entries
for a given key will be consecutive, as in your example: [tom tom anne
anne], rather than mine above: [tom anne tom anne]), then you could
use partition-by instead, which would (I believe) let you achieve full
laziness:

(use '[clojure.contrib.seq-utils :only (partition-by)])

(map (comp callback-fn (fn [part] [(ffirst part) (map second part)]))
 (partition-by first *s*))

Hope this helps,

-David

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---



Re: Michael newbee challange nr 1

2009-11-09 Thread Michael Jaaka
Keys are always sorted. So once a key stops appearing it won't appear again.
The first solution seems to be the right one, because all values are
processed in a sequence manner in a lazy way.

2009/11/8 DTH dth...@gmail.com


 On Nov 8, 12:33 pm, Michael Jaaka michael.ja...@googlemail.com
 wrote:
 
  now I would like get such effect that callbackListener will be called
 twice
  for the example collection.
 
  once with tom and iterator (or a lazy seq) to lazy evaluated collection
 of
  (32 and 2333)
  and second with anne and iterator for collection of (12 and 55)
 snip
  Note that all must be evaluated lazily.

 Unless the keys in your sequence will always be sorted, I don't
 believe you can do this lazily; in order to generate a sequence of the
 values for a given key in your collection, you would have to traverse
 the entire collection first in order to check each key, i.e. if you
 had:

 [[ tom 32 ] [ anne 12 ] [ anne 55 ] [ tom 2333 ]]

 Then you'd need to get right to the end of the list in order to
 generate the sequence of values for tom.

 One, non-lazy, way to accomplish what you're after would be:

 (use '[clojure.contrib.seq-utils :only (group-by)])

 (def *s* [[tom 32] [tom 2333] [anne 12] [anne 55]])

 (defn callback-listener
  [k v]
  (println (str key:  k , value:  v)))

 (doseq [[k v] (map (fn [[k v]] [k (map second v)])
   (group-by first *s*))]
  (callback-listener k v))

 Note that since you used the name callback-listener, I'm assuming
 that it has side effects, which is why I used doseq.  If callback-
 listener does _not_ have side effects (i.e. if it performs some
 computation upon it's arguments and simply yields a result, then you
 could use:

 (defn callback-fn
  [[k v]]
  (compute-some-value-from-arguments k v))

 (map (comp callback-fn (fn [[k v]] [k (map second v)]))
 (group-by first *s*))

 which would yield a sequence of the result of each call to callback-
 fn2.  Note that while the resulting sequence would, technically, be
 lazy, group-by is not, so you would still consume the entire input
 sequence up front.

 Now, _if_ you can guarantee that your input sequence will always be
 sorted by key (that is, if you know for a fact that all the entries
 for a given key will be consecutive, as in your example: [tom tom anne
 anne], rather than mine above: [tom anne tom anne]), then you could
 use partition-by instead, which would (I believe) let you achieve full
 laziness:

 (use '[clojure.contrib.seq-utils :only (partition-by)])

 (map (comp callback-fn (fn [part] [(ffirst part) (map second part)]))
 (partition-by first *s*))

 Hope this helps,

 -David

 


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---



Re: Michael newbee challange nr 1

2009-11-09 Thread Michael Jaaka
By the first I mean this done by Meikel Brandmeyer. However I will check
later also this one:

(use '[clojure.contrib.seq-utils :only (partition-by)])

(map (comp callback-fn (fn [part] [(ffirst part) (map second part)]))
(partition-by first *s*))

and the other question is if I have (def c [ [ 1 2 ] [ 3 4 ] ]) and want to
get lazily [ 2 4 ] (values of tuplets of a sequence) will be this a correct
(map #(- % fnext)  c ) way?


2009/11/9 Michael Jaaka michael.ja...@googlemail.com

 Keys are always sorted. So once a key stops appearing it won't appear
 again.
 The first solution seems to be the right one, because all values are
 processed in a sequence manner in a lazy way.

 2009/11/8 DTH dth...@gmail.com


 On Nov 8, 12:33 pm, Michael Jaaka michael.ja...@googlemail.com
 wrote:
 
  now I would like get such effect that callbackListener will be called
 twice
  for the example collection.
 
  once with tom and iterator (or a lazy seq) to lazy evaluated
 collection of
  (32 and 2333)
  and second with anne and iterator for collection of (12 and 55)
 snip
  Note that all must be evaluated lazily.

 Unless the keys in your sequence will always be sorted, I don't
 believe you can do this lazily; in order to generate a sequence of the
 values for a given key in your collection, you would have to traverse
 the entire collection first in order to check each key, i.e. if you
 had:

 [[ tom 32 ] [ anne 12 ] [ anne 55 ] [ tom 2333 ]]

 Then you'd need to get right to the end of the list in order to
 generate the sequence of values for tom.

 One, non-lazy, way to accomplish what you're after would be:

 (use '[clojure.contrib.seq-utils :only (group-by)])

 (def *s* [[tom 32] [tom 2333] [anne 12] [anne 55]])

 (defn callback-listener
  [k v]
  (println (str key:  k , value:  v)))

 (doseq [[k v] (map (fn [[k v]] [k (map second v)])
   (group-by first *s*))]
  (callback-listener k v))

 Note that since you used the name callback-listener, I'm assuming
 that it has side effects, which is why I used doseq.  If callback-
 listener does _not_ have side effects (i.e. if it performs some
 computation upon it's arguments and simply yields a result, then you
 could use:

 (defn callback-fn
  [[k v]]
  (compute-some-value-from-arguments k v))

 (map (comp callback-fn (fn [[k v]] [k (map second v)]))
 (group-by first *s*))

 which would yield a sequence of the result of each call to callback-
 fn2.  Note that while the resulting sequence would, technically, be
 lazy, group-by is not, so you would still consume the entire input
 sequence up front.

 Now, _if_ you can guarantee that your input sequence will always be
 sorted by key (that is, if you know for a fact that all the entries
 for a given key will be consecutive, as in your example: [tom tom anne
 anne], rather than mine above: [tom anne tom anne]), then you could
 use partition-by instead, which would (I believe) let you achieve full
 laziness:

 (use '[clojure.contrib.seq-utils :only (partition-by)])

 (map (comp callback-fn (fn [part] [(ffirst part) (map second part)]))
 (partition-by first *s*))

 Hope this helps,

 -David

 



--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---



Re: Michael newbee challange nr 1

2009-11-09 Thread Meikel Brandmeyer

Hi,

On Nov 9, 3:14 pm, Michael Jaaka michael.ja...@googlemail.com wrote:

 and the other question is if I have (def c [ [ 1 2 ] [ 3 4 ] ]) and want to
 get lazily [ 2 4 ] (values of tuplets of a sequence) will be this a correct
 (map #(- % fnext)  c ) way?

(map second c) is what you want. This is missing from my solution,
btw. The vs should go through exactly this map call: (map second vs).

Sincerely
Meikel

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---



Re: Michael newbee challange nr 1

2009-11-08 Thread Meikel Brandmeyer
Hi,

Am 08.11.2009 um 13:33 schrieb Michael Jaaka:

 Hi! How would you solve such problem:

 I have a collection of pairs (key, value) -
 [  [ tom 32 ] [ tom 2333 ] [ anne 12 ] [ anne 55 ] ]

 As you can see keys can occur more than once, also that collection  
 is very large so it should be evaluated lazily (for example the  
 collection cames from stream input).

 I have function with two arguments key and sequence of its values,  
 the pseudo code would be:

 callbackListener(string key, iterator values)

 now I would like get such effect that callbackListener will be  
 called twice for the example collection.

 once with tom and iterator (or a lazy seq) to lazy evaluated  
 collection of (32 and 2333)
 and second with anne and iterator for collection of (12 and 55)

 In Java it is easy because the language is imperative. But how about  
 such logic in Clojure?
 Note that all must be evaluated lazily.

Obviously, this cannot be done completely lazy since changing the  
first element is not a stopping time. You'll always need at least one  
lookahead to determine the change of the key and you'll have to  
realize the input to find all values for a key.

(defn grouped-seq
   [input]
   (lazy-seq
 (when-let [s (seq input)]
   (let [k  (ffirst s)
 [vs r] (split-with #(- % first (= k)) s)]
 (cons [k vs] (grouped-seq r))

(map #(apply callbackListener %) (grouped-seq [[tom 32] [tom 2333]  
[anne 12] [anne 55]]))

Note: if you try to avoid realizing all values of a given key, you'll  
need two sequences: one to keep track of the key and of for the  
values. While callbackListener realises the values, the key sequence  
will retain the head of the input causing all values to stay in  
memory. If you want really laziness, you'll have to traverse the input  
twice over independent input streams. So if you have a file containing  
the data, you'll have to read the file twice from independent streams  
to get full laziness.

Sincerely
Meikel



smime.p7s
Description: S/MIME cryptographic signature


Re: Michael newbee challange nr 1

2009-11-08 Thread John Harrop
On Sun, Nov 8, 2009 at 7:33 AM, Michael Jaaka
michael.ja...@googlemail.comwrote:

 Hi! How would you solve such problem:

 I have a collection of pairs (key, value) -
 [  [ tom 32 ] [ tom 2333 ] [ anne 12 ] [ anne 55 ] ]

 As you can see keys can occur more than once, also that collection is very
 large so it should be evaluated lazily (for example the collection cames
 from stream input).

 I have function with two arguments key and sequence of its values, the
 pseudo code would be:

 callbackListener(string key, iterator values)

 now I would like get such effect that callbackListener will be called twice
 for the example collection.

 once with tom and iterator (or a lazy seq) to lazy evaluated collection
 of (32 and 2333)
 and second with anne and iterator for collection of (12 and 55)


Something related to

(let [ks (distinct (map first s))
  vals-for-key (fn [k] (map second (filter #(= (first %) k) s)))]
  (doseq [k ks] (callback k (vals-for-key k

or to collect and return the return values of the callbacks, use for
instead of doseq.

The problem is that this will realize s and hold onto its head, as Meikel
points out. To make it really lazy you need something more like

(defn do-it [callback producer]
  (let [ks (fn [p] (distinct (map first (p
vals-for-key (fn [k p] (map second (filter #(= (first %) k) (p]
(doseq [k (ks p)] (callback k (vals-for-key k p

and call this with two arguments, the first the callback that will take a
key and a lazy seq of the associated values, and the second a function that
will produce anew a lazy sequence of key-value pairs from your disk file
each time it's called, with the property that the lazy-seq clause that
returns nil when the sequence is exhausted also has the side effect of
closing the input stream so as not to leak file handles.

This will grovel over the file N+1 times where N is the number of distinct
keys, though, and will have in general two input streams open on the file at
the same time, one of them crawling along discovering more distinct keys,
and the other finding all the values for one key, then for the next, etc.

It will also build up a hashset of all of the keys over time, hidden inside
the distinct function's returned seq. So, if the summed lengths of the
distinct keys is going to be big enough to blow the heap all on its own,
you've got real problems and you'll probably need to use a scratch disk
file. In that case I'd recommend a more imperative style with a loop/recur
and lazy-seq, where the outer loop starts with a copy of the input file,
runs the inner loop, and then deletes its input and reruns itself with the
file generated by the inner loop (hence starting with a *copy* of the input
file), and the inner loop reads the first key from the current input file,
creates an empty temp file to be the current output file, then calls the
callback with a lazy seq object whose realization actually advances through
the input file, copying entries to the output file that are not matching the
current key and producing lazy seq elements for entries that do match.

The idea is that if your input file foo.dat reads [[x 1] [y 2] [x 3] [z 4]
[y 5]] it will make a copy foo1.tmp, then call the callback with x and a
lazy seq whose realization emits (1 3) while also generating foo2.tmp = [[y
2] [z 4] [y 5]]; then the outer loop deletes foo1.tmp and repeats on
foo2.tmp and the callback is called with y and a lazy seq whose realization
emits (2 5) while generating foo3.tmp = [[z 4]]; and then the outer loop
deletes foo2.tmp and the callback is called with z and a lazy seq (4) whose
realization creates an empty foo4.tmp; then the outer loop deletes foo3.tmp,
sees foo4.tmp is empty, and deletes foo4.tmp and terminates.

If you want the callback's return values, you'll need to make the outer loop
another lazy-seq.

I'm a bit uncomfortable with lazy-seq bodies that have side effects.

You might want to consider if you need to be using a RDBMS instead of just
disk files for whatever it is you're doing.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---