Re: generalize distinct

2010-02-23 Thread Eugen Dück
I agree with you, Michal.

But let me rephrase the question, maybe my initial long-winded post
wasn't clear enough on that.

Rather than having a separate fn 'distinct-by' in addition to the
existing 'distinct', which, apart from the hard-coded keyfn would be
EXACTLY the same, shouldn't we just generalize distinct to default to
hard-coded if no keyfn is specified? (Btw I'm not considering my other
suggestion to allow the set to be passed in here, which I personally
like, but also understand can be seen as unorthogonal)

And this is not only true for distinct-by, it's true for some of the
other * / *-by pairs. It just seems this should be generalized in
order not to duplicate code for these cases. If we copy-and-paste
code, what's the justification? I'd say orthogonality is an argument
against copy-and-paste. Do we copy and paste rather than generalize in
order to have distinct a little bit faster due to it being hard-coded?

Eugen

On Feb 23, 5:13 am, Michał Marczyk michal.marc...@gmail.com wrote:
 On 22 February 2010 20:28, Sean Devlin francoisdev...@gmail.com wrote:

  Then is the seq (1 :a a) guaranteed?  How do I know that I won't get
  (2 :b b), (1 :b c), etc?  What if I want a specific combination
  instead?  I've had to actually code this specific problem, and I found
  that using group-by  some secondary mapping operation was the only
  thing that gave me the flexibility I needed (manufacturing is fun!).

 The ordering guarantees distinct-by makes are exactly those that
 distinct makes, because it uses the same code (as mentioned
 previously, I lifted it all from clojure.core, then tweaked to take
 the keyfn / eqfn into account). Basically this means that if your
 collection has an intrinsic ordering, it will be preserved (the result
 will include, for each equivalence class of items from the sequence
 modulo the user-defined equivalence relation, the one earliest w.r.t.
 that ordering). If it's a hash-map or a hash-set instead, you'll get
 whatever ordering (seq coll) happens to produce.

 As for group-by giving you more flexibility -- well, it gives you a
 lot of flexibility where it's appropriate to use it, but because of
 its choice of data structure for the result, you can't use it to
 reimplement distinct-by directly:

 user= (group-by class [1 2 3 :a :b :c 'a 'b 'c])
 java.lang.ClassCastException: java.lang.Class cannot be cast to
 java.lang.Comparable (NO_SOURCE_FILE:0)

 So no way to use non-Comparables as keys...

 And then there's the fact that you can't tell in which order the keys
 discovered by group-by appeared in the original collection, which is
 again because of its use of sorted-map, which has the consequence that
 order is being mangled on purpose! E.g.:

 user= (seq (group-by #(- %) [1 2 3 4 5]))
 ([-5 [5]] [-4 [4]] [-3 [3]] [-2 [2]] [-1 [1]])

 In other words: (seq (group-by f coll)) has an ordering possibly
 completely unrelated to that of coll (so you'd have to make a separate
 traversal through the coll to discover the original ordering of the
 keys), whereas (distinct-by f coll), for either version of
 distinct-by, preserves the ordering of coll. That's a desirable
 property for when that's what you want to do, whereas group-by will, I
 suppose, be more useful on other occasions. ;-)

 To sum it up, (1) distinct-by actually behaves in a very predictable
 way (which may or may not be useful for any particular purpose), (2)
 it cannot be implemented directly in terms of group-by. I'd say it's
 pretty orthogonal to the existing library functions (that I know of)
 actually...

 Sincerely,
 Michał

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-23 Thread Michał Marczyk
On 22 February 2010 20:28, Sean Devlin francoisdev...@gmail.com wrote:
 I've had to actually code this specific problem, and I found
 that using group-by  some secondary mapping operation was the only
 thing that gave me the flexibility I needed (manufacturing is fun!).

Incidentally, have you ever benefitted from group-by using a
sorted-map, as opposed to, say, a hash-map? I'm asking because I'm a
bit puzzled by the reasoning behind using this particular structure
with the resulting ordering only vaguely related (i.e. in a manner
which might be difficult to predict) to that of the input collection
anyway...

I haven't got a strong opinion on this, by the way, it's just that if
there isn't some huge benefit to sorted-map which I can't yet see,
perhaps a hash-map would make for a more versatile function?

Sincerely,
Michał

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-23 Thread Michał Marczyk
On 23 February 2010 00:17, Eugen Dück eu...@dueck.org wrote:
 Rather than having a separate fn 'distinct-by' in addition to the
 existing 'distinct', which, apart from the hard-coded keyfn would be
 EXACTLY the same, shouldn't we just generalize distinct to default to
 hard-coded if no keyfn is specified? (Btw I'm not considering my other
 suggestion to allow the set to be passed in here, which I personally
 like, but also understand can be seen as unorthogonal)

 And this is not only true for distinct-by, it's true for some of the
 other * / *-by pairs. It just seems this should be generalized in
 order not to duplicate code for these cases. If we copy-and-paste
 code, what's the justification? I'd say orthogonality is an argument
 against copy-and-paste. Do we copy and paste rather than generalize in
 order to have distinct a little bit faster due to it being hard-coded?

That seems like a very good point to me! Especially if
clojure.core/identity could be inlined by the compiler (which I
thought it is, although now that I look the inlining-related entries
are missing from (meta #'identity)), both distinct and sort could be
arity overloaded to accept an optional keyfn or not (sort would
basically have to be replaced with sort-by + an added unary variant
with defaults for both keyfn and comparator).

Then perhaps we could reuse the *-by names for Haskell-style binary
predicate-based versions. Haskell's sortBy, groupBy, nubBy all take
binary functions to decide ordering / equality of pairs of elements
from the input sequence... That's enables some neat code sometimes,
cf. the primes on liner. How about that?

Sincerely,
Michał

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-23 Thread Sean Devlin
Take a look at sort/sort-by in core.  That would be starting point for
fn style.

On Feb 23, 12:44 pm, Michał Marczyk michal.marc...@gmail.com wrote:
 On 23 February 2010 00:17, Eugen Dück eu...@dueck.org wrote:

  Rather than having a separate fn 'distinct-by' in addition to the
  existing 'distinct', which, apart from the hard-coded keyfn would be
  EXACTLY the same, shouldn't we just generalize distinct to default to
  hard-coded if no keyfn is specified? (Btw I'm not considering my other
  suggestion to allow the set to be passed in here, which I personally
  like, but also understand can be seen as unorthogonal)

  And this is not only true for distinct-by, it's true for some of the
  other * / *-by pairs. It just seems this should be generalized in
  order not to duplicate code for these cases. If we copy-and-paste
  code, what's the justification? I'd say orthogonality is an argument
  against copy-and-paste. Do we copy and paste rather than generalize in
  order to have distinct a little bit faster due to it being hard-coded?

 That seems like a very good point to me! Especially if
 clojure.core/identity could be inlined by the compiler (which I
 thought it is, although now that I look the inlining-related entries
 are missing from (meta #'identity)), both distinct and sort could be
 arity overloaded to accept an optional keyfn or not (sort would
 basically have to be replaced with sort-by + an added unary variant
 with defaults for both keyfn and comparator).

 Then perhaps we could reuse the *-by names for Haskell-style binary
 predicate-based versions. Haskell's sortBy, groupBy, nubBy all take
 binary functions to decide ordering / equality of pairs of elements
 from the input sequence... That's enables some neat code sometimes,
 cf. the primes on liner. How about that?

 Sincerely,
 Michał

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-23 Thread Sean Devlin
I guess I was getting at how to do the airity overloading.  My first
instinct when developing is do something like this

(def distinct
  ([coll] (distinct identity coll))
  ([f coll] (lot-of-code)))

However, if you look around core, you'll see that Rich repeats
himself a lot.  This really obvious w/ fns like juxt  partial.  It
turns out that this is results in faster code.  So I'd implement
distinct like this.

(def distinct
  ([coll] (lots-of-code))
  ([f coll] (lot-of-similar-code)))

That's all I was saying.

Sean

On Feb 23, 2:40 pm, Michał Marczyk michal.marc...@gmail.com wrote:
 On 23 February 2010 20:26, Sean Devlin francoisdev...@gmail.com wrote:

  Take a look at sort/sort-by in core.  That would be starting point for
  fn style.

 Not sure what you mean by that.

 If anything, having an arity-overloaded distinct (optionally accepting
 a binary predicate to use in place of =) and a separate distinct-by
 (accepting an extra keyfn argument plus optionally the binary
 predicate) would make for a more uniform standard lib.

 (I guess I suggested switching to the Haskell naming convention
 before, but there's no real reason to do that if the same
 functionality is available, plus I'd be confused by this myself,
 having gotten used to the Clojure names already... I retract that
 part.)

 Sincerely,
 Michał

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-23 Thread Michał Marczyk
On 23 February 2010 21:03, Sean Devlin francoisdev...@gmail.com wrote:
 I guess I was getting at how to do the airity overloading.  My first
 instinct when developing is do something like this

 (def distinct
  ([coll] (distinct identity coll))
  ([f coll] (lot-of-code)))

 However, if you look around core, you'll see that Rich repeats
 himself a lot.  This really obvious w/ fns like juxt  partial.  It
 turns out that this is results in faster code.  So I'd implement
 distinct like this.

 (def distinct
  ([coll] (lots-of-code))
  ([f coll] (lot-of-similar-code)))

 That's all I was saying.

 Sean

That's a very valid point! (Which I previously completely failed to grok.)

For a private utility function, I guess I normally prefer DRYer,
perhaps slightly slower code, but for a library function it's a
different matter... Anyway, thanks for the clarification!

How about that sorted-map vs hash-map thing with group-by? I'm
genuinely curious.

All best,
Michał

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-23 Thread Eugen Dück
Agree, it might make sense to do that performance-wise. Although I'm
hoping that the compiler would be able to inline that as well
(optimizing away calls to identity).

Btw, what does :inline-arities do? It sounds like this could do what
we want.

On Feb 24, 5:03 am, Sean Devlin francoisdev...@gmail.com wrote:
 I guess I was getting at how to do the airity overloading.  My first
 instinct when developing is do something like this

 (def distinct
   ([coll] (distinct identity coll))
   ([f coll] (lot-of-code)))

 However, if you look around core, you'll see that Rich repeats
 himself a lot.  This really obvious w/ fns like juxt  partial.  It
 turns out that this is results in faster code.  So I'd implement
 distinct like this.

 (def distinct
   ([coll] (lots-of-code))
   ([f coll] (lot-of-similar-code)))

 That's all I was saying.

 Sean

 On Feb 23, 2:40 pm, Michał Marczyk michal.marc...@gmail.com wrote:

  On 23 February 2010 20:26, Sean Devlin francoisdev...@gmail.com wrote:

   Take a look at sort/sort-by in core.  That would be starting point for
   fn style.

  Not sure what you mean by that.

  If anything, having an arity-overloaded distinct (optionally accepting
  a binary predicate to use in place of =) and a separate distinct-by
  (accepting an extra keyfn argument plus optionally the binary
  predicate) would make for a more uniform standard lib.

  (I guess I suggested switching to the Haskell naming convention
  before, but there's no real reason to do that if the same
  functionality is available, plus I'd be confused by this myself,
  having gotten used to the Clojure names already... I retract that
  part.)

  Sincerely,
  Michał

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-22 Thread Eugen Dueck
Yes, for the reasons stated.

First, and most important, clojure.core/distinct does not let me pass
in a keyfn, it's hard-coded.

Second, and as mentioned that's debatable, distinct could make more of
the filtering functionality it already has available to the caller,
for free.

On Feb 22, 11:53 am, Wilson MacGyver wmacgy...@gmail.com wrote:
 Any reason why you can't use
 distinct?

 http://richhickey.github.com/clojure/clojure.core-api.html#clojure.co...

 On Feb 21, 2010 10:24 AM, Eugen Dueck eu...@dueck.org wrote:

 Hi,

 Clojure is great! The gain in productivity from more low level
 languages like Java, but also more functional languages like Ruby and
 Common LISP etc. amazes me every day. Like how adding a simple map
 in front of the count here:
 (count colls)
 changes the code from counting the number of collections to
 enumerating the number of items in those collections. It's all these
 little things that take one minute or 5 in Java, but only 5 seconds in
 Clojure (in case you are a slow typer) that make you so much more
 efficient. With clojure, I spend most of my time thinking about
 problems at a conceptual level, rather than at the implementation
 level (Java: int[] vs. IterableInteger vs. ListInteger vs.
 Integer[] etc. - arrg!). Thanks Rich!

 That said, every now and then I come across a function in clojure.core
 that could be vastly improved in its range of applicability by just
 adding one or more optional parameters, defaulting to the currently
 hard-coded values (like the often implicit 'identity' function). Take
 'distinct'. I'd like to be able to specify the keyfn, that's an
 important one for me, and while we're at it, I'd like to pass in the
 set that that function builds up incrementally, and normally starts
 with an empty set, so that I can pre-initialize it, say with #{ nil },
 so that it also filters out nils.

 This 2nd point is not that important, and I'm not sure if it is that
 great an idea in terms of orthogonality of the functions, as we have
 filter. But you'd get rid off one level of indirection. distinct
 basically gives us a filter for free.

 But the 1st point I think certainly makes sense, and we have a couple
 of other fns in clojure that have a variant with a keyfn parameter,
 like sort-by etc. I guess they normally get a different name.

 I'm not so sure about what the best order of parameters is in clojure,
 and named parameters would make this a no brainer, but this is what I
 currently use, the first parameter being coll, at least in the variant
 with only one parameter that makes it a drop-in replacement candidate
 for distinct:

 (defn distinkt
  Returns a lazy sequence of the elements of coll with duplicates
 removed
  ([coll] (distinkt coll identity))
  ([coll keyfn] (distinkt coll keyfn #{}))
  ([coll keyfn seen-items]
    (let [step (fn step [xs seen]
                   (lazy-seq
                    ((fn [[f :as xs] seen]
                      (when-let [s (seq xs)]
                        (let [key (keyfn f)]
                              (if (contains? seen key)
                                (recur (rest s) seen)
                                (cons f (step (rest s) (conj seen key)))
                      xs seen)))]
      (step coll seen-items

 I don't mind writing - or better - copy-and-pasting this code and
 keeping it in my project, but I think it could be useful for others,
 so I wouldn't mind at all if this makes it into clojure.core... :)

 Or is the reason for hard coding an (implicit) 'identity' performance?
 Or did I miss some other way to achieve the same goal?

 Eugen

 --
 You received this message because you are subscribed to the Google
 Groups Clojure group.
 To post to this group, send email to clojure@googlegroups.com
 Note that posts from new members are moderated - please be patient with your
 first post.
 To unsubscribe from this group, send email to
 clojure+unsubscr...@googlegroups.comclojure%2bunsubscr...@googlegroups.com
 For more options, visit this group 
 athttp://groups.google.com/group/clojure?hl=en

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-22 Thread Sean Devlin
It sounds to me that you want to use c.c.seq-utils/group-by, not
distinct.

On Feb 22, 7:22 am, Eugen Dueck eu...@dueck.org wrote:
 Yes, for the reasons stated.

 First, and most important, clojure.core/distinct does not let me pass
 in a keyfn, it's hard-coded.

 Second, and as mentioned that's debatable, distinct could make more of
 the filtering functionality it already has available to the caller,
 for free.

 On Feb 22, 11:53 am, Wilson MacGyver wmacgy...@gmail.com wrote:

  Any reason why you can't use
  distinct?

 http://richhickey.github.com/clojure/clojure.core-api.html#clojure.co...

  On Feb 21, 2010 10:24 AM, Eugen Dueck eu...@dueck.org wrote:

  Hi,

  Clojure is great! The gain in productivity from more low level
  languages like Java, but also more functional languages like Ruby and
  Common LISP etc. amazes me every day. Like how adding a simple map
  in front of the count here:
  (count colls)
  changes the code from counting the number of collections to
  enumerating the number of items in those collections. It's all these
  little things that take one minute or 5 in Java, but only 5 seconds in
  Clojure (in case you are a slow typer) that make you so much more
  efficient. With clojure, I spend most of my time thinking about
  problems at a conceptual level, rather than at the implementation
  level (Java: int[] vs. IterableInteger vs. ListInteger vs.
  Integer[] etc. - arrg!). Thanks Rich!

  That said, every now and then I come across a function in clojure.core
  that could be vastly improved in its range of applicability by just
  adding one or more optional parameters, defaulting to the currently
  hard-coded values (like the often implicit 'identity' function). Take
  'distinct'. I'd like to be able to specify the keyfn, that's an
  important one for me, and while we're at it, I'd like to pass in the
  set that that function builds up incrementally, and normally starts
  with an empty set, so that I can pre-initialize it, say with #{ nil },
  so that it also filters out nils.

  This 2nd point is not that important, and I'm not sure if it is that
  great an idea in terms of orthogonality of the functions, as we have
  filter. But you'd get rid off one level of indirection. distinct
  basically gives us a filter for free.

  But the 1st point I think certainly makes sense, and we have a couple
  of other fns in clojure that have a variant with a keyfn parameter,
  like sort-by etc. I guess they normally get a different name.

  I'm not so sure about what the best order of parameters is in clojure,
  and named parameters would make this a no brainer, but this is what I
  currently use, the first parameter being coll, at least in the variant
  with only one parameter that makes it a drop-in replacement candidate
  for distinct:

  (defn distinkt
   Returns a lazy sequence of the elements of coll with duplicates
  removed
   ([coll] (distinkt coll identity))
   ([coll keyfn] (distinkt coll keyfn #{}))
   ([coll keyfn seen-items]
     (let [step (fn step [xs seen]
                    (lazy-seq
                     ((fn [[f :as xs] seen]
                       (when-let [s (seq xs)]
                         (let [key (keyfn f)]
                               (if (contains? seen key)
                                 (recur (rest s) seen)
                                 (cons f (step (rest s) (conj seen key)))
                       xs seen)))]
       (step coll seen-items

  I don't mind writing - or better - copy-and-pasting this code and
  keeping it in my project, but I think it could be useful for others,
  so I wouldn't mind at all if this makes it into clojure.core... :)

  Or is the reason for hard coding an (implicit) 'identity' performance?
  Or did I miss some other way to achieve the same goal?

  Eugen

  --
  You received this message because you are subscribed to the Google
  Groups Clojure group.
  To post to this group, send email to clojure@googlegroups.com
  Note that posts from new members are moderated - please be patient with your
  first post.
  To unsubscribe from this group, send email to
  clojure+unsubscr...@googlegroups.comclojure%2bunsubscr...@googlegroups.com
  For more options, visit this group 
  athttp://groups.google.com/group/clojure?hl=en

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-22 Thread Eugen Dück
group-by does not filter, in contrast to distinct.

And I'm really only interested in one of the values considered equal
according to my keyfn. So if I sed group-by, I'd have to wrap it with
a call to map or similar. This is certainly an option, but I'd rather
not create all the intermediate vectors if I don't need to.

On Feb 22, 11:13 pm, Sean Devlin francoisdev...@gmail.com wrote:
 It sounds to me that you want to use c.c.seq-utils/group-by, not
 distinct.

 On Feb 22, 7:22 am, Eugen Dueck eu...@dueck.org wrote:

  Yes, for the reasons stated.

  First, and most important, clojure.core/distinct does not let me pass
  in a keyfn, it's hard-coded.

  Second, and as mentioned that's debatable, distinct could make more of
  the filtering functionality it already has available to the caller,
  for free.

  On Feb 22, 11:53 am, Wilson MacGyver wmacgy...@gmail.com wrote:

   Any reason why you can't use
   distinct?

  http://richhickey.github.com/clojure/clojure.core-api.html#clojure.co...

   On Feb 21, 2010 10:24 AM, Eugen Dueck eu...@dueck.org wrote:

   Hi,

   Clojure is great! The gain in productivity from more low level
   languages like Java, but also more functional languages like Ruby and
   Common LISP etc. amazes me every day. Like how adding a simple map
   in front of the count here:
   (count colls)
   changes the code from counting the number of collections to
   enumerating the number of items in those collections. It's all these
   little things that take one minute or 5 in Java, but only 5 seconds in
   Clojure (in case you are a slow typer) that make you so much more
   efficient. With clojure, I spend most of my time thinking about
   problems at a conceptual level, rather than at the implementation
   level (Java: int[] vs. IterableInteger vs. ListInteger vs.
   Integer[] etc. - arrg!). Thanks Rich!

   That said, every now and then I come across a function in clojure.core
   that could be vastly improved in its range of applicability by just
   adding one or more optional parameters, defaulting to the currently
   hard-coded values (like the often implicit 'identity' function). Take
   'distinct'. I'd like to be able to specify the keyfn, that's an
   important one for me, and while we're at it, I'd like to pass in the
   set that that function builds up incrementally, and normally starts
   with an empty set, so that I can pre-initialize it, say with #{ nil },
   so that it also filters out nils.

   This 2nd point is not that important, and I'm not sure if it is that
   great an idea in terms of orthogonality of the functions, as we have
   filter. But you'd get rid off one level of indirection. distinct
   basically gives us a filter for free.

   But the 1st point I think certainly makes sense, and we have a couple
   of other fns in clojure that have a variant with a keyfn parameter,
   like sort-by etc. I guess they normally get a different name.

   I'm not so sure about what the best order of parameters is in clojure,
   and named parameters would make this a no brainer, but this is what I
   currently use, the first parameter being coll, at least in the variant
   with only one parameter that makes it a drop-in replacement candidate
   for distinct:

   (defn distinkt
    Returns a lazy sequence of the elements of coll with duplicates
   removed
    ([coll] (distinkt coll identity))
    ([coll keyfn] (distinkt coll keyfn #{}))
    ([coll keyfn seen-items]
      (let [step (fn step [xs seen]
                     (lazy-seq
                      ((fn [[f :as xs] seen]
                        (when-let [s (seq xs)]
                          (let [key (keyfn f)]
                                (if (contains? seen key)
                                  (recur (rest s) seen)
                                  (cons f (step (rest s) (conj seen 
   key)))
                        xs seen)))]
        (step coll seen-items

   I don't mind writing - or better - copy-and-pasting this code and
   keeping it in my project, but I think it could be useful for others,
   so I wouldn't mind at all if this makes it into clojure.core... :)

   Or is the reason for hard coding an (implicit) 'identity' performance?
   Or did I miss some other way to achieve the same goal?

   Eugen

   --
   You received this message because you are subscribed to the Google
   Groups Clojure group.
   To post to this group, send email to clojure@googlegroups.com
   Note that posts from new members are moderated - please be patient with 
   your
   first post.
   To unsubscribe from this group, send email to
   clojure+unsubscr...@googlegroups.comclojure%2bunsubscr...@googlegroups.com
   For more options, visit this group 
   athttp://groups.google.com/group/clojure?hl=en

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first 

Re: generalize distinct

2010-02-22 Thread Michał Marczyk
The Haskell approach is to have two functions, nub and nubBy, which
perform the task of distinct and the task of distinct with a
user-provided keyfn, respectively. I'd love to see a distinct-by in
core or contrib. (Mostly for completeness, it's easy enough to
write...)

As for the extra set argument, I'd rather leave it out. Use a
composition of distinct-by and filter.

Sincerely,
Michał

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-22 Thread Sean Devlin
What does distinct-by return in case of a collision?

(keys (group-by...))

(map first (vals (group-by ...)))

(map other-fn (vals (group-by ...)))

You're still better off w/ group-by, and manipulating the resulting
map appropriately.

Sean

On Feb 22, 1:15 pm, Michał Marczyk michal.marc...@gmail.com wrote:
 The Haskell approach is to have two functions, nub and nubBy, which
 perform the task of distinct and the task of distinct with a
 user-provided keyfn, respectively. I'd love to see a distinct-by in
 core or contrib. (Mostly for completeness, it's easy enough to
 write...)

 As for the extra set argument, I'd rather leave it out. Use a
 composition of distinct-by and filter.

 Sincerely,
 Michał

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-22 Thread Michał Marczyk
On 22 February 2010 19:25, Sean Devlin francoisdev...@gmail.com wrote:
 What does distinct-by return in case of a collision?

I'm not sure what you mean by that. distinct-by would do precisely
what distinct does, only instead of comparing the items of the
argument collection themselves, it would compare values of (keyfn
item). The return value ought to be a lazy-seq, in keeping with that
of distinct.

Sincerely,
Michał

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-22 Thread Michał Marczyk
PS. The code for both functions is lifted straight from clojure.core's distinct.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-22 Thread Sean Devlin
Generally when you have a *-by fn, you return the a sequence of
original values, not the mapped values (e.g. sort-by)

However, in the case you suggest, you really should just be using map
 distinct, and not creating your own fn.  Using the built ins as much
as possible is more idiomatic Clojure.

(distinct (map f coll))

Sean

On Feb 22, 1:39 pm, Michał Marczyk michal.marc...@gmail.com wrote:
 On 22 February 2010 19:25, Sean Devlin francoisdev...@gmail.com wrote:

  What does distinct-by return in case of a collision?

 I'm not sure what you mean by that. distinct-by would do precisely
 what distinct does, only instead of comparing the items of the
 argument collection themselves, it would compare values of (keyfn
 item). The return value ought to be a lazy-seq, in keeping with that
 of distinct.

 Sincerely,
 Michał

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-22 Thread Michał Marczyk
On 22 February 2010 19:59, Sean Devlin francoisdev...@gmail.com wrote:
 Generally when you have a *-by fn, you return the a sequence of
 original values, not the mapped values (e.g. sort-by)

Which is precisely what both versions of distinct-by are doing. They
return a sequence of values taken from the original collection,
although determining which values those are involves calling a
user-supplied function and performing a bunch of comparisons of the
value thus obtained to values obtained from calls to said function at
previous steps.

E.g. (I'm repeating an example; this uses the second version of distinct-by):

(distinct-by class [1 2 3 :a :b :c 'a 'b 'c])
; = (1 :a a)

whereas

(distinct (map class [1 2 3 :a :b :c 'a 'b 'c])
; = (java.lang.Integer clojure.lang.Keyword clojure.lang.Symbol)

The primes one-liner using the nubBy-like version returns a list of
numbers, whereas the mapped values are Booleans.

Sincerely,
Michał

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-22 Thread Sean Devlin
Okay, your longer post makes more sense.

Then is the seq (1 :a a) guaranteed?  How do I know that I won't get
(2 :b b), (1 :b c), etc?  What if I want a specific combination
instead?  I've had to actually code this specific problem, and I found
that using group-by  some secondary mapping operation was the only
thing that gave me the flexibility I needed (manufacturing is fun!).

Sean

On Feb 22, 2:19 pm, Michał Marczyk michal.marc...@gmail.com wrote:
 On 22 February 2010 19:59, Sean Devlin francoisdev...@gmail.com wrote:

  Generally when you have a *-by fn, you return the a sequence of
  original values, not the mapped values (e.g. sort-by)

 Which is precisely what both versions of distinct-by are doing. They
 return a sequence of values taken from the original collection,
 although determining which values those are involves calling a
 user-supplied function and performing a bunch of comparisons of the
 value thus obtained to values obtained from calls to said function at
 previous steps.

 E.g. (I'm repeating an example; this uses the second version of distinct-by):

 (distinct-by class [1 2 3 :a :b :c 'a 'b 'c])
 ; = (1 :a a)

 whereas

 (distinct (map class [1 2 3 :a :b :c 'a 'b 'c])
 ; = (java.lang.Integer clojure.lang.Keyword clojure.lang.Symbol)

 The primes one-liner using the nubBy-like version returns a list of
 numbers, whereas the mapped values are Booleans.

 Sincerely,
 Michał

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-22 Thread Michał Marczyk
On 22 February 2010 20:28, Sean Devlin francoisdev...@gmail.com wrote:
 Then is the seq (1 :a a) guaranteed?  How do I know that I won't get
 (2 :b b), (1 :b c), etc?  What if I want a specific combination
 instead?  I've had to actually code this specific problem, and I found
 that using group-by  some secondary mapping operation was the only
 thing that gave me the flexibility I needed (manufacturing is fun!).

The ordering guarantees distinct-by makes are exactly those that
distinct makes, because it uses the same code (as mentioned
previously, I lifted it all from clojure.core, then tweaked to take
the keyfn / eqfn into account). Basically this means that if your
collection has an intrinsic ordering, it will be preserved (the result
will include, for each equivalence class of items from the sequence
modulo the user-defined equivalence relation, the one earliest w.r.t.
that ordering). If it's a hash-map or a hash-set instead, you'll get
whatever ordering (seq coll) happens to produce.

As for group-by giving you more flexibility -- well, it gives you a
lot of flexibility where it's appropriate to use it, but because of
its choice of data structure for the result, you can't use it to
reimplement distinct-by directly:

user= (group-by class [1 2 3 :a :b :c 'a 'b 'c])
java.lang.ClassCastException: java.lang.Class cannot be cast to
java.lang.Comparable (NO_SOURCE_FILE:0)

So no way to use non-Comparables as keys...

And then there's the fact that you can't tell in which order the keys
discovered by group-by appeared in the original collection, which is
again because of its use of sorted-map, which has the consequence that
order is being mangled on purpose! E.g.:

user= (seq (group-by #(- %) [1 2 3 4 5]))
([-5 [5]] [-4 [4]] [-3 [3]] [-2 [2]] [-1 [1]])

In other words: (seq (group-by f coll)) has an ordering possibly
completely unrelated to that of coll (so you'd have to make a separate
traversal through the coll to discover the original ordering of the
keys), whereas (distinct-by f coll), for either version of
distinct-by, preserves the ordering of coll. That's a desirable
property for when that's what you want to do, whereas group-by will, I
suppose, be more useful on other occasions. ;-)

To sum it up, (1) distinct-by actually behaves in a very predictable
way (which may or may not be useful for any particular purpose), (2)
it cannot be implemented directly in terms of group-by. I'd say it's
pretty orthogonal to the existing library functions (that I know of)
actually...

Sincerely,
Michał

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-22 Thread Sean Devlin
Excellent points.

+1 for orthogonality.

Sean

On Feb 22, 3:13 pm, Michał Marczyk michal.marc...@gmail.com wrote:
 On 22 February 2010 20:28, Sean Devlin francoisdev...@gmail.com wrote:

  Then is the seq (1 :a a) guaranteed?  How do I know that I won't get
  (2 :b b), (1 :b c), etc?  What if I want a specific combination
  instead?  I've had to actually code this specific problem, and I found
  that using group-by  some secondary mapping operation was the only
  thing that gave me the flexibility I needed (manufacturing is fun!).

 The ordering guarantees distinct-by makes are exactly those that
 distinct makes, because it uses the same code (as mentioned
 previously, I lifted it all from clojure.core, then tweaked to take
 the keyfn / eqfn into account). Basically this means that if your
 collection has an intrinsic ordering, it will be preserved (the result
 will include, for each equivalence class of items from the sequence
 modulo the user-defined equivalence relation, the one earliest w.r.t.
 that ordering). If it's a hash-map or a hash-set instead, you'll get
 whatever ordering (seq coll) happens to produce.

 As for group-by giving you more flexibility -- well, it gives you a
 lot of flexibility where it's appropriate to use it, but because of
 its choice of data structure for the result, you can't use it to
 reimplement distinct-by directly:

 user= (group-by class [1 2 3 :a :b :c 'a 'b 'c])
 java.lang.ClassCastException: java.lang.Class cannot be cast to
 java.lang.Comparable (NO_SOURCE_FILE:0)

 So no way to use non-Comparables as keys...

 And then there's the fact that you can't tell in which order the keys
 discovered by group-by appeared in the original collection, which is
 again because of its use of sorted-map, which has the consequence that
 order is being mangled on purpose! E.g.:

 user= (seq (group-by #(- %) [1 2 3 4 5]))
 ([-5 [5]] [-4 [4]] [-3 [3]] [-2 [2]] [-1 [1]])

 In other words: (seq (group-by f coll)) has an ordering possibly
 completely unrelated to that of coll (so you'd have to make a separate
 traversal through the coll to discover the original ordering of the
 keys), whereas (distinct-by f coll), for either version of
 distinct-by, preserves the ordering of coll. That's a desirable
 property for when that's what you want to do, whereas group-by will, I
 suppose, be more useful on other occasions. ;-)

 To sum it up, (1) distinct-by actually behaves in a very predictable
 way (which may or may not be useful for any particular purpose), (2)
 it cannot be implemented directly in terms of group-by. I'd say it's
 pretty orthogonal to the existing library functions (that I know of)
 actually...

 Sincerely,
 Michał

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


generalize distinct

2010-02-21 Thread Eugen Dueck
Hi,

Clojure is great! The gain in productivity from more low level
languages like Java, but also more functional languages like Ruby and
Common LISP etc. amazes me every day. Like how adding a simple map
in front of the count here:
(count colls)
changes the code from counting the number of collections to
enumerating the number of items in those collections. It's all these
little things that take one minute or 5 in Java, but only 5 seconds in
Clojure (in case you are a slow typer) that make you so much more
efficient. With clojure, I spend most of my time thinking about
problems at a conceptual level, rather than at the implementation
level (Java: int[] vs. IterableInteger vs. ListInteger vs.
Integer[] etc. - arrg!). Thanks Rich!

That said, every now and then I come across a function in clojure.core
that could be vastly improved in its range of applicability by just
adding one or more optional parameters, defaulting to the currently
hard-coded values (like the often implicit 'identity' function). Take
'distinct'. I'd like to be able to specify the keyfn, that's an
important one for me, and while we're at it, I'd like to pass in the
set that that function builds up incrementally, and normally starts
with an empty set, so that I can pre-initialize it, say with #{ nil },
so that it also filters out nils.

This 2nd point is not that important, and I'm not sure if it is that
great an idea in terms of orthogonality of the functions, as we have
filter. But you'd get rid off one level of indirection. distinct
basically gives us a filter for free.

But the 1st point I think certainly makes sense, and we have a couple
of other fns in clojure that have a variant with a keyfn parameter,
like sort-by etc. I guess they normally get a different name.

I'm not so sure about what the best order of parameters is in clojure,
and named parameters would make this a no brainer, but this is what I
currently use, the first parameter being coll, at least in the variant
with only one parameter that makes it a drop-in replacement candidate
for distinct:

(defn distinkt
  Returns a lazy sequence of the elements of coll with duplicates
removed
  ([coll] (distinkt coll identity))
  ([coll keyfn] (distinkt coll keyfn #{}))
  ([coll keyfn seen-items]
(let [step (fn step [xs seen]
   (lazy-seq
((fn [[f :as xs] seen]
  (when-let [s (seq xs)]
(let [key (keyfn f)]
  (if (contains? seen key)
(recur (rest s) seen)
(cons f (step (rest s) (conj seen key)))
  xs seen)))]
  (step coll seen-items

I don't mind writing - or better - copy-and-pasting this code and
keeping it in my project, but I think it could be useful for others,
so I wouldn't mind at all if this makes it into clojure.core... :)

Or is the reason for hard coding an (implicit) 'identity' performance?
Or did I miss some other way to achieve the same goal?

Eugen

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: generalize distinct

2010-02-21 Thread Wilson MacGyver
Any reason why you can't use
distinct?

http://richhickey.github.com/clojure/clojure.core-api.html#clojure.core/distinct

On Feb 21, 2010 10:24 AM, Eugen Dueck eu...@dueck.org wrote:

Hi,

Clojure is great! The gain in productivity from more low level
languages like Java, but also more functional languages like Ruby and
Common LISP etc. amazes me every day. Like how adding a simple map
in front of the count here:
(count colls)
changes the code from counting the number of collections to
enumerating the number of items in those collections. It's all these
little things that take one minute or 5 in Java, but only 5 seconds in
Clojure (in case you are a slow typer) that make you so much more
efficient. With clojure, I spend most of my time thinking about
problems at a conceptual level, rather than at the implementation
level (Java: int[] vs. IterableInteger vs. ListInteger vs.
Integer[] etc. - arrg!). Thanks Rich!

That said, every now and then I come across a function in clojure.core
that could be vastly improved in its range of applicability by just
adding one or more optional parameters, defaulting to the currently
hard-coded values (like the often implicit 'identity' function). Take
'distinct'. I'd like to be able to specify the keyfn, that's an
important one for me, and while we're at it, I'd like to pass in the
set that that function builds up incrementally, and normally starts
with an empty set, so that I can pre-initialize it, say with #{ nil },
so that it also filters out nils.

This 2nd point is not that important, and I'm not sure if it is that
great an idea in terms of orthogonality of the functions, as we have
filter. But you'd get rid off one level of indirection. distinct
basically gives us a filter for free.

But the 1st point I think certainly makes sense, and we have a couple
of other fns in clojure that have a variant with a keyfn parameter,
like sort-by etc. I guess they normally get a different name.

I'm not so sure about what the best order of parameters is in clojure,
and named parameters would make this a no brainer, but this is what I
currently use, the first parameter being coll, at least in the variant
with only one parameter that makes it a drop-in replacement candidate
for distinct:

(defn distinkt
 Returns a lazy sequence of the elements of coll with duplicates
removed
 ([coll] (distinkt coll identity))
 ([coll keyfn] (distinkt coll keyfn #{}))
 ([coll keyfn seen-items]
   (let [step (fn step [xs seen]
  (lazy-seq
   ((fn [[f :as xs] seen]
 (when-let [s (seq xs)]
   (let [key (keyfn f)]
 (if (contains? seen key)
   (recur (rest s) seen)
   (cons f (step (rest s) (conj seen key)))
 xs seen)))]
 (step coll seen-items

I don't mind writing - or better - copy-and-pasting this code and
keeping it in my project, but I think it could be useful for others,
so I wouldn't mind at all if this makes it into clojure.core... :)

Or is the reason for hard coding an (implicit) 'identity' performance?
Or did I miss some other way to achieve the same goal?

Eugen

--
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.comclojure%2bunsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: generalize distinct

2010-02-21 Thread Wilson MacGyver
let me expand on my comment. What I meant is, the type of functionality you
are adding into distinct, makes it feel less distinct. to me. If I
read what you are saying
correctly, you want to be also be able to pass a set, so in
addition to filter duplicate elements, it also will filter out
anything in the set.

to me that doesn't feel clojureish. I've always felt that clojure code is about
a lot of small functions that does one thing, but does the one thing really
well, so you can compose the functions together to do what you want.

adding what feels like an unrelated capability to a core function seems
odd to me. Just my 2 cents.

On Sun, Feb 21, 2010 at 7:25 AM, Eugen Dueck eu...@dueck.org wrote:
 Hi,

 Clojure is great! The gain in productivity from more low level
 languages like Java, but also more functional languages like Ruby and
 Common LISP etc. amazes me every day. Like how adding a simple map
 in front of the count here:
 (count colls)
 changes the code from counting the number of collections to
 enumerating the number of items in those collections. It's all these
 little things that take one minute or 5 in Java, but only 5 seconds in
 Clojure (in case you are a slow typer) that make you so much more
 efficient. With clojure, I spend most of my time thinking about
 problems at a conceptual level, rather than at the implementation
 level (Java: int[] vs. IterableInteger vs. ListInteger vs.
 Integer[] etc. - arrg!). Thanks Rich!

 That said, every now and then I come across a function in clojure.core
 that could be vastly improved in its range of applicability by just
 adding one or more optional parameters, defaulting to the currently
 hard-coded values (like the often implicit 'identity' function). Take
 'distinct'. I'd like to be able to specify the keyfn, that's an
 important one for me, and while we're at it, I'd like to pass in the
 set that that function builds up incrementally, and normally starts
 with an empty set, so that I can pre-initialize it, say with #{ nil },
 so that it also filters out nils.

 This 2nd point is not that important, and I'm not sure if it is that
 great an idea in terms of orthogonality of the functions, as we have
 filter. But you'd get rid off one level of indirection. distinct
 basically gives us a filter for free.

 But the 1st point I think certainly makes sense, and we have a couple
 of other fns in clojure that have a variant with a keyfn parameter,
 like sort-by etc. I guess they normally get a different name.

 I'm not so sure about what the best order of parameters is in clojure,
 and named parameters would make this a no brainer, but this is what I
 currently use, the first parameter being coll, at least in the variant
 with only one parameter that makes it a drop-in replacement candidate
 for distinct:

 (defn distinkt
  Returns a lazy sequence of the elements of coll with duplicates
 removed
  ([coll] (distinkt coll identity))
  ([coll keyfn] (distinkt coll keyfn #{}))
  ([coll keyfn seen-items]
    (let [step (fn step [xs seen]
                   (lazy-seq
                    ((fn [[f :as xs] seen]
                      (when-let [s (seq xs)]
                        (let [key (keyfn f)]
                              (if (contains? seen key)
                                (recur (rest s) seen)
                                (cons f (step (rest s) (conj seen key)))
                      xs seen)))]
      (step coll seen-items

 I don't mind writing - or better - copy-and-pasting this code and
 keeping it in my project, but I think it could be useful for others,
 so I wouldn't mind at all if this makes it into clojure.core... :)

 Or is the reason for hard coding an (implicit) 'identity' performance?
 Or did I miss some other way to achieve the same goal?

 Eugen

 --
 You received this message because you are subscribed to the Google
 Groups Clojure group.
 To post to this group, send email to clojure@googlegroups.com
 Note that posts from new members are moderated - please be patient with your 
 first post.
 To unsubscribe from this group, send email to
 clojure+unsubscr...@googlegroups.com
 For more options, visit this group at
 http://groups.google.com/group/clojure?hl=en



-- 
Omnem crede diem tibi diluxisse supremum.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en