Re: generalize distinct
I agree with you, Michal. But let me rephrase the question, maybe my initial long-winded post wasn't clear enough on that. Rather than having a separate fn 'distinct-by' in addition to the existing 'distinct', which, apart from the hard-coded keyfn would be EXACTLY the same, shouldn't we just generalize distinct to default to hard-coded if no keyfn is specified? (Btw I'm not considering my other suggestion to allow the set to be passed in here, which I personally like, but also understand can be seen as unorthogonal) And this is not only true for distinct-by, it's true for some of the other * / *-by pairs. It just seems this should be generalized in order not to duplicate code for these cases. If we copy-and-paste code, what's the justification? I'd say orthogonality is an argument against copy-and-paste. Do we copy and paste rather than generalize in order to have distinct a little bit faster due to it being hard-coded? Eugen On Feb 23, 5:13 am, Michał Marczyk michal.marc...@gmail.com wrote: On 22 February 2010 20:28, Sean Devlin francoisdev...@gmail.com wrote: Then is the seq (1 :a a) guaranteed? How do I know that I won't get (2 :b b), (1 :b c), etc? What if I want a specific combination instead? I've had to actually code this specific problem, and I found that using group-by some secondary mapping operation was the only thing that gave me the flexibility I needed (manufacturing is fun!). The ordering guarantees distinct-by makes are exactly those that distinct makes, because it uses the same code (as mentioned previously, I lifted it all from clojure.core, then tweaked to take the keyfn / eqfn into account). Basically this means that if your collection has an intrinsic ordering, it will be preserved (the result will include, for each equivalence class of items from the sequence modulo the user-defined equivalence relation, the one earliest w.r.t. that ordering). If it's a hash-map or a hash-set instead, you'll get whatever ordering (seq coll) happens to produce. As for group-by giving you more flexibility -- well, it gives you a lot of flexibility where it's appropriate to use it, but because of its choice of data structure for the result, you can't use it to reimplement distinct-by directly: user= (group-by class [1 2 3 :a :b :c 'a 'b 'c]) java.lang.ClassCastException: java.lang.Class cannot be cast to java.lang.Comparable (NO_SOURCE_FILE:0) So no way to use non-Comparables as keys... And then there's the fact that you can't tell in which order the keys discovered by group-by appeared in the original collection, which is again because of its use of sorted-map, which has the consequence that order is being mangled on purpose! E.g.: user= (seq (group-by #(- %) [1 2 3 4 5])) ([-5 [5]] [-4 [4]] [-3 [3]] [-2 [2]] [-1 [1]]) In other words: (seq (group-by f coll)) has an ordering possibly completely unrelated to that of coll (so you'd have to make a separate traversal through the coll to discover the original ordering of the keys), whereas (distinct-by f coll), for either version of distinct-by, preserves the ordering of coll. That's a desirable property for when that's what you want to do, whereas group-by will, I suppose, be more useful on other occasions. ;-) To sum it up, (1) distinct-by actually behaves in a very predictable way (which may or may not be useful for any particular purpose), (2) it cannot be implemented directly in terms of group-by. I'd say it's pretty orthogonal to the existing library functions (that I know of) actually... Sincerely, Michał -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
On 22 February 2010 20:28, Sean Devlin francoisdev...@gmail.com wrote: I've had to actually code this specific problem, and I found that using group-by some secondary mapping operation was the only thing that gave me the flexibility I needed (manufacturing is fun!). Incidentally, have you ever benefitted from group-by using a sorted-map, as opposed to, say, a hash-map? I'm asking because I'm a bit puzzled by the reasoning behind using this particular structure with the resulting ordering only vaguely related (i.e. in a manner which might be difficult to predict) to that of the input collection anyway... I haven't got a strong opinion on this, by the way, it's just that if there isn't some huge benefit to sorted-map which I can't yet see, perhaps a hash-map would make for a more versatile function? Sincerely, Michał -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
On 23 February 2010 00:17, Eugen Dück eu...@dueck.org wrote: Rather than having a separate fn 'distinct-by' in addition to the existing 'distinct', which, apart from the hard-coded keyfn would be EXACTLY the same, shouldn't we just generalize distinct to default to hard-coded if no keyfn is specified? (Btw I'm not considering my other suggestion to allow the set to be passed in here, which I personally like, but also understand can be seen as unorthogonal) And this is not only true for distinct-by, it's true for some of the other * / *-by pairs. It just seems this should be generalized in order not to duplicate code for these cases. If we copy-and-paste code, what's the justification? I'd say orthogonality is an argument against copy-and-paste. Do we copy and paste rather than generalize in order to have distinct a little bit faster due to it being hard-coded? That seems like a very good point to me! Especially if clojure.core/identity could be inlined by the compiler (which I thought it is, although now that I look the inlining-related entries are missing from (meta #'identity)), both distinct and sort could be arity overloaded to accept an optional keyfn or not (sort would basically have to be replaced with sort-by + an added unary variant with defaults for both keyfn and comparator). Then perhaps we could reuse the *-by names for Haskell-style binary predicate-based versions. Haskell's sortBy, groupBy, nubBy all take binary functions to decide ordering / equality of pairs of elements from the input sequence... That's enables some neat code sometimes, cf. the primes on liner. How about that? Sincerely, Michał -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
Take a look at sort/sort-by in core. That would be starting point for fn style. On Feb 23, 12:44 pm, Michał Marczyk michal.marc...@gmail.com wrote: On 23 February 2010 00:17, Eugen Dück eu...@dueck.org wrote: Rather than having a separate fn 'distinct-by' in addition to the existing 'distinct', which, apart from the hard-coded keyfn would be EXACTLY the same, shouldn't we just generalize distinct to default to hard-coded if no keyfn is specified? (Btw I'm not considering my other suggestion to allow the set to be passed in here, which I personally like, but also understand can be seen as unorthogonal) And this is not only true for distinct-by, it's true for some of the other * / *-by pairs. It just seems this should be generalized in order not to duplicate code for these cases. If we copy-and-paste code, what's the justification? I'd say orthogonality is an argument against copy-and-paste. Do we copy and paste rather than generalize in order to have distinct a little bit faster due to it being hard-coded? That seems like a very good point to me! Especially if clojure.core/identity could be inlined by the compiler (which I thought it is, although now that I look the inlining-related entries are missing from (meta #'identity)), both distinct and sort could be arity overloaded to accept an optional keyfn or not (sort would basically have to be replaced with sort-by + an added unary variant with defaults for both keyfn and comparator). Then perhaps we could reuse the *-by names for Haskell-style binary predicate-based versions. Haskell's sortBy, groupBy, nubBy all take binary functions to decide ordering / equality of pairs of elements from the input sequence... That's enables some neat code sometimes, cf. the primes on liner. How about that? Sincerely, Michał -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
I guess I was getting at how to do the airity overloading. My first instinct when developing is do something like this (def distinct ([coll] (distinct identity coll)) ([f coll] (lot-of-code))) However, if you look around core, you'll see that Rich repeats himself a lot. This really obvious w/ fns like juxt partial. It turns out that this is results in faster code. So I'd implement distinct like this. (def distinct ([coll] (lots-of-code)) ([f coll] (lot-of-similar-code))) That's all I was saying. Sean On Feb 23, 2:40 pm, Michał Marczyk michal.marc...@gmail.com wrote: On 23 February 2010 20:26, Sean Devlin francoisdev...@gmail.com wrote: Take a look at sort/sort-by in core. That would be starting point for fn style. Not sure what you mean by that. If anything, having an arity-overloaded distinct (optionally accepting a binary predicate to use in place of =) and a separate distinct-by (accepting an extra keyfn argument plus optionally the binary predicate) would make for a more uniform standard lib. (I guess I suggested switching to the Haskell naming convention before, but there's no real reason to do that if the same functionality is available, plus I'd be confused by this myself, having gotten used to the Clojure names already... I retract that part.) Sincerely, Michał -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
On 23 February 2010 21:03, Sean Devlin francoisdev...@gmail.com wrote: I guess I was getting at how to do the airity overloading. My first instinct when developing is do something like this (def distinct ([coll] (distinct identity coll)) ([f coll] (lot-of-code))) However, if you look around core, you'll see that Rich repeats himself a lot. This really obvious w/ fns like juxt partial. It turns out that this is results in faster code. So I'd implement distinct like this. (def distinct ([coll] (lots-of-code)) ([f coll] (lot-of-similar-code))) That's all I was saying. Sean That's a very valid point! (Which I previously completely failed to grok.) For a private utility function, I guess I normally prefer DRYer, perhaps slightly slower code, but for a library function it's a different matter... Anyway, thanks for the clarification! How about that sorted-map vs hash-map thing with group-by? I'm genuinely curious. All best, Michał -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
Agree, it might make sense to do that performance-wise. Although I'm hoping that the compiler would be able to inline that as well (optimizing away calls to identity). Btw, what does :inline-arities do? It sounds like this could do what we want. On Feb 24, 5:03 am, Sean Devlin francoisdev...@gmail.com wrote: I guess I was getting at how to do the airity overloading. My first instinct when developing is do something like this (def distinct ([coll] (distinct identity coll)) ([f coll] (lot-of-code))) However, if you look around core, you'll see that Rich repeats himself a lot. This really obvious w/ fns like juxt partial. It turns out that this is results in faster code. So I'd implement distinct like this. (def distinct ([coll] (lots-of-code)) ([f coll] (lot-of-similar-code))) That's all I was saying. Sean On Feb 23, 2:40 pm, Michał Marczyk michal.marc...@gmail.com wrote: On 23 February 2010 20:26, Sean Devlin francoisdev...@gmail.com wrote: Take a look at sort/sort-by in core. That would be starting point for fn style. Not sure what you mean by that. If anything, having an arity-overloaded distinct (optionally accepting a binary predicate to use in place of =) and a separate distinct-by (accepting an extra keyfn argument plus optionally the binary predicate) would make for a more uniform standard lib. (I guess I suggested switching to the Haskell naming convention before, but there's no real reason to do that if the same functionality is available, plus I'd be confused by this myself, having gotten used to the Clojure names already... I retract that part.) Sincerely, Michał -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
Yes, for the reasons stated. First, and most important, clojure.core/distinct does not let me pass in a keyfn, it's hard-coded. Second, and as mentioned that's debatable, distinct could make more of the filtering functionality it already has available to the caller, for free. On Feb 22, 11:53 am, Wilson MacGyver wmacgy...@gmail.com wrote: Any reason why you can't use distinct? http://richhickey.github.com/clojure/clojure.core-api.html#clojure.co... On Feb 21, 2010 10:24 AM, Eugen Dueck eu...@dueck.org wrote: Hi, Clojure is great! The gain in productivity from more low level languages like Java, but also more functional languages like Ruby and Common LISP etc. amazes me every day. Like how adding a simple map in front of the count here: (count colls) changes the code from counting the number of collections to enumerating the number of items in those collections. It's all these little things that take one minute or 5 in Java, but only 5 seconds in Clojure (in case you are a slow typer) that make you so much more efficient. With clojure, I spend most of my time thinking about problems at a conceptual level, rather than at the implementation level (Java: int[] vs. IterableInteger vs. ListInteger vs. Integer[] etc. - arrg!). Thanks Rich! That said, every now and then I come across a function in clojure.core that could be vastly improved in its range of applicability by just adding one or more optional parameters, defaulting to the currently hard-coded values (like the often implicit 'identity' function). Take 'distinct'. I'd like to be able to specify the keyfn, that's an important one for me, and while we're at it, I'd like to pass in the set that that function builds up incrementally, and normally starts with an empty set, so that I can pre-initialize it, say with #{ nil }, so that it also filters out nils. This 2nd point is not that important, and I'm not sure if it is that great an idea in terms of orthogonality of the functions, as we have filter. But you'd get rid off one level of indirection. distinct basically gives us a filter for free. But the 1st point I think certainly makes sense, and we have a couple of other fns in clojure that have a variant with a keyfn parameter, like sort-by etc. I guess they normally get a different name. I'm not so sure about what the best order of parameters is in clojure, and named parameters would make this a no brainer, but this is what I currently use, the first parameter being coll, at least in the variant with only one parameter that makes it a drop-in replacement candidate for distinct: (defn distinkt Returns a lazy sequence of the elements of coll with duplicates removed ([coll] (distinkt coll identity)) ([coll keyfn] (distinkt coll keyfn #{})) ([coll keyfn seen-items] (let [step (fn step [xs seen] (lazy-seq ((fn [[f :as xs] seen] (when-let [s (seq xs)] (let [key (keyfn f)] (if (contains? seen key) (recur (rest s) seen) (cons f (step (rest s) (conj seen key))) xs seen)))] (step coll seen-items I don't mind writing - or better - copy-and-pasting this code and keeping it in my project, but I think it could be useful for others, so I wouldn't mind at all if this makes it into clojure.core... :) Or is the reason for hard coding an (implicit) 'identity' performance? Or did I miss some other way to achieve the same goal? Eugen -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.comclojure%2bunsubscr...@googlegroups.com For more options, visit this group athttp://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
It sounds to me that you want to use c.c.seq-utils/group-by, not distinct. On Feb 22, 7:22 am, Eugen Dueck eu...@dueck.org wrote: Yes, for the reasons stated. First, and most important, clojure.core/distinct does not let me pass in a keyfn, it's hard-coded. Second, and as mentioned that's debatable, distinct could make more of the filtering functionality it already has available to the caller, for free. On Feb 22, 11:53 am, Wilson MacGyver wmacgy...@gmail.com wrote: Any reason why you can't use distinct? http://richhickey.github.com/clojure/clojure.core-api.html#clojure.co... On Feb 21, 2010 10:24 AM, Eugen Dueck eu...@dueck.org wrote: Hi, Clojure is great! The gain in productivity from more low level languages like Java, but also more functional languages like Ruby and Common LISP etc. amazes me every day. Like how adding a simple map in front of the count here: (count colls) changes the code from counting the number of collections to enumerating the number of items in those collections. It's all these little things that take one minute or 5 in Java, but only 5 seconds in Clojure (in case you are a slow typer) that make you so much more efficient. With clojure, I spend most of my time thinking about problems at a conceptual level, rather than at the implementation level (Java: int[] vs. IterableInteger vs. ListInteger vs. Integer[] etc. - arrg!). Thanks Rich! That said, every now and then I come across a function in clojure.core that could be vastly improved in its range of applicability by just adding one or more optional parameters, defaulting to the currently hard-coded values (like the often implicit 'identity' function). Take 'distinct'. I'd like to be able to specify the keyfn, that's an important one for me, and while we're at it, I'd like to pass in the set that that function builds up incrementally, and normally starts with an empty set, so that I can pre-initialize it, say with #{ nil }, so that it also filters out nils. This 2nd point is not that important, and I'm not sure if it is that great an idea in terms of orthogonality of the functions, as we have filter. But you'd get rid off one level of indirection. distinct basically gives us a filter for free. But the 1st point I think certainly makes sense, and we have a couple of other fns in clojure that have a variant with a keyfn parameter, like sort-by etc. I guess they normally get a different name. I'm not so sure about what the best order of parameters is in clojure, and named parameters would make this a no brainer, but this is what I currently use, the first parameter being coll, at least in the variant with only one parameter that makes it a drop-in replacement candidate for distinct: (defn distinkt Returns a lazy sequence of the elements of coll with duplicates removed ([coll] (distinkt coll identity)) ([coll keyfn] (distinkt coll keyfn #{})) ([coll keyfn seen-items] (let [step (fn step [xs seen] (lazy-seq ((fn [[f :as xs] seen] (when-let [s (seq xs)] (let [key (keyfn f)] (if (contains? seen key) (recur (rest s) seen) (cons f (step (rest s) (conj seen key))) xs seen)))] (step coll seen-items I don't mind writing - or better - copy-and-pasting this code and keeping it in my project, but I think it could be useful for others, so I wouldn't mind at all if this makes it into clojure.core... :) Or is the reason for hard coding an (implicit) 'identity' performance? Or did I miss some other way to achieve the same goal? Eugen -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.comclojure%2bunsubscr...@googlegroups.com For more options, visit this group athttp://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
group-by does not filter, in contrast to distinct. And I'm really only interested in one of the values considered equal according to my keyfn. So if I sed group-by, I'd have to wrap it with a call to map or similar. This is certainly an option, but I'd rather not create all the intermediate vectors if I don't need to. On Feb 22, 11:13 pm, Sean Devlin francoisdev...@gmail.com wrote: It sounds to me that you want to use c.c.seq-utils/group-by, not distinct. On Feb 22, 7:22 am, Eugen Dueck eu...@dueck.org wrote: Yes, for the reasons stated. First, and most important, clojure.core/distinct does not let me pass in a keyfn, it's hard-coded. Second, and as mentioned that's debatable, distinct could make more of the filtering functionality it already has available to the caller, for free. On Feb 22, 11:53 am, Wilson MacGyver wmacgy...@gmail.com wrote: Any reason why you can't use distinct? http://richhickey.github.com/clojure/clojure.core-api.html#clojure.co... On Feb 21, 2010 10:24 AM, Eugen Dueck eu...@dueck.org wrote: Hi, Clojure is great! The gain in productivity from more low level languages like Java, but also more functional languages like Ruby and Common LISP etc. amazes me every day. Like how adding a simple map in front of the count here: (count colls) changes the code from counting the number of collections to enumerating the number of items in those collections. It's all these little things that take one minute or 5 in Java, but only 5 seconds in Clojure (in case you are a slow typer) that make you so much more efficient. With clojure, I spend most of my time thinking about problems at a conceptual level, rather than at the implementation level (Java: int[] vs. IterableInteger vs. ListInteger vs. Integer[] etc. - arrg!). Thanks Rich! That said, every now and then I come across a function in clojure.core that could be vastly improved in its range of applicability by just adding one or more optional parameters, defaulting to the currently hard-coded values (like the often implicit 'identity' function). Take 'distinct'. I'd like to be able to specify the keyfn, that's an important one for me, and while we're at it, I'd like to pass in the set that that function builds up incrementally, and normally starts with an empty set, so that I can pre-initialize it, say with #{ nil }, so that it also filters out nils. This 2nd point is not that important, and I'm not sure if it is that great an idea in terms of orthogonality of the functions, as we have filter. But you'd get rid off one level of indirection. distinct basically gives us a filter for free. But the 1st point I think certainly makes sense, and we have a couple of other fns in clojure that have a variant with a keyfn parameter, like sort-by etc. I guess they normally get a different name. I'm not so sure about what the best order of parameters is in clojure, and named parameters would make this a no brainer, but this is what I currently use, the first parameter being coll, at least in the variant with only one parameter that makes it a drop-in replacement candidate for distinct: (defn distinkt Returns a lazy sequence of the elements of coll with duplicates removed ([coll] (distinkt coll identity)) ([coll keyfn] (distinkt coll keyfn #{})) ([coll keyfn seen-items] (let [step (fn step [xs seen] (lazy-seq ((fn [[f :as xs] seen] (when-let [s (seq xs)] (let [key (keyfn f)] (if (contains? seen key) (recur (rest s) seen) (cons f (step (rest s) (conj seen key))) xs seen)))] (step coll seen-items I don't mind writing - or better - copy-and-pasting this code and keeping it in my project, but I think it could be useful for others, so I wouldn't mind at all if this makes it into clojure.core... :) Or is the reason for hard coding an (implicit) 'identity' performance? Or did I miss some other way to achieve the same goal? Eugen -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.comclojure%2bunsubscr...@googlegroups.com For more options, visit this group athttp://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first
Re: generalize distinct
The Haskell approach is to have two functions, nub and nubBy, which perform the task of distinct and the task of distinct with a user-provided keyfn, respectively. I'd love to see a distinct-by in core or contrib. (Mostly for completeness, it's easy enough to write...) As for the extra set argument, I'd rather leave it out. Use a composition of distinct-by and filter. Sincerely, Michał -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
What does distinct-by return in case of a collision? (keys (group-by...)) (map first (vals (group-by ...))) (map other-fn (vals (group-by ...))) You're still better off w/ group-by, and manipulating the resulting map appropriately. Sean On Feb 22, 1:15 pm, Michał Marczyk michal.marc...@gmail.com wrote: The Haskell approach is to have two functions, nub and nubBy, which perform the task of distinct and the task of distinct with a user-provided keyfn, respectively. I'd love to see a distinct-by in core or contrib. (Mostly for completeness, it's easy enough to write...) As for the extra set argument, I'd rather leave it out. Use a composition of distinct-by and filter. Sincerely, Michał -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
On 22 February 2010 19:25, Sean Devlin francoisdev...@gmail.com wrote: What does distinct-by return in case of a collision? I'm not sure what you mean by that. distinct-by would do precisely what distinct does, only instead of comparing the items of the argument collection themselves, it would compare values of (keyfn item). The return value ought to be a lazy-seq, in keeping with that of distinct. Sincerely, Michał -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
PS. The code for both functions is lifted straight from clojure.core's distinct. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
Generally when you have a *-by fn, you return the a sequence of original values, not the mapped values (e.g. sort-by) However, in the case you suggest, you really should just be using map distinct, and not creating your own fn. Using the built ins as much as possible is more idiomatic Clojure. (distinct (map f coll)) Sean On Feb 22, 1:39 pm, Michał Marczyk michal.marc...@gmail.com wrote: On 22 February 2010 19:25, Sean Devlin francoisdev...@gmail.com wrote: What does distinct-by return in case of a collision? I'm not sure what you mean by that. distinct-by would do precisely what distinct does, only instead of comparing the items of the argument collection themselves, it would compare values of (keyfn item). The return value ought to be a lazy-seq, in keeping with that of distinct. Sincerely, Michał -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
On 22 February 2010 19:59, Sean Devlin francoisdev...@gmail.com wrote: Generally when you have a *-by fn, you return the a sequence of original values, not the mapped values (e.g. sort-by) Which is precisely what both versions of distinct-by are doing. They return a sequence of values taken from the original collection, although determining which values those are involves calling a user-supplied function and performing a bunch of comparisons of the value thus obtained to values obtained from calls to said function at previous steps. E.g. (I'm repeating an example; this uses the second version of distinct-by): (distinct-by class [1 2 3 :a :b :c 'a 'b 'c]) ; = (1 :a a) whereas (distinct (map class [1 2 3 :a :b :c 'a 'b 'c]) ; = (java.lang.Integer clojure.lang.Keyword clojure.lang.Symbol) The primes one-liner using the nubBy-like version returns a list of numbers, whereas the mapped values are Booleans. Sincerely, Michał -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
Okay, your longer post makes more sense. Then is the seq (1 :a a) guaranteed? How do I know that I won't get (2 :b b), (1 :b c), etc? What if I want a specific combination instead? I've had to actually code this specific problem, and I found that using group-by some secondary mapping operation was the only thing that gave me the flexibility I needed (manufacturing is fun!). Sean On Feb 22, 2:19 pm, Michał Marczyk michal.marc...@gmail.com wrote: On 22 February 2010 19:59, Sean Devlin francoisdev...@gmail.com wrote: Generally when you have a *-by fn, you return the a sequence of original values, not the mapped values (e.g. sort-by) Which is precisely what both versions of distinct-by are doing. They return a sequence of values taken from the original collection, although determining which values those are involves calling a user-supplied function and performing a bunch of comparisons of the value thus obtained to values obtained from calls to said function at previous steps. E.g. (I'm repeating an example; this uses the second version of distinct-by): (distinct-by class [1 2 3 :a :b :c 'a 'b 'c]) ; = (1 :a a) whereas (distinct (map class [1 2 3 :a :b :c 'a 'b 'c]) ; = (java.lang.Integer clojure.lang.Keyword clojure.lang.Symbol) The primes one-liner using the nubBy-like version returns a list of numbers, whereas the mapped values are Booleans. Sincerely, Michał -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
On 22 February 2010 20:28, Sean Devlin francoisdev...@gmail.com wrote: Then is the seq (1 :a a) guaranteed? How do I know that I won't get (2 :b b), (1 :b c), etc? What if I want a specific combination instead? I've had to actually code this specific problem, and I found that using group-by some secondary mapping operation was the only thing that gave me the flexibility I needed (manufacturing is fun!). The ordering guarantees distinct-by makes are exactly those that distinct makes, because it uses the same code (as mentioned previously, I lifted it all from clojure.core, then tweaked to take the keyfn / eqfn into account). Basically this means that if your collection has an intrinsic ordering, it will be preserved (the result will include, for each equivalence class of items from the sequence modulo the user-defined equivalence relation, the one earliest w.r.t. that ordering). If it's a hash-map or a hash-set instead, you'll get whatever ordering (seq coll) happens to produce. As for group-by giving you more flexibility -- well, it gives you a lot of flexibility where it's appropriate to use it, but because of its choice of data structure for the result, you can't use it to reimplement distinct-by directly: user= (group-by class [1 2 3 :a :b :c 'a 'b 'c]) java.lang.ClassCastException: java.lang.Class cannot be cast to java.lang.Comparable (NO_SOURCE_FILE:0) So no way to use non-Comparables as keys... And then there's the fact that you can't tell in which order the keys discovered by group-by appeared in the original collection, which is again because of its use of sorted-map, which has the consequence that order is being mangled on purpose! E.g.: user= (seq (group-by #(- %) [1 2 3 4 5])) ([-5 [5]] [-4 [4]] [-3 [3]] [-2 [2]] [-1 [1]]) In other words: (seq (group-by f coll)) has an ordering possibly completely unrelated to that of coll (so you'd have to make a separate traversal through the coll to discover the original ordering of the keys), whereas (distinct-by f coll), for either version of distinct-by, preserves the ordering of coll. That's a desirable property for when that's what you want to do, whereas group-by will, I suppose, be more useful on other occasions. ;-) To sum it up, (1) distinct-by actually behaves in a very predictable way (which may or may not be useful for any particular purpose), (2) it cannot be implemented directly in terms of group-by. I'd say it's pretty orthogonal to the existing library functions (that I know of) actually... Sincerely, Michał -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
Excellent points. +1 for orthogonality. Sean On Feb 22, 3:13 pm, Michał Marczyk michal.marc...@gmail.com wrote: On 22 February 2010 20:28, Sean Devlin francoisdev...@gmail.com wrote: Then is the seq (1 :a a) guaranteed? How do I know that I won't get (2 :b b), (1 :b c), etc? What if I want a specific combination instead? I've had to actually code this specific problem, and I found that using group-by some secondary mapping operation was the only thing that gave me the flexibility I needed (manufacturing is fun!). The ordering guarantees distinct-by makes are exactly those that distinct makes, because it uses the same code (as mentioned previously, I lifted it all from clojure.core, then tweaked to take the keyfn / eqfn into account). Basically this means that if your collection has an intrinsic ordering, it will be preserved (the result will include, for each equivalence class of items from the sequence modulo the user-defined equivalence relation, the one earliest w.r.t. that ordering). If it's a hash-map or a hash-set instead, you'll get whatever ordering (seq coll) happens to produce. As for group-by giving you more flexibility -- well, it gives you a lot of flexibility where it's appropriate to use it, but because of its choice of data structure for the result, you can't use it to reimplement distinct-by directly: user= (group-by class [1 2 3 :a :b :c 'a 'b 'c]) java.lang.ClassCastException: java.lang.Class cannot be cast to java.lang.Comparable (NO_SOURCE_FILE:0) So no way to use non-Comparables as keys... And then there's the fact that you can't tell in which order the keys discovered by group-by appeared in the original collection, which is again because of its use of sorted-map, which has the consequence that order is being mangled on purpose! E.g.: user= (seq (group-by #(- %) [1 2 3 4 5])) ([-5 [5]] [-4 [4]] [-3 [3]] [-2 [2]] [-1 [1]]) In other words: (seq (group-by f coll)) has an ordering possibly completely unrelated to that of coll (so you'd have to make a separate traversal through the coll to discover the original ordering of the keys), whereas (distinct-by f coll), for either version of distinct-by, preserves the ordering of coll. That's a desirable property for when that's what you want to do, whereas group-by will, I suppose, be more useful on other occasions. ;-) To sum it up, (1) distinct-by actually behaves in a very predictable way (which may or may not be useful for any particular purpose), (2) it cannot be implemented directly in terms of group-by. I'd say it's pretty orthogonal to the existing library functions (that I know of) actually... Sincerely, Michał -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
generalize distinct
Hi, Clojure is great! The gain in productivity from more low level languages like Java, but also more functional languages like Ruby and Common LISP etc. amazes me every day. Like how adding a simple map in front of the count here: (count colls) changes the code from counting the number of collections to enumerating the number of items in those collections. It's all these little things that take one minute or 5 in Java, but only 5 seconds in Clojure (in case you are a slow typer) that make you so much more efficient. With clojure, I spend most of my time thinking about problems at a conceptual level, rather than at the implementation level (Java: int[] vs. IterableInteger vs. ListInteger vs. Integer[] etc. - arrg!). Thanks Rich! That said, every now and then I come across a function in clojure.core that could be vastly improved in its range of applicability by just adding one or more optional parameters, defaulting to the currently hard-coded values (like the often implicit 'identity' function). Take 'distinct'. I'd like to be able to specify the keyfn, that's an important one for me, and while we're at it, I'd like to pass in the set that that function builds up incrementally, and normally starts with an empty set, so that I can pre-initialize it, say with #{ nil }, so that it also filters out nils. This 2nd point is not that important, and I'm not sure if it is that great an idea in terms of orthogonality of the functions, as we have filter. But you'd get rid off one level of indirection. distinct basically gives us a filter for free. But the 1st point I think certainly makes sense, and we have a couple of other fns in clojure that have a variant with a keyfn parameter, like sort-by etc. I guess they normally get a different name. I'm not so sure about what the best order of parameters is in clojure, and named parameters would make this a no brainer, but this is what I currently use, the first parameter being coll, at least in the variant with only one parameter that makes it a drop-in replacement candidate for distinct: (defn distinkt Returns a lazy sequence of the elements of coll with duplicates removed ([coll] (distinkt coll identity)) ([coll keyfn] (distinkt coll keyfn #{})) ([coll keyfn seen-items] (let [step (fn step [xs seen] (lazy-seq ((fn [[f :as xs] seen] (when-let [s (seq xs)] (let [key (keyfn f)] (if (contains? seen key) (recur (rest s) seen) (cons f (step (rest s) (conj seen key))) xs seen)))] (step coll seen-items I don't mind writing - or better - copy-and-pasting this code and keeping it in my project, but I think it could be useful for others, so I wouldn't mind at all if this makes it into clojure.core... :) Or is the reason for hard coding an (implicit) 'identity' performance? Or did I miss some other way to achieve the same goal? Eugen -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
Any reason why you can't use distinct? http://richhickey.github.com/clojure/clojure.core-api.html#clojure.core/distinct On Feb 21, 2010 10:24 AM, Eugen Dueck eu...@dueck.org wrote: Hi, Clojure is great! The gain in productivity from more low level languages like Java, but also more functional languages like Ruby and Common LISP etc. amazes me every day. Like how adding a simple map in front of the count here: (count colls) changes the code from counting the number of collections to enumerating the number of items in those collections. It's all these little things that take one minute or 5 in Java, but only 5 seconds in Clojure (in case you are a slow typer) that make you so much more efficient. With clojure, I spend most of my time thinking about problems at a conceptual level, rather than at the implementation level (Java: int[] vs. IterableInteger vs. ListInteger vs. Integer[] etc. - arrg!). Thanks Rich! That said, every now and then I come across a function in clojure.core that could be vastly improved in its range of applicability by just adding one or more optional parameters, defaulting to the currently hard-coded values (like the often implicit 'identity' function). Take 'distinct'. I'd like to be able to specify the keyfn, that's an important one for me, and while we're at it, I'd like to pass in the set that that function builds up incrementally, and normally starts with an empty set, so that I can pre-initialize it, say with #{ nil }, so that it also filters out nils. This 2nd point is not that important, and I'm not sure if it is that great an idea in terms of orthogonality of the functions, as we have filter. But you'd get rid off one level of indirection. distinct basically gives us a filter for free. But the 1st point I think certainly makes sense, and we have a couple of other fns in clojure that have a variant with a keyfn parameter, like sort-by etc. I guess they normally get a different name. I'm not so sure about what the best order of parameters is in clojure, and named parameters would make this a no brainer, but this is what I currently use, the first parameter being coll, at least in the variant with only one parameter that makes it a drop-in replacement candidate for distinct: (defn distinkt Returns a lazy sequence of the elements of coll with duplicates removed ([coll] (distinkt coll identity)) ([coll keyfn] (distinkt coll keyfn #{})) ([coll keyfn seen-items] (let [step (fn step [xs seen] (lazy-seq ((fn [[f :as xs] seen] (when-let [s (seq xs)] (let [key (keyfn f)] (if (contains? seen key) (recur (rest s) seen) (cons f (step (rest s) (conj seen key))) xs seen)))] (step coll seen-items I don't mind writing - or better - copy-and-pasting this code and keeping it in my project, but I think it could be useful for others, so I wouldn't mind at all if this makes it into clojure.core... :) Or is the reason for hard coding an (implicit) 'identity' performance? Or did I miss some other way to achieve the same goal? Eugen -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.comclojure%2bunsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: generalize distinct
let me expand on my comment. What I meant is, the type of functionality you are adding into distinct, makes it feel less distinct. to me. If I read what you are saying correctly, you want to be also be able to pass a set, so in addition to filter duplicate elements, it also will filter out anything in the set. to me that doesn't feel clojureish. I've always felt that clojure code is about a lot of small functions that does one thing, but does the one thing really well, so you can compose the functions together to do what you want. adding what feels like an unrelated capability to a core function seems odd to me. Just my 2 cents. On Sun, Feb 21, 2010 at 7:25 AM, Eugen Dueck eu...@dueck.org wrote: Hi, Clojure is great! The gain in productivity from more low level languages like Java, but also more functional languages like Ruby and Common LISP etc. amazes me every day. Like how adding a simple map in front of the count here: (count colls) changes the code from counting the number of collections to enumerating the number of items in those collections. It's all these little things that take one minute or 5 in Java, but only 5 seconds in Clojure (in case you are a slow typer) that make you so much more efficient. With clojure, I spend most of my time thinking about problems at a conceptual level, rather than at the implementation level (Java: int[] vs. IterableInteger vs. ListInteger vs. Integer[] etc. - arrg!). Thanks Rich! That said, every now and then I come across a function in clojure.core that could be vastly improved in its range of applicability by just adding one or more optional parameters, defaulting to the currently hard-coded values (like the often implicit 'identity' function). Take 'distinct'. I'd like to be able to specify the keyfn, that's an important one for me, and while we're at it, I'd like to pass in the set that that function builds up incrementally, and normally starts with an empty set, so that I can pre-initialize it, say with #{ nil }, so that it also filters out nils. This 2nd point is not that important, and I'm not sure if it is that great an idea in terms of orthogonality of the functions, as we have filter. But you'd get rid off one level of indirection. distinct basically gives us a filter for free. But the 1st point I think certainly makes sense, and we have a couple of other fns in clojure that have a variant with a keyfn parameter, like sort-by etc. I guess they normally get a different name. I'm not so sure about what the best order of parameters is in clojure, and named parameters would make this a no brainer, but this is what I currently use, the first parameter being coll, at least in the variant with only one parameter that makes it a drop-in replacement candidate for distinct: (defn distinkt Returns a lazy sequence of the elements of coll with duplicates removed ([coll] (distinkt coll identity)) ([coll keyfn] (distinkt coll keyfn #{})) ([coll keyfn seen-items] (let [step (fn step [xs seen] (lazy-seq ((fn [[f :as xs] seen] (when-let [s (seq xs)] (let [key (keyfn f)] (if (contains? seen key) (recur (rest s) seen) (cons f (step (rest s) (conj seen key))) xs seen)))] (step coll seen-items I don't mind writing - or better - copy-and-pasting this code and keeping it in my project, but I think it could be useful for others, so I wouldn't mind at all if this makes it into clojure.core... :) Or is the reason for hard coding an (implicit) 'identity' performance? Or did I miss some other way to achieve the same goal? Eugen -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- Omnem crede diem tibi diluxisse supremum. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en