Re: clojure 1.2 seq fn enhancement FAQ

Mark Engelberg Wed, 28 Apr 2010 16:38:46 -0700

On Wed, Apr 28, 2010 at 2:39 PM, Douglas Philips <d...@mac.com> wrote:
> If some function I call uses seq-contains? (so that it can get all the 
> wonderful seq-ness abstractness clojure offers) I should have to know that 
> internal detail and not pass in a map or set? or worse, fail to get an 
> optimization in that case? C'mon, this should be an easy protocol win (or 
> maybe I don't really understand the limits on what protocols can do.)


Doug, I also am not particularly enamored of the name seq-contains?  I
agree that it conjures up notions of Scheme's proliferation of names
like list-ref, vector-ref, string-ref, etc.

But I think you're not fully appreciating the complexity of the
situation.  This is not just about performance, but a question of two
different possible semantics for "contains?", which is why it can't
*only* be addressed with a protocol.

The fundamental problem is that some Clojure data structures have a
dual nature.  For example, a vector can be thought of as a collection
of items, or you can think of it as a mapping from integers to items.
A hash map can be thought of as a map, or as a sequence of key-value
pairs.

The current "contains?" function is really a "contains-key?" function.
 It treats the input as a data structure with keys (a map or a set),
and asks whether it contains a given key.  This has been a tremendous
source of confusion to some people who apply the contains? function to
a vector, and are startled to discover that it is only asking whether
the item is a *key* in the vector, (i.e., in the case of a vector, a
valid index between 0 and length-1).  Arguably, contains-key? would
have been a much better name, and made the semantics much more clear.

There are some people who believe that Clojure would benefit from a
variant of "contains?" that views the input as a collection.  But how
to make this clear?  That's why the proposal is for a new function
called "seq-contains?".  It's not so much a warning of "this thing has
performance like a linear search", as much as, "it views your data as
a sequence".  So for example, calling seq-contains? on a vector will
actually test whether the item is an element of the vector collection.
 Calling seq-contains? on a map will actually test whether a given
[key value] pair is in the map (rather than just testing for the
existence of a key).

So what are some other options?:

1.  Don't include seq-contains?  The behavior you want can usually be
achieved by using (some #{item} coll).  Disadvantage - if you're
testing to see if the collection contains nil, that won't work.
2.  Rename contains? to contains-key? and make a function with the new
semantics called contains?  Disadvantage - breaks existing code.
3.  Keep one function called contains?, but use protocols to achieve
different semantics for different kind of data structures.  So for
sets and maps, contains? would work like a contains-key? function, but
for vectors and other sequences, it would be a more intuitive
contains-item? function.  Disadvantage - semantics depends on knowing
what kind of thing you're passing to the function, also might break
existing code if anyone is actually currently using contains? on
vectors.

I'm not crazy about the name seq-contains? (and I'm skeptical that it
even belongs in the core), but all the other solutions I can think of
also have major disadvantages.

That said, you make an excellent point about performance.  Even though
the new seq-contains? is intended to treat the underlying data
structure as a sequence from a semantics standpoint, there is
absolutely no reason at all why its performance should be bound by
this.  Clearly contains? and seq-contains? have identical semantics
for a set, so for a set data structure the new seq-contains? should
also use protocols to achieve the same speed, so seq-contains?
performs identically on a set as contains?  Similarly, even though the
new seq-contains? will have a new meaning for hash maps (testing for a
key-value pair rather than just a key), clearly it is possible to use
protocols to perform this test far faster than a linear sequential
search.

So, if things do move forward with seq-contains? to create a new
semantic notion in the clojure core of how to test for containment, I
absolutely agree that protocols should be used to make this new notion
to continue to have good performance on those sorts of collections for
which it is possible to have superior performance than a sequential
search.

--Mark

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: clojure 1.2 seq fn enhancement FAQ

Reply via email to