[Factor-talk] A protocol for sets

Daniel Ehrenberg Tue, 16 Feb 2010 08:45:48 -0800

Hi everyone,

Right now, there are a bunch of different ways to represent sets in
Factor. You could use a hashtable for a set, where key? tests
membership, or you could use a sequence for a set, where member? tests
membership, or you could use a bit array for a set, where ?nth tests
membership. Each of these are used different places in the Factor code
base. Changing representations for a set then takes a bunch of work:
all the code to construct and process the set has to be updated.


Sequences and assocs have protocols, and it seems like it'd be useful
for sets to have one too. This way, the only thing that you need to
change in your code, if you want to change representations, is how you
construct the set. I drafted out the basics of a set protocol in my
git repository at
http://github.com/littledan/Factor/blob/bags/extra/bags/bags.factor .
The most important generic words in the protocol are:

MIXIN: set
GENERIC: adjoin ( elt set -- )
GENERIC: in? ( elt set -- ? )
GENERIC: delete ( elt set -- )
GENERIC: set-like ( set exemplar -- set' )
GENERIC: fast-set ( set -- set' )
GENERIC: members ( set -- sequence )

To add something to a set, use the adjoin word. This word currently
works just on sequences. To test if something is in a set, use the in?
word. For sequences, this is implemented as member?. For
hashtable-based sets, this is key?. If sets get moved to core, then
maybe in? will be renamed to member?, subsuming the current member?
word.To remove an item from the set, use delete. This does nothing if
the set does not contain the element deleted. set-like is analogous to
like on sequences; it casts a set to a different set type based on an
exemplar. fast-set gets a representation of a set that's fast to
query--for most sets, this does nothing, but for sequences, it
converts them into a hash-set. members gives a sequence of the
contents of the set.

The set protocol was designed with two goals: that it be easy to
change set representations, and that the interface is a superset of
what's currently there in the sets vocab for sequences. Binary set
operations that are in sets now are also in the new sets system.
Actually, they are all generic words, so that other set
implementations (like bit sets) can override them for more efficient
implementations. But their default implementations work based just on
those simpler words listed above. I didn't make any binary destructive
set operations (like assoc-union!) because these didn't seem to be in
use anywhere in Factor's code base for sets.

There are a couple ways that this can cause a backwards
incompatibility, if this is put in the core sets vocabulary. First,
now the sets vocab will define a word set which potentially conflicts
with the set word in namespaces. However, most vocabs won't use either
of these vocabs, and many vocabs that use both of these won't ever
call the set word, so only a few things will have to be updated.
Additionally, once all code in Factor that uses hashtables as sets is
updated to use these sets, then it might make sense to remove the
conjoin word, potentially breaking code that's not included in the
repository.

Does anyone have any thoughts on this design? If you can see any
potential problems with this approach, I'd like to hear them.

Thanks,

Dan

------------------------------------------------------------------------------
SOLARIS 10 is the OS for Data Centers - provides features such as DTrace,
Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW
http://p.sf.net/sfu/solaris-dev2dev
_______________________________________________
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk

[Factor-talk] A protocol for sets

Reply via email to