Hi everyone, Right now, there are a bunch of different ways to represent sets in Factor. You could use a hashtable for a set, where key? tests membership, or you could use a sequence for a set, where member? tests membership, or you could use a bit array for a set, where ?nth tests membership. Each of these are used different places in the Factor code base. Changing representations for a set then takes a bunch of work: all the code to construct and process the set has to be updated.
Sequences and assocs have protocols, and it seems like it'd be useful for sets to have one too. This way, the only thing that you need to change in your code, if you want to change representations, is how you construct the set. I drafted out the basics of a set protocol in my git repository at http://github.com/littledan/Factor/blob/bags/extra/bags/bags.factor . The most important generic words in the protocol are: MIXIN: set GENERIC: adjoin ( elt set -- ) GENERIC: in? ( elt set -- ? ) GENERIC: delete ( elt set -- ) GENERIC: set-like ( set exemplar -- set' ) GENERIC: fast-set ( set -- set' ) GENERIC: members ( set -- sequence ) To add something to a set, use the adjoin word. This word currently works just on sequences. To test if something is in a set, use the in? word. For sequences, this is implemented as member?. For hashtable-based sets, this is key?. If sets get moved to core, then maybe in? will be renamed to member?, subsuming the current member? word.To remove an item from the set, use delete. This does nothing if the set does not contain the element deleted. set-like is analogous to like on sequences; it casts a set to a different set type based on an exemplar. fast-set gets a representation of a set that's fast to query--for most sets, this does nothing, but for sequences, it converts them into a hash-set. members gives a sequence of the contents of the set. The set protocol was designed with two goals: that it be easy to change set representations, and that the interface is a superset of what's currently there in the sets vocab for sequences. Binary set operations that are in sets now are also in the new sets system. Actually, they are all generic words, so that other set implementations (like bit sets) can override them for more efficient implementations. But their default implementations work based just on those simpler words listed above. I didn't make any binary destructive set operations (like assoc-union!) because these didn't seem to be in use anywhere in Factor's code base for sets. There are a couple ways that this can cause a backwards incompatibility, if this is put in the core sets vocabulary. First, now the sets vocab will define a word set which potentially conflicts with the set word in namespaces. However, most vocabs won't use either of these vocabs, and many vocabs that use both of these won't ever call the set word, so only a few things will have to be updated. Additionally, once all code in Factor that uses hashtables as sets is updated to use these sets, then it might make sense to remove the conjoin word, potentially breaking code that's not included in the repository. Does anyone have any thoughts on this design? If you can see any potential problems with this approach, I'd like to hear them. Thanks, Dan ------------------------------------------------------------------------------ SOLARIS 10 is the OS for Data Centers - provides features such as DTrace, Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW http://p.sf.net/sfu/solaris-dev2dev _______________________________________________ Factor-talk mailing list Factor-talk@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/factor-talk