I've been working on this sets project for a little bit. Now, in the
bags branch of my repository, the sets have replaced the core sets
vocabulary. The main thing left at this point is updating all of the
libraries to use the new sets. In their current form, I have three
words (prune, conjoin, conjoin-at and unique) which I want to
eliminate in favor of new set-based words (members, adjoin, adjoin-at
and <hash-set>, respectively). But this will be a bit of work to
update all the code in the repository, since this is a little more
complicated than just changing names--assoc operations must be
replaced with their equivalent set operations, and doing this requires
some understanding of the flow of data of the program. If anyone wants
to help update their own code, I would be grateful for the help: you
will probably find the update easier than me to do, and it will give
you a chance to learn about the new sets.

I apologize for the trouble that the backwards incompatiblity creates,
but the new set protocol should make it easier to write and change
code in the future. By not tying your code into a specific
representation of a set, it should be much easier to change
representations as the code evolves. On the end of set implementors,
the protocol gives efficient utilities based on some simple primitive
operations.

Scripting languages often don't have the power that a well-designed
class library provides, but Factor gives you the concise syntax and
easy-to-manipulate datastructures of a scripting language together
with something approaching the extensibility of more mature languages
using the sequence, assoc and set protocols, among others. The
high-profile efforts to give OCaml a new standard library and Java a
new library of data structures demonstrate the importance of getting
this right the first time.

Note that if you have Factor code that is not included in the Factor
repository, you will have to update this on your own if you want to
update to the next version of Factor once this is merged in. You have
a couple options: you can just copy the old sets defintions in, or you
can think about how to make the minor changes it will take to actually
use the new set protocol. Neither should take much effort.

Dan

On Tue, Feb 16, 2010 at 10:45 AM, Daniel Ehrenberg <micro...@gmail.com> wrote:
> Hi everyone,
>
> Right now, there are a bunch of different ways to represent sets in
> Factor. You could use a hashtable for a set, where key? tests
> membership, or you could use a sequence for a set, where member? tests
> membership, or you could use a bit array for a set, where ?nth tests
> membership. Each of these are used different places in the Factor code
> base. Changing representations for a set then takes a bunch of work:
> all the code to construct and process the set has to be updated.
>
> Sequences and assocs have protocols, and it seems like it'd be useful
> for sets to have one too. This way, the only thing that you need to
> change in your code, if you want to change representations, is how you
> construct the set. I drafted out the basics of a set protocol in my
> git repository at
> http://github.com/littledan/Factor/blob/bags/extra/bags/bags.factor .
> The most important generic words in the protocol are:
>
> MIXIN: set
> GENERIC: adjoin ( elt set -- )
> GENERIC: in? ( elt set -- ? )
> GENERIC: delete ( elt set -- )
> GENERIC: set-like ( set exemplar -- set' )
> GENERIC: fast-set ( set -- set' )
> GENERIC: members ( set -- sequence )
>
> To add something to a set, use the adjoin word. This word currently
> works just on sequences. To test if something is in a set, use the in?
> word. For sequences, this is implemented as member?. For
> hashtable-based sets, this is key?. If sets get moved to core, then
> maybe in? will be renamed to member?, subsuming the current member?
> word.To remove an item from the set, use delete. This does nothing if
> the set does not contain the element deleted. set-like is analogous to
> like on sequences; it casts a set to a different set type based on an
> exemplar. fast-set gets a representation of a set that's fast to
> query--for most sets, this does nothing, but for sequences, it
> converts them into a hash-set. members gives a sequence of the
> contents of the set.
>
> The set protocol was designed with two goals: that it be easy to
> change set representations, and that the interface is a superset of
> what's currently there in the sets vocab for sequences. Binary set
> operations that are in sets now are also in the new sets system.
> Actually, they are all generic words, so that other set
> implementations (like bit sets) can override them for more efficient
> implementations. But their default implementations work based just on
> those simpler words listed above. I didn't make any binary destructive
> set operations (like assoc-union!) because these didn't seem to be in
> use anywhere in Factor's code base for sets.
>
> There are a couple ways that this can cause a backwards
> incompatibility, if this is put in the core sets vocabulary. First,
> now the sets vocab will define a word set which potentially conflicts
> with the set word in namespaces. However, most vocabs won't use either
> of these vocabs, and many vocabs that use both of these won't ever
> call the set word, so only a few things will have to be updated.
> Additionally, once all code in Factor that uses hashtables as sets is
> updated to use these sets, then it might make sense to remove the
> conjoin word, potentially breaking code that's not included in the
> repository.
>
> Does anyone have any thoughts on this design? If you can see any
> potential problems with this approach, I'd like to hear them.
>
> Thanks,
>
> Dan
>

------------------------------------------------------------------------------
Download Intel&#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk

Reply via email to