To start off, I should clarify that I see little value in having a
Bag type beyond certain matters of syntactic or semantic brevity, but
those alone can still warrant its existence.
A Bag is for marking when your duplicate-allowing collection is
conceptually not ordered, and that is all that it is for. This
marker is useful for optimizing certain places where a Seq would
otherwise be used, such as implicitly permitting hyperthreading (a Set
can also hyperthread). It is also useful as a language-enforced
stricture, where you are prevented from doing order-dependent
operations on that collection because they don't make sense.
Aside from these optimizations and strictures afforded by a Bag type,
I see no reason to provide too many operators for it ... in fact, I
would argue that what one can do with a Bag should be defined as the
intersection of what one can do with a Seq and what one can do with a
Set.
That said ...
At 4:16 PM +0100 11/23/06, TSa wrote:
>Adriano Rodrigues wrote:
>>And we may argue as well that being Bag a multiset, the set is a
>>special case where all the elements have the same multiplicity.
Or specifically, a multiplicity of 1.
>Yes, that would be a subset type. The thing I had in mind was
>'role Seq does Bag' and 'role Bag does Set'. And classes with
>the same names for creating instances.
I think you have something backwards here. While the 3 collection
types Seq,Bag,Set could be sequenced like that for some purposes of
explanation, where adjacent types have commonalities that the third
doesn't, I don't see that it follows that .does() should also chain
in the same direction all the way across.
Seq and Set are *both* more specific or restricted than Bag. So it
would make more sense to say 'role Set does Bag' (and 'role Seq does
Bag'), not 'role Bag does Set'. For illustrative purposes, replace
"Set" with "Int" and "Bag" with "Num". Everything that is a valid
Set|Seq is a valid Bag, but the reverse isn't true.
(That's not to say that we can't cast a Bag as a Set, but that would
change the value, like doing round|floor|ceil|etc on a Num to get an
Int, and this is external to a .does relationship.)
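
A minimal sketch of that asymmetry, written in Python rather than
Perl 6 so that it can actually be run. The stand-ins are assumptions
for illustration only: collections.Counter plays the Bag, a list plays
the Seq, and a set plays the Set. Nothing here reflects what the Perl 6
types will actually look like.

```python
from collections import Counter

seq = [2, 1, 2, 3]   # ordered, duplicates allowed (stand-in for Seq)
st = {1, 2, 3}       # unordered, all distinct (stand-in for Set)

# Viewing either one as a bag is always possible.
bag_from_seq = Counter(seq)   # forgets the order, keeps multiplicities
bag_from_set = Counter(st)    # every multiplicity is exactly 1

# Going the other way changes the value, much like rounding a Num
# to get an Int.
bag = Counter([1, 2, 2, 2, 3, 3])
as_set = set(bag)             # multiplicities are discarded

# Widening the result back to a bag does not recover the original.
round_trip = Counter(as_set)
```

Here round_trip differs from bag, which is the sense in which the cast
is lossy and sits outside a .does relationship.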
This also allows us to reserve operators for Set that Bag can't or
won't have (because they depend on all collection elements being
distinct), as we can reserve operators for Seq that Bag can't have
(because they depend on the order of elements being significant).
Now, there is a small handful of operations that could easily be
ascribed to all 3 of those types, such as testing if an element
exists, or how many occurrences of it there are, or iterating through
all elements in an order-agnostic fashion. These can all have easily
predictable and consistent behaviour.
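
To make that shared core concrete, here is a small Python sketch (not
Perl 6). The function names contains, count_of and each are
hypothetical, and list, collections.Counter and set are assumed
stand-ins for Seq, Bag and Set respectively.

```python
from collections import Counter

def count_of(coll, x):
    """Number of occurrences of x in a list, Counter, or set."""
    if isinstance(coll, Counter):
        return coll[x]                  # a missing key counts as 0
    if isinstance(coll, (set, frozenset)):
        return 1 if x in coll else 0    # a set holds x at most once
    return list(coll).count(x)

def contains(coll, x):
    """True if x occurs at least once."""
    return count_of(coll, x) > 0

def each(coll):
    """Iterate over every occurrence; callers must not rely on order."""
    if isinstance(coll, Counter):
        return coll.elements()          # repeats each key by its count
    return iter(coll)

seq = [1, 2, 2, 3]
bag = Counter(seq)
st = set(seq)
```

All three functions accept any of the three stand-ins, and none of them
needs to know whether the elements are ordered or distinct.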
Moreover, some operations are clearly useable with only the Seq type,
such as iterating through elements in order or reading an element at
a specific index.
The operators [union, intersection, difference, disjoint-union, etc]
have clearly defined and predictable behaviour with a Set, since all
inputs and outputs have no duplicates.
>The operational advantage of Set being a supertype of Seq is that
>all set operations are available for Seq out of the box. Mixed
>operations of Seq and Set would dispatch to the Set variant. The
>Seq operations like hypering are naturally precluded for Sets.
But I would ask whether it is desirable for those Set operators to be
present in Bag|Seq, and if so, then what the desired semantics are.
For example, what would these return:
Bag(1,2,2,2,3,3) union Bag(1,2,2,4,4);
# Bag(1,1,2,2,2,2,2,3,3,4,4) or Bag(1,2,2,2,3,3,4,4) ?
Bag(1,2,2,2,3,3) intersection Bag(1,2,2,4,4);
# Bag(1,1,2,2,2,2,2) or Bag(1,2,2) ?
Bag(1,2,2,2,3,3) difference Bag(1,2,2,4,4);
# Bag(2,3,3) or Bag(3,3) ?
Bag(1,2,2,2,3,3) d_union Bag(1,2,2,4,4);
# Bag(2,3,3,4,4) or Bag(3,3,4,4) ?
Repeat with Seq in place of Bag.
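
For comparison, the candidate answers for the Bag case can be
reproduced with Python's collections.Counter, which is one existing
multiset type that has already had to pick among them. This is only an
illustration of the choices, not a proposal for Perl 6 semantics; the
helper flat and the variable names are made up for the example.

```python
from collections import Counter

a = Counter([1, 2, 2, 2, 3, 3])
b = Counter([1, 2, 2, 4, 4])

def flat(bag):
    """Sorted list of elements, one entry per occurrence."""
    return sorted(bag.elements())

# union: add the multiplicities, or take the larger one
union_sum = flat(a + b)   # [1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4]
union_max = flat(a | b)   # [1, 2, 2, 2, 3, 3, 4, 4]

# intersection: add multiplicities of shared elements, or take the smaller
inter_sum = flat(Counter({x: a[x] + b[x] for x in a if x in b}))
                          # [1, 1, 2, 2, 2, 2, 2]
inter_min = flat(a & b)   # [1, 2, 2]

# difference: subtract multiplicities, or drop every shared element
diff_sub = flat(a - b)    # [2, 3, 3]
diff_drop = flat(Counter({x: n for x, n in a.items() if x not in b}))
                          # [3, 3]

# d_union: leftovers from both sides, or only unshared elements
sym_sub = flat((a - b) + (b - a))   # [2, 3, 3, 4, 4]
sym_drop = flat(Counter({x: n for x, n in (a + b).items()
                         if (x in a) != (x in b)}))
                          # [3, 3, 4, 4]
```

Counter's built-in operators use the sum for +, the maximum for |, the
minimum for &, and subtraction floored at zero for -. The other variants
need to be spelled out by hand, which shows that each candidate is at
least well defined for a Bag; the open question is which one the
operator name should mean.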
In my mind, it would be far simpler to reserve such operators for Set
only, and to cast a Bag|Seq to a Set in order to use them on it if
that is desired, whereupon the results are all distinct.
But still, it is something that should be decided on, one way or the other.
-- Darren Duncan