[Feature Request] Adding to D lang "set" built-in

2012-05-05 Thread bioinfornatics
Dear, i hope i will got some answer from druntime/phobos dev .
A set is an array of unique element:
>>> set( 1, 5, 7, 1, 7)
give this array => [1, 5, 7]

Currently in D the hack it is to use an associative array with any
value.
byte[size_t] a = [ 1:0, 5:0, 7:0, 1:0, 7:0 ];

then it is easy to have a set in D just add a syntax or a built-in where
use associative array without using value.

I hope really this little feature and seem really easy to add


thanks a lot for your great works



Re: [Feature Request] Adding to D lang "set" built-in

2012-05-05 Thread Gor Gyolchanyan
I think a much more realistic suggestion would be to add it to phobos
as a struct, which wraps this no-value AA solution.

On Sat, May 5, 2012 at 9:14 PM, bioinfornatics
 wrote:
> Dear, i hope i will got some answer from druntime/phobos dev .
> A set is an array of unique element:
 set( 1, 5, 7, 1, 7)
> give this array => [1, 5, 7]
>
> Currently in D the hack it is to use an associative array with any
> value.
> byte[size_t] a = [ 1:0, 5:0, 7:0, 1:0, 7:0 ];
>
> then it is easy to have a set in D just add a syntax or a built-in where
> use associative array without using value.
>
> I hope really this little feature and seem really easy to add
>
>
> thanks a lot for your great works
>



-- 
Bye,
Gor Gyolchanyan.


Re: [Feature Request] Adding to D lang "set" built-in

2012-05-05 Thread Chris Cain
There's a few ways to implement your own Sets right now, if 
needed. You found one way (using the built in AAs), but that's 
not all.


* You could use Array!bool (which efficiently stores bools) to 
store whether integers are in or out of your set.
* You can use std.container's redBlackTree to make a set which 
would work better if your items are objects or you're dealing 
with a sparse array.


There's probably a few other extremely simple methods, but those 
are the two I had on the top of my head. I'm not really sure 
having a particular "Set" in the library would be all that 
useful. Generally, if I want a set, I have a particular idea of 
how one should be implemented for my particular data.


Re: [Feature Request] Adding to D lang "set" built-in

2012-05-06 Thread H. S. Teoh
On Sat, May 05, 2012 at 07:14:27PM +0200, bioinfornatics wrote:
> Dear, i hope i will got some answer from druntime/phobos dev .
> A set is an array of unique element:
> >>> set( 1, 5, 7, 1, 7)
> give this array => [1, 5, 7]
> 
> Currently in D the hack it is to use an associative array with any
> value.
> byte[size_t] a = [ 1:0, 5:0, 7:0, 1:0, 7:0 ];
> 
> then it is easy to have a set in D just add a syntax or a built-in where
> use associative array without using value.
> 
> I hope really this little feature and seem really easy to add
[...]

It's not too hard to implement something like this yourself. One of the
first projects I did when I started learning D was to implement an
integer set (basically a bit array). It supports full set operations,
like union, intersection, difference, optimized filling with first n
numbers, foreach, etc.. Some of the operations use bit-twiddling tricks
to maximize performance.

The only flaw is that it's essentially just a bit array, so it only
works well for sets whose elements are relatively small numbers. If you
want sparse sets, you may be better off adapting D's AA implementation
to have an AA that consists of just keys. Basically, you still have the
hash table and buckets, just without the value field in each slot. (I
wonder if it's possible for the new AA implementation to support this by
specifying void as the value type.)


T

-- 
Klein bottle for rent ... inquire within. -- Stephen Mulraney


Re: [Feature Request] Adding to D lang "set" built-in

2012-05-06 Thread Chris Cain

On Sunday, 6 May 2012 at 22:10:03 UTC, H. S. Teoh wrote:

slot. (I
wonder if it's possible for the new AA implementation to 
support this by

specifying void as the value type.)


That would actually be very nice if it were the case.


Re: [Feature Request] Adding to D lang "set" built-in

2012-05-06 Thread Andrej Mitrovic
On 5/7/12, H. S. Teoh  wrote:
> you may be better off adapting D's AA implementation
> to have an AA that consists of just keys.

Yeah I do that in my code right now. It's pretty easy, just write an
some opApply/opBinary functions, add to!string(innerhash.keys) for
good measure and you're set (heh, set).

> Basically, you still have the
> hash table and buckets, just without the value field in each slot. (I
> wonder if it's possible for the new AA implementation to support this by
> specifying void as the value type.)

I think that can be done now with void[0], e.g.:

void[0][int] set;

I guess the way to add keys is to use a stub variable:
void[0][int] set;
void[0] val;
set[1] = val;

But I'm not sure how well serialization libraries can handle this
type, for my code I just use int[Key] in a wrapper struct.


Re: [Feature Request] Adding to D lang "set" built-in

2012-05-06 Thread Andrej Mitrovic
On 5/7/12, Andrej Mitrovic  wrote:
> just write an
> some

Beautiful way to start a sentence. :P


Re: [Feature Request] Adding to D lang "set" built-in

2012-05-06 Thread Russel Winder
On Mon, 2012-05-07 at 00:42 +0200, Andrej Mitrovic wrote:
> On 5/7/12, H. S. Teoh  wrote:
> > you may be better off adapting D's AA implementation
> > to have an AA that consists of just keys.
> 
> Yeah I do that in my code right now. It's pretty easy, just write an
> some opApply/opBinary functions, add to!string(innerhash.keys) for
> good measure and you're set (heh, set).

Any language with which the programmer has to develop their own set
implementation is sadly lacking.

It is true that set can be implemented using a map, but this should be
seen as implementation detail, not as programmer level tool. It
indicates that set should exist where set does. So for C++ it is in the
standard library, for Python it is a built-in (*).

The conclusion for D is straightforward. Which issue to I go to to vote?


(*) Although set was a second class citizen in Python 2, it is a first
class citizen in Python 3. There are even set comprehensions, just as
there are list comprehensions and dictionary comprehensions.  I just
love the influence of functional programming: be data evolution focused,
not control flow focused.

-- 
Russel.
=
Dr Russel Winder  t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Roadm: +44 7770 465 077   xmpp: rus...@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder


signature.asc
Description: This is a digitally signed message part


Re: [Feature Request] Adding to D lang "set" built-in

2012-05-06 Thread Jonathan M Davis
On Monday, May 07, 2012 06:49:00 Russel Winder wrote:
> On Mon, 2012-05-07 at 00:42 +0200, Andrej Mitrovic wrote:
> > On 5/7/12, H. S. Teoh  wrote:
> > > you may be better off adapting D's AA implementation
> > > to have an AA that consists of just keys.
> > 
> > Yeah I do that in my code right now. It's pretty easy, just write an
> > some opApply/opBinary functions, add to!string(innerhash.keys) for
> > good measure and you're set (heh, set).
> 
> Any language with which the programmer has to develop their own set
> implementation is sadly lacking.
> 
> It is true that set can be implemented using a map, but this should be
> seen as implementation detail, not as programmer level tool. It
> indicates that set should exist where set does. So for C++ it is in the
> standard library, for Python it is a built-in (*).
> 
> The conclusion for D is straightforward. Which issue to I go to to vote?

We have RedBlackTree, so we have a set already. What we don't have is a 
HashSet. However, once the custom allocator stuff has been sorted out, I'd 
fully expect to have one in std.container along with a variety of other 
container types which we really should have but don't.

- Jonathan M Davis


Re: [Feature Request] Adding to D lang "set" built-in

2012-05-06 Thread Russel Winder
On Sun, 2012-05-06 at 22:53 -0700, Jonathan M Davis wrote:
[...]
> We have RedBlackTree, so we have a set already. What we don't have is a 
> HashSet. However, once the custom allocator stuff has been sorted out, I'd 
> fully expect to have one in std.container along with a variety of other 
> container types which we really should have but don't.

I certainly went immediately to std.container to look for the work set,
map, etc. in some guise. I only posted after finding nothing
appropriate.

I guess the issue here is that maps (aka dictionaries, aka associative
arrays) are in the D language core (as in Python) rather than being is
the library (as in C++). This indicates set should be in the core rather
than in the library.

I wonder if the tradition of exposing HashMap and TreeMap was a
disservice by C++ and Java? Map and Set are programmer level concepts.
Where there are algorithmic issues that require knowing about trees or
has tables then the programmer is not working at the map or set level.

And the programmer has no control over how D's associative arrays work.

And let's not mention Java's LinkedHashMap.  :-)

-- 
Russel.
=
Dr Russel Winder  t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Roadm: +44 7770 465 077   xmpp: rus...@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder


signature.asc
Description: This is a digitally signed message part


Re: [Feature Request] Adding to D lang "set" built-in

2012-05-06 Thread Jonathan M Davis
On Monday, May 07, 2012 07:05:28 Russel Winder wrote:
> I wonder if the tradition of exposing HashMap and TreeMap was a
> disservice by C++ and Java? Map and Set are programmer level concepts.
> Where there are algorithmic issues that require knowing about trees or
> has tables then the programmer is not working at the map or set level.

???

There are multiple ways to implement a set or a map. If a programmer knows 
their data structures (as one would hope that they would), then they know the 
difference between a hash set and a tree set (or hash map and tree map), and 
they'll pick the one that's most appropriate for what they're doing. And 
because the containers are very similar, it should be quite possible in many 
cases to replace one with the other quite easily - especially if you're using 
templates. That's one of the reasons that Java has the Set interface with 
HashSet and TreeSet implenting it.

std.container is designed with the idea that containers will be named after 
their data structures and _not_ what they're used for. Programmers can then 
select the data structure that best serves their purposes. And I'd be worried 
about any professional programmer who didn't know that you can use a red-black 
tree as a set or a map.

> And the programmer has no control over how D's associative arrays work.

It's clearly a hash map. If they want a tree map, then we have RedBlackTree 
(though we should probably provide a wrapper that makes it easier to use 
RedBlackTree as a map, since it's a bit unwieldy to do that at the moment). I 
don't see the problem.

- Jonathan M Davis


Re: [Feature Request] Adding to D lang "set" built-in

2012-05-07 Thread Steven Schveighoffer
On Sat, 05 May 2012 13:14:27 -0400, bioinfornatics  
 wrote:



Dear, i hope i will got some answer from druntime/phobos dev .
A set is an array of unique element:

set( 1, 5, 7, 1, 7)

give this array => [1, 5, 7]

Currently in D the hack it is to use an associative array with any
value.
byte[size_t] a = [ 1:0, 5:0, 7:0, 1:0, 7:0 ];

then it is easy to have a set in D just add a syntax or a built-in where
use associative array without using value.

I hope really this little feature and seem really easy to add


thanks a lot for your great works


http://www.dsource.org/projects/dcollections

-Steve


Re: [Feature Request] Adding to D lang "set" built-in

2012-05-07 Thread David Nadlinger

On Monday, 7 May 2012 at 06:05:47 UTC, Russel Winder wrote:

I wonder if the tradition of exposing HashMap and TreeMap was a
disservice by C++ and Java? Map and Set are programmer level 
concepts.
Where there are algorithmic issues that require knowing about 
trees or
has tables then the programmer is not working at the map or set 
level.


Some people say that abstracting away big-O complexity should be 
a capital offense, and I agree (preferably in slightly less 
drastic words, though). Additionally, a tree-based map is 
naturally ordered, whereas a hash map is not – for me, that's 
enough to warrant exposing what seems to be a »detail«, at 
least in languages like C++ and D.


David


Re: [Feature Request] Adding to D lang "set" built-in

2012-05-07 Thread Steven Schveighoffer
On Mon, 07 May 2012 08:52:56 -0400, David Nadlinger   
wrote:



On Monday, 7 May 2012 at 06:05:47 UTC, Russel Winder wrote:

I wonder if the tradition of exposing HashMap and TreeMap was a
disservice by C++ and Java? Map and Set are programmer level concepts.
Where there are algorithmic issues that require knowing about trees or
has tables then the programmer is not working at the map or set level.


Some people say that abstracting away big-O complexity should be a  
capital offense, and I agree (preferably in slightly less drastic words,  
though). Additionally, a tree-based map is naturally ordered, whereas a  
hash map is not – for me, that's enough to warrant exposing what seems  
to be a »detail«, at least in languages like C++ and D.


Yes, I fully agree.

That being said, I think it's also important that some  
functions/structures should not have to care what algorithm is used.   
That's one of the reasons I like having the interface design that  
dcollections uses.


-Steve


Re: [Feature Request] Adding to D lang "set" built-in

2012-05-07 Thread Russel Winder
On Sun, 2012-05-06 at 23:23 -0700, Jonathan M Davis wrote:



> There are multiple ways to implement a set or a map. If a programmer knows 
> their data structures (as one would hope that they would), then they know the 
> difference between a hash set and a tree set (or hash map and tree map), and 
> they'll pick the one that's most appropriate for what they're doing. And 
> because the containers are very similar, it should be quite possible in many 
> cases to replace one with the other quite easily - especially if you're using 
> templates. That's one of the reasons that Java has the Set interface with 
> HashSet and TreeSet implenting it.

Indeed. And it is clear that Java's (and C++, but to a lesser extent)
obsession with coding only to interfaces with no concern for the
properties of the realization underneath can rapidly lead to appalling
performance of programs.  Classic example is sorting and the List
interface -- though the switch to Mergesort and modified Timsort with
hidden reflection did ameliorate most of the problems for Java 7.

> std.container is designed with the idea that containers will be named after 
> their data structures and _not_ what they're used for. Programmers can then 
> select the data structure that best serves their purposes. And I'd be worried 
> about any professional programmer who didn't know that you can use a 
> red-black 
> tree as a set or a map.

Works for me, but...

Having done more training of middle of the road programmers recently, I
am appalled at how poor the average programmer turns out to be. In
particular, their knowledge of data structures is at a level where, were
I still in academia, they would fail and be ejected from the computer
science programme.

The level seems to be:

array
map == associative array == hash table
list == sequence == singly-linked list

anything else is SEP. Sadly very much a "I just know enough to not get
sacked and pick up my pay packet" attitude. Mention red-black tree, 2-3
tree, trie, b-tree, and they glaze over.

My earlier comments we flavoured by the need for a language to have an
API that can cope with these folks as well as the folks who, like
everyone on lists such as this one, appreciate that there is always much
more to data structures than you think at first sight. Big-O, omega,
theta, etc. analysis is important and interesting but somehow there
needs to be a layer that gets away from this to provide a default, not
poor, implementation.

If I go in, I shall just rant. Hopefully I have now made the point in
this email that I should have made in the earlier email

> It's clearly a hash map. If they want a tree map, then we have RedBlackTree 
> (though we should probably provide a wrapper that makes it easier to use 
> RedBlackTree as a map, since it's a bit unwieldy to do that at the moment). I 
> don't see the problem.

I think it would be good for there to be façades to the implementing
data structure to provide a functional indication.

Perhaps the observation is that where a data structure type such as map
or set is realized in the core language there is a single
implementation. Where things are library based then you get the naming
to show function and realization. The Python experience is that more
programmers do more reasonable work with list, tuple, map, set all
having a single realization. Where more care is needed then there are
well crafted libraries to work with, not least of which is NumPy.

 

-- 
Russel.
=
Dr Russel Winder  t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Roadm: +44 7770 465 077   xmpp: rus...@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder


signature.asc
Description: This is a digitally signed message part


Re: [Feature Request] Adding to D lang "set" built-in

2012-05-07 Thread Era Scarecrow

On Monday, 7 May 2012 at 05:54:04 UTC, Jonathan M Davis wrote:

On Monday, May 07, 2012 06:49:00 Russel Winder wrote:
Any language with which the programmer has to develop their  
own set implementation is sadly lacking.


It is true that set can be implemented using a map, but this  
should be seen as implementation detail, not as programmer 
level tool. It indicates that set should exist where set does. 
So for C++ it  is in the standard library, for Python it is a 
built-in (*).


The conclusion for D is straightforward. Which issue to I go  
to to vote?


We have RedBlackTree, so we have a set already. What we don't  
have is a HashSet. However, once the custom allocator stuff has 
been  sorted out, I'd fully expect to have one in std.container 
along with a variety  of other container types which we really 
should have but don't.


 Been thinking about this a little. With arrays, fixed arrays and 
associative arrays being built in, you don't need set to be built 
in. First it's already hash-map like so it's going to be O(1), 
and the data being kinda useless can either be a byte, and int 
(offset to an array perhaps), or to itself.


 It has the added benefit of being a sparse array when asked of 
it. So aside from a slight loss in memory space which is only 
going to be critical in embedded systems. The only thing the 
builtins don't offer is a way to automatically sort the data, 
which the libraries and RedBlackTree's do quite well so far as I 
can see.