Re: dcollections 1.0 and 2.0a beta released

2010-05-28 Thread Jacob Carlborg

On 2010-05-28 14:12, Steven Schveighoffer wrote:

On Fri, 28 May 2010 06:10:49 -0400, Jacob Carlborg  wrote:


On 2010-05-27 12:32, Steven Schveighoffer wrote:

On Wed, 26 May 2010 10:06:32 -0400, Bruno Medeiros
 wrote:


On 24/05/2010 16:45, Andrei Alexandrescu wrote:

In the past I have built a C++ library that abstracted features of
the OS. My goal was to make it possible to dynamically load a module
that abstracted things like setting the IP address of a network
interface. My modules used std::string instead of char * to lookup
services to get objects that implement the interface. Big mistake. On
a later version of the standard C++ runtime, the private
implementation of std::string changed, so the dynamically loaded
libraries crashed horribly. No change in string's interface, just the
private stuff changed, but because it's a template, the code that
uses it necessarily has to be aware of it. We ended up ditching the
standard C++ library's version of string, and used STLPort so we
could control the library.

I envision this same sort of problem would be likely with D
collection objects that were not used via interfaces.


I see no problem retrofitting a no-interface container into a formal
interface if so needed.



I don't understand this discussion: isn't the reason above pretty much
a dead-on hard requirement for the collections to have interfaces?
Something like, for example, an interface version of the range traits?


Only if you wish to have binary compatibility with dynamic libs. Such a
thing isn't likely today since dynamic libs aren't very well supported
in D, and neither Phobos nor dcollections is a dynamic lib.


I recently got my patch for building Tango as a dynamic library on Mac
included in trunk. I also have a patch for druntime and Phobos in
Bugzilla just waiting to be included, plus one patch making it easier to
create dynamic libraries directly with DMD. I would say it's a bad idea
to still assume that dynamic libraries aren't supported; we have to
think forward and assume they will be.



I remember that, and I'm very encouraged by it. That being said, the
ultimate goal is to have dmd be able to build dynamic libraries easily.
D has had dynamic library "support" for years, but you have to do all
kinds of manual stuff, and the standard library isn't dynamic. Until the
standard library is dynamic, and I can build a dynamic library with a
-shared equivalent switch, dynamic libs are a laboratory feature, and
not many projects will use them.


Yes, exactly. I noticed there isn't an easy way to build dynamic 
libraries; among other things, you have to know the path to the standard 
library when building manually.



Just so you know, I think it's important to support binary compatibility
in dcollections, and since std.container has not adopted dcollections,
I'm going to keep interfaces. I was just pointing out the position
others may have WRT binary support.

-Steve



--
/Jacob Carlborg


Re: dcollections 1.0 and 2.0a beta released

2010-05-28 Thread Steven Schveighoffer

On Fri, 28 May 2010 06:10:49 -0400, Jacob Carlborg  wrote:


On 2010-05-27 12:32, Steven Schveighoffer wrote:

On Wed, 26 May 2010 10:06:32 -0400, Bruno Medeiros
 wrote:


On 24/05/2010 16:45, Andrei Alexandrescu wrote:

In the past I have built a C++ library that abstracted features of
the OS. My goal was to make it possible to dynamically load a module
that abstracted things like setting the IP address of a network
interface. My modules used std::string instead of char * to lookup
services to get objects that implement the interface. Big mistake. On
a later version of the standard C++ runtime, the private
implementation of std::string changed, so the dynamically loaded
libraries crashed horribly. No change in string's interface, just the
private stuff changed, but because it's a template, the code that
uses it necessarily has to be aware of it. We ended up ditching the
standard C++ library's version of string, and used STLPort so we
could control the library.

I envision this same sort of problem would be likely with D
collection objects that were not used via interfaces.


I see no problem retrofitting a no-interface container into a formal
interface if so needed.



I don't understand this discussion: isn't the reason above pretty much
a dead-on hard requirement for the collections to have interfaces?
Something like, for example, an interface version of the range traits?


Only if you wish to have binary compatibility with dynamic libs. Such a
thing isn't likely today since dynamic libs aren't very well supported
in D, and neither Phobos nor dcollections is a dynamic lib.


I recently got my patch for building Tango as a dynamic library on Mac
included in trunk. I also have a patch for druntime and Phobos in
Bugzilla just waiting to be included, plus one patch making it easier to
create dynamic libraries directly with DMD. I would say it's a bad idea
to still assume that dynamic libraries aren't supported; we have to
think forward and assume they will be.




I remember that, and I'm very encouraged by it.  That being said, the  
ultimate goal is to have dmd be able to build dynamic libraries easily.  D  
has had dynamic library "support" for years, but you have to do all kinds  
of manual stuff, and the standard library isn't dynamic.  Until the  
standard library is dynamic, and I can build a dynamic library with a  
-shared equivalent switch, dynamic libs are a laboratory feature, and not  
many projects will use them.


Just so you know, I think it's important to support binary compatibility  
in dcollections, and since std.container has not adopted dcollections, I'm  
going to keep interfaces.  I was just pointing out the position others may  
have WRT binary support.


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-28 Thread Steven Schveighoffer
On Fri, 28 May 2010 06:24:26 -0400, Bruno Medeiros  
 wrote:



On 27/05/2010 11:32, Steven Schveighoffer wrote:

On Wed, 26 May 2010 10:06:32 -0400, Bruno Medeiros
 wrote:


On 24/05/2010 16:45, Andrei Alexandrescu wrote:

In the past I have built a C++ library that abstracted features of
the OS. My goal was to make it possible to dynamically load a module
that abstracted things like setting the IP address of a network
interface. My modules used std::string instead of char * to lookup
services to get objects that implement the interface. Big mistake. On
a later version of the standard C++ runtime, the private
implementation of std::string changed, so the dynamically loaded
libraries crashed horribly. No change in string's interface, just the
private stuff changed, but because it's a template, the code that
uses it necessarily has to be aware of it. We ended up ditching the
standard C++ library's version of string, and used STLPort so we
could control the library.

I envision this same sort of problem would be likely with D
collection objects that were not used via interfaces.


I see no problem retrofitting a no-interface container into a formal
interface if so needed.



I don't understand this discussion: isn't the reason above pretty much
a dead-on hard requirement for the collections to have interfaces?
Something like, for example, an interface version of the range traits?


Only if you wish to have binary compatibility with dynamic libs. Such a
thing isn't likely today since dynamic libs aren't very well supported
in D, and neither Phobos nor dcollections is a dynamic lib.



Ah, never mind, my mind slipped and I was thinking of any kind of
library, that is, static ones as well.
Although even just dynamic library compatibility seems to be a valid
enough case that we should consider it from the start, even if it's not
well supported currently.


I agree, that's why dcollections has interfaces.

I'm keeping them there since std.container has gone its own direction.

-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-28 Thread Bruno Medeiros

On 27/05/2010 11:32, Steven Schveighoffer wrote:

On Wed, 26 May 2010 10:06:32 -0400, Bruno Medeiros
 wrote:


On 24/05/2010 16:45, Andrei Alexandrescu wrote:

In the past I have built a C++ library that abstracted features of
the OS. My goal was to make it possible to dynamically load a module
that abstracted things like setting the IP address of a network
interface. My modules used std::string instead of char * to lookup
services to get objects that implement the interface. Big mistake. On
a later version of the standard C++ runtime, the private
implementation of std::string changed, so the dynamically loaded
libraries crashed horribly. No change in string's interface, just the
private stuff changed, but because it's a template, the code that
uses it necessarily has to be aware of it. We ended up ditching the
standard C++ library's version of string, and used STLPort so we
could control the library.

I envision this same sort of problem would be likely with D
collection objects that were not used via interfaces.


I see no problem retrofitting a no-interface container into a formal
interface if so needed.



I don't understand this discussion: isn't the reason above pretty much
a dead-on hard requirement for the collections to have interfaces?
Something like, for example, an interface version of the range traits?


Only if you wish to have binary compatibility with dynamic libs. Such a
thing isn't likely today since dynamic libs aren't very well supported
in D, and neither Phobos nor dcollections is a dynamic lib.



Ah, never mind, my mind slipped and I was thinking of any kind of
library, that is, static ones as well.
Although even just dynamic library compatibility seems to be a valid
enough case that we should consider it from the start, even if it's not
well supported currently.



--
Bruno Medeiros - Software Engineer


Re: dcollections 1.0 and 2.0a beta released

2010-05-28 Thread Jacob Carlborg

On 2010-05-27 12:32, Steven Schveighoffer wrote:

On Wed, 26 May 2010 10:06:32 -0400, Bruno Medeiros
 wrote:


On 24/05/2010 16:45, Andrei Alexandrescu wrote:

In the past I have built a C++ library that abstracted features of
the OS. My goal was to make it possible to dynamically load a module
that abstracted things like setting the IP address of a network
interface. My modules used std::string instead of char * to lookup
services to get objects that implement the interface. Big mistake. On
a later version of the standard C++ runtime, the private
implementation of std::string changed, so the dynamically loaded
libraries crashed horribly. No change in string's interface, just the
private stuff changed, but because it's a template, the code that
uses it necessarily has to be aware of it. We ended up ditching the
standard C++ library's version of string, and used STLPort so we
could control the library.

I envision this same sort of problem would be likely with D
collection objects that were not used via interfaces.


I see no problem retrofitting a no-interface container into a formal
interface if so needed.



I don't understand this discussion: isn't the reason above pretty much
a dead-on hard requirement for the collections to have interfaces?
Something like, for example, an interface version of the range traits?


Only if you wish to have binary compatibility with dynamic libs. Such a
thing isn't likely today since dynamic libs aren't very well supported
in D, and neither Phobos nor dcollections is a dynamic lib.


I recently got my patch for building Tango as a dynamic library on Mac
included in trunk. I also have a patch for druntime and Phobos in
Bugzilla just waiting to be included, plus one patch making it easier to
create dynamic libraries directly with DMD. I would say it's a bad idea
to still assume that dynamic libraries aren't supported; we have to
think forward and assume they will be.



And I have specifically decided not to use interfaces with ranges
because that makes them reference types. Ranges work well as value
types, but not well as reference types. Therefore, to use dcollections
as interfaces, you must not require the range traits.

-Steve



--
/Jacob Carlborg


Re: dcollections 1.0 and 2.0a beta released

2010-05-27 Thread Steven Schveighoffer
On Wed, 19 May 2010 17:18:04 -0400, Steven Schveighoffer  
 wrote:



Andrej Mitrovic Wrote:



Well as long as you're here can I submit an error here? :)

I get an error while building the D2 branch:

Error: ArrayMultiset.obj : No such file or directory


Crud, I admit that I assumed anything that built on Linux would build on  
Windows.  I still believe it will, but of course, I need to change the  
batch file :)




It seems ArrayMultiset.d is not in the dcollections folder for the
D2.x branch, although it is in trunk (I guess trunk is the D1.x one?).


Yes, D1 is trunk; D2 is eventually going to become trunk.



Is that module deprecated for d2.x? (although it's listed in the win32  
batch file)


Yes, just remove it.  I'll fix it when I get a chance.


I've fixed the windows build files and recreated the zip/tarball files for  
2.0a.  Learned some batch-fu online and hopefully the new versions will be  
future-proof.


Please re-download the zipfile to build on Windows.

Thanks

-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-27 Thread Steven Schveighoffer
On Wed, 26 May 2010 10:06:32 -0400, Bruno Medeiros  
 wrote:



On 24/05/2010 16:45, Andrei Alexandrescu wrote:

In the past I have built a C++ library that abstracted features of
the OS. My goal was to make it possible to dynamically load a module
that abstracted things like setting the IP address of a network
interface. My modules used std::string instead of char * to lookup
services to get objects that implement the interface. Big mistake. On
a later version of the standard C++ runtime, the private
implementation of std::string changed, so the dynamically loaded
libraries crashed horribly. No change in string's interface, just the
private stuff changed, but because it's a template, the code that
uses it necessarily has to be aware of it. We ended up ditching the
standard C++ library's version of string, and used STLPort so we
could control the library.

I envision this same sort of problem would be likely with D
collection objects that were not used via interfaces.


I see no problem retrofitting a no-interface container into a formal
interface if so needed.



I don't understand this discussion: isn't the reason above pretty much a  
dead-on hard requirement for the collections to have interfaces?

Something like, for example, an interface version of the range traits?


Only if you wish to have binary compatibility with dynamic libs.  Such a  
thing isn't likely today since dynamic libs aren't very well supported in  
D, and neither Phobos nor dcollections is a dynamic lib.


And I have specifically decided not to use interfaces with ranges because  
that makes them reference types.  Ranges work well as value types, but not  
well as reference types.  Therefore, to use dcollections as interfaces,  
you must not require the range traits.
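
For illustration, a minimal sketch of that value-vs-reference distinction, with hypothetical names (IntRange/ArrayIntRange are illustrations, not dcollections or Phobos types): copying a slice-backed struct range copies its iteration state, while copying an interface handle copies a reference, so both "copies" advance together.

interface IntRange
{
    @property bool empty();
    @property int front();
    void popFront();
}

class ArrayIntRange : IntRange
{
    private int[] data;
    this(int[] data) { this.data = data; }
    @property bool empty() { return data.length == 0; }
    @property int front()  { return data[0]; }
    void popFront()        { data = data[1 .. $]; }
}

void demo()
{
    // Value-type range (a slice): the copy iterates independently.
    int[] a = [1, 2, 3];
    int[] b = a;
    b = b[1 .. $];
    assert(a.length == 3 && b.length == 2);

    // Interface range: the "copy" is the same underlying object.
    IntRange r = new ArrayIntRange([1, 2, 3]);
    IntRange s = r;
    s.popFront();
    assert(r.front == 2); // r advanced as well
}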


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-26 Thread Bruno Medeiros

On 24/05/2010 16:45, Andrei Alexandrescu wrote:

In the past I have built a C++ library that abstracted features of
the OS. My goal was to make it possible to dynamically load a module
that abstracted things like setting the IP address of a network
interface. My modules used std::string instead of char * to lookup
services to get objects that implement the interface. Big mistake. On
a later version of the standard C++ runtime, the private
implementation of std::string changed, so the dynamically loaded
libraries crashed horribly. No change in string's interface, just the
private stuff changed, but because it's a template, the code that
uses it necessarily has to be aware of it. We ended up ditching the
standard C++ library's version of string, and used STLPort so we
could control the library.

I envision this same sort of problem would be likely with D
collection objects that were not used via interfaces.


I see no problem retrofitting a no-interface container into a formal
interface if so needed.



I don't understand this discussion: isn't the reason above pretty much a 
dead-on hard requirement for the collections to have interfaces?

Something like, for example, an interface version of the range traits?

--
Bruno Medeiros - Software Engineer


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Justin Spahr-Summers
On Mon, 24 May 2010 11:01:06 -0400, Steven Schveighoffer 
 wrote:
> It's done all the time in Java and .NET.  For example, a GUI listbox  
> widget exposes its elements as an array of elements, which implement the  
> List interface.  You don't ever see the implementation or need it.   
> Granted Java and .NET have fewer problems than C++ and D with binary  
> compatibility, since the function tables are dynamic, but the potential is  
> there for D to make binary compatibility possible with interfaces.
> 

Cocoa (NeXT's/Apple's framework for Objective-C) uses a very successful 
and well-thought-out delegation pattern, whereby GUI elements 
representing large amounts of data, like table views, have delegates 
(not in the D sense of the word) that provide them with the actual 
contents. Granted, Objective-C's runtime is much more dynamic than D, 
but a simplified version of such a pattern could still work in D. After 
all, user interfacing is typically where dynamism is more important than 
speed.


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Simen kjaeraas

Steven Schveighoffer  wrote:
On the flip side, if containers did not implement interfaces, having to  
do this:


class WrappedSet(alias Impl, V) : Set!V
{
private Impl!V impl;
int functionToSatisfySet() { return impl.functionToSatisfySet(); }
...
}

seems to me like a lot more crufty and bloated than simply adding :  
Set!V on the end of the class declarations.


This would not be necessary. We can get the function names off of the
interface, so we could have a template do this for us.
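
For illustration, here's the hand-written version of the wrapping from the quoted code above (Set!V's members here are hypothetical); the forwarding methods are exactly the boilerplate a template could generate from the interface's member names, e.g. via __traits(allMembers) and string mixins, as suggested:

interface Set(V)
{
    bool contains(V v);
    bool add(V v);
    size_t length();
}

class WrappedSet(alias Impl, V) : Set!V
{
    private Impl!V impl;            // the wrapped value-type container

    this(Impl!V impl) { this.impl = impl; }

    // Each method simply forwards to the wrapped implementation; this
    // boilerplate is what a template could generate from the interface.
    bool contains(V v) { return impl.contains(v); }
    bool add(V v)      { return impl.add(v); }
    size_t length()    { return impl.length(); }
}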

--
Simen


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 17:47:18 -0400, Andrei Alexandrescu  
 wrote:



On 05/24/2010 04:38 PM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 17:35:11 -0400, Andrei Alexandrescu
 wrote:


On 05/24/2010 04:08 PM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 16:27:46 -0400, Andrei Alexandrescu
 wrote:


Sorry. Yes, by-key iteration should be possible.


OK, so we should be able to iterate keys. And the keys are not stored in
the trie structure itself. So how does one iterate the keys of the
container without reconstructing them from the trie nodes using the
heap?


You can't. At some point you need to copy tree traces into T[] arrays.
If the trie stores parent nodes, you don't need to do that as you can
expose a trace as a double-ended range containing the key.

There's a kind of trie that was defined to avoid such issues; it
stores the keys in the clear, in the leaves, at the expense of
duplication. I don't remember the name offhand. Does anyone?


OK, so the keys function of Map should work just fine for a Trie
implementation. Good to know.


This wins a little battle but not the war. Long term you're facing the  
Sisyphean job of either knocking new containers into the existing  
interfaces, or extending the interfaces to accommodate new containers.  
It doesn't look to me like a winning proposition.


It's always easy to say that there may come a day when the interfaces  
don't work -- none of us can see the future.  When that day comes, we can  
find a solution to it.  The worst case scenario is that you simply don't  
implement any interfaces.  It does not detract from the interfaces that  
exist.  It would seem odd that some of the collections do not implement  
the interfaces while others do.  At the very least though, all containers  
should be iterable, meaning they will implement the Iterator!V interface.   
That at least allows it to interact with the other container types.
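
Roughly the shape of that lowest-common-denominator interface (an approximation of dcollections' Iterator!V for illustration, not a verbatim copy): anything implementing it can be foreach'ed over and bulk-copied at runtime without knowing the concrete container.

enum uint NO_LENGTH_SUPPORT = uint.max;

interface Iterator(V)
{
    // Number of elements, or NO_LENGTH_SUPPORT when an O(1) length
    // isn't available (the sentinel discussed elsewhere in this thread).
    @property uint length();

    // foreach support; also how one container copies from another.
    int opApply(scope int delegate(ref V) dg);
}

// Hypothetical generic copy working purely through the interface.
void addAll(V)(Iterator!V source, void delegate(V) add)
{
    foreach (ref v; source)
        add(v);
}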


On the flip side, if containers did not implement interfaces, having to do  
this:


class WrappedSet(alias Impl, V) : Set!V
{
   private Impl!V impl;
   int functionToSatisfySet() { return impl.functionToSatisfySet(); }
   ...
}

seems to me like a lot more crufty and bloated than simply adding : Set!V  
on the end of the class declarations.


But I'm willing to drop the interfaces for now since interfaces obviously  
are an unpopular choice.


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Andrei Alexandrescu

On 05/24/2010 04:38 PM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 17:35:11 -0400, Andrei Alexandrescu
 wrote:


On 05/24/2010 04:08 PM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 16:27:46 -0400, Andrei Alexandrescu
 wrote:


Sorry. Yes, by-key iteration should be possible.


OK, so we should be able to iterate keys. And the keys are not stored in
the trie structure itself. So how does one iterate the keys of the
container without reconstructing them from the trie nodes using the
heap?


You can't. At some point you need to copy tree traces into T[] arrays.
If the trie stores parent nodes, you don't need to do that as you can
expose a trace as a double-ended range containing the key.

There's a kind of trie that was defined to avoid such issues; it
stores the keys in the clear, in the leaves, at the expense of
duplication. I don't remember the name offhand. Does anyone?


OK, so the keys function of Map should work just fine for a Trie
implementation. Good to know.


This wins a little battle but not the war. Long term you're facing the 
Sisyphean job of either knocking new containers into the existing 
interfaces, or extending the interfaces to accommodate new containers. 
It doesn't look to me like a winning proposition.


The FIC (Federation of Independent Containers) does not have that 
problem. It does have its specifications and guidelines but whichever 
container doesn't support some or even all of the required methods can 
simply define its own under other names. Then users can play with the 
containers and with the ranges they define as they see fit.



Andrei


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 17:35:11 -0400, Andrei Alexandrescu  
 wrote:



On 05/24/2010 04:08 PM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 16:27:46 -0400, Andrei Alexandrescu
 wrote:


Sorry. Yes, by-key iteration should be possible.


OK, so we should be able to iterate keys. And the keys are not stored in
the trie structure itself. So how does one iterate the keys of the
container without reconstructing them from the trie nodes using the
heap?


You can't. At some point you need to copy tree traces into T[] arrays.  
If the trie stores parent nodes, you don't need to do that as you can  
expose a trace as a double-ended range containing the key.


There's a kind of trie that was defined to avoid such issues; it stores  
the keys in the clear, in the leaves, at the expense of duplication. I don't  
remember the name offhand. Does anyone?


OK, so the keys function of Map should work just fine for a Trie  
implementation.  Good to know.


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Andrei Alexandrescu

On 05/24/2010 04:08 PM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 16:27:46 -0400, Andrei Alexandrescu
 wrote:


Sorry. Yes, by-key iteration should be possible.


OK, so we should be able to iterate keys. And the keys are not stored in
the trie structure itself. So how does one iterate the keys of the
container without reconstructing them from the trie nodes using the
heap?


You can't. At some point you need to copy tree traces into T[] arrays. 
If the trie stores parent nodes, you don't need to do that as you can 
expose a trace as a double-ended range containing the key.


There's a kind of trie that was defined to avoid such issues; it stores 
the keys in the clear, in the leaves, at the expense of duplication. I don't 
remember the name offhand. Does anyone?



Andrei



Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 16:27:46 -0400, Andrei Alexandrescu  
 wrote:



Sorry. Yes, by-key iteration should be possible.


OK, so we should be able to iterate keys.  And the keys are not stored in  
the trie structure itself.  So how does one iterate the keys of the  
container without reconstructing them from the trie nodes using the heap?   
Forget about interfaces, pretend you were writing it separately from  
dcollections.


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 16:07:22 -0400, Andrei Alexandrescu  
 wrote:



On 05/24/2010 01:49 PM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 12:27:10 -0400, Andrei Alexandrescu
 wrote:

A pointer to the end element would be required for both appending and
back().


This further erodes my confidence. Size needs to be maintained _and_ a
pointer to the last element must be maintained. For many uses, all of
this stuff is unnecessary overhead.


You make it sound like incrementing a counter and changing a pointer are
incredibly slow operations :)


They are. Ask Walter for a good example.


Actually, I realize that pointing to the end node isn't good enough,  
because once you remove that node, there is no way to get to the previous  
node without traversing the list again.


So slist cannot implement the List interface, you were right.


The overhead of storage is minimal
compared to the elements of the list, which do not contain said  
overhead.


I'm talking containers of lists here.


I was more talking about the ratio of elements to lists.  For example, the  
Hash implementation uses the same linked-list nodes as elements in its  
table, and generally a hash has very few elements per list, so overhead on  
the list head would be too much.  On top of that, a LinkList has its own  
allocator, meaning each one consumes a large chunk of data assuming you  
will be adding lots of elements to it.


So maybe we need to refine the implementation building blocks that back  
the dcollections classes so they are useable in special situations where  
performance is critical.  It actually would be pretty trivial to define  
SLink as a specialization of dcollections' Link struct, and that would  
essentially give you a lot of functionality in terms of your ideal list  
type.



Which may be acceptable to you if you
want the bleeding fastest speed available. There will always be tailored
code crafted to be as fast as possible for some specific design and
dcollections and your slist just won't fit the bill.


Except that the set is considerably smaller for a well-designed slist.


The set of problems that an slist solves is also considerably smaller.   
Having that backwards capability enables more algorithms.


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 16:35:03 -0400, bearophile   
wrote:



Andrei Alexandrescu:

When was the last time you measured? I thought the speed has largely
improved since Steve integrated his work.


I have timed it after that integration. I have seen a performance  
improvement, but it's small. I can perform some syntactic benchmarks.


Performance was vastly improved for situations where one was appending to  
multiple arrays at once.  For appending to a single array in a loop, it is  
improved, but not that much.


But the main improvement was for safety.  The append operation on D's  
dynamic arrays is a tradeoff between footprint, speed, and safety.  It  
also does not and should not compromise performance on operations besides  
appending.


You will always be able to outperform the D append operation with more  
focus on the append operation, but D's array appending is fine for most  
situations.


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread bearophile
Andrei Alexandrescu:
> When was the last time you measured? I thought the speed has largely 
> improved since Steve integrated his work.

I have timed it after that integration. I have seen a performance improvement, 
but it's small. I can perform some syntactic benchmarks.

Later,
bearophile


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Andrei Alexandrescu

On 05/24/2010 03:18 PM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 15:58:46 -0400, Andrei Alexandrescu
 wrote:


On 05/24/2010 01:12 PM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 13:36:01 -0400, Andrei Alexandrescu
 wrote:


On 05/24/2010 12:11 PM, bearophile wrote:

Steven Schveighoffer:

Is it unreasonable to expect to be able
to iterate the keys in a trie? (I don't really know, I've never worked
with them)


On tries you can iterate on all keys or on a sorted range of the keys.


There's one more way of iteration that's unique to tries - by key
element (e.g. if key is string, you iterate by character). That would
be an exploratory iteration.

At each step in the iteration you pass a character, so popFront()
takes an argument, it's popFront(char). trie.popFront('a') takes you
to all elements in the trie that start with 'a'. Then you can continue
exploring by e.g. trie.popFront('x') which takes you to all elements
that start with "ax". At any point the range may become dry (nothing
with this prefix). If the range is not empty, front() gives you
information about all strings that share the particular prefix you
have spanned (count and breadth comes to mind).


That could be supported.


To paraphrase a commercial, "there's an interface for that" :o).


No there isn't. I wouldn't make that an interface, as it's not
foreachable. I'd build a special range for it.


But iterating over the keys, is that not advisable? Would your trie
iterator be the only way to iterate over the elements? It seems rather
non-useful.


I think a trie, like many other nontrivial containers, supports
several means of iteration.


Including keys?!!!

Still not a straight answer. If you should be able to iterate keys, then
say you should be able to iterate keys. If you shouldn't, then say that.


Sorry. Yes, by-key iteration should be possible.

Andrei


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 15:58:46 -0400, Andrei Alexandrescu  
 wrote:



On 05/24/2010 01:12 PM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 13:36:01 -0400, Andrei Alexandrescu
 wrote:


On 05/24/2010 12:11 PM, bearophile wrote:

Steven Schveighoffer:

Is it unreasonable to expect to be able
to iterate the keys in a trie? (I don't really know, I've never worked
with them)


On tries you can iterate on all keys or on a sorted range of the keys.


There's one more way of iteration that's unique to tries - by key
element (e.g. if key is string, you iterate by character). That would
be an exploratory iteration.

At each step in the iteration you pass a character, so popFront()
takes an argument, it's popFront(char). trie.popFront('a') takes you
to all elements in the trie that start with 'a'. Then you can continue
exploring by e.g. trie.popFront('x') which takes you to all elements
that start with "ax". At any point the range may become dry (nothing
with this prefix). If the range is not empty, front() gives you
information about all strings that share the particular prefix you
have spanned (count and breadth comes to mind).


That could be supported.


To paraphrase a commercial, "there's an interface for that" :o).


No there isn't.  I wouldn't make that an interface, as it's not  
foreachable.  I'd build a special range for it.



But iterating over the keys, is that not advisable? Would your trie
iterator be the only way to iterate over the elements? It seems rather
non-useful.


I think a trie, like many other nontrivial containers, supports several  
means of iteration.


Including keys?!!!

Still not a straight answer.  If you should be able to iterate keys, then  
say you should be able to iterate keys.  If you shouldn't, then say that.



Or is there another way of iterating the keys that would be more
efficient than building an array to contain the 'key' for each node?


I don't know. I'm just glad I can explore that without having to think  
in interfaces.


I don't really know what this means.  The interfaces are a way to plug  
containers in at runtime.  They are not meant to be the entire API for the  
collection.  You can absolutely add whatever functionality that does not  
fit in the interface as an additional feature, and users will have to use  
the exact type in order to use that feature.


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Andrei Alexandrescu

On 05/24/2010 02:29 PM, bearophile wrote:

Steven Schveighoffer:

Look at D's arrays.  Is appending with D's arrays the fastest it possibly
could be?  Hell no, but it's good enough for most situations, and safe.


Appending to D dynamic arrays is awful: it's slow and requires complex code (and 
currently there are no explanations of how it works in the D docs, see 
http://d.puremagic.com/issues/show_bug.cgi?id=4091 ), so it's not a good 
example to be copied :-) It's not fast enough for most situations; that's why 
there is still a need for an Appender.


When was the last time you measured? I thought the speed has largely 
improved since Steve integrated his work.


Andrei


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Andrei Alexandrescu

On 05/24/2010 02:01 PM, Pelle wrote:

On 05/24/2010 06:27 PM, Andrei Alexandrescu wrote:

struct List { int v; List * next; }



While I do agree with that design for a list, that is no reference type.


List* is.

My point was that the pressure for a really simple hand-rolled SLL is 
huge. A good abstraction should convince the Walters of this world that 
the abstraction is actually better than the hand-rolled SLL.


Andrei


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Andrei Alexandrescu

On 05/24/2010 01:49 PM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 12:27:10 -0400, Andrei Alexandrescu
 wrote:

A pointer to the end element would be required for both appending and
back().


This further erodes my confidence. Size needs to be maintained _and_ a
pointer to the last element must be maintained. For many uses, all of
this stuff is unnecessary overhead.


You make it sound like incrementing a counter and changing a pointer are
incredibly slow operations :)


They are. Ask Walter for a good example.


The overhead of storage is minimal
compared to the elements of the list, which do not contain said overhead.


I'm talking containers of lists here.


I think we need to build the shared vision that in Phobos we're
competing against hand-written, highly-optimized code. This is the
fight STL took and largely won. We can't afford to lower our standards
for one tiny bit.

I was once talking over a beer about the advantages of container
abstractions. Walter slapped my face by defining the fastest SLL in
the world on a napkin. It looked like this:

struct List { int v; List * next; }

He took so little time to define that, and the primitives he needed
were so simple and fast, that it was an absolute overkill to replace
that with a generic whiz-bang container. If I'd mentioned there'd be
any extra data involved, that would mean the end of negotiation. "Why
do you need that?" "For length()!" "But I don't need length()!" "Well
you have to." "That's Communism in design!"

And I agree.


But something like the above is prone to all sorts of nasty things, like
inadvertent cycles in your list.


Yeah, and I'd have my list of arguments to add too.


Which may be acceptable to you if you
want the bleeding fastest speed available. There will always be tailored
code crafted to be as fast as possible for some specific design and
dcollections and your slist just won't fit the bill.


Except that the set is considerably smaller for a well-designed slist.


And dcollections' linked list node is exactly what you wrote there (with a
prev pointer of course). The only difference is, I don't define an
element as being the same as a list. The whole list has its own data,
including a pointer to the first element in the list.

Look at D's arrays. Is appending with D's arrays the fastest it possibly
could be? Hell no, but it's good enough for most situations, and safe.


If I could make them faster without adversely affecting the performance 
of other operations, I would. There are considerable constraints at play 
in the array design for historical and other reasons.



Andrei


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread bearophile
Steven Schveighoffer:

> complex code?
> a ~= b;
> Seems pretty not complex.

I meant on the implementation side. Arrays are used all the time, and they are 
basic building blocks used everywhere, so a programmer probably needs to understand 
how they work inside.


> That part is still pretty new.  What do you find is messy and complex  
> about it?

The internal semantics of the current design of dynamic arrays are not 
transparent enough, which means I often don't know what they are doing. That can be 
acceptable for a more complex data structure such as the associative arrays.
And it's not easy to know exactly where to use assumeSafeAppend.
Despite their complexity, D dynamic arrays are not safe enough; see the small 
thread about passing D dynamic arrays by reference by default.


> When I said "good enough for most situations" I didn't mean "most  
> situations where appending speed is crucial to the success of the  
> application" :P

I have written a few hundred small D programs, and I have seen that appending 
often happens (more often than I had originally thought) in places where 
performance is important enough. Using the Appender from my dlibs was usually 
able to solve the problem.

I don't know of better solutions, so maybe I'm just wasting your time; I am sorry. 
Before seeing D I had never thought that designing good dynamic arrays could be 
so hard :-)

Bye,
bearophile


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Andrei Alexandrescu

On 05/24/2010 01:12 PM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 13:36:01 -0400, Andrei Alexandrescu
 wrote:


On 05/24/2010 12:11 PM, bearophile wrote:

Steven Schveighoffer:

Is it unreasonable to expect to be able
to iterate the keys in a trie? (I don't really know, I've never worked
with them)


On tries you can iterate on all keys or on a sorted range of the keys.


There's one more way of iteration that's unique to tries - by key
element (e.g. if key is string, you iterate by character). That would
be an exploratory iteration.

At each step in the iteration you pass a character, so popFront()
takes an argument, it's popFront(char). trie.popFront('a') takes you
to all elements in the trie that start with 'a'. Then you can continue
exploring by e.g. trie.popFront('x') which takes you to all elements
that start with "ax". At any point the range may become dry (nothing
with this prefix). If the range is not empty, front() gives you
information about all strings that share the particular prefix you
have spanned (count and breadth comes to mind).


That could be supported.


To paraphrase a commercial, "there's an interface for that" :o).


But iterating over the keys, is that not advisable? Would your trie
iterator be the only way to iterate over the elements? It seems rather
non-useful.


I think a trie, like many other nontrivial containers, supports several 
means of iteration.



Or is there another way of iterating the keys that would be more
efficient than building an array to contain the 'key' for each node?


I don't know. I'm just glad I can explore that without having to think 
in interfaces.



Andrei


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 15:29:23 -0400, bearophile   
wrote:



Steven Schveighoffer:
Look at D's arrays.  Is appending with D's arrays the fastest it possibly
could be?  Hell no, but it's good enough for most situations, and safe.


Appending to D dynamic arrays is awful: it's slow and requires complex code.


complex code?

a ~= b;

Seems pretty not complex.

-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 15:32:01 -0400, bearophile   
wrote:


And the appending is hard to use too; see the ridiculously complex-to-use  
& messy capacity, reserve, and assumeSafeAppend. So overall it's  
a good example of what I will never want to copy.


That part is still pretty new.  What do you find is messy and complex  
about it?


When I said "good enough for most situations" I didn't mean "most  
situations where appending speed is crucial to the success of the  
application" :P


Most situations means you need to do appending once in a while.  A perfect  
example is something like toStringz.  And most situations do not require  
the three above-mentioned functions.
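
For reference, a minimal sketch of those three primitives in the usual pattern -- reserve capacity up front, append, then reuse a buffer with assumeSafeAppend (the numbers are arbitrary):

void appendDemo()
{
    int[] a;
    a.reserve(1000);            // ask the runtime for capacity up front
    assert(a.capacity >= 1000);

    foreach (i; 0 .. 1000)
        a ~= i;                 // stays within the reserved block

    a.length = 0;               // reuse the same buffer
    assumeSafeAppend(a);        // promise no other slice still needs the
                                // old elements, so appending may
                                // overwrite them in place
    a ~= 42;
}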


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread bearophile
And the appending is hard to use too; see the ridiculously complex-to-use & 
messy capacity, reserve, and assumeSafeAppend. So overall it's a good 
example of what I will never want to copy.

Bye,
bearophile


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread bearophile
Steven Schveighoffer:
> Look at D's arrays.  Is appending with D's arrays the fastest it possibly  
> could be?  Hell no, but it's good enough for most situations, and safe.

Appending to D dynamic arrays is awful: it's slow and requires complex code (and 
currently there are no explanations of how it works in the D docs, see 
http://d.puremagic.com/issues/show_bug.cgi?id=4091 ), so it's not a good 
example to be copied :-) It's not fast enough for most situations; that's why 
there is still a need for an Appender.

Bye,
bearophile


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Pelle

On 05/24/2010 06:27 PM, Andrei Alexandrescu wrote:

struct List { int v; List * next; }



While I do agree with that design for a list, that is no reference type.

I thought we wanted reference types.


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 12:27:10 -0400, Andrei Alexandrescu  
 wrote:



On 05/24/2010 11:14 AM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 11:45:44 -0400, Andrei Alexandrescu
 wrote:


On 05/24/2010 10:23 AM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 10:10:51 -0400, Andrei Alexandrescu
 wrote:


On 05/24/2010 06:54 AM, Steven Schveighoffer wrote:

length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't
possible,


Do you agree that's an awkwardness?


Yes, at that point it's an optimization. The only place where I assume
length could be invalid is when working with the Iterator!V type, which
might not be a dcollections object.


That doesn't matter. Since you define NO_LENGTH_SUPPORT and you have
interfaces, the door is ajar forever. Client code can't write this:

auto total = cont1.length + cont2.length;

because that is incorrect code (that compiles, runs, and produces some
odd result).

So don't take it lightly. NO_LENGTH_SUPPORT means no length support.
Then pretending it's supported will only harm everyone.


I get what you are saying. What I meant was it's moot if you are not
using interfaces. If you don't know what the underlying type is, then
you have to do a runtime check.

I agree it's awkward, and I could have not included length as a member
of Iterator, in which case it would be on all the container types and
NO_LENGTH_SUPPORT would not exist. All containers are meant to have a
valid length, it is only with non-container Iterators that length can be
NO_LENGTH_SUPPORT.


However, supporting non-length containers via
generic programming vs. interfaces would be much smoother.


In that case, you can do
coll.begin.empty to determine if the collection has any elements.


Then why not make length optional and define empty? From the above it
looks like an obviously better design.


But all dcollections are bi-directional anyway; there is no singly
linked list. That was a decision I made early on, because it allows more
assumptions about dcollections' containers. I think length-less singly
linked lists would be a good addition to Phobos that are not part of the
collection hierarchy; they are almost on par with builtin arrays in
being so simple.


Do you see dcollections' inability to express slists as a problem?


I don't see them as unable to express slists. They would break the
design guidelines that I set for collections being iterable in both
directions, but an slist fits fine within the List interface.


What's the required complexity of concat?


O(n) with n being the number of elements added


Is add expected to put things in a specific place? How does slist
implement back()?

slist does not fit the List interface.


A pointer to the end element would be required for both appending and
back().


This further erodes my confidence. Size needs to be maintained _and_ a  
pointer to the last element must be maintained. For many uses, all of  
this stuff is unnecessary overhead.


You make it sound like incrementing a counter and changing a pointer are  
incredibly slow operations :)  The overhead of storage is minimal compared  
to the elements of the list, which do not contain said overhead.


I think we need to build the shared vision that in Phobos we're  
competing against hand-written, highly-optimized code. This is the fight  
STL took and largely won. We can't afford to lower our standards for one  
tiny bit.


I was once talking over a beer about the advantages of container  
abstractions. Walter slapped my face by defining the fastest SLL in the  
world on a napkin. It looked like this:


struct List { int v; List * next; }

He took so little time to define that, and the primitives he needed were  
so simple and fast, that it was an absolute overkill to replace that  
with a generic whiz-bang container. If I'd mentioned there'd be any  
extra data involved, that would mean the end of negotiation. "Why do you  
need that?" "For length()!" "But I don't need length()!" "Well you have  
to." "That's Communism in design!"


And I agree.


But something like the above is prone to all sorts of nasty things, like  
inadvertent cycles in your list.  Which may be acceptable to you if you  
want the bleeding fastest speed available.  There will always be tailored  
code crafted to be as fast as possible for some specific design and  
dcollections and your slist just won't fit the bill.


And dcollections' linked list node is exactly what you wrote there (with a  
prev pointer of course).  The only difference is, I don't define an  
element as being the same as a list.  The whole list has its own data,  
including a pointer to the first element in the list.


Look at D's arrays.  Is appending with D's arrays the fastest it possibly  
could be?  Hell no, but it's good enough for most situations, and safe.


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Thu, 20 May 2010 02:11:58 -0400, Bernard Helyer   
wrote:



On 20/05/10 13:39, Bernard Helyer wrote:

Oooohhh goody goody goody!  n_n


I'm in the process of making a little toy project ATM. I shall
integrate dcollections 2.0a into it ASAP.


ArrayList doesn't compile with warnings enabled, as it overrides opEquals  
but doesn't mark it as such.


http://www.dsource.org/projects/dcollections/ticket/5

Any other problems, please create a ticket (including your Thing one, but  
I'm not sure what you were doing there :)


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 13:36:01 -0400, Andrei Alexandrescu  
 wrote:



On 05/24/2010 12:11 PM, bearophile wrote:

Steven Schveighoffer:

Is it unreasonable to expect to be able
to iterate the keys in a trie?  (I don't really know, I've never worked
with them)


On tries you can iterate on all keys or on a sorted range of the keys.


There's one more way of iteration that's unique to tries - by key  
element (e.g. if key is string, you iterate by character). That would be  
an exploratory iteration.


At each step in the iteration you pass a character, so popFront() takes  
an argument, it's popFront(char). trie.popFront('a') takes you to all  
elements in the trie that start with 'a'. Then you can continue  
exploring by e.g. trie.popFront('x') which takes you to all elements  
that start with "ax". At any point the range may become dry (nothing  
with this prefix). If the range is not empty, front() gives you  
information about all strings that share the particular prefix you have  
spanned (count and breadth comes to mind).


That could be supported.

But iterating over the keys, is that not advisable?  Would your trie  
iterator be the only way to iterate over the elements?  It seems rather  
non-useful.


Or is there another way of iterating the keys that would be more efficient  
than building an array to contain the 'key' for each node?


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Andrei Alexandrescu

On 05/24/2010 12:11 PM, bearophile wrote:

Steven Schveighoffer:

Is it unreasonable to expect to be able
to iterate the keys in a trie?  (I don't really know, I've never worked
with them)


On tries you can iterate on all keys or on a sorted range of the keys.


There's one more way of iteration that's unique to tries - by key 
element (e.g. if key is string, you iterate by character). That would be 
an exploratory iteration.


At each step in the iteration you pass a character, so popFront() takes 
an argument, it's popFront(char). trie.popFront('a') takes you to all 
elements in the trie that start with 'a'. Then you can continue 
exploring by e.g. trie.popFront('x') which takes you to all elements 
that start with "ax". At any point the range may become dry (nothing 
with this prefix). If the range is not empty, front() gives you 
information about all strings that share the particular prefix you have 
spanned (count and breadth comes to mind).
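
A rough sketch of what such an exploratory range might look like (node layout and names are hypothetical; note that popFront takes the next key element, so this is deliberately not a standard input range):

struct TrieExplorer(V)
{
    private static struct Node
    {
        Node*[char] children;   // edges labelled by key element
        V* value;               // set if a key ends at this node
        size_t count;           // number of keys below this node
    }

    private Node* current;      // starts at the trie's root node

    // Dry when no key has the prefix spanned so far.
    @property bool empty() const { return current is null; }

    // Information shared by all keys with the current prefix.
    @property size_t front() const { return current.count; }

    // Extend the prefix by one element, e.g. r.popFront('a').
    void popFront(char c)
    {
        auto next = c in current.children;
        current = next ? *next : null;
    }
}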



Andrei


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 13:11:27 -0400, Simen kjaeraas  
 wrote:



Steven Schveighoffer  wrote:

And if 3rd party X _needs_ to use interfaces, and 3rd party Y _needs_  
to use interfaces, and your code depends on X and Y, and interfaces  
aren't defined by dcollections, where are you then?


You're at the point where the language allows you to create a class
following an interface, whose methods all redirect to those of  
the struct. This could even be done automagically.


A situation Robert has stated he would like to avoid.

-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Simen kjaeraas

Steven Schveighoffer  wrote:

And if 3rd party X _needs_ to use interfaces, and 3rd party Y _needs_ to  
use interfaces, and your code depends on X and Y, and interfaces aren't  
defined by dcollections, where are you then?


You're at the point where the language allows you to create a class
following an interface, whose methods all redirect to those of
the struct. This could even be done automagically.

--
Simen


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread bearophile
Steven Schveighoffer:
> Is it unreasonable to expect to be able  
> to iterate the keys in a trie?  (I don't really know, I've never worked  
> with them)

On tries you can iterate on all keys or on a sorted range of the keys.

Bye,
bearophile


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 13:04:31 -0400, bearophile   
wrote:



Steven Schveighoffer:
Scope class members should solve this.  It's been thrown around forever --
essentially, you should be able to define that a class member's storage is
contained within the owner object.


I too have proposed this idea, but in my opinion you can't design a big  
part of a standard lib on the basis of a syntax/language optimization  
that doesn't exist yet.


Yeah, but essentially, it's an optimization.  Whether the storage is  
stored in the same heap block or a different one really doesn't matter in  
terms of functionality.  Allocating one heap block vs. two is faster,  
that's what the OP was saying.


In other words, scope class members aren't essential to using dcollections.

-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread bearophile
Steven Schveighoffer:
> Scope class members should solve this.  It's been thrown around forever --  
> essentially, you should be able to define that a class member's storage is  
> contained within the owner object.

I too have proposed this idea, but in my opinion you can't design a big part of 
a standard lib on the basis of a syntax/language optimization that doesn't 
exist yet.

Bye,
bearophile


Re: struct vs. class containers (was: Re: dcollections 1.0 and 2.0a beta released)

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 12:18:02 -0400, Andrei Alexandrescu  
 wrote:



On 05/24/2010 10:58 AM, Steven Schveighoffer wrote:

On Sun, 23 May 2010 17:36:52 -0400, Andrei Alexandrescu
 wrote:


I've thought for a very long time about the class vs. struct choice in
a container, and I came to a startling conclusion: it (almost) doesn't
matter. Could be either, and the tradeoffs involved are nonessential.
Here they are:

1. Using a class makes implementing members easier because there's no
need to do work through an additional member. With a struct, you need
a pimpl approach. For example:

struct Array {
struct Impl {
...
}
Impl * impl;
...
@property auto length() const { return impl.length; }
...
}

2. struct gives you more power in managing collection's own memory, as
long as the collection doesn't escape item addresses or other
addresses of internal handles. So all other things being equal, struct
has a net advantage.


Yes, but most containers are node-based, so as long as you use structs
for nodes, you are generally in control of the bulk of allocation
(dcollections does this).


I'm saying, the net advantage of struct is that it can safely and  
deterministically release memory in its destructor.


OK.




3. The creation syntaxes are different. For Phobos, I suggest adding a
simple function make() to std.algorithm. make!T(a, b, c) returns a
newly constructed object T by invoking the constructor with arguments
a, b, and c. That way we can make client code virtually agnostic of
the class/struct choice for a container.
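
A minimal sketch of what such a make() could look like (hypothetical; std.algorithm never actually gained this exact function), so client code stays agnostic about whether the container is a class or a struct:

T make(T, Args...)(Args args)
{
    static if (is(T == class))
        return new T(args);     // reference type: allocate with new
    else
        return T(args);         // value type: construct in place
}

unittest
{
    static struct S { int x; this(int n) { x = n; } }
    static class  C { int x; this(int n) { x = n; } }

    auto s = make!S(3);
    auto c = make!C(4);
    assert(s.x == 3 && c.x == 4);
}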



Classes have builtin object monitors. But you could handle that via a
struct as well. The language support wouldn't be there though.


Oh, and the monitor isn't needed so that's an unused word there. I'd  
forgotten about it.


Why isn't the monitor needed?  Will we no longer be able to use a globally  
shared container object?





That's it. Otherwise, one could use either to build a container. Let
me note that I have reached the conclusion that containers should be
at best reference types, with a meta-constructor Value!(C) that takes
a container C and makes it into a value type.
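
One possible reading of that Value!(C) meta-constructor, as a sketch (it assumes C is a reference-type container exposing a deep-copying dup(), which is entirely hypothetical): a struct wrapper whose postblit duplicates the payload, so copying the wrapper copies the container.

struct Value(C)
{
    C payload;

    this(C c) { payload = c; }

    this(this)                  // copying the struct copies the container
    {
        if (payload !is null)
            payload = payload.dup;
    }

    alias payload this;         // forward everything else to C
}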


The thing that classes really give you is the language support. Structs
need to be hand-crafted to support the same syntax. Classes are enforced
as always being reference types and always being on the heap.


I'd agree classes are more convenient because they make the  
implementation straightforward.



The
Value!(C) is questionable because creating the head of a container on
the stack leads to easily escaped stack references.


I don't understand this. Could you give an example?


For example, dcollections' TreeMap has an 'end' node that defines the end  
of the container.  It actually is the root of the RB tree and all the data  
is on the left child of the 'end' node.  This makes it always the last  
node iterated.


The end node is actually a member of the class; it's not allocated  
separately on the heap, for efficiency and a less complex constructor.  So  
let's say TreeMap could be on the stack; here's a situation where an  
escape would occur:


auto foo()
{
   TreeMap!(string, uint) tmap; // allocated on stack
   ... // fill up tmap;
   return tmap["hi"..tmap.end]; // oops! end is part of the range, and it was allocated on the stack.
}

In dcollections, all the classes have some sort of members that are  
associated with the entire container, not the elements.  These would  
disappear once a stack-based container exited scope, but there may be  
dangling references returned.


To get around this, you could ensure that all data was allocated on the  
heap, but then wouldn't Value!(C) be almost useless?  What data would be  
allocated on the stack that is part of the container that would never be  
referenced by the heap-based parts?


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 12:06:18 -0400, Robert Jacques   
wrote:


On Mon, 24 May 2010 08:06:29 -0400, Steven Schveighoffer  
 wrote:
On Thu, 20 May 2010 12:46:59 -0400, Robert Jacques   
wrote:


On Thu, 20 May 2010 06:34:42 -0400, Steven Schveighoffer  
 wrote:

[snip]


I understand these points, but I'm already using interfaces to copy  
data between containers.  I don't have to, I could have used generic  
code, but this way, only one function is instantiated to copy data  
from all the other containers.  The problem with using generic code  
is that the compiler will needlessly duplicate functions that are  
identical.


This sounds like a failure of design. Why aren't you using ranges to  
do this?


Why are ranges necessarily better?  I'm using the container's opApply,  
which I'd probably still use even if there were no interfaces.  opApply  
allows much more possibilities in traversal than ranges which cannot  
use stack recursion without heap activity.


Ranges are not necessarily better and may have some minor amount of  
overhead over a well optimized opApply. Then again, the opposite may be  
true. The point is that if the difference between opApply and a range is  
more than trivial, you've probably had a failure of design occur.


The difference is trivial.  I support both ranges and opApply.  The main  
benefit from using opApply being only one function is compiled by the  
compiler.




Using interfaces is not as viral as you think.  My interfaces can be  
used in generic code, as long as the generic code uses functions in  
the interfaces.  If a library returns an interface, the author is  
saying "I don't want you using any functions outside this interface,"  
so why is that a bad thing?


Well, needlessly duplicated functions for one. :) More importantly,  
the example I gave was about third party libraries which I have no  
control over. So this solution explicitly doesn't work. And even if  
everyone used templates everywhere in order to be compatible with both  
interfaces and classes, isn't that a viral effect of having both?


If a 3rd party library uses interfaces, it's probably for good reason.   
They most likely want to remain binary compatible with other libs,  
and/or want to abstract the implementation of some custom container  
type.  If you don't like their requirements, don't use the library.


-Steve


No, if a 3rd party library _needs_ to use interfaces it's probably for a  
good reason. The problem is, if they exist, people are going to use them  
even if they don't need them, and therein lies the problem.


And if 3rd party X _needs_ to use interfaces, and 3rd party Y _needs_ to  
use interfaces, and your code depends on X and Y, and interfaces aren't  
defined by dcollections, where are you then?  I don't think there's any  
way you can define a generic standard library such that people won't  
create bad designs, that's a lost cause.


I think part of the problem with all this is that interfaces aren't likely  
to be needed any time soon (D's dynamic lib support is almost  
non-existent), so I'm probably going to drop the interfaces for now in the  
interest of progress.


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 11:45:51 -0400, Andrei Alexandrescu  
 wrote:



On 05/24/2010 10:01 AM, Steven Schveighoffer wrote:

On Fri, 21 May 2010 13:42:14 -0400, Andrei Alexandrescu
 wrote:


On 05/19/2010 08:42 PM, Steven Schveighoffer wrote:

Andrei Alexandrescu Wrote:


I.e. there aren't many kinds of HashMaps that derive from each
other. But the interfaces are not detrimental to your ideas. The
only thing interfaces require is that the entities implementing
them are classes and not structs. As long as you agree that
classes are the right call, then interfaces can co-exist with
your other suggestions without interference.


This brings back a discussion I had with Walter a while ago, with
echoes in the newsgroup. Basically the conclusion was as follows:
if a container never escapes the addresses of its elements, it can
manage its own storage. That forces, however, the container to be a
struct because copying references to a class container would break
that encapsulation. I called those "perfectly encapsulated
containers" and I think they are good candidates for manual memory
management because they tend to deal in relatively large chunks.

I noticed that your collections return things by value, so they are
 good candidates for perfect encapsulation.


Then you need reference counting, I don't think that is a good
design.


Why?

I think refcounting for containers is perfect.

Refcounting small objects is bad because the counter overhead is large.
Also, small objects tend to be created and passed around a lot, so the
time overhead is significant.

In contrast, refcounting and containers are a perfect marriage. The
container is a relatively large conglomerate of objects so refcounting
that is bound to yield good benefits in terms of memory reclamation.


I meant refcounting elements.  If you are simply refcounting the  
container, what do you do when someone wants to remove a node?  Not allow  
it if more than one reference to the container exists?




Java has some warts as you have rightfully pointed out in the past
(i.e. O(n) lookup), but I have attempted to remove all those warts.
Dcollections I would hope does not suffer from the problems that
Java's collections suffer from.


That's great. But let me quote what you said: "I myself don't really use
the interface aspect of the classes, it is mostly a carryover from the
Java/Tango inspirations." I took that to mean you don't have a strong
justification for structuring dcollections as a hierarchy, and
furthermore that makes me hope it's possible you'd be willing to yank
that dinosaur brain.


I did not have a strong justification for using interfaces originally, and  
my first interface design shows that.  But I think with careful design,  
the interfaces in the D2 version are pretty useful in some situations, all  
of which could be essential to a project's design.


In other words, I don't see the fact that Java or Tango's original  
collection package had interfaces as a "wart".


The problem I see right now is, a lot of that utility is moot since D is  
not very good at dynamic linking.  Since you necessarily have to  
statically link Phobos and dcollections, it makes no sense to keep binary  
compatibility, or hide implementation.



Looks like we're heading straight to stalemate once again.

In the interest of time, it would be better to get to stalemate (or,  
hopefully, agreement) so we know whether dcollections will be integrated  
within Phobos or whether I should spend this and next weeks' evenings  
coding. To that end, please let me know whether it's worth that I spend  
time on putting together a list of proposed changes.


I think if we can keep dcollections as classes, then the opportunity to  
have interfaces still exists for the future.  If that's the case, we can  
ditch the interfaces for now, and revisit it when D's dynamic lib support  
gets better.


So let's move on to other issues.  I haven't changed my mind on interface  
utility, but there seems to be not much support for the idea.


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Andrei Alexandrescu

On 05/24/2010 11:14 AM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 11:45:44 -0400, Andrei Alexandrescu
 wrote:


On 05/24/2010 10:23 AM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 10:10:51 -0400, Andrei Alexandrescu
 wrote:


On 05/24/2010 06:54 AM, Steven Schveighoffer wrote:

length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't
possible,


Do you agree that's an awkwardness?


Yes, at that point it's an optimization. The only place where I assume
length could be invalid is when working with the Iterator!V type, which
might not be a dcollections object.


That doesn't matter. Since you define NO_LENGTH_SUPPORT and you have
interfaces, the door is ajar forever. Client code can't write this:

auto total = cont1.length + cont2.length;

because that is incorrect code (that compiles, runs, and produces some
odd result).

So don't take it lightly. NO_LENGTH_SUPPORT means no length support.
Then pretending it's supported will only harm everyone.


I get what you are saying. What I meant was it's moot if you are not
using interfaces. If you don't know what the underlying type is, then
you have to do a runtime check.

I agree it's awkward, and I could have not included length as a member
of Iterator, in which case it would be on all the container types and
NO_LENGTH_SUPPORT would not exist. All containers are meant to have a
valid length, it is only with non-container Iterators that length can be
NO_LENGTH_SUPPORT.


However, supporting non-length containers via
generic programming vs. interfaces would be much smoother.


In that case, you can do
coll.begin.empty to determine if the collection has any elements.


Then why not make length optional and define empty? From the above it
looks like an obviously better design.


But all dcollections are bi-directional anyways, there is no singly
linked list. That was a decision I made early on, because it allows more
assumptions about dcollections' containers. I think length-less singly
linked lists would be a good addition to phobos that are not part of the
collection hierarchy, they are almost on par with builtin arrays as
being so simple.


Do you see dcollections' inability to express slists as a problem?


I don't see them as unable to express slists. They would break the
design guidelines that I set for collections being iterable in both
directions, but an slist fits fine within the List interface.


What's the required complexity of concat?


O(n) with n being the number of elements added


Is add expected to put things in a specific place? How does slist
implement back()?

slist does not fit the List interface.


A pointer to the end element would be required for both appending and
back().


This further erodes my confidence. Size needs to be maintained _and_ a 
pointer to the last element must be maintained. For many uses, all of 
this stuff is unnecessary overhead.


I think we need to build the shared vision that in Phobos we're 
competing against hand-written, highly-optimized code. This is the fight 
STL took and largely won. We can't afford to lower our standards for one 
tiny bit.


I was once talking over a beer about the advantages of container 
abstractions. Walter slapped my face by defining the fastest SLL in the 
world on a napkin. It looked like this:


struct List { int v; List * next; }

He took so little time to define that, and the primitives he needed were 
so simple and fast, that it was an absolute overkill to replace that 
with a generic whiz-bang container. If I'd mentioned there'd be any 
extra data involved, that would mean the end of negotiation. "Why do you 
need that?" "For length()!" "But I don't need length()!" "Well you have 
to." "That's Communism in design!"


And I agree.


Andrei


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 11:51:39 -0400, Andrei Alexandrescu  
 wrote:



On 05/24/2010 10:39 AM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 11:21:20 -0400, Andrei Alexandrescu
 wrote:


On 05/24/2010 06:54 AM, Steven Schveighoffer wrote:

I am not familiar with tries,


Missed that upon the first read. I suggest you look at tries and the
following other structures as good examples that it's futile to fit
collections into hierarchies.

http://en.wikipedia.org/wiki/Trie
http://linux.thai.net/~thep/datrie/datrie.html
http://en.wikipedia.org/wiki/Suffix_tree
http://en.wikipedia.org/wiki/Kd-tree

We'd want to implement in time those and many more in Phobos without
worrying that some of their primitives won't fit the existing
interfaces, and also without the unjustified effort of formalizing
interfaces for each of them in thinking that another very, very
similar container will come along.



 From a cursory look, I don't see why tries would not be possible in
dcollections.

I'd probably start with something like this:

class TrieMap(K, V) : Map!(K[], V)

The array brackets enforce the ability to use prefixes on the keys.


The key point of tries is that they have distributed storage. Thus,  
shoehorning them into the interface function


Iterator!(K[]) keys();

forces allocation and copying.
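
A sketch of why, with an assumed node layout rather than any actual dcollections type: a full key only exists as the concatenation of the labels along a root-to-leaf path, so yielding K[] keys means rebuilding each one:

// one allocation and copy per key handed out, because the key elements
// are distributed across the trie rather than stored contiguously
K[] materializeKey(K)(K[][] pathLabels)
{
    K[] key;
    foreach (label; pathLabels)
        key ~= label;
    return key;
}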


I wasn't thinking of that, I was thinking the key point of tries being  
more efficient at lookup for strings of things.


One solution is that Trie could be a separate interface (i.e. a sibling of  
Map).


One thing to point out is that D's arrays are adept at appending and  
prefixing.  A K[] key would not necessarily be reallocating for each node  
traversal, but it certainly would be copying data.


How would you define a way to get all the keys of a Trie?  Or would you  
simply leave that function off?  Is it unreasonable to expect to be able  
to iterate the keys in a trie?  (I don't really know, I've never worked  
with them)


-Steve


Re: struct vs. class containers (was: Re: dcollections 1.0 and 2.0a beta released)

2010-05-24 Thread Andrei Alexandrescu

On 05/24/2010 10:58 AM, Steven Schveighoffer wrote:

On Sun, 23 May 2010 17:36:52 -0400, Andrei Alexandrescu
 wrote:


I've thought for a very long time about the class vs. struct choice in
a container, and I came to a startling conclusion: it (almost) doesn't
matter. Could be either, and the tradeoffs involved are nonessential.
Here they are:

1. Using a class makes implementing members easier because there's no
need to do work through an additional member. With a struct, you need
a pimpl approach. For example:

struct Array {
struct Impl {
...
}
Impl * impl;
...
@property length() const { return impl.length; }
...
}

2. struct gives you more power in managing collection's own memory, as
long as the collection doesn't escape item addresses or other
addresses of internal handles. So all other things being equal, struct
has a net advantage.


Yes, but most containers are node-based, so as long as you use structs
for nodes, you are generally in control of the bulk of allocation
(dcollections does this).


I'm saying, the net advantage of struct is that it can safely and 
deterministically release memory in its destructor.



3. The creation syntaxes are different. For Phobos, I suggest adding a
simple function make() to std.algorithm. make!T(a, b, c) returns a
newly constructed object T by invoking the constructor with arguments
a, b, and c. That way we can make client code virtually agnostic of
the class/struct choice for a container.



Classes have builtin object monitors. But you could handle that via a
struct as well. The language support wouldn't be there though.


Oh, and the monitor isn't needed so that's an unused word there. I'd 
forgotten about it.



That's it. Otherwise, one could use either to build a container. Let
me note that I have reached the conclusion that containers should be
at best reference types, with a meta-constructor Value!(C) that takes
a container C and makes it into a value type.


The thing that classes really give you is the language support. Structs
need to be hand-crafted to support the same syntax. Classes are enforced
as always being reference types and always being on the heap.


I'd agreed classes are more convenient because they make the 
implementation straightforward.



The
Value!(C) is questionable because creating the head of a container on
the stack leads to easily escaped stack references.


I don't understand this. Could you give an example?


But yeah, a struct as a 'smart pointer' could work, as long as you don't
'auto-create' like AA's do.



It seems Steve took the weekend off.


My son's 2nd birthday party was Saturday, and I had visitors, sorry :) I
did some responses, but at some point if you want to remain married, you
have to pay attention to your family.


Congrats. And yes, wise words :o).


Andrei



Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 11:45:44 -0400, Andrei Alexandrescu  
 wrote:



On 05/24/2010 10:23 AM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 10:10:51 -0400, Andrei Alexandrescu
 wrote:


On 05/24/2010 06:54 AM, Steven Schveighoffer wrote:

length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't
possible,


Do you agree that's an awkwardness?


Yes, at that point it's an optimization. The only place where I assume
length could be invalid is when working with the Iterator!V type, which
might not be a dcollections object.


That doesn't matter. Since you define NO_LENGTH_SUPPORT and you have  
interfaces, the door is ajar forever. Client code can't write this:


auto total = cont1.length + cont2.length;

because that is incorrect code (that compiles, runs, and produces some  
odd result).


So don't take it lightly. NO_LENGTH_SUPPORT means no length support.  
Then pretending it's supported will only harm everyone.


I get what you are saying.  What I meant was it's moot if you are not  
using interfaces.  If you don't know what the underlying type is, then you  
have to do a runtime check.


I agree it's awkward, and I could have not included length as a member of  
Iterator, in which case it would be on all the container types and  
NO_LENGTH_SUPPORT would not exist.  All containers are meant to have a  
valid length, it is only with non-container Iterators that length can be  
NO_LENGTH_SUPPORT.



However, supporting non-length containers via
generic programming vs. interfaces would be much smoother.


In that case, you can do
coll.begin.empty to determine if the collection has any elements.


Then why not make length optional and define empty? From the above it
looks like an obviously better design.


But all dcollections are bi-directional anyways, there is no singly
linked list. That was a decision I made early on, because it allows more
assumptions about dcollections' containers. I think length-less singly
linked lists would be a good addition to phobos that are not part of the
collection hierarchy, they are almost on par with builtin arrays as
being so simple.


Do you see dcollections' inability to express slists as a problem?


I don't see them as unable to express slists. They would break the
design guidelines that I set for collections being iterable in both
directions, but an slist fits fine within the List interface.


What's the required complexity of concat?


O(n) with n being the number of elements added

Is add expected to put things in a specific place? How does slist  
implement back()?


slist does not fit the List interface.


A pointer to the end element would be required for both appending and  
back().


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Robert Jacques
On Mon, 24 May 2010 08:06:29 -0400, Steven Schveighoffer  
 wrote:
On Thu, 20 May 2010 12:46:59 -0400, Robert Jacques   
wrote:


On Thu, 20 May 2010 06:34:42 -0400, Steven Schveighoffer  
 wrote:

[snip]


I understand these points, but I'm already using interfaces to copy  
data between containers.  I don't have to, I could have used generic  
code, but this way, only one function is instantiated to copy data  
from all the other containers.  The problem with using generic code is  
that the compiler will needlessly duplicate functions that are  
identical.


This sounds like a failure of design. Why aren't you using ranges to do  
this?


Why are ranges necessarily better?  I'm using the container's opApply,  
which I'd probably still use even if there were no interfaces.  opApply  
allows much more possibilities in traversal than ranges which cannot use  
stack recursion without heap activity.


Ranges are not necessarily better and may have some minor amount of  
overhead over a well optimized opApply. Then again, the opposite may be  
true. The point is that if the difference between opApply and a range is  
more than trivial, you've probably had a failure of design occur.  
Honestly, I'm having trouble thinking of a container which allows stack  
recursion for traversal and doesn't have an efficient range/loop
variant. For that matter, a range using heap activity is also a clear  
indication of a design failure.




Using interfaces is not as viral as you think.  My interfaces can be  
used in generic code, as long as the generic code uses functions in  
the interfaces.  If a library returns an interface, the author is  
saying "I don't want you using any functions outside this interface,"  
so why is that a bad thing?


Well, needlessly duplicated functions for one. :) More importantly, the  
example I gave was about third party libraries which I have no control  
over. So this solution explicitly doesn't work. And even if everyone  
used templates everywhere in order to be compatible with both  
interfaces and classes, isn't that a viral effect of having both?


If a 3rd party library uses interfaces, it's probably for good reason.   
They most likely want to remain binary compatible with other libs,  
and/or want to abstract the implementation of some custom container  
type.  If you don't like their requirements, don't use the library.


-Steve


No, if a 3rd party library _needs_ to use interfaces it's probably for a  
good reason. The problem is, if they exist, people are going to use them  
even if they don't need them, and therein lies the problem.


Re: struct vs. class containers (was: Re: dcollections 1.0 and 2.0a beta released)

2010-05-24 Thread Steven Schveighoffer
On Sun, 23 May 2010 17:36:52 -0400, Andrei Alexandrescu  
 wrote:


I've thought for a very long time about the class vs. struct choice in a  
container, and I came to a startling conclusion: it (almost) doesn't  
matter. Could be either, and the tradeoffs involved are nonessential.  
Here they are:


1. Using a class makes implementing members easier because there's no  
need to do work through an additional member. With a struct, you need a  
pimpl approach. For example:


struct Array {
 struct Impl {
 ...
 }
 Impl * impl;
 ...
 @property length() const { return impl.length; }
 ...
}

2. struct gives you more power in managing collection's own memory, as  
long as the collection doesn't escape item addresses or other addresses  
of internal handles. So all other things being equal, struct has a net  
advantage.


Yes, but most containers are node-based, so as long as you use structs for  
nodes, you are generally in control of the bulk of allocation  
(dcollections does this).




3. The creation syntaxes are different. For Phobos, I suggest adding a  
simple function make() to std.algorithm. make!T(a, b, c) returns a newly  
constructed object T by invoking the constructor with arguments a, b,  
and c. That way we can make client code virtually agnostic of the  
class/struct choice for a container.




Classes have builtin object monitors.  But you could handle that via a  
struct as well.  The language support wouldn't be there though.


That's it. Otherwise, one could use either to build a container. Let me  
note that I have reached the conclusion that containers should be at  
best reference types, with a meta-constructor Value!(C) that takes a  
container C and makes it into a value type.
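
For reference, a sketch of what such a Value!(C) wrapper might look like (hypothetical; assumes C is a class container exposing a dup() that deep-copies it):

struct Value(C)
{
    C payload;
    this(C c) { payload = c; }
    // postblit: deep-copy the container whenever the wrapper is copied,
    // turning the reference-type container into a value type
    this(this)
    {
        if (payload !is null)
            payload = payload.dup;
    }
    alias payload this;   // forward the container's interface
}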


The thing that classes really give you is the language support.  Structs  
need to be hand-crafted to support the same syntax.  Classes are enforced  
as always being reference types and always being on the heap.  The  
Value!(C) is questionable because creating the head of a container on the  
stack leads to easily escaped stack references.


But yeah, a struct as a 'smart pointer' could work, as long as you don't  
'auto-create' like AA's do.




It seems Steve took the weekend off.


My son's 2nd birthday party was Saturday, and I had visitors, sorry :)  I  
did some responses, but at some point if you want to remain married, you  
have to pay attention to your family.


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Andrei Alexandrescu

On 05/24/2010 10:39 AM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 11:21:20 -0400, Andrei Alexandrescu
 wrote:


On 05/24/2010 06:54 AM, Steven Schveighoffer wrote:

I am not familiar with tries,


Missed that upon the first read. I suggest you look at tries and the
following other structures as good examples that it's futile to fit
collections into hierarchies.

http://en.wikipedia.org/wiki/Trie
http://linux.thai.net/~thep/datrie/datrie.html
http://en.wikipedia.org/wiki/Suffix_tree
http://en.wikipedia.org/wiki/Kd-tree

We'd want to implement in time those and many more in Phobos without
worrying that some of their primitives won't fit the existing
interfaces, and also without the unjustified effort of formalizing
interfaces for each of them in thinking that another very, very
similar container will come along.



 From a cursory look, I don't see why tries would not be possible in
dcollections.

I'd probably start with something like this:

class TrieMap(K, V) : Map!(K[], V)

The array brackets enforce the ability to use prefixes on the keys.


The key point of tries is that they have distributed storage. Thus, 
shoehorning them into the interface function


Iterator!(K[]) keys();

forces allocation and copying.


Andrei


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Andrei Alexandrescu

On 05/24/2010 10:01 AM, Steven Schveighoffer wrote:

On Fri, 21 May 2010 13:42:14 -0400, Andrei Alexandrescu
 wrote:


On 05/19/2010 08:42 PM, Steven Schveighoffer wrote:

Andrei Alexandrescu Wrote:


I.e. there aren't many kinds of HashMaps that derive from each
other. But the interfaces are not detrimental to your ideas. The
only thing interfaces require is that the entities implementing
them are classes and not structs. As long as you agree that
classes are the right call, then interfaces can co-exist with
your other suggestions without interference.


This brings back a discussion I had with Walter a while ago, with
echoes in the newsgroup. Basically the conclusion was as follows:
if a container never escapes the addresses of its elements, it can
manage its own storage. That forces, however, the container to be a
struct because copying references to a class container would break
that encapsulation. I called those "perfectly encapsulated
containers" and I think they are good candidates for manual memory
management because they tend to deal in relatively large chunks.

I noticed that your collections return things by value, so they are
 good candidates for perfect encapsulation.


Then you need reference counting, I don't think that is a good
design.


Why?

I think refcounting for containers is perfect.

Refcounting small objects is bad because the counter overhead is large.
Also, small objects tend to be created and passed around a lot, so the
time overhead is significant.

In contrast, refcounting and containers are a perfect marriage. The
container is a relatively large conglomerate of objects so refcounting
that is bound to yield good benefits in terms of memory reclamation.


Yes, if you want to define "this function needs something that
is both addable and purgeable", I don't have an interface for
that. But a concept can certainly define that generically (which
is what you want anyways), or you could just say "I need a List"
and get those functions also. It also does not force entities
other than dcollections objects to be classes, they could be
structs and implement the correct concepts.

I myself don't really use the interface aspect of the classes, it
is mostly a carryover from the Java/Tango inspirations.


I don't know Tango, but Java's containers are a terrible example to
follow. Java's container library is an ill-advised design on top of
an underpowered language, patched later with some half-understood
semblance of genericity. I think Java containers are a huge
disservice to the programming community because they foster bad
design.


Java has some warts as you have rightfully pointed out in the past
(i.e. O(n) lookup), but I have attempted to remove all those warts.
Dcollections I would hope does not suffer from the problems that
Java's collections suffer from.


That's great. But let me quote what you said: "I myself don't really use
the interface aspect of the classes, it is mostly a carryover from the
Java/Tango inspirations." I took that to mean you don't have a strong
justification for structuring dcollections as a hierarchy, and
furthermore that makes me hope it's possible you'd be willing to yank
that dinosaur brain.


But I can see one good reason to keep them -- binary
interoperability. For example, it might be the case some day when
D has good support with dynamic libraries that a library exposes
some piece of itself as a Map or List interface.


I need to disagree with that. I've done and I do a ton of binary
interoperability stuff. You never expose a generic container
interface! Interoperable objects always embody high-level logic
that is specific to the application. They might use containers
inside, but they invariably expose high-level, application-specific
functionality.


It's done all the time in Java and .NET. For example, a GUI listbox
widget exposes its elements as an array of elements, which implement
the List interface. You don't ever see the implementation or need it.
 Granted Java and .NET have less problems than C++ and D with binary
 compatibility, since the function tables are dynamic, but the
potential is there for D to make binary compatibility possible with
interfaces.


I see.


In the past I have built a C++ library that abstracted features of
the OS. My goal was to make it possible to dynamically load a module
that abstracted things like setting the IP address of a network
interface. My modules used std::string instead of char * to lookup
services to get objects that implement the interface. Big mistake. On
a later version of the standard C++ runtime, the private
implementation of std::string changed, so the dynamically loaded
libraries crashed horribly. No change in string's interface, just the
private stuff changed, but because it's a template, the code that
uses it necessarily has to be aware of it. We ended up ditching the
standard C++ library's version of string, and used STLPort so we
could control the library.

I envision this same sort of problem would be likely with D collection
objects that were not used via interfaces.

Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Andrei Alexandrescu

On 05/24/2010 10:23 AM, Steven Schveighoffer wrote:

On Mon, 24 May 2010 10:10:51 -0400, Andrei Alexandrescu
 wrote:


On 05/24/2010 06:54 AM, Steven Schveighoffer wrote:

length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't
possible,


Do you agree that's an awkwardness?


Yes, at that point it's an optimization. The only place where I assume
length could be invalid is when working with the Iterator!V type, which
might not be a dcollections object.


That doesn't matter. Since you define NO_LENGTH_SUPPORT and you have 
interfaces, the door is ajar forever. Client code can't write this:


auto total = cont1.length + cont2.length;

because that is incorrect code (that compiles, runs, and produces some 
odd result).


So don't take it lightly. NO_LENGTH_SUPPORT means no length support. 
Then pretending it's supported will only harm everyone.
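
For concreteness, the guard such client code would need (the sentinel value here is assumed; the exact dcollections spelling may differ):

enum size_t NO_LENGTH_SUPPORT = size_t.max;   // assumed sentinel

size_t totalLength(C1, C2)(C1 cont1, C2 cont2)
{
    // without this check the addition still compiles and runs,
    // it just silently produces a meaningless total
    if (cont1.length == NO_LENGTH_SUPPORT || cont2.length == NO_LENGTH_SUPPORT)
        throw new Exception("a container cannot report O(1) length");
    return cont1.length + cont2.length;
}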



but all dcollections define length.


I was hoping we're on the verge of agreeing to yank all interfaces and
let the collection roam free. If that's a possibility, then your
design isn't constrained to define length anymore unless it can.


I agree, removing all the interfaces would get rid of the need for
NO_LENGTH_SUPPORT. But all dcollections define a valid length, so that
point is kinda moot :)


Not moot because you have interfaces.


However, supporting non-length containers via
generic programming vs. interfaces would be much smoother.


In that case, you can do
coll.begin.empty to determine if the collection has any elements.


Then why not make length optional and define empty? From the above it
looks like an obviously better design.


But all dcollections are bi-directional anyways, there is no singly
linked list. That was a decision I made early on, because it allows more
assumptions about dcollections' containers. I think length-less singly
linked lists would be a good addition to phobos that are not part of the
collection hierarchy, they are almost on par with builtin arrays as
being so simple.


Do you see dcollections' inability to express slists as a problem?


I don't see them as unable to express slists. They would break the
design guidelines that I set for collections being iterable in both
directions, but an slist fits fine within the List interface.


What's the required complexity of concat? Is add expected to put things 
in a specific place? How does slist implement back()?


slist does not fit the List interface.


And singly linked vs. doubly linked does not make any difference whether
O(1) length is possible or not. As you say, it's O(1) splicing or O(1)
length, regardless of single or double links.

I disagree with your assessment that length is a less used operation
than splicing. I think I have never used splicing with std::list. C++'s
default is O(1) length, and I followed that design.


I didn't assess that.


I felt like you did. Your statement was: "It works against splicing (an
important list primitive) and most of the time you don't need it so why
waste time updating it."

While you didn't specifically say it was used more, you mentioned the
importance of splicing, and the non-importance of length. I guess I
should have said, I disagree that splicing is more important than
caching length.


My main point was that if one defines an slist, in many cases one
defines it to be very small, simple, and cheap. Maintaining size will
come on the radar and would discourage the use of the abstraction in
favor of a hand-rolled implementation. This is failure to abstract.


All dcollections own their elements, it is not like an array, where
there can be multiple references to subsets of the data. An slist would
simply be an slist as you describe with a slightly bigger head. In other
words, the head node has the length cache and allocator, and object
monitor, and whatever else you need for the whole list. The nodes
themselves would be a simple element and pointer. But the elements are
never exposed except via ranges and cursors. The ranges and cursors
cannot affect the topology of the list, you need the head "owner" node
to do that.


I understand. The problem is when you many such lists or when you 
manipulate a few intensively.



I look at it this way, dcollections are "good enough" for most uses, if
you need highly tailored data structures with specific uses in mind, you
can make those on your own.

For example, dcollections' LinkList can easily take the place of a
simple slist. It may not be as fast with your specific requirements in
mind, but it works. The benefit is, it works like all the other
collection types and it's easy to learn all of them together because
they all have certain properties.


That's what STL said about slists. Next thing you know... forward_list 
is being standardized.



Andrei


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Fri, 21 May 2010 22:56:54 -0400, Vladimir Panteleev  
 wrote:


On Thu, 20 May 2010 04:42:35 +0300, Steven Schveighoffer  
 wrote:



interfaces


Does that imply that the most important methods are virtual?

If so, say good-bye to inlining, and hello to an additional level of  
dereferencing.


Without meaning any disrespect to all the work you did, allow me to say  
that I won't use a library that could be faster (without making usage  
too clumsy), but isn't. (I prefer my D programs to be as fast as  
reasonably possible - if I didn't care about speed, I'd use another  
language.) For the same reasons, I'd be disappointed if the library was  
admitted as-is into Phobos, since it doesn't live up to my personal  
ideal of what D should be.


With the future update making all the classes final, they are not
virtual as long as you don't use the interfaces.


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 11:21:20 -0400, Andrei Alexandrescu  
 wrote:



On 05/24/2010 06:54 AM, Steven Schveighoffer wrote:

I am not familiar with tries,


Missed that upon the first read. I suggest you look at tries and the  
following other structures as good examples that it's futile to fit  
collections into hierarchies.


http://en.wikipedia.org/wiki/Trie
http://linux.thai.net/~thep/datrie/datrie.html
http://en.wikipedia.org/wiki/Suffix_tree
http://en.wikipedia.org/wiki/Kd-tree

We'd want to implement in time those and many more in Phobos without  
worrying that some of their primitives won't fit the existing  
interfaces, and also without the unjustified effort of formalizing  
interfaces for each of them in thinking that another very, very similar  
container will come along.




From a cursory look, I don't see why tries would not be possible in  
dcollections.


I'd probably start with something like this:

class TrieMap(K, V) : Map!(K[], V)

The array brackets enforce the ability to use prefixes on the keys.

-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Mon, 24 May 2010 10:10:51 -0400, Andrei Alexandrescu  
 wrote:



On 05/24/2010 06:54 AM, Steven Schveighoffer wrote:

length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't
possible,


Do you agree that's an awkwardness?


Yes, at that point it's an optimization.  The only place where I assume  
length could be invalid is when working with the Iterator!V type, which  
might not be a dcollections object.





but all dcollections define length.


I was hoping we're on the verge of agreeing to yank all interfaces and  
let the collection roam free. If that's a possibility, then your design  
isn't constrained to define length anymore unless it can.


I agree, removing all the interfaces would get rid of the need for  
NO_LENGTH_SUPPORT.  But all dcollections define a valid length, so that  
point is kinda moot :)  However, supporting non-length containers via  
generic programming vs. interfaces would be much smoother.



In that case, you can do
coll.begin.empty to determine if the collection has any elements.


Then why not make length optional and define empty? From the above it  
looks like an obviously better design.



But all dcollections are bi-directional anyways, there is no singly
linked list.  That was a decision I made early on, because it allows more
assumptions about dcollections' containers. I think length-less singly
linked lists would be a good addition to phobos that are not part of the
collection hierarchy, they are almost on par with builtin arrays as
being so simple.


Do you see dcollections' inability to express slists as a problem?


I don't see them as unable to express slists.  They would break the design  
guidelines that I set for collections being iterable in both directions,  
but an slist fits fine within the List interface.



And singly linked vs. doubly linked does not make any difference whether
O(1) length is possible or not. As you say, it's O(1) splicing or O(1)
length, regardless of single or double links.

I disagree with your assessment that length is a less used operation
than splicing. I think I have never used splicing with std::list. C++'s
default is O(1) length, and I followed that design.


I didn't assess that.


I felt like you did.  Your statement was: "It works against splicing (an  
important list primitive) and most of the time you don't need it so why  
waste time updating it."


While you didn't specifically say it was used more, you mentioned the  
importance of splicing, and the non-importance of length.  I guess I  
should have said, I disagree that splicing is more important than caching  
length.


My main point was that if one defines an slist, in many cases one  
defines it to be very small, simple, and cheap. Maintaining size will  
come on the radar and would discourage the use of the abstraction in  
favor of a hand-rolled implementation. This is failure to abstract.


All dcollections own their elements, it is not like an array, where there  
can be multiple references to subsets of the data.  An slist would simply  
be an slist as you describe with a slightly bigger head.  In other words,  
the head node has the length cache and allocator, and object monitor, and  
whatever else you need for the whole list.  The nodes themselves would be  
a simple element and pointer.  But the elements are never exposed except  
via ranges and cursors.  The ranges and cursors cannot affect the topology  
of the list, you need the head "owner" node to do that.


I look at it this way, dcollections are "good enough" for most uses, if  
you need highly tailored data structures with specific uses in mind, you  
can make those on your own.


For example, dcollections' LinkList can easily take the place of a simple  
slist.  It may not be as fast with your specific requirements in mind, but  
it works.  The benefit is, it works like all the other collection types  
and it's easy to learn all of them together because they all have certain  
properties.


I don't know of a word for "hierarchy with interfaces as root and inner  
nodes and classes as leaf nodes" so I just call that a class hierarchy.


I view them as classes with interfaces tacked on because the  
implementations are similar :)


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Andrei Alexandrescu

On 05/24/2010 06:54 AM, Steven Schveighoffer wrote:

I am not familiar with tries,


Missed that upon the first read. I suggest you look at tries and the 
following other structures as good examples that it's futile to fit 
collections into hierarchies.


http://en.wikipedia.org/wiki/Trie
http://linux.thai.net/~thep/datrie/datrie.html
http://en.wikipedia.org/wiki/Suffix_tree
http://en.wikipedia.org/wiki/Kd-tree

We'd want to implement in time those and many more in Phobos without 
worrying that some of their primitives won't fit the existing 
interfaces, and also without the unjustified effort of formalizing 
interfaces for each of them in thinking that another very, very similar 
container will come along.



Andrei



Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Fri, 21 May 2010 13:42:14 -0400, Andrei Alexandrescu  
 wrote:



On 05/19/2010 08:42 PM, Steven Schveighoffer wrote:

Andrei Alexandrescu Wrote:


I.e. there aren't many
kinds of HashMaps that derive from each other.  But the interfaces
are not detrimental to your ideas.  The only thing interfaces require
is that the entities implementing them are classes and not structs.
As long as you agree that classes are the right call, then interfaces
can co-exist with your other suggestions without interference.


This brings back a discussion I had with Walter a while ago, with echoes  
in the newsgroup. Basically the conclusion was as follows: if a  
container never escapes the addresses of its elements, it can manage its  
own storage. That forces, however, the container to be a struct because  
copying references to a class container would break that encapsulation.  
I called those "perfectly encapsulated containers" and I think they are  
good candidates for manual memory management because they tend to deal  
in relatively large chunks.


I noticed that your collections return things by value, so they are good  
candidates for perfect encapsulation.


Then you need reference counting, I don't think that is a good design.




Yes, if you want to define "this function needs something that is
both addable and purgeable", I don't have an interface for that.  But
a concept can certainly define that generically (which is what you
want anyways), or you could just say "I need a List" and get those
functions also.  It also does not force entities other than
dcollections objects to be classes, they could be structs and
implement the correct concepts.

I myself don't really use the interface aspect of the classes, it is
mostly a carryover from the Java/Tango inspirations.


I don't know Tango, but Java's containers are a terrible example to  
follow. Java's container library is an ill-advised design on top of an
underpowered language, patched later with some half-understood semblance
of genericity. I think Java containers are a huge disservice to the
programming community because they foster bad design.


Java has some warts as you have rightfully pointed out in the past (i.e.  
O(n) lookup), but I have attempted to remove all those warts.   
Dcollections I would hope does not suffer from the problems that Java's  
collections suffer from.





But I can see
one good reason to keep them -- binary interoperability.  For
example, it might be the case some day when D has good support with
dynamic libraries that a library exposes some piece of itself as a
Map or List interface.


I need to disagree with that. I've done and I do a ton of binary  
interoperability stuff. You never expose a generic container interface!  
Interoperable objects always embody high-level logic that is specific to  
the application. They might use containers inside, but they invariably  
expose high-level, application-specific functionality.


It's done all the time in Java and .NET.  For example, a GUI listbox
widget exposes its elements as an array of elements, which implement the  
List interface.  You don't ever see the implementation or need it.   
Granted Java and .NET have less problems than C++ and D with binary  
compatibility, since the function tables are dynamic, but the potential is  
there for D to make binary compatibility possible with interfaces.


In the past I have built a C++ library that abstracted features of the  
OS.  My goal was to make it possible to dynamically load a module that  
abstracted things like setting the IP address of a network interface.  My  
modules used std::string instead of char * to lookup services to get  
objects that implement the interface.  Big mistake.  On a later version of  
the standard C++ runtime, the private implementation of std::string  
changed, so the dynamically loaded libraries crashed horribly.  No change  
in string's interface, just the private stuff changed, but because it's a  
template, the code that uses it necessarily has to be aware of it.  We  
ended up ditching the standard C++ library's version of string, and used  
STLPort so we could control the library.


I envision this same sort of problem would be likely with D collection  
objects that were not used via interfaces.





So my answer is -- go ahead and define these concepts and required
names, and you can ignore the interfaces if they don't interest you.
They do not subtract from the possibilities, and others may find good
use for them.

Does that make sense?


I understand I could ignore the interfaces and call it a day, but it  
seems that at this point we are both convinced they are not quite good  
at anything: you only put them in because you suffered the Stockholm  
syndrome with Java, and I hate them with a passion.


The reason I put them in is because they existed before, but thinking  
about what they would be useful for, D doesn't really support dynamic  
libraries all that well (recent advances have

Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Andrei Alexandrescu

On 05/24/2010 06:54 AM, Steven Schveighoffer wrote:

length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't
possible,


Do you agree that's an awkwardness?


but all dcollections define length.


I was hoping we're on the verge of agreeing to yank all interfaces and 
let the collection roam free. If that's a possibility, then your design 
isn't constrained to define length anymore unless it can.



In that case, you can do
coll.begin.empty to determine if the collection has any elements.


Then why not make length optional and define empty? From the above it 
looks like an obviously better design.



But all dcollections are bi-directional anyways, there is no singly
linked list.  That was a decision I made early on, because it allows more
assumptions about dcollections' containers. I think length-less singly
linked lists would be a good addition to phobos that are not part of the
collection hierarchy, they are almost on par with builtin arrays as
being so simple.


Do you see dcollections' inability to express slists as a problem?


And singly linked vs. doubly linked does not make any difference whether
O(1) length is possible or not. As you say, it's O(1) splicing or O(1)
length, regardless of single or double links.

I disagree with your assessment that length is a less used operation
than splicing. I think I have never used splicing with std::list. C++'s
default is O(1) length, and I followed that design.


I didn't assess that. My main point was that if one defines an slist, in 
many cases one defines it to be very small, simple, and cheap. 
Maintaining size will come on the radar and would discourage the use of 
the abstraction in favor of a hand-rolled implementation. This is 
failure to abstract.



OTish: What are your thoughts on a bimap implementation and a
child/sibling or general tree implementation as part of
dcollections?


I haven't the slightest what a bimap is :) I am not an expert in
collections or data structures, I just reimplement things I have
understood. My implementations are basically copied from my
algorithm book, and refined as much as I can do.

That being said, if you have any implementation of a tree or hash, it
should be easy to insert into dcollections. If you have ideas for
other collection types (i.e. other than Map, Set, Multiset or List),
then I can look into that if you point me at an implementation or
have one of your own. I purposefully left out multi-map because I've
never had a huge use for it, and it seemed like an awkward thing to
create an interface for...


Tries of various kinds come up again and again. I don't think
dcollections' current abstractions capture them, which further
supports my point that containers have too much personality to be
caught in the straitjacket of class hierarchies.


I am not familiar with tries, and dcollections has no class hierarchy.


I don't know of a word for "hierarchy with interfaces as root and inner 
nodes and classes as leaf nodes" so I just call that a class hierarchy.



Andrei


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Sat, 22 May 2010 07:56:31 -0400, Michel Fortin  
 wrote:


On 2010-05-21 22:55:16 -0400, Walter Bright   
said:



Walter Bright wrote:
If we can get anywhere close to that level of success with ranges and  
containers, we should all be well pleased.
 Mike Taylor has a phrase for that I think is well-coined: "impedance  
matching",
defined as the work necessary to get one library module to work with  
another

library module.


This makes me think about something.

In principle, I like the idea of containers being reference type. It  
works well when passing a container to functions. But at the same time,  
I despite it. By-reference containers forces you to have extra  
indirections even when you don't need them, and you have to worry about  
null. Sometime a value-type would be better, when creating more complex  
data structures for instance:


class Channel {
private {
Array!Message inbound;
Array!Message outbound;
}

...
}

What's the point of having extra indirection here?


Scope class members should solve this.  It's been thrown around forever --  
essentially, you should be able to define that a class member's storage is  
contained within the owner object.


Value-type node-based containers just don't work well -- it's too easy
to escape references, and too easy to accidentally duplicate the entire  
node set when passing to functions.  It works fine for arrays because the  
array has a very simple structure -- data and length.  And the length and  
data are somewhat orthogonal.


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Fri, 21 May 2010 14:00:42 -0400, Andrei Alexandrescu  
 wrote:



On 05/20/2010 05:34 AM, Steven Schveighoffer wrote:



I understand these points, but I'm already using interfaces to copy
data between containers.  I don't have to, I could have used generic
code, but this way, only one function is instantiated to copy data
from all the other containers.  The problem with using generic code
is that the compiler will needlessly duplicate functions that are
identical.


There is a copy() function that copies any range to any other range. It
might need a revisit, but I think the way to go about copying is generic.


This implies the space for the elements already exists in the target  
range.  Copying data from a list to a set for instance can't just allocate  
space in the set and then use some generic copy function to copy the data  
in.


If generic copying is to be used, it would be a copy function defined by  
the collection, not a standard one.
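
Something along these lines, for instance (a sketch of a collection-defined bulk copy; the helper name is hypothetical and it assumes the target exposes add()):

import std.range : isInputRange;

// copy every element of any input range into the target container,
// letting the container allocate space for each element as it goes
void addAll(C, R)(C target, R source)
    if (isInputRange!R)
{
    foreach (e; source)
        target.add(e);
}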





Using interfaces is not as viral as you think.  My interfaces can be
used in generic code, as long as the generic code uses functions in
the interfaces.  If a library returns an interface, the author is
saying "I don't want you using any functions outside this interface,"
so why is that a bad thing?

Forcing people to *not* use interfaces has its drawbacks too.
Dcollections gives the most flexible design I could muster, while
still being useful.

I'm not saying I'm against removing the interfaces until some later
date, but I don't see any convincing arguments yet, especially since
I've already seen benefits from having them.


What are the benefits that you have noticed? I think you'd need to back  
up the copying argument with some data if you want to frame it as a  
benefit.


When I mean see benefits, the benefits are not data-related but design  
related -- more designs are possible with interfaces and generic  
programming combined than with generic programming alone.


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Thu, 20 May 2010 12:46:59 -0400, Robert Jacques   
wrote:


On Thu, 20 May 2010 06:34:42 -0400, Steven Schveighoffer  
 wrote:

Robert Jacques Wrote:


On Wed, 19 May 2010 21:42:35 -0400, Steven Schveighoffer
>
> Does that make sense?
>
> -Steve

Yes and No. I understand where you're coming from, but I think it's a bad
idea. First, I think it needlessly expands the radius of comprehension
needed to understand and use the library. (See Tangled up in tools
http://www.pragprog.com/magazines/2010-04/tangled-up-in-tools) Second, I
think designing a library to be flexible enough to meet some future,
anticipated need (e.g. dlls) is a good idea, but actually implementing
vaporous future needs is fraught with peril; it's too easy to guess wrong.
Third, interface-based design is viral; if library X uses interfaces then I
have to use interfaces to interface with it. And if another library Y uses
classes, then I'm going to have to write a (needless) wrapper around one of
them.


I understand these points, but I'm already using interfaces to copy  
data between containers.  I don't have to, I could have used generic  
code, but this way, only one function is instantiated to copy data from  
all the other containers.  The problem with using generic code is that  
the compiler will needlessly duplicate functions that are identical.


This sounds like a failure of design. Why aren't you using ranges to do  
this?


Why are ranges necessarily better?  I'm using the container's opApply,  
which I'd probably still use even if there were no interfaces.  opApply  
allows much more possibilities in traversal than ranges which cannot use  
stack recursion without heap activity.
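
For example, a tree's opApply can recurse on the call stack (a sketch with an assumed node layout, not dcollections' actual types), whereas a range walking the same tree has to carry an explicit stack of nodes, typically on the heap:

struct Node { int value; Node* left, right; }

// in-order traversal delivered through a delegate; the traversal state
// lives entirely on the call stack, with no per-iteration heap allocation
int inorder(Node* n, scope int delegate(ref int) dg)
{
    if (n is null) return 0;
    if (auto r = inorder(n.left, dg)) return r;
    if (auto r = dg(n.value)) return r;
    return inorder(n.right, dg);
}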




Using interfaces is not as viral as you think.  My interfaces can be  
used in generic code, as long as the generic code uses functions in the  
interfaces.  If a library returns an interface, the author is saying "I  
don't want you using any functions outside this interface," so why is  
that a bad thing?


Well, needlessly duplicated functions for one. :) More importantly, the  
example I gave was about third party libraries which I have no control  
over. So this solution explicitly doesn't work. And even if everyone  
used templates everywhere in order to be compatible with both interfaces  
and classes, isn't that a viral effect of having both?


If a 3rd party library uses interfaces, it's probably for good reason.   
They most likely want to remain binary compatible with other libs, and/or  
want to abstract the implementation of some custom container type.  If you  
don't like their requirements, don't use the library.


-Steve


Re: dcollections 1.0 and 2.0a beta released

2010-05-24 Thread Steven Schveighoffer
On Sat, 22 May 2010 16:58:12 -0400, Andrei Alexandrescu  
 wrote:



On 05/19/2010 03:07 PM, Steven Schveighoffer wrote:

Ellery Newcomer Wrote:


Are the collections supposed to not have isEmpty members?


No.  Use length == 0.  O(1) length is always supported for all
collections.


One thing before I forget: I think any good collection abstraction must
be concretized back to the classic collection instances. Singly-linked
lists definitely can't be left off the list! It would be an epic  
failure. Imagine the papers! New York Times: "D has containers, but no  
singly-linked lists". The New Yorker: "D's abstractions are too  
abstract". The National Enquirer: "The Bat Boy stole D's singly-linked  
lists". Pyongyang Times: "Another failure of the so-called Western  
Democracy -- yet Juche doesn't need singly-linked lists anyway."


Keeping the length cached in a singly-linked list is a costly mistake.
It works against splicing (an important list primitive) and most of the
time you don't need it so why waste time updating it. Adding any cruft
beyond { T payload; List* next; } is very strong incentive for the coder
to write their own. Why do you think they're using an SLL in the first
place? Because it's simple and has efficient primitives!


length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't  
possible, but all dcollections define length.  In that case, you can do  
coll.begin.empty to determine if the collection has any elements.


But all dcollections are bi-directional anyway; there is no singly linked  
list.  That was a decision I made early on, because it allows more  
assumptions about dcollections' containers.  I think length-less singly  
linked lists would be a good addition to phobos outside the collection  
hierarchy; they are almost on par with builtin arrays in being so simple.


And singly linked vs. doubly linked makes no difference as to whether O(1)  
length is possible or not.  As you say, it's O(1) splicing or O(1) length,  
regardless of single or double links.
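
For reference, the tradeoff in a minimal sketch (hypothetical node type, not
dcollections code): splicing an already-detached chain is O(1) pointer
surgery, but a cached length would force an O(n) count of the spliced nodes.

struct Node(T) { T payload; Node* next; }

// Insert the detached chain first..last after dest: O(1) no matter how long
// the chain is.  Maintaining a cached length here would require walking
// first..last just to count them.
void spliceAfter(T)(Node!T* dest, Node!T* first, Node!T* last)
{
    last.next = dest.next;
    dest.next = first;
}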


I disagree with your assessment that length is a less used operation than  
splicing.  I think I have never used splicing with std::list.  C++'s  
default is O(1) length, and I followed that design.





OTish: What are your thoughts on a bimap implementation and a
child/sibling or general tree implementation as part of
dcollections?


I haven't the slightest what a bimap is :)  I am not an expert in
collections or data structures, I just reimplement things I have
understood.  My implementations are basically copied from my
algorithm book, and refined as much as I can do.

That being said, if you have any implementation of a tree or hash, it
should be easy to insert into dcollections.  If you have ideas for
other collection types (i.e. other than Map, Set, Multiset or List),
then I can look into that if you point me at an implementation or
have one of your own.  I purposefully left out multi-map because I've
never had a huge use for it, and it seemed like an awkward thing to
create an interface for...


Tries of various kinds come up again and again. I don't think  
dcollections' current abstractions capture them, which further supports  
my point that containers have too much personality to be caught in the  
straitjacket of class hierarchies.


I am not familiar with tries, and dcollections has no class hierarchy.

-Steve


Re: struct vs. class containers (was: Re: dcollections 1.0 and 2.0a beta released)

2010-05-24 Thread Jacob Carlborg

On 2010-05-23 23.36, Andrei Alexandrescu wrote:

I've thought for a very long time about the class vs. struct choice in a
container, and I came to a startling conclusion: it (almost) doesn't
matter. Could be either, and the tradeoffs involved are nonessential.
Here they are:

1. Using a class makes implementing members easier because there's no
need to do work through an additional member. With a struct, you need a
pimpl approach. For example:

struct Array {
    struct Impl {
        ...
    }
    Impl* impl;
    ...
    @property auto length() const { return impl.length; }
    ...
}


Isn't that just what "alias this" and "opDispatch" are for?
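
For what it's worth, a minimal sketch (hypothetical, not dcollections or
Phobos code) of forwarding through the pimpl pointer with alias this, so
members like length need not be written out by hand:

struct Array(T)
{
    static struct Impl
    {
        T[] data;
        @property size_t length() const { return data.length; }
    }

    Impl* impl;
    alias impl this;  // lookups not found on Array are retried on impl,
                      // so arr.length forwards to impl.length
}

opDispatch could forward calls by name in a similar way.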


2. struct gives you more power in managing collection's own memory, as
long as the collection doesn't escape item addresses or other addresses
of internal handles. So all other things being equal, struct has a net
advantage.

3. The creation syntaxes are different. For Phobos, I suggest adding a
simple function make() to std.algorithm. make!T(a, b, c) returns a newly
constructed object T by invoking the constructor with arguments a, b,
and c. That way we can make client code virtually agnostic of the
class/struct choice for a container.

That's it. Otherwise, one could use either to build a container. Let me
note that I have reached the conclusion that containers should be at
best reference types, with a meta-constructor Value!(C) that takes a
container C and makes it into a value type.

This result is the end of a long journey. I am quite sure I have it
right. If anything, I am amazed at how much dogma I had to unlearn
before reaching it. And I'm glad I did - this is very D-like: I wasn't
looking for a class-based solution, and I wasn't looking for a
struct-based solution. I was only looking for the truth - and surprise,
this slightly asymmetric duality came forth.

It seems Steve took the weekend off. At this point I'm poised to prepare
a list of items that would condition putting dcollections in phobos. But
I'd rather have Steve (and everyone) acquire their own motivation from
my arguments, instead of making changes as a concession.


Andrei



--
/Jacob Carlborg


Re: dcollections 1.0 and 2.0a beta released

2010-05-23 Thread BLS

Fantastic work Steve,
Pretty good design IMHO, not sure why... but somehow the collection 
battle reminds me of


Steve Vai vs Ry Cooder

Bjoern


Re: struct vs. class containers (was: Re: dcollections 1.0 and 2.0a beta released)

2010-05-23 Thread Michel Fortin
On 2010-05-23 17:36:52 -0400, Andrei Alexandrescu 
 said:


I've thought for a very long time about the class vs. struct choice in 
a container, and I came to a startling conclusion: it (almost) doesn't 
matter. Could be either, and the tradeoffs involved are nonessential.


I'm starting to wonder whether this discussion really belongs in 
D.announce. Perhaps you should repost this on the general newsgroup and 
we shall discuss it there?


This newsgroup was obviously the right place for the initial 
dcollection announcement, but we've sidetracked quite a lot since then.


--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/



struct vs. class containers (was: Re: dcollections 1.0 and 2.0a beta released)

2010-05-23 Thread Andrei Alexandrescu
I've thought for a very long time about the class vs. struct choice in a 
container, and I came to a startling conclusion: it (almost) doesn't 
matter. Could be either, and the tradeoffs involved are nonessential. 
Here they are:


1. Using a class makes implementing members easier because there's no 
need to do work through an additional member. With a struct, you need a 
pimpl approach. For example:


struct Array {
    struct Impl {
        ...
    }
    Impl* impl;
    ...
    @property auto length() const { return impl.length; }
    ...
}

2. struct gives you more power in managing collection's own memory, as 
long as the collection doesn't escape item addresses or other addresses 
of internal handles. So all other things being equal, struct has a net 
advantage.


3. The creation syntaxes are different. For Phobos, I suggest adding a 
simple function make() to std.algorithm. make!T(a, b, c) returns a newly 
constructed object T by invoking the constructor with arguments a, b, 
and c. That way we can make client code virtually agnostic of the 
class/struct choice for a container.
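
A minimal sketch of what such a make() could look like (hypothetical, not
actual Phobos code):

// Construct a T uniformly, whether T is a class (new) or a struct (ctor/literal).
T make(T, Args...)(Args args)
{
    static if (is(T == class))
        return new T(args);
    else
        return T(args);
}

// Client code reads the same either way, e.g. auto c = make!(Container!int)(args);
// (Container here is just a placeholder name.)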


That's it. Otherwise, one could use either to build a container. Let me 
note that I have reached the conclusion that containers should be at 
best reference types, with a meta-constructor Value!(C) that takes a 
container C and makes it into a value type.


This result is the end of a long journey. I am quite sure I have it 
right. If anything, I am amazed at how much dogma I had to unlearn 
before reaching it. And I'm glad I did - this is very D-like: I wasn't 
looking for a class-based solution, and I wasn't looking for a 
struct-based solution. I was only looking for the truth - and surprise, 
this slightly asymmetric duality came forth.


It seems Steve took the weekend off. At this point I'm poised to prepare 
a list of items that would condition putting dcollections in phobos. But 
I'd rather have Steve (and everyone) acquire their own motivation from 
my arguments, instead of making changes as a concession.



Andrei


Re: dcollections 1.0 and 2.0a beta released

2010-05-23 Thread Michel Fortin

On 2010-05-22 16:01:37 -0400, Walter Bright  said:


Michel Fortin wrote:

What's the point of having extra indirection here?


Good question. I think the answer is:

1. When do you ever want to copy a collection? I almost never do, 
because copying one is an inherently expensive operation.


Whenever you need to preserve the previous state of something before 
applying some transformation on it. But I agree that the copy should be 
explicit because it is O(n), hence my suggestion of disabling implicit 
copying for containers.


Since we're at it, a reference-type container sometimes makes it too 
easy to just create a new reference to the same container when what you 
really want is to make a copy. I happen to have a bug of this sort to 
fix in my Objective-C program right now, where a reference to a 
container leaked where it should have been a copy, causing unwanted 
mutations.



2. When you copy a collection, do you copy the container or the 
elements in the container or both? One would have to carefully read the 
documentation to see which it is. That increases substantially the 
"cognitive load" of using them.


I don't see the extra cognitive load. Assuming you disable implicit 
copying of the container, you'll have to use ".dup", which works 
exactly as it does for an array. The way items are copied is exactly the 
same as if the container were a reference type (you call a "dup" or 
equivalent function and things get copied).



3. Going to lengths to make them value types, but then disabling 
copying them because you want people to use them as reference types, 
seems like a failure of design somewhere.


I agree in a way. But at the same time, forcing everyone to use a 
reference type when sometimes a value type would be more adequate also 
looks like a failure of design to me. To me, the best tradeoff seems to 
be a value type, because it's quite trivial to create a reference type 
from a value type when you need it; the reverse is awkward.
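
A minimal sketch of the easy direction (hypothetical, not a real library
type): any value-type container can be given reference semantics by wrapping
it in a class.

// Wraps a value-type container C and hands out reference semantics.
class Ref(C)
{
    C payload;
    alias payload this;  // forwards C's members through the wrapper
}

// Going the other way -- recovering value semantics from a reference-only
// container -- has no equally mechanical recipe.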



4. That silly extra level of indirection actually isn't there. Consider 
that even value locals are accessed via indirection: offset[ESP]. For a 
reference collection we have: offset[EBX]. No difference (yes, EBX has 
to be loaded, but if it is done more than once it gets cached in a 
register by the compiler).


Have you ever worked with containers of containers? Surely yes since D 
associative arrays are one of them. So assume we want to implement our 
associative arrays like this:


class HashTable(Key, Value) {
    Array!(Tuple!(Hash!Key, TreeSet!(Tuple!(Key, Value)))) buckets;
}

Do you find it reasonable that the TreeSet be a reference type?

Reference-type containers would mean one indirection and one extra 
allocated block for each bucket. Then add that 'Value' could itself be 
a struct or class containing its own container, and you're stuck with a 
third unnecessary level of indirection, extra calls to the GC to 
allocate containers, and/or checks for null. Sounds quite wasteful to me. 
In addition, those extra allocations add more logic to our hash table 
and thus more chances for bugs.


Here I'm using a hash table as an example; the same problem applies to 
many other data structures, whether they are generic or specific to a 
particular problem. Containers should be efficient and easy to use when 
composed one inside another. That's the greatest strength of C++'s 
value-type containers in my opinion.



5. Just making them all reference types means the documentation and use 
become a lot simpler.


Simpler to explain, maybe. Simpler to use, I have my doubts. You're 
just moving the burden somewhere else. A reference-type container 
requires a "new Container()" somewhere, and some protection logic 
against null. In exchange, you don't need to write 'ref' in functions 
taking containers, and can easily copy references to the container 
everywhere (sometimes too easily). But the reference-type benefits 
aren't entirely lost with a value type, because it's trivial to turn 
a value type into a reference type.



--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/



Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread Andrei Alexandrescu

On 05/22/2010 09:07 PM, Sean Kelly wrote:

Andrei Alexandrescu Wrote:


One thing before I forget: I think any good collection abstraction must
be concretized back to the classic collection instances. Singly-linked
lists definitely can't be left off the list! It would be an epic
failure. Imagine the papers! New York Times: "D has containers, but no
singly-linked lists".


We could always say we were following C++'s lead :-)


C++(0|1)x has forward_list. Needless to say, supporting size() at 
constant complexity was a matter of huge debate. The proposal that I 
like is this:


http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2543.htm

Search the page for "size()".

I don't know how the proposal was amended.


Andrei


Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread Sean Kelly
Andrei Alexandrescu Wrote:
> 
> One thing before I forget: I think any good collection abstraction must
> be concretized back to the classic collection instances. Singly-linked
> lists definitely can't be left off the list! It would be an epic 
> failure. Imagine the papers! New York Times: "D has containers, but no 
> singly-linked lists".

We could always say we were following C++'s lead :-)


Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread bearophile
Andrei Alexandrescu:

> Keeping the length cached in a singly-linked list is a costly mistake.
> It works against splicing (an important list primitive) and most of the
> time you don't need it so why waste time updating it.

struct LinkedList(T, bool WithLength = true) {
    static if (WithLength) {
        // define the cached count and a length property here
        private size_t count;
        @property size_t length() const { return count; }
    }
    // ...
}
Inside methods like add there is another static if (WithLength) that 
enables/disables the code that updates the length.

This allows keeping the length by default, but makes it easy to remove it 
when desired.

Bye,
bearophile


Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread Andrei Alexandrescu

On 05/19/2010 03:07 PM, Steven Schveighoffer wrote:

Ellery Newcomer Wrote:


Are the collections supposed to not have isEmpty members?


No.  Use length == 0.  O(1) length is always supported for all
collections.


One thing before I forget: I think any good collection abstraction must
be concretized back to the classic collection instances. Singly-linked
lists definitely can't be left off the list! It would be an epic 
failure. Imagine the papers! New York Times: "D has containers, but no 
singly-linked lists". The New Yorker: "D's abstractions are too 
abstract". The National Enquirer: "The Bat Boy stole D's singly-linked 
lists". Pyongyang Times: "Another failure of the so-called Western 
Democracy -- yet Juche doesn't need singly-linked lists anyway."


Keeping the length cached in a singly-linked list is a costly mistake.
It works against splicing (an important list primitive) and most of the
time you don't need it so why waste time updating it. Adding any cruft
beyond { T payload; List* next; } is very strong incentive for the coder
to write their own. Why do you think they're using an SLL in the first
place? Because it's simple and has efficient primitives!


OTish: What are your thoughts on a bimap implementation and a
child/sibling or general tree implementation as part of
dcollections?


I haven't the slightest what a bimap is :)  I am not an expert in
collections or data structures, I just reimplement things I have
understood.  My implementations are basically copied from my
algorithm book, and refined as much as I can do.

That being said, if you have any implementation of a tree or hash, it
should be easy to insert into dcollections.  If you have ideas for
other collection types (i.e. other than Map, Set, Multiset or List),
then I can look into that if you point me at an implementation or
have one of your own.  I purposefully left out multi-map because I've
never had a huge use for it, and it seemed like an awkward thing to
create an interface for...


Tries of various kinds come up again and again. I don't think 
dcollections' current abstractions capture them, which further supports 
my point that containers have too much personality to be caught in the 
straitjacket of class hierarchies.



Andrei


Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread Walter Bright

Michel Fortin wrote:
On 2010-05-21 22:55:16 -0400, Walter Bright  
said:



Walter Bright wrote:
If we can get anywhere close to that level of success with ranges and 
containers, we should all be well pleased.


Mike Taylor has a phrase for that I think is well-coined: "impedance 
matching", defined as the work necessary to get one library module to 
work with another library module.


This makes me think about something.

In principle, I like the idea of containers being reference types. It 
works well when passing a container to functions. But at the same time, 
I despise it. By-reference containers force you to have extra 
indirections even when you don't need them, and you have to worry about 
null. Sometimes a value type would be better, when creating more complex 
data structures for instance:


class Channel {
    private {
        Array!Message inbound;
        Array!Message outbound;
    }

    ...
}

What's the point of having extra indirection here?


Good question. I think the answer is:

1. When do you ever want to copy a collection? I almost never do, because 
copying one is an inherently expensive operation.


2. When you copy a collection, do you copy the container or the elements in the 
container or both? One would have to carefully read the documentation to see 
which it is. That increases substantially the "cognitive load" of using them.


3. Going to lengths to make them value types, but then disabling copying them 
because you want people to use them as reference types, seems like a failure of 
design somewhere.


4. That silly extra level of indirection actually isn't there. Consider that 
even value locals are accessed via indirection: offset[ESP]. For a reference 
collection we have: offset[EBX]. No difference (yes, EBX has to be loaded, but 
if it is done more than once it gets cached in a register by the compiler).


5. Just making them all reference types means the documentation and use become a 
lot simpler.


Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread Ellery Newcomer

On 05/22/2010 06:28 AM, bearophile wrote:

Pelle:

Yes, it works, but it doesn't gain anything from it, which is what I
said. :)


Then what you have said was useless.


No it isn't. The point Pelle was making is that there are three use 
cases for parameter passed arrays:


1. read-only (corresponds to const)
2. read/write + resizing/reallocation (corresponds to ref)
3. read/write, no resizing (corresponds to no modifier)

With (1) and (2), if I see those modifiers in the signature, I can 
deduce the use of the array.


With (3), I can't really tell from the signature if the programmer meant 
that use case or forgot something, but the case is valuable enough that 
it should be enforceable.


With associative arrays, case (3) is rare (is it even possible?) to the 
point that it can drop out of consideration.
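
To make cases (1)-(3) concrete, a small sketch (hypothetical function names):

void readOnly(const(int)[] xs);   // (1) elements cannot be modified
void resizable(ref int[] xs);     // (2) caller sees appends and reallocation

void inPlace(int[] xs)            // (3) elements can be modified in place,
{                                 //     but any resizing stays local
    foreach (ref x; xs) x += 1;   // visible to the caller
    xs ~= 0;                      // NOT visible to the caller
}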


Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread Pelle

On 05/22/2010 01:28 PM, bearophile wrote:

Pelle:

Yes, it works, but it doesn't gain anything from it, which is what I
said. :)


Then what you have said was useless.

Bye,
bearophile


How so? Passing by reference is slower, and sometimes completely 
meaningless. Therefore, having a rule that requires passing by ref is 
not a good one.


Also:

int[] get_xs() { return new int[](100); }

void main() {
get_xs.inc_all;

get_xs.inc_all_by_ref; // Error: get_xs() is not an lvalue
}


Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread Simen kjaeraas

BCS  wrote:

Cool. Now how do I write code so that it will always iterate the  
collection with the bigger O() lookup time (O(n) before O(log2(n))  
before O(log16(n)) before O(1))? :D


Add a function.

auto foo(R1, R2)(R1 r1, R2 r2)
    if (R1.complexity(10_000) > R2.complexity(10_000))
{
    ...
}

auto foo(R1, R2)(R2 r1, R1 r2)
    if (R1.complexity(10_000) < R2.complexity(10_000))
{
    ...
}

--
Simen


Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread Michel Fortin

On 2010-05-22 08:16:27 -0400, bearophile  said:


Michel Fortin:

Couldn't we just make a struct that cannot be implicitly copied?


The @disable attribute was invented for this purpose too:

struct Foo {
int x;
@disable this(this) {}
}
void main() {
Foo f1;
Foo f2 = f1;
}


That prints:
test.d(7): Error: struct test9.Foo is not copyable because it is 
annotated with @disable


Indeed, thanks. I figured it out by myself while you were writing this.


--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/



Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread bearophile
Michel Fortin:
> and you have to worry about null.

Nonnull references of course solve this problem.


> Couldn't we just make a struct that cannot be implicitly copied? 

The @disable attribute was invented for this purpose too:

struct Foo {
int x;
@disable this(this) {}
}
void main() {
Foo f1;
Foo f2 = f1;
}


That prints:
test.d(7): Error: struct test9.Foo is not copyable because it is annotated with 
@disable

Bye,
bearophile


Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread Michel Fortin

On 2010-05-22 07:56:31 -0400, Michel Fortin  said:


@explicitdup struct Array {  }

void testVal(Array array);
void testRef(ref Array array);

unittest {
Array array;
testVal(array); // error, cannot copy array implicitly
testVal(array.dup); // ok, array is copied
testRef(array); // ok, array is passed by reference
}

If there's already a way to achieve this, I couldn't find it.


Apparently it's already achievable this way:

struct Array {
@disable this(this);
Array dup() { return Array(...); }
...
}

It also blocks simple assignments:

Array array2 = array; // error, array is not copyable
Array array3 = array.dup; // ok

With this, I don't think we need containers to be reference types 
anymore. The naive error of copying containers everywhere without you 
knowing about it (as it occurs in C++) is simply no longer possible.


--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/



Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread Michel Fortin

On 2010-05-21 22:55:16 -0400, Walter Bright  said:


Walter Bright wrote:
If we can get anywhere close to that level of success with ranges and 
containers, we should all be well pleased.


Mike Taylor has a phrase for that I think is well-coined: "impedance matching",
defined as the work necessary to get one library module to work with another
library module.


This makes me think about something.

In principle, I like the idea of containers being reference types. It 
works well when passing a container to functions. But at the same time, 
I despise it. By-reference containers force you to have extra 
indirections even when you don't need them, and you have to worry about 
null. Sometimes a value type would be better, when creating more complex 
data structures for instance:


class Channel {
    private {
        Array!Message inbound;
        Array!Message outbound;
    }

    ...
}

What's the point of having extra indirection here?

I've been thinking about having both by-value and by-reference 
containers. The first-class name would be given to the by-reference 
container to give it more visibility, but that by-reference container 
would be a simple wrapper for a by-value container "part" implemented 
in a struct:


struct ArrayPart(T) { ... } // by value array container.

class Array(T) {
    ArrayPart!T part;
    alias part this;
}

So now, if you want to reuse a container "part" to build some kind of 
more complex data structure, you can do it like this:


class Channel {
    private {
        ArrayPart!Message inbound;
        ArrayPart!Message outbound;
    }

    ...
}

No silly extra indirection.

That said, all this gives me a feeling of an overcomplicated design. 
After all, the problem is that you want to pass the container by 
reference in function arguments, but it's __too easy to forget the 
ref__. Perhaps that's the problem that should be fixed.


Couldn't we just make a struct that cannot be implicitly copied? 
Perhaps something like this:


@explicitdup struct Array {  }

void testVal(Array array);
void testRef(ref Array array);

unittest {
Array array;
testVal(array); // error, cannot copy array implicitly
testVal(array.dup); // ok, array is copied
testRef(array); // ok, array is passed by reference
}

If there's already a way to achieve this, I couldn't find it.

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/



Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread bearophile
Pelle:
> Yes, it works, but it doesn't gain anything from it, which is what I 
> said. :)

Then what you have said was useless.

Bye,
bearophile


Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread Pelle

On 05/22/2010 01:00 PM, bearophile wrote:

Pelle:

Wouldn't gain anything from ref, and wouldn't work with const.


You are wrong, it works correctly with ref:


Yes, it works, but it doesn't gain anything from it, which is what I 
said. :)


Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread bearophile
Pelle:

> It's not a very good rule anyway:
> 
> void inc_all(int[] xs) {
>  foreach (ref x; xs) {
>  x += 1;
>  }
> }
> 
> Wouldn't gain anything from ref, and wouldn't work with const.

You are wrong, it works correctly with ref:

import std.stdio: writeln;

void inc_all(ref int[] array) {
foreach (ref x; array)
 x++;
}

void main() {
auto a = new int[10];
a.inc_all();
writeln(a);
}


It prints 1 1 1 1 1 1 1 1 1 1

Bye,
bearophile


Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread Pelle

On 05/22/2010 12:20 PM, bearophile wrote:

BCS:

Maybe the style rule should be: dynamic arrays and AA's should be passed
as const or ref.


Something like that must be enforced by the compiler, or the design of 
arrays/AAs must be changed.

Bye,
bearophile


It's not a very good rule anyway:

void inc_all(int[] xs) {
foreach (ref x; xs) {
x += 1;
}
}

Wouldn't gain anything from ref, and wouldn't work with const.


Re: dcollections 1.0 and 2.0a beta released

2010-05-22 Thread bearophile
BCS:
> Maybe the style rule should be: dynamic arrays and AA's should be passed 
> as const or ref.  

Something like that must be enforced by the compiler, or the design of 
arrays/AAs must be changed.

Bye,
bearophile


Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread Walter Bright

Walter Bright wrote:
If we can get anywhere close to that level of success with ranges and 
containers, we should all be well pleased.


Mike Taylor has a phrase for that I think is well-coined: "impedance matching", 
defined as the work necessary to get one library module to work with another 
library module.


One example of bad impedance matching is C++ iostreams' attempt to make a memory 
buffer look like a file. Phobos propagated that mistake in its own streams. A 
better way is to use ranges, where the file is accessed via a range and a memory 
buffer is accessed via a range. Then anything that operates on data is written 
to the range, not a fake file interface.


Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread Sean Kelly
Andrei Alexandrescu Wrote:
> 
> I don't know Tango, but Java's containers are a terrible example to 
> follow. Java's container library is an ill-advised design on top of an 
> underpowered language, patched later with some half-understood seeming 
> of genericity. I think Java containers are a huge disservice to the 
> programming community because they foster bad design.

Tango's container library is or was a port of Doug Lea's containers for Java.  
While Doug Lea is an absolute master with concurrency, I never liked his 
container library very much.  As you've said, the common abstractions it draws 
are weird and not terribly useful.
 
> I need to disagree with that. I've done and I do a ton of binary 
> interoperability stuff. You never expose a generic container interface! 
> Interoperable objects always embody high-level logic that is specific to 
> the application. They might use containers inside, but they invariably 
> expose high-level, application-specific functionality.

This.  Iterators (or ranges) are passed all over the place, but when operating 
directly on a container I always want to know what kind of container I'm 
dealing with.  Cost of operations is an issue: I may need to manually sort at 
some point, or be aware that iterators will be invalidated, etc.

Steve has already said he doesn't use the interfaces, and I'm a huge fan of not 
doing speculative design.  It's invariably wrong and then you get stuck 
supporting it.  I'd vote to drop the interfaces.  They can always be added back 
later anyway.


Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread BCS

Hello Andrei,


On 05/20/2010 08:22 AM, Steven Schveighoffer wrote:


Michel Fortin Wrote:


On 2010-05-20 06:34:42 -0400, Steven
Schveighoffer  said:

I understand these points, but I'm already using interfaces to copy
data between containers.  I don't have to, I could have used
generic code, but this way, only one function is instantiated to
copy data from all the other containers.  The problem with using
generic code is that the compiler will needlessly duplicate
functions that are identical.


One question. Have you calculated the speed difference between using
an interface and using generic code? Surely going through all those
virtual calls slows things down a lot.

I do like interfaces in principle, but I fear it'll make things much
slower when people implement things in term of interfaces. That's
why I'm not sure it's a good idea to offer container interfaces in
the standard library.


It's not that much slower.  The bigger speedup comes from being able to
inline, not from the difference between virtual and non-virtual calls.

However, I should probably make all the functions in the concrete
implementations final.  I made several of them final, but I should do
it across the board.


Yup. Even Java does that. I forgot whether it was a recanting of a
former stance (as was the case with synchronized) or things were like
that from the get-go.


One thing I just thought of -- in dcollections, similar types can be
compared to one another.  For example, you can check to see if a
HashSet is equal to a TreeSet.  But that would not be possible
without interfaces.


Of course it would be possible. You write a generic function that
takes two generic containers and constrains the inputs such that at
least one has element lookup capability. (Complexity-oriented design
for the win!)

Andrei



Cool. Now how do I write code so that it will always iterate the collection 
with the bigger O() lookup time (O(n) before O(log2(n)) before O(log16(n)) 
before O(1))? :D


--
... <





Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread BCS

Hello Vladimir,


On Thu, 20 May 2010 04:42:35 +0300, Steven Schveighoffer
 wrote:


interfaces


Does that imply that the most important methods are virtual?

If so, say good-bye to inlining, and hello to an additional level of
dereferencing.



From a technical standpoint there is no reason that a method needs to be 
called virtually from a class reference just because the same method gets 
called virtually from an interface reference.
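
A small sketch of that point (hypothetical types, not dcollections code): a 
final method still satisfies an interface, so calls through the concrete 
class reference can be devirtualized and inlined, while calls through the 
interface reference stay virtual.

interface Adder(T)
{
    void add(T item);
}

class ArrayAdder(T) : Adder!T
{
    private T[] data;
    final void add(T item) { data ~= item; }  // final: no further overrides
}

void useConcrete(ArrayAdder!int a) { a.add(1); }  // direct call, inlinable
void useInterface(Adder!int a)     { a.add(1); }  // virtual call via the interface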


--
... <





Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread BCS

Hello BLS,



OK, some collection types belong to the same family, so they should be
loosely coupled. How to do that without having interfaces?


How about this: Make a flat family of collections. Name the methods so that 
across the lib, the same name does the same kind of thing and different names 
do different kinds of things. Then define a flat family of whatever interfaces 
are useful and have each class derive from all the interfaces they happen 
to implement (but don't go out of your way to make anything fit).


--
... <





Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread BCS

Hello Bill,


On Fri, May 21, 2010 at 9:43 AM, Steven Schveighoffer
 wrote:


void foo(int[int] x)
{
x[5] = 5;
}
void main()
{
int[int] x;
foo(x);
assert(x[5] == 5); // fails
}

And with arrays at least it's even more insidious, because sometimes
it will seem to work, and sometimes it won't.
void foo(int[] x) {
x ~= 10;
}
Caller's .length will never get updated by that, but it won't crash so
it may take a while to find the bug.


Maybe the style rule should be: dynamic arrays and AA's should be passed 
as const or ref.  


--
... <





Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread BCS

Hello superdan,



dun tell me it dun work. i dun explain shit again. it works coz a
struct cant be null. but a struct can be a ref if it only haz one
pointer inside. methinks the builtin hash iz dat way.




void foo(container!shit poo)
{
if(!poo) poo = new container!shit; // fuck dat shit
poo.addElement(Shit(diarrhea));
}
dat sucks bull ballz.


That code is broken: if poo is of a class type, it just new's a container, 
adds an element, and then drops the reference to be GC'ed. To make it do 
anything generally useful, it would need another level of indirection.


--
... <





Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread Vladimir Panteleev
On Thu, 20 May 2010 04:42:35 +0300, Steven Schveighoffer  
 wrote:



interfaces


Does that imply that the most important methods are virtual?

If so, say good-bye to inlining, and hello to an additional level of  
dereferencing.


Without meaning any disrespect to all the work you did, allow me to say  
that I won't use a library that could be faster (without making usage too  
clumsy), but isn't. (I prefer my D programs to be as fast as reasonably  
possible - if I didn't care about speed, I'd use another language.) For  
the same reasons, I'd be disappointed if the library was admitted as-is  
into Phobos, since it doesn't live up to my personal ideal of what D  
should be.


--
Best regards,
 Vladimirmailto:vladi...@thecybershadow.net


Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread Walter Bright

Walter Bright wrote:
If we can get anywhere close to that level of success with ranges and 
containers, we should all be well pleased.


Mike Taylor has a phrase for that I think is well-coined: "impedance matching",
defined as the work necessary to get one library module to work with another
library module.

One example of bad impedance matching is C++ iostreams' attempt to make a memory
buffer look like a file. Phobos propagated that mistake in its own streams.


Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread Walter Bright

Andrei Alexandrescu wrote:

On 05/19/2010 09:59 PM, Robert Jacques wrote:

Yes and No. I understand where you're coming from, but I think it's a bad
idea. First, I think it needlessly expands the radius of comprehension
needed to understand and use the library. (See Tangled up in tools
http://www.pragprog.com/magazines/2010-04/tangled-up-in-tools)


For the record, I strongly agree with this.


I do too, but that's the easy part. Living up to those ideals is extremely hard, 
usually because most designers think they have designed simple interfaces when 
everyone else thinks they didn't.


In this whole container discussion, I'd like to point out something Andrei 
pointed out to me. The one extremely successful example of pluggable components 
is the unix filter model. Essentially, one program pipes its output to the next, 
which pipes its output to the next, etc. The unix console is designed around 
that paradigm.


If we can get anywhere close to that level of success with ranges and 
containers, we should all be well pleased.


Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread BLS

On 21/05/2010 19:15, Andrei Alexandrescu wrote:

I'll now slowly answer the great replies I got in this thread, in
chronological order. Since I read them all before replying, I might
sometimes refer to future posts. Hopefully that won't be too confusing.

Andrei


Following the dcollections thread, it seems that your argument is that 
interfaces in a non-hierarchical collection library are useless.
Or: hierarchical collection libraries are nonsense, and ergo interfaces are 
obsolete.


OK, some collection types belong to the same family, so they should be 
loosely coupled. How to do that without having interfaces?
Next, e.g. forward ranges belong to a certain family of collections, and 
ranges are (or should be) interfaces.

You can see I'm very confused.

Bjoern
given
struct node
class col : IFWRange
node Node;
// No ?


Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread Alex Makhotin

Steven Schveighoffer wrote:


I myself don't really use the interface aspect of the classes, it is mostly a 
carryover from the Java/Tango inspirations.  But I can see one good reason to 
keep them -- binary interoperability.  For example, it might be the case some 
day when D has good support with dynamic libraries that a library exposes some 
piece of itself as a Map or List interface.



Hi, Steven.
You made a good point on interoperability.
Strict, precise, readable interfaces: that's what I would like in a 
really good standard library for D.
But is D mature enough for this kind of possibility, considering, at 
least, the known bugs involving interfaces?
I don't even feel free to use interfaces in my own code because of the 
undefined behavior they may cause.

Shared libraries...
Is this going to happen on Linux?
When?

--
Alex Makhotin,
the founder of BITPROX,
http://bitprox.com


Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread Walter Bright

Andrei Alexandrescu wrote:

See the -mergefunc compiler switch of LDC, to merge identical
functions (like ones produced by template instantiations). This
feature is not very powerful yet, but it's present and I have seen it
works.


Indeed. I'm no expert in linkers, but in my opinion this is one of the
most basic optimizations a linker should perform. And C++ has pushed
linkers to do that for years now so I'd expect most linkers to do that
already...

The problem with templates is more the multiple slightly different
instantiations. In general this is good for performance, but it's only
needed for the code paths that need to be fast. I think generic
containers should be fast.


There's been talk that Walter's slow porting of the linker to C and then 
D will put him in the position to introduce such an optimization.


Unfortunately, it's an agonizingly slow process. It also does nothing for Linux 
or OSX.


Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread Andrei Alexandrescu

On 05/21/2010 01:34 PM, Walter Bright wrote:

Andrei Alexandrescu wrote:

I wrote a solution to the problem in native D. It goes like this:

alias Container!(int, addable | purgeable) Messerschmidt;

void messWith(Messerschmidt i) {
... use i's capabilities to add and purge ...
}


I agree with Michael Fortin that the | is questionable. I'd like to
suggest instead that it should instead be a variadic list of names, like:

alias Container!(int, addable, purgeable) Messerschmidt;

Perhaps the names should follow a naming convention,


alias Container!(int, ContainerAddable, ContainerPurgeable) Messerschmidt;

The problem with using scoped names, like Container.Addable, is scoped
names cannot be added to.


Well the problem stays: compound interfaces grow combinatorially with 
the number of components, because an interface X!(A, B, C) inherits at 
the same time X!(A, B), X!(A, C), and X!(B, C).
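
To make the blow-up concrete, a sketch with just three capabilities (sketch 
interfaces, not dcollections' actual ones):

interface Addable(T)   { void add(T item); }
interface Purgeable(T) { void purge(T item); }
interface Iterable(T)  { int opApply(scope int delegate(ref T) dg); }

// For a compound to be usable wherever any of its sub-combinations is
// expected, every compound must explicitly inherit all of them:
interface AddPurge(T)     : Addable!T, Purgeable!T {}
interface AddIter(T)      : Addable!T, Iterable!T {}
interface PurgeIter(T)    : Purgeable!T, Iterable!T {}
interface AddPurgeIter(T) : AddPurge!T, AddIter!T, PurgeIter!T {}
// With n capabilities, the number of such interfaces grows as 2^n.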


Check the epic inheritance diagram here:

http://www.artima.com/cppsource/codefeatures2.html

and the equally epic table here:

http://www.artima.com/cppsource/codefeatures3.html


Andrei


Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread Walter Bright

Andrei Alexandrescu wrote:

I wrote a solution to the problem in native D. It goes like this:

alias Container!(int, addable | purgeable) Messerschmidt;

void messWith(Messerschmidt i) {
... use i's capabilities to add and purge ...
}


I agree with Michael Fortin that the | is questionable. I'd like to suggest 
instead that it should instead be a variadic list of names, like:


alias Container!(int, addable, purgeable) Messerschmidt;

Perhaps the names should follow a naming convention,


alias Container!(int, ContainerAddable, ContainerPurgeable) Messerschmidt;

The problem with using scoped names, like Container.Addable, is scoped names 
cannot be added to.


Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread Bill Baxter
On Fri, May 21, 2010 at 10:50 AM, superdan  wrote:

>> void foo(int[int] x)
>> {
>>    x[5] = 5;
>> }
>> void main()
>> {
>>    int[int] x;
>>    foo(x);
>>    assert(x[5] == 5); // fails
>> }
>> -Steve
>
> wrote a long post but it got lost. shit. bottom line dats a bug in dmd or 
> phobos.

Unfortunately it works exactly as designed.

--bb


Re: dcollections 1.0 and 2.0a beta released

2010-05-21 Thread Andrei Alexandrescu

On 05/20/2010 02:47 PM, bearophile wrote:

Michel Fortin:


Devirtualization is only possible in certain cases: when the
function knows exactly which type it'll get.<


You are wrong, in most cases there are ways to de-virtualize, even
when the runtime type isn't exactly known, but sometimes to do it you
have to work too much. This is probably why C# dotnet doesn't perform
this optimization.


If we get deeper into this branch, we'll forget where we started from 
(are interfaces sensible for this design?) and we'll reframe the 
interfaces vs. no interfaces decision into a speed loss vs. no speed 
loss decision. There are other arguments to look at besides performance.



Andrei

